AI writes the code. Tests verify correctness. More code enables more features.
This concise description captures the full feedback loop of AI in software development: AI generates the code, tests verify its correctness, and more code creates more features. This self-reinforcing loop may make software development AI's most disruptive application domain.
On the SWE-Pro benchmark, M2.7 scores 56.22%, nearly matching Opus's best level.
This result is surprising: an open-source model approaching top-tier commercial performance on a professional software-engineering benchmark may signal that the gap between open-source and closed commercial models is closing rapidly, reshaping the competitive landscape of AI development.
M2.7 demonstrates excellent performance in real-world software engineering, including end-to-end project delivery, log analysis for bug hunting, code security, and machine learning tasks.
This claim suggests AI models have moved beyond simple code generation to handling the full software development lifecycle, a major breakthrough for AI in engineering that could redefine how software is built.
Claude Opus 4.6 autonomously reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks.
This is a striking result: an AI completing a complex programming task that would normally take a human engineer weeks. It challenges our assumptions about current AI capability and hints at major changes coming to software engineering; this level of autonomous programming far exceeds what mainstream AI coding assistants can do today.
GLM-5.1 achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).
Surprisingly, GLM-5.1 achieves state-of-the-art performance on software-engineering agent tasks, leading its predecessor by a wide margin on repository generation and real-world terminal tasks. This suggests a qualitative leap in AI's ability to understand and execute complex software-engineering work.
Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.
[Insight] This analogy is illuminating: knowledge-base management as software engineering, where Obsidian is the IDE, the LLM is the programmer, and the wiki is the codebase. Its deeper implication is that knowledge work can borrow the entire software-engineering toolchain: version control (git), review (lint), continuous integration (automatic ingest), refactoring (wiki cleanup). "Engineering" knowledge management is not just a metaphor; it is literally actionable.
Looking under the hood is cheating. You're only supposed to have vague conversations with the machine about what it's doing.
Most people consider reading and reviewing code a standard practice in software development, but the author calls it "cheating": the vibe-coding culture encourages developers to avoid looking at the underlying implementation entirely. This runs counter to basic software-engineering principles, where code review is considered a key step for improving quality and catching problems.
Virtual environment
There is a tremendous power in thinking about everything as a single kind of thing, because then you don’t have to juggle lots of different ideas about different kinds of things; you can just think about your problem.
In my experience this is also the main benefit of using node.js as your backend. Being able to write your front and backend code in the same language (javascript) removes a switching cost I didn't fully realize existed until I tried node the first time.
Getting hooked on computers is easy—almost anybody can make a program work, just as almost anybody can nail two pieces of wood together in a few tries. The trouble is that the market for two pieces of wood nailed together—inexpertly—is fairly small outside of the "proud grandfather" segment, and getting from there to a decent set of chairs or fitted cupboards takes talent, practice, and education.
This is a great analogy
the Peter Principle, the idea that in an organization where promotion is based on achievement, success, and merit, that organization's members will eventually be promoted beyond their level of ability
Applying the principle to software, you will find that you need three different versions of the make program, a macroprocessor, an assembler, and many other interesting packages. At the bottom of the food chain, so to speak, is libtool, which tries to hide the fact that there is no standardized way to build a shared library in Unix. Instead of standardizing how to do that across all Unixen the Peter Principle was applied and made it libtool's job instead.
This is not a discrete project but an ongoing process and should always be competing for focus in strategic decision making.
Absolutely agreed. One limitation of the Iron Triangle concept is that it often seems to be used to make decisions based on a snapshot in time (i.e. which two are we choosing now), when some choices have longer half-lives than others.
By jumping into unfamiliar areas of code, even if you do not "solve" the bug, you can learn new areas of the code, tricks for getting up to speed quickly, and debugging techniques.
Building a mental model of the codebase, as Jennifer Moore says over at Jennifer++:
The fundamental task of software development is not writing out the syntax that will execute a program. The task is to build a mental model of that complex system, make sense of it, and manage it over time.
Thinking about how you will observe whether things are working correctly or not ahead of time can also have a big impact on the quality of the code you write.
YES. This feel similar to the way that TDD can also improve the code that you write, but with a broader/more comprehensive outlook.
Platform engineering is trying to deliver the self-service tools teams want to consume to rapidly deploy all components of software. While it may sound like a TypeScript developer would feel more empowered by writing their infrastructure in TypeScript, the reality is that it’s a significant undertaking to learn to use these tools properly when all one wants to do is create or modify a few resources for their project. This is also a common source of technical debt and fragility. Most users will probably learn the minimal amount they need to in order to make progress in their project, and oftentimes this may not be the best solution for the longevity of a codebase.

These tools are straddling an awkward line that is optimized for no-one. Traditional DevOps are not software engineers and software engineers are not DevOps. By making infrastructure a software engineering problem, it puts all parties in an unfamiliar position.

I am not saying no-one is capable of using these tools well. The DevOps and software engineers I’ve worked with are more than capable. This is a matter of attention. If you look at what a DevOps engineer has to deal with day-in and day-out, the nuances of TypeScript or Go will take a backseat. And conversely, the nuances of, for example, a VPC will take a backseat to a software engineer delivering a new feature. The gap that the AWS CDK and Pulumi try to bridge is not optimized for anyone and this is how we get bugs, and more dangerously, security holes.
Deploy engines as separate app instances and have them only communicate over network boundaries. This is something we’re starting to do more.
Before moving to this microservice approach, it's important to consider whether the benefits are worth the extra overhead. Jumping to microservices prematurely is something I've seen happen more than once in my career, and it often leads to a lot of rework.
While you might think that pairing less experienced engineers is a waste of time, every single time I had a less experienced engineer work by themselves, I ended up regretting it.
This has been my experience this year
The major benefit of foreign keys is that they guarantee referential integrity. For example, say you have customers in one table that may refer to a number of invoices in another. Without foreign keys, you could delete a customer, but forget to remove its invoices, thereby leaving a bunch of orphaned invoices that reference a customer that’s gone.
Note that GitHub doesn't use FKs (at least as of 2016: https://github.com/github/gh-ost/issues/331#issuecomment-266027731) because:
- MySQL doesn't support them on partitioned tables
- Performance impact
- FKs don't work well with online schema migrations
Postgres foreign keys have been fully compatible with partitioned tables since version 12. Still, they're not that commonly used for larger DBs.
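The referential-integrity guarantee described above can be shown in a minimal sketch using SQLite (table and column names are illustrative, not from the source):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
)""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoices VALUES (10, 1)")

# Deleting the customer while an invoice still references it is rejected,
# so orphaned invoices cannot be created.
try:
    conn.execute("DELETE FROM customers WHERE id = 1")
except sqlite3.IntegrityError as e:
    print("blocked:", e)
```

Without the `REFERENCES` constraint (or with enforcement off), the same `DELETE` would succeed and leave invoice 10 pointing at a customer that no longer exists.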
If an operator ever queries the database directly they’re even more likely to forget deleted_at because normally the ORM does the work for them.
This happens relatively often, especially for 1) engineers that run SQL queries directly against the DB for analysis or triaging production issues, and 2) data scientists that do not use the same programming language as the engineers
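A minimal sketch of the soft-delete pitfall (schema and names are illustrative): the ORM adds the `deleted_at IS NULL` filter on every query automatically, but a hand-written query easily forgets it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', NULL)")
conn.execute("INSERT INTO users VALUES (2, 'bob', '2024-01-01')")  # soft-deleted

# Hand-written query: forgets the soft-delete filter, so it counts bob too.
naive = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# What the ORM would run on your behalf: only live rows.
correct = conn.execute(
    "SELECT COUNT(*) FROM users WHERE deleted_at IS NULL"
).fetchone()[0]

print(naive, correct)  # 2 1
```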
When a product manager trusts that the engineers on the team have the interest of the product at heart, they also trust the engineer’s judgment when adding technical tasks to the backlog and prioritizing them. This enables the balanced mix of feature and technical work that we’re aiming for.
Why is it so common for engineering teams to be mistrusted by other parts of the business?
Part of that is definitely on engineers: chasing the new shiny, over-engineering, etc.
That seems unlikely to account for all of it, though.
An interesting directory of personal blogs on software and security.
While it aggregates from various sources and allows people to submit directly, it also calculates a quality score for each blog using the total number of Hacker News points earned by its raw URL.
Apparently uses a query like: https://news.ycombinator.com/from?site=example.com to view all posts from HN.
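A hypothetical sketch of such a score: sum the HN points of every submission hosted on the blog's domain. The field names below mirror what an HN scrape or the Algolia HN Search API returns, but this is an assumption about the directory's approach, not its actual implementation:

```python
from urllib.parse import urlparse

def hn_quality_score(domain: str, submissions: list[dict]) -> int:
    """Sum HN points across all submissions whose URL lives on `domain`."""
    total = 0
    for story in submissions:
        host = urlparse(story.get("url", "")).hostname or ""
        # Match the domain itself and any subdomain (e.g. blog.example.com).
        if host == domain or host.endswith("." + domain):
            total += story.get("points", 0)
    return total

# Made-up sample data:
stories = [
    {"url": "https://example.com/post-1", "points": 120},
    {"url": "https://blog.example.com/post-2", "points": 30},
    {"url": "https://other.org/post", "points": 999},
]
print(hn_quality_score("example.com", stories))  # 150
```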
Coordination: More environments require more coordination. Teams need to track which feature is deployed to which environment. Bugs need to be associated with environments. Every environment represents a particular ‘state’ of the codebase, and this has to be tracked somewhere to make sure that customers & stakeholders are seeing the right things;
Try to remember the last time you heard one of the following phrases:
Sorry you’re surprised. Issues are filed at about a rate of 1 per day against GLib. Merge requests at a rate of about 1 per 2 days. Each issue or merge request takes a minimum of about 30 minutes (across at least 2 people) to analyse, put together a fix, test it, review it, fix it, review it and merge it. I’d estimate the average is closer to 3 hours than 30 minutes. Even at the fastest rate, it would take 3 working months to clear the backlog of ~1000 issues. I get a small proportion of my working time to spend on GLib (not full time).
“Functional programming language” is not a clearly defined term. From the various properties that are typically associated with functional programming I only want to focus on one: “Immutability” and referential transparency.
I'd say "not clearly defined" seems wrong; there are commonly accepted characteristics that make a language functional.
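A minimal illustration of the property the author singles out: a referentially transparent function call can be replaced by its value anywhere, while a function that relies on hidden mutable state cannot (names here are illustrative):

```python
# Referentially transparent: same arguments always give the same result,
# so any call to area(2.0) can be replaced by its value without changing behavior.
def area(radius: float) -> float:
    return 3.14159 * radius * radius

# Not referentially transparent: the result depends on hidden mutable state,
# so two identical calls are not interchangeable.
_counter = 0
def next_id() -> int:
    global _counter
    _counter += 1
    return _counter

assert area(2.0) == area(2.0)   # always equal
assert next_id() != next_id()   # 1 != 2
```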
One of the primary tasks of engineers is to minimize complexity. JSX changes such a fundamental part (syntax and semantics of the language) that the complexity bubbles up to everything it touches. Pretty much every pipeline tool I've had to work with has become far more complex than necessary because of JSX. It affects AST parsers, it affects linters, it affects code coverage, it affects build systems. That's tons and tons of additional code that I now need to wade through and mentally parse and ignore whenever I need to debug or want to contribute to a library that adds JSX support.
An Idiom is a low-level pattern specific to a programming language. An idiom describes how to implement particular aspects of components or the relationships between them using the features of the given language.
A Design Pattern provides a scheme for refining the subsystems or components of a software system, or the relationships between them. It describes a commonly recurring structure of communicating components that solves a general design problem within a particular context.
Building blocks are what you use: patterns can tell you how you use them, when, why, and what trade-offs you have to make in doing so.
patterns are considered to be a way of putting building blocks into context
A "pattern" has been defined as: "an idea that has been useful in one practical context and will probably be useful in others"
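The idiom/pattern distinction above can be made concrete with a small sketch (examples are my own, not from the source): an idiom is tied to one language's features, while a pattern describes a language-independent structure of communicating components:

```python
# Idiom: language-specific. Swapping two values via tuple unpacking is a
# Python idiom; many other languages need a temporary variable.
a, b = 1, 2
a, b = b, a

# Design pattern: language-independent structure. A minimal Observer,
# a recurring scheme for decoupling a subject from its listeners.
class Subject:
    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def notify(self, event):
        for callback in self._observers:
            callback(event)

log = []
s = Subject()
s.subscribe(log.append)
s.notify("changed")
print(a, b, log)  # 2 1 ['changed']
```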
INVEST
According to this checklist, a User Story should be:
Independent (of all others)
Negotiable (not a specific contract for features)
Valuable (or vertical)
Estimable (to a good approximation)
Small (so as to fit within an iteration)
Testable (in principle, even if there isn't a test for it yet)
Source(s):
This is a great, short guide for optimizing pull requests for review-ability.
As such, scrum adopts an empirical approach—accepting that the problem cannot be fully understood or defined, focusing instead on maximizing the team's ability to deliver quickly, to respond to emerging requirements and to adapt to evolving technologies and changes in market conditions.
Algorithm development work has exactly this characteristic.
the study of innovation shows that everything hinges on the hard work of taking a promising idea and making it work — technically, legally, financially, culturally, ecologically. Constraints are great enablers of innovation.
But there’s a downside to the hackathon hype, and our research on designing workplace projects for innovation and learning reveals why. Innovation is usually a lurching journey of discovery and problem solving. Innovation is an iterative, often slow-moving process that requires patience and discipline. Hackathons, with their feverish pace, lack of parameters and winner-take-all culture, discourage this process. We could find few examples of hackathons that have directly led to market success.
what if projects were designed to combine a hacking mindset with rigorous examination of the data and experience they glean? This would reward smart failures that reveal new insights and equip leaders with the information needed to rescale, pivot or axe their projects.
Sounds somewhat like agile development.
In summary, teams which are "fairer", in two senses, tend to be more effective: