26 Matching Annotations
  1. Jan 2024
    1. At a minimum, survey questions should be based on well-defined constructs and be rigorously tested in interviews for consistent interpretation. When possible, consult experts with significant experience in survey development.

      Oof.

    2. Developer experience focuses on the lived experience of developers and the points of friction they encounter in their everyday work.

      What's the definition of "DevEx"?

    1. One thing many of these scaleups focus on is measuring “moveable metrics.” A moveable metric is one that developer productivity teams can “move” by impacting it positively or negatively with their work. Moveable metrics are helpful for developer productivity teams to showcase their own impact.

      Pick a metric that'll move quickly.

    2. The survey happens twice a year. It’s sent to a random sample of roughly half of developers. With this approach, individual developers only need to participate in one survey per year, minimizing the overall time spent on filling out surveys, while still providing a statistically significant representative set of data results.

      It's a good idea.

    3. Time to 1st and 10th PR for all new hires

      I guess that by also looking at the 10th PR you try to ensure that the speed is real and that people aren't just trying to speedrun the first PR?

    4. Winsorized means are a way to recognize improvements made within the outlier metrics. Winsorized means are calculated by replacing high and low end values with numbers closer to the middle. Here is how Grant Jenks explains this approach: “What a winsorized mean does is it says: figure out your 99th percentile and instead of throwing away all the data points that are above the 99th percentile, clip them. So if your 99th percentile is a hundred seconds and you have a data point that’s 110 seconds, you cross out 110 and you write a hundred, and now you calculate your (winsorized) mean that results in a more useful number.”

      New to me, this Winsorized mean.
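      A quick sketch of the idea (my own illustration, not from the article), clipping only the high tail as in Jenks' example:

      ```python
      def winsorized_mean(values, pct=0.99):
          """Mean with extreme high values clipped to the pct-th percentile
          instead of being discarded."""
          ordered = sorted(values)
          # nearest-rank percentile cut point
          cutoff = ordered[int(pct * (len(ordered) - 1))]
          clipped = [min(v, cutoff) for v in values]
          return sum(clipped) / len(clipped)

      # If the 99th percentile is 100s, a 110s data point counts as 100s.
      ```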

    5. the specific metrics LinkedIn uses. Here are examples of those which the company focuses on:

       - Developer Net User Satisfaction (NSAT) measures how happy developers are overall with LinkedIn’s development systems. It’s measured on a quarterly basis.
       - Developer Build Time (P50 and P90) measures in seconds how long developers spend waiting for their builds to finish locally during development.
       - Code Reviewer Response Time (P50 and P90) measures how long it takes, in business hours, for code reviewers to respond to each code review update from the author.
       - Post-Commit CI Speed (P50 and P90) measures how long it takes, in minutes, for each commit to get through the continuous integration (CI) pipeline.
       - CI Determinism is the opposite of test flakiness. It’s the likelihood a test suite’s result will be valid and not a flake.
       - Deployment Success Rate measures how often deployments to production succeed.

      The metrics LinkedIn tracks.
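      For context, P50 and P90 are just the 50th and 90th percentiles of each distribution. A minimal way to compute them (my own sketch, with made-up build times):

      ```python
      import statistics

      build_times_s = [42, 38, 95, 51, 47, 210, 44, 60, 39, 55]  # hypothetical, in seconds

      # statistics.quantiles(n=100) returns the 1st..99th percentile cut points
      pct = statistics.quantiles(build_times_s, n=100)
      print(f"Build time P50={pct[49]:.0f}s, P90={pct[89]:.0f}s")
      ```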

    1. But the actual improvements that were effective were the results of having a better understanding of work itself, by focusing on the messy details of representative cases to generate insights.

      The one-on-ones will tell you more about what's wrong and what to improve than high-level metrics.

      <==> Look for the person six levels down for what needs to be fixed.

    2. Anesthesia’s successful method was largely intensive—detailed, in-depth analysis of single cases chosen for their learning potential (often but not always critical incidents). The broader patient safety field used a health services research approach that was largely extensive—aggregation of large numbers of cases chosen for a common property (an “error” or a bad outcome) and averaging of results across the aggregate. In the extensive approach, the contributory details and compensatory actions that would be fundamentally important to safety scientists tend to disappear, averaged out in the aggregate as “messy details.”

      <==> Job-to-be-done interviews. The individual stories have more power than aggregated statistics.

    3. The classification scheme used here is possibly the clearest one I've had the chance to see, one I wish I had seen in use at every place I worked at. Here are the broad categories:

       - Surrogate variable that is measured: the thing we measure because it is measurable
       - Variable of true or greater interest: the thing we actually want to know about
       - Measurement technique of surrogate variable: whether we have the ability to get the actual direct value, or whether it is rather inferred from other observations
       - Artefactual influences: what are the things that can mess up the data in measuring it
       - Certainty of "Normal Range": how sure are we that the value we read is representative of what we care about?

      What are we really measuring with metrics, and how could things go wrong?

    1. Donald Wheeler, who writes, in Understanding Variation:

       When people are pressured to meet a target value there are three ways they can proceed:

       1) They can work to improve the system
       2) They can distort the system
       3) Or they can distort the data

      <=> Form–Context fit

    1. Don’t expect people to change their behavior just so you can measure it. For example, don’t expect that everybody will tag their bugs, PRs, etc. in some special way just so that you can count them.

      100% true.

    1. It should be clearly good or bad when a metric goes up, and clearly good or bad when it goes down. One direction should be good, and the other direction should be bad.

      Remove the need for context for interpreting whether a change is good or not.

    2. When we measure “developer time” for parallel actions, we only care about how much wall-clock time was taken up by the action, not the sum total of machine hours used to execute it.

      Obviously.
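      A toy illustration of the distinction (my own, assuming three perfectly parallel build steps):

      ```python
      step_durations_s = [30, 45, 60]  # hypothetical parallel build steps

      machine_time = sum(step_durations_s)  # 135s of total compute
      wall_clock = max(step_durations_s)    # 60s: what the developer actually waits for

      print(f"machine time: {machine_time}s, developer time: {wall_clock}s")
      ```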

    3. When you display a metric on a graph, the event you are measuring should be assigned to a date or time on the graph by the end time of the event.

      That's interesting, and surprising. But the rationale makes sense: if you use the end time, you never need to go back and update the graph: the thing has ended.
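      A small sketch of that bookkeeping point (my own illustration): keyed by end time, a finished event's bucket is final, so already-plotted points never move.

      ```python
      from datetime import date

      # hypothetical events: (start, end, duration in hours)
      events = [
          (date(2024, 1, 3), date(2024, 1, 5), 14.0),
          (date(2024, 1, 4), date(2024, 1, 4), 2.5),
      ]

      buckets = {}
      for start, end, hours in events:
          # bucket by end date: once an event has ended, this assignment never changes
          buckets.setdefault(end, []).append(hours)
      ```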

  2. Dec 2023
    1. The effort-output-outcome-impact model describes software engineering, and works just as well for smaller tasks as it does to model thinking about feature development, or shipping complex projects.

      What are the four steps of a general mental model for software development (and pretty much any kind of work in business)?

    2. So which outcomes and impacts can be measured for engineering teams? DORA and SPACE give some pointers, and we offer two more:

       - Please the customer at least once per team, per week. This output might not sound so impressive, but in practice is very hard to achieve. The most productive teams – and nimble companies – can pull this off, though. If you consider why a startup moves so fast and executes so well, it’s because they have to do so out of necessity, even if they do not measure this.
       - Delivering business impact committed to by the team. There is a good reason why “impact” is so prevalent at the likes of Meta, Uber, and other fast-moving tech companies. By rewarding impact, the business incentivizes software engineers to understand the business, and to prioritize helping it reach its goals. Is shipping a $2M/year cost-saving exercise via a configuration change, less valuable than shipping a $500K/year cost-saving exercise that takes 5 engineering months? No! You don’t want to focus solely on impact, but not focusing on the end goal of delivering business value is a poor choice.

      What other metrics do Beck and Orosz suggest for measuring an engineering team?

  3. May 2022
    1. Consensus-driven organizations need a detailed description of the end state to engage with it and establish a new consensus. Western companies launch a transformation based on a vision and engage the organization to define the new model. But in Japanese companies, this step must occur earlier, before a broader group is engaged. “Building a plane as we fly” is never easy, but in these Japanese organizations it is a nonstarter.

      Make it concrete.

    2. In summary, the conditions in which immediate followers could follow do not exist in Japan. Creating them would not be a simple change-management process—it often involves achieving the change itself. Second-level management often becomes an obstacle to any change and stifles the drive for innovation and improvement in the corporate operating model and culture.

      An unsolvable puzzle?

    3. A change-management approach focused on “the last man standing”—in addition to the “change leader or agent” model at the core of most Western approaches—may be the one most suitable for consensus-driven organizations in Japan and other cultures. Leaders in Japan understand that change programs fail at the middle-management level, where just a few people (or even one) can impede the consensus-building process. We believe that change-management programs can increase their chances of success not by fighting a consensus-oriented culture (or by strengthening the top-down communication cascade) but by focusing greater effort on the potential blockers. A set of simple practices can increase the chances of success in major transformations.

      In Japan, "follow the leader" is not sufficient.

  4. Jan 2022
    1. As long as a particular activity remains in the learning box, leaders must adopt a beginner’s mindset, questioning subordinates in a way that suggests they don’t already know the answer (because they don’t). This differs starkly from the way leaders question subordinates about activities in the owning and teaching boxes.

      Accountability without control sounds like a nightmare if you're not fit for it.

    2. First, Apple competes in markets where the rates of technological change and disruption are high, so it must rely on the judgment and intuition of people with deep knowledge of the technologies responsible for disruption. Long before it can get market feedback and solid market forecasts, the company must make bets about which technologies and designs are likely to succeed in smartphones, computers, and so on. Relying on technical experts rather than general managers increases the odds that those bets will pay off.

      Experts leading experts helps spot opportunities.

    3. When Jobs arrived back at Apple, it had a conventional structure for a company of its size and scope. It was divided into business units, each with its own P&L responsibilities. General managers ran the Macintosh products group, the information appliances division, and the server products division, among others. As is often the case with decentralized business units, managers were inclined to fight with one another, over transfer prices in particular. Believing that conventional management had stifled innovation, Jobs, in his first year returning as CEO, laid off the general managers of all the business units (in a single day), put the entire company under one P&L, and combined the disparate functional departments of the business units into one functional organization.

      Business units with P&L responsibility caused internal competition and stifled innovation.

    4. But a full-fledged transformation requires that leaders also transition to a functional organization.

      That's one step beyond matrixed, where people report to both a mission-oriented team and a functional team, towards fully functional. Pendulum pendulum!

    5. Being a teacher doesn’t mean that Rosner gives instruction at a whiteboard; rather, he offers strong, often passionate critiques of his team’s work. (Clearly, general managers without his core expertise would find it difficult to teach what they don’t know.)

      He's teaching taste.

  5. Jun 2021
    1. balance exploring uncertain regions, which might unexpectedly have high gold content, against focusing on regions we already know have higher gold content (a kind of exploitation).

      Explore <-> Exploit tradeoff.
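      The article covers Bayesian optimization, but the tradeoff itself is easiest to see in a toy epsilon-greedy bandit (my own sketch, not from the article):

      ```python
      import random

      true_gold = [1.0, 2.5, 1.8]   # hypothetical average gold content per region
      estimates = [0.0] * 3
      counts = [0] * 3
      EPSILON = 0.1  # fraction of digs spent exploring

      for _ in range(1000):
          if random.random() < EPSILON:
              region = random.randrange(3)              # explore: try any region
          else:
              region = estimates.index(max(estimates))  # exploit: best known region
          reward = random.gauss(true_gold[region], 0.5) # noisy observation
          counts[region] += 1
          # incremental mean update of our estimate for this region
          estimates[region] += (reward - estimates[region]) / counts[region]

      print(estimates)  # approaches true_gold; the best region gets dug the most
      ```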