1,048 Matching Annotations
  1. Oct 2017
  2. Sep 2017
  3. Jul 2017
    1. In practice, this is accomplished by monitoring the amount of operational work being done by SREs, and redirecting excess operational work to the product development teams: reassigning bugs and tickets to development managers, [re]integrating developers into on-call pager rotations, and so on. The redirection ends when the operational load drops back to 50% or lower.

      Ensuring that SREs spend no more than 50% of their time doing operational work.

    2. The hero jack-of-all-trades on-call engineer does work, but the practiced on-call engineer armed with a playbook works much better. While no playbook, no matter how comprehensive it may be, is a substitute for smart engineers able to think on the fly, clear and thorough troubleshooting steps and tips are valuable when responding to a high-stakes or time-sensitive page.
    3. The business or the product must establish the system’s availability target. Once that target is established, the error budget is one minus the availability target. A service that’s 99.99% available is 0.01% unavailable. That permitted 0.01% unavailability is the service’s error budget. We can spend the budget on anything we want, as long as we don’t overspend it.

      The goal of SREs is no longer "zero outages", but to allow for maximum product development velocity as long as it stays within the error budget.
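
A minimal sketch of the arithmetic in the passage above, assuming availability is tracked as a fraction of time over a fixed window. The 30-day window and the function names are illustrative assumptions, not something the quoted text specifies; a request-based budget would follow the same pattern.

```python
def error_budget(availability_target: float) -> float:
    """Error budget as a fraction: one minus the availability target."""
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be a fraction between 0 and 1")
    return 1.0 - availability_target


def allowed_downtime_minutes(availability_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability the budget permits over the window.

    The 30-day default is an assumption made for this example.
    """
    window_minutes = window_days * 24 * 60
    return error_budget(availability_target) * window_minutes


def budget_remaining_minutes(
    availability_target: float, downtime_so_far_minutes: float, window_days: int = 30
) -> float:
    """Budget left in the window; a negative value means it was overspent."""
    allowed = allowed_downtime_minutes(availability_target, window_days)
    return allowed - downtime_so_far_minutes


if __name__ == "__main__":
    target = 0.9999  # the "99.99% available" service from the quoted passage
    print(f"budget fraction: {error_budget(target):.6f}")
    print(f"allowed downtime: {allowed_downtime_minutes(target):.2f} min per 30 days")
    print(f"remaining after 3 min down: {budget_remaining_minutes(target, 3.0):.2f} min")
```

Under these assumptions, a 99.99% target leaves roughly 4.3 minutes of downtime per 30 days, which is the amount that can be "spent" on launches and experiments before the budget is overspent.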

    4. Monitoring is one of the primary means by which service owners keep track of a system’s health and availability. As such, monitoring strategy should be constructed thoughtfully.

      Three types of valid monitoring input:

      1. Alerts: A human needs to take action immediately.
      2. Tickets: A human needs to take action, but not immediately, even up to a few days.
      3. Logging: No human needs to look at this; it is recorded for diagnostic or forensic purposes.
    5. Reliability is a function of mean time to failure (MTTF) and mean time to repair (MTTR) [Sch15]. The most relevant metric in evaluating the effectiveness of emergency response is how quickly the response team can bring the system back to health—that is, the MTTR.
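
The quoted passage does not give a formula. As a hedged illustration, the sketch below uses the commonly cited steady-state availability relation, MTTF / (MTTF + MTTR), to show why shortening MTTR matters. The function name and the sample figures are assumptions made for this example.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as MTTF / (MTTF + MTTR).

    This is the commonly used textbook relation, assumed here for illustration;
    the quoted text only says reliability is a function of both quantities.
    """
    if mttf_hours <= 0:
        raise ValueError("mttf_hours must be positive")
    if mttr_hours < 0:
        raise ValueError("mttr_hours must not be negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    mttf = 1000.0  # hypothetical: one failure per 1000 hours on average
    for mttr in (1.0, 0.5, 0.1):  # hypothetical repair times in hours
        availability = steady_state_availability(mttf, mttr)
        print(f"MTTR {mttr:>4} h -> availability {availability:.5%}")
```

With MTTF held fixed, every reduction in MTTR raises availability, which is consistent with the passage's emphasis on MTTR as the measure of emergency response.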
    6. In general, for any software service or system, 100% is not the right reliability target because no user can tell the difference between a system being 100% available and 99.999% available.
    7. When they are focused on operations work, on average, SREs should receive a maximum of two events per 8–12-hour on-call shift. This target volume gives the on-call engineer enough time to handle the event accurately and quickly, clean up and restore normal service, and then conduct a postmortem. If more than two events occur regularly per on-call shift, problems can’t be investigated thoroughly and engineers are sufficiently overwhelmed to prevent them from learning from these events.
    8. In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).
    9. Therefore, Google places a 50% cap on the aggregate "ops" work for all SREs—tickets, on-call, manual tasks, etc. This cap ensures that the SRE team has enough time in their schedule to make the service stable and operable.

      The other 50% of the time is devoted to development.

    10. By design, it is crucial that SRE teams are focused on engineering. Without constant engineering, operations load increases and teams will need more people just to keep pace with the workload.
    11. What exactly is Site Reliability Engineering, as it has come to be defined at Google? My explanation is simple: SRE is what happens when you ask a software engineer to design an operations team.
    12. Google has chosen to run our systems with a different approach: our Site Reliability Engineering teams focus on hiring software engineers to run our products and to create systems to accomplish the work that would otherwise be performed, often manually, by sysadmins.
  4. May 2017
  5. Mar 2017
  6. Dec 2016
  7. Jul 2016
  8. May 2016
    1. p. 6 On knowledge fetishisation. Note he is writing in the period 1995-2000, before it became absolutely clear that this was vanishing and that it would become a fetish:

      One purpose [for acquiring information] is simply possession and the satisfaction it gives; witness the pride some people take in their phenomenal memory for trivia, their extensive libraries, their collections of compact disks, maps, or computer programs. Possessing information also confers prestige. Erudition and especially initiation into the esoteric knowledge of a small sect or secret society--Freemasons, cosmologists, and the like--have conferred prestige and awed the ignorant throughout history.

  9. Feb 2016
    1. Al-Ghazali launched a philosophical critique against Neoplatonic-influenced early Islamic philosophers such as Al-Farabi and Ibn Sina. In response to the philosophers' claim that the created order is governed by secondary efficient causes (God being, as it were, the Primary and Final Cause in an ontological and logical sense), Ghazali argues that what we observe as regularity in nature based presumably upon some natural law is actually a kind of constant and continual regularity. There is no independent necessitation of change and becoming, other than what God has ordained.

      God as a source of secondary causes, not just a prime mover. Probability/chance, tychism

  10. Jan 2016
  11. Dec 2015
    1. we did so, we solicited questions through various social media, and we made sure to address as many of them as we could. And then, we worked through the transcripts, again and again, clarifying our concepts, refining our arguments, shuffling the pieces to insure greater clarity and accessibility

      I appreciate this "explaining how we did this" as a process statement.

  12. Nov 2015
    1. such that they cannot be experienced in any meaningful way without the mediation of an electronic device

      Is this another way of saying that e-books, because they can be printed, are not e-lit but simply digital literature, even if they have never been published on paper?

  13. Sep 2015
  14. Aug 2015
    1. “I believe that the motion picture is destined to revolutionize our educational system and that in a few years it will supplant largely, if not entirely, the use of textbooks.”

      This is fascinating to me. I had no idea of his connections with education. Makes me wonder about Tesla's thoughts on education now. And how he'd feel about filmstrips, which are in essence super cheap motion pictures.

  15. Jul 2015
  16. Jun 2015
  17. Dec 2014
  18. Aug 2013
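    1. I see, I thought, this solitary act is just the same as Ambrose's, and I watched in admiration.

      Reading aloud was a social act, whereas silent reading is a solitary, individual act.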