24 Matching Annotations
  1. Last 7 days
    1. If we’re not careful, learning algorithms will generalize based on the majority culture, leading to a high error rate for minority groups. Attempting to avoid this by making the model more complex runs into a different problem: overfitting to the training data, that is, picking up patterns that arise due to random noise rather than true differences. One way to avoid this is to explicitly model the differences between groups, although there are both technical and ethical challenges associated with this.

      Challenging to address high error rate for minority groups

    2. machine learning might perform worse for some groups than others is sample size disparity. If we construct our training set by sampling uniformly from the training data, then by definition we’ll have fewer data points about minorities. Of course, machine learning works better when there’s more data, so it will work less well for members of minority groups, assuming that members of the majority and minority groups are systematically different in terms of the prediction task.

      Under-representation of minorities in the training data leads to performance issues
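
The sample-size disparity above can be sketched with a small simulation. This is my own illustration, not from the text: the label-flipping rule is an assumed, deliberately extreme form of "systematically different" groups, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 1-D feature predicts the label, but the
# relationship is reversed for the minority group (the groups are
# systematically different with respect to the prediction task).
def make_group(n, minority):
    x = rng.normal(0.0, 1.0, n)
    y = (x > 0).astype(int)
    if minority:
        y = 1 - y
    return x, y

# Uniform sampling yields 950 majority vs. 50 minority training points.
x_maj, y_maj = make_group(950, minority=False)
x_min, y_min = make_group(50, minority=True)
x = np.concatenate([x_maj, x_min])
y = np.concatenate([y_maj, y_min])

# One shared rule: predict the majority training label on each side of 0.
pos_label = int(y[x > 0].mean() > 0.5)
neg_label = int(y[x <= 0].mean() > 0.5)
predict = lambda xs: np.where(xs > 0, pos_label, neg_label)

def error_rate(minority):
    xt, yt = make_group(10_000, minority)
    return float((predict(xt) != yt).mean())

maj_err = error_rate(False)  # near 0: the rule fits the majority
min_err = error_rate(True)   # near 1: the minority is mispredicted
print(maj_err, min_err)
```

Because the pooled fit is dominated by the 950 majority points, the shared rule matches the majority pattern and gets the minority almost entirely wrong.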

    3. A telling example of this comes from machine translation.

      Would this issue persist if two human translators (one translating English -> Turkish, the other translating Turkish -> English) were involved instead?

    4. Absent specific intervention, machine learning will extract stereotypes, including incorrect and harmful ones, in the same way that it extracts knowledge.

      ML emphasizes societal stereotypes.

    5. photography technology involves a series of choices about what is relevant and what isn’t, and transformations of the captured data based on those choices.

      Further discussions can be had about cameras that capture infra-red (thermal/night vision) and ultra-violet or x-ray wavelengths. Often these images are "colorized" based on the preference of some graphic artist (e.g., colours added to black-hole images, the cosmic microwave background, etc.). The colour scheme used can also be subjective.

    6. Race is not a stable category; how we measure race often changes how we conceive of it, and changing conceptions of race may force us to alter what we measure.

      Another, perhaps more contemporary, example has to do with gender. Gender-based data from the early 2000s and the mid-2020s would look very different!

    7. In fact, measurement is fraught with subjective decisions and technical difficulties.

      "How to Lie with Statistics" is a classic book that explores measurement problems as well as display issues.

    8. Kate Crawford points out that the data reflect the patterns of smartphone ownership, which are higher in wealthier parts of the city compared to lower-income areas and areas with large elderly populations.

      I have a similar gripe about the multi-factor authentication systems (MFAs) being used ubiquitously.

    9. The figure below shows the stages of a typical system that produces outputs using machine learning.

      This directed graph is similar to ones used for mathematical modelling, except that "Individuals" is usually replaced by "Computations"

    10. Amazon argued that its system was justified because it was designed based on efficiency and cost considerations and that race wasn’t an explicit factor.

      Defenses of prejudiced systems often resort to claims of blindness

    11. younger defendants are statistically more likely to re-offend, judges are loath to take this into account in deciding sentence lengths, viewing younger defendants as less morally culpable.

      Important example of using experience and intuition over data

    12. But there are serious risks in learning from examples. Learning is not a process of simply committing examples to memory. Instead, it involves generalizing from examples: honing in on those details that are characteristic of (say) cats in general, not just the specific cats that happen to appear in the examples. This is the process of induction: drawing general rules from specific examples—rules that effectively account for past cases, but also apply to future, as yet unseen cases, too. The hope is that we’ll figure out how future cases are likely to be similar to past cases, even if they are not exactly the same.

      flaws in only relying on examples.

    13. We cannot hand code a program that exhaustively enumerates all the relevant factors that allow us to recognize objects from every possible perspective or in all their potential visual configurations.

      The need for "learning" over "branching"

    14. In many head-to-head comparisons on fixed tasks, data-driven decisions are more accurate than those based on intuition or expertise.

      Data-driven decisions triumph over intuition or experience. However, intuition and experience play an important role in addressing or resolving unusual (outlier) situations.

    1. a supervised deep learning algorithm will generally achieve acceptable performance with around 5,000 labeled examples per category and will match or exceed human performance when trained with a dataset containing at least 10 million labeled examples.

      Does this sentence need a qualifier about the type of task?

    2. The field of deep learning is primarily concerned with how to build computer systems that are able to successfully solve tasks requiring intelligence, while the field of computational neuroscience is primarily concerned with building more accurate models of how the brain actually works.

      key difference between deep learning and computational neuroscience.

    3. A comprehensive history of deep learning is beyond the scope of this textbook.

      There is a lot of mention of the biological and neurological sciences in the history of deep learning. However, a group that perhaps should get some recognition in this field is the computational chemists. Since the 1970s, computational chemists have been using sophisticated techniques (LCAO, density functional theory, coupled cluster theory, etc.) to develop models for multi-particle systems. These have aspects of parameter fitting, updating weights through layers of computation, and feedback loops for updating models.

    4. there is no single correct value for the depth of an architecture, just as there is no single correct value for the length of a computer program. Nor is there a consensus about how much depth a model requires to qualify as “deep.”

      This is interesting. How important is the concept of 'depth' in deep learning?

    5. This is because the system’s understanding of the simpler concepts can be refined given information about the more complex concepts.

      A feedback loop that updates the prior based on new information to eventually reach a good posterior.
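
The note above frames refinement as prior-to-posterior updating; a minimal beta-binomial sketch of that idea (my own illustration, not from the book — the coin-flip data are invented):

```python
from fractions import Fraction

# Beta(a, b) prior over a coin's heads-probability; Beta(1, 1) is uniform.
a, b = Fraction(1), Fraction(1)

# Each new observation (1 = heads, 0 = tails) refines the belief via
# the conjugate update: a += heads, b += tails.
observations = [1, 1, 0, 1, 1, 1, 0, 1]
for obs in observations:
    a += obs
    b += 1 - obs

posterior_mean = a / (a + b)
print(posterior_mean)  # 7/10 after six heads and two tails
```

Each observation feeds back into the belief state, so the estimate is revised incrementally rather than fixed once.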

    6. Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations.

      Are the simpler representations some sort of 'building blocks'?

    7. Example of different representations: suppose we want to separate two categories of data by drawing a line between them in a scatterplot. In the plot on the left, we represent some data using Cartesian coordinates, and the task is impossible. In the plot on the right, we represent the data with polar coordinates and the task becomes simple to solve with a vertical line.

      The 'representations' displayed here are just transformations of the dataset. With multidimensional data, it is perhaps also important to recognize how the data were generated, and whether there are causal hints as to which representation to utilize.
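
The Cartesian-vs-polar example can be reproduced in a few lines. This is my own sketch of the figure's idea, with invented data: two concentric rings that no straight line separates in (x, y), but that a single radius threshold separates in polar coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two concentric rings, one per class.
n = 100
theta = rng.uniform(0.0, 2 * np.pi, 2 * n)
radius = np.concatenate([rng.normal(1.0, 0.1, n),   # inner ring: class 0
                         rng.normal(3.0, 0.1, n)])  # outer ring: class 1
labels = np.concatenate([np.zeros(n), np.ones(n)])

# Cartesian representation: the rings are concentric, so no straight
# line in the (x, y) plane separates the two classes.
x = radius * np.cos(theta)
y = radius * np.sin(theta)

# Polar representation: the same points become separable by a single
# vertical line at r = 2.
pred = (radius > 2.0).astype(float)
accuracy = float((pred == labels).mean())
print(accuracy)
```

The transformation does not add information; it only re-expresses the same points so that the structure the task needs becomes linearly accessible.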