Hypothesis

34 Matching Annotations

Jan 2020
pubs.aeaweb.org

Machine Learning: An Applied Econometric Approach

26
1. daaronr 13 Jan 2020
  
  Suppose the algorithm chooses a tree that splits on education but not on age. Conditional on this tree, the estimated coefficients are consistent. But that does not imply that treatment effects do not also vary by age, as education may well covary with age; on other draws of the data, in fact, the same procedure could have chosen a tree that split on age instead
  
  a caveat
  
  ml-reading-group
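A toy simulation of this caveat. It uses a hypothetical data-generating process that is not from the paper: age and education are equally noisy proxies of one latent trait, and only that trait shifts y. A single-split regression tree is refit on fresh draws of the data, and the script records which variable it splits on. For simplicity it is a plain outcome tree rather than the treatment-effect trees the paper discusses, so it only sketches how unstable the chosen split variable can be. All function names are made up for illustration.

```python
import numpy as np

def best_split_sse(x, y):
    """Lowest total SSE achievable by one split of y on x (a depth-1 regression tree)."""
    ys = y[np.argsort(x)]
    n = len(ys)
    left_n = np.arange(1, n)                     # size of the left leaf for each candidate cut
    csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
    left_s, left_q = csum[:-1], csum2[:-1]
    right_s, right_q = csum[-1] - left_s, csum2[-1] - left_q
    # SSE within a leaf = sum of squares - (sum)^2 / count
    sse = (left_q - left_s ** 2 / left_n) + (right_q - right_s ** 2 / (n - left_n))
    return sse.min()

def chosen_split_variable(rng, n=500):
    # Hypothetical DGP: age and education are symmetric noisy proxies of one latent trait.
    latent = rng.normal(size=n)
    age = latent + 0.5 * rng.normal(size=n)
    educ = latent + 0.5 * rng.normal(size=n)
    y = (latent > 0).astype(float) + 0.5 * rng.normal(size=n)
    return "age" if best_split_sse(age, y) < best_split_sse(educ, y) else "educ"

rng = np.random.default_rng(0)
picks = [chosen_split_variable(rng) for _ in range(50)]
print({v: picks.count(v) for v in ("age", "educ")})
```

Because the two proxies are symmetric in this design, each draw is equally likely to split on either one. Neither choice says the other variable is irrelevant.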
2. daaronr 13 Jan 2020
  
  These heterogenous treatment effects can be used to assign treatments; Misra and Dubé (2016) illustrate this on the problem of price targeting, applying Bayesian regularized methods to a large-scale experiment where prices were randomly assigned
  
  todo -- look into the implication for treatment assignment with heterogeneity
  
  ml-reading-group
3. daaronr 13 Jan 2020
  
  Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) take care of high-dimensional controls in treatment effect estimation by solving two simultaneous prediction problems, one in the outcome and one in the treatment equation.
  
  this seems similar to my idea of regularizing on only a subset of the variables
  
  ml-reading-group daaronr
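A minimal sketch of the "two simultaneous prediction problems" idea. It assumes a partially linear model y = θd + g(X) + e with d = m(X) + v. Both nuisance predictions (y from X, and d from X) are fit on held-out folds (cross-fitting), and θ comes from regressing the outcome residuals on the treatment residuals. Closed-form ridge is our stand-in for "any ML learner". The DGP, the penalty and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge with an unpenalised intercept (handled by centring)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta

def partialling_out(y, d, X, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of theta (illustrative sketch)."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    y_res, d_res = np.empty(len(y)), np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Prediction problem 1: outcome from controls, fit on the other fold(s).
        y_res[test] = y[test] - ridge_fit_predict(X[train], y[train], X[test])
        # Prediction problem 2: treatment from controls, fit on the other fold(s).
        d_res[test] = d[test] - ridge_fit_predict(X[train], d[train], X[test])
    # Final step: regress outcome residuals on treatment residuals.
    return (d_res @ y_res) / (d_res @ d_res)

rng = np.random.default_rng(1)
n, p, theta = 2000, 20, 0.5
X = rng.normal(size=(n, p))
gamma = rng.normal(size=p) / np.sqrt(p)
d = X @ gamma + rng.normal(size=n)                     # treatment depends on the controls
y = theta * d + 2 * (X @ gamma) + rng.normal(size=n)   # so does the outcome: confounding
dc, yc = d - d.mean(), y - y.mean()
naive = (dc @ yc) / (dc @ dc)                          # simple regression that ignores X
theta_hat = partialling_out(y, d, X)
print(f"naive {naive:.2f}, partialling-out {theta_hat:.2f}, true {theta}")
```

Any learner can replace `ridge_fit_predict`. The only requirement in this sketch is that its predictions for a fold come from a model fit on the other folds.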
4. daaronr 13 Jan 2020
  
  These same techniques applied here result in split-sample instrumental variables (Angrist and Krueger 1995) and “jackknife” instrumental variables
  
  some classical solutions to IV bias are akin to ML solutions
  
  ml-reading-group
5. daaronr 13 Jan 2020
  
  Understood this way, the finite-sample biases in instrumental variables are a consequence of overfitting.
  
  traditional 'finite sample bias of IV' is really overfitting
  
  ml-reading-group
6. daaronr 13 Jan 2020
  
  Even when we are interested in a parameter β̂, the tool we use to recover that parameter may contain (often implicitly) a prediction component. Take the case of linear instrumental variables understood as a two-stage procedure: first regress x = γ′z + δ on the instrument z, then regress y = β′x + ε on the fitted values x̂. The first stage is typically handled as an estimation step. But this is effectively a prediction task: only the predictions x̂ enter the second stage; the coefficients in the first stage are merely a means to these fitted values.
  
  first stage of IV -- handled as an estimation problem, but really it's a prediction problem!
  
  ml-reading-group
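A small Monte Carlo sketch of the "IV bias is overfitting" reading, under a hypothetical design with 40 individually weak instruments and an endogenous x. In-sample first-stage fitted values partly fit the noise in x, including the part that is correlated with the structural error, and this pulls 2SLS toward OLS. The split-sample variant fits the first stage on one half of the data, predicts x̂ out of sample on the other half, and uses that prediction as a single instrument there. It is one simple version of the split-sample idea, not a reproduction of Angrist and Krueger (1995) or of jackknife IV. All names and parameter values are illustrative.

```python
import numpy as np

def two_sls(y, x, Z):
    """2SLS with in-sample first-stage fitted values (data assumed mean zero)."""
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return (x_hat @ y) / (x_hat @ x)

def split_sample_iv(y, x, Z, rng):
    """First stage on half A; the out-of-sample x_hat is the instrument on half B."""
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2 :]
    pi_a = np.linalg.lstsq(Z[a], x[a], rcond=None)[0]
    x_hat_b = Z[b] @ pi_a
    return (x_hat_b @ y[b]) / (x_hat_b @ x[b])

def one_draw(rng, n=1000, k=40, beta=1.0):
    Z = rng.normal(size=(n, k))
    v = rng.normal(size=n)
    e = 0.8 * v + 0.6 * rng.normal(size=n)   # structural error correlated with v
    x = Z @ np.full(k, 0.05) + v              # many individually weak instruments
    y = beta * x + e
    ols = (x @ y) / (x @ x)
    return ols, two_sls(y, x, Z), split_sample_iv(y, x, Z, rng)

rng = np.random.default_rng(0)
draws = np.array([one_draw(rng) for _ in range(200)])
ols_med, tsls_med, ssiv_med = np.median(draws, axis=0)
print(f"median OLS {ols_med:.2f}, 2SLS {tsls_med:.2f}, split-sample {ssiv_med:.2f} (true beta 1.0)")
```

In this design, 2SLS should typically land between the true β = 1 and OLS, while the split-sample estimate stays close to 1. Regressing y directly on the out-of-sample x̂, instead of using x̂ as an instrument, would be pulled toward zero, because x̂ then carries first-stage estimation noise.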
7. daaronr 13 Jan 2020
  
  Prediction in the Service of Estimation
  
  This is especially relevant to economists across the board, even the ML skeptics
  
  ml-reading-group
8. daaronr 13 Jan 2020
  
  New Data
  
  The first application: constructing variables and meaning from high-dimensional data, especially outcome variables
  
  satellite images (of energy use, lights etc) --> economic activity
  
  cell phone data, Google street view to measure wealth
  
  extract similarity of firms from 10k reports
  
  even traditional data .. matching individuals in historical censuses
  
  ml-reading-group
9. daaronr 13 Jan 2020
  
  Zhao and Yu (2006) who establish asymptotic model-selection consistency for the LASSO. Besides assuming that the true model is “sparse”—only a few variables are relevant—they also require the “irrepresentable condition” between observables: loosely put, none of the irrelevant covariates can be even moderately related to the set of relevant ones.
  
  Basically unrealistic for microeconomic applications imho
  
  ml-reading-group
10. daaronr 13 Jan 2020
  
  First, it encourages the choice of less complex, but wrong models. Even if the best model uses interactions of number of bathrooms with number of rooms, regularization may lead to a choice of a simpler (but worse) model that uses only number of fireplaces. Second, it can bring with it a cousin of omitted variable bias, where we are typically concerned with correlations between observed variables and unobserved ones. Here, when regularization excludes some variables, even a correlation between observed variables and other observed (but excluded) ones can create bias in the estimated coefficients.
  
  Is this equally a problem for procedures that do not assume sparsity, such as the Ridge model?
  
  ml-reading-group
11. daaronr 13 Jan 2020
  
  the variables are correlated with each other (say the number of rooms of a house and its square-footage), then such variables are substitutes in predicting house prices. Similar predictions can be produced using very different variables. Which variables are actually chosen depends on the specific finite sample.
  
  Lasso-chosen variables are unstable because of what we usually call 'multicollinearity.'
  This presents a problem for making inferences from estimated coefficients.
  
  ml-reading-group
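A sketch of this instability under a hypothetical DGP: "rooms" and "sqft" are two noisy measures (correlation around 0.95) of the same latent house size, and both enter the price equally. A hand-rolled coordinate-descent LASSO with a fixed penalty is refit on 100 fresh samples, and the script tallies which variables survive in each. The penalty, sample size, noise level and helper names are assumptions chosen so the effect is easy to see.

```python
import numpy as np
from collections import Counter

def soft_threshold(a, lam):
    return np.sign(a) * max(abs(a) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * b[j]      # drop feature j's current contribution
            b[j] = soft_threshold(X[:, j] @ resid / n, lam) / col_sq[j]
            resid -= X[:, j] * b[j]      # add back the updated contribution
    return b

def draw_and_select(rng, n=100, lam=1.5):
    latent = rng.normal(size=n)                          # hypothetical "house size"
    rooms = latent + np.sqrt(0.05) * rng.normal(size=n)
    sqft = latent + np.sqrt(0.05) * rng.normal(size=n)
    y = rooms + sqft + 2.0 * rng.normal(size=n)          # both matter equally
    X = np.column_stack([rooms, sqft])
    X = (X - X.mean(axis=0)) / X.std(axis=0)             # standardise so the penalty treats both alike
    b = lasso_cd(X, y - y.mean(), lam)
    return tuple(name for name, coef in zip(("rooms", "sqft"), b) if coef != 0)

rng = np.random.default_rng(0)
patterns = [draw_and_select(rng) for _ in range(100)]
print(Counter(patterns))
```

The tally typically mixes draws that keep only rooms, only sqft, or both, even though the underlying model never changes.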
12. daaronr 12 Jan 2020
  
  Through its regularizer, LASSO produces a sparse prediction function, so that many coefficients are zero and are “not used”—in this example, we find that more than half the variables are unused in each run
  
  This is true but they fail to mention that LASSO also shrinks the coefficients on variables that it keeps towards zero (relative to OLS). I think this is commonly misunderstood (from people I've spoken with).
  
  ml-reading-group
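A minimal sketch of the shrinkage point. It assumes an orthonormal design (XᵀX = I) and the LASSO objective ½‖y − Xb‖² + λ‖b‖₁. In that special case, the LASSO solution is the OLS coefficient soft-thresholded by λ. Every coefficient that LASSO keeps is therefore exactly λ closer to zero than its OLS counterpart. With correlated regressors there is no such simple closed form. The coefficients and λ below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 5, 0.5
X, _ = np.linalg.qr(rng.normal(size=(n, p)))    # orthonormal columns, so X.T @ X = I
true_b = np.array([3.0, -2.0, 0.3, 0.0, 0.0])
y = X @ true_b + 0.1 * rng.normal(size=n)

ols = X.T @ y                                   # OLS when X.T @ X = I
lasso = np.sign(ols) * np.maximum(np.abs(ols) - lam, 0.0)   # soft-thresholding
kept = lasso != 0
for j in range(p):
    print(f"b{j}: OLS {ols[j]:+.3f}  LASSO {lasso[j]:+.3f}")
```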
13. daaronr 12 Jan 2020
  
  One obvious problem that arises in making such inferences is the lack of standard errors on the coefficients. Even when machine-learning predictors produce familiar output like linear functions, forming these standard errors can be more complicated than seems at first glance as they would have to account for the model selection itself. In fact, Leeb and Pötscher (2006, 2008) develop conditions under which it is impossible to obtain (uniformly) consistent estimates of the distribution of model parameters after data-driven selection.
  
  This is a very serious limitation for Economics academic work.
  
  ml-reading-group
14. daaronr 12 Jan 2020
  
  First, econometrics can guide design choices, such as the number of folds or the function class.
  
  How would Econometrics guide us in this?
  
  ml-reading-group
15. daaronr 12 Jan 2020
  
  These choices about how to represent the features will interact with the regularizer and function class: A linear model can reproduce the log base area per room from log base area and log room number easily, while a regression tree would require many splits to do so.
  
  The choice of 'how to represent the features' is consequential ... it's not just 'throw it all in' (kitchen sink approach)
  
  ml-reading-group
16. daaronr 12 Jan 2020
  
  Table 2: Some Machine Learning Algorithms
  
  This is a very helpful table!
  
  ml-reading-group
17. daaronr 12 Jan 2020
  
  Picking the prediction function then involves two steps: The first step is, conditional on a level of complexity, to pick the best in-sample loss-minimizing function. The second step is to estimate the optimal level of complexity using empirical tuning (as we saw in cross-validating the depth of the tree).
  
  ML explained while standing on one leg.
  
  ml-reading-group
18. daaronr 12 Jan 2020
  
  Regularization combines with the observability of prediction quality to allow us to fit flexible functional forms and still find generalizable structure.
  
  But we can't really make statistical inferences about the structure, can we?
  
  ml-reading-group
19. daaronr 12 Jan 2020
  
  This procedure works because prediction quality is observable: both predictions ŷ and outcomes y are observed. Contrast this with parameter estimation, where typically we must rely on assumptions about the data-generating process to ensure consistency.
  
  I'm not clear what implication they are drawing here. Does the procedure in some sense 'not work' for parameter estimation?
  
  ml-reading-group
20. daaronr 12 Jan 2020
  
  In empirical tuning, we create an out-of-sample experiment inside the original sample.
  
  remember that tuning is done within the training sample
  
  ml-reading-group
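A sketch of the two-step recipe from annotation 17, combined with the "out-of-sample experiment inside the original sample". Polynomial degree stands in for complexity here; the paper's own example tunes tree depth instead. Step 1 fits least squares at each degree. Step 2 picks the degree by 5-fold cross-validation run only on the training split. The hold-out split is touched once, at the end. The sine-curve DGP, the split sizes and the function names are hypothetical.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a degree-`degree` polynomial across k folds."""
    folds = np.random.default_rng(seed).permutation(len(x)) % k
    errors = []
    for f in range(k):
        train, test = folds != f, folds == f
        # Step 1: for a fixed complexity (degree), minimise in-sample squared loss.
        coefs = np.polyfit(x[train], y[train], degree)
        errors.append(np.mean((y[test] - np.polyval(coefs, x[test])) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=300)
y = np.sin(2 * x) + 0.3 * rng.normal(size=300)
x_train, y_train, x_hold, y_hold = x[:200], y[:200], x[200:], y[200:]

degrees = range(1, 11)
# In-sample loss can only fall as complexity grows, so it cannot choose the degree.
train_mse = [np.mean((y_train - np.polyval(np.polyfit(x_train, y_train, d), x_train)) ** 2)
             for d in degrees]
# Step 2: empirical tuning, i.e. cross-validation inside the training sample only.
cv_mse = {d: kfold_mse(x_train, y_train, d) for d in degrees}
best = min(cv_mse, key=cv_mse.get)
# The hold-out set is used once, after tuning, to report final performance.
final = np.polyfit(x_train, y_train, best)
hold_mse = float(np.mean((y_hold - np.polyval(final, x_hold)) ** 2))
print(best, round(hold_mse, 3))
```

Keeping the hold-out data out of the tuning loop means that `hold_mse` remains an honest out-of-sample estimate.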
21. daaronr 12 Jan 2020
  
  Performance of Different Algorithms in Predicting House Values
  
  Any reason they didn't try a Ridge or an Elastic net model here? My instinct is that these will beat LASSO for most Economic applications.
  
  ml-reading-group
22. daaronr 12 Jan 2020
  
  We consider 10,000 randomly selected owner-occupied units from the 2011 metropolitan sample of the American Housing Survey. In addition to the values of each unit, we also include 150 variables that contain information about the unit and its location, such as the number of rooms, the base area, and the census region within the United States. To compare different prediction techniques, we evaluate how well each approach predicts (log) unit value on a separate hold-out set of 41,808 units from the same sample. All details on the sample and our empirical exercise can be found in an online appendix available with this paper at http://e-jep.org
  
  Seems a useful example for trying/testing/benchmarking. But the link didn't work for me. Can anyone find it? Is it interactive? (This is why I think papers should be html and not pdfs...)
  
  ml-reading-group
23. daaronr 12 Jan 2020
  
  Making sense of complex data such as images and text often involves a prediction pre-processing step.
  
  In using 'new kinds of data' in Economics we often need to do a 'classification step' first
  
  ml-reading-group
24. daaronr 12 Jan 2020
  
  The fundamental insight behind these breakthroughs is as much statistical as computational. Machine intelligence became possible once researchers stopped approaching intelligence tasks procedurally and began tackling them empirically.
  
  I hadn't thought about how this unites the 'statistics to learn stuff' part of ML and the 'build a tool to do a task' part. Well-phrased.
  
  ml-reading-group
25. daaronr 10 Jan 2020
  
  In another category of applications, the key object of interest is actually a parameter β, but the inference procedures (often implicitly) contain a prediction task. For example, the first stage of a linear instrumental variables regression is effectively prediction. The same is true when estimating heterogeneous treatment effects, testing for effects on multiple outcomes in experiments, and flexibly controlling for observed confounders.
  
  This is the most relevant tool for me. Before I learned about ML I often thought about using 'stepwise selection' for such tasks, to find the best set of 'control variables' etc. But without regularisation this seemed problematic.
  
  ml-reading-group
26. daaronr 10 Jan 2020
  
  Machine Learning: An Applied Econometric Approach
  
  Shall we use Hypothesis to have a discussion?
  
  ml-reading-group

Tags

daaronr

ml-reading-group

Annotators

daaronr

URL

pubs.aeaweb.org/doi/pdfplus/10.1257/jep.31.2.87
May 2019
www.montrealdatalicense.com

Montreal Data License

1
1. mlenc 13 May 2019
  
  reading group open data license

Tags

open data license

reading group

Annotators

mlenc

URL

montrealdatalicense.com/en
Sep 2014
nciphub.org

Untitled document

1
1. pythia 11 Sep 2014
  
  "the scholarly and scientific record is rapidly evolving to become a formalized web of content, in which any node must be able to be linked to any other node, with formal, typed relationships" (2)
  
  DAMS OpenGLAM RIT reading group

Tags

OpenGLAM

RIT reading group

DAMS

Annotators

pythia

URL

nciphub.org/resources/355/download/researchsupportnetwork9.pdf
Feb 2014
ubuntuone.com

Untitled document

1
1. aculich 11 Feb 2014
  
  Chapter 1, The Art of Community: We begin the book with a bird’s-eye view of how communities function at a social science level. We cover the underlying nuts and bolts of how people form communities, what keeps them involved, and the basis and opportunities behind these interactions. Chapter 2, Planning Your Community: Next we carve out and document a blueprint and strategy for your community and its future growth. Part of this strategy includes the target objectives and goals and how the community can be structured to achieve them. Chapter 3, Communicating Clearly: At the heart of community is communication, and great communicators can have a tremendously positive impact. Here we lay down the communications backbone and the best practices associated with using it
  
  Reading the first 3 chapters of AoC for discussion in #coasespenguin on 2014-02-11.
  
  session context coasespenguin reading group 2014-02-11 AoC Chapters 1 2 3 session type first read

Tags

Chapters 1 2 3

2014-02-11

reading group

AoC

session type first read

coasespenguin

session context

Annotators

aculich

URL

ubuntuone.com/0n352YwUjlcFR8PjIELH67
Jan 2014
www.yale.edu

Untitled document

5
1. aculich 30 Jan 2014
  
  This suggests that peer production will thrive where projects have three characteristics
  
  If thriving is a metric of success (is it measurable? too subjective?), then the 3 characteristics a project must have are:
  
  modularity: divisible into components
  
  granularity: fine-grained modularity
  
  integrability: low-cost integration of contributions
  
  I don't dispute that these characteristics are needed, but they are too general to be helpful, so I propose that we look at these three characteristics through the lens of the type of contributor we are seeking to motivate.
  
  How do these characteristics inform what we should focus on to remove barriers to collaboration for each of these contributor-types?
  
  Below I've made up a rough list of lenses. Maybe you have links or references that have already made these classifications better than I have... if so, share them!
  
  Roughly here are the classifications of the types of relationships to open source projects that I commonly see:
  
  core developers: either hired by a company, foundation, or some entity to work on the project. These people care most about integrability.
  
  ecosystem contributors: someone either self-motivated or who receives a reward via some mechanism outside the institution that funds the core developers (e.g. reputation, portfolio for future job prospects, tools and platforms that support a consulting business, etc). These people care most about modularity.
  
  feature-driven contributors: The project is useful out-of-the-box for these people, and rather than build their own tool from scratch, they see that the tool can work the way they want if they merely contribute code, or at least a feature request, based on their idea. These people care most about granularity.
  
  The above lenses fit the characteristics outlined in the article, but below are other contributor-types that don't directly care about these characteristics.
  
  the funder: a company, foundation, crowd, or some other funding body that directly funds the core developers to work on the project for hire.
  
  consumer contributors: This class of people might not even be aware that they are contributors, but simply using the project returns direct benefits through logs and other instrumented uses of the tool to generate data that can be used to improve the project.
  
  knowledge-driven contributors: These contributors are most likely closest to the ecosystem contributors, maybe even a sub-species of those, that contribute to documentation and learning the system; they may be less-skilled at coding, but still serve a valuable part of the community even if they are not committing to the core code base.
  
  failure-driven contributors: A primary source of bug reports and may also be any one of the other lenses.
  
  What other lenses might be useful to look through? What characteristics are we missing? How can we reduce barriers to contribution for each of these contributor types?
  
  I feel that there are plenty of motivations... but what barriers exist, and what motivations are sufficient for enough people to be willing to surmount those barriers? I think it may be easier to focus on the barriers, to make contributing less painful for the already-convinced, than to think about the motivators for those who still need to be convinced. I think the consumer contributors are some of the very best suited to convince the unconvinced; our job should be to remove the barriers for people at each stage of the community we are trying to build.
  
  A note to the awesome folks at Hypothes.is who are reading our consumer contributions... given the current state of the hypothes.is project, what class of contributors are you most in need of?
  
  reading group peer production motivation characteristics metric of success question lenses contributors coasespenguin barriers
2. aculich 30 Jan 2014
  
  the proposition that diverse motivations animate human beings, and, more importantly, that there exist ranges of human experience in which the presence of monetary rewards is inversely related to the presence of other, social-psychological rewards.
  
  The first analytic move.
  
  reading group the answer peer production motivation rewards coasespenguin
3. aculich 30 Jan 2014
  
  common appropriation regimes do not give a complete answer to the sustainability of motivation and organization for the truly open, large-scale nonproprietary peer production projects we see on the Internet.
  
  Towards the end of our last conversation the text following "common appropriation" seemed an interesting place to dive into further for our future discussions.
  
  I have tagged this annotation with "meta" because it is a comment about our discussion and where to continue it rather than an annotation focused on the content itself.
  
  In the future I would be interested in exploring the idea of "annotation types" that can be selectively turned on and off, but for now will handle that with ad hoc tags like "meta".
  
  reading group to read peer production the problem motivation organization meta coasespenguin
4. aculich 30 Jan 2014
  
  The following selection from The Yale Law Journal is not paginated and should not be used for citation purposes.
  
  Note that this disclaimer only says the document should not be used for citation purposes, but doesn't say we can't use it for annotation purposes like testing out the Chrome PDF.js + Hypothes.is extension! :)
  
  You can install the extension from the Chrome Web Store with this link:
  
  https://chrome.google.com/webstore/detail/pdfjs-%2B-hypothesis/bipacimpfefoidapjkknffflfpfmjdog/related
  
  reading group meta installation chrome extension coasespenguin
5. aculich 30 Jan 2014
  
  understanding that when a project of any size is broken up into little pieces, each of which can be performed by an individual in a short amount of time, the motivation to get any given individual to contribute need only be very small.
  
  The second analytic move.
  
  reading group the answer peer production motivation partitioning

Tags

installation

the answer

to read

metric of success

peer production

the problem

characteristics

coasespenguin

motivation

barriers

contributors

partitioning

rewards

lenses

meta

reading group

question

organization

chrome extension

Annotators

aculich

URL

yale.edu/yalelj/112/BenklerWEB.pdf