Andrej Karpathy built a simple automation pipeline for AI agents to optimize training in 5-minute increments.
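This case shows AI systems being applied to automated research. The 5-minute optimization increment is a fine-grained time scale, which suggests that AI systems can already run rapidly iterating experiments. The 61K+ GitHub stars show that the approach has drawn wide attention in the AI research community.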
Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way.
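Most people assume that scientific papers fully record the research process. The author argues instead that traditional papers discard most of what was discovered and present only a linear narrative, which amounts to a so-called 'story tax'. This view challenges the academic community's common assumption that publications are complete.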
To guide evolution, we derive 'textual gradients,' structured natural language feedback from execution traces, to pinpoint failures and suggest granular modifications.
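To guide evolution, the authors derive 'textual gradients': structured natural-language feedback drawn from execution traces, used to locate failures and suggest fine-grained modifications. This is the distinctive element of the method.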
The duo had jump-started the AI-for-Erdős craze late last year by prompting a free version of ChatGPT with open problems chosen at random from the Erdős problems website.
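The timing 'late last year' indicates that this phenomenon has lasted several months and is not a passing fad. The 'random selection' approach hints at the potential of large-scale AI-assisted mathematical exploration, but the article gives no figures for how many problems were solved or what the success rate was. This missing data limits any full assessment of AI's mathematical ability.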
An AI researcher subsequently gifted them each a ChatGPT Pro subscription to encourage their 'vibe mathing.'
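Most people assume that serious mathematical research requires rigorous methods and deep expertise, but the author describes this way of working with the informal term 'vibe mathing', challenging the traditional norms of academic research methodology.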
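Most people assume that serious mathematical research requires rigorous methods and a deep theoretical foundation, but the researchers describe their work with the informal label 'vibe mathing', implying that mathematical discovery may come from seemingly casual exploration rather than strict planning.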
_Self-reported score with custom Anthropic scaffold._ SWEPro results were obtained with the mini-swe-agent scaffold. However, we use the scores reported by Anthropic for Opus with max thinking effort, due to frequent timeouts during our evaluation trials.
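Footnote 2 reveals an important data point: Opus 4.6's score of 53.4 is Anthropic's self-reported figure, because the authors hit frequent timeouts during evaluation and could not verify it themselves. This points to a data-reliability problem in the performance comparison: the Opus evaluation relies on vendor-reported data, which may be biased.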
In our internal evals and testing, medium effort achieved slightly lower intelligence with significantly less latency for the majority of tasks.
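Most people assume that internal evaluations and testing are enough to represent real user experience, but the authors admit that their internal testing failed to capture how users actually perceive differences in the AI's intelligence. This implies a fundamental disconnect between the lab environment and real-world use, and challenges the validity of traditional product-testing methodology.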
We examine whether AI capabilities are accelerating by fitting statistical models to benchmark performance over time, and comparing their predictive accuracies.
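The method rests on fitting statistical models and comparing their predictive accuracy, which is a rigorous approach. Comparing how well different curve fits predict gives a more objective judgment of whether there is an accelerating trend than visual inspection alone.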
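A minimal sketch of the comparison, assuming a constant-rate (linear) trend and an accelerating (quadratic) trend as stand-ins for the paper's models, which are not specified here: fit each on data up to a cutoff, then score mean absolute error on the following horizon. The function names and the synthetic series are hypothetical, not real benchmark data.

```python
def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via the normal equations."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(coefs, x):
    return sum(c * x ** i for i, c in enumerate(coefs))


def holdout_mae(ts, ys, cutoff, horizon, degree):
    """Fit on points with t <= cutoff; score on cutoff < t <= cutoff + horizon."""
    train = [(t, y) for t, y in zip(ts, ys) if t <= cutoff]
    test = [(t, y) for t, y in zip(ts, ys) if cutoff < t <= cutoff + horizon]
    coefs = polyfit([t for t, _ in train], [y for _, y in train], degree)
    return sum(abs(predict(coefs, t) - y) for t, y in test) / len(test)


# Synthetic accelerating series (month -> score), for illustration only.
months = list(range(31))
scores = [0.02 * t ** 2 + 0.5 * t for t in months]
linear_err = holdout_mae(months, scores, cutoff=24, horizon=6, degree=1)
quad_err = holdout_mae(months, scores, cutoff=24, horizon=6, degree=2)
```

On this toy series the quadratic fit forecasts the held-out 6 months with far lower error; on real data, whichever model forecasts better is the evidence for or against acceleration.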
We have been calling this the 'reasoning' / 'non-reasoning' split, but this is not a perfectly clean dichotomy. Several correlated but not strictly identical changes happened over the same few months: scaling inference compute, heavier use of RL in post-training, and models producing reasoning tokens.
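This acknowledges the limits of the classification: the acceleration of AI capabilities around 2024 may result from several factors acting together rather than from improved reasoning alone. It shows the authors are clear-eyed about the complexity of the data, but there is no quantitative analysis of the relative importance of these factors.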
We pre-selected the 6-month horizon as our primary metric, balancing genuine forecasting distance against the limited date range of our data.
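The 6-month forecast window is a key choice, balancing practical forecasting value against the limited time range of the data. The span is relatively short and may not capture long-term trends, but it suits detecting recent acceleration. Choosing this window reflects a pragmatic trade-off made under limited data.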
We study a mix of Hugging Face downloads and model derivatives, inference market share, performance metrics and more to make a comprehensive picture of the ecosystem.
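The method combines several data sources (downloads, derivative models, inference market share, and more). This multidimensional framework avoids the limitations of any single metric and gives a more complete assessment of the ecosystem. Such a mixed approach may become a standard template for future research on the AI ecosystem.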
The H100-equivalent unit uses a chip's highest 8-bit operation/second specifications to convert between chips. The actual utility of a particular chip depends on workload assumptions, so H100e does not perfectly reflect real-world performance differences across chip types.
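The H100-equivalent conversion used in the methodology has an important limitation: it flattens performance differences between chips, which may understate the real value of some specialized architectures. The standardization makes comparison convenient, but it may obscure the diversity and innovative potential of the AI hardware ecosystem.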
The standard autoresearch loop (brainstorm from code, run experiments, check metrics) works when the optimization surface is visible in the source. The Liquid results prove that. But for problems where the codebase doesn't contain enough information to generate good hypotheses, giving the agent access to papers and competing implementations changes what it tries.
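This statement cleanly separates two optimization scenarios: optimization that is visible in the code, and optimization that requires outside knowledge. It highlights a key insight for AI agent development: the optimization approach must fit the nature of the problem. For some problems, analyzing the code alone is enough; for harder ones, outside knowledge and research have to be brought in. This has direct implications for the design of AI-assisted programming systems.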
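The 'standard autoresearch loop' in the excerpt (brainstorm, run experiments, check metrics) can be sketched as a greedy keep-if-better search. This is a minimal sketch, not Karpathy's implementation: `autoresearch_loop`, `toy_experiment`, and `perturb_lr` are hypothetical names, the 300-second `budget_s` is an assumption mirroring the 5-minute increments, and the toy objective stands in for an actual training run.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trial:
    config: dict
    metric: float  # lower is better, e.g. validation loss


def autoresearch_loop(
    baseline: dict,
    propose: Callable[[dict, random.Random], dict],
    run_experiment: Callable[[dict, float], float],
    n_rounds: int,
    budget_s: float = 300.0,
    seed: int = 0,
) -> Trial:
    """Greedy loop: propose a change, run it under a fixed time budget, keep it only if the metric improves."""
    rng = random.Random(seed)
    best = Trial(baseline, run_experiment(baseline, budget_s))
    for _ in range(n_rounds):
        candidate = propose(best.config, rng)         # "brainstorm" step
        metric = run_experiment(candidate, budget_s)  # fixed-budget run
        if metric < best.metric:                      # "check metrics" step
            best = Trial(candidate, metric)
    return best


# Toy stand-ins: a real setup would edit code and train for budget_s seconds.
def toy_experiment(config: dict, budget_s: float) -> float:
    return (config["lr"] - 3e-3) ** 2


def perturb_lr(config: dict, rng: random.Random) -> dict:
    return {**config, "lr": config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])}


best = autoresearch_loop({"lr": 1e-2}, perturb_lr, toy_experiment, n_rounds=20)
```

In the terms of the excerpt, giving the agent papers or competing implementations would change only `propose`; the run-and-check skeleton stays the same.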
Gemini 3 Flash achieves the highest score of 24.0%
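In the original paper, Gemini 3 Flash ranked first with 24.0%; in Artificial Analysis's independent re-test it scored 27.7% and was overtaken by GPT-5.4 and Claude Opus. Two tests run at different times with different methodologies produced different rankings. This exposes a basic fragility of AI agent evaluation: the same benchmark, run by different implementers, yields different conclusions. In AI evaluation, 'who is first' is a fluid answer that shifts with time and methodology.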
As AI moves from a destination to a feature, our methodology will need to shift.
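This sentence names a basic shift in the form of AI products: early AI was 'a place you go', and now it is 'where you already are'. Traffic statistics will become increasingly distorted; the heaviest AI users may not appear in web-visit data at all. The key metric of future AI competition may no longer be unique visits but 'embedding depth': how deeply a product is woven into users' workflows.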
two participants gave it 9/10 and one "11/10"
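A two-hour tabletop-style exercise received ratings of 9 to 11 out of 10 from three top AI safety researchers. That is itself a signal: serious AI research organizations are using role-play to prepare for the future. This methodology (rehearsing workflows under future capabilities) has precedents in other fields, such as military wargames, disaster drills, and scenario planning, but applying it to the evolution of AI capabilities reflects METR's distinctive research taste.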
Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior.
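[Takeaway] This sentence points to a new AI research paradigm: instead of asking 'what can the model do', ask 'why does the model do this'. Using emotion as an entry point for understanding model behavior essentially brings psychological methodology into AI interpretability research. The lesson for practitioners: the most valuable AI research in the future may lie not in algorithmic innovation but in finding mechanistic explanations for known phenomena, as this paper does.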
We convert chip computing capabilities into H100 equivalents (H100e) based on their relative FLOP/s specifications, specifically their maximum 8-bit specification.
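Using 'H100 equivalents' as a universal currency for compute is itself a methodological choice worth reflecting on: it makes the NVIDIA H100 the base unit of compute, much as the US dollar serves as the global reserve currency. However, Epoch AI itself acknowledges that the conversion is 'most accurate for model training'. For inference workloads, the real efficiency of TPUs may be systematically underestimated, which would mean Google's actual compute advantage is larger than the numbers show.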
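The conversion described in the two excerpts above is a weighted sum: each chip count is scaled by its peak 8-bit FLOP/s relative to the H100's. A minimal sketch, assuming a hypothetical fleet and specs expressed in multiples of the H100's peak rather than real datasheet values:

```python
def to_h100e(chip_counts: dict[str, int], peak_8bit: dict[str, float], h100_peak_8bit: float) -> float:
    """Convert a mixed fleet into H100 equivalents by the ratio of peak 8-bit FLOP/s specs."""
    return sum(count * peak_8bit[chip] / h100_peak_8bit for chip, count in chip_counts.items())


# Hypothetical fleet; "chip_x" and its spec are illustrative, not a real product.
fleet = {"H100": 1000, "chip_x": 500}
specs = {"H100": 1.0, "chip_x": 2.5}
total = to_h100e(fleet, specs, h100_peak_8bit=1.0)
```

Here 1,000 H100s plus 500 chips rated at 2.5x the H100's peak count as 2,250 H100e. A peak-spec ratio ignores utilization, memory bandwidth, and interconnect, which is exactly the workload dependence the excerpts flag.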
leading baselines achieve only about half the accuracy at the same efficiency
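The authors imply that mainstream KV-cache compression methods reach only about half the accuracy at the same efficiency level, suggesting that existing methods have fundamental flaws. This sharp criticism challenges the field's current technical direction and implies that many peers may have been optimizing KV compression in the wrong direction.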
To enable true process-level verification, we audit fine-grained intermediate states rather than just final answers, and quantify efficiency via an overthinking metric relative to human trajectories.
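Mainstream evaluation methods usually check only whether the final answer is correct, whereas the authors propose a process-level evaluation: audit intermediate states and add an 'overthinking' metric to measure efficiency. This runs counter to conventional practice in AI evaluation and implies that rewarding correct answers alone may mask serious deficiencies in an AI system's efficiency and reasoning path.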
We introduce Iterative Reward Calibration, a methodology for designing per-turn rewards using empirical discriminative analysis of rollout data
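Most people assume that reward design should rest on domain experts' intuition or predefined rules, but the authors propose an iterative reward calibration method based on empirical discriminative analysis. This challenges traditional reward engineering and suggests that data-driven reward design may outperform expert-designed rewards, especially in complex multi-turn dialogue tasks.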
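One way to read 'empirical discriminative analysis of rollout data' is to weight each per-turn behavior by how much more often it appears in successful rollouts than in failed ones. This is a hypothetical sketch of that idea, not the authors' Iterative Reward Calibration: the feature names, `discriminative_weights`, and `per_turn_reward` are all assumptions for illustration.

```python
from collections import Counter


def discriminative_weights(rollouts):
    """rollouts: list of (turns, success), where turns is a list of sets of per-turn feature names.

    Weight per feature = share of successful rollouts containing it minus share of failed rollouts containing it.
    """
    succ, fail = Counter(), Counter()
    n_succ = n_fail = 0
    for turns, success in rollouts:
        present = set().union(*turns)  # features seen anywhere in this rollout
        if success:
            n_succ += 1
            succ.update(present)
        else:
            n_fail += 1
            fail.update(present)
    features = set(succ) | set(fail)
    return {f: succ[f] / max(n_succ, 1) - fail[f] / max(n_fail, 1) for f in features}


def per_turn_reward(turn_features, weights):
    """Reward for one turn: sum of the calibrated weights of the features it exhibits."""
    return sum(weights.get(f, 0.0) for f in turn_features)


# Hypothetical rollouts with made-up feature labels.
rollouts = [
    ([{"verify"}, {"answer"}], True),
    ([{"answer"}], False),
    ([{"verify"}], True),
    ([{"guess"}], False),
]
weights = discriminative_weights(rollouts)
```

The 'iterative' part would mean recomputing the weights on fresh rollouts after each round of training with the current rewards, so features that stop discriminating lose their weight.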
We recruited 12 active researchers by word of mouth followed by snowball sampling [53]
Interviews were video and audio recorded. We transcribed the audio using OpenAI's Whisper automatic speech recognition system and anonymized the transcript before analysis.
sentence describing any interview procedures
Immediately after the focused reading task, we conducted a short interview asking participants to reflect on their experience with both tasks (Appendix J.2), followed by a post-study survey (Appendix J.3).
sentence describing any interview procedures
The study concluded with a 15-minute semi-structured interview. During the interview, participants saw screenshots from the three conditions and were asked which they preferred and disliked, why, what they wished the interface had, what influenced their skimming, and how they normally skimmed texts.
sentence describing any interview procedures
After the interviews, we analyzed the data using the process described in Appendix B
sentence describing any interview procedures
We used these mock-ups as design probes [31] to inspire ideation and elicit creative responses. Specifically, we asked participants to compare and contrast alternative mock-ups and reflect on how they could be used or improved to support their known or emerging synthesis and information-foraging goals.
sentence describing any interview procedures
In the second part of the session, we provided participants with mock-ups of possible reifications of cross-document relationships that might help them synthesize information across abstracts.
sentence describing any interview procedures
In the first part of the session, we asked participants about their strategies for selecting publication venues for their manuscript submissions, how they identify and synthesize information from venues, their approaches to writing manuscripts, and finally, the technology they have used to help with these processes, current technology shortcomings, and ideas for addressing these challenges.
sentence describing any interview procedures
Sessions, which were held on Zoom, lasted 55 minutes on average. Participants were compensated with $15 USD.
sentence describing any interview procedures
The interview sessions were divided into two parts: an open-ended semi-structured interview about their backgrounds and practices, followed by feedback on a range of mock-ups, including novel reified relationships between analogous sentences in different abstracts (Figure 2).
sentence describing any interview procedures
In order to determine (1) the context in which we might offer novel views of scientific abstracts and (2) the intelligibility of various novel prototype designs for reifying cross-abstract relationships, we conducted a formative interview study with 12 active researchers (see Appendix A for participant information).
sentence describing any interview procedures
For example, Lundeberg, Fox, and Puncochar [1994] argue that the reason why some studies do not find gender differences in confidence on general knowledge is because it is not in the masculine domain
so our task needs to be something that is categorised as a masculine domain
Suggests quantitative methods for predicting future tech impact on behaviour/social aspects, in contrast with the usual qualitative narrative methods (futurism, narrative inquiry, presumably scenarios). The Science Fiction Science Method: PDF in Zotero; PDF available (CC BY) at https://www.researchgate.net/publication/394323287_The_Science_Fiction_Science_Method
via Bruce Sterling (Mastodon)
Using discourse analysis, the article identifies “interpretative repertoires” (Gilbert and Mulkay 1984) and linguistic resources that are employed by the authors of professional journal articles and blogs and that characterize makerspaces in particular ways. In a theory of discourse, librarians who identify themselves within these discursive constructs become subjects of those discourses, thus reproducing particular ways of thinking about makerspaces
While this is not a typical empirical research article, it does use a methodology for identifying relevant sources for its literature review and for analyzing those sources.
educational design research methodology.
Educational design research methodology.
We note, however, a few limitations of our work
the sample differed along a number of other dimensions (e.g., non-DDP mines tended to produce gold; DDP mines produce 3T)
why IPIS/ULULA study is not representative or generalisable
IPIS research teams collect quantitative and qualitative data through a combination of observations and interviews with a selection of stakeholders at and around ASM sites, support villages, and trading hubs. Data collection methods and verification rely extensively on triangulation of sources, and interviewees include artisanal miners, representatives of cooperatives, heads of miner camps, state agents and CSO representatives.
IPIS visited 354 active mining sites, which employ an estimated 55,604 miners (Table A). Gold mines accounted for 86% of mines visited, followed by tin, tantalum, and tungsten (3T) mines, which accounted for 20% of mines (Table B). These proportions are consistent with the larger IPIS database on ASM sites in eastern DRC
When comparing data collected on the mining sites during field visits with data from mobile surveys, it is important to keep in mind the two levels of analysis employed: mining sites and individual experience and perception. It is worth noting that the low number of gold mining sites covered by due diligence programmes means that comparing DDP and non-DDP mining sites will inherently be subject to any difference between 3T and gold exploitations.
methodological 'problems'
IPIS surveyors visited mine sites and trading centres (so-called points de vente) in two consecutive field missions, during which they collected phone numbers of community members, made observations, and conducted interviews with several key informants so as to complete an extensive questionnaire on each mine site (using the OpenDataKit tool)
field visits? visit surveys and observations ODK
By combining social science methods and underwater reef surveys we identify a number of countervailing challenges and opportunities presented by globalization that both nurture and suppress the island's resilience to high amplitude, low-frequency disturbances like tsunamis
Methodology: Combining social science methods and underwater reef surveys.
Investigating social structures through the use of networks or graphs. The individual actors, people, or things within the network are usually called nodes; the connections between nodes are called edges or links. The focus is on relationships between actors in addition to the attributes of actors. Extensively used in mapping out social networks (Twitter, Facebook). Example tools: Palantir, Analyst's Notebook, MISP, and Maltego.
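The node/edge vocabulary above maps directly onto an adjacency list. A minimal sketch with made-up actors, computing degree centrality (the share of other nodes each node is directly tied to), one of the simplest relationship-based measures; the names and ties are hypothetical:

```python
from collections import defaultdict


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected graph as an adjacency list: node -> set of neighbouring nodes."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree_centrality(adj: dict[str, set[str]]) -> dict[str, float]:
    """Share of the other nodes each node is directly tied to (0 to 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


# Hypothetical ties between four actors.
ties = [("ana", "ben"), ("ana", "cai"), ("ben", "cai"), ("cai", "dee")]
centrality = degree_centrality(build_graph(ties))
```

Here "cai" is tied to every other actor (centrality 1.0) while "dee" is tied only to "cai" (1/3), which is the kind of relational pattern that attribute data alone would miss.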
two-tailed t-test
Statistical significance test of whether a sample statistic (e.g., the mean) differs from a hypothesized value in either direction, i.e., is greater than or less than it. The critical (rejection) region is split across both tails of the distribution.
Significance levels
The probability of rejecting the null hypothesis when it is true.
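A standard-library sketch of a two-tailed one-sample t-test tying the two definitions above together. Assumptions: the p-value comes from numerically integrating the Student's t density with Simpson's rule rather than from a statistics package, and the default alpha = 0.05 is a conventional choice, not one taken from the source.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Student's t density with df degrees of freedom (log-gamma avoids overflow for large df)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 20_000) -> float:
    """P(|T| >= |t|) = 1 - 2 * (integral of the density over [0, |t|]), by symmetry."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def one_sample_t_test(sample, mu0, alpha=0.05):
    """Return (t, p, reject) for H0: population mean == mu0 against a two-sided alternative."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    p = two_tailed_p(t, n - 1)
    return t, p, p < alpha
```

Because the test is two-tailed, a large negative t counts as much evidence against the null as a large positive one; alpha is the Type I error rate the analyst accepts.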
between-subject
Variability between individuals in the sample, i.e., differences among different participants' scores.
within
Variability within individuals in the sample, i.e., differences among the same participant's scores across conditions or repeated measurements.
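The two kinds of variability add up to the total sum of squares. A minimal sketch of the split, assuming repeated scores per participant stored in a dict keyed by hypothetical participant IDs:

```python
from statistics import fmean


def variability_components(scores_by_subject: dict[str, list[float]]) -> tuple[float, float]:
    """Split the total sum of squares into (between-subject, within-subject) parts."""
    all_scores = [x for xs in scores_by_subject.values() for x in xs]
    grand = fmean(all_scores)
    between = within = 0.0
    for xs in scores_by_subject.values():
        m = fmean(xs)
        between += len(xs) * (m - grand) ** 2    # how far each person's mean sits from the grand mean
        within += sum((x - m) ** 2 for x in xs)  # how much a person's own scores vary around their mean
    return between, within


between, within = variability_components({"p1": [1.0, 3.0], "p2": [5.0, 7.0]})
```

For these toy scores the total sum of squares is 20, of which 16 is between-subject and 4 is within-subject.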
voxel
A value on a regular grid in three-dimensional space. Here, voxels make up the three-dimensional brain image.
T2*-sensitive functional imag-ing
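An MR contrast weighting sensitive to T2* decay, i.e., to local magnetic-field inhomogeneities; this is what makes it sensitive to the blood-oxygenation-level-dependent (BOLD) signal measured in fMRI. As in T2-weighted images, CSF appears brightest and gray matter appears brighter than white matter.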
3-T Siemens Trio MRI scan-ner
A scanner model with a 3-tesla magnet. Our replication will use a different scanner.
fixation cross
A cross shown to participants in a perception task to direct their attention to wherever the investigator wants them to look.
inter stimulus intervals
The amount of time between the end of one stimulus being presented and the start of another stimulus being presented.
T1-weighted images
MR contrast weighting in which white matter appears brightest, gray matter intermediate, and CSF darkest.
cerebellum
Structure at the lower back of the brain, associated with motor control.
cortex
Outermost brain layer, associated with higher order cognitive abilities.
Mulot, M., Segalas, C., Leyrat, C., & Besançon, L. (2021). Re: Subramanian and Kumar. Vaccination rates and COVID-19 cases. European Journal of Epidemiology, 36(12), 1243–1244. https://doi.org/10.1007/s10654-021-00817-6
The waterfall methodology for software development is rapidly losing popularity, while the Agile methodology is increasingly used for software development by companies around the world.
Important differences between Agile and Waterfall methodology

Users are shown four animated GIFs, each corresponding to modifying an attribute for a given image.
See Sec. 4.2.2 for user studies that further demonstrate this
Quite nice that they strengthened their analysis with user studies, something that I haven't seen often in this type of ML paper.
AttFind proceeds as follows: At each iteration it considers all K style coordinates and calculates their effect on the probability of y. It then selects the coordinate with largest effect, and removes all images where changing this coordinate had a large effect on their probability to belong to class y
AttFind takes as input the trained model and a set of N images whose predicted label by C is different from y. For each class y (e.g., y = “cat” or y = “dog”), AttFind then finds a set S_y of M style coordinates (i.e., S_y ⊂ [1, ..., K] and |S_y| = M), such that changing these coordinates increases the average probability of the class y on these images.
Question: How can you actually define a style coordinate? How should I envision this?
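The greedy loop in the two AttFind excerpts can be sketched as below. Assumptions: the per-image, per-coordinate effects on P(y) are taken as a precomputed matrix (the paper computes them with the generator and classifier, and its footnoted details are not reproduced here), and the `threshold` for a "large effect" is a hypothetical parameter.

```python
def attfind(effects: list[list[float]], m: int, threshold: float) -> list[int]:
    """Greedy selection of up to m style coordinates for one target class y.

    effects[i][k] is the (assumed precomputed) change in P(y) for image i when style coordinate k is changed.
    """
    n_coords = len(effects[0]) if effects else 0
    remaining = set(range(len(effects)))
    selected: list[int] = []
    while remaining and len(selected) < m:
        candidates = [k for k in range(n_coords) if k not in selected]
        if not candidates:
            break
        # Pick the coordinate with the largest average effect on the images still in play.
        best = max(candidates, key=lambda k: sum(effects[i][k] for i in remaining) / len(remaining))
        selected.append(best)
        # Drop images this coordinate already moves strongly, so the next pick explains the rest.
        remaining = {i for i in remaining if effects[i][best] <= threshold}
    return selected


# Hypothetical 3 images x 3 style coordinates.
effects = [[0.5, 0.1, 0.0], [0.4, 0.0, 0.1], [0.0, 0.0, 0.6]]
chosen = attfind(effects, m=2, threshold=0.3)
```

Coordinate 0 is picked first because it moves images 0 and 1; those images are then dropped, so coordinate 2 is picked for image 2. On the question above, a style coordinate is one scalar entry of the generator's style vector, which is what each column index k stands for here.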
Replicating scientific results is tough—But essential. (2021). Nature, 600(7889), 359–360. https://doi.org/10.1038/d41586-021-03736-4
Tunç, D. U., Tunç, M. N., & Eper, Z. B. (2021). Is Open Science Neoliberal? PsyArXiv. https://doi.org/10.31234/osf.io/ft8dc
Grimmer & Stewart (2013) - Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
(the VTA is also part of this system, but is too small to image with standard fMRI methods, but see [35] for successful imaging methods).
All imaging studies face questions of validity and should (and many do) link to comprehensive details on instrumentation, methodology, and interpretation. Apparently, the professional consensus remains that, properly executed and interpreted, fMRI and other functional imaging techniques based on detection of oxygenation can lead to highly valid conclusions. (See Nautil.us article.)
How to Conduct Agile Market Research for Your Digital Product
rank correlation analysis, using Kendall's tau-c. For this purpose the three urban renewal status classes are assumed to constitute a scale. Evidence that such an assumption is reasonable is present in Tables 1 and 3.
Kendall's tau-c
A non-parametric measure of association between two ordinal (ranked) variables. Kendall's tau ranges from -1 to 1, where 0 indicates no relationship, 1 a perfect positive relationship, and -1 a perfect negative relationship; tau-c is the variant suited to non-square contingency tables. (Source: https://www.statisticshowto.com/kendalls-tau/)
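A standard-library sketch of Stuart's tau-c computed from a contingency table whose rows and columns are both ordered (for instance, ordered classes of some independent variable against the three urban renewal status classes, treated as a scale as the excerpt assumes). The tables below are made up. SciPy's `scipy.stats.kendalltau` with `variant='c'` works on paired observations rather than a table.

```python
def kendall_tau_c(table: list[list[int]]) -> float:
    """Stuart's tau-c: 2m(P - Q) / (n^2 (m - 1)), with m = min(rows, cols)."""
    rows, cols = len(table), len(table[0])
    m = min(rows, cols)
    n = sum(sum(row) for row in table)
    if m < 2 or n == 0:
        raise ValueError("need at least a 2 x 2 table with observations")
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):  # second case ranks higher on the row variable
                for l in range(cols):
                    pairs = table[i][j] * table[k][l]
                    if l > j:
                        concordant += pairs  # ...and higher on the column variable
                    elif l < j:
                        discordant += pairs  # ...but lower on the column variable
    return 2 * m * (concordant - discordant) / (n * n * (m - 1))
```

A perfectly diagonal table gives 1.0, the mirrored table gives -1.0, and a uniform table gives 0.0, which matches the -1 to 1 range above.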
Two other controls having to do with the socioeconomic level of the resident population are used
Control variables (alternative explanations): **Socioeconomic level of residents**: controls for high socioeconomic status (such as high education and high income relative to median income).
**Region**: assumes there may be regional differences between cities, such as how old they are and their state of disrepair. Regions are categorized as northeast, north central, south, and west.
Since the significance of the number of functions varies with the number of all other functions (i.e., the size of the employed labor force), it should be expressed as a ratio to the latter. Hence the lower the ratio of managers, proprietors, and officials to the employed labor force the greater is the concentration of power. (This measure will hereafter be called the MPO ratio.)
Sub: The ability to mobilize community people and resources requires a management class.
Independent variable: ratio of managers, proprietors, and officials to the employed labor force (the MPO ratio), from census data on the number of managers.
Dependent variable: success in urban renewal (stages: planning, execution, completion). She classifies cities into those that executed urban renewal, those that abandoned it (for whatever reason), and those that never tried urban renewal although they qualified.
Intervening variables:
Too bad there wasn't more information in the citations, even just the author & title, let alone a short summary. I wouldn't follow the link.
acknowledge the value of the knowledge that is held by research communities in the data collection process
Dealing with power asymmetries.
requested consent to quote brief, anonymized portions of conversations to provide anecdotal context.
consent
The Indian Express. “Why We Need to Count the Covid Dead,” July 20, 2021. https://indianexpress.com/article/opinion/columns/india-covid-deaths-second-wave-7412619/.
Logg, Jennifer M., and Charles A. Dorison. “Pre-Registration: Weighing Costs and Benefits for Researchers.” Organizational Behavior and Human Decision Processes 167 (November 1, 2021): 18–27. https://doi.org/10.1016/j.obhdp.2021.05.006.
Metascience 2021. (n.d.). Retrieved June 27, 2021, from https://metascience2021.org/
Mosleh, M., Pennycook, G., & Rand, D. G. (2021). Field experiments on social media [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/dgmc2
Calster, B. V., Wynants, L., Riley, R. D., Smeden, M. van, & Collins, G. S. (2021). Methodology over metrics: Current scientific standards are a disservice to patients and society. Journal of Clinical Epidemiology, 0(0). https://doi.org/10.1016/j.jclinepi.2021.05.018
Baghal, T. A., Wenz, A., Sloan, L., & Jessop, C. (2021). Linking Twitter and survey data: Asymmetry in quantity and its impact. EPJ Data Science, 10(1), 1–20. https://doi.org/10.1140/epjds/s13688-021-00286-7
Reproduction number (R) and growth rate: Methodology—GOV.UK. (n.d.). Retrieved May 13, 2021, from https://www.gov.uk/government/publications/reproduction-number-r-and-growth-rate-methodology/reproduction-number-r-and-growth-rate-methodology
In this practice, only errors from outside the program's control are to be handled (such as user input); the software itself, as well as data from within the program's line of defense, are to be trusted in this methodology.
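A minimal sketch of the practice in the excerpt (sometimes called offensive programming): handle errors only at the boundary where untrusted user input enters, and let internal code trust its inputs, with an assert documenting the invariant instead of handling failure. The functions and the 0 to 150 age range are hypothetical examples.

```python
def parse_age(raw: str) -> int:
    """Boundary code: user input is outside the program's control, so errors are handled here."""
    try:
        age = int(raw.strip())
    except ValueError:
        raise ValueError(f"age must be a whole number, got {raw!r}") from None
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age


def age_in_months(age: int) -> int:
    """Internal code: inputs come from inside the line of defense, so they are trusted.

    The assert documents the invariant rather than handling a failure.
    """
    assert 0 <= age <= 150, "internal caller passed an unvalidated age"
    return age * 12
```

A bad internal value is treated as a programmer error to surface loudly, not a condition to recover from; only the parser turns bad input into a user-facing error.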
In any nonviolent campaign there are four basic steps: collection of the facts to determine whether injustices exist; negotiation; self-purification; and direct action. We have gone through all these steps in Birmingham.
He is detailing his methodology - it’s not random
Untools - Tools for better thinking
Collection of thinking tools and frameworks to help you solve problems, make decisions and understand systems.
methodological arms race,
See: Weaver, J. A. and N. Snaza (2017). "Against Methodocentrism in Educational Research." Educational Philosophy and Theory 49(11): 1055-1065.
Bauer, B., Larsen, K. L., Caulfield, N., Elder, D., Jordan, S., & Capron, D. (2020). Review of Best Practice Recommendations for Ensuring High Quality Data with Amazon’s Mechanical Turk. PsyArXiv. https://doi.org/10.31234/osf.io/m78sf
This article introduces a special issue of Qualitative Inquiry that focuses on using "concept" as method in the education and social sciences. They describe this exploratory approach where the method emerges during the process of research.
Online Research Tools and Techniques. (2020, September 16). https://www.youtube.com/watch?v=wGWqBtDkOFs
Health Nerd on Twitter. (n.d.). Twitter. Retrieved October 17, 2020, from https://twitter.com/GidMK/status/1316511734115385344
Dr Natalie Shenker on Twitter. (n.d.). Twitter. Retrieved October 13, 2020, from https://twitter.com/DrNShenker/status/1314475759508107265
Kekecs, Z., Szaszi, B., & Aczel, B. (2020). ECO, an expert consensus procedure for developing robust scientific outputs [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/9gqru
Computational Social Science to Address the (Post) COVID-19 Reality. (2020, June 27). https://www.youtube.com/watch?v=7d-Dq0e1JJ0&list=PL9UNgBC7ODr6eZkwB6W0QSzpDs46E8WPN&index=4
Online Research: From Funding to Data Collection. (n.d.). Association for Psychological Science - APS. Retrieved September 25, 2020, from https://www.psychologicalscience.org/news/online-research.html
Laghaie, A., & Otter, T. (2020). Measuring evidence for mediation in the presence of measurement error [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/5bz3f
Fujita, Shigeru, Giuseppe Moscarini, and Fabien Postel-Vinay. ‘Measuring Employer-to-Employer Reallocation’. Working Paper. Working Paper Series. National Bureau of Economic Research, July 2020. https://doi.org/10.3386/w27525.
Coronavirus: The inside story of how UK’s “chaotic” testing regime “broke all the rules.” (n.d.). Sky News. Retrieved July 17, 2020, from https://news.sky.com/story/coronavirus-the-inside-story-of-how-uks-chaotic-testing-regime-broke-all-the-rules-12022566
James Evans—Designing Diversity for Collective Advance (ACM CI’20). (n.d.). Retrieved June 25, 2020, from https://www.youtube.com/watch?v=1OMmJJz0oF0
Puthillam, Arathy. ‘Too WEIRD, Too Fast: Preprints about COVID-19 in the Psychological Sciences’. Preprint. PsyArXiv, 10 June 2020. https://doi.org/10.31234/osf.io/5w7du.
Lanovaz, M., & Turgeon, S. (2020). Tutorial: Applying Machine Learning in Behavioral Research [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/9w6a3
Analyses linking Pentecostalism to the pursuit of personal gain through the breakdown of social relationships are problematic on at least two fronts. First, the vast majority of these studies are based largely, if not exclusively, on sermons and interviews with church leaders. While these are certainly an important part of any anthropological discussion of Pentecostal ritual life, in the absence of a robust ethnographic engagement with those who spend their time listening to these messages, it is impossible to determine the extent to which believers are putting their leaders' words into practice (see Engelke 2010).
The author also harshly criticizes non-ethnographic qualitative studies, based solely on sermons and interviews, that argue for an individualizing, socially corrosive prosperity Pentecostalism without data that actually document sociality.
“To me, this is symptomatic of a much larger problem of transparency within the company. Nobody is forthcoming with information that dramatically affects editorial,” Binkowski said. “One of those things was me not knowing if I was in trouble.”
This firing of Snopes managing editor reiterates previous concerns over an unhealthy working environment. This article is old enough I'd like to see followups.
When I asked Barbara to comment on the GoFundMe page, she noticed her erasure. “Was surprised to see my life’s work described as having been ‘a small one-person effort,’”
Diminishing your 50/50 partner's contributions & erasing their attributions (accidentally or not) is one of the things I find the most annoying.
When I asked how many articles she’d written for the site, she came back with a “verified count” of 1,905. She told me how she came to that number: “By examining every Snopes.com HTML file on my computer, rereading every email David and I exchanged from 1997 until now, and in cases where doubt still existed, examining my research files. The task took a week, but I am satisfied I now have a fair list and that all lurking doubles (a result of David’s penchant for renaming files) have been excised.”
Impressive. Her alleged painstaking data-hoarding makes me like her. I'm not sure what to think of David's ambiguity yet.
Methodology: The classic OSINT methodology you will find everywhere is straightforward. Define requirements: what are you looking for? Retrieve data. Analyze the information gathered. Pivoting & Reporting: either define new requirements by pivoting on data just gathered, or end the investigation and write the report.
Etienne's blog! Amazing resource for OSINT; particularly focused on technical attacks.
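The retrieve, analyze, and pivot steps form a loop that is essentially a breadth-first search over selectors (domains, IPs, emails). A hypothetical sketch: the in-memory `lookup` dict stands in for real data sources, and `max_depth` is an assumed stopping rule for deciding when to end the investigation and report.

```python
from collections import deque


def investigate(seed: str, lookup: dict[str, list[str]], max_depth: int = 2) -> dict[str, list[str]]:
    """Retrieve related selectors, record them, pivot on new ones, and stop at max_depth."""
    findings: dict[str, list[str]] = {}
    queue = deque([(seed, 0)])
    seen = {seed}
    while queue:
        selector, depth = queue.popleft()
        related = lookup.get(selector, [])  # retrieve data
        findings[selector] = related        # analyze / record what was found
        if depth < max_depth:               # pivot: new requirements from new data
            for item in related:
                if item not in seen:
                    seen.add(item)
                    queue.append((item, depth + 1))
    return findings                         # basis for the report


# Hypothetical data source using documentation-reserved domains and IP ranges.
lookup = {
    "example.org": ["192.0.2.10", "admin@example.org"],
    "192.0.2.10": ["example.net"],
    "example.net": ["198.51.100.7"],
}
report = investigate("example.org", lookup, max_depth=1)
```

With `max_depth=1` the investigation stops after pivoting once from the seed; raising it follows the chain further before reporting.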
Appendix A. The survey instrument
SURVEY DATA ON NUCLEAR POWER
Comparing survey data (i.e. individual/public opinion) with news coverage (media framing)
To be sure, the topicality, novelty or potential benefits of a given line of research might help it attract notice and support, but scientific research fundamentally stands or falls on the thoroughness with which activities and reasoning can be tied together. You just can’t get in the game without a solid methodology.
Methodology is the critical factor for scientific study, not the result.
telephone interviews with 37 participants
I have to wonder at telephone samples of this age group given the propensity of youth to not communicate via voice phone.
"Through the difficulties of access to water, what was at stake was therefore not only an improvement in the material conditions of existence, but a struggle for the recognition of their social existence. And insofar as my ethnographic involvement affected the material dimension of the residents' living conditions, it was also, in fact, inscribed in the symbolic dimension of neighborhood life, thus revealing that even in the most apparently marginal areas, social life involves a «double truth» that ethnographic work must grasp. In this specific case, access to property and services is the first moment of access to social existence, and to its political recognition." P. 358
https://marcarand.files.wordpress.com/2011/06/bourdieu-la-doble-verdad-del-trabajo.pdf
"Conversely, the methodological clarifications provided by the other research techniques, and the multiplication of the sites studied, allowed me to take a different look at my ethnographic field and, in a sense, to «revisit» it (Burawoy, 2010). Borrowing Burawoy's vocabulary, I was moving from the «confirmation revisit» to the «continuous revisit», but of an «empiricist» type, in which the researcher discovers dynamics that go beyond the previously studied community." P. 356
The ethnographic approach carried out initially allowed me to reintegrate, at the moment of objectivation, the «lived meaning» of the agents (Bourdieu, 1980b): I realized that it had served me as what Loïc Wacquant calls an «instrument for deconstructing the categories» (Wacquant, 2008) used in statistical approaches. p. 356
Because of both the content that people upload and the behavioral traces that they leave behind, social network sites have unprecedented quantities of data concerning human interaction. This presents unique opportunities and challenges. On one hand, SNSs offer a vibrant “living lab” and access to behavioral data at a scale inconceivable to many social scientists. On the other, the data that are available present serious research ethics questions and introduce new types of biases that must be examined (boyd and Crawford 2012)
The scope and scale of trace data (from settings, public-facing features, and server-side) present challenges similar to those of technological platform changes: new ethics/privacy issues.
For those of us who believe that social network sites are socio-technical systems, in which social and technical factors shape one another, failing to describe the site under study ignores the fact that the technological constraints and affordances of a site will shape user practices and that social norms will emerge over time. Not including information about what the feature set was at the time of data collection forecloses the possibility of identifying patterns that emerge over time and through the accumulated scholarship across a range of sites and user samples. Unfortunately, because they have no knowledge about how things will continue to evolve and which features will become important to track, researchers may not be able to identify the salient features to report and may struggle with devoting scarce publication space to these details, but this doesn’t undermine the importance of conscientious consideration towards describing the artifact being analyzed.
What about documenting technological features/artifacts on a stand-alone website or public repository, like Github to account for page limits?
In order to produce scholarship that will be enduring, the onus is on social media researchers to describe the technological artifact that they are analyzing with as much care as survey researchers take in describing the population sampled, and with as much detail as ethnographers use when describing their field site. This is not to say that researchers must continue to describe technologies as if no one knows what they are; we are beyond the point where researchers must explain how electronic mail or “email” is like or unlike postal mail. But, rather, researchers must clearly describe the socio-technical context of the particular site, service, or application their scholarship is addressing. In addition to attending to the technology itself, and the interchange between technical and social processes, we believe SNS researchers should make a concerted effort to include the date of data collection and to describe the site at the moment of data collection and the relevant practices of its users. These descriptions will enable later researchers to synthesize across studies to identify patterns, much in the same way reporting exact effect sizes allows for future meta-analyses
Excellent point and important for my SBTF studies.
I think of research as a conversation, and it really is very much like a conversation. No single person dominates it, but what does happen is when you interject something, when you contribute something to a conversation, you want to be understood, you want to be heard, you would like people to pay attention, you would like it to have some influence on the way the conversation goes. You don't control it. But thinking in conversational terms and trying to say something that is interesting as a criteria, not merely publishable but actually is interesting -- that's been part of what moved me.
studied the social network structure underlying Youtube videos
I wonder what the methodology of this paper was?
Emma’s multiplicity of subplots, and its preoccupation with the reading, rereading and misreading of writing within, and events internal to, the text, renders the novel a manifesto for Austen’s approach to the “judicious”, critical reading necessary to understanding the function of literary influence in her fiction
The author makes it clear that she will be explicating her thesis via the example of Emma. In this sentence, she connects Austen's approach to reading, writing, and readership to the notion of literary fiction. Again, I'm not sure if the following paragraphs do live up to the expectation she sets up here.
The catalyst for the novel, however, seems to have been a straightforward reaction to a new work by an author Austen considered her competition: the Scottish Mary Brunton’s Discipline (1814). Discipline is a fictional autobiography with the strong religious themes of sin, repentance and redemption.
The author claims here that Emma was inspired by the 1814 novel Discipline by Mary Brunton, which surely is not part of the male literary canon laid out earlier in the article. The author outlines the main themes of Discipline and explains the relationship between the two authors.
I feel like a broken record here, but again, this seems to be a very tenuous point without computational analysis. The author's own language belies this tenuousness as she says that the novel's inspiration "seems to have been a straightforward reaction" to another novel. The word "seems" does not inspire confidence.
Still, it may not be others' actual behavior that drives our own addictive behavior, but our perceptions of their behavior, where the two conflict.
This relates to the above annotation. I also agree that people's perceptions of others' behaviors are unreliable, but this is an interesting point: in this case, perception may be more important than reality. Of course, that would need to be tested for us to know for sure. I think an interesting future study would be to use SNA and have both the egos and their alters actually track their substance use day by day. This would address the perception vs. reality issue, as well as the underestimation of own behavior issue. There could still be some social desirability bias, though.
This suggested that for this survivor, those feelings were the most powerful aspects of her experience.
This is an excellent use of SNA. I have never seen it used chronologically to map feelings. I love it!
snowball sampling method where we initially asked 17 respondents about their social and expert networks. This involved asking informants to name the seven most important persons in their lives, starting with the most important, outside of their household
Think about this for your own research. Pay attention to the methods so you can begin to think about how you will collect your own data.
Austen allows Emma to imaginatively misattribute herself. In doing so, she offers the reader a literary red herring. While Harriet may fall in and out of love as if she is subject to one of Puck’s spells, Emma takes its cues from a different Shakespearean comedy.[24] Emma, who has “very little intention of ever marrying at all”, yet is happy to consider Frank Churchill as a potential husband (84), resembles Olivia, the “too proud” heiress of Shakespeare’s Twelfth Night, whose resolution to live “like a cloistress” is quickly abandoned when she meets Viola, disguised as a boy.[25]
In this brief introduction to the next section of the paper, Murphy challenges existing scholarship that aligns Emma with Shakespeare's A Midsummer Night's Dream. Instead, the author outlines the parallels between Emma and Shakespeare's Twelfth Night. I find the connection somewhat tenuous, as it almost ignores all of the gender bending and performance of Twelfth Night. Although the author later claims that "the broader themes of deliberate misrepresentation and self-serving delusions" tie the two works together, I find that ignoring the aspects of performance and disguise is problematic.
I also think that this takes away from Murphy's main argument, which is that Austen's view of influence is broader than the historically dominant canon, as evidenced by her parody of Brunton's novel. This section seems to show the opposite: a parallel between Austen and Shakespeare.
It is not to be expected that any character within Emma might be able to exercise the kind of judgment of its creator or perform the kind of judicious reading that Austen’s text ultimately demands. This does not prevent Austen from demonstrating how her characters can be taught to read and to judge clearly.
Here, Murphy makes the connection back to readership and the characters of Emma.
This, incidentally, made me think of the quote on the new British ten pound note: "I declare after all there is no enjoyment like reading!" which was certainly a satirical denotation.
If we enlarge our understanding of the concept of “influence”, we can begin to see the ways in which artistically unremarkable, canonically disregarded works inform the development even of masterpieces. Ros Ballaster correctly states that: [...] most women novelists of the eighteenth century tended to locate their own writing in relation to a strong line of male predecessors or contemporaries [...] if women read each other’s work they did not, for the most part, openly acknowledge influence.[16] Jane Austen is the exception to this rule. Far from shamefacedly concealing her debt to Brunton’s novel, on the contrary, Austen’s linguistic allusions to Discipline in Emma draw the reader’s attention to the two novels’ intimate connection.
This is a key section. Here, the author claims that Jane Austen's Emma is influenced by the rather unremarkable and certainly much less well known novel Discipline. This is in contrast to the existing tradition. Murphy cites and agrees with Ballaster's argument that 18th century women authors situated their own work within the male tradition and did not seek recognition for the influence of other female authors. However, Murphy argues that Austen makes obvious the connection to Brunton.
Such active, critical reading, of course, distinguishes the work still expected of students and scholars of literature, and Austen’s assumption of this ability in her readers was underpinned by historic changes in the study of literature.
Here, Murphy makes a claim and supports it with a secondary source on the place of literature in English education. In doing so, Murphy is illuminating the basis of Austen's assumption of a certain body of literary knowledge.
“I do not write for such dull Elves”, wrote Jane Austen to her sister Cassandra in 1813, “As have not a great deal of Ingenuity themselves”.
Here, the author turns to a primary source, a letter by Jane Austen, to research her views of her readers. While authorial intention is often inscrutable, such a primary source can assist in evaluating Austen's notions of readership.
look at the edges, the connections between the nodes.
This is what we mean with the term 'sociological imagination'. Theory allows us to 'see' below the surface of society and to understand the invisible network of norms, values, structures, institutions and systems of inequality that shape individual choice and behavior. In this way, SNA should be fundamental to sociological methods.
descriptive analysis
The challenge of descriptive research for traditional social scientists is the change in how questions are asked. Questions should focus on describing something in a deep and informative way. Traditional social science relies more on predictive and inferential analysis. Can I predict what will happen if this variable changes in this way? Research questions identify the independent and dependent variables. SNA does not have IV and DV so questions are more about revealing what is going on underneath; i.e. how do the members of corporations know each other?
challenging for those using SNA.
It is the blessing and the curse of SNA--it can do so much but it can also do too much. The analyst has to be clear in their question, defining what is a node and what is a link. It can get even trickier since nodes and links can also be reversed. Sometimes a node can be a link and a link can also be a node!
interaction of two actors.
Excellent! Therefore SNA requires three points of data: Node A, Node B and the link between them. There is no dependent or independent variable. This means there are no inferential or predictive questions. Questions are more descriptive and comparative.
relational data
There are three points of data in SNA; node A, node B and the link between them. Traditional social science requires only two--independent variable and dependent variable.
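A minimal sketch of what "three points of data" looks like once recorded, assuming a simple edge list in which every record is (node A, node B, tie). The names, tie labels, and the helper functions `degree` and `ties_of_type` are hypothetical and only illustrate that the link is itself a piece of data, not a property of either person alone.

```python
from collections import Counter

# Hypothetical relational data: each record is (node_a, node_b, tie).
# The third field describes the link itself, e.g. the kind of relationship.
edges = [
    ("Ana", "Ben", "friend"),
    ("Ana", "Cara", "coworker"),
    ("Ben", "Cara", "friend"),
    ("Cara", "Dev", "friend"),
]


def degree(edge_list):
    """Count how many ties each node takes part in (ties treated as undirected)."""
    counts = Counter()
    for a, b, _tie in edge_list:
        counts[a] += 1
        counts[b] += 1
    return counts


def ties_of_type(edge_list, tie_type):
    """Return the pairs of nodes connected by a given kind of tie."""
    return [(a, b) for a, b, tie in edge_list if tie == tie_type]


print(degree(edges))
print(ties_of_type(edges, "friend"))
```

Nothing here is an independent or dependent variable: a measure such as degree only exists because of the pairs, which is the contrast the note draws with attribute-based data.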
molded by our social connections
Isn't this the foundation of the sociological imagination? Why is SNA not more central to sociological methods?
heart of sociological thought is the belief that we are all a part of a vast tapestry of social connections.
As I think about this, I am always a bit perplexed as to why SNA is not more foundational to Sociology. SNA reveals the very fabric of our society. Why is it not more utilized as a methodology? I suspect it has something to do with how hard it is to collect data.
aspect of an average person’s life to very soundly prove their point.
These outcomes have been linked to friends and social contacts. Research asks how many friends or how often do you socialize? While this hints at the issue of networks, asking for lists or numbers does not produce network data. You have to find the links between people and between those people that people know.
just how important our social networks are to every aspect of our lives
Social networks are like 'air'; they surround us and we don't even see them. To me, this is what makes the methodology of SNA harder to grasp; how to access the data for 'air'? How to understand the discreet influence of 'air'? But once you see it, you can't unsee it! Very powerful!
ask people to list those in their social circles who have intervened in abusive situations, people they have talked to about bystander intervention, or people whose opinion on intervening is important to them.
What would be the links between these people? If you asked someone to list their friends, you will get lists which produce a star network. There needs to be a second round of questions involving friends of friends. Getting network data requires asking interrelated people.
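A rough sketch of the point about star networks, under stated assumptions: one hypothetical respondent ("R1") names four people, and in a hypothetical second round those people report whom they know. The data and function names are illustrative only; density here is the usual undirected measure, ties present divided by n(n-1)/2 possible ties.

```python
def ego_round(ego, named):
    """First round: the ego lists alters, so every tie runs through the ego (a star)."""
    return {frozenset((ego, alter)) for alter in named}


def second_round(alter_reports):
    """Second round: each alter reports whom they know; duplicate reports collapse."""
    ties = set()
    for alter, knows in alter_reports.items():
        for other in knows:
            ties.add(frozenset((alter, other)))
    return ties


def nodes_of(ties):
    """All people who appear in at least one tie."""
    return set().union(*ties) if ties else set()


def density(nodes, ties):
    """Share of possible undirected ties that are actually present."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    return len(ties) / possible if possible else 0.0


# Hypothetical data: R1 names A-D; A and B both report their mutual tie.
star = ego_round("R1", ["A", "B", "C", "D"])
alter_ties = second_round({"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": []})
full = star | alter_ties

print(density(nodes_of(star), star))   # ego list alone: a star
print(density(nodes_of(full), full))   # after the second round
```

With only the first round, none of the alters are connected to each other, so the structure among them is invisible. The second round adds three alter-to-alter ties and raises the density from 0.4 to 0.7 for the same five people.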
individuals, groups, or systems.
This perspective of looking at individuals in categories is the foundation of statistics--two variables that are mutually exclusive and the goal is to see if they relate in any way. SNA is very different--all variables are related and dependent. That is why SNA is descriptive--can't do predictive without mutually exclusive variables.
So with social networking graphs, we will be able to get a better view on connections and their movement in the #rhizo14 constellation.
Different methodology for research.
In total we received 40 faculty and 39 student
What this doesn't tell us is how many of the faculty/students were reporting experiences on the same projects.
Page 8
Jockers on the 1990s approach to anecdotal evidence:
… in the 1990s, gathering literary evidence meant reading books, noting "things" (a phallic symbol here, a bibliographical reference there, a stylistic flourish, an allusion, and so on) and then interpreting: making sense and arguments out of those observations. Today, in the age of digital libraries and large-scale book-digitization projects, the nature of the "evidence" available to us has changed, radically. Which is not to say that we should no longer read books looking for, or noting, random "things," but rather to emphasize that massive digital corpora offer us unprecedented access to the literary record and invite, even demand, a new type of evidence gathering and meaning making. The literary scholar of the 21st century can no longer be content with anecdotal evidence, with random "things" gathered from a few, even "representative," texts. We must strive to understand the things we find interesting in the context of everything else, including a mass of possibly "uninteresting" texts.
Pages 7 and 8
Jockers is talking here about Ian Watt’s method in The Rise of the Novel
What are we to do with the other three to five thousand works of fiction published in the eighteenth century? What of the works that Watt did not observe and account for with his methodology, and how are we to now account for works not penned by Defoe, by Richardson, or by Fielding? Might other novelists tell a different story? Can we, in good conscience, even believe that Defoe, Richardson, and Fielding are representative writers? Watt’s sampling was not random; it was quite the opposite. But perhaps we only need to believe that these three (male) authors are representative of the trend towards "realism" that flourished in the nineteenth century. Accepting this premise makes Watt's magnificent synthesis into no more than a self-fulfilling project, a project in which the books are stacked in advance. No matter what we think of the sample, we must question whether in fact realism really did flourish. Even before that, we really ought to define what it means "to flourish" in the first place. Flourishing certainly seems to be the sort of thing that could, and ought, to be measured. Watt had no yardstick against which to make such a measurement. He had only a few hundred texts that he had read. Today things are different. The larger literary record can no longer be ignored: it is here, and much of it is now accessible.
Page 217
Methods for organizing information in the humanities follow from their research practices. Humanists do not rely on subject indexing to locate material to the extent that the social sciences or sciences do. They are more likely to be searching for new interpretations that are not easily described in advance; the journey through texts, libraries, and archives often is the research.
Page 213
Humanities scholarship is even more difficult to characterize than are the sciences and social sciences. Generally speaking, the humanities are more interpretative than data driven, but some humanists conduct qualitative studies using social sciences methods, and others employ quantitative methods. Digital humanities scholarship often reflects sophisticated computational expertise. Humanists value new interpretations, perspectives, and sources of data to examine age-old questions of art and culture.
But the passage from de Man does disservice to the discussion of close reading in one important respect. It makes it sound as though all you need is a negative discipline, a refusal to leap to the kind of paraphrases one has been led to expect, so that effective close reading requires no technique or training, only an avoidance of bad or dubious training. The suggestion seems to be that if one strips away these bad habits and simply encounters the text, without preconceptions, close reading will occur. If, as de Man puts it, you are “attentive” and “honest,” close reading “cannot fail to respond to structures of language” that most literary education strives “to keep hidden.” Attention is important but not, alas, enough. Readers can always fail to respond, though then de Man might not want to dignify the practice with the name of reading.
Discussion of the methodological difficulties involved in close reading: i.e. there is no such thing as "just reading."
Culler, Jonathan. 2010. “The Closeness of Close Reading.” ADE Bulletin, 20–25. doi:10.1632/ade.149.20.
Distant Reading: Performance, Readership, and Consumption in Contemporary Poetry, Peter Middleton calls close reading “our contemporary term for a heterogeneous and largely unorganized set of practices and assumptions”
Discussion of the methodology of close reading: Middleton, Peter. Distant Reading: Performance, Readership, and Consumption in Contemporary Poetry. Tuscaloosa: U of Alabama P, 2005. Print.
Page 16
One benefit of traditional hermeneutical practices such as close reading is that the trained reader need not install anything, run any software, wrestle with settings, or wait for results. The experienced reader can just enjoy iteratively reading, thinking, and rereading. Similarly the reader of another person's interpretation, if the book being interpreted is at hand, can just pick it up, follow the references, and recapitulate the reading. To be as effective as close reading, analytical methods have to be significantly easier to apply and understand. They have to be like reading, or, better yet, a part of reading. Those invested in the use of digital analytics need to think differently about what is shown and what is hidden: the rhetorical presentation of analytics matters. Further, literary readers of interpretive works want to learn about the interpretation. Much of the literature in journals devoted to humanities computing suffers from being mostly about the computing; it is hard to find scholarship that is addressed to literary scholars and is based in computing practices.
Page 6
Computer-assisted research in the humanities, by contrast to the Cartesian story and traditional humanities practices, has almost always been collaborative. This is due to the variety of skills needed to implement digital humanities projects. It is also linked to the relationship between the practices of interpretation in the development of the tools of interpretation, be the tools for analyzing text or digital editions. Anyone who has used tools forged by another person is in collaboration, even if one isn't personally influencing the provider of the tools. The need to collaborate, though acknowledged in various ways, has been a professional hindrance, as anyone who submits a curriculum vitae for promotion listing nothing but co-authored papers knows.
Pages 6-7
Collaboration is not always good. It separates the interpreter/scholar from the designer/programmer who implements the scholarly methods. Willard McCarty notes that the introduction of software “separated the conception of the problems (domain of the scholar) from the computational means of working them out (bailiwick of the programmer) and so came at a significant cost.” As computing is introduced into research, it separates consumption, implementation, and interpretation in ways that can be overcome only through dialogue and collaboration across very different fields. Typically, humanities scholars know little about programming and software engineering, and programmers know little about humanities scholarship. Going it alone is an option only for the few who have time to master both. The rest of us end up depending on others.
do these analyses without ever mentioning diversity when we recruit survey-takers or within our survey
Indeed, a major advantage, in terms of methodology. Bias is introduced in myriad ways, but that one way would likely have a much deeper effect than many others.
DARIAH the challenge involved conducting, analysing and understanding research practices of arts and humanities researchers, a largely ill-defined community encompassing a wide spectrum of disciplines. Each of them deals with a variety of objects employing an extensive number of methods. In the context of EHRI, the challenge is slightly different, due to the involvement of a better-defined research community. Holocaust researchers share well-identified objects, common ground on methods, and handle similar setbacks. In
Really interesting idea: do an analysis of humanities researchers in general (DARIAH) and Holocaust researchers in specific (EHRI). One is very heterogeneous, the other very homogeneous (at least in terms of working conditions and, broadly speaking, data sources).
largely ill-defined community encompassing a wide spectrum of disciplines.
description of "arts and humanities researchers"
AN APPROACH TO ANALYSING WORKING PRACTICES OF RESEARCH COMMUNITIES IN THE HUMANITIES
Benardou, Agiatis, Panos Constantopoulos, and Costis Dallas. 2013. “An Approach to Analyzing Working Practices of Research Communities in the Humanities.” International Journal of Humanities and Arts Computing 7 (1–2): 105–27. doi:10.3366/ijhac.2013.0084.
is wise to avoid generalizations and to concentrate instead on showing how interactions between coworkers, specifically the orchestration of information exchange and coauthorship, are grounded in local culture.
"it is wise to avoid generalizations and to concentrate instead on showing how interactions between coworkers, specifically the orchestration of information exchange and coauthorship, are grounded in local culture."
to research ‘sensory perception and reception’ requires methods that ‘are capable of grasping “the most profound type of knowledge [which] is not spoken of at all and thus inaccessible to ethnographic observation or interview” (Bloch 1998: 46)’ (Bendix 2000: 41). Thus sensory ethnography discussed in the book does not privilege any one type of data or research method. Rather, it is open to multiple ways of knowing and to the exploration of and reflection on new routes to knowledge.
Hawhee: why do I buy the "profound," the "most profound" as a description of sensory knowledge?
A major problem is that this possibility of exploring a network is often lost when it is published. The rich experience of interacting with the network within Gephi is converted to a pdf or png format,
Is it not the task of simplifying that the researcher denies herself when dreaming of showing the full complexity of a phenomenon to her audience?