233 Matching Annotations
  1. Mar 2017
    1. the area under the curve (often referred to as simply the AUC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative')

      Why AUC is a meaningful guide in CTR (click-through rate) applications.
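
The ranking interpretation quoted above can be checked directly on a small sample. The sketch below is only an illustration, not a production metric: the function name `pairwise_auc` and the choice to count tied scores as half a win are assumptions made here, and the quadratic loop over all positive/negative pairs is only practical for small inputs.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Estimate AUC as the share of (positive, negative) pairs ranked correctly.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for a positive instance (e.g. a click), 0 for a negative one.
    Tied scores count as half a correct ranking (an assumption of this sketch).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative instance")

    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # tie: half credit
    return wins / (len(positives) * len(negatives))


# Hypothetical click data: 1 = clicked, 0 = not clicked.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = pairwise_auc(example_scores, example_labels)
```

In this toy sample, three of the four positive/negative pairs are ordered correctly, so the estimate is 0.75. Because the value depends only on the ordering of the scores and not on their calibration, it suits settings such as CTR prediction where items are ranked by predicted score.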

  2. Feb 2017
    1. Robert Mercer, Steve Bannon, Breitbart, Cambridge Analytica, Brexit, and Trump.

      “The danger of not having regulation around the sort of data you can get from Facebook and elsewhere is clear. With this, a computer can actually do psychology, it can predict and potentially control human behaviour. It’s what the scientologists try to do but much more powerful. It’s how you brainwash someone. It’s incredibly dangerous.

      “It’s no exaggeration to say that minds can be changed. Behaviour can be predicted and controlled. I find it incredibly scary. I really do. Because nobody has really followed through on the possible consequences of all this. People don’t know it’s happening to them. Their attitudes are being changed behind their backs.”

      -- Jonathan Rust, Cambridge University Psychometric Centre

  3. Jan 2017
    1. AI criticism is also limited by the accuracy of human labellers, who must carry out a close reading of the ‘training’ texts before the AI can kick in. Experiments show that readers tend to take longer to process events that are distant in time or separated by a time shift (such as ‘a day later’).
    2. Even though AI annotation schemes are versatile and expressive, they’re not foolproof. Longer, book-length texts are prohibitively expensive to annotate, so the power of the algorithms is restricted by the quantity of data available for training them.
    3. In most cases, this analysis involves what’s known as ‘supervised’ machine learning, in which algorithms train themselves from collections of texts that a human has laboriously labelled.
  4. Dec 2016
    1. The team on Google Translate has developed a neural network that can translate language pairs for which it has not been directly trained. "For example, if the neural network has been taught to translate between English and Japanese, and English and Korean, it can also translate between Japanese and Korean without first going through English."

  5. Oct 2016
    1. In machine learning, the term "ground truth" refers to the accuracy of the training set's classification for supervised learning techniques.

      Ground truth in machine learning

  6. May 2016
    1. the algorithm was somewhat more accurate than a coin flip

      In machine learning it's also important to evaluate not just against random, but against how well other methods (e.g. parole boards) do. That kind of analysis would be nice to see.

  7. Apr 2016
    1. We should have control of the algorithms and data that guide our experiences online, and increasingly offline. Under our guidance, they can be powerful personal assistants.

      Big business has been very militant about protecting their "intellectual property". Yet they regard every detail of our personal lives as theirs to collect and sell at whim. What a bunch of little darlings they are.

  8. Feb 2016
    1. Patrick Ball—a data scientist and the director of research at the Human Rights Data Analysis Group—who has previously given expert testimony before war crimes tribunals, described the NSA's methods as "ridiculously optimistic" and "completely bullshit." A flaw in how the NSA trains SKYNET's machine learning algorithm to analyse cellular metadata, Ball told Ars, makes the results scientifically unsound.
    2. “Search is the cornerstone of Google,” Corrado said. “Machine learning isn’t just a magic syrup that you pour onto a problem and it makes it better. It took a lot of thought and care in order to build something that we really thought was worth doing.”
  9. Jan 2016
    1. UT Austin SDS 348, Computational Biology and Bioinformatics. Course materials and links: R, regression modeling, ggplot2, principal component analysis, k-means clustering, logistic regression, Python, Biopython, regular expressions.

  10. Dec 2015
    1. OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.
    2. Big Sur is our newest Open Rack-compatible hardware designed for AI computing at a large scale. In collaboration with partners, we've built Big Sur to incorporate eight high-performance GPUs
  11. Nov 2015
    1. a study by Stephen Schueller, published last year in the Journal of Positive Psychology, found that people assigned to a happiness activity similar to one for which they previously expressed a preference showed significantly greater increases in happiness than people assigned to an activity not based on a prior preference. This, writes Schueller, is “a model for positive psychology exercises similar to Netflix for movies or Amazon for books and other products.”
    2. TPOT is a Python tool that automatically creates and optimizes machine learning pipelines using genetic programming. Think of TPOT as your “Data Science Assistant”: TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines, then recommending the pipelines that work best for your data.

      https://github.com/rhiever/tpot TPOT (Tree-based Pipeline Optimization Tool), built on numpy, scipy, pandas, scikit-learn, and deap.

    3. Nanodegree Program Summary: Machine learning represents a key evolution in the fields of computer science, data analysis, software engineering, and artificial intelligence. It has quickly become industry's preferred way to make sense of the staggering volume of data our modern world produces. Machine learning engineers build programs that dynamically perform the analyses that data scientists used to perform manually. These programs can “learn” based on millions of experiences, all rigorously and numerically defined.
  12. Oct 2015
    1. I have the feeling we do not need to use models as complicated as some outlined in the text; we can (and finally will have to) abstract from most of the issues we can imagine. I expect that "magic" (an undisclosed heuristic, perhaps in combination with machine learning) will deal with the issues, a black box that will be considered inherently flawed and practical enough at the same time. The results from experimental ethics can help form the heuristic while the necessity for easy implementation and maintainability will limit the applications significantly.

  13. Sep 2015
  14. Aug 2015
  15. Jul 2015
  16. Jun 2015
    1. Enter the Daily Mail website, MailOnline, and CNN online. These sites display news stories with the main points of the story displayed as bullet points that are written independently of the text. “Of key importance is that these summary points are abstractive and do not simply copy sentences from the documents,” say Hermann and co.

      Someday, maybe projects like Hypothesis will help teach computers to read, too.

  17. Jan 2015
    1. Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of the outcome is modeled as a linear combination of the predictor variables.
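
The logit relationship described above can be written out in a few lines. This is a minimal sketch under stated assumptions: the function names, the intercept, and the coefficient values are all made up for illustration, and no model fitting is shown.

```python
import math


def linear_log_odds(intercept, coefficients, predictors):
    """Log odds of the outcome as a linear combination of the predictors."""
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


def probability_from_log_odds(log_odds):
    """Invert the logit: map log odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def logit(p):
    """Log odds of a probability p, i.e. log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients; a real model would estimate these from data.
intercept = -1.0
coefficients = [0.5, 2.0]
predictors = [2.0, 0.25]

z = linear_log_odds(intercept, coefficients, predictors)  # -1.0 + 1.0 + 0.5
p = probability_from_log_odds(z)                          # P(outcome = 1)
```

Applying `logit` to `p` recovers `z`, which is the sense in which the model is linear on the log-odds scale even though the predicted probability is not linear in the predictors.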