424 Matching Annotations
  1. Dec 2016
    1. Key points:

      1. Scale of data especially benefits large NNs
      2. Combining HPC and AI skills is important for optimal impact (handling scale challenges and bigger/more complex NNs)
      3. Most of the value right now comes from CNNs, fully connected networks (FCs), and RNNs. Unsupervised learning, GANs, and others may be the future, but they are research topics right now.
      4. E2E DL might become relevant for some cases in the future, e.g. speech -> transcript, image -> captioning, text -> image
      5. Self-driving cars might also move to E2E, but no one has enough data for image -> steering

      Workflow:

      1. Bias = training error - human error. Try a bigger model, training longer, or a new model architecture.
      2. Variance = dev error - training error. Try more data, regularization, or a new model architecture (see the sketch after this list).
      3. The conflict between bias and variance is weaker in DL: we can have a bigger model with more data.
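
      This diagnostic is mechanical enough to sketch in code. A minimal Python illustration; the error numbers are hypothetical placeholders for your own measurements:

      ```python
      human_error = 0.01   # proxy for the optimal (Bayes) error
      train_error = 0.08
      dev_error = 0.10

      bias = train_error - human_error    # "avoidable" bias
      variance = dev_error - train_error

      if bias >= variance:
          print("High bias: bigger model, train longer, or new architecture.")
      else:
          print("High variance: more data, regularization, or new architecture.")
      ```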

      More data:

      1. Data synthesis/augmentation is becoming useful and popular: OCR (superpose alphabets on various backgrounds), speech (superpose various background noises), NLP (?). But it has drawbacks if the synthesized data is not representative (see the sketch after this list).
      2. A unified data warehouse helps leverage data across the company
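
      A minimal sketch of the speech-style augmentation, superposing background noise on a clip (the function name and SNR parameter are mine, for illustration):

      ```python
      import numpy as np

      def mix_noise(speech, noise, snr_db=10.0):
          """Superpose background noise on a speech clip at a target
          signal-to-noise ratio; both are 1-D float arrays at the same
          sample rate."""
          noise = np.resize(noise, speech.shape)  # loop/trim noise to length
          speech_power = np.mean(speech ** 2)
          noise_power = np.mean(noise ** 2) + 1e-12
          scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
          return speech + scale * noise
      ```

      The caveat above applies here: if the noise clips are not representative of deployment conditions, the synthesized data can hurt rather than help.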

      Data set breakdown:

      1. Dev and test sets should come from the same distribution, since we spend a lot of time optimizing for dev accuracy.
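
      A minimal sketch of one way to honor this, assuming a single pooled dataset: shuffle first, then cut, so both splits are drawn from the same distribution:

      ```python
      import numpy as np

      def split_dev_test(examples, dev_frac=0.5, seed=0):
          """Shuffle one pool, then split, so dev and test share a distribution."""
          rng = np.random.default_rng(seed)
          idx = rng.permutation(len(examples))
          cut = int(len(examples) * dev_frac)
          return ([examples[i] for i in idx[:cut]],
                  [examples[i] for i in idx[cut:]])
      ```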

      Progress plateaus above human-level performance:

      • But there is a theoretical optimal error rate (the Bayes error rate)

      What to do when bias is high:

      • Look at examples the machine got wrong
      • Get labels from humans?
      • Error analysis: segment the training data and identify segments where training error is higher than human error (see the sketch after this list)
      • Estimate bias/variance effect?
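
      A minimal sketch of that segment-level error analysis; the file name and column names are hypothetical:

      ```python
      import pandas as pd

      # Hypothetical per-example log: a 'segment' column (e.g. accent or
      # image category) plus 0/1 'model_wrong' and 'human_wrong' flags.
      df = pd.read_csv("errors.csv")

      by_segment = df.groupby("segment")[["model_wrong", "human_wrong"]].mean()
      by_segment["gap"] = by_segment["model_wrong"] - by_segment["human_wrong"]
      print(by_segment.sort_values("gap", ascending=False))  # worst first
      ```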

      How do you define human-level performance? Example: the error rate of a panel of experts

      Size of data:

      1. How do you define whether a NN is small vs. medium vs. large?
      2. Is the reason large NNs can leverage bigger data that, unlike smaller NNs, they do not overfit it?
  2. Nov 2016
    1. Deep neural networks use multiple layers, with each layer requiring its own weights and biases.

      Every layer needs its own weights and biases. In TensorFlow, it is good practice to put all the weights in a dictionary, which makes them easier to manage.
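
      A minimal TensorFlow 1.x-style sketch of that practice (the layer sizes are arbitrary assumptions):

      ```python
      import tensorflow as tf

      n_input, n_hidden, n_classes = 784, 256, 10

      # One dictionary for weights and one for biases: every parameter
      # is easy to inspect, save, or regularize from a single place.
      weights = {
          'h1': tf.Variable(tf.random_normal([n_input, n_hidden])),
          'out': tf.Variable(tf.random_normal([n_hidden, n_classes])),
      }
      biases = {
          'b1': tf.Variable(tf.zeros([n_hidden])),
          'out': tf.Variable(tf.zeros([n_classes])),
      }

      def network(x):
          hidden = tf.nn.relu(tf.matmul(x, weights['h1']) + biases['b1'])
          return tf.matmul(hidden, weights['out']) + biases['out']
      ```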

  3. Oct 2016
    1. This requires the input data to come in pairs, where each pair shares one common label: whether or not the two items belong to the same class.

      Verification signal
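
      A minimal sketch of building such pairs (the function name is mine; `examples` is a list of (x, class_label) tuples):

      ```python
      import random

      def make_pairs(examples, n_pairs=10000):
          """Sample (x1, x2, same) pairs for a verification signal:
          `same` is 1 when the two items share a class label, else 0.
          In practice you would balance same/different pairs."""
          pairs = []
          for _ in range(n_pairs):
              (x1, c1) = random.choice(examples)
              (x2, c2) = random.choice(examples)
              pairs.append((x1, x2, int(c1 == c2)))
          return pairs
      ```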

  4. Jul 2016
  5. inst-fs-iad-prod.inscloudgate.net
    1. half-spaces separated by a hyperplane [19].

      The limitation of traditional algorithms in the image and speech domains: they must be insensitive to irrelevant variations while staying sensitive to very small local differences

    2. Deep learning

      Three of the "big four" of deep learning

    3. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.

      The most important aspect of deep learning: the multiple layers of features are learned automatically

    4. most practitioners use a procedure called stochastic gradient descent (SGD).

      The stochastic gradient descent algorithm, explained very well here
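
      A minimal NumPy sketch of SGD on least-squares linear regression (all the numbers are illustrative):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(1000, 3))
      true_w = np.array([2.0, -1.0, 0.5])
      y = X @ true_w + 0.1 * rng.normal(size=1000)

      w, lr, batch = np.zeros(3), 0.1, 32
      for step in range(500):
          i = rng.integers(0, len(X), size=batch)        # random mini-batch
          grad = 2 * X[i].T @ (X[i] @ w - y[i]) / batch  # gradient of the MSE
          w -= lr * grad                                 # step against the gradient
      print(w)  # close to true_w
      ```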

    5. The chain rule of derivatives tells us how two small effects (that of a small change of x on y, and that of y on z) are composed.

      Whoa! So that's how it works!!!

    6. The backpropagation procedure to compute the gradient of an objective function with respect to the weights of a multilayer stack of modules is nothing more than a practical application of the chain rule for derivatives.

      The backpropagation procedure for computing the gradient of an objective function with respect to the weights of a multilayer stack of modules is really nothing more than a practical application of the chain rule for derivatives.
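
      A minimal NumPy sketch of that statement: a small stack of modules, with each weight gradient obtained by chaining one module's local derivative onto the gradient flowing back from the module above it:

      ```python
      import numpy as np

      def forward_backward(x, y_true, W1, W2):
          z1 = W1 @ x                   # module 1: linear
          h1 = np.maximum(z1, 0)        # module 2: ReLU
          y = W2 @ h1                   # module 3: linear
          loss = 0.5 * np.sum((y - y_true) ** 2)

          dy = y - y_true               # dL/dy
          dW2 = np.outer(dy, h1)        # chain rule through module 3
          dh1 = W2.T @ dy               # ... back into module 2
          dz1 = dh1 * (z1 > 0)          # chain rule through the ReLU
          dW1 = np.outer(dz1, x)        # chain rule through module 1
          return loss, dW1, dW2
      ```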

    1. Following reminders from @山丹丹 and @啸王 in the comments, I have corrected some errors (shown in italics); thank you both. Based on my recent understanding I have also added some material (shown in italics). If errors remain, corrections are welcome.

      First question: why introduce nonlinear activation functions? Without one (equivalently, with the activation f(x) = x), each layer's output is a linear function of the previous layer's input. It is easy to verify that no matter how many layers the network has, the output is then a linear combination of the input, equivalent to having no hidden layers at all; that is just the original Perceptron. For this reason we introduce a nonlinear activation function, which makes deep networks meaningful: no longer a linear combination of the input, they can approximate arbitrary functions. The earliest choices were the sigmoid and tanh functions, whose bounded outputs conveniently serve as the next layer's input (plus some biological interpretations, blah blah).

      Second question: why introduce ReLU? First, sigmoid-like activations are expensive to compute (exponentials), and backpropagating the error gradient involves division; with ReLU the whole process is much cheaper. Second, in deep networks sigmoid easily produces vanishing gradients during backpropagation (near its saturation regions sigmoid changes very slowly, its derivative tends to 0, and information is lost; see point 3 of @Haofeng Li's answer), so deep networks fail to train. Third, ReLU makes some neurons output 0, which gives the network sparsity, reduces interdependence among parameters, and mitigates overfitting (plus some biological interpretations, blah blah). There are now improvements on ReLU, such as PReLU and randomized ReLU, which bring some training-speed or accuracy gains on different datasets; see the relevant papers for details.

      One more point: the current mainstream practice is to add batch normalization after ReLU, to keep each layer's inputs distributed as similarly as possible [1]. And the latest paper [2] finds that, after adding bypass connections, changing the position of batch normalization works even better. Have a look if you are interested.

      The advantages of ReLU
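
      The vanishing-gradient point is easy to check numerically; a small sketch:

      ```python
      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      z = np.array([-8.0, -2.0, 0.0, 2.0, 8.0])
      # Sigmoid's derivative peaks at 0.25 and collapses toward 0 in the
      # saturated regions, so it shrinks gradients layer after layer.
      print(sigmoid(z) * (1 - sigmoid(z)))
      # ReLU's derivative is exactly 1 for positive inputs and 0 otherwise
      # (cheap to compute, and the zeros give sparsity).
      print((z > 0).astype(float))
      ```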

    1. Unsupervised Learning of 3D Structure from Images Authors: Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess (Submitted on 3 Jul 2016) Abstract: A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world. In this paper we learn strong deep generative models of 3D structures, and recover these structures from 3D and 2D images via probabilistic inference. We demonstrate high-quality samples and report log-likelihoods on several datasets, including ShapeNet [2], and establish the first benchmarks in the literature. We also show how these models and their inference networks can be trained end-to-end from 2D images. This demonstrates for the first time the feasibility of learning to infer 3D representations of the world in a purely unsupervised manner.

      The 3D representation of a 2D image is ambiguous and multi-modal. We achieve such reasoning by learning a generative model of 3D structures, and recover this structure from 2D images via probabilistic inference.

    1. When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning as standard practice for improved new task performance.

      Learning w/o Forgetting: distilled transfer learning
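
      A sketch of the objective as I read it, in the distillation style the note suggests (the lam weight and temperature T here are assumptions of this sketch, not values taken from the paper):

      ```python
      import numpy as np

      def softmax(logits, T=1.0):
          e = np.exp(logits / T - np.max(logits / T))
          return e / e.sum()

      def lwf_loss(new_logits, new_label, old_logits, recorded_old_probs,
                   lam=1.0, T=2.0):
          """New-task cross-entropy plus a distillation term that keeps the
          old-task head close to the original network's recorded outputs."""
          ce_new = -np.log(softmax(new_logits)[new_label] + 1e-12)
          distill = -np.sum(recorded_old_probs *
                            np.log(softmax(old_logits, T) + 1e-12))
          return ce_new + lam * distill
      ```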

  6. Jun 2016
  7. Apr 2016
    1. We should have control of the algorithms and data that guide our experiences online, and increasingly offline. Under our guidance, they can be powerful personal assistants.

      Big business has been very militant about protecting their "intellectual property". Yet they regard every detail of our personal lives as theirs to collect and sell at whim. What a bunch of little darlings they are.

  8. Dec 2015
    1. OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.
    1. let you share a URL that navigates directly to the selection you shared.

      An important form of deep linking.

  9. Jul 2015
    1. Levy, D.M. (1997) "I read the news today oh boy".

      Check out this article and share with JD.

  10. Jun 2015
    1. Could you share an anecdote from your involvement in the development of Deep Blue? There are many anecdotes, but they take long to tell. I will only say that the day I started working for IBM, tasked with listing Kaspárov's weak points, the sheet of paper stayed blank.