 Jun 2017

arxiv.org

Who is Mistaken?
Benjamin Eysenbach (MIT, bce@mit.edu), Carl Vondrick (MIT, vondrick@mit.edu), Antonio Torralba (MIT, torralba@csail.mit.edu)

Figure 1: Can you determine who believes something incorrectly in this scene? In this paper, we study how to recognize when a person in a scene is mistaken. Above, the woman is mistaken about the chair being pulled away from her in the third frame, causing her to fall down. The red arrow indicates false belief. We introduce a new dataset of abstract scenes to study when people have false beliefs. We propose approaches to learn to recognize who is mistaken and when they are mistaken.

Abstract: Recognizing when people have false beliefs is crucial for understanding their actions. We introduce the novel problem of identifying when people in abstract scenes have incorrect beliefs. We present a dataset of scenes, each visually depicting an 8-frame story in which a character has a mistaken belief. We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken. Experiments suggest that our method for identifying mistaken characters performs better on these tasks than simple baselines. Diagnostics on our model suggest it learns important cues for recognizing mistaken beliefs, such as gaze. We believe models of people's beliefs will have many […]
Interesting



The analysis shows that, although they are superficially similar, NCE is a general parameter estimation technique that is asymptotically unbiased, while negative sampling is best understood as a family of binary classification models that are useful for learning word representations but not as a general-purpose estimator.
I think NCE is slightly different from CE. Unfortunately, Chris sort of ignores Noah's work on CE in this explanation. That said, the connection between NCE and NS is nicely explained.
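The distinction can be sketched numerically: under NCE, the probability that a (word, context) pair came from the data depends on the noise distribution q and the number of negatives k, while negative sampling drops the k·q(w) term. A minimal sketch of the two posteriors for a single scalar score (the function names and scalar-score setup are my own, not from the note):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nce_pos_prob(score, k, noise_prob):
    # NCE: P(true | w, c) = exp(s) / (exp(s) + k * q(w))
    e = math.exp(score)
    return e / (e + k * noise_prob)

def ns_pos_prob(score):
    # Negative sampling: P(true | w, c) = sigmoid(s), i.e. NCE with the
    # k * q(w) term replaced by 1 -- no longer a consistent estimator of
    # the data distribution unless k * q(w) happens to equal 1.
    return sigmoid(score)
```

The two agree exactly only when k·q(w) = 1, which is the formal sense in which negative sampling is not a general-purpose estimator.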


pdfs.semanticscholar.org

We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is different from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles. To select a final model, we generate a series of feasible candidates, calculate the entropy of each, and choose the model that attains the highest entropy. Our experimental results show that estimation based on the latent maximum entropy principle generally gives better results than maximum likelihood when estimating latent variable models on small observed data samples.
Towards intelligent negative sampling


pdfs.semanticscholar.org

Wang et al. (2002) discuss the latent maximum entropy principle. They advocate running EM many times and selecting the local maximum that maximizes entropy. One might do the same for the local maxima of any CE objective, though theoretical and experimental support for this idea remain for future work.
Interesting proposal, quite similar to negative sampling with 'exploration/exploitation'.
Definitely worth at least a couple of reads!
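The selection step they describe (run EM from many initializations, then keep the highest-entropy feasible solution) reduces to a one-liner once the candidates are in hand. A minimal sketch over discrete distributions, with the candidate list standing in for the fitted log-linear models of the paper:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def select_latent_maxent(candidates):
    """Among feasible candidates (e.g. local maxima found by separate EM
    runs), return the one attaining the highest entropy, following the
    latent maximum entropy selection rule."""
    return max(candidates, key=entropy)
```

For any CE objective one would swap in the relevant model family; the selection rule itself is unchanged.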

One can envision a mixed objective function that tries to fit the labeled examples while discriminating unlabeled examples from their neighborhoods.
Interesting: a mixed objective function; this seems like a multi-task framework!
> Reread and understand
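The envisioned mixed objective could combine a supervised term on labeled decisions with a contrastive term on unlabeled examples, weighted like a multi-task loss. A hedged sketch operating on precomputed score lists (the interface and the weight `lam` are my invention, not the paper's):

```python
import math

def neg_log_softmax(scores, idx):
    """-log( exp(scores[idx]) / sum_j exp(scores[j]) ), computed stably."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores)) - scores[idx]

def mixed_objective(labeled, unlabeled, lam=1.0):
    """Hypothetical mixed loss.  Each `labeled` item is (scores, gold_index)
    for a supervised decision; each `unlabeled` item is the score list of an
    example's neighborhood with the observed example's score first.  The
    second term is the contrastive-estimation part: discriminate each
    unlabeled example from its neighborhood."""
    sup = sum(neg_log_softmax(s, y) for s, y in labeled)
    con = sum(neg_log_softmax(s, 0) for s in unlabeled)
    return sup + lam * con
```

With `lam` tuned on held-out data, this is exactly the two-task trade-off the note points at.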

We have presented contrastive estimation, a new probabilistic estimation criterion that forces a model to explain why the given training data were better than bad data implied by the positive examples.
This is again an interesting way to see it: "... forces a model to explain why the given training data were better than bad data implied by the positive examples."
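That criterion is just a ratio of unnormalized scores: maximize sum_i log[u(x_i) / sum over x' in N(x_i) of u(x')], where N(x_i) is the neighborhood of bad data implied by example x_i. A minimal sketch, with `neighborhood` and `logscore` as hypothetical stand-ins for a real perturbation function and log-linear model:

```python
import math

def ce_log_likelihood(examples, neighborhood, logscore):
    """Contrastive estimation objective: sum_i log u(x_i) - log Z(N(x_i)),
    where u(x) = exp(logscore(x)) is an unnormalized score and the
    neighborhood N(x_i) must contain x_i itself."""
    total = 0.0
    for x in examples:
        neigh = neighborhood(x)
        m = max(logscore(z) for z in neigh)
        log_z = m + math.log(sum(math.exp(logscore(z) - m) for z in neigh))
        total += logscore(x) - log_z
    return total
```

Raising this objective moves probability mass from each implied neighborhood onto the observed example, which is the "explain why the training data were better" reading of the quote.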

Viewed as a CE method, this approach (though effective when there are few hypotheses) seems misguided; the objective says to move mass to each example at the expense of all other training examples.
A very cool remark, and it makes sense!

An alternative is to restrict the neighborhood to the set of observed training examples rather than all possible examples (Riezler, 1999; Johnson et al., 1999; Riezler et al., 2000):
This equation is reminiscent of the one proposed by Nickel et al. (2017), the Poincaré Embeddings paper. In particular, look at its use of negative sampling.


beamandrew.github.io

Data vs. Deep Learning


smerity.com

Google's NMT.


news.ycombinator.com

On Foundations


www.alexirpan.com

Implementation issues in batch norm.
