77 Matching Annotations
  1. Aug 2022
  2. May 2022
    1. Note that differentiating the gradient of L w.r.t. x requires a second-order derivative of the considered parametrized function and L-BFGS needs to construct a third-order derivative approximation, which is challenging for neural networks with ReLU units for which higher-order derivatives are discontinuous

      Very useful information and analysis.
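
A short worked note on why ReLU is awkward here, written as a sketch and not taken from the quoted paper: the first derivative of ReLU is a step function, and the second derivative vanishes wherever it is defined, so curvature information passing through the activation itself is either zero or undefined.

```latex
\operatorname{ReLU}(z) = \max(0, z), \qquad
\operatorname{ReLU}'(z) =
\begin{cases}
1, & z > 0 \\
0, & z < 0
\end{cases}
\quad (\text{jump at } z = 0), \qquad
\operatorname{ReLU}''(z) = 0 \ \text{for } z \neq 0, \ \text{undefined at } z = 0 .
```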

    2. These works make strong assumptions on the model architecture and model parameters that make reconstructions easier, but violate the threat model that we consider in this work and lead to less realistic scenarios

      The point of this paper is that it offers a fairly general framework. Earlier work relied on strong assumptions about the model architecture, which limits how widely it can be applied.

  3. Feb 2022
    1. The final head pose in the output image is given by R_d ← R_u R_d and t_d ← t_u + t_d. In video conferencing, we can change a person’s head pose in the video stream freely despite the original view angle

      This is where the "free" comes from.
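
A minimal sketch of the pose update in the quote, assuming rotations are 3x3 matrices and translations are 3-vectors. The helper names `rot_y`, `matmul3`, and `redirect_pose` are hypothetical and only illustrate the composition; they are not from the paper's code.

```python
import math

def rot_y(deg):
    """Rotation matrix for a yaw of `deg` degrees about the y-axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def redirect_pose(R_u, t_u, R_d, t_d):
    """Apply a user-chosen pose (R_u, t_u) on top of the driving pose (R_d, t_d).

    Implements R_d <- R_u R_d and t_d <- t_u + t_d.
    """
    return matmul3(R_u, R_d), [u + d for u, d in zip(t_u, t_d)]

# Driving head is turned 20 degrees; the user adds another 10 degrees of yaw.
R_new, t_new = redirect_pose(rot_y(10), [0.1, 0.0, 0.0],
                             rot_y(20), [0.0, 0.2, 0.0])
```

Because both rotations share the same axis in this toy case, the result equals a single 30-degree yaw, which makes the composition easy to check by hand.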

    2. we reuse x_{c,k}, which were extracted from the source image s. This is because the face in the output image must have the same identity as the one in the source image s.

      This is why keypoints are not extracted from the driving video.

    3. We note that the extracted keypoints are meant to be independent of the face’s pose and expression. They shall only encode a person’s geometry signature in a neutral pose and expression.

      This is what "keypoint" means here.

    4. The Jacobian represents how a local patch around the keypoint can be transformed into the corresponding patch in another image via an affine transformation

      So this is really the main idea of FOMM.
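
The local affine idea can be sketched as a first-order approximation around a keypoint. This assumes 2D points and a 2x2 Jacobian, and follows the backward-warping convention (driving coordinates mapped to source coordinates); the function name `local_affine_warp` is hypothetical.

```python
def local_affine_warp(z, kp_src, kp_drv, jac):
    """Map a point z near the driving keypoint to the source image.

    First-order approximation: T(z) ~= kp_src + J (z - kp_drv),
    where J is the 2x2 Jacobian attached to this keypoint.
    """
    dx, dy = z[0] - kp_drv[0], z[1] - kp_drv[1]
    return (kp_src[0] + jac[0][0] * dx + jac[0][1] * dy,
            kp_src[1] + jac[1][0] * dx + jac[1][1] * dy)

# With J = 2I, a patch around the keypoint is scaled by 2 in the source image.
J = [[2.0, 0.0], [0.0, 2.0]]
center = local_affine_warp((5.0, 5.0), (1.0, 2.0), (5.0, 5.0), J)
neighbor = local_affine_warp((6.0, 5.0), (1.0, 2.0), (5.0, 5.0), J)
```

The keypoint itself always maps to its counterpart, while the Jacobian decides how the neighborhood around it is stretched, rotated, or sheared.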

  4. Jan 2022
    1. Humans are able to guess the whole scene given a partial observation of it. In a similar fashion, we aim to build a generator that trains with image patches, and inference images of unbounded arbitrary-large size.

      How exactly is this implemented?

    1. we further believe that our investigations to the separation of high-level attributes and stochastic effects,

      What exactly does this decoupling mean here?

    1. Multi-Head Attention

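      Why multi-head attention? Because the attention is a dot product, which has no learnable parameters of its own. Linear layers are used to learn h projections. It feels a bit like multiple conv channels: the model gets h chances to learn h linear projections from a high-dimensional space to a low-dimensional one.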
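
A minimal pure-Python sketch of the idea in the note: each head has its own learned projections into a smaller space, runs parameter-free scaled dot-product attention there, and the head outputs are concatenated and projected back. The weights here are random stand-ins for learned parameters, and all function names are illustrative.

```python
import math
import random

def matmul(a, b):
    """Matrix product of nested lists a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention; note it has no parameters of its own."""
    d_k = len(q[0])
    scores = matmul(q, [list(c) for c in zip(*k)])  # q @ k^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(x, heads, w_o):
    """heads is a list of (W_q, W_k, W_v) triples, one per head."""
    outputs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv))
               for wq, wk, wv in heads]
    # Concatenate the per-head outputs position by position.
    concat = [sum((o[i] for o in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)

rng = random.Random(0)
n, d_model, h = 3, 8, 2
d_k = d_model // h  # each head works in a lower-dimensional space
x = random_matrix(n, d_model, rng)
heads = [tuple(random_matrix(d_model, d_k, rng) for _ in range(3))
         for _ in range(h)]
w_o = random_matrix(h * d_k, d_model, rng)
out = multi_head_attention(x, heads, w_o)
```

The output keeps the input shape (3 positions, 8 features), while each of the 2 heads attends in its own 4-dimensional projection.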

    2. sequence of symbol representations (x1,...,xn) to a sequence of continuous representations z = (z1,...,zn). Given z, the decoder then generates an output sequence (y1,...,ym) of sy

      During encoding the whole sentence can be seen at once, but during decoding the output has to be generated one token at a time.
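
The asymmetry in the note can be sketched with visibility masks and a generation loop. This is a toy illustration, not the Transformer implementation: `step_fn` is a hypothetical stand-in for a decoder that predicts the next token from the prefix generated so far.

```python
def encoder_mask(n):
    """Every input position may look at every other position."""
    return [[True] * n for _ in range(n)]

def decoder_mask(n):
    """Causal mask: position i may only look at positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def generate(step_fn, start, end, max_len):
    """Produce tokens one at a time, feeding the growing prefix back in."""
    out = [start]
    while len(out) < max_len and out[-1] != end:
        out.append(step_fn(out))
    return out

# Toy "model": the next token is simply the current prefix length.
tokens = generate(lambda prefix: len(prefix), start=0, end=3, max_len=10)
```

The encoder mask is all `True`, while the decoder mask is lower-triangular, which is what forces step-by-step generation.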

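    3. This makes it more difficult to learn dependencies between distant position

      Convolutions have a hard time modeling information across long sequences: if two pixel patches are far apart, many convolution layers are needed before the connection between them can be established.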
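
A rough back-of-the-envelope sketch of the note, assuming stride-1 convolutions with no pooling: a stack of L layers with kernel size k has a receptive field of 1 + L(k - 1), so linking two positions that are `distance` apart needs a receptive field of at least `distance + 1`. The dilated variant assumes the dilation doubles at every layer. Function names are illustrative.

```python
def layers_needed_plain(distance, kernel_size):
    """Layers of stride-1 convolution needed to connect two positions."""
    step = kernel_size - 1
    return -(-distance // step)  # ceiling division

def layers_needed_dilated(distance, kernel_size):
    """Same question when the dilation doubles at each layer (1, 2, 4, ...)."""
    layers, receptive_field, dilation = 0, 1, 1
    while receptive_field < distance + 1:
        receptive_field += (kernel_size - 1) * dilation
        dilation *= 2
        layers += 1
    return layers

plain = layers_needed_plain(100, 3)      # grows linearly with distance
dilated = layers_needed_dilated(100, 3)  # grows logarithmically with distance
```

Self-attention, by contrast, relates any two positions within a single layer, regardless of how far apart they are.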

    1. from a collection of single-view 2D photographs

      How are these 2D images related to one another? Do they satisfy multi-view geometry, and if so, directly or only indirectly?

    1. Deep generative models have had less of an impact, due to the difficulty of approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and due to difficulty of leveraging the benefits of piecewise linear units in the generative context. We propose a new generative model estimation procedure that sidesteps these difficulties.

      The story this paper tells.

  5. Dec 2021
    1. Avatars

      static or moving image or other graphic representation that acts as a proxy for a person or is associated with a specific digital account or identity, as on the internet

  6. Oct 2021
    1. Point clouds are a simple representation that also supports arbitrary topology [21, 39, 77] and does not require data registration, but highly detailed geometry requires many points.

      So in the end everyone is focusing on point clouds, which suggests the earlier directions have already been largely worked out.

    1. A semantic position encoding mechanism is designed to facilitate semantic-level position information and preserve the texture patterns in the exemplars

      Noting this down; it is quite novel.

  7. Sep 2021
    1. learning this transformation completely without built-in priors and can even learn to predict depth in an unsupervised fashion

      How is this achieved? Depth can be estimated even without supervision?

    2. a probabilistic formulation necessary to capture the ambiguity inherent in predicting novel views from a single image, thereby overcoming the limitations of previous approaches that are restricted to relatively small viewpoint changes

      How should this be understood?

    1. the 3D scene structure and the proximity between the body and the scene are not explicitly modeled, especially for the regions that are occluded from the camera view, making it hard to effectively enforce constraints in 3D, such as no inter-penetration and proper contact.

      The starting point of this paper.

    1. which however often leads to loss of fine spatial information

      So how can an MLP avoid losing its spatial capability? If convolutions were added to the MLP, could it replace the Transformer and perhaps be more lightweight?