32 Matching Annotations
  1. Mar 2020
  2. Nov 2019
    1. a second-rank tensor can be represented by 3 to the power of 2 or 9 numbers.

      Recall the definition of the rank of a matrix: if a matrix \(\mathbf{A}\) has a nonzero minor \(D\) of order \(r\), and every minor of order \(r+1\) (if any exist) equals zero, then \(D\) is called the highest-order nonzero minor of \(\mathbf{A}\), and the number \(r\) is called the rank of \(\mathbf{A}\).
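
      A quick numerical check of this definition (my own example, not from the annotated page):

      ```python
      import numpy as np

      # A 3x3 matrix whose second row is twice the first.
      A = np.array([[1, 2, 3],
                    [2, 4, 6],
                    [0, 1, 1]])

      print(np.linalg.det(A))                          # ~0: the only 3rd-order minor vanishes
      print(np.linalg.det(A[np.ix_([0, 2], [0, 1])]))  # 1.0: a nonzero 2nd-order minor
      print(np.linalg.matrix_rank(A))                  # 2, agreeing with the definition above
      ```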

  3. Oct 2019
    1. x[n] = \sum\limits_{k = -\infty}^\infty \left( \, y_{high}[k] \cdot g[-n + 2k] \, \right) + \left( \, y_{low}[k] \cdot h[-n + 2k] \, \right)

      $$x[n] = \sum\limits_{k = -\infty}^\infty \left( \, y_{high}[k] \cdot g[-n + 2k] \, \right) + \left( \, y_{low}[k] \cdot h[-n + 2k] \, \right)$$

      Note that the summation index is now \(k\).
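
      A minimal reconstruction check using PyWavelets (my own sketch; it assumes `pywt` is available, with `'db1'` standing in for the analysis/synthesis filter pair \(h\), \(g\)):

      ```python
      import numpy as np
      import pywt

      x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
      y_low, y_high = pywt.dwt(x, 'db1')        # one-level analysis: approximation and detail
      x_rec = pywt.idwt(y_low, y_high, 'db1')   # synthesis step, i.e. the summation over k above
      print(np.allclose(x, x_rec))              # True: perfect reconstruction
      ```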

    2. starting from the last level of decomposition

      It has the highest frequency resolution and the lowest time resolution.

      For the 2D DWT of an image, the last level of decomposition looks "small" in size (because of its low spatial resolution).

    3. hence double the frequency resolution

      Is it because the frequency range has been "shrunk" that we say the frequency resolution has increased?

  4. Jun 2019
  5. Dec 2018
    1. Figure 10.16

      Each neuron has two weight matrices: one that multiplies \(x^{(t)}\) (denoted \(U\)) and one that multiplies \(h^{(t-1)}\) (denoted \(W\)), so there are 4 pairs of \(U\) and \(W\) in total.
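
      A minimal numpy sketch of one cell step with the four \((U, W)\) pairs written out explicitly (my own naming, not the book's code):

      ```python
      import numpy as np

      def lstm_step(x_t, h_prev, c_prev, p):
          """One LSTM step; p holds the four (U, W, b) triples, one per gate/candidate."""
          sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
          f = sigmoid(p['U_f'] @ x_t + p['W_f'] @ h_prev + p['b_f'])   # forget gate
          i = sigmoid(p['U_i'] @ x_t + p['W_i'] @ h_prev + p['b_i'])   # input gate
          o = sigmoid(p['U_o'] @ x_t + p['W_o'] @ h_prev + p['b_o'])   # output gate
          g = np.tanh(p['U_g'] @ x_t + p['W_g'] @ h_prev + p['b_g'])   # cell candidate
          c_t = f * c_prev + i * g
          h_t = o * np.tanh(c_t)
          return h_t, c_t
      ```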

  6. Nov 2018
  7. Sep 2018
    1. Truly useless and horrifying

      Nope, not exactly useless. I believe these are the very basics that underlie the implementation of torch.no_grad().
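
      A minimal sketch of what I mean (toy tensors of my own, not the tutorial's code): the in-place weight update is wrapped in torch.no_grad() so autograd does not record it.

      ```python
      import torch

      w = torch.randn(3, requires_grad=True)
      x = torch.randn(5, 3)
      y = torch.randn(5)

      loss = ((x @ w - y) ** 2).mean()
      loss.backward()              # populates w.grad

      with torch.no_grad():        # the manual SGD step itself is not tracked
          w -= 0.01 * w.grad
      w.grad.zero_()               # reset for the next iteration
      ```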

    1. *args, **kwargs

      At first glance there seems to be no need for wrapper, as defined inside do_twice, to take any arguments. But on closer inspection, wrapper is the (outer) function that actually gets executed by the calling code, so, as a rule of thumb, its parameter list should match that of the decorated function. The most general way to achieve this is to declare the parameters as *args, **kwargs.
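
      A minimal sketch in the spirit of the tutorial's do_twice (the functools.wraps line is my own addition):

      ```python
      import functools

      def do_twice(func):
          @functools.wraps(func)
          def wrapper(*args, **kwargs):
              # wrapper is what actually runs in place of func,
              # so it forwards whatever arguments func expects
              func(*args, **kwargs)
              return func(*args, **kwargs)
          return wrapper

      @do_twice
      def greet(name):
          print(f"Hello {name}")

      greet("World")   # prints twice
      ```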

    2. my_decorator(say_whee)

      In effect, at run time this uses the "environment" provided by the decorator, together with say_whee as a "building block", to assemble a new function.
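
      In other words (a minimal sketch; the print statements are my own):

      ```python
      def my_decorator(func):
          def wrapper():
              print("Before")
              func()       # the original say_whee, captured in the closure
              print("After")
          return wrapper

      def say_whee():
          print("Whee!")

      # Assembling the new function by hand; the @my_decorator syntax
      # is shorthand for exactly this assignment.
      say_whee = my_decorator(say_whee)
      say_whee()
      ```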

    1. criterion = nn.MSELoss()

      This instantiates a loss function and binds it to a generic variable name (here criterion; you could just as well call it loss_fn).
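
      For example (a minimal sketch with made-up tensors):

      ```python
      import torch
      import torch.nn as nn

      criterion = nn.MSELoss()        # instantiate the loss; the variable name is arbitrary
      pred = torch.tensor([2.5, 0.0, 2.0])
      target = torch.tensor([3.0, -0.5, 2.0])
      loss = criterion(pred, target)  # calling the instance computes the mean squared error
      print(loss.item())
      ```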

  8. Aug 2018
  9. Apr 2018
  10. Mar 2018
    1. ReLU activation function

      This is also known as a ramp function (or hinge function) and is analogous to half-wave rectification in electrical engineering (recall a sinusoidal AC voltage applied across a diode).
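
      A minimal numpy illustration (my own, not from the annotated text):

      ```python
      import numpy as np

      def relu(x):
          # ramp function: zero for negative input, identity for positive,
          # the same shape an ideal half-wave rectifier gives a sine wave
          return np.maximum(0, x)

      t = np.linspace(0, 4 * np.pi, 9)
      print(relu(np.sin(t)))   # negative half-cycles clipped to 0
      ```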

  11. Dec 2017
    1. svm.ipynb

      Correct me if I am wrong: there is a typo at line 50, column 58 of file assignment1/cs231n/classifiers/linear_classifier.py. The X_batch matrix should have shape (batch_size, dim) rather than being transposed.

      (I am still confused about why everything here is the transpose of what was presented in the lecture video.)

    1. Once you derive the expression for the gradient it is straight-forward to implement the expressions and use them to perform the gradient update.

      I think it helps to first write out the expression for the gradient, which appears as weights_grad, grad, or dW in the assignment code:

      $$\mathrm{dW} = \nabla_{\mathbf{W}} L = \frac{\partial L}{\partial \mathbf{W}} = \frac{1}{N} \begin{pmatrix} \sum_{i=1}^{N} \frac{\partial L_i}{\partial \mathbf{w}_1} \\ \sum_{i=1}^{N} \frac{\partial L_i}{\partial \mathbf{w}_2} \\ \vdots \\ \sum_{i=1}^{N} \frac{\partial L_i}{\partial \mathbf{w}_c} \end{pmatrix} + \lambda \frac{\partial R(\mathbf{W})}{\partial \mathbf{W}}$$

      where N is the number of training samples, and c is the number of classes.
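
      A vectorized sketch of the loss and gradient under these definitions (my own, not the official assignment solution; it assumes X_batch has shape (batch_size, dim), W has shape (dim, num_classes), and \(R(\mathbf{W}) = \lVert \mathbf{W} \rVert_2^2\)):

      ```python
      import numpy as np

      def svm_loss_and_grad(W, X_batch, y_batch, reg):
          N = X_batch.shape[0]
          scores = X_batch.dot(W)                              # (N, C)
          correct = scores[np.arange(N), y_batch][:, None]     # score of the true class
          margins = np.maximum(0, scores - correct + 1.0)      # hinge margins
          margins[np.arange(N), y_batch] = 0
          loss = margins.sum() / N + reg * np.sum(W * W)

          # Each sample adds x_i to every violated class column and
          # -(#violations) * x_i to its correct class column.
          mask = (margins > 0).astype(float)
          mask[np.arange(N), y_batch] = -mask.sum(axis=1)
          dW = X_batch.T.dot(mask) / N + 2 * reg * W           # same shape as W
          return loss, dW

      # the gradient update the quoted sentence refers to:
      # W -= learning_rate * dW
      ```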