55 Matching Annotations
  1. Mar 2023
    1. Let’s try a different data set

      looks like it works... you're missing the experiments on gamma fluctuation and circle data generation, though

    1. Example showing batch size influencing convergence speed.

      this is a bit confusing with the placement here... I'd move it above the corresponding graph and code block

    2. This shows that small batch size takes longer to converge, 100 iterations for large batch compared to 600 for small batch

      sure, but you might want to include a few more batch sizes to show that there isn't an optimal batch size somewhere between 2 and 20 - something like the sweep sketched below
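
      a minimal sketch of what I mean - I'm assuming the fit_stochastic(X, y, batch_size=..., alpha=..., max_epochs=...) interface from your post, so rename the keywords to match yours:

      ```python
      # hypothetical sketch: time-to-convergence for a range of batch sizes;
      # LogisticRegression, X, and y are assumed from the post
      import numpy as np

      for b in [1, 2, 5, 10, 20, 50, 100]:
          LR = LogisticRegression()
          LR.fit_stochastic(X, y, batch_size=b, alpha=0.05, max_epochs=1000)
          loss = np.array(LR.loss_history)
          # first epoch whose loss is within 1e-4 of the final loss
          first_close = np.where(loss < loss[-1] + 1e-4)[0][0]
          print(f"batch size {b:>3}: ~{first_close} epochs to converge")
      ```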

    3. Although the model still completed running, its loss was 0.26

      you want to prove that a large enough learning rate stops the algorithm from converging, so try a larger value for alpha - if the step size is too large, you'll just end up with large oscillations that stop the model from converging. Something like the sketch below should show it.
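
      a minimal sketch, assuming the fit() interface from your post:

      ```python
      # hypothetical sketch: a deliberately oversized step size;
      # LogisticRegression, X, and y are assumed from the post
      import matplotlib.pyplot as plt

      LR = LogisticRegression()
      LR.fit(X, y, alpha=50, max_epochs=200)  # alpha=50 is far too large;
      # expect large oscillations in the loss rather than convergence

      plt.plot(LR.loss_history)
      plt.xlabel("iteration")
      plt.ylabel("empirical risk")
      plt.title("alpha = 50")
      plt.show()
      ```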

    1. I think that if learning rate is very small, it might not be able to converge to minimizer, as shown below:

      sure, but what if we make the learning rate really large? say 10, 50, or 100? a quick sweep like the one sketched below would make the point
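
      a sketch, again assuming the fit() interface from your post:

      ```python
      # hypothetical sketch: overlay the loss histories for several large alphas
      import matplotlib.pyplot as plt

      for alpha in [10, 50, 100]:
          LR = LogisticRegression()
          LR.fit(X, y, alpha=alpha, max_epochs=200)
          plt.plot(LR.loss_history, label=f"alpha = {alpha}")

      plt.xlabel("iteration")
      plt.ylabel("empirical risk")
      plt.legend()
      plt.show()
      ```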

    2. fit_stochastic() function with momentum:

      it would be nice to see your two graphs on the same plot so we can visually compare fit(), fit_stochastic(), and fit_stochastic() with momentum - something like the sketch below
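
      a minimal sketch, assuming each of your fit methods records self.loss_history and that fit_stochastic() takes a momentum flag:

      ```python
      # hypothetical sketch: all three optimizers on one figure
      import matplotlib.pyplot as plt

      gd = LogisticRegression()
      gd.fit(X, y, alpha=0.05, max_epochs=1000)

      sgd = LogisticRegression()
      sgd.fit_stochastic(X, y, alpha=0.05, batch_size=10, max_epochs=1000)

      sgdm = LogisticRegression()
      sgdm.fit_stochastic(X, y, alpha=0.05, batch_size=10, max_epochs=1000,
                          momentum=True)

      plt.plot(gd.loss_history, label="gradient descent")
      plt.plot(sgd.loss_history, label="stochastic")
      plt.plot(sgdm.loss_history, label="stochastic + momentum")
      plt.semilogy()  # log scale makes the tails easier to compare
      plt.xlabel("epoch")
      plt.ylabel("empirical risk")
      plt.legend()
      plt.show()
      ```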

    1. in general, we can conclude that the use of momentum significantly speeds up convergence.

      If the one with momentum only sometimes outperforms the one without, you can't really make this claim for the general case. It holds for your code and some of the cases you tested, but that doesn't necessarily mean it holds in general - repeating the comparison over many runs, as sketched below, would give the claim a fairer basis
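
      a rough sketch of the kind of repeated experiment I mean - make_data() is a placeholder for however you generate your data:

      ```python
      # hypothetical sketch: repeat the comparison across many random datasets
      # and count how often momentum actually wins, using final loss as a
      # crude proxy for convergence
      import numpy as np

      trials, wins = 50, 0
      for seed in range(trials):
          np.random.seed(seed)
          X, y = make_data()  # placeholder for your data generation
          plain = LogisticRegression()
          plain.fit_stochastic(X, y, alpha=0.05, batch_size=10, max_epochs=500)
          mom = LogisticRegression()
          mom.fit_stochastic(X, y, alpha=0.05, batch_size=10, max_epochs=500,
                             momentum=True)
          wins += mom.loss_history[-1] < plain.loss_history[-1]

      print(f"momentum reached a lower final loss in {wins}/{trials} trials")
      ```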

    2. The smaller the batch, the more quickly the algorithm converges.

      If you're going to make a claim like this, it makes sense to visualize more than two different batch sizes (1, 2, 5, 10, 20?) to support it - see the sketch below
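
      a minimal sketch, with the same interface assumptions as your post:

      ```python
      # hypothetical sketch: loss curves for a range of batch sizes on one plot
      import matplotlib.pyplot as plt

      for b in [1, 2, 5, 10, 20]:
          LR = LogisticRegression()
          LR.fit_stochastic(X, y, alpha=0.05, batch_size=b, max_epochs=500)
          plt.plot(LR.loss_history, label=f"batch size {b}")

      plt.semilogy()
      plt.xlabel("epoch")
      plt.ylabel("empirical risk")
      plt.legend()
      plt.show()
      ```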

    1. Implementing logistic regression

      It would be nice to see some illustrations of your data and the accuracy/loss over time of each algorithm, similar to Prof. Chodrow's - something like the sketch below
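
      a minimal sketch of the kind of figure I mean, assuming 2D features in X, labels in y, and a fitted model LR with a loss_history:

      ```python
      # hypothetical sketch: the data next to one algorithm's loss history
      import matplotlib.pyplot as plt

      fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

      # left panel: the two classes, colored by label
      ax1.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm")
      ax1.set(xlabel="feature 1", ylabel="feature 2", title="training data")

      # right panel: loss over time for one fitted model
      ax2.plot(LR.loss_history)
      ax2.set(xlabel="iteration", ylabel="loss", title="convergence")
      plt.show()
      ```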

    1. np.seterr(all='ignore')

      I assume this is to suppress a division-by-zero warning? Regardless, it would be better to write conditional checks or find another way to avoid the error than just muting it - see the sketch below for one option
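
      for example, if the warnings come from overflow in the sigmoid or log(0) in the loss, clipping handles both without touching numpy's error state (a sketch, not your exact functions):

      ```python
      # hypothetical sketch: clip inputs instead of silencing numpy warnings
      import numpy as np

      def sigmoid(z):
          # clipping z keeps np.exp from overflowing for large negative inputs
          return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

      def logistic_loss(y_hat, y):
          # nudging predictions away from 0 and 1 keeps np.log finite
          eps = 1e-12
          y_hat = np.clip(y_hat, eps, 1 - eps)
          return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)
      ```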

    1. (this is specific to my implementation of the algorithm, where I am looking at accuracy / adding the score to a property history for the perceptron)

      not sure about this. The runtime of a single iteration shouldn't be O(# data points) - the significant operation is the calculation of the update step, which only touches one point; the scoring is extra bookkeeping (see the sketch below)
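
      roughly, for the standard update (a sketch; X_, y_, w, and n as in your code, with labels in {-1, 1}):

      ```python
      # hypothetical sketch: the cost of one perceptron iteration
      import numpy as np

      i = np.random.randint(n)       # O(1): pick one random point
      if y_[i] * (X_[i] @ w) < 0:    # O(p): one dot product
          w = w + y_[i] * X_[i]      # O(p): one vector addition
      # total: O(p) per iteration, independent of the number of points n;
      # the O(n) cost only enters if you also call score() every iteration
      ```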

    1. draw_line(p2.w, -2, 2)

      I would have expected the line to at least intersect some of the data, so this is an interesting final position for the dividing line - it doesn't seem optimized for the data

    1. predict()

      the predict function is actually not strictly necessary for model training because it is just used for recording historical accuracy.

    2. Based on the graph above, it does not seem that the data is linearly separable, as the accuracy does not seem to improve with the number of iterations, making it unlikely that it will converge.

      instead of assigning centers at random, try assigning them such that you can ensure there is no overlap between the data points (which, as we saw in experiment 2, can cause problems) and see if the algorithm can classify them - see the sketch below
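
      a minimal sketch, assuming the Perceptron interface from your post:

      ```python
      # hypothetical sketch: centers far enough apart that the classes
      # can't overlap, so the data should be linearly separable
      import numpy as np

      np.random.seed(123)
      n = 100
      X = np.vstack([np.array([-5, -5]) + np.random.randn(n, 2),
                     np.array([ 5,  5]) + np.random.randn(n, 2)])
      y = np.concatenate([np.zeros(n), np.ones(n)])

      p = Perceptron()
      p.fit(X, y, max_steps=1000)
      print(p.history[-1])  # should reach 1.0 if the data is separable
      ```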

    1. [0.51, 0.825, 0.915, 0.935, 0.965, 0.965, 0.965, 0.965, 0.965, 0.96] [0.98, 0.98, 0.98, 0.98, 0.985, 0.98, 0.98, 0.98, 0.98, 0.98]

      same thing here - a line graph would be a great way to visualize this data; see the sketch below
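
      a minimal sketch - I'm guessing the two printed lists are score histories from two runs, so substitute your own variable names:

      ```python
      # hypothetical sketch: plot the accuracy histories instead of printing them
      import matplotlib.pyplot as plt

      plt.plot(history_1, label="run 1")  # the two printed lists
      plt.plot(history_2, label="run 2")
      plt.xlabel("iteration")
      plt.ylabel("accuracy")
      plt.legend()
      plt.show()
      ```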

    2. [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

      it would be helpful to see a graph of accuracy over the whole run... this list alone doesn't tell the reader very much

    1. [0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 1.0]

      you don't really need this if you're plotting the accuracy over time right after

    1. self.w = np.random.rand(X_.shape[1])

      you could add a check here to make sure that w does not already exist, which would let you run fit multiple times if you want to do more training - not strictly necessary for this case, though (see the sketch below)
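
      something like (a sketch):

      ```python
      # hypothetical sketch: only initialize w on the first call to fit()
      if not hasattr(self, "w"):
          self.w = np.random.rand(X_.shape[1])
      ```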

    1. import numpy as np

      please see the assignment instructions - you need to include your perceptron implementation code, the third experiment, and a runtime analysis (for a single iteration).

    1. self.score(X, y)

      better practice might be to call this once inside the function when you add it to the history, then check whether the most recent item in the history is 1 (if so, break). This works, though - I'm just nitpicking. (Also, if you do it this way you can use a for loop instead of a while loop - see the sketch below.)
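
      a sketch of the restructured loop, keeping your names where I know them:

      ```python
      # hypothetical sketch: score once per iteration, record it, break early
      for _ in range(max_steps):
          self.history.append(self.score(X, y))
          if self.history[-1] == 1.0:
              break
          # ...your perceptron update step here...
      ```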

    2. self.w = np.random.rand(p)

      here you might want to add a check to make sure that the weights haven't already been initialized, in case you want to call the fit function multiple times for more training - not necessary here because you're only calling it once, but good practice (see the sketch below)
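
      one way to do it (a sketch; assumes w starts out unset):

      ```python
      # hypothetical sketch: skip re-initialization on repeated fit() calls
      if getattr(self, "w", None) is None:
          self.w = np.random.rand(p)
      ```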