class LinearRegression:
you should remove commented-out code snippets and unused functions once the code works and is ready for submission
Below is a jointplot graph that shows both a scatter plot and bell curves of the distribution for each species
cool graph
Penguin Classification
looks good
Classifying Palmer Penguins
looks good
?
looks good
Let’s try a different data set
looks like it works... you're missing the experiments on gamma fluctuation and circle data generation though
class
all the commented-out code creates a lot of clutter; if your code works I'd remove the commented-out snippets
Example: Momentum Converging Faster than Regular Stochastic:
same deal here
Example showing batch size influencing convergence speed.
this is a bit confusing with the placement here... I'd move it above the corresponding graph and code block
This shows that a small batch size takes longer to converge: 100 iterations for a large batch compared to 600 for a small batch
sure, but you might want to include a few more batch sizes to show that there isn't an optimal batch size somewhere between 2 and 20
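Something like the following would make that comparison concrete (a rough sketch; it assumes your model exposes fit_stochastic(X, y, batch_size=...) and records a loss_history):

    import matplotlib.pyplot as plt

    for batch_size in [1, 2, 5, 10, 20, 50]:
        model = LinearRegression()                        # your class
        model.fit_stochastic(X, y, batch_size=batch_size)
        plt.plot(model.loss_history, label=f"batch size {batch_size}")

    plt.xlabel("iteration")
    plt.ylabel("loss")
    plt.legend()
    plt.show()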
Although the model still completed running, its loss was 0.26
you want to prove that a large enough learning rate stops the algorithm from converging... try a larger value for alpha? if the step size is too large you'll just end up with large oscillations that stop the model from converging
#
you can create a multi-line python comment with triple quotes, i.e.: """ comment """
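For example:

    # instead of stacking single-line comments
    # like this across several lines

    """
    you can write one block comment
    like this
    """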
I think that if the learning rate is very small, it might not be able to converge to the minimizer, as shown below:
sure, but what if we make the learning rate really large? say 10, 50, or 100?
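e.g. a quick sweep like this (a sketch; it assumes fit() takes the learning rate as alpha and records a loss_history):

    for alpha in [10, 50, 100]:
        model = LinearRegression()             # your class
        model.fit(X, y, alpha=alpha)
        print(alpha, model.loss_history[-5:])  # oscillating/growing loss means no convergence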
fit_stochastic() function with momentum:
it would be nice to see your two graphs on the same plot so we can visually compare fit, stochastic, and stochastic with momentum
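e.g. (a sketch; the model and attribute names are guesses based on your code):

    import matplotlib.pyplot as plt

    plt.plot(model_gd.loss_history, label="gradient descent (fit)")
    plt.plot(model_sgd.loss_history, label="stochastic")
    plt.plot(model_mom.loss_history, label="stochastic + momentum")
    plt.xlabel("iteration")
    plt.ylabel("loss")
    plt.legend()
    plt.show()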
""" Compute the gradient descent for the fit() method """
you need more elaborate docstrings explaining this and your stochastic method (as per part 3 of the assignment)
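For example, something along these lines (the wording is just a suggestion):

    def fit(self, X, y, alpha=0.1, max_epochs=1000):
        """
        Fit the model using full-batch gradient descent.

        Parameters:
            X: (n, p) feature matrix
            y: (n,) vector of labels
            alpha: learning rate used for each gradient step
            max_epochs: maximum number of gradient steps to take

        After fitting, self.w holds the learned weights and
        self.loss_history records the loss at each step.
        """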
in general, we can conclude that the use of momentum significantly speeds up convergence.
If the one with momentum only sometimes outperforms the one without, you can't really make this claim for the general case. It holds for your code and some of the cases you tested, but that doesn't necessarily mean it holds in general.
The smaller the batch, the more quickly the algorithm converges.
If you're going to make a claim like this, it makes sense to visualize more than two different batch sizes (1, 2, 5, 10, 20?) to prove it
as shown below
interesting that your graph seems to get disconnected after each local maximum... could be worth looking into
given small enough
what's small enough? at what point do they no longer converge? does it differ between the algorithms?
# shuffle the points randomly
order = np.arange(n)       # create an array of the numbers 0 to n-1
np.random.shuffle(order)   # shuffle the array
you could also do this in one step with np.random.permutation(n) (note that np.random.randint samples with replacement, so it wouldn't give you a true shuffle of the indices)
Implementing logistic regression
It would be nice to see some illustrations of your data and the accuracy/loss over time of each algorithm, similar to Prof. Chodrow's
.
Looks good
Illustration
It looks like you're missing parts 2&3 (experiments and documentation)
gradint
typo
####
same here & throughout
$ f_w(x) = w, x $
doesn't look like this is rendering the way you intended; did you mean $ f_w(x) = \langle w, x \rangle $?
def fit_stochastic
your code is a bit hard to read; including some comments and blocking the code a bit more would be helpful
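For instance, a blocked-and-commented version might look roughly like this (a sketch, assuming the class already defines gradient() and loss() methods):

    import numpy as np

    def fit_stochastic(self, X, y, alpha=0.1, batch_size=10, max_epochs=100):
        n, p = X.shape
        self.w = np.random.rand(p)            # random starting weights
        self.loss_history = []

        for epoch in range(max_epochs):
            order = np.random.permutation(n)  # fresh random order each epoch

            # process the data in batches of roughly batch_size points
            for batch in np.array_split(order, max(n // batch_size, 1)):
                grad = self.gradient(X[batch], y[batch])  # gradient on this batch only
                self.w -= alpha * grad                    # one stochastic step

            self.loss_history.append(self.loss(X, y))    # track loss once per epoch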
np.seterr(all='ignore')
I assume this is to suppress a division-by-zero error? regardless, it would be better to write conditional checks or find another way to resolve the error than just muting it
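A common alternative is to clip predicted probabilities away from exactly 0 and 1 before taking logs, e.g. (a sketch; the function name and eps value are just illustrative):

    import numpy as np

    def safe_log_loss(y, p_hat, eps=1e-15):
        p_hat = np.clip(p_hat, eps, 1 - eps)  # keeps np.log away from zero
        return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))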
(this is specific to my implementation of the algorithm, where I am looking at accuracy / adding the score to a property history for the perceptron)
not sure about this. The runtime of a single iteration shouldn't be O(# data points); the significant operation is the calculation of the update step, which only touches one randomly chosen point, so the cost scales with the number of features rather than the number of points
Runtime
You missed experiment 3
1 *
unnecessary; multiplying by 1 has no effect and can be dropped
1 *
same here
draw_line(p2.w, -2, 2)
I would have expected the line to at least intersect some of the data, so this is an interesting final position for the dividing line; it doesn't seem optimized for the data
.
looks good
while accuracy != 1 and steps < maxsteps:
you could also do this with a for loop and a break statement if the accuracy condition is reached
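i.e. something like:

    for step in range(maxsteps):
        # ... perform the perceptron update ...
        if accuracy == 1:
            break  # stop early once every point is classified correctly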
.
everything here looks good
def plot_simple_scatter
not a serious issue, but I'd recommend using the same plotting function for every graph just for consistency's sake
predict()
the predict function is actually not strictly necessary for model training because it is just used for recording historical accuracy.
Based on the graph above, it does not seem that the data is linearly separable, as the accuracy does not seem to improve with the number of iterations, making it unlikely that it will converge.
instead of assigning centers at random, try assigning them such that you can ensure there is no overlap between the data points (which, as we saw in experiment 2, can cause problems) and see if the algorithm can classify them
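For instance (a sketch using sklearn's make_blobs; the particular centers and spread are arbitrary choices that keep the two clusters well apart):

    from sklearn.datasets import make_blobs

    centers = [(-3, -3), (3, 3)]
    X, y = make_blobs(n_samples=100, centers=centers, cluster_std=0.8)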
update the weights with the following equation from class
could be worth explaining a bit what the equation is actually doing here.
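For reference, the standard update from class (assuming labels $ y_i \in \{-1, 1\} $) is $ w^{(t+1)} = w^{(t)} + \mathbb{1}(y_i \langle w^{(t)}, x_i \rangle < 0)\, y_i x_i $: if the randomly chosen point $i$ is already on the correct side of the boundary, the indicator is 0 and $w$ is left alone; otherwise $w$ is nudged in the direction $ y_i x_i $, which rotates the boundary toward classifying that point correctly.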
Perceptron Blog
you're missing the last part of the assignment regarding runtime analysis
[0.51, 0.825, 0.915, 0.935, 0.965, 0.965, 0.965, 0.965, 0.965, 0.96] [0.98, 0.98, 0.98, 0.98, 0.985, 0.98, 0.98, 0.98, 0.98, 0.98]
same thing here; a line graph would be a great way to visualize this data
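e.g.:

    import matplotlib.pyplot as plt

    plt.plot(accuracies)  # assuming `accuracies` is the list printed above
    plt.xlabel("iteration")
    plt.ylabel("accuracy")
    plt.show()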
[0.98, 0.98, 0.98, 0.98, 0.985, 0.98, 0.98, 0.98, 0.98, 0.98]
same thing here
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
it would be helpful to see a graph of accuracy over the whole training run... this list on its own doesn't tell the reader very much
In experiment three I created another nonlinear data set using the make_circles() function.
I really like this; great example
[0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 1.0]
you don't really need this if you're plotting the accuracy over time right after
Link to source code
code looks good
high=
you don't explicitly need the keyword here; randint will treat the second positional argument as the upper bound
self.w = np.random.rand(X_.shape[1])
you could add a check here to make sure that w does not already exist, which would let you run fit multiple times if you want to do more training; not strictly necessary in this case though
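e.g.:

    if not hasattr(self, "w"):                # only initialize on the first fit() call
        self.w = np.random.rand(X_.shape[1])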
perceptron.py
code looks good
Note: The scale of the axis may also look different
why? it's more readable to keep them on the same scale
import numpy as np
please see the assignment instructions; you need to include your perceptron implementation code, the third experiment, and a runtime analysis (for a single iteration).
self.score(X, y)
better practice might be to call this once inside the function when you add it to history and then check if the most recent item in history is a 1 (if so, break). This works though, I'm just nitpicking (also if you do it this way you can use a for loop instead of a while loop)
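i.e. roughly (assuming history is a list attribute on the perceptron):

    for step in range(maxsteps):
        # ... perceptron update ...
        self.history.append(self.score(X, y))  # score exactly once per iteration
        if self.history[-1] == 1:
            break                               # perfect accuracy reached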
self.w = np.random.rand(p)
here you might want to add a check to make sure that the weights haven't already been initialized, in case you want to call the fit function multiple times for more training; not necessary because you're only calling it once, but good practice