your
the
your
the
10000
entry wrong
is
delete
row
column
Fig. 1.14 Two social networks from different social media sites. The nodes are people’s accounts on the social media sites (they are the same for both sites) and the edges indicate which pairs of accounts follow one another for that particular site. For three of the people in the network, you know their accounts.
caption error
simply feed the collections of random walks
paths, which is basically a sentence, a sequence of words
6.4.1. A simplified first-order random walk on a network
put 6.4.1 and 6.4.2 in appendix
$s_t$
$S_{t+1}$
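Since this section explains the first-order walk in terms of $s_t$ and $s_{t+1}$, a short hedged sketch could go here (my own toy code, not the book's implementation; er_np is only used to get an example adjacency matrix):

import numpy as np
from graspologic.simulations import er_np

rng = np.random.default_rng(0)
A = er_np(20, 0.3)                                   # an example adjacency matrix

def first_order_walk(A, start, length, rng):
    walk = [start]                                   # s_0
    for _ in range(length - 1):
        neighbors = np.flatnonzero(A[walk[-1]])      # candidates for s_{t+1}
        if len(neighbors) == 0:                      # dead end: stop the walk
            break
        walk.append(int(rng.choice(neighbors)))      # s_{t+1} chosen uniformly among neighbors
    return walk

# a collection of walks; each walk can then be treated like a "sentence" of node "words"
walks = [first_order_walk(A, start=int(rng.integers(20)), length=10, rng=rng) for _ in range(5)]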
the author’s
our
estimates
you need to show both kinds of labels for both plots to make this point
3 communities
why 3
On a two-truths phenomenon in spectral graph clustering
hyperlink
Next
write out math for B Z B'
The difference here is that $P$ itself is equal to this quantity, not just a “reduced rank” representation of $P$ like you had for the Laplacian spectral embedding.
SBM
symmetric
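A hedged guess at the expansion the "write out math for B Z B'" comment asks for, assuming it means the standard SBM form of $P$ in terms of the one-hot community assignment matrix $Z$ and the block matrix $B$:

\begin{align*}
P = Z B Z^\top, \qquad Z \in \{0, 1\}^{n \times K}, \quad z_{ik} = 1 \text{ if and only if } \tau_i = k,
\end{align*}

so that entrywise, $p_{ij} = \vec z_i^\top B \vec z_j = b_{\tau_i \tau_j}$.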
_ = pairplot(embedding, labels=labelsbig, legend_name="Community", title="Laplacian Spectral Embedding pairs plot");
move all sections with SVD in the title to the appendix, plus 6.2.9.0
minka
is minka in graspologic instead of ZG2?
smallest $n - k$
sometimes
Laplacian Spectral Embedding
sometimes we care about the smallest negative eigenvalues/singular values
later in this Chapter.
which chapter? also, "chapter" should not be capitalized here
$P$
X
plot_latents(latents, ax=ax, labels=labels)
make the dots bigger
.8
0.8
.2
use something other than 0.2
plot
I want to know what phat is; maybe make the titles include p = 0.2 and phat = 0.21, or whatever.
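A possible sketch for putting $p$ and $\hat p$ in the plot titles (my own code; the estimator is just the fraction of possible edges that are present, assuming an undirected, loopless network):

import numpy as np
from graspologic.simulations import er_np

p = 0.2
A = er_np(n=50, p=p)

n = A.shape[0]
phat = A.sum() / (n * (n - 1))        # each undirected edge contributes two entries of A
print(f"p = {p}, phat = {phat:.3f}")  # e.g., use these values directly in the plot titles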
$n$
add |Y|
Network Models with Covariates
Network models with graph covariates
add a section on network models with node covariates
500,000
space after comma
5.4.4
add a subsection asserting the existence of the GRDPG
$\rho$
mention the existence of the $\rho$-RDPG?
$\rho-RDPG_n(X)$.
no dash
convey
discover?
import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import svd
from graspologic.simulations import sbm
from graspologic.utils import to_laplacian
from graspologic.plot import heatmap

# Make network
B = np.array([[0.8, 0.1],
              [0.1, 0.8]])
n = [25, 25]
A2, labels2 = sbm(n=n, p=B, return_labels=True)

# Form new Laplacian
L2 = to_laplacian(A2)

# Decompose, keeping only the top k = 2 singular values/vectors
k = 2
U2, E2, Ut2 = svd(L2)
low_rank_approximation = U2[:, 0:k] @ (np.diag(E2[0:k]) @ Ut2[0:k, :])

# Plotting
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
l2_hm = heatmap(L2, ax=axs[0], cbar=False, title="$L$")
l2approx_hm = heatmap(low_rank_approximation, ax=axs[1], cbar=False,
                      title=r"$L_2 = \sum_{i = 1}^{2} \sigma_i u_i u_i^T$")
l2_hm.set_xlabel("Full-rank Laplacian for a 50-node matrix", fontdict={'size': 15})
l2approx_hm.set_xlabel("Sum of only two low-rank matrices", fontdict={'size': 15});
fig.suptitle("Summing only two low-rank matrices approximates the normalized Laplacian pretty well!", fontsize=24)
plt.tight_layout()
hide
th 45 total employees. This company is a real innovator, and is focused on developing software for network machine learning for business use. 10 of these employees are company administrative executives, 25 of these employees are network machine learning experts, and 10 of these employees are marketing experts. For each day over the course of a full 30-day month, we study the emails that go back and forth between the employees in the company. We summarize the emailing habits within the company using a network, where the nodes of the network are employees, and the edges indicate the emailing behaviors between each pair of employees. An edge is said to exist if the two employees have exchanged an email on that day, and an edge does not exist if the two employees have not exchanged an email on that day. In most companies, it is common for employees in a similar role to tend to work more closely together, so we might expect that there is some level of a community structure to the emailing habits. For instance, if two employees are both network machine learning experts, they might exchange more emails between one another than a machine learning expert and a marketing expert. For the sake of this example, we will assume that the networks are organized such that the first day is a Monday, the second day is a Tuesday, so on and so forth. Let’s take a look at an example of some possible realizations of the first 3 days’ worth of emails. What we will see below is that all of the networks appear to have the same community organization, though on Wednesday, we will assume there was an executive board meeting, and in the morning leading up to the board meeting, the administrative executives of the company exchanged more emails than on other days. This is reflected in the fact that there are m
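A hedged sketch of how the month of email networks described above might be simulated (the only assumptions beyond the text are my specific block probability values and the bumped admin-admin probability on Wednesdays):

import numpy as np
from graspologic.simulations import sbm

n = [10, 25, 10]                         # admins, network ML experts, marketing (45 employees)
B_usual = np.array([[0.4, 0.1, 0.1],
                    [0.1, 0.4, 0.1],
                    [0.1, 0.1, 0.4]])
B_wednesday = B_usual.copy()
B_wednesday[0, 0] = 0.8                  # extra admin-admin emails before the board meeting

networks = []
for day in range(1, 31):                 # a 30-day month, where day 1 is a Monday
    B = B_wednesday if day % 7 == 3 else B_usual
    networks.append(sbm(n=n, p=B))       # one email network per day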
facebook, twitter, linkedin
$SIEM_n$
wSIEM_n(Z, \vec F)
0.5222222222222223
update with the j1 code from the paper. Also, what is this number?
a value of 0
4.4.2.5. Diagonal augmentation
https://github.com/microsoft/graspologic/blob/f9c4353488e29d367dec62fdb4729e6a7344fd89/graspologic/embed/ase.py#L58
finish
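A hedged sketch of what this section could finish with, based on the linked ase.py source: diagonal augmentation replaces the zero diagonal of the adjacency matrix with each node's degree divided by $n - 1$ before embedding (the exact scaling used in graspologic may differ slightly; treat this as an illustration):

import numpy as np
from graspologic.simulations import er_np
from graspologic.utils import augment_diagonal

A = er_np(50, 0.3)
n = A.shape[0]

# manual version of the idea
A_aug = A.astype(float).copy()
np.fill_diagonal(A_aug, A.sum(axis=1) / (n - 1))

# graspologic's built-in version of diagonal augmentation
A_aug_graspologic = augment_diagonal(A)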
10.1.4. Finding our Test Statistic With Hypothesis Testing
jovo stopped here
$c$:
I'd not write "c:" here, just the inequality
– $X^{(i)}$ –
commas not hyphens
Networks
email cep, gudong, jovo, about putting gudong stuff in graspologic
These approaches are each useful in different situations.
I have no clue what this plot is exhibiting. What are x and y? Make the seed nodes more obvious (consider making them different shapes?). Consider adding the non-nominees in a light color (e.g., low-alpha gray?).
nearest
how many nearest neighbors for each seed node? 4?
plot.annotate(text="Higher-ranked nominations tend to be\n in the same group as the seed nodes",
              xytext=(.45, .15), xy=(.6, .5),
              arrowprops={"arrowstyle": "->", "color": "k"});
# TODO: add colorbar
what is y? what is x?
rho
\rho?
import numpy as np
import matplotlib.pyplot as plt
from graspologic.match import GraphMatch
from graspologic.plot import heatmap

# A and B are the adjacency matrices simulated above (B is a shuffled copy of A)
gmp = GraphMatch()
gmp = gmp.fit(A, B)
# un-shuffle B using the permutation found by graph matching
B = B[np.ix_(gmp.perm_inds_, gmp.perm_inds_)]
print("Number of adjacency disagreements: ", np.sum(abs(A - B)))

fig, axs = plt.subplots(1, 3, figsize=(20, 20))
heatmap(A, ax=axs[0], cbar=False, title='A [ER-NP(6, 0.3) Simulation]')
heatmap(B, ax=axs[1], cbar=False, title='B [Unshuffled]')
heatmap((A - B), ax=axs[2], cbar=False, title='A-B [Unshuffled]')
add the scales?
$0 \rightarrow 0$, $1 \rightarrow 1$, $2 \rightarrow 3$, $3 \rightarrow 2$
there are two node correspondences, make this simpler perhaps or more straightforward. maybe talk about a mapping to permute F?
heatmap(B@P.T, ax=axs[2], cbar=False, title = r'Row Permutation $BP^T$:')
column permutation?
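To make the row-versus-column permutation question concrete, here is a hedged numpy sketch using the mapping quoted above ($0 \rightarrow 0$, $1 \rightarrow 1$, $2 \rightarrow 3$, $3 \rightarrow 2$); the example adjacency matrix is my own, not the book's:

import numpy as np

perm = np.array([0, 1, 3, 2])          # node i in the relabeled network is node perm[i] of B
P = np.eye(4)[perm]                    # the corresponding permutation matrix

B = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])

rows_permuted = P @ B                  # left-multiplying by P permutes rows
cols_permuted = B @ P.T                # right-multiplying by P^T permutes columns
relabeled = P @ B @ P.T                # both together relabel the nodes (here, swap nodes 2 and 3)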
o about solving this (more specific) mathematically? First, we need a metric that tells us how similar two networks are to each other. For graph matching, this similarity metric is defined as $f(A, B) = ||A - B||_F^2$ for unweighted adjacency matrices $A, B \in \mathbb{R}^{n \times n}$. In other words, $f(A, B)$ is the sum of the squared elementwise differences between $A$ and $B$. To understand this functionally, consider the best possible case where $A = B$, that is, the networks are identical. The difference will be a matrix of all zeros, and taking the squared norm will then yield $f(A, B) = 0$. If we remove one edge from $A$, then $f(A, B) = 1$. If we consider the worst possible case (every edge in $A$ does not exist in $B$, and vice versa), then $f(A, B) = n^2$. This metric effectively counts the number of adjacency disagreements between $A$ and $B$. Thus, we want to find the mapping where
what is the equation? Show it. Give an actual example
show A, B, |A - B|
show when they are same, 1 edge different, and all edges different
indicate the Frobenius norm in the title
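A hedged sketch addressing the three comments above, with a toy 3-node network of my own (not the book's): it computes $f(A, B) = ||A - B||_F^2$ when the networks are identical, when one undirected edge differs, and when every entry differs.

import numpy as np

def frobenius_disagreements(A, B):
    # f(A, B) = ||A - B||_F^2: counts entrywise disagreements for binary matrices
    return int(np.sum((A - B) ** 2))

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

B_same = A.copy()                          # identical networks
B_one_edge = A.copy()
B_one_edge[0, 1] = B_one_edge[1, 0] = 0    # remove the (undirected) edge between nodes 0 and 1
B_flipped = 1 - A                          # every entry disagrees

print(frobenius_disagreements(A, B_same))      # 0
print(frobenius_disagreements(A, B_one_edge))  # 2: one undirected edge flips two entries
print(frobenius_disagreements(A, B_flipped))   # 9 = n^2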
$A$
P
$P$
Phat Vhat Rhat
$A^{(i)} = V R^{(i)} V^\top$
P not A
$m$
N
Here, we will use a matrix $V$, which is largely similar in intuition to the latent position matrix $X$. The matrix $V$ will have $n$ rows (the number of nodes in each of the $N$ networks) and $d$ columns. Like for the RDPG, $d$ will again refer to the latent dimensionality of the collection of random networks. The matrix $V$ has the special property that the $d$ columns of $V$ are orthonormal. The matrix $V$ looks like this:

\begin{align*}
V &= \begin{bmatrix}
\top & \top & & \top \\
\vec v_1 & \vec v_2 & ... & \vec v_d \\
\perp & \perp & & \perp
\end{bmatrix}
\end{align*}

That $V$ is orthonormal means that, for any column $k$ of the $d$ total columns, $v_k^\top v_k = 1$, and for any pair of columns $k$ and $l$ where $k \neq l$, $v_k^\top v_l = 0$. For the COSIE model, we also have a collection of score matrices, $\left\{R^{(1)}, ..., R^{(N)}\right\}$. Each score matrix $R^{(m)}$ is a symmetric matrix with $d$ rows and $d$ columns. That $R^{(m)}$ is symmetric means that for any two latent dimensions $k$ and $l$, $R^{(m)}_{kl} = R^{(m)}_{lk}$.
Think with Alex about what to talk about here in the context of a shared low-rank subspace $V$.
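A hedged sketch of the COSIE structure described above, with my own toy numbers: it builds an orthonormal $V$, symmetric score matrices $R^{(m)}$, and forms $V R^{(m)} V^\top$ for each network (no attempt is made here to keep the entries in $[0, 1]$; it only illustrates the algebra).

import numpy as np

n, d, N = 6, 2, 3
rng = np.random.default_rng(0)

V, _ = np.linalg.qr(rng.normal(size=(n, d)))   # n x d matrix with orthonormal columns

Rs = []
for m in range(N):
    M = rng.uniform(size=(d, d))
    Rs.append((M + M.T) / 2)                   # symmetric d x d score matrix R^{(m)}

Ps = [V @ R @ V.T for R in Rs]                 # one matrix V R^{(m)} V^T per network

print(np.allclose(V.T @ V, np.eye(d)))         # True: the columns of V are orthonormal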
Now, we’d like to order the rest of the vertices in this network by their degree of similarity to the seed nodes. Remember that we talked about two ways of doing this: we could find a separate set of nominations for each seed node, or we could find a single set of nominations for all of the seed nodes. Let’s start by finding a single set, using the centroid.
Lacking a sense of whether this actually picks up a useful signal in the context of the model? Consider putting in an example as though you have true labels, and you want to study how well it did at picking up the actual trackers? I don't know.
-----------------
ImportError                               Traceback (most recent call last)
/tmp/ipykernel_1759/298421806.py in <module>
----> 1 from graphbook_code import networkplot
      2
      3 plot = networkplot(A, x=X[:, 0], y=X[:, 1], node_hue=nominations.flatten(),
      4                    palette="Purples", edge_alpha=0.05)
      5

ImportError: cannot import name 'networkplot' from 'graphbook_code' (/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/graphbook_code/__init__.py)
gg code failure
like the Mahalanobis distance, which is essentially Euclidean distance but with a coordinate system and a rescaling defined by the covariance in your data.
add something about Mahalanobis -- doesn't really come up elsewhere in the book?
Remember that the latent position matrix $X$ has $n$ rows, with each row corresponding to a node in your network, and the dot product between rows $i$ and $j$ of $X$ is the probability that node $i$ will connect with node $j$. (Here, remember that $X$ is from the adjacency matrix, not the Laplacian). The block probability matrix $P$ has the edge probability between nodes $i$ and $j$ in the $(i, j)_{th}$ position. How would you generate $P$ from $X$?
this is not how you generate P if you believe that the network is SBM. This is how you generate P if you believe the network is RDPG. If you think you have an SBM, you use RDPG only to estimate the community assignments, and then use regular SBM estimation with the assigned communities.
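A hedged sketch of the procedure this comment describes (toy parameters of my own): use the spectral embedding only to recover community assignments, then estimate the block probabilities directly from the adjacency matrix.

import numpy as np
from sklearn.mixture import GaussianMixture
from graspologic.simulations import sbm
from graspologic.embed import AdjacencySpectralEmbed

B = np.array([[0.8, 0.2],
              [0.2, 0.8]])
A, true_labels = sbm(n=[50, 50], p=B, return_labels=True)

# step 1: embed and cluster to get estimated community assignments
Xhat = AdjacencySpectralEmbed(n_components=2).fit_transform(A)
labels_hat = GaussianMixture(n_components=2).fit_predict(Xhat)

# step 2: estimate each block probability as the mean of the corresponding block of A
# (diagonal entries are included here for simplicity)
Bhat = np.zeros((2, 2))
for k in range(2):
    for l in range(2):
        Bhat[k, l] = A[np.ix_(labels_hat == k, labels_hat == l)].mean()
print(Bhat)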
6.2.3.1. The Latent Position Matrix Tells You About Edge Probabilities
The latent position matrix $X$ that we can find from spectral embedding is related to these edge probabilities: If you take the dot product (or the weighted sum) of row $i$ with row $j$, you’ll get an estimate of the probability that nodes $i$ and $j$ have an edge between them. Incidentally, this means that the dot product between any two rows of the latent position matrix has to be bounded between 0 and 1.
this is redundant; I would circle back to an explanation in the context of coin flip probabilities and back-link to the section on RDPGs
v
I'd prefer you keep the notation homogeneous? Either use all math text or code-y text, or at the very least, make it, say, a201 and v201
$j_{th}$
superscript
first
zeroth? or 101(th)?
We know now that we can use the latent position matrix $X$ to get a set of estimated edge probabilities for every node in a network. Our goal is to go the other direction - to start with “estimated edge probabilities”, and end with a latent position.
between the out of sample node and every other node in the network?
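A hedged sketch of "going the other direction" with least squares (my own toy numbers): given a vector of edge probabilities between a new node and the existing nodes, recover a latent position for the new node.

import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 2
X = rng.uniform(0.1, 0.6, size=(n, d))   # latent positions of the existing nodes
w_true = np.array([0.4, 0.3])            # the new node's latent position (unknown in practice)
a = X @ w_true                           # its edge probabilities with the existing nodes

w_hat = np.linalg.pinv(X) @ a            # least-squares solution of X w = a
print(w_true, w_hat)                     # the recovered position matches closely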
Further, we can also perform analysis on the matrix $X$ itself, which will prove very useful for estimation of SBMs.
delete sentence
$\tau_i = 2, \tau_j = 1$
Tj = 2
well, almost any - you wouldn’t want to make an embedding with more dimensions than your network!
you can't "embed" into more dimensions than you have in any reasonable fashion? not relevant
$\begin{bmatrix} x_{11} & & & \\ & x_{22} & & \\ & & \ddots & \\ & & & x_{nn} \end{bmatrix}$
it is not obvious that this is symmetric
The latent position matrix $X$ that we’ve found from spectral embedding is related to these edge probabilities: If you take the dot product (or the weighted sum) of row $i$ with row $j$, you’ll get an estimate of the probability that nodes $i$ and $j$ have an edge between them.
say more? this is a huge concept. also, this has nothing to do with laplacian spectral embedding, so harmonize the two better.
$S$
$D$
minimize
minimize the COST f(A, B, P) = ...
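A hedged guess at the explicit cost this comment asks for, assuming the standard graph matching objective over permutation matrices $P$:

\begin{align*}
f(A, B, P) = \left\| A - P B P^\top \right\|_F^2 = \sum_{i=1}^n \sum_{j=1}^n \left( a_{ij} - (P B P^\top)_{ij} \right)^2,
\end{align*}

which we minimize over the set of $n \times n$ permutation matrices $P$.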
\textit{isomorphic}
nobody gonna know this word
The main steps of a gradient descent method are choosing a suitable initial position (which can be chosen randomly), then gradually decreasing the cost function one step at a time, until the function is changing by only a very small amount and has converged to a minimum.
I would also explain somewhere that you haven't necessarily found the global minimum.
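A minimal sketch of gradient descent on a simple one-dimensional cost (my own toy example, not the graph matching objective), which also makes the local-versus-global minimum caveat easy to state:

def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_new = x - step * grad(x)   # take a small step downhill
        if abs(x_new - x) < tol:     # stop once the position barely changes
            break
        x = x_new
    return x

# minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); gradient descent only
# guarantees a local minimum in general, but this cost is convex, so local = global
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # approximately 3.0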
Due to the one-to-one nature of these matchings, they are also known as \textit{bijections}
I would explain this term a lot more in depth since it's extremely crucial and comes up literally nowhere else in the book
probability simplex (3-dimensional $\vec\pi$'s):
show a specific pi for each
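A possible concrete example to address this comment (the numbers are my own): one specific $\vec\pi$ on the 3-dimensional probability simplex per case, e.g.

\begin{align*}
\vec\pi = \begin{bmatrix} 0.7 \\ 0.2 \\ 0.1 \end{bmatrix}
\qquad \text{or} \qquad
\vec\pi = \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix},
\end{align*}

where in each case the entries are nonnegative and sum to one.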
Finally, let’s think about how to write down the generative model for the a priori DCSBM. We say that, given $\tau_i = k'$ and $\tau_j = k$, $\mathbf a_{ij}$ is sampled independently from a $Bern(\theta_i \theta_j b_{k'k})$ distribution for all $j > i$. As we can see, $\theta_i$ in a sense is “correcting” the probabilities of each adjacency to node $i$ to be higher, or lower, than that which is given by the block probabilities $b_{\ell k}$, depending on the value of $\theta_i$. If $\mathbf A$ is an a priori DCSBM network with parameters $\vec\theta$ and $B$, we write that $\mathbf A \sim DCSBM_{n, \vec\tau}(\vec\theta, B)$.
add a code example
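A hedged code sketch for the a priori DCSBM, written directly with numpy so the $Bern(\theta_i \theta_j b_{k'k})$ step is explicit (the parameters are my own toy choices; graspologic's sbm() also has a degree-correction option):

import numpy as np

rng = np.random.default_rng(0)
n = 100
tau = np.repeat([0, 1], n // 2)              # a priori community assignments
B = np.array([[0.5, 0.1],
              [0.1, 0.5]])                   # block probability matrix
theta = rng.uniform(0.5, 1.0, size=n)        # degree-correction factors, kept in (0, 1]

A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        p_ij = theta[i] * theta[j] * B[tau[i], tau[j]]   # Bern(theta_i theta_j b_{k'k})
        A[i, j] = A[j, i] = rng.binomial(1, p_ij)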
Maximum Likelihood Estimat
no caps
In the special case
this should be bullet 2.
$\hat X = \texttt{AdjacencySpectralEmbedding}(A, d)$
delete
This helps give an intuition for why our embedding matrix gives a representation of our network. You can take columns of it, turn those columns into matrices, sum those matrices, and get the best-possible estimation of the Laplacian for the network. That means the columns of our embedding matrix contain all of the information necessary to estimate the network!
My main concern overall is that there is a lot of ambiguity over what's an estimate of what, terminology-wise. I would be really careful when using the words "approximation" and "estimate", and try to match the names you give things to them. I think "approximation" works for the network/Laplacian itself, and "rank-k estimate" works for the probability (true/expected) Laplacian, etc.: approximations of realizations; estimates of parameters (properties of the actual random quantity).
$u_i u^T$
be careful about adding vector hats
With Spectral Embedding specifically, you take a network’s adjacency matrix, or optionally, its Laplacian matrix. Then, you find the eigenvectors corresponding to the $d$ largest eigenvalues, depending on how many dimensions ($d$) you’d like to embed your network down to. You scale those eigenvectors by their eigenvalues (or sometimes the square root of their eigenvalues). You’ll then have a rectangular matrix, where the columns are the $d$ largest eigenvectors. The rows of that matrix will be the embedding for each node.
I would mention that you are going to describe how this occurs, specifically, before just running off the algorithm
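A hedged numpy sketch of exactly the steps described above (graspologic's AdjacencySpectralEmbed does this, with more care, for you; the example network is my own):

import numpy as np
from graspologic.simulations import er_np

A = er_np(50, 0.3)
d = 2

evals, evecs = np.linalg.eigh(A)                       # A is symmetric, so eigh applies
top = np.argsort(np.abs(evals))[::-1][:d]              # indices of the d largest-magnitude eigenvalues
X = evecs[:, top] * np.sqrt(np.abs(evals[top]))        # scale each eigenvector by sqrt(|eigenvalue|)
print(X.shape)                                         # (n, d): one row per node, i.e., the embedding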
Most machine learning methods require our data to be organized like this.
I would make this sentence say less "requires", and more "if we want to leverage tools we already know how to use, we can do X".
Proceeding using the result we derived above, and using the fact that $\frac{d}{du} \log(u) = \frac{1}{u}$ and that $\frac{d}{du} \log(1 - u) = -\frac{1}{1 - u}$:
break up more
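One possible way to break this step up, assuming the usual ER log-likelihood with $m$ observed edges out of $M = \binom{n}{2}$ possible edges:

\begin{align*}
\frac{d}{dp}\Big[ m \log p + (M - m) \log(1 - p) \Big]
= \frac{m}{p} - \frac{M - m}{1 - p},
\end{align*}

and setting this derivative to zero and solving for $p$ gives $\hat p = \frac{m}{M}$.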
6.2.3. a priori Stochastic Block Model
add code example
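A hedged code sketch for the a priori SBM section (the block probabilities and community sizes are my own toy choices):

import numpy as np
from graspologic.simulations import sbm
from graspologic.plot import heatmap

B = np.array([[0.6, 0.1],
              [0.1, 0.6]])                       # block probability matrix
A, labels = sbm(n=[50, 50], p=B, return_labels=True)
heatmap(A, inner_hier_labels=labels, title="A sample from an a priori SBM");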
6.2.2. Erdös-Rényi (ER)
add code example
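A hedged code sketch for the ER section (the network size and edge probability are my own toy choices):

from graspologic.simulations import er_np
from graspologic.plot import heatmap

A = er_np(n=50, p=0.3)                           # every edge exists independently with probability 0.3
heatmap(A, title="A sample from ER(50, 0.3)");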
$Bern(p)$ distribution
Bernoulli distribution with parameter $p$
understatement. To actually figure out the number, think about the first node: there are two possibilities (weighted or not weighted), so you can generate two networks from a one-node model. Now, let’s add an additional node. For each of the first two possibilities, there are two more – so there are $2 \times 2 = 4$ total possible networks. Every node that we add doubles the number of networks - and since a network with $n$ nodes has $n \times n$ edges, the total number of possible networks ends up being $2^{n \times n} = 2^{n^2}$! So this ten-node model can generate $2^{10^2} = 2^{100}$ networks, which is, when you think carefully, an absurdly, ridiculously big number.
be specific about directed + loops
(weighted or not weighted
edge or no edge
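A possible worked version of the count that is explicit about directedness and loops (the phrasing is my own):

\begin{align*}
\#\{\text{directed networks, loops allowed}\} &= 2^{n^2}, \\
\#\{\text{simple undirected networks, no loops}\} &= 2^{\binom{n}{2}} = 2^{\frac{n(n-1)}{2}},
\end{align*}

so a ten-node model can generate $2^{100}$ directed networks with loops, but only $2^{45}$ simple undirected ones.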
$K$
subscript
$d$
make it K
This time, our samples $x_i$ take a different value depending on whether $i$ is male or female. This example is called the Gaussian Mixture (GM), and we have shown an example which is an equal-weight mixture of two Gaussians (one with mean $0$, and the other with mean $5$, both with variance $1$). Under the GM, we allow samples to be taken from one of $K$ Gaussians with a weight parameter $\pi_k$, for each $k$ from $1$ to $K$.
"we have two communities instead of 1" change text to discussing communities and not mixture distributions
We’re cheating a bit here because we know in advance which nodes belong to which community.
I wouldn't state "we're cheating a bit", I would say, "we didn't use this information, but it is just there to show you how useful/effective this unsupervised technique is
A common question that we, as scientists, might have is how, exactly, we could describe the network in the simplest way possible.
"We have some question, and data (network). Are people from the same school more connected than people from different schools. The data is X. That network is noisy, and we need quantitative procedures that account for the possibility of network uncertainty." Say specifically what the nodes and edges are. Be specific about what is uncertain. People are friends, but didn't add each other. People are not friends, but didn't remove each other (fight). In order to answer our question, we need to account for this uncertainty. Why not just use a t-test? It's a network, the assumptions for traditional statistical modeling don't make sense. It's one network, not n points in d-dimensions. Read ssg and statistical connectomics paper. Model ENTIRE network, not point-by-point. A realization is an entire network, an entirely new dataset. Make this point better in the example.
We say that our network is a realization of a network model when we find a model that we think might reasonably describe the network generation process
use the term model correctly; model => RV
Suppose that $X$ only contains 0’s and 1’s. To interpret $XX^T$, remember from linear algebra that we’re taking the weighted sum - or, in math parlance, the dot product - of each row with each other row, as shown below (where each of these vectors represent rows of $X$):

\begin{align}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = 1\times 0 + 1\times 1 + 1\times 1 = 2
\end{align}

If there are two overlapping 1’s in the same position of the left vector $i$ and the right vector $j$, then there will be an additional 1 added to their weighted sum. The resulting value, $XX^T_{i, j}$, will be equal to the number of shared locations in which vectors $i$ and $j$ both have ones.
I like this information, but I think it belongs in a supplementary chapter or much, much earlier in the book than right here, because $XX^\top$ comes up a LOT with RDPGs in general, and not just CASE. I think this would go in a “preliminaries” type of section alongside other basic linear algebra and probability/statistics results.
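A tiny numpy check of the dot-product interpretation above (the binary $X$ is my own toy example):

import numpy as np

X = np.array([[1, 1, 1],
              [0, 1, 1],
              [1, 0, 0]])
print(X @ X.T)   # entry (i, j) counts the positions where rows i and j are both 1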