98 Matching Annotations
  1. Mar 2023
  2. May 2022
    1. Fig. 1.14 Two social networks from different social media sites. The nodes are people’s accounts on the social media sites (they are the same for both sites) and the edges indicate which pairs of accounts follow one another for that particular site. For three of the people in the network, you know their accounts.

      caption error

  3. Apr 2022
    1. The difference here is that $P$ itself is equal to this quantity, not just a "reduced rank" representation of $P$ like you had for the Laplacian spectral embedding.
      1. sample rather than P
      2. ASE rather than LSE
    1. _ = pairplot(embedding, labels=labelsbig, legend_name="Community", title="Laplacian Spectral Embedding pairs plot");

      take all sections with svd in title => appendix, plus 6.2.9.0

  4. Mar 2022
    1. import numpy as np
       import matplotlib.pyplot as plt
       from scipy.linalg import svd
       from graspologic.simulations import sbm
       from graspologic.utils import to_laplacian
       from graspologic.plot import heatmap

       # Make network
       B = np.array([[0.8, 0.1], [0.1, 0.8]])
       n = [25, 25]
       A2, labels2 = sbm(n=n, p=B, return_labels=True)

       # Form new laplacian
       L2 = to_laplacian(A2)

       # decompose
       k = 2
       U2, E2, Ut2 = svd(L2)
       k_matrices = U2[:, k]
       low_rank_approximation = U2[:, 0:k] @ (np.diag(E2[0:k]) @ Ut2[0:k, :])

       # Plotting
       fig, axs = plt.subplots(1, 2, figsize=(12, 6))
       l2_hm = heatmap(L2, ax=axs[0], cbar=False, title="$L$")
       l2approx_hm = heatmap(low_rank_approximation, ax=axs[1], cbar=False,
                             title=r"$L_2 = \sum_{i = 1}^{2} \sigma_i u_i u_i^T$")
       l2_hm.set_xlabel("Full-rank Laplacian for a 50-node matrix", fontdict={'size': 15})
       l2approx_hm.set_xlabel("Sum of only two low-rank matrices", fontdict={'size': 15});
       fig.suptitle("Summing only two low-rank matrices approximates the normalized Laplacian pretty well!", fontsize=24)
       plt.tight_layout()

      hide

  5. Jan 2022
  6. Dec 2021
    1. th 45 total employees. This company is a real innovator, and is focused on developing software for network machine learning for business use. 10 of these employees are company administrative executives, 25 of these employees are network machine learning experts, and 10 of these employees are marketing experts. For each day over the course of a full 30-day month, we study the emails that go back and forth between the employees in the company. We summarize the emailing habits within the company using a network, where the nodes of the network are employees, and the edges indicate the emailing behaviors between each pair of employees. An edge is said to exist if the two employees have exchanged an email on that day, and an edge does not exist if the two employees have not exchanged an email on that day. In most companies, it is common for employees in a similar role to tend to work more closely together, so we might expect that there is some level of community structure to the emailing habits. For instance, if two employees are both network machine learning experts, they might exchange more emails between one another than a machine learning expert and a marketing expert. For the sake of this example, we will assume that the networks are organized such that the first day is a Monday, the second day is a Tuesday, so on and so forth. Let's take a look at an example of some possible realizations of the first 3 days' worth of emails. What we will see below is that all of the networks appear to have the same community organization, though on Wednesday, we will assume there was an executive board meeting, and in the morning leading up to the board meeting, the administrative executives of the company exchanged more emails than on other days. This is reflected in the fact that there are m

      facebook, twitter, linkedin
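      A hedged sketch of how the three days described above might be simulated with graspologic's sbm (the block probabilities and the size of the Wednesday "administrative bump" are illustrative assumptions, not values from the book):

      import numpy as np
      from graspologic.simulations import sbm

      # 45 employees: 10 administrative executives, 25 ML experts, 10 marketing experts
      n = [10, 25, 10]
      # assumed block probability matrix for a "typical" day
      B_typical = np.array([[0.6, 0.1, 0.1],
                            [0.1, 0.5, 0.1],
                            [0.1, 0.1, 0.6]])
      # on Wednesday, the administrative executives email each other more (assumed bump)
      B_wednesday = B_typical.copy()
      B_wednesday[0, 0] = 0.9

      # one network per day: Monday, Tuesday, Wednesday
      networks = [sbm(n=n, p=B_typical), sbm(n=n, p=B_typical), sbm(n=n, p=B_wednesday)]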

  7. Nov 2021
    1. 4.4.2.5. Diagonal augmentation https://github.com/microsoft/graspologic/blob/f9c4353488e29d367dec62fdb4729e6a7344fd89/graspologic/embed/ase.py#L58

      finish

    1. These approaches are each useful in different situations.

      I have no clue what this plot is exhibiting. What are x and y? Make the seed nodes more obvious (consider making them different shapes?). Consider adding the non-nominees in a light color (e.g., low-alpha gray?).

    2. plot.annotate(text="Higher-ranked nominations tend to be\n in the same group as the seed nodes", xytext=(.45, .15), xy=(.6, .5), arrowprops={"arrowstyle": "->", "color": "k", }); # TODO: add colorbar

      what is y? what is x?

    1. import numpy as np
       import matplotlib.pyplot as plt
       from graspologic.match import GraphMatch
       from graspologic.plot import heatmap

       gmp = GraphMatch()
       gmp = gmp.fit(A, B)
       B = B[np.ix_(gmp.perm_inds_, gmp.perm_inds_)]
       print("Number of adjacency disagreements: ", np.sum(abs(A - B)))
       fig, axs = plt.subplots(1, 3, figsize=(20, 20))
       heatmap(A, ax=axs[0], cbar=False, title='A [ER-NP(6, 0.3) Simulation]')
       heatmap(B, ax=axs[1], cbar=False, title='B [Unshuffled]')
       heatmap((A - B), ax=axs[2], cbar=False, title='A-B [Unshuffled]')

      add the scales?

    2. $0 \rightarrow 0$, $1 \rightarrow 1$, $2 \rightarrow 3$, $3 \rightarrow 2$

      there are two node correspondences, make this simpler perhaps or more straightforward. maybe talk about a mapping to permute F?

    3. o about solving this (more specific) mathematically? First, we need a metric that tells us how similar two networks are to each other. For graph matching, this similarity metric is defined as $f(A, B) = ||A - B||_F^2$ for unweighted adjacency matrices $A, B \in \mathbb{R}^{n \times n}$. In other words, $f(A, B)$ is the sum of the squared elementwise differences between $A$ and $B$. To understand this functionally, consider the best possible case where $A = B$, that is, the networks are identical. The difference will be a matrix of all zeros, and taking the squared norm will then yield $f(A, B) = 0$. If we remove one edge from $A$, then $f(A, B) = 1$. If we consider the worst possible case (every edge in $A$ does not exist in $B$, and vice versa), then $f(A, B) = n^2$. This metric effectively counts the number of adjacency disagreements between $A$ and $B$. Thus, we want to find the mapping where

      what is the equation? Show it. Give an actual example

      show A, B, |A - B|

      show when they are same, 1 edge different, and all edges different

      indicate frob norm as title
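      A minimal sketch of the kind of concrete example the notes above ask for (small hypothetical 3-node matrices; f is computed directly with numpy rather than any book helper):

      import numpy as np

      def f(A, B):
          # squared Frobenius norm of the difference; for 0/1 matrices this
          # counts the number of adjacency-matrix entries where A and B disagree
          return np.sum((A - B) ** 2)

      A = np.array([[0, 1, 1],
                    [1, 0, 1],
                    [1, 1, 0]])

      B_same = A.copy()       # identical networks: f = 0
      B_one = A.copy()
      B_one[0, 1] = 0         # one adjacency entry flipped: f = 1
      B_all = 1 - A           # every entry flipped (diagonal included): f = n^2 = 9

      print(f(A, B_same), f(A, B_one), f(A, B_all))   # 0 1 9
      # heatmaps of A, B, and (A - B), with f(A, B) in the title, could show the
      # "same / 1 edge different / all different" cases side by side as requested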

    1. Here, we will use a matrix $V$, which is largely similar in intuition to the latent position matrix $X$. The matrix $V$ will have $n$ rows (the number of nodes in each of the $N$ networks) and $d$ columns. Like for the RDPG, $d$ will again refer to the latent dimensionality of the collection of random networks. The matrix $V$ has the special property that the $d$ columns of $V$ are orthonormal. The matrix $V$ looks like this: \begin{align*} V &= \begin{bmatrix} \top & \top & & \top \\ \vec v_1 & \vec v_2 & ... & \vec v_d \\ \perp & \perp & & \perp \end{bmatrix} \end{align*} That $V$ is orthonormal means that, for any column $k$ of the $d$ total columns, $v_k^\top v_k = 1$, and for any pair of columns $k$ and $l$ where $k \neq l$, $v_k^\top v_l = 0$. For the COSIE model, we also have a collection of score matrices, $\left\{R^{(1)}, ..., R^{(N)}\right\}$. Each score matrix $R^{(m)}$ is a matrix with $d$ columns and $d$ rows which is symmetric. That $R^{(m)}$ is symmetric means that for any two latent dimensions $k$ and $l$, $R^{(m)}_{kl} = R^{(m)}_{lk}$.

      think with alex about what to talk about here in the context of a shared low-rank subspace V.
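      One possible concrete hook for that discussion: a numpy sketch of the shared-subspace structure, with an orthonormal $V$ common to all networks and a symmetric score matrix per network, combined as $V R^{(m)} V^\top$ (sizes and values are illustrative assumptions; the combined entries are not guaranteed to be valid probabilities, the point is only the structure):

      import numpy as np

      rng = np.random.default_rng(0)
      n, d, N = 20, 2, 3   # nodes, latent dimensions, number of networks (illustrative)

      # a shared orthonormal basis V (n x d): V^T V = I_d
      V, _ = np.linalg.qr(rng.normal(size=(n, d)))

      # one symmetric d x d score matrix per network
      Rs = []
      for m in range(N):
          S = rng.uniform(size=(d, d))
          Rs.append((S + S.T) / 2)   # symmetrize

      # each network shares the subspace V but has its own scores
      Ps = [V @ R @ V.T for R in Rs]

      print(np.allclose(V.T @ V, np.eye(d)))   # True: columns of V are orthonormal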

    1. Now, we’d like to order the rest of the vertices in this network by their degree of similarity to the seed nodes. Remember that we talked about two ways of doing this: we could find a separate set of nominations for each seed node, or we could find a single set of nominations for all of the seed nodes. Let’s start by finding a single set, using the centroid.

      lacking a sense of whether this actually picks up a useful signal in the context of the model? Consider putting an example as though you have true labels, and you want to study how well it did at picking up the actual trackers? I don't know
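      For reference while reworking this, a minimal sketch of the "single set of nominations from the centroid" idea, assuming an estimated latent position matrix X_hat and a list of seed node indices (both names are hypothetical, not book code):

      import numpy as np

      def nominate_from_centroid(X_hat, seed_idx):
          # centroid of the seed nodes' latent positions
          centroid = X_hat[seed_idx].mean(axis=0)
          # Euclidean distance from every node to that centroid
          dists = np.linalg.norm(X_hat - centroid, axis=1)
          # rank non-seed nodes from closest (best nomination) to farthest
          order = np.argsort(dists)
          return [i for i in order if i not in set(seed_idx)]

      # e.g., nominations = nominate_from_centroid(X_hat, seed_idx=[0, 5, 9])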

    2. ---------------------------------------------------------------------------
       ImportError                               Traceback (most recent call last)
       /tmp/ipykernel_1759/298421806.py in <module>
       ----> 1 from graphbook_code import networkplot
             2
             3 plot = networkplot(A, x=X[:, 0], y=X[:, 1], node_hue=nominations.flatten(),
             4                    palette="Purples", edge_alpha=0.05)
             5
       ImportError: cannot import name 'networkplot' from 'graphbook_code' (/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/graphbook_code/__init__.py)

      gg code failure

    3. like the Mahalanobis distance, which is essentially Euclidean distance but with a coordinate system and a rescaling defined by the covariance in your data.

      add something about Mahalanobis -- doesn't really come up elsewhere in the book?
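      Since the distance doesn't appear elsewhere, a short illustrative sketch of what it is (toy data, not book code): the Euclidean distance computed after rescaling by the inverse covariance of the data.

      import numpy as np

      rng = np.random.default_rng(0)
      data = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]])  # correlated toy data

      cov_inv = np.linalg.inv(np.cov(data, rowvar=False))

      def mahalanobis(u, v, cov_inv):
          diff = u - v
          return np.sqrt(diff @ cov_inv @ diff)

      u, v = data[0], data[1]
      print(mahalanobis(u, v, cov_inv))   # covariance-aware distance
      print(np.linalg.norm(u - v))        # plain Euclidean distance, for comparison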

  8. Oct 2021
    1. Remember that the latent position matrix $X$ has $n$ rows, with each row corresponding to a node in your network, and the dot product between rows $i$ and $j$ of $X$ is the probability that node $i$ will connect with node $j$. (Here, remember that $X$ is from the adjacency matrix, not the Laplacian). The block probability matrix $P$ has the edge probability between nodes $i$ and $j$ in the $(i, j)^{th}$ position. How would you generate $P$ from $X$?

      this is not how you generate P if you believe that the network is SBM. This is how you generate P if you believe the network is RDPG. If you think you have an SBM, you use RDPG only to estimate the community assignments, and then use regular SBM estimation with the assigned communities.
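      A minimal sketch of the RDPG reading of the passage (hypothetical latent positions; the SBM alternative from the note above is indicated in a comment):

      import numpy as np

      # hypothetical latent positions for a 4-node network, 2 latent dimensions
      X = np.array([[0.9, 0.1],
                    [0.8, 0.2],
                    [0.1, 0.9],
                    [0.2, 0.8]])

      P = X @ X.T   # under an RDPG, P[i, j] = dot product of rows i and j = edge probability
      print(P)

      # per the note above: if you instead assume an SBM, you would use X only to cluster
      # nodes into communities, then estimate each block probability as the mean of the
      # adjacency entries within that block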

    2. 6.2.3.1. The Latent Position Matrix Tells You About Edge Probabilities. The latent position matrix $X$ that we can find from spectral embedding is related to these edge probabilities: If you take the dot product (or the weighted sum) of row $i$ with row $j$, you'll get an estimate of the probability that nodes $i$ and $j$ have an edge between them. Incidentally, this means that the dot product between any two rows of the latent position matrix has to be bounded between 0 and 1.

      this is redundant; I would circle back to an explanation in the context of coin flip probabilities and back-link to the section on RDPGs

    1. We know now that we can use the latent position matrix $X$ to get a set of estimated edge probabilities for every node in a network. Our goal is to go the other direction - to start with "estimated edge probabilities", and end with a latent position.

      between the out of sample node and every other node in the network?
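      A hedged sketch of one standard way to "go the other direction" for an out-of-sample node: given its vector a of (estimated) edge probabilities or adjacencies to the in-sample nodes, solve a least-squares problem against the in-sample latent positions (the names X_hat and a are assumptions for illustration, not book code):

      import numpy as np

      def out_of_sample_position(X_hat, a):
          # find w minimizing ||a - X_hat @ w||: a least-squares latent position for a
          # new node, given its connections/probabilities a to the n in-sample nodes
          w, *_ = np.linalg.lstsq(X_hat, a, rcond=None)
          return w

      # e.g., w_new = out_of_sample_position(X_hat, a_new)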

  9. Sep 2021
    1. well, almost any - you wouldn’t want to make an embedding with more dimensions than your network!

      you can't "embed" into more dimensions than you have in any reasonable fashion? not relevant

    2. The latent position matrix $X$ that we've found from spectral embedding is related to these edge probabilities: If you take the dot product (or the weighted sum) of row $i$ with row $j$, you'll get an estimate of the probability that nodes $i$ and $j$ have an edge between them.

      say more? this is a huge concept. also, this has nothing to do with laplacian spectral embedding, so harmonize the two better.

    1. The main steps of a gradient descent method are choosing a suitable initial position (which can be chosen randomly), then gradually decreasing the cost function one step at a time, until the function is changing by a very small amount, converging to a minimum.

      I would also explain that you haven't found the global minimum somewhere
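      A tiny, self-contained sketch of those steps on a simple one-dimensional cost (illustrative only; not the book's graph-matching objective):

      def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
          x = x0                             # 1. suitable (possibly random) initial position
          for _ in range(max_iter):
              x_new = x - step * grad(x)     # 2. improve the cost one step at a time
              if abs(x_new - x) < tol:       # 3. stop when the change is very small
                  return x_new
              x = x_new
          return x

      # minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
      print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # approaches 3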

    2. Due to the one-to-one nature of these matchings, they are also known as bijections

      I would explain this term a lot more in depth since it's extremely crucial and comes up literally nowhere else in the book
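      In case it helps that explanation: a bijection between node sets can be written down as a permutation, e.g., the correspondence 0→0, 1→1, 2→3, 3→2 quoted earlier in these notes, expressed as a permutation array / permutation matrix (a sketch, not book code):

      import numpy as np

      perm = np.array([0, 1, 3, 2])    # node i of A corresponds to node perm[i] of B

      # the same bijection as a permutation matrix: exactly one 1 per row and per column
      P = np.eye(4, dtype=int)[perm]

      # applying the bijection to relabel B's nodes is B[np.ix_(perm, perm)], i.e. P @ B @ P.T
      print(P)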

    1. Finally, let's think about how to write down the generative model for the a priori DCSBM. We say that if $\tau_i = k'$ and $\tau_j = k$, $\mathbf a_{ij}$ is sampled independently from a $Bern(\theta_i \theta_j b_{k'k})$ distribution for all $j > i$. As we can see, $\theta_i$ in a sense is "correcting" the probabilities of each adjacency to node $i$ to be higher, or lower, depending on the value of $\theta_i$, than that which is given by the block probabilities $b_{\ell k}$. If $\mathbf A$ is an a priori DCSBM network with parameters $\vec\theta$ and $B$, we write that $\mathbf A \sim DCSBM_{n,\vec\tau}(\vec \theta, B)$.

      add a code example
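      A possible code example along those lines: sampling an a priori DCSBM directly from the written-out generative model with numpy (the community assignments, block probabilities, and degree-correction factors below are illustrative choices, not values from the book):

      import numpy as np

      rng = np.random.default_rng(0)
      n = 10
      tau = np.array([0] * 5 + [1] * 5)      # a priori community of each node
      B = np.array([[0.8, 0.2],
                    [0.2, 0.8]])             # block probability matrix
      theta = np.linspace(1.0, 0.5, n)       # degree-correction factor per node

      A = np.zeros((n, n))
      for i in range(n):
          for j in range(i + 1, n):          # j > i: undirected, no loops
              p_ij = theta[i] * theta[j] * B[tau[i], tau[j]]
              A[i, j] = A[j, i] = rng.binomial(1, p_ij)   # a_ij ~ Bern(theta_i theta_j b_{tau_i tau_j})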

  10. Aug 2021
    1. $\hat X = \texttt{AdjacencySpectralEmbedding}(A, d)$

      delete

    1. This helps give an intuition for why our embedding matrix gives a representation of our network. You can take columns of it, turn those columns into matrices, and sum those matrices, and get the best-possible estimation for the Laplacian for the network. That means the columns of our embedding matrix contain all of the information necessary to estimate the network!

      my main concern overall is there is just a lot of ambiguity over what's an estimate of what, and terminology wise and such. I would be really careful when using words "approximation", "estimate", and the names you give things and try to match them. I think approximation works for the network/laplacian itself, "rank k estimate" works for the probability true/expected laplacian, etc. approximations of realizations; estimates of parameters (properties of the actual random quantity).

    2. With Spectral Embedding specifically, you take a network's adjacency matrix, or optionally, its Laplacian matrix. Then, you find the eigenvectors corresponding to the $d$ largest eigenvalues, depending on how many dimensions ($d$) you'd like to embed your network down to. You scale those eigenvectors by their eigenvalues (or sometimes the square root of their eigenvalues). You'll then have a rectangular matrix, where the columns are the $d$ largest eigenvectors. The rows of that matrix will be the embedding for each node.

      I would mention that you are going to describe how this occurs, specifically, before just running off the algorithm
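      A bare-bones numpy sketch of exactly those steps (adjacency-matrix version, scaling by the square root of the eigenvalue magnitudes; this mirrors the description above rather than graspologic's actual implementation):

      import numpy as np

      def spectral_embed(A, d):
          # eigendecomposition of the (symmetric) adjacency matrix
          eigvals, eigvecs = np.linalg.eigh(A)
          # keep the d eigenvectors whose eigenvalues are largest in magnitude
          top = np.argsort(np.abs(eigvals))[::-1][:d]
          # scale each kept eigenvector by the square root of its |eigenvalue|
          return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

      # each row of spectral_embed(A, d) is the d-dimensional embedding of one node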

    1. Most machine learning methods require our data to be organized like this.

      I would make this sentence say less requires, and more that "if we want to leverage stuff we already know how to use, we can do X"

  11. Jul 2021
    1. Proceeding using the result we derived above, and using the fact that $\frac{d}{du} \log(u) = \frac{1}{u}$ and that $\frac{d}{du} \log(1 - u) = -\frac{1}{1 - u}$:

      break up more
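      In the spirit of "break up more", the step could be split into something like the following (a sketch assuming the usual ER log-likelihood with $m$ observed edges out of $\binom{n}{2}$ possible ones; adjust to match the exact expression derived above):

      \begin{align*}
      \frac{d}{dp}\,\ell(p)
        &= \frac{d}{dp}\left[ m \log p + \left(\binom{n}{2} - m\right)\log(1 - p) \right] \\
        &= \frac{m}{p} - \frac{\binom{n}{2} - m}{1 - p},
      \end{align*}

      and setting the derivative to zero gives $\hat p = m \big/ \binom{n}{2}$.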

  12. Jun 2021
    1. understatement. To actually figure out the number, think about the first node: there are two possibilities (weighted or not weighted), so you can generate two networks from a one-node model. Now, let's add an additional node. For each of the first two possibilities, there are two more - so there are $2 \times 2 = 4$ total possible networks. Every node that we add doubles the number of networks - and since a network with $n$ nodes has $n \times n$ edges, the total number of possible networks ends up being $2^{n \times n} = 2^{n^2}$! So this ten-node model can generate $2^{10^2} = 2^{100}$ networks, which is, when you think carefully, an absurdly, ridiculously big number.

      be specific about directed + loops

  13. Apr 2021
    1. This time, our samples $x_i$ take a different value depending on whether $i$ is male or female. This example is called the Gaussian Mixture (GM), and we have shown an example which is an equal-weight mixture of two Gaussians (one with mean 0, and the other with mean 5, both with variance 1). Under the GM, we allow samples to be taken from one of $K$ Gaussians with a weight parameter $\pi_k$, for each $k$ from 1 to $K$.

      "we have two communities instead of 1" change text to discussing communities and not mixture distributions

    1. We’re cheating a bit here because we know in advance which nodes belong to which community.

      I wouldn't state "we're cheating a bit"; I would say, "we didn't use this information, but it is just there to show you how useful/effective this unsupervised technique is."

  14. Mar 2021
    1. A common question as a scientist that we might have is how, exactly, we could describe the network in the simplest way possible.

      "We have some question, and data (network). Are people from the same school more connected than people from different schools. The data is X. That network is noisy, and we need quantitative procedures that account for the possibility of network uncertainty." Say specifically what the nodes and edges are. Be specific about what is uncertain. People are friends, but didn't add each other. People are not friends, but didn't remove each other (fight). In order to answer our question, we need to account for this uncertainty. Why not just use a t-test? It's a network, the assumptions for traditional statistical modeling don't make sense. It's one network, not n points in d-dimensions. Read ssg and statistical connectomics paper. Model ENTIRE network, not point-by-point. A realization is an entire network, an entirely new dataset. Make this point better in the example.

    1. We say that our network is a realization of a network model when we find a model that we think might reasonably describe the network generation process

      use the term model correctly; model => RV

    1. Suppose that $X$ only contains 0's and 1's. To interpret $XX^T$, remember from linear algebra that we're taking the weighted sum - or, in math parlance, the dot product - of each row with each other row, as shown below (where each of these vectors represent rows of $X$): $$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = 1\times 0 + 1\times 1 + 1\times 1 = 2$$ If there are two overlapping 1's in the same position of the left vector $i$ and the right vector $j$, then there will be an additional 1 added to their weighted sum. The resulting value, $XX^T_{i, j}$, will be equal to the number of shared locations in which vectors $i$ and $j$ both have ones.

      I like this information, but I think it belongs in a supplementary chapter or much, much earlier in the book than right here, because XX^\transpose comes up a LOT with RDPGs in general, and not just CASE. I think this would go in a "preliminaries" type of section alongside other basic linear algebra and probability/statistics results.
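      A two-line numpy check of the claim in the quoted passage, with a small hypothetical binary X: each entry of XX^T is the number of positions where the two corresponding rows both contain a 1.

      import numpy as np

      X = np.array([[1, 1, 1],
                    [0, 1, 1],
                    [1, 0, 0]])

      print(X @ X.T)   # e.g., entry (0, 1) is 2: rows 0 and 1 share ones in two positions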