Another potentially interesting feature of social networks, and other kinds of networks, is called clustering. There are various definitions of it, but the basic idea is: are my friends friends with each other? One way of measuring this is called the local clustering coefficient. Here's the formula: c_i = T_i / (d_i choose 2). In this formula, i is the node where you're computing the coefficient, T_i is the number of triangles having that node as a vertex, and d_i is the degree of the node.

Why triangles? Well, let's look at this node here, along with its neighbors; it's got degree 4. Each time a pair of its neighbors are also friends, you see we get a triangle: you get one there and one there. But these two nodes aren't friends, and so I don't get the triangle that I would otherwise. So triangles are just showing us the number of connections between friends, not including connections to me. The denominator, then, is the number of potential connections: because I have d_i friends, there are (d_i choose 2) possible edges between pairs of friends. So the clustering coefficient goes between 0 and 1. Returning to this node here, I've just got those two triangles, so the clustering would be 2 / (4 choose 2), which is 2/6, or one third.

Let's have a look at clustering in graphs. Here's a simple wheel graph to get us started. Now, as a manual computation: if I want to know the clustering at the hub here, node 0, then I have to count the edges among all of the neighbors of node 0, but not including the edges to node 0 itself. One way to do that automatically is to extract a subgraph involving those neighboring nodes. So remember, W0 is a list of the neighbors of node 0, and I can create a subgraph out of the nodes in that list. That's the same as the original graph without the central node. Then I can just ask for the number of edges of that subgraph, and it'll tell me that there are six edges. That's also the same as the number of triangles that have node 0 as a vertex.
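The manual computation above might be sketched like this. The exact graph size isn't stated in the recording; a wheel with six rim nodes matches the "six edges" count mentioned, so that's what this assumes.

```python
import networkx as nx

# A wheel graph: node 0 is the hub, joined to every node on the rim.
# Seven nodes total, so the hub has degree 6 (an assumption to match
# the six edges counted in the lecture).
G = nx.wheel_graph(7)

# Neighbors of the hub, and the subgraph they induce (hub excluded)
W0 = list(G.neighbors(0))
S = G.subgraph(W0)

print(S.number_of_edges())   # edges among the hub's neighbors -> 6
print(nx.triangles(G, 0))    # triangles with node 0 as a vertex -> 6
```

The two counts agree because every edge between two neighbors of node 0 closes exactly one triangle through node 0.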
You don't have to do all that manually every time: there is a function, clustering. If you give clustering a node, it will return the local clustering coefficient at that node. If you just call clustering on the graph without giving a node, then it'll give you a dictionary-like object that has the values for all of the nodes. Finally, there's an average_clustering function. You can get the list of all the clustering coefficients and then take the average yourself, or you can just have this function do it for you, if all you're interested in is the average.

Clustering in Erdős–Rényi graphs turns out to be pretty boring. If you think about the construction process, where every edge is included with equal probability, there's not going to be any tendency for your friends to build friendships with one another. So here I'm going to do an experiment with n and p fixed, over 400 different graphs. I realize each of those graphs, then compute the average clustering coefficient for that graph, keeping track in this list called results. We get something which is approximately binomial, or normally distributed; it can't be exactly normal, because the values can't go past 0 or 1, but it's pretty close to that. And you can see that the average value in the middle here is around 0.05, which is the same as the value of p, the probability of including each edge. Really, that's a theorem: since every edge is included with probability p, in the long run the fraction of edges in your neighbor subgraph is also expected to be p, and so p is the expected average clustering in ER graphs.

Now let's look at the clustering coefficients in the real-world twitch network. Here I'm just loading that network again. If I ask for the distribution of all the clustering coefficients, I see that a lot are 0 or very small. You can't quite tell from the plot which ones, but many of them do turn out to be exactly 0. The average value is about 0.131.
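The clustering functions and the ER experiment described above might look like the following sketch. The lecture fixes p at 0.05 and uses 400 graphs; the value n = 100 is an assumption, since the recording doesn't state it.

```python
import networkx as nx
import statistics

# The built-in clustering functions, demonstrated on a small graph
G = nx.wheel_graph(7)
print(nx.clustering(G, 0))       # local coefficient at one node: 6/15 = 0.4
coeffs = nx.clustering(G)        # dict-like: {node: coefficient}
print(nx.average_clustering(G))  # same as averaging coeffs.values()

# ER experiment: average clustering concentrates near p
n, p = 100, 0.05   # n = 100 is an assumed size, not from the lecture
results = []
for s in range(400):
    G_er = nx.erdos_renyi_graph(n, p, seed=s)
    results.append(nx.average_clustering(G_er))
print(statistics.mean(results))  # should land close to p = 0.05
```

The histogram of `results` is what the lecture plots: a roughly bell-shaped distribution centered near p.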
Now think about what an equivalent Erdős–Rényi graph would be. If you want it to have the same number of nodes and the same average degree, then that tells you what the probability p for that graph is, and p is the same thing as the expected average clustering coefficient. In this case, you see that the clustering values in the equivalent ER graph are about 100 times smaller than what we're actually seeing in the twitch graph. So there's no reason to think that the ER model of a random network has much to do with the way the twitch network was formed.

The Watts–Strogatz model was introduced specifically to try to model the small-world phenomenon. It's also a random graph construction, but with a more complicated set of rules. A Watts–Strogatz graph has three parameters. One of them is n, the number of nodes, as usual in our notation. Then k is an even integer, which is going to turn out to be the average degree. And q is a probability, so it should be between 0 and 1.

We start out with n nodes arranged in a circle. Let's say we start with the one at the top here. We're going to connect it to the k/2 closest neighbors on each side of it. So if, for example, k is equal to 4, I'm going to connect to the two neighbors closest on the right, in the clockwise direction, and the two closest on the left. And then we do that all the way around. So this one here would connect to two on the right and two on the left, where one of the connections has already been made, and so on, going all the way around. Once that's completed, that's the end of the first phase of the construction. Now you visit every edge in turn, and you decide, with probability q, that you're going to rewire that edge. Rewiring in this case means I replace this edge; let's say I'm going to replace that edge there.
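The "equivalent ER graph" comparison above can be sketched as a small helper. The function name `equivalent_er_p` is mine, and the demo graph is a stand-in, since the twitch network itself isn't loaded here.

```python
import networkx as nx

def equivalent_er_p(G):
    """Edge probability for an ER graph with the same number of nodes
    and the same average degree as G. In G(n, p) the expected degree is
    p*(n-1), so matching average degrees gives p = avg_degree / (n-1).
    This p is also the expected average clustering of that ER graph."""
    n = G.number_of_nodes()
    avg_degree = 2 * G.number_of_edges() / n
    return avg_degree / (n - 1)

# Demo on a stand-in graph; by construction the answer should be ~0.02
G = nx.erdos_renyi_graph(500, 0.02, seed=1)
print(equivalent_er_p(G))
```

Applied to the twitch network, this p comes out around a hundredth of the observed average clustering of 0.131, which is the mismatch the lecture points out.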
And I replace it with an edge that links anywhere else at random, to a node that I'm not already linked to. You do this all the way around for every existing edge. The idea is that, for the most part, you're connected to a local community, but sometimes you have a connection to somebody in a very different part of the graph. And it's this fraction of distant links which tends to reduce distances between pairs of nodes in the graph.

So here's a look at the Watts–Strogatz construction. If I set the probability q to 0, then this is just the initial configuration, where everybody is linked to their six closest neighbors. If I increase the rewiring probability, a few of those edges are flipped to cross-cutting connections. And if I keep increasing that probability, then more and more of them get flipped. Eventually, as the probability of rewiring gets close to 1, I get back to what's essentially an Erdős–Rényi graph again.

All right, let's take a look at small Watts–Strogatz graphs with k = 6, so average degree six. Now I'm going to let the rewiring probability q vary from almost 0 to almost 1. For each value of q, I'm going to do 50 experiments: I'm going to realize 50 different graphs and keep track of the average clustering coefficient for each of those 50 random graphs. Then I put all the results into a DataFrame and plot the distribution. Remember, for each value of q I've got 50 realizations, so there's a little bit of an error ribbon around here, but for the most part the values lie in a very small range. When q is small, we have that tightly knit community that we're starting with, so naturally the clustering is pretty high, because most of your neighbors are neighbors of each other as well.
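The q-sweep experiment above might be sketched as follows. The lecture uses 50 graphs per value of q with k = 6; the graph size n = 60 and the particular grid of q values are assumptions.

```python
import networkx as nx

# Sweep the rewiring probability q for small Watts-Strogatz graphs.
# n = 60 is an assumed size; k = 6 and 50 graphs per q follow the lecture.
n, k = 60, 6
qs = [i / 20 for i in range(1, 20)]   # q from 0.05 to 0.95

avg_by_q = []
for q in qs:
    vals = [nx.average_clustering(nx.watts_strogatz_graph(n, k, q, seed=s))
            for s in range(50)]
    avg_by_q.append(sum(vals) / len(vals))

# High clustering at small q, falling off as more edges are rewired
print(avg_by_q[0], avg_by_q[-1])
```

Plotting `avg_by_q` against `qs` (say, from a DataFrame, as in the lecture) shows the monotone drop described next.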
As q increases, and we randomly rewire more and more of the edges, the clustering number drops. So now let's repeat the same experiment for a graph the same size as the twitch network. Since the average degree in twitch is about 9.9, it's convenient to round that up to 10, so we'll use k = 10 in our Watts–Strogatz graph. I'm going to choose a smaller range of q values here, because this takes a few seconds, and I'm only going to look at one graph for each value of q, just to get an idea of what's going on with the average clustering. Again we're seeing the values drop. The numbers are different than before, because n and k are different, but the trend is the same: the values drop as q increases. Now, up above we found that the clustering of the actual twitch network is about 0.131, which is somewhere in between these two values of q; let's say it's about q = 0.42. So I'm going to do an experiment with q fixed now at that value, and I'm just going to take 10 different graphs and see if I do get an average clustering that's about the same as the twitch network's. And I do: it's very close, the same to two significant digits.
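The final fixed-q experiment might look like this. The lecture uses a graph the size of the twitch network; n = 2000 here is a smaller stand-in so the sketch runs quickly, while k = 10, q = 0.42, and the 10 realizations follow the lecture.

```python
import networkx as nx

# Fix q at the value that appeared to match the twitch clustering (~0.131)
# and average over several realizations. n = 2000 is a stand-in size.
n, k, q = 2000, 10, 0.42
vals = [nx.average_clustering(nx.watts_strogatz_graph(n, k, q, seed=s))
        for s in range(10)]
avg = sum(vals) / len(vals)
print(avg)   # should come out in the rough vicinity of 0.13
```

A rough sanity check: the ring lattice with k = 10 has clustering 3(k-2)/(4(k-1)) ≈ 0.67, and rewiring with q = 0.42 scales that by roughly (1-q)^3 ≈ 0.20, landing near 0.13.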
Clustering coefficients
From Tobin Driscoll, April 28, 2022