I've been exploring social network analysis concepts using some sample data from livejournal.com. I'll follow up with another post containing code snippets from this analysis, but here are some concepts I've learned so far:
The LiveJournal data was gathered using a BFS-like algorithm called Snowball Sampling.
Snowball Sampling: uses a small pool of initial informants to nominate, through their social networks, other participants who meet the eligibility criteria and could potentially contribute to a specific study.
Pseudocode:
# start with the central node
# get friends of the central node
# for every friend:
#     sample friends of friends
#     for every friend of friend:
#         sample friends of friends of friends
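As a rough illustration, the pseudocode above can be written as a breadth-first walk with a depth limit. This is only a sketch, not the actual crawler: the names `snowball_sample`, `lookup`, and `toy_friends` are made up for this example, and the toy dictionary stands in for real LiveJournal friend lookups.

```python
from collections import deque


def snowball_sample(get_friends, seed, max_depth=3):
    """Breadth-first snowball sample of everyone within max_depth hops of seed.

    get_friends is any callable returning a list of friends for a person
    (hypothetical here; a real crawler would query the site instead).
    """
    visited = {seed}
    edges = []
    queue = deque([(seed, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand people sitting on the sampling horizon
        for friend in get_friends(person):
            edges.append((person, friend))
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, depth + 1))
    return visited, edges


# Made-up friendship data standing in for real lookups.
toy_friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave", "frank"],
    "frank": ["erin"],
}


def lookup(person):
    return toy_friends.get(person, [])


nodes, edges = snowball_sample(lookup, "alice", max_depth=2)
print(sorted(nodes))
```

With `max_depth=2` the sample stops at friends-of-friends; the default of 3 goes one level further, matching the three levels in the pseudocode.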
*The data is stored in Pajek format, a simple text file format. The Python NetworkX library provides useful analysis methods, including support for reading and writing Pajek files.
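Here is a minimal sketch of a Pajek round trip with NetworkX, assuming a tiny made-up graph and a temporary file named `friends.net` (both are placeholders, not the real dataset). It uses `nx.write_pajek` and `nx.read_pajek`; the latter returns a multigraph, so the sketch converts it back to a simple graph.

```python
import os
import tempfile

import networkx as nx

# Small made-up friendship graph for illustration.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("carol", "dave"),
])

# Write the graph to a Pajek text file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "friends.net")
nx.write_pajek(G, path)

# read_pajek returns a MultiGraph; collapse it to a simple Graph for analysis.
H = nx.Graph(nx.read_pajek(path))
print(H.number_of_nodes(), H.number_of_edges())
```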
Snowball Sampling has its advantages and disadvantages. One thing to keep in mind is that humans have a limited sense of perception in social networks, a phenomenon known as the Horizon of Observability. In other words, we have a good idea of who our friends are, less insight into our friends-of-friends, and little or no knowledge of our friends-of-friends-of-friends.
One of the first approaches to analyzing social networks is to measure the power, influence, or other characteristics of people based on their connections.
Degree Centrality: The number of connections a node has.
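Closeness Centrality: How close a node is to every other node in the network, usually measured as the inverse of its average shortest-path distance to the others. Nodes that sit a short distance from high-degree "celebrity" individuals tend to score well.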
Boundary Spanners: Nodes that act as bridges between two or more communities that otherwise wouldn't be able to communicate with one another.
Eigenvector Centrality: A node is central to the extent that it is connected to others who are central. In other words, a node that is high on eigenvector centrality is connected to many other nodes that are themselves connected to many others.
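To make these measures concrete, here is a small NetworkX sketch on a made-up graph of two triangles joined by a single edge. Note that `nx.closeness_centrality` uses the standard definition based on shortest-path distance to all other reachable nodes. Using articulation points (nodes whose removal disconnects the graph) as a stand-in for boundary spanners is my own simplification, not a standard definition.

```python
import networkx as nx

# Two tight groups (triangles) connected only through the edge c-d.
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("b", "c"),  # first community
    ("c", "d"),                          # the only link between communities
    ("d", "e"), ("d", "f"), ("e", "f"),  # second community
])

# Degree centrality: fraction of the other nodes each node is connected to.
degree = nx.degree_centrality(G)

# Closeness centrality: based on shortest-path distances to all other nodes.
closeness = nx.closeness_centrality(G)

# Eigenvector centrality: high when a node's neighbours are themselves central.
eigen = nx.eigenvector_centrality(G, max_iter=1000)

# Assumption: treat articulation points as a rough proxy for boundary spanners.
spanners = sorted(nx.articulation_points(G))

for node in sorted(G):
    print(node, round(degree[node], 3), round(closeness[node], 3), round(eigen[node], 3))
print("possible boundary spanners:", spanners)
```

In this toy graph, `c` and `d` come out on top for all three measures and are also the only articulation points, which is what you would expect from nodes holding two communities together.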
PageRank is similar to eigenvector centrality, but the algorithm scales much better to very large networks that change over time. PageRank is an iterative process, known as an anytime algorithm: it can be stopped after any number of iterations and still yield a usable, if rougher, estimate.
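The iterative nature of PageRank is easier to see in code. Below is a minimal power-iteration sketch in plain Python, not the implementation any particular library uses. The function name `pagerank`, the `links` dictionary, and the damping value of 0.85 (the commonly quoted default) are assumptions for this example. The sketch also assumes every link target appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Power-iteration PageRank over a dict of node -> list of outgoing links.

    Assumes every link target is also a key in links.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform guess
    for _ in range(iterations):
        # Every node receives a small baseline share each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # Split this node's damped rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up directed link graph for illustration.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

rough = pagerank(links, iterations=3)     # stopped early: rough estimate
refined = pagerank(links, iterations=50)  # more iterations: closer to convergence
print(max(refined, key=refined.get))
```

Each pass keeps the ranks summing to 1, so stopping early still gives a valid (just less settled) ranking, which is the "anytime" property in practice.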
Centrality metrics are point measures on the network. They do not tell us why a high-centrality visionary is surrounded by followers, or what forces bring people together or tear them apart. In order to understand those concepts further, we need to look deeper, at structures such as cliques and clusters.