Social Network Analysis #2


A brief introduction [1]

Some writers use the concept of ‘networks’ to refer to a form of organisation that is distinct from bureaucratic hierarchies. Social network analysis (SNA) is focused on uncovering the patterning of people’s interaction. Network analysis is based on the intuitive notion that these patterns are important features of the lives of the individuals who display them. Network analysts believe that how an individual lives depends in large part on how that individual is tied into the larger web of social connections. Many believe, moreover, that the success or failure of societies and organizations often depends on the patterning of their internal structure.

Beginning in the 1930s, a systematic approach to theory and research began to emerge, influenced by Wolfgang Köhler’s ‘gestalt’ theory. This led to Jacob Moreno’s work on the ideas and tools of sociometry and the sociogram[2], and to studies of ‘group dynamics’ and of information flows through groups. Another influence was the British social anthropologist Radcliffe-Brown, who emphasised the importance of informal, interpersonal relations in social systems. And at the end of World War II, Alex Bavelas founded the Group Networks Laboratory at M.I.T.

From the outset, the network approach to the study of behaviour has involved two commitments:

  • it is guided by formal theory organized in mathematical terms, and
  • it is grounded in the systematic analysis of empirical data.

SNA is widely known for its visualisation techniques; the diagrams are technically known as ‘graphs’. Graph theory developed rapidly once relatively powerful computers became readily available, and the study of social networks then began to take off as an interdisciplinary specialty. Its growth has been boosted further since computers became interlinked, so that today SNA is used to study:

  • organizational behaviour,
  • inter-organizational relations,
  • the spread of contagious diseases,
  • mental health,
  • social support,
  • the diffusion of information, and
  • animal social organization.

Network analysts do use a specialized language for describing the structure and contents of the sets of observations that they use. But network data can also be described and understood using the ideas and concepts of more familiar methods, such as cross-sectional survey research. A few terms are worth defining here.

Networks consist of Nodes or ‘Vertices’ (in geometry, a vertex is the meeting point of two lines that form an angle), and these may be individuals, social roles, organisations: anything that can enter into the type of relations being studied. The Relations (or ‘ties’ or ‘connections’) may also be defined in different ways (exchanges of resources, friendships, sexual contact, kinship, co-operation), although it is always necessary to specify precisely how relations are being defined in the context of an analysis. Relations may be directed or undirected, and directed relations may be reciprocated or unreciprocated; they may also be weighted (valued), much as survey answers are often scored on a scale. The connecting lines are called ‘edges’ when undirected and ‘arcs’ when directed, but there is a great deal of nomenclature which we will pass over briefly.[3]
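To make the vocabulary concrete, here is a minimal Python sketch (our illustration, not taken from the sources) of one way ties might be stored: an edge list of (source, target, weight) triples turned into a square adjacency matrix. The function name adjacency_matrix and the weighted ties are hypothetical. For directed relations (arcs) only the chooser's row is filled in; for undirected relations (edges) the matrix is mirrored so that it stays symmetric.

```python
# A minimal sketch: ties stored as (source, target, weight) triples and
# turned into a square adjacency matrix. Names and weights are hypothetical.


def adjacency_matrix(actors, ties, directed=True):
    """Build an n x n matrix from an edge list.

    Directed relations (arcs): only cell [source][target] is filled in.
    Undirected relations (edges): the mirror cell is filled in too,
    so the matrix stays symmetric.
    """
    index = {name: i for i, name in enumerate(actors)}
    n = len(actors)
    matrix = [[0] * n for _ in range(n)]
    for source, target, weight in ties:
        i, j = index[source], index[target]
        matrix[i][j] = weight
        if not directed:
            matrix[j][i] = weight
    return matrix


actors = ["Bob", "Carol", "Ted", "Alice"]
# Hypothetical weights, e.g. how many times a week one actor contacts another.
ties = [("Bob", "Ted", 3), ("Ted", "Alice", 1), ("Alice", "Bob", 2)]

arcs = adjacency_matrix(actors, ties, directed=True)
edges = adjacency_matrix(actors, ties, directed=False)

for name, row in zip(actors, arcs):
    print(f"{name:<6}", row)
```

Setting every weight to 1 gives the plain present/absent (binary) version of the same network.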

In conventional (survey) data, the fundamental data structure is a rectangular array of cases by variables. It leads us to compare how actors are similar or dissimilar to each other across attributes (by comparing rows) or, perhaps more commonly, to examine how variables are similar or dissimilar to each other in their distributions across actors (by comparing or correlating columns).

“Network” data (in their purest form) consist of a square array of measurements: a Matrix…nothing to do with the film! The rows of the array are the cases, or subjects, or observations. The columns of the array are (and note the key difference from conventional data) the same set of cases, subjects, or observations. Each cell of the array describes a relationship between two actors, as in this diagram on ‘Who likes whom’. For some reason everyone who makes these diagrams uses the names Bob, Carol, Ted and Alice from the cheesy 60s film Bob & Carol & Ted & Alice (and how useless are Alice’s responses: she even likes herself).

Who likes whom? (rows = chooser, columns = person chosen; X = likes, 0 = does not like, - = blank)

Chooser   Bob   Carol   Ted   Alice
Bob        -      0      X      X
Carol      X      -      0      X
Ted        0      X      -      X
Alice      X      0      0      X

The first major emphasis of network analysis is seeing how actors are located or “embedded” in the overall network.  A network analyst is also likely to look at the data structure in a second way — holistically.

The analyst might note that there are about equal numbers of ones and zeros in the matrix. This suggests that there is a moderate “Density” of liking overall. The analyst might also compare the cells above and below the diagonal to see if there is reciprocity in choices (e.g. Bob chose Ted, did Ted choose Bob?).
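The density and reciprocity checks just described can be sketched in a few lines of Python. This assumes one reading of the ‘Who likes whom?’ table (X counted as 1, and each person's choice of themselves ignored) and uses one common definition of reciprocity, the share of connected pairs in which the choice is returned; other definitions exist, such as counting reciprocated arcs rather than pairs.

```python
# One reading of the 'Who likes whom?' table (an assumption): X -> 1,
# 0 -> 0, and each person's choice of themselves is ignored.
actors = ["Bob", "Carol", "Ted", "Alice"]
likes = [
    [0, 0, 1, 1],  # Bob chooses Ted and Alice
    [1, 0, 0, 1],  # Carol chooses Bob and Alice
    [0, 1, 0, 1],  # Ted chooses Carol and Alice
    [1, 0, 0, 0],  # Alice chooses Bob (her self-choice is dropped)
]


def density(m):
    """Directed density: observed ties divided by the n * (n - 1) possible ones."""
    n = len(m)
    ties = sum(m[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def reciprocity(m):
    """Share of connected pairs (at least one choice) in which the choice is returned."""
    n = len(m)
    mutual = connected = 0
    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] or m[j][i]:
                connected += 1
                if m[i][j] and m[j][i]:
                    mutual += 1
    return mutual / connected if connected else 0.0


print(f"density     = {density(likes):.3f}")
print(f"reciprocity = {reciprocity(likes):.3f}")
```

Under this reading, 7 of the 12 possible directed ties are present (a density of about 0.58, the "moderate" density mentioned above), and only one of the six connected pairs, Bob and Alice, is mutual. Bob chose Ted, but Ted did not choose Bob.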

The second major emphasis of network analysis is seeing how the whole pattern of individual choices gives rise to more holistic patterns.

Network analysts look at the data in some rather fundamentally different ways. Rather than thinking about how an actor’s ties with other actors describe the attributes of “ego,” network analysts instead see a structure of connections within which the actor is embedded. Actors are described by their Relations, not by their Attributes, and the relations themselves are just as fundamental as the actors that they connect.
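Describing an actor by their relations can start with simply counting ties. The Python sketch below (hypothetical function names, and the same assumed reading of the ‘Who likes whom?’ table as 1s and 0s with self-choices dropped) computes each actor's out-degree (choices made, a row sum) and in-degree (choices received, a column sum).

```python
# Same assumed reading of the 'Who likes whom?' table as 1s and 0s.
actors = ["Bob", "Carol", "Ted", "Alice"]
likes = [
    [0, 0, 1, 1],  # Bob
    [1, 0, 0, 1],  # Carol
    [0, 1, 0, 1],  # Ted
    [1, 0, 0, 0],  # Alice (self-choice dropped)
]


def out_degree(m, i):
    """Number of choices actor i makes (sum of row i)."""
    return sum(m[i])


def in_degree(m, j):
    """Number of choices actor j receives (sum of column j)."""
    return sum(row[j] for row in m)


for k, name in enumerate(actors):
    print(f"{name:<6} out = {out_degree(likes, k)}  in = {in_degree(likes, k)}")
```

Alice makes only one choice but receives three: a description of her position in the network rather than one of her attributes.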

The major difference between conventional and network data is that conventional data focus on Actors and Attributes; network data focus on Actors and Relations. The difference in emphasis is consequential for the choices that a researcher must make in deciding on research design, in conducting sampling, developing measurement, and handling the resulting data.

So, to recap a little: network data have two parts, nodes (or actors) and edges (or relations). The big difference between SNA and more conventional surveys is that surveys are Monadic while SNA is Dyadic. A survey asks an individual (a Monad) what their opinions are, so it deals with attributes; SNA is Dyadic (more Greek), dealing with pairs of actors, and so it is relational.
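Because the unit of analysis is the pair rather than the individual, a natural first step is to go through every dyad. This Python sketch (hypothetical function name, same assumed reading of the liking table) sorts the n(n−1)/2 pairs into mutual, asymmetric and null ties, a tally usually called a dyad census.

```python
from itertools import combinations

# Same assumed reading of the 'Who likes whom?' table as 1s and 0s.
actors = ["Bob", "Carol", "Ted", "Alice"]
likes = [
    [0, 0, 1, 1],  # Bob
    [1, 0, 0, 1],  # Carol
    [0, 1, 0, 1],  # Ted
    [1, 0, 0, 0],  # Alice (self-choice dropped)
]


def dyad_census(names, m):
    """Classify every unordered pair as mutual, asymmetric or null."""
    census = {"mutual": [], "asymmetric": [], "null": []}
    for i, j in combinations(range(len(names)), 2):
        pair = (names[i], names[j])
        if m[i][j] and m[j][i]:
            census["mutual"].append(pair)      # both choose each other
        elif m[i][j] or m[j][i]:
            census["asymmetric"].append(pair)  # only one chooses the other
        else:
            census["null"].append(pair)        # neither chooses the other
    return census


for kind, pairs in dyad_census(actors, likes).items():
    print(kind, len(pairs), pairs)
```

With four actors there are six pairs: one mutual (Bob and Alice), five asymmetric and none null.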

The nodes or actors included in non-network studies tend to be the result of independent probability sampling. Network studies are much more likely to include all of the actors who occur within some (usually naturally occurring) boundary. Often network studies don’t use “samples” at all, at least in the conventional sense. Rather, they tend to include all of the actors in some population or populations. Of course, the populations included in a network study may be a sample of some larger set of populations. For example, when we study patterns of interaction among students in a classroom, we include all of the children in the classroom (that is, we study the whole population of the classroom). The classroom itself, though, might have been selected by probability methods from a population of classrooms (say, all of those in a school).

Most commonly, network analysts will identify some population and conduct a census (i.e. include all elements of the population as units of observation). A network analyst might examine all of the nouns and objects occurring in a text, all of the persons at a birthday party, all members of a kinship group, of an organization, neighborhood, or social class.

The boundaries of the populations studied by network analysts are of two main types. Probably most commonly, the boundaries are those imposed or created by the actors themselves. All the members of a classroom, organization, club, neighborhood, or community can constitute a population. These are naturally occurring clusters, or networks. So, in a sense, social network studies often draw the boundaries around a population that is known, a priori, to be a network. Alternatively, a network analyst might take a more “demographic” or “ecological” approach to defining population boundaries. We might draw observations by contacting all of the people who are found in a bounded spatial area, or who meet some criterion. Here, we might have reason to suspect that networks exist, but the entity being studied is an abstract aggregation imposed by the investigator — rather than a pattern of institutionalized social action that has been identified and labeled by its participants.

The network analyst tends to see individual people nested within networks of face-to-face relations with other persons. Most social network analysts think of individual persons as being embedded in networks that are embedded in networks that are embedded in networks. Such structures are described as “multi-modal”. In our school example, individual students and teachers form one mode, classrooms a second, schools a third, and so on. A data set that contains information about two types of social entities (say, persons and organizations) is a two-mode network.
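Two-mode data can be stored as a rectangular person-by-organisation matrix, and each mode can then be turned into an ordinary one-mode network by matrix multiplication. The Python sketch below uses hypothetical students and clubs: multiplying the membership matrix by its transpose counts the clubs each pair of students shares, and the transpose times the matrix counts the members each pair of clubs shares.

```python
# Hypothetical two-mode data: rows are students, columns are clubs,
# and a 1 means that the student belongs to that club.
students = ["Ann", "Ben", "Cat"]
clubs = ["Chess", "Drama"]
membership = [
    [1, 0],  # Ann
    [1, 1],  # Ben
    [0, 1],  # Cat
]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def multiply(a, b):
    """Plain matrix product a x b for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Student-by-student: cell [i][k] = number of clubs students i and k share.
people = multiply(membership, transpose(membership))
# Club-by-club: cell [j][l] = number of students clubs j and l share.
club_overlap = multiply(transpose(membership), membership)

print(people)
print(club_overlap)
```

The diagonal cells hold each student's number of memberships and each club's size; the off-diagonal cells are the ties of the two derived one-mode networks.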

One advantage of network thinking and method is that it naturally predisposes the analyst to focus on multiple levels of analysis simultaneously. That is, the network analyst is always interested in how the individual is embedded within a structure and how the structure emerges from the micro-relations between individual parts. The ability of network methods to map such multi-modal relations is, at least potentially, a step forward in rigor.

Visualisation

Visualisation alone can form the basis for an analysis of a network.  A ‘Dendrogram’ is a tree diagram (Dendron is the Greek word for tree) frequently used to illustrate the arrangement of the clusters produced by a clustering algorithm. Wikipedia is very good on all this and is a good place to start.

For a clustering example, suppose a set of data points is to be clustered using the distance between them as the metric. In the resulting dendrogram, the top row of ‘nodes’ represents the data, the remaining nodes represent the clusters to which the data belong, and the height at which two branches join represents the distance between them.
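To show what a dendrogram actually records, here is a small Python sketch of agglomerative clustering with single linkage (one of several linkage rules; complete and average linkage are common alternatives). The labelled one-dimensional values are hypothetical. The sequence of merges, and the distance at which each happens, is what a dendrogram draws.

```python
from itertools import combinations


def single_linkage(points):
    """Agglomerative clustering with single linkage.

    points maps a label to a 1-D value. Returns the merges in the order
    they happen as (left_labels, right_labels, distance) tuples.
    """
    clusters = [frozenset([label]) for label in points]

    def dist(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(abs(points[x] - points[y]) for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and fuse them.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        d = dist(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b), d))
    return merges


# Hypothetical data: six labelled values on a line.
data = {"a": 1.0, "b": 2.0, "c": 6.0, "d": 7.5, "e": 8.0, "f": 15.0}
merges = single_linkage(data)
for left, right, d in merges:
    print(left, "+", right, "at", d)
```

Here d and e join first (0.5), then a and b (1.0), then c joins d and e (1.5), the two groups meet at 4.0, and the outlier f joins last at 7.0; drawn as a tree, those distances are the heights of the branch points.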

OK, time to drag out the manual for UCINET 5 for Windows: Software for Social Network Analysis. It says you can enter data into UCINET using a Word file; all you need to do is save it as MS-DOS text.

The file consists of nothing more than a set of numbers (the data) preceded by a few keywords that describe the data:

dl n = 4
format = fullmatrix
data:
0 1 1 0
1 0 1 1
1 1 0 0
0 1 0 0

  • You have to put DL at the top: it tells the program that this is a DL (‘data language’) file.
  • The phrase n = 4 tells it that the matrix has 4 rows and 4 columns (you don’t even need the = sign).
  • The phrase ‘format = fullmatrix’ indicates that the data are entered as an ordinary matrix and not something fancy.
  • The ‘data:’ keyword indicates that there is no more information about the data and that what follows is the data itself.

You have to watch the order: if you wrote dl data: n = 4, the program would blow up (no kidding, that’s what it says in the manual).
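The DL layout is simple enough to generate or read from a script. The Python sketch below (hypothetical function names to_dl and from_dl) handles only the minimal fullmatrix case described here; real DL files support further keywords, such as labels and other formats, which it ignores. It writes the header keywords in the order given above and, when reading, accepts either n = 4 or n 4.

```python
def to_dl(matrix):
    """Write a square matrix as a minimal DL 'fullmatrix' file (keywords first)."""
    n = len(matrix)
    lines = [f"dl n = {n}", "format = fullmatrix", "data:"]
    lines += [" ".join(str(value) for value in row) for row in matrix]
    return "\n".join(lines) + "\n"


def from_dl(text):
    """Read back the minimal layout written by to_dl; not a general DL parser."""
    header, found, body = text.lower().partition("data:")
    if not found:
        raise ValueError("no 'data:' keyword")
    words = header.replace("=", " ").split()  # so 'n = 4' and 'n 4' both work
    if not words or words[0] != "dl":
        raise ValueError("file must start with dl")
    n = int(words[words.index("n") + 1])  # ValueError if n is not in the header
    values = [int(v) for v in body.split()]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, got {len(values)}")
    return [values[i * n:(i + 1) * n] for i in range(n)]


example = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
text = to_dl(example)
print(text)
print(from_dl(text) == example)
```

This little reader also refuses dl data: n = 4, because it expects n to appear in the header before data:.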

So, to use UCINET 5, you make a data matrix (which in theory you can do in Word, although it also accepts spreadsheets), save it to a file and then open up the software. UCINET lets you choose among the different ways the software can render (and interpret) the data, rather like SPSS: you select the dataset, click a button and out it comes; it’s that simple. You can ask it to find ‘cliques’ and ‘hangers on’: the terminology is quite familiar…to any university student.
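To see what "find cliques" means computationally, here is a Python sketch of the Bron–Kerbosch algorithm (the basic version, without pivoting), which lists every maximal clique: every group in which all pairs are tied and to which no further actor can be added. It runs on the four-actor matrix from the DL example, with actors numbered 0 to 3. It illustrates the idea only; how UCINET implements or filters its own clique search is not assumed here.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting: yield each maximal clique as a sorted list."""
    n = len(adj)
    neighbours = [{j for j in range(n) if j != i and adj[i][j]} for i in range(n)]

    def expand(r, p, x):
        # r: current clique, p: candidates that could extend it,
        # x: vertices already tried (used to avoid reporting non-maximal cliques).
        if not p and not x:
            yield sorted(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & neighbours[v], x & neighbours[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(range(n)), set())


# The undirected four-actor matrix from the DL example.
example = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
print(sorted(maximal_cliques(example)))
```

Actors 0, 1 and 2 form a three-person clique; actor 3 is tied only to actor 1, so that pair forms a two-person maximal clique of its own. Which minimum size counts as a "clique" is a choice the analyst makes.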

Additional Information

  • There’s a general introduction here:

http://www.analytictech.com/networks/intro/index.html

  • You can download the software here:

http://www.analytictech.com/ucinet/ucinet.htm

  • There’s a tutorial here (along with the handbook):

http://faculty.ucr.edu/~hanneman/nettext/

UCINET (from Scott, J. (2006) Social Network Analysis, Sage)

UCINET was produced by a group of network analysts at the University of California, Irvine (UCI). The current development team is Stephen Borgatti, Martin Everett and Linton Freeman. It began as a set of modules written in BASIC, progressed to an integrated DOS program, and is now available as a Windows program. It is a general-purpose, easy-to-use program that covers the basic graph theoretical concepts, positional analysis and multidimensional scaling. It is, in my opinion, the best of the currently available programs and the one that is most accessible for the novice user. The program will run on virtually any modern PC, trading off speed against the ability to handle large data sets. It can handle up to 500 points for basic clique procedures, though procedures such as multidimensional scaling can be run only on smaller networks.

The data files are in matrix format and consist of simple alphanumeric files. The rows in a data file represent the rows in an incidence or adjacency matrix, but a header row contains details on the number of rows and columns and the labels to be used for them. The program contains in-built procedures for converting earlier UCINET data files, and it will also convert STRUCTURE and NEGOPY files into UCINET format. In addition to exporting in various formats, a number of conversion utilities are provided to allow UCINET to feed, almost seamlessly, into other social network analysis programs.

As well as a series of commands for file management and setting program options, the menu bar has four principal options: DATA, TRANSFORM, NETWORK and TOOLS. The DATA and TRANSFORM options together allow most of the basic data management tasks to be carried out: inputting, transforming and exporting are all handled in this way.

The easiest way to produce data files is by using the intuitive and built-in, spreadsheet-style data entry system, which is accessible from the DATA menu or from a button on the tool bar. This uses a linked list format that shows, for each point, the code numbers of all the other points to which it is connected. As well as entering and editing through the UCINET spreadsheet, it is possible to import (and export) data from EXCEL worksheets. The data file can be edited after the initial data entry, and various permutations and transformations can be performed on it so as to identify subsets for further analysis. For example, the rows and the columns can be permuted, sorted, or transposed, or the weightings of lines can be altered. This latter procedure – termed ‘dichotomizing’ the matrix – makes it easy to prepare a series of data files for use in the analysis of, for example, nested components.
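Dichotomizing is easy to picture in code. In this Python sketch the contact counts are hypothetical, and the rule "1 if the value is greater than the cut-off, otherwise 0" is an assumption (other comparison rules are possible). Raising the cut-off yields a nested series of ever sparser binary networks, the kind of series used in the analysis of nested components.

```python
# Hypothetical valued matrix: how often each pair of four actors met.
contacts = [
    [0, 3, 1, 0],
    [3, 0, 2, 0],
    [1, 2, 0, 1],
    [0, 0, 1, 0],
]


def dichotomize(m, cutoff):
    """Binary matrix: 1 where a value is strictly greater than cutoff (assumed rule)."""
    return [[1 if value > cutoff else 0 for value in row] for row in m]


def tie_count(m):
    """Total number of 1s in a binary matrix."""
    return sum(map(sum, m))


# Each higher cut-off keeps only a subset of the ties kept at a lower one.
for cutoff in (0, 1, 2):
    binary = dichotomize(contacts, cutoff)
    print(cutoff, tie_count(binary), binary)
```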

The principal social network analysis procedures are found under the NETWORK menu, where there are sub-menus for COHESION, COMPONENTS, CENTRALITY, SUBGROUPS, ROLES & POSITIONS, and various more specialized procedures. COHESION gives access to basic line calculations of paths, distances and geodesics, and a separate PROPERTIES menu allows the calculation of density. CENTRALITY is the venue for all the various measures of degree, closeness, betweenness and other approaches to centrality and prominence. The SUBGROUPS menu gives access to a number of powerful techniques for the detection of n-cliques, n-clans and k-plexes, while the COMPONENTS option detects simple components, cyclic components and k-cores. Complementing these graph theoretical measures are the measures for structural equivalence that are found under ROLES & POSITIONS. Here it is possible to run both CONCOR and REGE, as well as other algorithms for positional analysis. Finally, the TOOLS option is used for metric and non-metric multidimensional scaling, cluster analyses, factor analysis and correspondence analysis. The output from these procedures can be plotted on screen as scatter diagrams or dendrograms. These will be quite suitable for many purposes, though proper visual inspection of sociograms means transferring the output into a more specialist program.
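Paths, distances and geodesics all rest on shortest paths. For a binary (unweighted) network these can be found by breadth-first search. The Python sketch below is our illustration of the idea (hypothetical function name), not UCINET's own code.

```python
from collections import deque


def geodesic_distances(adj):
    """Shortest-path (geodesic) lengths between all pairs, by breadth-first search.

    adj is a 0/1 adjacency matrix; a tie from row i to column j is followed
    in that direction only. Unreachable pairs are reported as None.
    """
    n = len(adj)
    table = []
    for source in range(n):
        dist = [None] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[v] is None:
                    dist[v] = dist[u] + 1  # first visit is along a shortest path
                    queue.append(v)
        table.append(dist)
    return table


# The four-actor matrix from the DL example earlier in this post.
example = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
for row in geodesic_distances(example):
    print(row)
```

Actor 3 is two steps from actors 0 and 2 because every path to them runs through actor 1.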


[1] The clever stuff is taken from Robert A. Hanneman and Mark Riddle (2005) Introduction to social network methods <http://faculty.ucr.edu/~hanneman/nettext/>, John Scott’s (2006) Social Network Analysis and Nick Crossley’s (2007) Social networks and Extraparliamentary Politics, with other sources cited at the end.

[2] A way of representing the formal properties of social configurations.

[3] Analysts have derived different ways of measuring the properties of networks. The ‘Density’ of the network, for example, is calculated by dividing the actual number of connections within it by the maximum number of possible connections (giving a number between 0 and 1). UCINET measures this for you (Network > Cohesion > Density). ‘Cliques’ are clusters of individuals who all know each other, so that their ‘density’ as a cluster is 1. UCINET also measures this (Network > Subgroups > Cliques). ‘Isolates’ are those with no close ties to others in a group.
