Teach/Me Data Analysis

You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

See also: Pearson's Correlation Coefficient

Ordinal Association

In statistics, rank correlation is the study of relationships between different rankings on the same set of items. It deals with the measurement of correspondence between two rankings, and the calculation of the significance of the correspondence.

Kruskal's gamma

Gamma, also called Goodman and Kruskal's gamma, is a symmetric measure that ranges from -1 to +1 and is based on the difference between the number of concordant pairs (P) and the number of discordant pairs (Q). It is computed as (P - Q)/(P + Q). The concept of pairs is discussed separately in the section on association.
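The computation above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names count_pairs and gamma and the five data points are hypothetical, and a pair is taken to be concordant when x and y differ in the same direction, discordant when they differ in opposite directions, and tied (and therefore ignored) otherwise.

```python
from itertools import combinations

def count_pairs(x, y):
    """Count concordant (P) and discordant (Q) pairs; tied pairs are skipped."""
    p = q = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x2 - x1) * (y2 - y1)
        if s > 0:        # x and y move in the same direction
            p += 1
        elif s < 0:      # x and y move in opposite directions
            q += 1
    return p, q

def gamma(x, y):
    """Goodman and Kruskal's gamma, (P - Q)/(P + Q)."""
    p, q = count_pairs(x, y)
    if p + q == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (p - q) / (p + q)

# Hypothetical ordinal scores for five cases (illustration only)
x = [1, 2, 2, 3, 4]
y = [1, 3, 2, 2, 4]

print(count_pairs(x, y))  # (7, 1)
print(gamma(x, y))        # 0.75
```

Of the ten possible pairs in this example, two are tied on x or y and drop out, which is why P + Q is only 8.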

Gamma is the surplus of concordant pairs over discordant pairs, as a proportion of all pairs, ignoring ties. This can be given a PRE (proportionate reduction in error) interpretation. Suppose we ignore tied pairs and are shown the values of the independent (column) variable x for two randomly selected cases. If the second x is greater than the first, we predict that the rank of the second y value will be greater than the rank of the first y value. If gamma is .636, we may say that knowing the independent variable reduces our errors in predicting the rank (not the value) of the dependent variable by 63.6%.
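The arithmetic behind the PRE reading can be checked with a small sketch. The pair counts P = 9 and Q = 2 are hypothetical, chosen only because they reproduce the value .636 used above. The sketch also assumes that, without knowledge of x, a guess about the order of two untied y values is wrong half the time.

```python
from fractions import Fraction

# Hypothetical pair counts, chosen so that gamma matches the .636 in the text
P, Q = 9, 2

gamma = Fraction(P - Q, P + Q)

# Assumption: guessing the order of y blindly is wrong for half of the untied pairs
error_blind = Fraction(1, 2)
# Predicting "same order as x" is wrong exactly on the discordant pairs
error_informed = Fraction(Q, P + Q)

# Proportionate reduction in error
reduction = (error_blind - error_informed) / error_blind

print(round(float(gamma), 3))  # 0.636
print(reduction == gamma)      # True
```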

Gamma defines perfect association as weak monotonicity (see discussion in the section on association). Under statistical independence, gamma will be 0, but it can be 0 at other times as well (whenever concordant minus discordant pairs are 0).

Kendall's tau-b

Kendall's tau-b is a measure of association often used with but not limited to 2-by-2 tables. It is computed as the excess of concordant over discordant pairs (P - Q), divided by the geometric mean of the number of pairs not tied on x (P + Q + Y0) and the number of pairs not tied on y (P + Q + X0), where X0 is the number of pairs tied on x only and Y0 is the number of pairs tied on y only:

tau-b = (P - Q)/ SQRT[((P + Q + Y0)(P + Q + X0))]

Tau-b has no well-defined intuitive meaning. It is the surplus of concordant over discordant pairs as a percentage of concordant pairs, discordant pairs, and approximately one-half of the tied pairs. The rationale is that, if the direction of causation is unknown, the surplus of concordant over discordant pairs should be compared with the total of all relevant pairs: the concordant pairs, the discordant pairs, and either the X-ties or the Y-ties, but not both. Since the direction is not known, the geometric mean is used as an estimate of the relevant tied pairs.

Tau-b defines perfect association as strict monotonicity, as discussed in the section on association. Although it requires strict monotonicity to reach 1.0, it does not penalize ties as much as some other measures. It defines null relationship as statistical independence.
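The tau-b formula and its behaviour under ties can be sketched as follows. This is an illustrative sketch, not code from the book: the function names and the four data points are hypothetical, and it assumes that X0 counts pairs tied on x only, Y0 counts pairs tied on y only, and pairs tied on both variables are left out of every term. The example data are weakly but not strictly monotonic, so gamma reaches 1 while tau-b does not.

```python
from itertools import combinations
from math import sqrt

def pair_counts(x, y):
    """Return (P, Q, X0, Y0): concordant, discordant, tied on x only, tied on y only."""
    p = q = x0 = y0 = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue          # tied on both variables: counted nowhere
        if dx == 0:
            x0 += 1           # tied on x only
        elif dy == 0:
            y0 += 1           # tied on y only
        elif dx * dy > 0:
            p += 1            # concordant
        else:
            q += 1            # discordant
    return p, q, x0, y0

def tau_b(x, y):
    """Kendall's tau-b, (P - Q)/sqrt((P + Q + Y0)(P + Q + X0))."""
    p, q, x0, y0 = pair_counts(x, y)
    denom = sqrt((p + q + y0) * (p + q + x0))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (p - q) / denom

# Hypothetical data: y never decreases as x increases, but there are ties
x = [1, 1, 2, 2]
y = [1, 2, 2, 3]

p, q, x0, y0 = pair_counts(x, y)
gamma_value = (p - q) / (p + q)

print(p, q, x0, y0)           # 3 0 2 1
print(gamma_value)            # 1.0
print(round(tau_b(x, y), 3))  # 0.671
```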

Kendall's tau-c

Kendall's tau-c, also called Stuart's tau-c or Kendall-Stuart tau-c, is a variant of tau-b for larger tables. It equals the excess of concordant over discordant pairs times another term representing an adjustment for the size of the table.

tau-c = (P - Q)*[2m/(n^2(m-1))]

where m is the number of rows or columns, whichever is smaller, and n is the sample size.

Correlation coefficients

Suppose we rank a group of eight people by height and by weight:

Person          A  B  C  D  E  F  G  H
Rank by Height  1  2  3  4  5  6  7  8
Rank by Weight  3  4  1  2  5  7  8  6

We can see that there is some correlation between the two rankings but that the correlation is far from perfect, and we would like some way of objectively measuring the degree of correspondence. In 1938 Maurice Kendall introduced a coefficient, tau, for this purpose that has the following properties:

If the agreement between the two rankings is perfect, i.e. the two rankings are the same, the value of the coefficient is equal to 1.
If the disagreement between the two rankings is perfect, i.e. one ranking is the reverse of the other, the coefficient is equal to -1.
For all other arrangements the value lies between -1 and 1, and increasing values imply increasing agreement between the rankings.
A value of 0 indicates no overall agreement or disagreement between the two rankings, as is expected on average when they are independent.

It is defined by

tau = 4P/(n(n - 1)) - 1

where n is the number of items, and P is a quantity derived from the rankings as follows: In the Weight ranking above, the first entry, 3, has five higher ranks to the right of it; the contribution to P of this entry is 5. Moving to the second entry, 4, we see that there are four higher ranks to the right of it and the contribution to P is 4. Continuing this way, we find that P = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22. Thus tau = 4(22)/(8(7)) - 1 = 88/56 - 1 = 0.57. This result indicates that there is strong agreement between the rankings, as expected.
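The counting procedure for P can be reproduced with a short sketch. It assumes the definition tau = 4P/(n(n - 1)) - 1, the usual form of Kendall's coefficient for rankings without ties, and it expects the second ranking listed in the order of the first (here, weight ranks in order of height rank). The function name kendall_tau is hypothetical.

```python
def kendall_tau(ranks):
    """Return (P, tau) for a tie-free ranking listed in the order of the other ranking."""
    n = len(ranks)
    # P: for each entry, count the higher ranks to its right
    p = sum(
        sum(1 for later in ranks[i + 1:] if later > ranks[i])
        for i in range(n)
    )
    return p, 4 * p / (n * (n - 1)) - 1

# Weight ranks of persons A..H, listed in order of height rank
weight_by_height = [3, 4, 1, 2, 5, 7, 8, 6]

p, tau = kendall_tau(weight_by_height)
print(p)              # 22
print(round(tau, 2))  # 0.57
```

The nested count is quadratic in n, which is adequate for a hand-sized example such as this one.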

Last Update: 2006-Jan-18