• incanter

1.2.3-SNAPSHOT

# correlation-ratio

## incanter.stats

• (correlation-ratio & xs)

http://en.wikipedia.org/wiki/Correlation_ratio

In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. Its square, eta^2, is the weighted variance of the category means divided by the variance of all samples; this function returns eta, the square root of that ratio.
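The definition can be written compactly as follows. This is a sketch in notation introduced here for illustration, not taken from the Incanter source: y_{xi} is observation i in category x, n_x is the number of observations in category x, \bar{y}_x is the mean of category x, and \bar{y} is the overall mean.

```latex
\eta^2 = \frac{\sum_{x} n_x \left(\bar{y}_x - \bar{y}\right)^2}
              {\sum_{x,i} \left(y_{xi} - \bar{y}\right)^2}
```

The numerator is the between-category sum of squares and the denominator is the overall sum of squares, so eta^2 always lies between 0 and 1.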

Example

Suppose there is a distribution of test scores in three topics (categories):

* Algebra: 45, 70, 29, 15 and 21 (5 scores)
* Geometry: 40, 20, 30 and 42 (4 scores)
* Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).

Then the subject averages are 36, 33 and 78, with an overall average of 52.

The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860, while the overall sum of squares of the differences from the overall average is 9640. The difference between these, 6780, is also the weighted sum of the squares of the differences between the subject averages and the overall average:

5(36 - 52)^2 + 4(33 - 52)^2 + 6(78 - 52)^2 = 6780

This gives

eta^2 = 6780/9640 = 0.7033

suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root gives

eta = sqrt(6780/9640) = 0.8386
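The arithmetic of this worked example can be reproduced with `clojure.core` alone. The following is a minimal sketch, not Incanter's implementation: the names `scores`, `mean`, `ss-dev`, `within-ss`, `overall-ss`, `between-ss`, `eta-squared` and `eta` are all introduced here for illustration.

```clojure
;; Test scores from the example, one vector per subject.
(def scores
  {:algebra    [45 70 29 15 21]
   :geometry   [40 20 30 42]
   :statistics [65 95 80 70 85 73]})

(defn mean
  "Arithmetic mean of a non-empty collection of numbers (illustrative helper)."
  [xs]
  (/ (reduce + xs) (count xs)))

(defn ss-dev
  "Sum of squared deviations of xs from their own mean (illustrative helper)."
  [xs]
  (let [m (mean xs)]
    (reduce + (map #(let [d (- % m)] (* d d)) xs))))

;; Within-category dispersion: 1952 + 308 + 600
(def within-ss (reduce + (map ss-dev (vals scores))))

;; Dispersion of all 15 scores around the overall mean of 52
(def overall-ss (ss-dev (apply concat (vals scores))))

;; Between-category dispersion is what remains
(def between-ss (- overall-ss within-ss))

(def eta-squared (/ between-ss overall-ss))
(def eta (Math/sqrt (double eta-squared)))
```

With the scores above, `within-ss`, `overall-ss` and `between-ss` evaluate to 2860, 9640 and 6780, matching the figures in the text, and `eta` is approximately 0.8386.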

Observe that for eta = 1 the overall sample dispersion is purely due to dispersion among the categories and not at all due to dispersion within the individual categories. For a quick intuition, imagine that all Algebra, Geometry, and Statistics scores are identical within each subject, e.g. five scores of 36, four of 33, and six of 78.
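This limiting case can be checked by calling the function directly. A minimal sketch, assuming Incanter is on the classpath and that each category is passed as its own argument, as the `[& xs]` signature suggests; the names `uniform` and `eta-one` are illustrative only.

```clojure
(require '[incanter.stats :as stats])

;; Every score within a subject equals that subject's average,
;; so each within-category sum of squares is zero.
(def uniform
  [[36 36 36 36 36]       ; Algebra, 5 scores
   [33 33 33 33]          ; Geometry, 4 scores
   [78 78 78 78 78 78]])  ; Statistics, 6 scores

;; correlation-ratio takes one argument per category, hence `apply`.
(def eta-one (apply stats/correlation-ratio uniform))
```

Here `eta-one` should come out as 1, up to floating-point rounding, because the between-category sum of squares equals the overall one. Passing the original scores instead, `(stats/correlation-ratio [45 70 29 15 21] [40 20 30 42] [65 95 80 70 85 73])`, should give approximately 0.8386.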

### Source incanter/stats.clj:2691

```clojure
(defn correlation-ratio
  "
http://en.wikipedia.org/wiki/Correlation_ratio

In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. Its square, eta^2, is the weighted variance of the category means divided by the variance of all samples; this function returns eta, the square root of that ratio.

Example

Suppose there is a distribution of test scores in three topics (categories):

* Algebra: 45, 70, 29, 15 and 21 (5 scores)
* Geometry: 40, 20, 30 and 42 (4 scores)
* Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).

Then the subject averages are 36, 33 and 78, with an overall average of 52.

The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860, while the overall sum of squares of the differences from the overall average is 9640. The difference between these, 6780, is also the weighted sum of the squares of the differences between the subject averages and the overall average:

5(36 - 52)^2 + 4(33 - 52)^2 + 6(78 - 52)^2 = 6780

This gives

eta^2 = 6780/9640 = 0.7033

suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root gives

eta = sqrt(6780/9640) = 0.8386

Observe that for eta = 1 the overall sample dispersion is purely due to dispersion among the categories and not at all due to dispersion within the individual categories. For a quick intuition, imagine that all Algebra, Geometry, and Statistics scores are identical within each subject, e.g. five scores of 36, four of 33, and six of 78.
"
  [& xs]
  (let [;; within-category sum of squares, one value per category
        sos (map sum-of-square-devs-from-mean xs)
        ;; all observations pooled into a single sequence
        all (apply concat xs)
        overall-sos (sum-of-square-devs-from-mean all)]
    ;; eta = sqrt(between-category SS / overall SS)
    (sqrt
      (/ (- overall-sos (apply + sos))
         overall-sos))))
```
Vars in incanter.stats/correlation-ratio: + - / apply defn let
Used in 0 other vars