• incanter

1.2.3-SNAPSHOT

# chisq-test

## incanter.stats

• (chisq-test & options)

Performs chi-squared contingency table tests and goodness-of-fit tests.

If neither :y nor a two-dimensional :table is provided, a goodness-of-fit
test is performed. In this case, the null hypothesis is that the
population probabilities equal those in :probs (or those implied by
:freq), or that they are all equal if neither is given.
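The sketch below is a rough illustration of the goodness-of-fit computation, written in plain Clojure rather than with Incanter. `gof-x-sq` is a hypothetical helper name, not part of Incanter. Following the source listing, each expected count is N times the corresponding probability. Equal probabilities are assumed when none are supplied.

```clojure
;; Hypothetical helper (not an Incanter function): Pearson goodness-of-fit
;; statistic for a vector of observed counts. Expected counts are N * p,
;; and equal probabilities 1/n are used when no probabilities are given.
(defn gof-x-sq
  ([observed]
   (let [n (count observed)]
     (gof-x-sq observed (repeat n (/ 1 n)))))
  ([observed probs]
   (let [N (reduce + observed)
         expected (map #(* N %) probs)]
     ;; sum of (O - E)^2 / E over all levels
     (reduce + (map (fn [o e]
                      (let [d (- o e)]
                        (/ (* d d) e)))
                    observed expected)))))

(gof-x-sq [1 3 3 1 1])
;; => 8/3
(gof-x-sq [89 37 30 28 2] [0.40 0.20 0.20 0.19 0.01])
;; => roughly 5.7947
```

Integer counts with default probabilities keep the arithmetic exact, so the first call yields the ratio 8/3 (about 2.6667). That matches the X-sq quoted in the examples for `[1 2 3 2 3 2 4 3 5]`.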

If :y is provided, it must be a sequence of integers of the same
length as :x. A contingency table is computed from :x and :y, and
Pearson's chi-squared test is performed on it. The null hypothesis is
that the joint distribution of the cell counts in the two-dimensional
contingency table is the product of the row and column marginals.
The same test is performed when a two-dimensional :table is supplied
directly. By default, Yates' continuity correction is applied to 2x2
contingency tables; this can be disabled by setting the :correct
option to false.
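The contingency-table statistic can be sketched the same way. `contingency-x-sq` is a hypothetical plain-Clojure helper that takes a table as a vector of rows. It computes each expected count as (row total * column total) / N. When asked, it applies Yates' correction by subtracting 0.5 from |O - E|, and only for 2x2 tables, as the source listing does.

```clojure
;; Hypothetical helper (not an Incanter function): Pearson's chi-squared
;; statistic for a two-dimensional table given as a vector of rows.
(defn contingency-x-sq [table correct?]
  (let [row-totals (map #(reduce + %) table)
        col-totals (apply map + table)
        N (reduce + row-totals)
        ;; Yates' correction is only used for 2x2 tables
        yates? (and correct? (= 2 (count row-totals)) (= 2 (count col-totals)))
        cell-terms (for [[row r-total] (map vector table row-totals)
                         [o c-total] (map vector row col-totals)]
                     (let [e (/ (* r-total c-total) (double N))
                           d (if yates?
                               (- (Math/abs (- o e)) 0.5)
                               (- o e))]
                       (/ (* d d) e)))]
    {:X-sq (reduce + cell-terms)
     :df (* (dec (count row-totals)) (dec (count col-totals)))}))

(contingency-x-sq [[31 12] [9 8]] true)
(contingency-x-sq [[31 12] [9 8]] false)
```

For the 2x2 table `[[31 12] [9 8]]` used in the examples, this gives roughly 1.24145 with the correction and 2.01094 without, with df = 1.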

Options:
:x -- a sequence of numbers.
:y -- a sequence of numbers.
:table -- a contingency table. If it is one-dimensional (a single column), a goodness-of-fit test is performed.
:probs (default when :y is nil: (repeat n-levels (/ n-levels)), i.e. equal probabilities) -- hypothesized probabilities for the goodness-of-fit test
:freq (default nil) -- expected frequencies for the goodness-of-fit test; if given, they are rescaled to probabilities
:correct (default true) -- use Yates' continuity correction for 2x2 contingency tables
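The precedence between :probs and :freq can be read off the source listing. Explicit :probs are used as given; otherwise :freq is divided by its sum; otherwise every level gets probability 1/n. Here is a minimal sketch of that rule, with `resolve-probs` as a hypothetical name.

```clojure
;; Hypothetical helper mirroring the `probs` cond in the source:
;; :probs wins, then :freq rescaled to sum to one, then equal probabilities.
(defn resolve-probs [n {:keys [probs freq]}]
  (cond
    probs probs
    freq (let [total (reduce + freq)]
           (mapv #(/ % total) freq))
    :else (vec (repeat n (/ 1 n)))))

(resolve-probs 5 {:freq [40 20 20 15 5]})
;; => [2/5 1/5 1/5 3/20 1/20]
(resolve-probs 4 {})
;; => [1/4 1/4 1/4 1/4]
```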

Returns:
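:X-sq -- the Pearson X-squared test statistic
:p-value -- the p-value for the test statistic
:df -- the degrees of freedom
The returned map also contains :E (expected counts), :N, :probs, :table, :two-samp?, :row-levels, :col-levels, :row-margins and :col-margins.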
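Incanter obtains :p-value from `cdf-chisq` with `:lower-tail false`, which works for any df. To check results by hand, the upper tail has a simple closed form when df is even: exp(-x/2) times the sum of (x/2)^i / i! for i from 0 to df/2 - 1. The sketch below implements only that even-df case, together with the df rules used in the source. Both function names are hypothetical.

```clojure
;; Hypothetical helpers, not Incanter functions.
(defn degrees-of-freedom
  "n-levels - 1 for a goodness-of-fit test; (rows - 1) * (cols - 1) for a table."
  ([n-levels] (dec n-levels))
  ([n-rows n-cols] (* (dec n-rows) (dec n-cols))))

(defn chisq-upper-tail-even-df
  "P(X > x) for a chi-squared variable with an even number of degrees of
  freedom. Odd df is rejected by the precondition."
  [x df]
  {:pre [(pos? df) (even? df)]}
  (let [h (/ x 2.0)
        ;; lazy sequence of terms h^i / i! for i = 0, 1, 2, ...
        terms (reductions (fn [term i] (/ (* term h) i)) 1.0 (iterate inc 1))]
    (* (Math/exp (- h)) (reduce + (take (quot df 2) terms)))))

(chisq-upper-tail-even-df 5.7947 (degrees-of-freedom 5))
(chisq-upper-tail-even-df 9.9901 (degrees-of-freedom 5))
```

For the goodness-of-fit examples (df = 4), X-sq = 5.7947 gives about 0.215 and X-sq = 9.9901 gives about 0.0406. Both agree with the quoted p-values.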

Examples:
(use '(incanter core stats))
(chisq-test :x [1 2 3 2 3 2 4 3 5]) ;; X-sq 2.6667
;; create a one-dimensional table of this data
(def table (matrix [1 3 3 1 1]))
(chisq-test :table table) ;; X-sq 2.6667
(chisq-test :table (trans table)) ;; throws an exception: one-dimensional tables must be a single column

(chisq-test :x [1 0 0 0 1 1 1 0 0 1 0 0 1 1 1 1]) ;; X-sq = 0.25

(use '(incanter core stats datasets))
(def math-prog (to-matrix (get-dataset :math-prog)))
(def x (sel math-prog :cols 1))
(def y (sel math-prog :cols 2))
(chisq-test :x x :y y) ;; X-sq = 1.24145, df=1, p-value = 0.26519
(chisq-test :x x :y y :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617

(def table (matrix [[31 12] [9 8]]))
(chisq-test :table table) ;; X-sq = 1.24145, df=1, p-value = 0.26519
(chisq-test :table table :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617
;; use the detabulate function to create data rows corresponding to the table
(def detab (detabulate :table table))
(chisq-test :x (sel detab :cols 0) :y (sel detab :cols 1))

;; look at the hair-eye-color data
;; turn the count data for males into a contingency table
(def male (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16)) 4))
(chisq-test :table male) ;; X-sq = 41.280, df = 9, p-value = 4.44E-6
;; turn the count data for females into a contingency table
(def female (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16 32)) 4))
(chisq-test :table female) ;; X-sq = 106.664, df = 9, p-value = 7.014E-19

;; supply probabilities to goodness-of-fit test
(def table [89 37 30 28 2])
(def probs [0.40 0.20 0.20 0.19 0.01])
(chisq-test :table table :probs probs) ;; X-sq = 5.7947, df = 4, p-value = 0.215

;; use frequencies instead of probabilities
(def freq [40 20 20 15 5])
(chisq-test :table table :freq freq) ;; X-sq = 9.9901, df = 4, p-value = 0.04059
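The detabulate round trip in the examples can be illustrated without Incanter. `expand-table` and `cross-tabulate` below are hypothetical helpers, not Incanter's `detabulate` or `tabulate`, and nothing is assumed about the exact layout Incanter returns. They only show the idea: expanding a table into one [row column] observation per count and then counting the pairs again recovers the original table. Levels that never occur would be dropped.

```clojure
;; Hypothetical helpers, not Incanter functions.
(defn expand-table
  "One [row-index col-index] observation for every counted item."
  [table]
  (vec (for [[r row] (map-indexed vector table)
             [c k] (map-indexed vector row)
             _ (range k)]
         [r c])))

(defn cross-tabulate
  "Counts (x, y) pairs back into a vector-of-rows table over the observed levels."
  [xs ys]
  (let [pair-counts (frequencies (map vector xs ys))
        r-levels (sort (distinct xs))
        c-levels (sort (distinct ys))]
    (vec (for [r r-levels]
           (vec (for [c c-levels]
                  (get pair-counts [r c] 0)))))))

(let [rows (expand-table [[31 12] [9 8]])]
  (cross-tabulate (map first rows) (map second rows)))
;; => [[31 12] [9 8]]
```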

References:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
http://en.wikipedia.org/wiki/Pearson's_chi-square_test
http://en.wikipedia.org/wiki/Yates'_chi-square_test

### Source incanter/stats.clj:2328

```clojure
(defn chisq-test
  ([& options]
   (let [opts (when options (apply assoc {} options))
         correct (if (false? (:correct opts)) false true)
         x (:x opts)
         y (:y opts)
         table? (if (:table opts) true false)
         xtab (when (or x y)
                (if y
                  (tabulate (bind-columns x y))
                  (tabulate x)))
         table (cond
                 table?
                   (:table opts)
                 (and x y)
                   (:table xtab))
         two-samp? (if (or (and x y)
                           (and table?
                                (and (> (nrow table) 1) (> (ncol table) 1))))
                     true false)
         r-levels (if table?
                    (range (nrow table))
                    (first (:levels xtab)))
         c-levels (if table?
                    (range (ncol table))
                    (second (:levels xtab)))
         r-margins (if table?
                     (if two-samp?
                       (apply hash-map (interleave r-levels (map sum (trans table))))
                       (if (> (nrow table) 1)
                         (to-list table)
                         (throw (Exception. "One dimensional tables must have only a single column"))))
                     (second (:margins xtab)))
         c-margins (if table?
                     (if two-samp?
                       (apply hash-map (interleave c-levels (map sum table)))
                       0)
                     (first (:margins xtab)))
         counts (if two-samp? (vectorize table) table)
         N (if table?
             (sum counts)
             (:N xtab))
         n (when (not two-samp?) (count r-levels))
         df (if two-samp? (* (dec (nrow table)) (dec (ncol table))) (dec n))
         probs (when (not two-samp?)
                 (cond
                   (:probs opts)
                     (:probs opts)
                   (:freq opts)
                     (div (:freq opts) (sum (:freq opts)))
                   :else
                     (repeat n (/ n))))
         E (if two-samp?
             (for [r r-levels c c-levels]
               (/ (* (c-margins c) (r-margins r)) N))
             (mult N probs))
         X-sq (if (and correct (and (= (count r-levels) 2) (= (count c-levels) 2)))
                (reduce + (map (fn [o e] (/ (pow (- (abs (- o e)) 0.5) 2) e)) counts E))
                (reduce + (map (fn [o e] (/ (pow (- o e) 2) e)) counts E)))]
     {:X-sq X-sq
      :df df
      :two-samp? two-samp?
      :p-value (cdf-chisq X-sq :df df :lower-tail false)
      :probs probs
      :N N
      :table table
      :col-levels c-levels
      :row-levels r-levels
      :col-margins c-margins
      :row-margins r-margins
      :E E})))
```
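The first bindings in the source turn the variadic keyword arguments into a map with `(apply assoc {} options)`. They treat :correct as true unless it is literally false. Here is a minimal sketch of just that step, with `parse-options` as a hypothetical name.

```clojure
;; Hypothetical helper mirroring the first two bindings of chisq-test:
;; keyword arguments become a map, and :correct defaults to true
;; unless the caller passes exactly false.
(defn parse-options [& options]
  (let [opts (when options (apply assoc {} options))]
    (assoc opts :correct (if (false? (:correct opts)) false true))))

(parse-options :table [[31 12] [9 8]])
;; => {:table [[31 12] [9 8]], :correct true}
(parse-options :x [1 2] :correct false)
;; => {:x [1 2], :correct false}
```

Passing `:correct nil` still leaves the correction enabled, because only `false` disables it.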