1.2.3-SNAPSHOT

chisq-test

incanter.stats

  • (chisq-test & options)

Performs chi-squared contingency table tests and goodness-of-fit tests.

If the optional argument :y is not provided then a goodness-of-fit test
is performed. In this case, the hypothesis tested is whether the
population probabilities equal those in :probs, or are all equal if
:probs is not given.
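
As a rough illustration of what the goodness-of-fit case computes, the statistic can be sketched in plain Clojure without Incanter. The helper name gof-x-sq is hypothetical and is not part of incanter.stats, and the sketch assumes the data have already been reduced to a vector of observed counts per level.

```clojure
;; Hypothetical helper (not part of incanter.stats): Pearson X-squared
;; for a one-dimensional table of observed counts.
(defn gof-x-sq
  "Sum of (observed - expected)^2 / expected over all levels.
  With no `probs`, every level is assumed to be equally likely."
  ([counts]
   (let [n (count counts)]
     (gof-x-sq counts (repeat n (/ 1.0 n)))))
  ([counts probs]
   (let [total    (reduce + counts)
         ;; expected count per level = N * probability of that level
         expected (map #(* total %) probs)]
     (reduce + (map (fn [o e] (/ (* (- o e) (- o e)) e))
                    counts expected)))))

;; Counts of the values 1..5 in [1 2 3 2 3 2 4 3 5]
(gof-x-sq [1 3 3 1 1]) ;; approximately 2.6667
```

With five equally likely levels and N = 9, each expected count is 1.8, which gives the same X-sq of about 2.6667 quoted in the examples further down.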

If :y is provided, it must be a sequence of integers of the same
length as :x. A contingency table is computed from :x and :y, and
Pearson's chi-squared test is then performed on it. The null hypothesis
is that the joint distribution of the cell counts in the 2-dimensional
contingency table is the product of the row and column marginals.
By default, Yates' continuity correction is applied to 2x2 contingency
tables; this can be disabled by setting the :correct option to false.
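
The contingency-table case can be sketched the same way in plain Clojure. The function name table-x-sq and its boolean argument are hypothetical, chosen only for illustration. The table is assumed to be a vector of row vectors, and the correction subtracts 0.5 from each absolute deviation, which is the form used in the source listing at the end of this page.

```clojure
;; Hypothetical helper (not part of incanter.stats): Pearson X-squared
;; for a two-dimensional table given as a vector of row vectors.
(defn table-x-sq
  [table correct?]
  (let [row-sums    (map #(reduce + %) table)
        col-sums    (apply map + table)
        total       (reduce + row-sums)
        two-by-two? (and (= 2 (count row-sums)) (= 2 (count col-sums)))
        ;; Yates' correction is only applied to 2x2 tables
        adjust      (if (and correct? two-by-two?) 0.5 0.0)]
    (reduce +
            (for [[row r] (map vector table row-sums)
                  [o c]   (map vector row col-sums)]
              (let [e (/ (* r c) (double total))       ; expected cell count
                    d (- (Math/abs (double (- o e))) adjust)]
                (/ (* d d) e))))))

(table-x-sq [[31 12] [9 8]] true)  ;; approximately 1.24145
(table-x-sq [[31 12] [9 8]] false) ;; approximately 2.01094
```

Each expected cell count is (row total * column total) / N, so the null hypothesis of independent rows and columns is encoded directly in the expected values.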


Options:
:x -- a sequence of numbers
:y -- a sequence of numbers
:table -- a contingency table. If one-dimensional, a goodness-of-fit test is performed
:probs (default when :y is nil: (repeat n-levels (/ n-levels))) -- the probabilities to test against in a goodness-of-fit test
:freq (default nil) -- if given and :probs is not, these are rescaled to probabilities
:correct (default true) -- use Yates' correction for continuity for 2x2 contingency tables
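
The rescaling applied to :freq amounts to dividing each frequency by the total. A minimal sketch in plain Clojure, using the hypothetical helper name freq->probs (not part of incanter.stats):

```clojure
;; Hypothetical helper: rescale raw frequencies so that they sum to 1.
(defn freq->probs
  [freq]
  (let [total (double (reduce + freq))]
    (map #(/ % total) freq)))

(freq->probs [40 20 20 15 5]) ;; => (0.4 0.2 0.2 0.15 0.05)
```

Passing these probabilities via :probs should therefore be equivalent to passing the raw frequencies via :freq.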


Returns:
:X-sq -- the Pearson X-squared test statistic
:p-value -- the p-value for the test statistic
:df -- the degrees of freedom


Examples:
(use '(incanter core stats))
(chisq-test :x [1 2 3 2 3 2 4 3 5]) ;; X-sq 2.6667
;; create a one-dimensional table of this data
(def table (matrix [1 3 3 1 1]))
(chisq-test :table table) ;; X-sq 2.6667
(chisq-test :table (trans table)) ;; throws exception

(chisq-test :x [1 0 0 0 1 1 1 0 0 1 0 0 1 1 1 1]) ;; X-sq 0.25

(use '(incanter core stats datasets))
(def math-prog (to-matrix (get-dataset :math-prog)))
(def x (sel math-prog :cols 1))
(def y (sel math-prog :cols 2))
(chisq-test :x x :y y) ;; X-sq = 1.24145, df=1, p-value = 0.26519
(chisq-test :x x :y y :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617

(def table (matrix [[31 12] [9 8]]))
(chisq-test :table table) ;; X-sq = 1.24145, df=1, p-value = 0.26519
(chisq-test :table table :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617
;; use the detabulate function to create data rows corresponding to the table
(def detab (detabulate :table table))
(chisq-test :x (sel detab :cols 0) :y (sel detab :cols 1))

;; look at the hair-eye-color data
;; turn the count data for males into a contingency table
(def male (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16)) 4))
(chisq-test :table male) ;; X-sq = 41.280, df = 9, p-value = 4.44E-6
;; turn the count data for females into a contingency table
(def female (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16 32)) 4))
(chisq-test :table female) ;; X-sq = 106.664, df = 9, p-value = 7.014E-19


;; supply probabilities to goodness-of-fit test
(def table [89 37 30 28 2])
(def probs [0.40 0.20 0.20 0.19 0.01])
(chisq-test :table table :probs probs) ;; X-sq = 5.7947, df = 4, p-value = 0.215

;; use frequencies instead of probabilities
(def freq [40 20 20 15 5])
(chisq-test :table table :freq freq) ;; X-sq = 9.9901, df = 4, p-value = 0.04059



References:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
http://en.wikipedia.org/wiki/Pearson's_chi-square_test
http://en.wikipedia.org/wiki/Yates'_chi-square_test

Source incanter/stats.clj:2328

(defn chisq-test
"
  Performs chi-squared contingency table tests and goodness-of-fit tests.

  If the optional argument :y is not provided then a goodness-of-fit test
  is performed. In this case, the hypothesis tested is whether the
  population probabilities equal those in :probs, or are all equal if
  :probs is not given.

  If :y is provided, it must be a sequence of integers of the same
  length as :x. A contingency table is computed from :x and :y, and
  Pearson's chi-squared test is then performed on it. The null hypothesis
  is that the joint distribution of the cell counts in the 2-dimensional
  contingency table is the product of the row and column marginals.
  By default, Yates' continuity correction is applied to 2x2 contingency
  tables; this can be disabled by setting the :correct option to false.


  Options:
    :x -- a sequence of numbers
    :y -- a sequence of numbers
    :table -- a contingency table. If one-dimensional, a goodness-of-fit test is performed
    :probs (default when :y is nil: (repeat n-levels (/ n-levels))) -- the probabilities to test against in a goodness-of-fit test
    :freq (default nil) -- if given and :probs is not, these are rescaled to probabilities
    :correct (default true) -- use Yates' correction for continuity for 2x2 contingency tables


  Returns:
    :X-sq -- the Pearson X-squared test statistic
    :p-value -- the p-value for the test statistic
    :df -- the degrees of freedom


  Examples:
    (use '(incanter core stats))
    (chisq-test :x [1 2 3 2 3 2 4 3 5]) ;; X-sq 2.6667
    ;; create a one-dimensional table of this data
    (def table (matrix [1 3 3 1 1]))
    (chisq-test :table table) ;; X-sq 2.6667
    (chisq-test :table (trans table)) ;; throws exception

    (chisq-test :x [1 0 0 0 1 1 1 0 0 1 0 0 1 1 1 1]) ;; X-sq 0.25

    (use '(incanter core stats datasets))
    (def math-prog (to-matrix (get-dataset :math-prog)))
    (def x (sel math-prog :cols 1))
    (def y (sel math-prog :cols 2))
    (chisq-test :x x :y y) ;; X-sq = 1.24145, df=1, p-value = 0.26519
    (chisq-test :x x :y y :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617

    (def table (matrix [[31 12] [9 8]]))
    (chisq-test :table table) ;; X-sq = 1.24145, df=1, p-value = 0.26519
    (chisq-test :table table :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617
    ;; use the detabulate function to create data rows corresponding to the table
    (def detab (detabulate :table table))
    (chisq-test :x (sel detab :cols 0) :y (sel detab :cols 1))

    ;; look at the hair-eye-color data
    ;; turn the count data for males into a contingency table
    (def male (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16)) 4))
    (chisq-test :table male) ;; X-sq = 41.280, df = 9, p-value = 4.44E-6
    ;; turn the count data for females into a contingency table
    (def female (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16 32)) 4))
    (chisq-test :table female) ;; X-sq = 106.664, df = 9, p-value = 7.014E-19


    ;; supply probabilities to goodness-of-fit test
    (def table [89 37 30 28 2])
    (def probs [0.40 0.20 0.20 0.19 0.01])
    (chisq-test :table table :probs probs) ;; X-sq = 5.7947, df = 4, p-value = 0.215

    ;; use frequencies instead of probabilities
    (def freq [40 20 20 15 5])
    (chisq-test :table table :freq freq) ;; X-sq = 9.9901, df = 4, p-value = 0.04059



  References:
    http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
    http://en.wikipedia.org/wiki/Pearson's_chi-square_test
    http://en.wikipedia.org/wiki/Yates'_chi-square_test

"
  ([& options]
    (let [opts (when options (apply assoc {} options))
          correct (if (false? (:correct opts)) false true)
          x (:x opts)
          y (:y opts)
          table? (if (:table opts) true false)
          xtab (when (or x y)
                 (if y
                   (tabulate (bind-columns x y))
                   (tabulate x)))
          table (cond
                  table?
                   (:table opts)
                  (and x y)
                    (:table xtab))
          two-samp? (if (or (and x y)
                            (and table?
                                 (and (> (nrow table) 1) (> (ncol table) 1))))
                      true false)
          r-levels (if table?
                     (range (nrow table))
                     (first (:levels xtab)))
          c-levels (if table?
                     (range (ncol table))
                     (second (:levels xtab)))
          r-margins (if table?
                      (if two-samp?
                        (apply hash-map (interleave r-levels (map sum (trans table))))
                        (if (> (nrow table) 1)
                          (to-list table)
                          (throw (Exception. "One dimensional tables must have only a single column"))))
                      (second (:margins xtab)))
          c-margins (if table?
                      (if two-samp?
                        (apply hash-map (interleave c-levels (map sum table)))
                        0)
                      (first (:margins xtab)))

          counts (if two-samp? (vectorize table) table)
          N (if table?
              (sum counts)
              (:N xtab))
          n (when (not two-samp?) (count r-levels))
          df (if two-samp? (* (dec (nrow table)) (dec (ncol table))) (dec n))
          probs (when (not two-samp?)
                  (cond
                    (:probs opts)
                      (:probs opts)
                    (:freq opts)
                      (div (:freq opts) (sum (:freq opts)))
                    :else
                      (repeat n (/ n))))
          E (if two-samp?
              (for [r r-levels c c-levels]
                (/ (* (c-margins c) (r-margins r)) N))
              (mult N probs))
          X-sq (if (and correct (and (= (count r-levels) 2) (= (count c-levels) 2)))
                 (reduce + (map (fn [o e] (/ (pow (- (abs (- o e)) 0.5) 2) e)) counts E))
                 (reduce + (map (fn [o e] (/ (pow (- o e) 2) e)) counts E)))
         ]
      {:X-sq X-sq
       :df df
       :two-samp? two-samp?
       :p-value (cdf-chisq X-sq :df df :lower-tail false)
       :probs probs
       :N N
       :table table
       :col-levels c-levels
       :row-levels r-levels
       :col-margins c-margins
       :row-margins r-margins
       :E E})))