• incanter

1.2.3-SNAPSHOT

# linear-model

## incanter.stats

• (linear-model y x & options)

Returns the results of performing a OLS linear regression of y on x.

Arguments:
y is a vector (or sequence) of values for the dependent variable
x is a vector or matrix of values for the independent variables

Options:
:intercept (default true) indicates weather an intercept term should be included

Returns:
a map, of type ::linear-model, containing:
:design-matrix -- a matrix containing the independent variables, and an intercept columns
:coefs -- the regression coefficients
:t-tests -- t-test values of coefficients
:t-probs -- p-values for t-test values of coefficients
:coefs-ci -- 95% percentile confidence interval
:fitted -- the predicted values of y
:residuals -- the residuals of each observation
:std-errors -- the standard errors of the coeffients
:sse -- the sum of squared errors, also called the residual sum of squares
:ssr -- the regression sum of squares, also called the explained sum of squares
:sst -- the total sum of squares (proportional to the sample variance)
:r-square -- coefficient of determination

Examples:
(use '(incanter core stats datasets charts))
(def iris (to-matrix (get-dataset :iris) :dummies true))
(def y (sel iris :cols 0))
(def x (sel iris :cols (range 1 6)))
(def iris-lm (linear-model y x)) ; with intercept term

(keys iris-lm) ; see what fields are included
(:coefs iris-lm)
(:sse iris-lm)
(quantile (:residuals iris-lm))
(:r-square iris-lm)
(:f-stat iris-lm)
(:f-prob iris-lm)
(:df iris-lm)

(def x1 (range 0.0 3 0.1))
(view (xy-plot x1 (cdf-f x1 :df1 4 :df2 144)))

References:
http://en.wikipedia.org/wiki/OLS_Regression
http://en.wikipedia.org/wiki/Coefficient_of_determination

### Source incanter/stats.clj:2009 top

```(defn linear-model
"
Returns the results of performing a OLS linear regression of y on x.

Arguments:
y is a vector (or sequence) of values for the dependent variable
x is a vector or matrix of values for the independent variables

Options:
:intercept (default true) indicates weather an intercept term should be included

Returns:
a map, of type ::linear-model, containing:
:design-matrix -- a matrix containing the independent variables, and an intercept columns
:coefs -- the regression coefficients
:t-tests -- t-test values of coefficients
:t-probs -- p-values for t-test values of coefficients
:coefs-ci -- 95% percentile confidence interval
:fitted -- the predicted values of y
:residuals -- the residuals of each observation
:std-errors -- the standard errors of the coeffients
:sse -- the sum of squared errors, also called the residual sum of squares
:ssr -- the regression sum of squares, also called the explained sum of squares
:sst -- the total sum of squares (proportional to the sample variance)
:r-square -- coefficient of determination

Examples:
(use '(incanter core stats datasets charts))
(def iris (to-matrix (get-dataset :iris) :dummies true))
(def y (sel iris :cols 0))
(def x (sel iris :cols (range 1 6)))
(def iris-lm (linear-model y x)) ; with intercept term

(keys iris-lm) ; see what fields are included
(:coefs iris-lm)
(:sse iris-lm)
(quantile (:residuals iris-lm))
(:r-square iris-lm)
(:f-stat iris-lm)
(:f-prob iris-lm)
(:df iris-lm)

(def x1 (range 0.0 3 0.1))
(view (xy-plot x1 (cdf-f x1 :df1 4 :df2 144)))

References:
http://en.wikipedia.org/wiki/OLS_Regression
http://en.wikipedia.org/wiki/Coefficient_of_determination

"
([y x & options]
(let [opts (when options (apply assoc {} options))
intercept? (if (false? (:intercept opts)) false true)
_x (if intercept? (bind-columns (replicate (nrow x) 1) x) x)
xtx (mmult (trans _x) _x)
xtxi (if (number? xtx) (/ 1 xtx) (solve xtx))
xty (mmult (trans _x) y)
coefs (if (and (number? xtxi) (number? xty))
(* xtxi xty)
(to-list (if (or (number? xtxi) (number? xty))
(mult xtxi xty)
(mmult xtxi xty))))
fitted (to-list (if (number? coefs)
(mult _x coefs)
(mmult _x coefs)))
resid (to-list (minus y fitted))
sse (sum-of-squares resid)
ssr (sum-of-squares (minus fitted (mean fitted)))
sst (+ sse ssr)
r-square (/ ssr sst)
n (nrow y)
p (ncol _x)
p-1 (if intercept? (- p 1) p)
adj-r-square (- 1 (* (- 1 r-square) (/ (- n 1) (- n p 1))))
mse (/ sse (- n p))
msr (/ ssr p-1)
f-stat (/ msr mse)
df1 p-1
df2 (- n p)
f-prob (cdf-f f-stat :df1 df1 :df2 df2 :lower-tail false)
coef-var (mult mse xtxi)
std-errors (sqrt (diag coef-var))
t-tests (div coefs std-errors)
t-probs (mult 2 (cdf-t (abs t-tests) :df df2 :lower-tail false))
t-95 (mult (quantile-t 0.975 :df df2) std-errors)
coefs-ci (if (number? std-errors)
[(plus coefs t-95)
(minus coefs t-95)]
(partition 2
(interleave
(minus coefs t-95)
(plus coefs t-95))))
]
(with-meta
{:x _x
:y y
:fitted fitted
:design-matrix _x
:coefs coefs
:t-tests t-tests
:t-probs t-probs
:coefs-ci coefs-ci
:residuals resid
:std-errors std-errors
:sse sse
:ssr ssr
:sst sst
:mse mse
:msr msr
:f-stat f-stat
:f-prob f-prob
:df [df1 df2]
:coef-var coef-var
:r-square r-square