Performing a chi-square test for independence of attributes of classification with MATLAB


One of the things I enjoyed most about the statistical theory sequence I taught this past academic year was using MATLAB to do the grungy computations required in many of the problems and examples. I've posted a number of examples on this blog before. Here's another one.

Section 8.6 of Probability and Statistical Inference 7e by Hogg and Tanis covers contingency tables. Problems in this area typically require more computations than I care to do with paper and pencil, even with the aid of a calculator. Let's consider Example 8.6-3 in which a random sample of 400 students at the University of Iowa was taken, and the question of independence of gender and enrollment in the school (Business, Engineering, Liberal Arts, Nursing, Pharmacy) is analyzed.

Hogg and Tanis present a 2 x 5 contingency table and compute the value of a chi-square statistic.

Table from p. 538 (Hogg and Tanis)

                        College
Gender     Bus.   Engin.   LibArts   Nursing   Pharm
males       21      16       145        2        6
females     14       4       175       13        4
 


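We then wish to test the null hypothesis that the two attributes are independent, that is, that each joint cell probability is the product of its marginal probabilities:

$$H_0:\; p_{ij} = p_{i\cdot}\,p_{\cdot j}, \qquad i = 1, \ldots, k; \quad j = 1, \ldots, h,$$

where k = 2 and h = 5.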

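We then need to compute the chi-square test statistic

$$Q = \sum_{i=1}^{k} \sum_{j=1}^{h} \frac{\left( Y_{ij} - N\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j} \right)^2}{N\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j}},$$

where

$$\hat{p}_{i\cdot} = \frac{1}{N} \sum_{j=1}^{h} Y_{ij}, \qquad \hat{p}_{\cdot j} = \frac{1}{N} \sum_{i=1}^{k} Y_{ij}, \qquad N = \sum_{i=1}^{k} \sum_{j=1}^{h} Y_{ij},$$

and $Y_{ij}$ are the frequencies shown in the contingency table. We then reject the null hypothesis at significance level $\alpha$ if

$$Q > \chi^2_{\alpha}[(k-1)(h-1)].$$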
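To see what one term of the sum looks like, take the cell for males in Business. The table gives 190 males, 35 Business students, and 400 students overall, so the expected frequency is (row total)(column total)/N, and the cell's contribution to the statistic follows from the observed count of 21:

$$400 \cdot \frac{190}{400} \cdot \frac{35}{400} = 16.625, \qquad \frac{(21 - 16.625)^2}{16.625} \approx 1.1513.$$

The remaining nine cells are handled the same way, which is exactly the kind of repetition worth handing to MATLAB.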

Now, to compute Q in MATLAB, we begin by storing the observed frequencies from the contingency table in a matrix:

Y=[ 21 16 145 2 6 ; 14 4 175 13 4];

and perform the following computations:

% Compute total number of trials
N=sum(sum(Y));
 
% Compute the totals for Attribute A (Gender)
nidot=sum(Y,2);
 
% Compute the totals for Attribute B (College)
ndotj=sum(Y);
 
% Compute the relative frequencies (probability estimates) 
% for Attribute B
pdotj = ndotj/N; 
 
% Compute the expected frequencies  (an outer product)
NP=nidot*pdotj;
 
% Compute the relative frequencies (probability estimates) 
% for Attribute A
pidot = nidot/N;
 
% Compute the chi-square statistic for the test of 
% independence of attributes
q=sum(sum(((Y-NP).^2)./NP));
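A quick sanity check on these computations is that the row and column totals of the expected frequencies must reproduce the observed totals. The following is a minimal sketch of that check using only base MATLAB; it recomputes everything from Y so it can be run on its own, and the names NPalt and tol are introduced just for this check.

```matlab
% Observed frequencies (rows: gender, columns: college)
Y = [21 16 145 2 6; 14 4 175 13 4];

N     = sum(Y(:));     % total number of trials
nidot = sum(Y, 2);     % row totals, k-by-1
ndotj = sum(Y, 1);     % column totals, 1-by-h

% Expected frequencies under independence: (row total)*(column total)/N
NPalt = (nidot * ndotj) / N;

% The margins of the expected table must match the observed margins
tol = 1e-9;            % illustrative tolerance for floating-point comparison
assert(all(abs(sum(NPalt, 2) - nidot) < tol));
assert(all(abs(sum(NPalt, 1) - ndotj) < tol));

% Chi-square statistic, summed over all cells
q = sum(sum((Y - NPalt).^2 ./ NPalt));
fprintf('q = %.4f\n', q);
```

Writing the expected frequencies as nidot*ndotj/N is algebraically the same as the outer product nidot*pdotj used above; it simply skips the intermediate relative frequencies.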



Next we need to compare the resulting value of q to $\chi^2_{\alpha}[(k-1)(h-1)]$, which we can do either with a chi-square table or with my MATLAB function chiSquarePercentilesBisect. The following illustrates the use of that function.


% Significance level of the test
alphaSig = 0.01;

% Compute the degrees of freedom for q:
[k, h] = size(Y);
dof = (h-1)*(k-1);
 
% Find chi-square-subalpha (dof):
chiSquareSubAlpha = chiSquarePercentilesBisect(dof,alphaSig);
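If chiSquarePercentilesBisect is not at hand, the same critical value can be approximated with base MATLAB alone. The sketch below is an alternative route, not a description of how chiSquarePercentilesBisect works. It assumes the standard identity that the chi-square CDF with dof degrees of freedom equals the regularized lower incomplete gamma function gammainc(x/2, dof/2), and it solves for the upper-alpha point with fzero. The names chi2cdfBase and critBase are hypothetical.

```matlab
alphaSig = 0.01;   % significance level used in this example
dof      = 4;      % (k-1)*(h-1) for the 2-by-5 table

% Chi-square CDF via the regularized lower incomplete gamma function
chi2cdfBase = @(x) gammainc(x/2, dof/2);

% Solve CDF(x) = 1 - alphaSig on a bracket over which the function changes sign
critBase = fzero(@(x) chi2cdfBase(x) - (1 - alphaSig), [0, 100]);
fprintf('critical value = %.4f\n', critBase);
```

The bracket [0, 100] is an assumption that suits small degrees of freedom; a wider upper bound would be needed for much larger tables.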




We can estimate or bound the p-value using a chi-square table, or compute it directly with my MATLAB function:


% Compute the p-value
pvalue=chiSquareProb(dof, q, inf);
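The p-value can also be cross-checked without chiSquareProb. This is a sketch under the assumption that the upper-tail chi-square probability equals the regularized upper incomplete gamma function, which base MATLAB exposes as gammainc(..., 'upper'). The names pvalueBase and pvalueClosed are hypothetical, and q is set to the statistic that the function reports for this example.

```matlab
q   = 18.926482873851;  % test statistic reported for this example
dof = 4;                % (k-1)*(h-1) for the 2-by-5 table

% Upper-tail probability P(Q >= q) via the regularized upper incomplete gamma
pvalueBase = gammainc(q/2, dof/2, 'upper');

% For dof = 4 the tail has the closed form exp(-q/2)*(1 + q/2)
pvalueClosed = exp(-q/2) * (1 + q/2);
fprintf('p-value = %.4e (closed form %.4e)\n', pvalueBase, pvalueClosed);
```

The closed form is specific to four degrees of freedom; it is included only as an independent check on the gammainc call.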



I packaged these MATLAB commands as a function named chiSquareIndependenceTest2Attr, which can be called as follows.

alphaSig = 0.01;
whichtest=0;
[passORfail q chiSquareSubAlpha pvalue NP pidot pjdot nidot ndotj] ...
  = chiSquareIndependenceTest2Attr(Y, alphaSig, whichtest)


The function's help text describes its inputs and outputs:

% Input:
%  Y  -  k by h matrix (2-D array) of observed frequencies for k events of 
%        attribute A and h events of attribute B. 
% alphaSig  - scalar significance level of test
% whichtest  -  scalar; if 1, require p-value >= alphaSig for pass. 
%               Anything else will require chi-square test statistic
%               q <= chi-square_alpha[(k-1)(h-1)] 
%               for pass
%
% Output:
%  passORfail        - 1 if pass 0 if fail at alphaSig significance level
%  q                 - scalar chi-square test statistic
%  chiSquareSubAlpha - scalar critical value chi-square_alpha[(k-1)(h-1)]
%  pvalue            - scalar p-value
%  NP                - array size of Y containing corresponding expected 
%                      frequencies
%  pidot             - vector containing relative frequencies (probability
%                      estimates) for Attribute A
%  pdotj             - vector containing relative frequencies (probability
%                      estimates) for Attribute B
%  nidot             - vector containing frequencies of Attribute A
%  ndotj             - vector containing frequencies of Attribute B
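The role of whichtest can be illustrated with a few lines of base MATLAB. This is only a sketch of the decision rule as the help text describes it, not the body of chiSquareIndependenceTest2Attr. The numeric values are the ones the function reports for this example.

```matlab
% Values reported for the example in this post
alphaSig          = 0.01;
q                 = 18.926482873851;
chiSquareSubAlpha = 13.276672;
pvalue            = 8.1252e-04;

% 1: compare the p-value to alphaSig; anything else: compare q to the critical value
whichtest = 0;

if whichtest == 1
    passORfail = double(pvalue >= alphaSig);
else
    passORfail = double(q <= chiSquareSubAlpha);
end
fprintf('passORfail = %d\n', passORfail);
```

For this data set both criteria lead to the same decision, since q exceeds the critical value and the p-value is below alphaSig.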


A call to this function (using Y as defined above) produces:



Fails independence test at 0.01 level of significance.
chi-square Test Statistic = 18.926482873851 > 13.276672 = chi^2_0.010(4).

passORfail =

     0


q =

   18.9265


chiSquareSubAlpha =

   13.2767


pvalue =

   8.1252e-04


NP =

   16.6250    9.5000  152.0000    7.1250    4.7500
   18.3750   10.5000  168.0000    7.8750    5.2500


pidot =

    0.4750
    0.5250


pjdot =

    0.0875    0.0500    0.8000    0.0375    0.0250


nidot =

   190
   210


ndotj =

    35    20   320    15    10


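Here nidot corresponds to the row totals

$$n_{i\cdot} = \sum_{j=1}^{h} Y_{ij}, \qquad i = 1, \ldots, k,$$

and ndotj corresponds to the column totals

$$n_{\cdot j} = \sum_{i=1}^{k} Y_{ij}, \qquad j = 1, \ldots, h.$$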

These functions are available on my Statistical Theory II course website. You'll need the Symbolic Math Toolbox for chiSquarePercentilesBisect, chiSquareProb, and chiSquareIndependenceTest2Attr (the last of these calls the other two).


Posted: Sunday - May 07, 2006 at 07:31 AM        

