Performing a chi-square test for independence of attributes of classification with MATLAB

One of the things I enjoyed most about the statistical theory sequence I taught this past academic year was using MATLAB to do the grungy computations required in many of the problems and examples. I've posted a number of examples on this blog before. Here's another one.
Section 8.6 of Probability and Statistical Inference 7e by Hogg and Tanis covers contingency tables. Problems in this area typically require more computations than I care to do with paper and pencil, even with the aid of a calculator. Let's consider Example 8.6-3, in which a random sample of 400 students at the University of Iowa was taken, and the question of independence of gender and enrollment in the school (Business, Engineering, Liberal Arts, Nursing, Pharmacy) is analyzed. Hogg and Tanis present a 2 x 5 contingency table and compute the value of a chi-square statistic.
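For reference, here is one way to write the statistic in question. This is a sketch in my own notation, chosen to mirror the MATLAB variable names used in this post (Y for the observed counts, n_i. and n_.j for the row and column totals, N for the sample size); the textbook's symbols may differ.

```latex
% Chi-square statistic for a k x h contingency table (here k = 2, h = 5, N = 400)
\[
  q \;=\; \sum_{i=1}^{k} \sum_{j=1}^{h}
      \frac{\left( Y_{ij} - \dfrac{n_{i\cdot}\, n_{\cdot j}}{N} \right)^{2}}
           {\dfrac{n_{i\cdot}\, n_{\cdot j}}{N}},
  \qquad
  n_{i\cdot} = \sum_{j=1}^{h} Y_{ij}, \quad
  n_{\cdot j} = \sum_{i=1}^{k} Y_{ij}, \quad
  N = \sum_{i=1}^{k} \sum_{j=1}^{h} Y_{ij}.
\]
% Under the hypothesis of independence, q is approximately chi-square
% with (k-1)(h-1) degrees of freedom, which is 4 for this example.
```

The quantity n_i. n_.j / N is the expected frequency for cell (i, j) when the two attributes are independent.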
To reproduce this computation in MATLAB, we enter the observed frequencies as

Y=[ 21 16 145 2 6 ; 14 4 175 13 4];

and perform the following computations:

% Compute total number of trials
N=sum(sum(Y));
% Compute the totals for Attribute A (Gender)
nidot=sum(Y,2);
% Compute the totals for Attribute B (College)
ndotj=sum(Y);
% Compute the relative frequencies (probability estimates)
% for Attribute B
pdotj = ndotj/N;
% Compute the expected frequencies (an outer product)
NP=nidot*pdotj;
% Compute the relative frequencies (probability estimates)
% for Attribute A
pidot = nidot/N;
% Compute the chi-square statistic for the test of
% independence of attributes
q=sum(sum(((Y-NP).^2)./NP));

Next we need to compare the resulting value of q to chi-square_alpha[(k-1)(h-1)]:

% Compute the degrees of freedom for q:
[k h] = size(Y);
dof = (h-1)*(k-1);
% Find chi-square-subalpha (dof):
chiSquareSubAlpha = chiSquarePercentilesBisect(dof,alphaSig);

We can estimate or bound the p-value using a chi-square table, or we can use my MATLAB function to compute the p-value:

% Compute the p-value
pvalue=chiSquareProb(dof, q, inf);

I packaged these MATLAB commands as a function named chiSquareIndependenceTest2Attr, which can be called as follows.

alphaSig = 0.01;
whichtest=0;
[passORfail q chiSquareSubAlpha pvalue NP pidot pjdot nidot ndotj] ...
    = chiSquareIndependenceTest2Attr(Y, alphaSig, whichtest)

The function's inputs and outputs are as follows.

% Input:
%   Y         - k by h matrix (2-D array) containing k events of attribute
%               A and h events of attribute B.
%   alphaSig  - scalar significance level of test
%   whichtest - scalar if 1, require p-value >= alphaSig for pass.
%               Anything else will require chi-square test statistic
%               q <= chi-square_alpha[(k-1)(h-1)]
%               for pass
%
% Output:
%   passORfail        - 1 if pass 0 if fail at alphaSig significance level
%   q                 - scalar chi-square test statistic
%   chiSquareSubAlpha - scalar chi-square sub alpha
%   pvalue            - scalar p-value
%   NP                - array size of Y containing corresponding expected
%                       frequencies
%   pidot             - vector containing relative frequencies (probability
%                       estimates) for Attribute A
%   pdotj             - vector containing relative frequencies (probability
%                       estimates) for Attribute B
%   nidot             - vector containing frequencies of Attribute A
%   ndotj             - vector containing frequencies of Attribute B

A call to this function (using Y as defined above) produces:
Fails independence test at 0.01 level of significance.
chi-square Test Statistic = 18.926482873851 > 13.276672 = chi^2_0.010(4).
passORfail =
0
q =
18.9265
chiSquareSubAlpha =
13.2767
pvalue =
8.1252e-04
NP =
16.6250 9.5000 152.0000 7.1250 4.7500
18.3750 10.5000 168.0000 7.8750 5.2500
pidot =
0.4750
0.5250
pjdot =
0.0875 0.0500 0.8000 0.0375 0.0250
nidot =
190
210
ndotj =
35 20 320 15 10
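As a cross-check on the output above, the p-value and critical value can also be sketched with base MATLAB functions only. This is not how chiSquareProb or chiSquarePercentilesBisect are implemented. It is an alternative that assumes the standard identity that the chi-square CDF with dof degrees of freedom equals the regularized incomplete gamma function gammainc(x/2, dof/2). The names pvalueCheck, critCheck, and tailMinusAlpha are mine, for illustration.

```matlab
% Sketch: recompute q, the p-value, and the critical value with base MATLAB.
% Assumption: chi-square(dof) upper tail = gammainc(x/2, dof/2, 'upper').
Y = [21 16 145 2 6; 14 4 175 13 4];    % observed frequencies from the example
alphaSig = 0.01;                        % significance level used above

N   = sum(Y(:));                        % total number of trials
NP  = sum(Y,2) * sum(Y,1) / N;          % expected frequencies (outer product)
q   = sum(sum((Y - NP).^2 ./ NP));      % chi-square test statistic
dof = (size(Y,1) - 1) * (size(Y,2) - 1);

% Upper-tail probability P(Q >= q) for Q ~ chi-square(dof)
pvalueCheck = gammainc(q/2, dof/2, 'upper');

% Critical value: solve P(Q >= x) = alphaSig on a bracketing interval.
% The interval [0 100] is an assumption that comfortably brackets the root here.
tailMinusAlpha = @(x) gammainc(x/2, dof/2, 'upper') - alphaSig;
critCheck = fzero(tailMinusAlpha, [0 100]);

fprintf('q = %.4f, p-value = %.4e, critical value = %.4f\n', ...
        q, pvalueCheck, critCheck);
```

The printed values should agree, up to rounding, with q, pvalue, and chiSquareSubAlpha in the output above.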
Here nidot corresponds to n_i. (the totals for Attribute A) and ndotj corresponds to n_.j (the totals for Attribute B).

These functions are available on my Statistical Theory II course website. You'll need the Symbolic Math Toolbox for chiSquarePercentilesBisect, chiSquareProb, and chiSquareIndependenceTest2Attr (since this last one uses the two preceding ones).

Posted: Sunday - May 07, 2006 at 07:31 AM
Published On: May 23, 2008 09:10 AM