STA201 Project 1 Tips. Generating Histograms with MS EXCEL


This posting is primarily for students in my STA 201 Statistical Methods course, but it may be of interest to anyone using EXCEL for statistical computations. As noted in a previous posting, where I showed how to do things using the TI-83Plus graphing calculator, Project 1 of my STA201 course involves obtaining numerical data (from sources given in class) and analyzing the first non-zero digit. In particular, we want to study the distribution of the first non-zero digit in the data sets.

Read More for a detailed description of how to use MS EXCEL to obtain frequency distributions, relative frequency distributions, and histograms.


Begin by opening a new EXCEL blank workbook and enter the titles 1st Digit Data and Digit in cells A1 and B1, respectively. Next enter the first digit of each datum beginning in column A (starting in A2). In our small example, we only have 10 values.



Next, select the values you just entered, and from the title bar menu, select Format --> Cells (or press Ctrl 1) to make sure that the data is treated as text.



Do the same thing for the categories in column B.




The next step is to obtain the frequency distribution. From the title bar, select Tools --> Data Analysis...




Then in the window that comes up, select Histogram. and click OK. (In our case, you're not actually going to make the histogram this way. Instead, you're going to use this feature of EXCEL to produce the frequency distribution.)




The next window that pops up is shown next.




With your mouse, make the Input Range field active, then using your mouse, select all the data in column A, beginning with A2. In our example, you would select from A2 to A11, as shown below.




The bin range (B2 to B10) is selected in a similar manner, using the mouse. To select an output range (i.e., where the frequency distribution will go) can also be selected with the mouse, as shown below.






Next, click OK to generate the frequency distribution.




To generate the relative frequency distribution, select the cell immediately under the last Frequency entry. In that cell, type in =SUM(1st Frequency cell:last frequency cell). In our example, that is =SUM(E2:E10) and press enter. That will put the total number of values in the that cell. In our example, that value is 10.





Next, start a new column labeled Relative Frequency, say just to the right of the frequency column, and in the first cell under the title cell, enter =Cell with first frequency/$column identifier of column containing frequencies$row number in frequency column containing sum of frequencies. In our case, you would enter =E2/$E$11. Notice that the $-sign surrounds the column identifier.




Press enter to obtain the relative frequency of the first category (i.e., digit 1).



To obtain the rest of the relative frequencies, select the first relative frequency using your mouse, positioned at the lower right corner (until the cursor looks like a solid black plus sign), then drag the mouse down to select the other cells that will contain the appropriate relative frequencies. This is shown in the figure below.




Now to make a (frequency) histogram, select the frequencies, as shown below




and press the Chart Wizard icon .

In the Chart Wizard window, select Column as shown. You can click and hold on the Click and Hold to View Sample button to preview your histogram.



Click Next to obtain the window show below.




Click on the Next button again to enter the Title, Category (x) axis label, and Value (Y) axis label, as shown below.





Note that in actual practice, your title for the histograms should reflect the type of data being studied. So, for example, if the data comes from street address, a title of Histogram: Street Addresses, would be appropriate. Similarly, if the data comes from NASDAQ securities, then a title of Histogram: NASDAQ Securities would be appropriate.

Click on the Legend tab to un-select Show Legend. (You don't need a legend when you have only one graph in the window.)





Click on the Finish button to generate the histogram, as shown.




A relative frequency histogram can be produced following the steps used to produce the frequency histogram, with the exceptions that you must select the relative frequencies instead of the frequencies, and that you should adjust titles and labels, accordingly.




If we want the bars of the histogram to touch, we can make that adjustment by clicking on a histogram, as shown below.




Next, right-click on the histogram and select Format Data Series...




In the window that comes up, select the Options tab, and set Gap width to 0, as shown below. Then, click OK.




The final result is shown below.





After you've done all the above work, you'll be ready to report the (a) frequency table (b) relative frequency table, and (c) the relative frequency histogram (which is the histogram on the lower right of the preceding figure.

The Frequency Table is

1st Nonzero Digit Frequency
11
23
31
41
50
61
71
80
92

1st Nonzero Digit Relative Frequency
10.1
20.3
30.1
40.1
50.0
60.1
70.1
80.0
90.2


The relative frequency histogram is shown below.




Keep in mind that the distribution that you'll obtain for the project will not look like the one developed here. The example shown in this posting was used only to demonstrate the process of obtaining the distribution.

Posted: Friday - November 10, 2006 at 01:44 PM        


©