STA201 Project 1 Tips. Generating Histograms with MS EXCEL
This posting is primarily for students in my STA
201 Statistical Methods course, but it may be of interest to anyone using EXCEL
for statistical computations. As noted in a previous posting, where I showed
how to do things using the TI-83Plus graphing calculator, Project 1 of my STA201
course involves obtaining numerical data (from sources given in class) and
analyzing the first non-zero digit. In particular, we want to study the
distribution of the first non-zero digit in the data sets.
Read More for a detailed description
of how to use MS EXCEL to obtain frequency distributions, relative frequency
distributions, and histograms.
Begin by opening a new EXCEL blank workbook and
enter the titles 1st Digit
Data and
Digit in
cells A1 and B1, respectively. Next enter the first digit of each datum
beginning in column A (starting in A2). In our small example, we only have 10
values. Next,
select the values you just entered, and from the title bar menu, select
Format -->
Cells (or press Ctrl 1) to make sure that the
data is treated as
text.
Do
the same thing for the categories in column
B. The
next step is to obtain the frequency distribution. From the title bar, select
Tools --> Data
Analysis... Then
in the window that comes up, select Histogram. and click
OK. (In
our case, you're not actually going to make the histogram this way. Instead,
you're going to use this feature of EXCEL to produce the frequency
distribution.)
The next window that pops up is shown
next. With
your mouse, make the Input Range field active, then using your mouse, select all
the data in column A, beginning with A2. In our example, you would select from
A2 to A11, as shown
below. The
bin range (B2 to B10) is selected in a similar manner, using the mouse. To
select an output range (i.e., where the frequency distribution will go) can also
be selected with the mouse, as shown
below.
Next, click
OK to
generate the frequency distribution.
To
generate the relative frequency distribution, select the cell immediately under
the last Frequency entry. In that cell, type in
=SUM(1st
Frequency
cell:last
frequency
cell).
In our example, that is =SUM(E2:E10)
and press enter. That will put the total
number of values in the that cell. In our example, that value is
10. Next,
start a new column labeled Relative Frequency, say just to the right of the
frequency column, and in the first cell under the title cell, enter
=Cell with first frequency/$column
identifier of column containing frequencies$row number in frequency column
containing sum of frequencies. In our case,
you would enter
=E2/$E$11.
Notice that the $-sign surrounds the column
identifier. Press
enter to
obtain the relative frequency of the first category (i.e., digit
1). To
obtain the rest of the relative frequencies, select the first relative frequency
using your mouse, positioned at the lower right corner (until the cursor looks
like a solid black plus sign), then drag the mouse down to select the other
cells that will contain the appropriate relative frequencies. This is shown in
the figure
below. Now
to make a (frequency) histogram, select the frequencies, as shown below
and
press the Chart Wizard icon
.In
the Chart Wizard window, select
Column
as shown. You can click and hold on the
Click and Hold to View
Sample button to preview your histogram.
Click
Next to obtain the window show
below. Click
on the
Next
button again to enter the
Title,
Category (x)
axis label, and
Value (Y)
axis label, as shown
below. Note
that in actual practice, your title for the histograms should reflect the type
of data being studied. So, for example, if the data comes from street address,
a title of Histogram: Street
Addresses, would be appropriate. Similarly,
if the data comes from NASDAQ securities, then a title of
Histogram: NASDAQ Securities
would be appropriate.
Click on the Legend tab to un-select
Show
Legend. (You don't need a legend when you
have only one graph in the
window.) Click
on the
Finish
button to generate the histogram, as shown.
A
relative frequency histogram can be produced following the steps used to produce
the frequency histogram, with the exceptions that you must select the relative
frequencies instead of the frequencies, and that you should adjust titles and
labels, accordingly.
If
we want the bars of the histogram to touch, we can make that adjustment by
clicking on a histogram, as shown
below. Next,
right-click on the histogram and select
Format Data
Series... In
the window that comes up, select the
Options
tab, and set Gap
width to 0, as shown below. Then, click
OK. The
final result is shown
below. After
you've done all the above work, you'll be ready to report the (a) frequency
table (b) relative frequency table, and (c) the relative frequency histogram
(which is the histogram on the lower right of the preceding
figure.The Frequency Table
is
|
1st Nonzero Digit |
Frequency |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
| 7 | 1 |
| 8 | 0 |
| 9 | 2 |
|
1st Nonzero Digit |
Relative Frequency |
| 1 | 0.1 |
| 2 | 0.3 |
| 3 | 0.1 |
| 4 | 0.1 |
| 5 | 0.0 |
| 6 | 0.1 |
| 7 | 0.1 |
| 8 | 0.0 |
| 9 | 0.2 |
The relative frequency histogram is shown
below. Keep
in mind that the distribution that you'll obtain for the project will not look
like the one developed here. The example shown in this posting was used only to
demonstrate the process of obtaining the distribution.
Posted: Friday - November 10, 2006 at 01:44 PM
|
Quick Links
Statistics
Total entries in this blog:
Total entries in this category:
Published On: May 23, 2008 09:10 AM
|