Assessment Committee at DVHS: Mission
The DVHS Assessment Committee seeks to develop curriculum-based methods
of helping our students improve scores on all types of assessments.
This page was updated on 14 March 2006.
Inservice Training
We are looking for workshops which would benefit teachers beyond the
usual how-to-write-a-rubric workshops we all are subjected to. If you
have any ideas, let us know!
Testing Tips
The Assessment Committee and Mr. Bart Cox's video production class
have collaborated on a series of short films illustrating test-taking
tips for standardized or other multiple-choice assessments. A list of
these tips appears below. Italicized tips have been made into videos.
Each test-taking tip is presented as a trick used by test writers to lure
students into selecting the wrong answer. Each "trick" must therefore
be defeated by a test-taking strategy which defends against it.
Not all tricks are employed on any particular assessment. In this way
we avoid preparing specifically for state-mandated tests, which is prohibited
by law. Preparing for tests in general is allowed. Also, none of the
tips deals specifically with test content; teaching specific test content
is also prohibited.
| Tip | Writing Technique | Test-Taking Strategy | Example Item | Best Use |
| Best Wrong Answer First |
Write a m/c question which presents reasonable
distractors, and place the one most likely to be selected ahead
of the right answer in the list. Since students often stop when
they find the right answer, they will not read past this "best"
wrong answer. |
Read all the choices before making a decision. |
Who was the second President of the United
States?
a. George Washington
b. Thomas Jefferson
c. John Adams
d. Abraham Lincoln |
This tip is best used against teacher-written
exams. Standardized exams are often computer-scrambled. |
| True Statements that Don't Address
the Question |
Write questions which ask for one thing, but
provide answers which answer another question entirely. These
answers should be true and based solidly in the question or reading
passage. |
Make sure your choice answers the question
asked by reading the question again after making your selection. |
Why did Huckleberry Finn run away from home?
a. He liked to swim.
b. His father was abusive.
c. He watched the riverboat fire a cannon to try to raise his
body from the bottom of the river.
d. He was friends with Tom Sawyer. |
Any well-written multiple choice exam |
| Math Problem answers will always
appear on your calculator |
For any math problem involving two numbers,
provide choices which add, subtract, divide, and multiply the
two numbers. |
Don't rely on your calculator and don't always
multiply the numbers or always divide the big number by the small
number. Work it out before selecting a choice, and eliminate the
wrong answers afterward. |
If you make $10.00 for 2 hours of work, what
is your hourly rate?
a. $10 per hour
b. $20 per hour
c. $5 per hour
d. $8 per hour
e. $0.20 per hour |
Any mathematical question |
| Do the easy questions first |
Put some hard questions at the beginning of
the test. |
Do the easy questions first, because all items
are worth the same amount. If you run out of time, you could lose
points. |
This tip is not about an individual question,
but rather a collection of questions. |
This works well on tests where each item is
worth the same, such as most teacher-written tests and some standardized
tests like the SAT-9. This doesn't apply to exams where the
harder questions are weighted more, such as the CAT-6 and SATs. |
| Guessing |
Tell students in the test instructions whether
or not guessing is to their advantage. |
If a test does not penalize for wrong answers,
you should guess on items you would otherwise leave blank. If
there is a penalty (typically -1/4 point for each wrong answer)
then you should guess if you can eliminate one or two choices,
but not if you cannot eliminate any choices. |
This tip is not about an individual question,
but rather a collection of questions. |
Guessing is almost always effective on questions
where two or more choices can be eliminated. If you cannot eliminate
any wrong choices, then guessing may penalize you on tests
like the AP exams. |
| No stupid choices |
Well-written items will always have at least
two reasonable choices. |
Eliminate stupid choices if you can to increase
the likelihood you will select the right answer. |
What are Deer Valley's school colors?
a. Mauve and Chartreuse
b. Orange and Green
c. Teal and Black
d. Red and Pink |
You'll find questions like these on any well-written
test. Tests written by non-educators, such as certification
tests, licensure tests, and so on, are more likely to have questions
with stupid choices. |
| More than one right answer |
Use Roman numerals and an extra layer of selection
to allow choices involving more than one right answer. |
First ignore the letter choices and decide
which of the Roman numeral statements are correct. Then pick the
letter whose combination matches your set of correct statements. |
Which of the following states border California?
I. Oregon
II. Nevada
III. Arizona
IV. Utah
a. I only
b. II and I only
c. I, II, and III only
d. all of these
e. II and III and IV only |
Very common on standardized tests. |
| The longest choice |
Try to avoid making the longest choice consistently
the correct choice. |
Poor question writers often make the right
answer, with all its conditions and high accuracy, the longest
answer. Pay special attention to answers which are longer than
all the others. It may be right---or it may be a trap. |
What is Newton's third law?
a. Objects in motion will stay in motion.
b. F= ma.
c. For every action force, there is an equal, opposite, and simultaneous
reaction force. The forces do not cancel because they do not act
on the same thing. |
On poorly written tests, long answers tend
to be right answers. On well-written tests, long answers are seldom
the right answers (but sometimes they are). |
| Exploitation of weakness |
Analyze student homework or writing to determine
common errors and construct questions based on that. |
Pay attention to feedback on homework and
test review. |
Spell the name of the continent where the
South Pole is located.
a. Arctica
b. Antiarcitca
c. Antiarctica
d. Antarctica
e. Aurora |
This strategy appears in both teacher written
and standardized tests.
This is why it is important to pay attention
when the teacher goes over the results of an exam before a final
exam. |
| Graph trends |
Write a question which involves interpreting
graphical data. More sophisticated questions involve the trends
on the graph rather than simply reading values. However, many
standardized tests involve simply reading values. |
Be familiar with all the major types of graphs
and how to interpret them: pie charts, line graphs, bar charts,
scatterplots, and so on. Make sure you know whether the question concerns
a value on the graph or the trend of the data. |
Pending |
|
| Analogies |
Write items that draw analogies between two
ideas by using the format: A is to B as C is to D. You can leave
one of these items out and ask for it. |
Ask yourself, what is the relationship of
A to B? Then ask, what has a relationship with C like that? The
only correct choice has an identical relationship. |
Pending |
|
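The arithmetic behind the "Guessing" tip in the table above can be sketched as an expected-value calculation. This is only an illustration: it assumes five answer choices per item (the tip does not state a number), 1 point for a right answer, and the -1/4 point penalty mentioned in the tip. The function name and its parameters are made up for this example.

```python
from fractions import Fraction

def expected_guess_value(num_choices, eliminated=0, penalty=Fraction(1, 4)):
    """Average points gained by guessing at random among the remaining choices.

    Assumes a right answer earns 1 point and a wrong answer costs `penalty`.
    """
    remaining = num_choices - eliminated
    p_right = Fraction(1, remaining)
    return p_right * 1 - (1 - p_right) * penalty

# Assumption: five-choice items, as on many penalized exams.
for eliminated in range(4):
    value = expected_guess_value(5, eliminated)
    print(f"choices eliminated: {eliminated}, expected gain per guess: {value}")
```

Under these assumptions a blind guess gains nothing on average, while every eliminated choice pushes the expected gain above zero. With no penalty (`penalty=0`) the expected gain is always positive, which matches the advice to guess rather than leave an item blank.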
Links about Assessment
The STAR results are in and parent reports have been sent out. You
will be receiving more information from your principal or administrator
in the coming weeks about how your school did and how you can use this
information to guide instruction.
Included here are some links that you will want to bookmark for future
reference:
This link provides the blueprints for the California Standards Tests (CST)
for the STAR. This is helpful to see which standards are assessed and
the number of questions for each standard.
http://www.cde.ca.gov/ta/tg/sr/blueprints.asp
This link provides released questions for the CSTs. By analyzing these questions
teachers will be able to see how standards are assessed, the format of questions,
and the rigor of the question. Teacher-developed assessments can begin to align
with the structure and rigor of the CSTs.
http://www.cde.ca.gov/ta/tg/sr/css05rtq.asp
This link provides the 2006 STAR results from the state website. You can look
at groups through this link. Remember to click ‘view reports’ each
time you change the group.
http://star.cde.ca.gov/star2006/viewreport.asp
From Mary McCarthy's presentation at staff development, Fall 2004:
These are some websites that will be a good follow-up to my presentation
on Wednesday, August 25. You’ll want to check out some of these,
especially the released CAHSEE and STAR Questions. They will help you
determine if your assessment questions are rigorous enough, and comparable
to the STAR and CAHSEE questions. Other websites, like the SAT information,
will be a resource to help you guide students who want to know about
the new SAT.
This website contains the CAHSEE released questions, and the Study
Guide Information
http://www.cde.ca.gov/ta/tg/hs/resources.asp
This website contains information on the California Reading List Number
http://www.cde.ca.gov/ta/tg/sr/readinglist.asp
This website contains the STAR CST Released Questions
2004 STAR and CAHSEE results can be viewed at
http://www.cde.ca.gov/
Additional Links
http://www.ncee.org/
National Center on Education and the Economy
http://www.cde.ca.gov/
California Department of Education Main Page
http://star.cde.ca.gov/
California's STAR testing page at CDE.
Papers and Presentations
Using Performance Assessments in the Physics Classroom
This paper uses physics examples but contains many general tutorial
tips about performance assessment.
Writing and Taking Multiple Choice Exams
A guide for teachers. Includes all the standard tricks.
Standards and Assessments
Presentation to AUSD teachers about connecting SAT 9 to standards.
Student Generated Assessments
Hypothesis: students who write test questions will take questions more
seriously.
Assessment Samplers
The New Standards Reference Exams offered by Harcourt Brace Educational
Measurement provide information about how well students perform against
the New Standards Performance Standards, developed by the National
Center on Education and the Economy. Jeff was the director of assessment
for NCEE and coordinated the development of these exams.
Sample Assessment items from Science
Sample Assessment items from Social Science (courtesy Alison Weihe)
Psychometrics
Psychometrics is the study of psychological measurement, in particular
of repeatable and operationally defined quantities such as achievement,
intelligence, and personality. Within this field there are some specific
concepts which are of concern to educators involved in standardized
testing. In particular you should be familiar with validity and reliability,
as these are the gauges used to select multiple-choice and open-response
items for standardized tests. What follows is an excerpt from the Wikipedia entry
on Psychometrics.
The key traditional concepts in classical test theory are reliability and validity. A reliable measure is measuring something consistently,
while a valid measure is measuring what it is supposed to measure.
A reliable measure may be consistent without necessarily being valid,
e.g., a measurement instrument like a broken ruler may always under-measure
a quantity by the same amount each time (consistently), but the resulting
quantity is still wrong, that is, invalid. For another example, a reliable
rifle will have a tight cluster of bullets in the target, while a valid
one will center its cluster around the center of the target, whether
or not the cluster is a tight one.
Both reliability and validity may be assessed mathematically. Internal
consistency may be assessed by correlating performance on two halves
of a test (split-half reliability); the value of the Pearson product-moment
correlation coefficient is adjusted with the Spearman-Brown prediction
formula to correspond to the correlation between two full-length tests.
Other approaches include the intra-class correlation (the ratio of
variance of measurements of a given target to the variance of all targets).
A commonly used measure is Cronbach's alpha, which is equivalent to
the mean of all possible split-half coefficients. Stability over repeated
measures is assessed with the Pearson coefficient, as is the equivalence
of different versions of the same measure (different forms of an intelligence
test, for example). Other measures are also used.
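The split-half procedure and Cronbach's alpha described above can be sketched in a few lines. This is a minimal illustration, not a vendor's method: the score matrix is made up, the odd/even split of items is one arbitrary choice of halves, and the function names are hypothetical.

```python
from statistics import mean, variance

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(scores):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r_half = pearson(odd, even)
    # Spearman-Brown prediction for a test twice the length of each half.
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up data: one row per student, one 0/1 entry per item.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print("split-half (Spearman-Brown):", round(split_half_reliability(scores), 3))
print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))
```

With only six students and six items the numbers mean little; the point is the order of operations: split the test, correlate the halves, adjust for length.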
[In practice the Pearson coefficient is used as a criterion for
rejecting items. A correlation is computed for each item against
the rest of the test. If an item has a low correlation, it does not track
the remainder of the items on the test, and thus will be a poor
predictor of how well a student will do on the test as a whole. On
a standardized test you'd throw this one out; on a criterion-referenced
test you'd throw it out too, but after a vigorous debate about how
important it is to the integrity of the test. If the test has any slack
(extra items not needed to maintain reliability) you can keep it.
The other tests distinguish the performance of the item in relationship
to the items known to be good predictors. If an item for some reason
is only answered correctly by students who would otherwise fail most
other items, and especially if this same item is missed by students
who are successful on the rest of the test, the vendor will argue
the item should be thrown out. Using such an item reduces the profit
margin for the testing company as it makes the test
longer, uses more paper, and does not contribute to the reliability
of the results. - JA]
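The item screening described in the note above can be sketched as a correlation between each item and the total score on the remaining items. The data, the function names, and the 0.2 cutoff are all assumptions made for illustration; real vendors set their own criteria.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def item_rest_correlations(scores):
    """For each item, correlate its 0/1 scores with the total on all other items."""
    results = []
    for i in range(len(scores[0])):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        results.append(pearson(item, rest))
    return results

# Made-up data: one row per student, one 0/1 entry per item.
# The last item is answered correctly only by the weakest students.
scores = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

CUTOFF = 0.2  # hypothetical threshold, not an industry standard
for number, r in enumerate(item_rest_correlations(scores), start=1):
    verdict = "keep" if r >= CUTOFF else "review or drop"
    print(f"item {number}: r = {r:+.2f} -> {verdict}")
```

The last item comes out with a negative correlation, which is the case the note describes: strong students miss it and weak students get it right.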
Validity may be assessed by correlating measures with a criterion
measure known to be valid. When the criterion measure is collected
at the same time as the measure being validated the goal is to establish
concurrent validity; when the criterion is collected later the goal is
to establish predictive validity. A measure has construct validity if it
is related to other variables as required by theory. Content validity is
simply a demonstration that the items of a test are drawn from the domain
being measured. In a personnel selection example, test content is
based on a defined statement or set of statements of knowledge, skill,
ability, or other characteristics obtained from a job analysis.
[Most testing vendors keep a bookcase of binders on a shelf showing
how each of their products correlates to each state's standards. If
a client from California called about test X, there's a binder on the
shelf that has Test X's content mapped to the standards for California.
This is probably just a content correlation rather than an evaluation
that determines whether or not items get at the meaning of a standard,
although I've been out of the business for a few years and practices
might have changed. In California, in particular, items are supposed
to be criterion referenced and designed more directly at the standards--that's
why we moved away from assessments such as the SAT 9. Typically items
are approved through a committee consisting of state department
personnel, vendor content specialists, and teachers. - JA]
Predictive or concurrent validity cannot exceed the square root of the
correlation between two versions of the same measure.
Item response theory models the relationship between latent traits
and responses to test items. Among other advantages, IRT provides
a basis for obtaining an estimate of the location of a test-taker
on a given latent trait as well as the standard error of measurement
of that location. For example, a university student's knowledge
of history can be deduced from his or her score on a university test and
then be compared reliably with a high school student's knowledge
deduced from a less difficult test. Scores derived by classical test
theory do not have this characteristic, and actual ability
(rather than ability relative to other test-takers) must be assessed
by comparing scores to those of a norm group randomly selected from
the population. In fact, all measures derived from classical test theory
are dependent on the sample tested, while, in principle, those derived
from item response theory are not.
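To make the idea of a latent trait concrete, here is a minimal sketch using the one-parameter logistic (Rasch) model, one common IRT model. The excerpt does not name a specific model, so this choice is an assumption, as are the item difficulties, the responses, and the function names.

```python
import math

def rasch_probability(ability, difficulty):
    """Chance of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate by Newton-Raphson.

    Returns (ability, standard_error). Needs at least one right and one
    wrong response; otherwise the estimate is unbounded.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1 - p) for p in probs)
        theta += gradient / information
    probs = [rasch_probability(theta, b) for b in difficulties]
    information = sum(p * (1 - p) for p in probs)
    return theta, 1.0 / math.sqrt(information)

# Made-up item difficulties on the latent scale, easiest to hardest,
# and one student's right (1) / wrong (0) responses.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [1, 1, 1, 0, 0]

theta, se = estimate_ability(responses, difficulties)
print(f"estimated ability: {theta:.2f} (standard error {se:.2f})")
```

Because ability and item difficulty sit on the same scale, a student who takes a harder or easier set of calibrated items can still be placed on that scale, which is the comparison the excerpt describes. The large standard error from five items is worth noticing.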
[The sample size is driven by the confidence interval needed by
the state to politically justify the scores. A politician might need to
claim that the test is 90% reliable, meaning that if it were given again
to another random group in the population, 9 times out of 10 you'd get
the same score. Fewer items generally widen the confidence interval. Fewer
than 10 items makes the subtopic essentially unreportable, although as one
psychometrician told me, "If the client insists, we could report scores
on individual items. We advise the client on the confidence intervals,
and recommend what to report and what not to report within the subgroups.
What the client does with the information is beyond our control." In
other words, the state might overinterpret results or claim detailed
knowledge of minuscule points when in fact the number of items used
to make the judgement is psychometrically inadequate. What this means
for the practicing teacher is, don't get too bent out of shape on subscores--treat
them as interesting advice--but put some stock in the content score
for a content area as a whole.
-- JA]
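The note's warning about subscores built on a handful of items can be illustrated with a rough margin-of-error calculation. This is a simplification and not how a testing vendor computes its intervals: it treats each item as an independent right/wrong trial, uses the normal approximation for a proportion, and assumes a student who answers 70% of items correctly. The function name and defaults are made up.

```python
import math

def score_margin(num_items, proportion_correct=0.7, z=1.645):
    """Approximate 90% margin of error on a percent-correct score.

    z = 1.645 is the two-sided 90% normal quantile; items are treated
    as independent trials, which real test items only approximate.
    """
    standard_error = math.sqrt(proportion_correct * (1 - proportion_correct) / num_items)
    return z * standard_error

for n in (5, 10, 40, 75):
    print(f"{n:>3} items: 70% correct, plus or minus {100 * score_margin(n):.0f} points")
```

Under these assumptions the margin shrinks only with the square root of the number of items, so a subscore based on 10 items is about twice as uncertain as a score based on 40.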
Good Arguments for Taking Standardized Tests Seriously
10. You should take pride in everything you do and perform to the best
of your ability.
9. There are going to be two kinds of people in your future: people
who brag about purposefully doing poorly on standardized tests, and
people who don't need to brag because they did well. Which do you want
to be?
8. Any test is good practice for other tests of the same nature.
7. School spirit and a sense of community are enhanced when we all
work together toward a common goal.
6. It is possible to learn new things you didn't know in the process
of answering questions. In fact, some theories of learning say that
the only time you learn new things is when you figure them out for yourself
rather than being told them in a lecture; some people learn more during
homework and tests than they ever do when sitting passively in a classroom.
5. Employers regularly use standardized tests as a method of screening
applicants for high-paying jobs. Not only do they look at scores from
school tests, they administer their own tests. You are more likely to
do well if you have taken such tests seriously. This ranges from driver's
licensing exams to industrial certifications to teaching. Tests are
just a fact of life.
4. Real estate prices have been shown to be influenced by local standardized
test scores. Agents want to know how well the schools prepare students,
and the only public measurement they have is standardized test scores.
(Made into a video by Mr. Cox's class.)
3. You never know when someone is going to check on your score on any
standardized test. The career center reports that colleges and employers
ask about SAT-9 scores.
2. The state of California will pay the top 10% of scorers in a school
and the top 5% in the state a $1000 scholarship which can be redeemed
by the college of your choice upon graduation.
1. Why not?