SIP 13.4 Writing Good Multiple-choice Questions

Thirsty for a Strong Instructional Practice?

As more assessment takes place in the online environment these days, the advantages of multiple-choice questions become apparent. And while employing a variety of assessment formats is ideal, multiple-choice questions can be an effective and efficient way to assess student learning outcomes (SLOs).

Multiple-choice questions have several potential advantages:

  1. On a practical level, multiple-choice questions can be automatically scored by a learning management system (LMS), which gives students instant feedback and makes it easy to offer retakes if the instructor wishes. This makes them suitable for both formative and summative assessment.
  2. Multiple-choice questions are versatile and can be written to assess the achievement of SLOs at a variety of levels on Bloom’s Taxonomy, from basic recall to application and evaluation. However, there are obvious limits on what can be assessed with multiple-choice questions. For example, they are not an effective way to test students’ ability to organize thoughts or articulate explanations or creative ideas.
  3. Well-written multiple-choice questions are reliable. They are less susceptible to guessing than true/false questions. Also, because scoring is objective, multiple-choice questions eliminate the scorer inconsistency associated with essay questions. Reliability is enhanced when the number of questions focused on a single SLO is increased. Fortunately, the ability of an LMS to draw randomly from large pools of multiple-choice questions makes this process relatively simple.
  4. Assessments (exams, quizzes, etc.) based on well-written multiple-choice questions are valid. Due to their efficiency, such assessments can cover a relatively large amount of course material, thus increasing the validity of the assessment. For maximum validity, every question should be linked to an SLO and the level of achievement that a correct response indicates should be clearly noted.
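The claim in point 3 that multiple-choice questions are less susceptible to guessing than true/false questions can be illustrated with a short calculation. The sketch below is only an illustration: the function name, the 10-item quiz and the pass mark of 7 are assumptions chosen for the example, and it assumes a student guesses blindly and independently on every item.

```python
from math import comb


def chance_of_passing_by_guessing(n_items: int, n_alternatives: int, pass_mark: int) -> float:
    """Probability of getting at least `pass_mark` items right by blind guessing.

    Assumes one correct alternative per item and independent, uniformly
    random guesses (a simplification; real students use partial knowledge).
    """
    p = 1 / n_alternatives  # chance of guessing a single item correctly
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(pass_mark, n_items + 1)
    )


if __name__ == "__main__":
    # Hypothetical 10-item quiz that requires 7 correct responses to pass.
    true_false = chance_of_passing_by_guessing(10, 2, 7)
    four_choice = chance_of_passing_by_guessing(10, 4, 7)
    print(f"True/false items:       {true_false:.3f}")
    print(f"Four-alternative items: {four_choice:.4f}")
```

Under these assumptions, a blind guesser passes the true/false quiz about 17% of the time, but passes the four-alternative quiz well under 1% of the time.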

Of course, the key to taking advantage of these benefits is writing good multiple-choice questions.

Take a SIP of This: Writing Good Multiple-choice Questions

First, some terminology: Multiple-choice questions (or “items” since not all are stated as questions) consist of a question or incomplete sentence, known as the “stem,” and a list of potential answers – words, phrases, sentences, etc., known as “alternatives.” The alternatives consist of one or more correct or best alternative(s), the “answer,” and several incorrect or inferior alternatives, known as “distractors.” For example:

Stem: What is chiefly responsible for the increase in the average length of life in the U.S. during the past 50 years?

Alternatives:

  A. Compulsory health and physical-education courses in public schools. (distractor)
  B. The reduced death rate among infants and young children. (answer)
  C. The safety movement, which has greatly reduced the number of deaths from accidents. (distractor)
  D. The substitution of machines for human labor. (distractor)

Items with a single correct/best answer are known as multiple choice, while items with more than one correct answer are known as multiple answer. For multiple-answer items, students may be required to select any correct answer to earn credit or they may be required to select all correct answers to earn credit. These options increase the versatility of multiple-choice/-answer questions.
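The two credit rules for multiple-answer items can be sketched in a few lines. This is a minimal illustration, not how any particular LMS scores items: the function name and the policy labels "any" and "all" are hypothetical, and the sketch assumes that under the "any" rule a response containing an incorrect alternative earns no credit.

```python
def score_multiple_answer(selected: set[str], correct: set[str], policy: str = "all") -> bool:
    """Return True if a response earns credit under the given (hypothetical) policy.

    policy="any": at least one alternative is selected and every selected
                  alternative is correct (an assumption of this sketch).
    policy="all": the selection matches the set of correct alternatives exactly.
    """
    if policy == "any":
        return bool(selected) and selected <= correct
    if policy == "all":
        return selected == correct
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    correct = {"A", "C"}  # hypothetical item with two correct alternatives
    print(score_multiple_answer({"A"}, correct, policy="any"))       # credit
    print(score_multiple_answer({"A"}, correct, policy="all"))       # no credit
    print(score_multiple_answer({"A", "C"}, correct, policy="all"))  # credit
```

The "all" rule is the stricter of the two, since partial knowledge of the correct alternatives is not enough to earn credit.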

Stems

Ideally, the stem should be a definite question or an incomplete sentence. A question should end with a question mark, and an incomplete sentence should mark the missing portion with a blank. A question stem is preferable to an incomplete sentence because it allows the student to focus on answering the question rather than holding the partial sentence in working memory and sequentially completing it with each alternative (Statman 1988). The cognitive load of incomplete sentences increases further when the blank falls at the beginning or in the middle of the stem, so this construction should be avoided.

Poor Example

California:

  1. Contains the tallest mountain in the United States.
  2. Has an eagle on its state flag.
  3. Is the second-largest state in terms of area.
  4. Was the location of the Gold Rush of 1849.

Good Example

What is the main reason so many people moved to California in 1849?

  1. California land was fertile, plentiful and inexpensive.
  2. Gold was discovered in central California.
  3. The East was preparing for a civil war.
  4. They wanted to establish religious settlements.

The stem should make sense on its own and not rely on the alternatives for meaning.

Poor Example

Idaho is widely known as ___.

  1. The largest producer of potatoes in the United States.
  2. The location of the tallest mountain in the United States.
  3. The state with a beaver on its flag.
  4. The “Treasure State.”

Good Example

For what agricultural product is Idaho widely known?

  1. Apples
  2. Corn
  3. Potatoes
  4. Wheat
Note: The good example tests students’ knowledge of Idaho’s agriculture. The poor example is confusing because students are unsure if they are answering a question on Idaho’s agriculture, geography, flag or nickname.

Research suggests that students often have difficulty understanding items with negative phrasing (Rodriguez 1997). Thus, the stem should be stated negatively only when SLOs require it, such as in the case of identifying dangerous laboratory or clinical practices. If a stem is stated negatively, the negative element should be emphasized with italics or CAPITALIZATION.

Poor Example

A nurse is assessing a client who has pneumonia. Which of these assessment findings indicates that the client does NOT need to be suctioned?

  1. Diminished breath sounds.
  2. Absence of adventitious breath sounds.
  3. Inability to cough up sputum.
  4. Wheezing following bronchodilator therapy.

Good Example

Which of these assessment findings, if identified in a client who has pneumonia, indicates that the client needs to be suctioned?

  1. Absence of adventitious breath sounds.
  2. Respiratory rate of 18 breaths per minute.
  3. Inability to cough up sputum.
  4. Wheezing prior to bronchodilator therapy.

The stem should not contain irrelevant material, which can decrease the reliability and the validity of the test scores.

Poor Example

Suppose you are a sociology professor who wants to determine whether your teaching of a unit on probability has had a significant effect on your students. You decide to analyze their scores from a test they took before the instruction and their scores from another exam taken after the instruction. Which of the following t-tests is appropriate to use in this situation?

  1. Dependent samples
  2. Heterogeneous samples
  3. Homogeneous samples
  4. Independent samples

Good Example

What is the most appropriate statistical test to determine whether your teaching has had a significant effect on your students’ pretest and post-test scores?

  1. T-test for dependent samples
  2. T-test for heterogeneous samples
  3. T-test for homogeneous samples
  4. T-test for independent samples

Alternatives

All alternatives should be plausible. The function of the incorrect alternatives is to serve as distractors, and implausible alternatives don’t do this. Common student errors provide the best source of distractors.

Poor Example

Which of the following artists is known for painting the ceiling of the Sistine Chapel?

  1. Andy Warhol
  2. Fred Flintstone
  3. Michelangelo
  4. Santa Claus

Good Example

Which of the following artists is known for painting the ceiling of the Sistine Chapel?

  1. Sandro Botticelli
  2. Leonardo da Vinci
  3. Michelangelo
  4. Raphael

Alternatives should be mutually exclusive. Multiple-choice items with more than one correct answer may be considered “trick questions” by students, which can erode trust and respect for the assessment process.

Poor Example

___ was the median household income for the United States in 2020.

  1. $0 to $25,000
  2. $25,000 to $50,000
  3. $25,000 to $75,000
  4. $25,000 to $100,000

Good Example

What was the median household income for the United States in 2020?

  1. $0 to $25,000
  2. $25,001 to $50,000
  3. $50,001 to $75,000
  4. $75,001 or greater
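When alternatives are numeric ranges, mutual exclusivity can be checked mechanically before an item is published. The sketch below is one possible check, not a feature of any LMS: the function name is hypothetical, ranges are treated as inclusive (low, high) pairs, and the open-ended alternative “$75,001 or greater” is represented with an infinite upper bound.

```python
def overlapping_pairs(ranges: list[tuple[float, float]]) -> list[tuple[int, int]]:
    """Return index pairs of inclusive (low, high) ranges that share at least one value."""
    overlaps = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            low_i, high_i = ranges[i]
            low_j, high_j = ranges[j]
            # Two inclusive ranges overlap when each starts before the other ends.
            if low_i <= high_j and low_j <= high_i:
                overlaps.append((i, j))
    return overlaps


# Alternatives from the poor and good examples above, as inclusive dollar ranges.
poor = [(0, 25_000), (25_000, 50_000), (25_000, 75_000), (25_000, 100_000)]
good = [(0, 25_000), (25_001, 50_000), (50_001, 75_000), (75_001, float("inf"))]

if __name__ == "__main__":
    print("Poor example overlaps:", overlapping_pairs(poor))
    print("Good example overlaps:", overlapping_pairs(good))
```

Every pair of alternatives in the poor example overlaps, because all four ranges include $25,000, while the good example produces no overlapping pairs.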

Alternatives should be similar in content and style. Alternatives that include different topics can provide cues to students about the correct answer.

Alternatives should be stated clearly and concisely. Items that are excessively wordy assess students’ reading ability and working memory rather than the achievement of the SLO (Rodriguez 1997).

Poor Example

When you subtract one of the numbers below from 900, the answer is greater than 300. Which number is it?

  1. 667
  2. 823
  3. 579
  4. 712

Good Example

900 minus this number results in an answer greater than 300. Which number is it?

  1. 823
  2. 712
  3. 667
  4. 579

Alternatives should be presented in a logical order (e.g., alphabetical or numerical) to avoid a bias toward listing the answer in certain positions (see example above).

Alternatives should be free from clues about which response is correct. Sophisticated test-takers are alert to inadvertent clues to the correct answer, such as differences in grammar, length and language choice in the alternatives. Similarly, watch for non-textual cues such as differences in font, size, spacing (especially if copying and pasting when writing items) and position in your list of alternatives. In other words, don’t help your students “game” your exam or quiz.

Poor Example

The term “operant conditioning” refers to the learning situation in which ___.

  1. A familiar response is associated with a new stimulus.
  2. Individual associations are linked together in sequence.
  3. Operant conditioning is a response in which the learner is instrumental in leading to a subsequent reinforcing event.
  4. Verbal responses are made to verbal stimuli.

Good Example

Which learning situation best represents operant conditioning?

  1. A familiar response is associated with a new stimulus.
  2. Individual associations are linked together in sequence.
  3. The learner’s response leads to reinforcement.
  4. Verbal responses are made to verbal stimuli.
Note: In the poor example, the answer (the third alternative, C) is longer than the distractors and appears in a different font. Some students are adept at spotting such differences. Also, the answer in the poor example uses the textbook’s language, while the distractors are in the instructor’s words. The good example keeps the alternatives consistent in length and uses the instructor’s language throughout.

The alternatives “all of the above” and “none of the above” should be avoided. When “all of the above” is used as an answer, test-takers who can identify more than one alternative as correct can select the correct answer even if unsure about other alternative(s). Likewise, test-takers who can eliminate a single option can thereby eliminate “all of the above” as well, and those who recognize a single option as correct can eliminate “none of the above.” In either case, students can use partial knowledge to arrive at a correct answer. A good alternative is to use a multiple-answer item and ask students to identify all correct answers.

There is little difference in difficulty, discrimination and test-score reliability among items containing two, three and four distractors (Statman 1988). The number of alternatives can vary among items if all alternatives are plausible.

Additional considerations

Avoid complex multiple-choice items, in which some or all of the alternatives consist of different combinations of options. As with “all of the above” answers, a sophisticated test-taker can use partial knowledge to achieve a correct answer.

Poor Example

___ won the Nobel Prize for the discovery of DNA.

  1. Francis Crick
  2. James Watson
  3. Rosalind Franklin
  4. A and B
  5. B and C
  6. A and C

Good Example

Who won the Nobel Prize for the discovery of DNA?

  1. Francis Crick and James Watson
  2. Rosalind Franklin
  3. Carl Wieman
  4. Eric Cornell and Jan Hall

Finally, keep the content of items independent of one another. That is, don’t provide the answer to one question in another. Savvy test-takers can use information in one question to answer another question, reducing the validity of the test.

Still Thirsty?  Take a SIP of this:

  1. Connie Malamed, the eLearning Coach
  2. Chiavaroli 2016 Negatively-worded multiple choice questions: An avoidable threat to validity
  3. Rodriguez 1997 The Art & Science of Item-Writing: A Meta-Analysis of Multiple-Choice Item Format Effects
  4. Statman 1988 Ask a Clear Question and Get a Clear Answer: An Enquiry into the Question/Answer and Sentence Completion Formats of Multiple Choice Items

Permanent link to this article: https://sites.msudenver.edu/sips/13-4_writing_good_multiple-choice_questions/
