Reporting test scores: Why raw scores may not be the best approach
There are many ways to report test scores. It is important to use an approach that is accurate and understandable to students, teachers, administrators and parents.
The most common way test scores are reported is as raw scores. Raw scores are simply a tally of the number of questions answered correctly.
Raw scores are easy for students and others to understand, and we are all accustomed to receiving results that tell us how many questions we "got right."
But raw score reporting may not be the best choice. Although raw scores are familiar, they are often not the most accurate way to report results; scores reported on a scale can be more accurate.
A detailed discussion of scaling and measurement is well beyond the scope of this article, but a few key points can help clarify why it is often better to report scaled scores.
Raw scores are limited in meaning. Any test is a "proxy" for finding out where a student stands on a "true" underlying scale reflecting the domain you are measuring. So, if you are administering a 20-question reading test, that test is only a proxy for reading ability; the raw scores from that test are only a rough indicator of a person's underlying reading ability. By using advanced measurement techniques, it is possible to convert the raw scores to a scale that better reflects that underlying ability.
Raw scores do not take item difficulty into account. Because the questions on a test are not equally hard and may differ statistically from each other in several other ways, the distance or "amount of ability" between adjacent raw scores may be greater at some points on the scale than at others. For example, in terms of actual reading ability, the distance between 10 and 15 correct may be greater than the distance between 15 and 20 correct. Converting to scaled scores can help eliminate these differences by making sure that the amount of ability between each point is the same across the whole scale, so that a one-point difference in the middle means the same thing as a one-point difference near the top of the range.
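To make the unequal-interval point concrete, here is a minimal sketch that assumes a Rasch model, which is only one of many possible scaling approaches and is not specified in this article for this purpose. The item difficulties and the function names are hypothetical, invented for illustration. The sketch finds the ability level (in logits) whose expected number correct matches each raw score.

```python
import math

def expected_raw_score(theta, difficulties):
    """Expected number correct under the Rasch model for ability theta."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def theta_for_raw(raw, difficulties, lo=-10.0, hi=10.0, tol=1e-8):
    """Ability whose expected raw score equals `raw`, found by bisection."""
    if not 0 < raw < len(difficulties):
        raise ValueError("zero and perfect scores have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical difficulties for a 20-item test, evenly spread from -2 to +2 logits.
DIFFICULTIES = [-2.0 + 4.0 * i / 19 for i in range(20)]

if __name__ == "__main__":
    for raw in (10, 11, 18, 19):
        print(raw, round(theta_for_raw(raw, DIFFICULTIES), 2))
```

With these invented difficulties, one raw point near the top of the range spans more ability than one raw point in the middle. On a different set of items the pattern could differ, which is exactly why equal raw-score steps cannot be assumed to mean equal steps in ability.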
Raw scores should not be compared over time or across forms. Reporting scores on a common underlying scale (rather than raw scores) is particularly valuable when your testing plans call for the use of more than one form of the test. This includes using multiple forms at a single administration or using different forms of the test to monitor growth or progress over time.
Using raw scores under these circumstances can be very misleading. Despite your best efforts to make two forms similar, they are likely to vary in difficulty. Take our reading test, for example, and suppose Form B is harder than Form A. Let's say this is a 20-question test and that a raw score of 10 on Form A reflects the same level of ability as a raw score of 8 on Form B. A student who takes Form A and gets a 10, then takes Form B later in the year and gets a 10 again, appears to have shown no growth. But in fact, because Form B is harder, the second 10 reflects improved performance. If both tests had been reported on the same scale, this growth in the scores would be readily apparent.
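The Form A and Form B example can be sketched in a few lines. The conversion tables below are hypothetical: the scale values are invented and cover only part of the raw score range. They encode only the premise stated above, that a raw 10 on Form A corresponds to a raw 8 on Form B.

```python
# Hypothetical raw-to-scale conversion tables (invented values, partial range).
# Premise from the article: raw 10 on Form A matches raw 8 on Form B.
FORM_A = {8: 192, 9: 196, 10: 200, 11: 204, 12: 208}
FORM_B = {8: 200, 9: 204, 10: 208, 11: 212, 12: 216}
CONVERSIONS = {"A": FORM_A, "B": FORM_B}

def scaled_score(form, raw):
    """Look up the common-scale score for a raw score on a given form."""
    return CONVERSIONS[form][raw]

def growth(first, second):
    """Scaled-score change between two (form, raw) results."""
    return scaled_score(*second) - scaled_score(*first)

if __name__ == "__main__":
    fall = ("A", 10)    # earlier administration, Form A
    spring = ("B", 10)  # later administration, harder Form B
    print("raw change:", spring[1] - fall[1])
    print("scaled change:", growth(fall, spring))
```

The raw change is zero, while the scaled change is positive, because the same raw score on the harder form maps to a higher point on the common scale.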
There are many different types of scales available, each with its own strengths and weaknesses, and choosing among them deserves careful thought.
If you currently provide assessments or are developing new ones, please contact us to discuss what type of reporting makes the most sense for your program.
Greetings!
Welcome to the November issue of SEGway. This issue examines several topics related to assessment and product effectiveness research.
There are many ways that test scores can be reported. In our technical corner, Dr. Elliot, SEG's president, explains why "raw scores" (number correct) may not be the best way to report scores.
Also in this issue, we explore an innovative way to disseminate research results and present this month's featured project: EdSteps, the development of an emerging-to-expert scale based on paired comparisons of student work.
SEG will be attending several conferences and events in the next few months (see below). We look forward to seeing you there. Please let us know if you will be attending and if you would like to meet with us.
To allow all of us to enjoy the holidays, we will not be publishing a December issue of SEGway. We look forward to seeing you in January. And, as always, please feel free to learn more about us at www.segmeasurement.com.
Sincerely,
Melissa Garza
Editor
Disseminating Research Results: An innovative approach
So, you have invested your time and resources to conduct a controlled product effectiveness study. Now you want to make sure that potential customers know that your product or service has been proven effective.
You have made the final report available, provided press releases and made a splash at conferences. But, you still feel like you aren't reaching potential customers looking for proof of effectiveness.
This is precisely the challenge that Wiley Publishing faced at the conclusion of its product effectiveness study earlier this year. Wiley developed an interesting approach to reaching its potential customers: print the research results directly on the product! Wiley decided to include a summary of the research on the back cover of every Visualizing series textbook it produces. You can see how Wiley Publishing approached this by clicking on this link to the back cover.
This strategy need not be limited to textbooks. A brief summary of the research can be incorporated into nearly any product. A web-based product could include a summary directly on the website; a service such as professional development could include a description of the research in the manual or other resources distributed as part of the service.
SEG assists publishers in disseminating research results. We will continue to share our clients' success stories and our insights in upcoming issues of SEGway. Contact us if you would like to know more about how we can help you disseminate the results of your research efforts.
Featured Study: EdSteps, Developing a scale of student ability using paired comparisons of actual student work
What if we had a single "ruler" we could use to judge student writing from emerging to expert for formative assessment--a continuum we could use regardless of the grade level or age of the student to understand the level of the student's writing? This is precisely what SEG is helping the Council of Chief State School Officers to develop as part of a grant funded by the Bill and Melinda Gates Foundation.
The EdSteps project is creating a continuum for judging student work ranging from emerging to expert. A range of student work including written, graphic, audio- and video-based performance is being examined. A continuum will be developed in multiple skill and learning areas.
EdSteps is using a paired comparison approach to develop the EdSteps continuum. First, we collected many thousands of pieces of actual student work, from students and teachers across the United States (and internationally).
Now that we have collected the student work, teachers and others are being asked to evaluate pairs of student work; each piece of student work will be paired with a second piece of work. After reviewing the criteria for judging effectiveness and examples of student work illustrating the criteria, the reviewers indicate whether the first or second piece of student work is more effective or whether the two pieces are equally effective. Early in 2011, the results of the paired comparisons will be statistically analyzed, to place all of the student work on a continuum from novice to expert.
Many reviewers will compare each pair of student work and judge which piece of work is more effective. Simply put, a piece of work judged to be more effective most of the time is assumed to be higher on the continuum than a piece of work judged more effective less often. And the difference in the number of times each piece of work is judged "more effective" helps determine how far apart the two pieces of student work are on the continuum. Rasch analysis will be used to create a continuum of student work from novice to expert.
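The logic of turning "more effective" counts into positions on a continuum can be sketched with a Bradley-Terry model, a close relative of pairwise Rasch analysis. This is an illustration only and not necessarily the procedure EdSteps will use. The judgment counts are hypothetical, the function name is invented, and "equally effective" judgments are left out for simplicity.

```python
import math

def bradley_terry(wins, n_iter=500):
    """Estimate a location (in logits) for each piece of work.

    wins[i][j] = number of times piece i was judged more effective than piece j.
    Assumes every piece has at least one win and one loss.
    """
    n = len(wins)
    strength = [1.0] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n) if j != i
            )
            updated.append(total_wins / denom)
        # Fix the scale so the locations are centered on zero.
        log_mean = sum(math.log(s) for s in updated) / n
        strength = [s / math.exp(log_mean) for s in updated]
    return [math.log(s) for s in strength]

# Hypothetical judgments for four pieces of work, each pair compared 10 times.
WINS = [
    [0, 3, 1, 1],
    [7, 0, 4, 2],
    [9, 6, 0, 3],
    [9, 8, 7, 0],
]

if __name__ == "__main__":
    for piece, location in enumerate(bradley_terry(WINS)):
        print(f"piece {piece}: {location:+.2f}")
```

Pieces that win more of their comparisons end up higher on the scale, and the size of the win margins determines how far apart neighboring pieces sit.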
We are excited about this project and look forward to sharing the results of this research in 2011. For more information about EdSteps or to participate as a reviewer, please visit www.edsteps.org.
SEG provides research and psychometric services to organizations worldwide. Contact us to find out how we can help you address your research and assessment needs.
SEG at Upcoming Conferences
We look forward to seeing you in the next few months at the conferences and events we will be attending.
- Software & Information Industry Association (SIIA) Ed Tech Business Forum, November 29-30, The Princeton Club, New York City
- Market Data Retrieval (MDR) Holiday Party, December 1, 5:30-8:30pm, Metro 53, 307 53rd Street, New York City
- Association of Educational Publishers (AEP) Educational Publishing Hall of Fame, December 2, 8-10:30am, McGraw-Hill Auditorium, New York City
If you would like to meet with a representative from SEG to discuss how we might help you with your assessment and research needs, please contact Hilary Rickert by phone at 267-759-0617 ext. 102 or by email at hrickert@segmeasurement.com.
SEG Measurement Announces New Website
You may have noticed that our website has changed. Earlier this month, we officially launched a new website, to help our clients and other members of the educational community better understand who we are and the services we provide.
The website provides substantially more resources than its predecessor. Clients can obtain research reports and assessment reports directly from the website and follow a live Twitter feed for the latest news about SEG in particular and research and assessment in general.
Come check us out at www.segmeasurement.com.