Back in March, Tobin White, an Associate Professor at the UC Davis School of Education, discussed his assessment of current testing practices at DJUSD. According to Professor White, while the district uses two exams – the OLSAT (Otis-Lennon School Ability Test) and the TONI (Test of Nonverbal Intelligence) – to identify students eligible for the AIM program, the OLSAT identifies about 24% of the AIM students, the TONI 49% of AIM students, while 27% are identified through some other test.
The TONI is used to retest students who come within five points of qualifying on the OLSAT or who have been identified with various “risk factors.” Professor White’s research determined that students administered the TONI were “six times more likely to qualify than those taking only the OLSAT.” They were also, according to Professor White, nine times more likely to score in the 99th percentile.
He wrote, “These are radically different measures, yet they are being treated as equivalent in program placement decisions.”
He added, citing research, “The TONI was not designed to replace broad-based intelligence tests but rather to provide an alternative method of assessment when a subject’s cognitive, language, or motor impairments rendered traditional tests of intelligence inappropriate and ineffectual.”
About ten days ago, the Vanguard exchanged emails with the professor to clarify some of his thinking and conclusions.
Professor White stated at one point that the TONI measures something very different from the OLSAT and yet is used interchangeably. He explained to the Vanguard, “There are at least two grounds on which to say these tests are measuring different things. One reflects what these tests are built to do.”
He said, “Keep in mind that intelligence is complex and multifaceted. Intelligence tests try to specify and then measure one or more dimensions of intelligence. Robust intelligence tests are multidimensional—there are different portions of the test that target different aspects of intelligence.”
The OLSAT, for example, he said, “seeks to measure verbal comprehension, verbal reasoning, pictorial reasoning, figural reasoning, and quantitative reasoning. The TONI, on the other hand, is unidimensional—it measures only a single aspect of intelligence, figural reasoning.”
Professor White continued, “The other way in which they are clearly measuring different underlying constructs is the markedly different scoring profiles they generate for Davis students.”
Here he noted, “The TONI produces qualifying and 99th percentile scores at much higher rates than the OLSAT, and yields a dramatically higher mean score (far beyond the standard measures of error for both tests) across a sample of students taking both tests, even when you control for any criteria by which students might have been selected to take the TONI (more on this below). So, regardless of which measure you think is the more accurate one, it is clear that the two tests measure the same group of children in very different ways—which is what I mean when I say the underlying constructs, or models of the intelligence of these children, don’t match.”
In fact, he continued, “my impression from the district’s data, reinforced by some reviewing of the literature, is that the OLSAT tends to under-identify giftedness (a trend toward ‘false negatives’), while the TONI tends to over-identify giftedness (a trend toward ‘false positives’).”
The way to address both these concerns “is to use these tests in combination with other measures before making a determination about whether GATE services are appropriate.”
Tobin White continued, “One of my principal objections to the district’s practice in recent years is that, when faced with mismatched scores on these two tests for a given child—say, a score in the 40th percentile on the OLSAT and a score in the 99th percentile on the TONI—the policy has been to simply choose the higher score and consider the child AIM-qualified, rather than to seek additional information in order to reconcile the scores and make an informed decision about the best placement for the child.”
He continued, “If the child, for example, has limited proficiency in English, there is good reason to doubt the accuracy of the OLSAT score. If, on the other hand, there is nothing about the child that indicates a non-verbal test is appropriate, he or she may simply be very good at figural reasoning, but not measurably gifted in other ways—or, the test may have simply produced a ‘false positive’ through measurement error.”
Professor White stated, “Remember that because the TONI is unidimensional, it’s at best only giving you information about one piece of the intelligence puzzle. If it’s a compelling piece—a high score—that’s definitely a good reason to probe further, and try to get a fuller picture. But that one piece of information alone is not enough to assume the whole puzzle is solved.”
“A common strategy for combining multiple measures, like two or more tests, is to develop a formula that weights the scores together, and sets a cutoff for the combined score rather than one in isolation. In fact, this approach was the way an evaluator recommended the district use the OLSAT and the TONI several years ago, but the recommendation was never adopted. It might be worth revisiting,” he concluded.
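The weighted-combination strategy Professor White describes can be sketched in a few lines. This is a hypothetical illustration only: the article does not say what weights or cutoff the evaluator actually recommended, so the 0.6/0.4 weights and the 95th-percentile cutoff below are assumptions chosen for the example, as are the function names.

```python
# Hypothetical sketch of a weighted-composite placement rule, as an
# alternative to "take the higher of the two scores." The weights
# (0.6 OLSAT / 0.4 TONI) and the cutoff (95) are illustrative
# assumptions, not the evaluator's actual recommendation.

def composite_score(olsat_pct, toni_pct, w_olsat=0.6, w_toni=0.4):
    """Weighted combination of two percentile scores (0-100)."""
    return w_olsat * olsat_pct + w_toni * toni_pct

def qualifies(olsat_pct, toni_pct, cutoff=95.0):
    """Apply one cutoff to the combined score, not to either test alone."""
    return composite_score(olsat_pct, toni_pct) >= cutoff

# The mismatched-scores example from the article: 40th percentile on
# the OLSAT, 99th on the TONI. Under "take the higher score" the child
# qualifies; under this composite, 0.6*40 + 0.4*99 = 63.6, below the cutoff.
print(round(composite_score(40, 99), 1))  # 63.6
print(qualifies(40, 99))                  # False
```

The point of the design is that a single extreme score can no longer qualify a child on its own; a high TONI score raises the composite but, as Professor White suggests, would prompt further assessment rather than settle the question.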
—David M. Greenwald reporting
i have a lot of problems with tobin white’s point.
first, he writes, “TONI measures something very different than the OLSAT and yet is used interchangeably.” are they being used interchangeably? my understanding is that for low ses students or students without the language skills, the toni is being used because it measures their abilities far more effectively than the olsat, which he seems to admit is culturally biased. as such, the tests are not being used interchangeably – they are being used to assess different populations of students.
he then argues that “the OLSAT tends to under-identify giftedness (a trend toward ‘false negatives’), while the TONI tends to over-identify giftedness (a trend toward ‘false positives’).”
however, since we know that the olsat not only under-identifies but is also culturally biased, then, following some of the other research, we should err on the side of over-identification.
he then says the way to address both these concerns “is to use these tests in combination with other measures before making a determination about whether GATE services are appropriate.”
but aren’t we doing that?
I think it is logical to assume that for a particular student the “search and serve” practice would have been initiated because of “other measures,” such as teacher and parent observations, certainly when a student’s OLSAT score falls within the test’s margin of error. Where it gets murky for me is when the “search and serve” process was initiated based on risk factors.
Also, it may not be his point, but Dr. White may be suggesting that, for any child who qualifies, we take a step back and look again at the whole child. That seems like a good idea, since the program has become such a magnet for competitive high achievers, and not everyone thrives in that environment.
My problem with Tobin White is that his written report, paid for by DJUSD, was crap – and would undoubtedly be trashed by competent peer review. What he now says in private email exchanges is unpersuasive. Oink!