Tag Archives: Assessment

Survivorship Bias

A hidden bias

Out of 90 charter schools that administered the New York State standardized tests in both 2011 and 2012, Harlem Link had the 8th highest average increase in English Language Arts (ELA) and Math scores.  This score improvement was amazing, fantastic, even inspiring.  And misleading—because of a small, relatively unknown factor called survivorship bias.

Survivorship bias is a statistical term for the distortion that arises when some hidden factor excludes certain members of a data set over time: part of a sample that was there at the beginning is no longer there at the end and does not count in the final analysis.  The smaller subset of those who “survive” over time might look better off than the original whole group simply because of who stayed and who left, not because of any value added over time.

Simply put, every year, at every school, some students leave, and their departure changes the profile of who takes the test from year to year.  Sometimes high-scoring students depart.  At other times low-scoring students depart.

If schools continuously enroll new students (and some don’t), the same factor applies to these incoming students.  At the end of this post I chart a hypothetical situation in which survivorship bias makes a school appear to improve without actually adding any value, simply because it does not enroll new students year after year.

In large systems, there is so much mobility that these student profiles tend to cancel each other out because of scale.  For example, the student population appears relatively stable from year to year in the third grade in Community School District 3, where 1,342 students in 30 schools took the state English Language Arts exam in 2012.  But in small student populations like the one at Harlem Link, where only 52 third grade students took the 2012 exam, a few students entering or leaving the school with certain test scores can make a big difference.
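To make the effect of scale concrete, here is a minimal sketch of the arithmetic.  The function name is hypothetical, and it assumes the simplest possible model: each student counts as either passing or failing, so one student changing the mix moves the passing rate by 100 divided by the number of test takers.

```python
def points_per_student(test_takers: int) -> float:
    """Percentage points that a single student represents in a pass rate."""
    return 100.0 / test_takers

# Test-taker counts quoted in the paragraph above (third grade, 2012 ELA).
district_3 = points_per_student(1342)
harlem_link = points_per_student(52)

print(f"District 3:  one student is worth {district_3:.2f} percentage points")
print(f"Harlem Link: one student is worth {harlem_link:.2f} percentage points")
print(f"Ratio: about {harlem_link / district_3:.0f} to 1")
```

Under that assumption, a handful of arrivals or departures barely registers across District 3 but can shift a 52-student cohort's passing rate by several points.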

When the state department of education releases test scores each year, however, it does not provide this or any other contextual background information alongside the scores.  I believe that this process penalizes, in the public eye, schools that continue to enroll students to replace those that depart.

(Partly) illusory gains

At Harlem Link, the fact that we only test in three grades guarantees that at least one third of our students taking the tests each year will be different students than those who took them the year before.  Putting aside the variability in the state test from year to year, this rolling of the dice has produced some dramatic swings in achievement: in some years our school’s test scores have looked worse than the actual performance of our teachers, and at other times (like this year) they may have looked better than they really were.

It turns out that the profile of our students who departed before the last school year was a much less successful one than the profile of the group that left the prior year.  In other words, we had to improve less to get apparently lofty gains.

In English Language Arts, we saw an improvement of 18 percentage points from 2011 to 2012, according to the state’s way of reporting the scores.  But since many of the students who graduated in 2011 or left for other reasons following the 2010-11 academic year performed poorly on the 2011 exams, the students who returned had a better passing rate by 10 percentage points than the original group.  In other words, more than half of our test score gains in ELA could be accounted for by attrition.
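The "more than half" figure can be checked with a rough back-of-the-envelope calculation.  This is only a sketch: the variable names are hypothetical, and it assumes the reported gain splits cleanly into an attrition part and a remainder, ignoring the scores of newly enrolled students.

```python
reported_gain = 18    # percentage-point ELA gain from 2011 to 2012, as reported
attrition_gain = 10   # points by which returning students outscored the full 2011 group

attrition_share = attrition_gain / reported_gain
remaining_gain = reported_gain - attrition_gain

print(f"Share of the gain attributable to attrition: {attrition_share:.0%}")
print(f"Gain left after removing attrition: {remaining_gain} percentage points")
```

On these numbers, attrition alone accounts for a little over half of the headline improvement.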

Now, I’m not going to say that I’m not proud of our scores or that they are not indicative of a powerful effort by talented and dedicated professionals.  I’m not even going to tell you that we didn’t improve our practice last year.  I think we have improved in that area every year, because we have been an honest, self-examining, learning organization.  But the wide swing in test scores and the state’s failure to describe enrollment patterns when reporting the scores mask the true story of a gradual, continuous march to improvement that is the real hallmark of the growth at Harlem Link.

Best practices often begin as difficult, controversial and seemingly impossible changes to “the way things are.”  Strong schools take the time required to plan, assess and tweak new initiatives until they become standard operating procedures.  The lack of information provided alongside scores obscures this type of growth, creating perverse incentives for schools to “push out” students who are low performers and to “quick fix” by whittling down large original cohorts to smaller groups of survivors, uncompromised by new admittees.

At Harlem Link, we have resisted these perverse incentives.  We have always replaced students who leave, for budgetary reasons (being a small, standalone charter school) and to serve a greater portion of the community starved for high-quality school choices.  Each year, we have encouraged some students who are particularly high achieving to leave a year early by helping them apply to competitive public and independent middle schools that only admit in fifth grade, reasoning that we’d rather lose their strong fifth grade test scores than see them lose an opportunity to get firmly on the college track a year ahead of their peers.  If we followed the short-sighted state incentive, we would not have urged four of our highest-scoring fourth graders on the state exams in 2012 to apply to and enter the Upper West Side’s highly sought-after Center School.  They were admitted and are all attending (a fact that may well push down our fifth grade test scores by as much as 10% next year), and we are thrilled, because we helped four more students living in a high-poverty environment to gain admission to this exclusive public school.  We also would not have pushed students to leave after fourth grade in years past to embark on the independent school track by attending the East Harlem School at Exodus House and the George Jackson Academy in lower Manhattan.

In the context of reform

This issue has been raised before in the blogosphere, but not in a thoughtful manner.  Instead, it has been wielded as a weapon by those who are against the current strain of education reform.  It has been used to defeat the straw man argument that charters are silver bullets and to denigrate the success of networks like KIPP, which is another organization that deserves no such uninformed criticism.  (Each year, KIPP asks itself several questions in its annual internal reporting, including, “Are we serving the students who need us?”)

Because it is potentially embarrassing and might burst the balloon of so-called charter education miracles, this issue has also (to my knowledge) been ignored publicly by my colleagues in the charter community.  There are many groups of charter schools that go happily on their way winnowing down their large kindergarten classes, educating fewer and fewer students in each cohort each year, not adding new students and narrowing down their challenges as they deal with fewer and fewer “survivor” students well.  And those charters that benefit from network infrastructure and economies of scale can balance their budgets even while shrinking six to eight kindergarten sections down to three or four fifth grade sections.

I’m not passing judgment on those networks.  As a charter school founder who has been running a school for almost 10 years, I still believe that the charter experiment has been a profoundly positive one for the communities where such schools have flourished.  What I want is for the public to have some understanding of the context behind test scores, so that alleged miracles can be put in their proper place, and year-to-year statistical swings that have nothing to do with a school community’s actual performance can be seen in their proper perspective.

Hypothetical (with some assumptions): survivorship bias in action

In the example below, compare two schools that start out with similar student profiles.  School A replaces each student who departs.  School B does not.

Each year at both schools, a greater percentage of academically struggling students than successful students leave.  Each year at both schools, neither school is adding any value since no individual’s test scores are changing.

Because the entering students at School A are similarly academically disadvantaged to those who depart, its scores do not change.  School B’s scores improve more than 20 percentage points—simply by virtue of attrition, the decision not to enroll new students, and the mix of which students are taking the test each year.

School A = Enrolls new students continuously
School B = Does not enroll new students

School A

| Grade | 5 | 6 | 7 | 8 |
|---|---|---|---|---|
| Passing students added | 40 | 5 | 5 | 5 |
| Failing students added | 60 | 15 | 15 | 15 |
| Passing students leaving | 0 | 5 | 5 | 5 |
| Failing students leaving | 0 | 15 | 15 | 15 |
| Total Passing Students | 40 | 40 | 40 | 40 |
| Total Failing Students | 60 | 60 | 60 | 60 |
| Pct. Passing | 40.0% | 40.0% | 40.0% | 40.0% |

School B

| Grade | 5 | 6 | 7 | 8 |
|---|---|---|---|---|
| Passing students added | 40 | 0 | 0 | 0 |
| Failing students added | 60 | 0 | 0 | 0 |
| Passing students leaving | 0 | 5 | 5 | 5 |
| Failing students leaving | 0 | 15 | 15 | 15 |
| Total Passing Students | 40 | 35 | 30 | 25 |
| Total Failing Students | 60 | 45 | 30 | 15 |
| Pct. Passing | 40.0% | 43.8% | 50.0% | 62.5% |
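The two hypothetical schools can also be reproduced in a few lines of code.  This is a sketch of the same hypothetical, not real enrollment data: the `simulate` function and its argument names are my own, and it assumes, as the example does, that no individual student's result ever changes.

```python
def simulate(added_pass, added_fail, left_pass, left_fail, grades=(5, 6, 7, 8)):
    """Track a cohort's passing rate when only enrollment changes.

    Each argument lists, grade by grade, how many passing or failing
    students join or leave. No student's own result ever changes.
    """
    rows = []
    passing = failing = 0
    for grade, ap, af, lp, lf in zip(grades, added_pass, added_fail, left_pass, left_fail):
        passing += ap - lp
        failing += af - lf
        pct_passing = 100 * passing / (passing + failing)
        rows.append((grade, passing, failing, pct_passing))
    return rows

# School A replaces every departing student with a similar newcomer.
school_a = simulate([40, 5, 5, 5], [60, 15, 15, 15], [0, 5, 5, 5], [0, 15, 15, 15])
# School B enrolls no one after grade 5.
school_b = simulate([40, 0, 0, 0], [60, 0, 0, 0], [0, 5, 5, 5], [0, 15, 15, 15])

for name, rows in (("School A", school_a), ("School B", school_b)):
    print(name)
    for grade, passing, failing, pct in rows:
        print(f"  Grade {grade}: {passing} passing, {failing} failing, {pct:.1f}% passing")
```

School B's passing rate climbs from 40.0% to 62.5% by grade 8 even though no student's score changed, while School A stays flat at 40.0%.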

 This post also appeared at GothamSchools.


Time to Plan Math Curriculum

Thank you, State Education Department, for releasing a memo on August 17 prioritizing the new Common Core state math standards.  The memo details which standards are “Major,” meaning they will be emphasized on the new state tests in April, and which can wait until May and June.

Now we can start planning our curriculum for the year!

Wait a minute, school started at Harlem Link two weeks ago.  When the memo came out we were already in our fourth day of staff training, and we had rolled out our curriculum units, including those for math, for the year and our instructional priorities during the first two days of that training.

We are now analyzing this Education Department memo—after all, the state test is how the effectiveness of our curriculum and our interpretation of the Common Core standards will be judged—and comparing it to the arrangements we had made when we mapped out our math standards in June.

Did the state really want us to wait until August 17 to start planning curriculum for the year?  I don’t think so, but the tone of the memo vividly illustrates the year-to-year thinking that has long plagued the system.  The memo says, “Schools and districts are encouraged to use this guidance when reviewing local curricula and in designing their Grades 3-8 instructional programs.” 

Last year one of our biggest issues as a school was that we were designing our “local curriculum” and “instructional programs” during the year, right before or while teaching them.  This frenzied approach resulted from the need to shift quickly and wholly to state and Common Core standards, coupled with our commitment to a home-grown curriculum, not relying solely on off-the-shelf, pre-packaged programs. 

I’m not writing to condemn the Education Department for issuing this guidance.  Actually, it’s more than I thought we would receive, since the state has promised that henceforth state tests would be obscured and unpredictable to prevent the gaming of the system that has characterized the recent testing regime.

Instead, I’m writing to illustrate the complexity of doing genuine education reform with a sense of urgency and seriousness when, for example, the state bureaucracy fails to set its major requirements in a timely way.

This little memo is a big window into an important shortcoming of large systems.


Are There Voices of Reason on State Testing?

The Best System We’ve Got

I had the most amazing experience in early April: the chance to sit down with teachers at my school and have an open conversation about the role of standardized testing in education policy today.

As always, I started planning for this hour-long seminar by thinking backward from an assessment, in this case the promised 2015 rollout of the Common Core-related PARCC exam in New York, one of two dozen or so states that are contributing to this test’s development.

What I learned in my session didn’t surprise me; my faculty members are insightful and passionate about their work, but they have no better answers than I do when it comes to two key questions:

  1. What better idea could replace high stakes testing and fill the gaping policy void it currently occupies?
  2. How do educators stanch the overwhelming tide of teacher and student “fear and punishment” that has grown around the testing culture?

The words “fear and punishment” come from one of two texts I used to frame the discussion, a recent blog post by John Merrow decrying the prevalence of test prep throughout the land in these weeks leading up to state assessment frenzy.  I juxtaposed Merrow’s angry words with the conciliatory ones of a teacher named Ama Nyamekye, who last year wrote an essay describing her changing attitude towards standardized tests, now that she has had a chance to analyze how they could actually help her practice.

Nyamekye came to the same conclusion I did: the wave of standardized testing has hurt all of us, but the power to use it rather than cower from it is within us as educators.  Besides, in doing the important work of distinguishing low achievement from high achievement among students, and high functioning schools from ones that are hurting children, it’s the best system we’ve got.  Until we can make it better (enter PARCC?), we need to use it for all it’s worth.

An Absurd Obsession

For charters, the subtext of any thinking in this area is the reality that the decision whether to dissolve our school, made for every charter at least once every five years, will be based on exams that the state is about to discard because, well, the tests aren’t good enough.  It’s absurd irony worthy of Jean-Paul Sartre.

The richness of this irony includes one reason the state is tossing the exams: Reformers fear that current exam formats have narrowed the curriculum to such an extent that test prep is now trumping authentic good instruction.  As one of the flag bearers of the crusade against test prep, Mike Schmoker, laments in his recent book, Focus, “Scores can be artificially pumped up on a diet of 500-word passages and multiple-choice drills (which many students live on)” (pp. 114-115).  In other words, obsession over state tests has hampered good instruction.

The problem of test prep has now invaded even early childhood, an age group which had been spared the indignity of obsession over high-stakes multiple choice exams under No Child Left Behind.  The New York Times reported last week on a disturbing trend emerging from this year’s round of kindergarten test results for admission to public school Gifted and Talented programs: Test prep is beginning to determine the winners and losers even at this level.

Sacrifices

I am haunted by a conversation I had with a senior vice president for accountability during our authorizer’s annual visit this past winter.  (Let’s call him Bob.)  I was describing to him the sacrifices we have made for the first time this year in order to meet certain state test targets that are requirements in our charter–or, less mildly, to game the system.

As Bob knows from his many visits to Harlem Link over the years, we have always followed the principles embedded in our mission and our charter, even if doing so meant that we couldn’t guarantee hitting all of the testing targets.  For example, we have:

  • …maintained strong teachers in the lower (non-testing) grades for stability and consistency, even as we grew and added upper grades.  Building an upper grade program has taken years, just as it has in the lower grades, but we have had less time to do it because we began with only kindergarten and first grade and grew slowly.
  • …distributed resources disproportionately to our lowest achieving students (often meaning our newest enrollees), not the ones who are “on the bubble” of passing the arbitrary state testing bar.  As a result, we have fewer kids than we’d like achieving the passing rate, but a great number of students “almost there.”  Those don’t count in the binary land of accountability, and neither does the fact that, as a result of our interventions, our percentage of students with the lowest achieving scores is significantly smaller than the state average.
  • …resisted the temptation of buying off the shelf curriculum that other highly touted schools use to get “teacher proof” test scores.  Instead, we prefer authentic curriculum that is home-grown, guided by standards and student data, and written collaboratively by teachers and administrators.

The principles in our charter that have guided these decisions include the centrality of teacher voice in curriculum; child centered instruction; democratic leadership, meaning genuine input from all levels of the organization; and an interest in developing teachers and building a team over time.

So what was I saying to Bob?  This year–in publicly available decisions made by our board of trustees in consultation with our staff–we have admitted fewer students in the upper grades, in three cases this year turning away students who had left the school in the past and tried to re-enroll mid-year; put social studies curriculum development in the upper grades on the back burner to focus on the testing subjects (New York State stopped testing social studies in 2011); prioritized the students who are close to passing the state test but not certain to do so; and hired (while holding my nose) a famous test-prep company to provide dozens of hours of after-school tutoring for those prioritized students.

We made undeniably positive changes as well that were part of our organizational plan and independent of the state testing pressures, such as modifying roles on our leadership team to give leaders more time to focus on instruction and revising lesson planning expectations across the school.  But clearly, as I was telling Bob, our top priorities this year have been guided by the coming high stakes tests and the somewhat artificial achievement targets we agreed to meet when we were chartered.

“You must have mixed feelings about that,” Bob told me.

“Yes.  Of course.  While I like our general direction, I’m not satisfied with the aspects of our school that don’t show up on the state tests, and I want to increase our enrollment, not limit it.  The tests are emphasized to the point of being a distraction.”

Bob and I have had many of these conversations over the years.  But now that we have actually taken steps to follow the recipe, narrowing the curriculum and joining the testing frenzy, I felt the humanity behind Bob’s icy glare.

“Look,” he said.  “I agree with you.  I can’t defend this system.

“But it’s the best one we’ve got.”


Steroids and Bubbles

The sky is falling! 

That’s what some observers would have you believe after the scores students around New York State attained in standardized tests plummeted this year. At Harlem Link we do not want to confuse a sense of urgency with a counterproductive panic. We know that only long-term and comprehensive solutions are going to fix the problems that have plagued our schools for generations. 

Has the state of our national educational program gotten worse? The problems our schools face have been compounded by globalization, the technology revolution and a rapidly changing world, but let’s face facts: Our nation has never provided equitable education, not since compulsory schooling began to take hold in the 19th century. And before then – good luck, unless you were landed, male and white.

Speaking of race, are you worried about a racial achievement gap? (I am.) In my office I have a 1950 issue of Life magazine, on which a white girl graces the cover with the headline “U.S. Schools: They Face a Crisis.” Sixty years later, we’ve had wave after wave of educational reform driven by panic and hyperbolic assessment of this “crisis.” 

[Image: 1950 Life magazine cover]

Reflecting on these facts has helped me put the change in the state test scores this summer in its proper context.  In sum, New York State Education Department (SED) commissioner David Steiner and Board of Regents chancellor Merryl Tisch acted with a courage and an integrity rare among public officials when they decided to ratchet down scores that had been demonstrably inflated over the past five to 10 years. They noted in a July press conference that the state tests had become increasingly predictable and unchallenging. The announcement included a promise to overhaul the state exams and make them more rigorous in coming years. Moreover, the Regents and SED would be holding all students to a higher standard for tests already taken this year.

In recent years, New York City’s racial achievement gap had appeared to be steadily closing, at least if you believed the test scores, but overnight that gulf re-appeared in force. Suddenly there were heated reactions in the state educational community about the tests and what they had to say about student achievement. Did anyone really think that things had gotten much better?

The critics were merciless.  Michael Petrilli of the Fordham Institute was quoted in The New York Times the day after the press conference as saying, “The state test is completely unreliable.” Aaron Pallas, a Columbia Teachers College professor, said in a Times article the next day, “We just really can’t trust the state tests for judging whether the quality of education in New York City has really improved.” New York City Mayor Bloomberg appeared ruffled by the sudden drop in scores.  “Everybody can have their definition of what it means,” he said. Later, he infamously added: “The last time I checked, Lady Gaga is doing fine with just a year of college.”

The furor reached a head at the August meeting of the city Department of Education’s Panel for Educational Policy (PEP), during which parents protested the drop in test scores and the previously inflated scores so vociferously, bullhorns and all, that the meeting was shut down early.

My view is that for all the reforms, all the changes ebbing and flowing in curriculum and assessment of student achievement, all the fads and the gimmicks, things have not changed all that much since the “crisis” of 1950. Proficiency rates on state tests should not be the goal; student independence and success in higher education and in life ought to be the goal.  So I see this drop in test scores as just the popping of another bubble – not unlike the home run bubble created by steroid proliferation in baseball and the stock market bubble created by an unsustainable housing boom. Do these two graphs appear to have anything in common?

[Graph: 40 Home Run Hitters per team, 1924-2009]

[Graph: Dow Jones closing averages, 1924-2009]

Down, up, down again, WAY up, and then, BUST!  If I were a betting man, I would bet that the New York City proficiency scores on 4th and 8th grade tests, if plotted over time, would show the same pattern. (I have searched the Internet, but this data is demonstrably harder to find than baseball and Dow Jones statistics.) 

As with the dreadful state of the economy, panicking in the face of these test scores will get us nowhere. If we are going to have lasting change, we need to ignore fads and focus on what will bring long-term improvement. In the wake of the housing meltdown, hucksters sprang up to “rescue” defaulting homeowners from their crushing debt, only to prove to be just another bunch of scam artists. There are no quick fixes.  There are no shortcuts.

In education, we know what works.  School by school, change is possible with a committed group of competent educators focused on a clear and compelling mission, a shared community emphasis on student goals, robust home-school communication and, finally, a clear vision to which everyone subscribes to make those elements come to life. Everything else – all the bells and whistles and promises and panics – is just another manifestation of the crisis thinking that, if obeyed, will send us back into yet another false boom and bust cycle.


Theories of intelligence

During the college basketball season, a commenter on a sports website described a star student-athlete from my alma mater as “big, strong, agile and smart.”  I wouldn’t dispute that characterization, but I have some inside information about the player’s academic performance.  An esteemed professor of mine, someone I consider a mentor, confided that this scholar could be found staring into his Blackberry during the professor’s lectures, something you don’t want to do with this prof – both because you’d be missing out on a golden opportunity to absorb some wisdom and because you won’t like the menacing stare that would be sure to follow!

This situation once again raised a question I’ve wondered about since I was a child: What does it mean to be smart?  It turns out there is a lot of research and literature on the subject, and an endless number of opinions. 

The question is of vital importance in a school setting.  If educators begin with the assumption that there are some kids who are smart and other kids who are not smart, and these are fixed capacities that can only be mildly influenced by school, then there’s not a whole lot of good that a school can do other than shepherd each kid along his or her predestined path.  If a school treats a child as having limitless potential, whatever environmental, medical or psychological issues have begun to shape him or her, then the approach becomes very different.  That school will naturally have high expectations for all students and will challenge all kids to meet those expectations, to retain and synthesize what they experience in school.  In my view, this expectation that our knowledge and skills need not have limits is, along with natural human curiosity, the starting point of all learning.

I believe that the idea that intelligence is a fixed quality about a person, like height, is a dangerous and damaging concept.  I have observed students internalizing the negative (“I’m not smart”) when provided evidence of it more than internalizing the positive (“I am smart”) when they are provided evidence of that.  In the school setting, this phenomenon has often led to a shallow and, I think, misguided attempt to build students’ self-esteem regardless of the circumstances.  We can fight the very human tendency to fixate on the negative not by masking it, but by acknowledging that tendency and countering it by affirming the potential that we all have.

A series of town hall meetings last week reinforced this belief. Margaret and I talked frankly about this subject with groups of our students who were soon to take the state exams.  I showed them our school’s test scores from last year and how we hope they perform on the tests this year.  We compared their performance in 2009 to that of District 3, our local school district that is a diverse and fairly accurate snapshot of the city as a whole.  (District 3 includes our small part of Harlem and all of the Upper West Side.) 

The curious thing is that, when I asked the third graders who saw that our third grade had a 98% passing rate on the state math test last year, if that meant that our third graders were smarter than the District 3 kids who scored lower, they all said, “No.”  They implicitly understood that under other circumstances the other kids could perform better and knock the Harlem Link students off their perch.

But when I asked the same question of the fifth graders, whose cohort did not outperform District 3 on last year’s exam (but still had a respectable 75% attainment rate), there was a robust mix of “Yes” and “No.”  Many, maybe even most, of the kids believed that the students with a higher achievement rate were actually smarter than the Harlem Link kids because of the test results!  Our objective in this situation is to restore a sense in the students that they control their destiny, to instill in them the belief that their knowledge is what they make of it.

One of my favorite theories of intelligence is that of the psychologist Robert Sternberg.  He built the “triarchic” model, describing the analytic, creative, and practical domains of intelligence.  The model suggests that we call on different aspects of our intelligence in different situations and that there is a fluidity to knowledge that is dependent on context.  I believe that if our fifth graders thoroughly understood this idea, their reaction to last year’s test scores would be, “Those kids beat our school’s scores on that particular day, on that particular test.”

Our challenge is not only to light the candle of learning and help the kids internalize facts, skills and habits, but also to help them view themselves as creatures more complex than a simple number can express.

Time and again I have seen supposedly smart people appear foolish.  We all witnessed that sort of behavior in the mortgage and banking crisis that precipitated our current recession.  On my nightstand right now is The Black Swan, by Nassim Nicholas Taleb.  This book all but predicted the economic crisis the year before it happened, when the global economy was going through what turned out to be irrational exuberance.  Taleb’s thesis was summarized by Bloomberg.com: “We’re all blind to rare events and routinely fool ourselves into believing we can predict risks and rewards.”

Congress and the federal education department also seem to be filled with smart people making not so smart decisions.  The people who brought us the No Child Left Behind Act in 2001 are at it again.  The president’s proposed reauthorization of the law (formally called the Elementary and Secondary Education Act) will eliminate the unrealistic demand that all students in the country be deemed proficient on state tests by 2014.  That’s all well and good – but there’s a problem.  The federal education department has begun another program, called Race To The Top, offering federal stimulus dollars to state education departments that show they will implement favored reforms, among them tying teacher evaluations to student achievement and supporting charter schools.  These are good ideas, but what’s disturbing – and foolish – is that there has been no public discourse on what went wrong in 2001.  Instead of a reasoned, well-informed discourse on the 2001 expectation and why it turned out to be unrealistic, we are on to the next silver bullet that’s going to solve all of our problems.

Harlem Link’s students?  We continue to pound the message to them, with the state tests on the horizon: You are not the sum of the numbers assigned to you.  Because you’re human, your intelligence can’t be fully measured by a test.  (Believe it or not, Alfred Binet, the creator of the first IQ test, would agree, as he was horrified by the use of his diagnostic test to label and sort people.)  The state tests, like the PSAT, SAT, SAT II, Regents exams and AP tests that our kids will take in years to come, are instead a chance to show what you know and know how to do in a certain domain.

That basketball player probably didn’t do so well on last semester’s final exam.  But he is intelligent, in a manner of speaking (it takes analytic, creative and practical skills to succeed in high level athletic competition).  In the classroom he made some not-so-intelligent choices or developed some not-so-intelligent habits.  He has made a choice to develop and express certain aspects of his intelligence, while allowing some other aspects to wither.  For the record, I think this choice is a terrible one, and it could have drastic consequences for him if his athletic plans fall through.

Our schoolwide attitude about testing is moving toward this notion of individual choice and the understanding that tests are a narrow and limited window into one dimension of knowledge.  This attitude is still forming, given that 2010 is only our third year of administering state tests and we’ve been adding new teachers each year as our school expanded.  Ultimately, I see Sternberg’s and Taleb’s ideas as supporting the triumph of the human spirit and the supremacy of individual choice.  Consistent with the great thinkers across human cultures and history, they would have us question our basic assumptions about what we see, hear and believe.  Finally, since our mission asks that we “empower children to take an active role in learning,” the least we can do is teach them, encourage them, and ultimately trust them, to make good choices about their learning.
