Education

Teacher evaluations: Too much science, not enough art?

by Robert A. Frahm March 19, 2014 @ 12:05 am Updated September 4, 2022 @ 10:24 pm

The first in a series from

Grading the Teacher: The shifting standards of a scrutinized profession

Robert A. Frahm / Special to the Mirror

Jason Bluestein, principal at Fairfield’s Burr School, meets with third-grade teacher Alison Taylor in a conference required under the state’s new teacher evaluation guidelines.

“[A] strong body of evidence now confirms what parents, students, teachers and administrators have long known: effective teachers are among the most important school-level factor in student learning…”—State Department of Education handbook on teacher evaluation

The first in a series. The second piece focused on the challenges to getting top students into the profession. The last piece was an in-depth look at education’s revolving door.

FAIRFIELD – As teacher Alison Taylor conducts a poetry lesson for her third-graders, veteran principal Jason Bluestein watches and listens closely, scratching notes into a spiral notebook – a process he will repeat again and again this year, more often than ever before.

Here in Room 216 at Burr School, Bluestein’s hour-long observation is just one step under a controversial, time-consuming new statewide evaluation system that requires schools to rate teachers like Taylor not only through classroom observations but on whether their students meet specific academic goals.

The system, required under state law, has gotten off to a rocky start, prompting complaints from educators across Connecticut. According to some, the system, which ranks teachers in four categories from “below standard” to “exemplary,” has left educators worried about the complexity of the process, uncertain about its accuracy, and buried in paperwork.

It has also thrust Connecticut into a simmering national debate on whether schools, using student test scores, can develop a truly reliable formula to measure effective teaching – a job that many consider as much an art as a science.

“The whole system is just predicated on a number here, a number there that’s going to tell you that a kid is learning … and learning is so much more complicated than that,” said Bluestein, who, like principals across the state, has found himself mired in the details of the process. After visiting Taylor’s classroom, he reviewed more than four pages of notes, typing his observations into one of a series of forms in a software program Fairfield is using to meet the requirements of the new law. It is one of 22 forms that must be completed by the end of the year for each of Burr’s 29 teachers, he said.

“There is good work to be had in parts of this process, but it’s just way too cumbersome,” said Bluestein, who estimates he is doing three or four observations a week.

A University of Connecticut study of pilot evaluation systems in 14 districts last year found similar complaints, with some principals reporting working on evaluations “on a near-daily basis.”

Teachers, too, reported spending more time on evaluations.

At Burr School, Taylor, like other teachers, spent hours outlining goals for her evaluation plan. Each teacher also is required to fill out reports both before and after formal classroom observations and must take part in midyear and end-of-year evaluation reviews.

“It’s a lot of input into the computer,” Taylor said. “That time has to be taken from something, and, unfortunately, it takes time from our instructional planning time.”

The complaints from educators became so frequent that state officials, led by Gov. Dannel P. Malloy, in January agreed to grant schools more flexibility – allowing them, for example, to scale back the number of observations required for experienced teachers who maintain satisfactory performance records.

State Education Commissioner Stefan Pryor said the state expects to continue fine-tuning a program that is still in its first full year. “We’re very proud of the fact…we have revised our system instead of simply defending it,” he said.

The latest revisions could provide some relief next year for principals such as Bluestein, but Fairfield officials say there are no plans to reduce the number of observations scheduled for the remainder of this school year.

Setting student goals

Aside from the paperwork, a key area of concern has been the setting of student goals, including test score targets. According to the UConn pilot study, teachers and principals reported getting little or no training on the process. As a result, some “are selecting far too challenging targets while others are choosing far too easy,” UConn reported.

Two factors for evaluation

Under Connecticut’s new teacher evaluation guidelines, teacher ratings are determined by a formula that weighs two major components: Teacher practice and student outcomes.

Teacher practice scores are based largely on classroom observations, with evaluators scoring teachers on a 1 to 4 scale on 17 separate elements, such as “Promoting student engagement and shared responsibility for learning” and “Planning instructional strategies to actively engage students in the content.” Parent surveys may also account for a small portion of this score.

Scores on student outcomes are based on how much progress students make toward individual goals, such as test score targets, established by each teacher early in the school year. School-wide performance or student feedback also is factored in.

After averaging, combining and weighting the scores, evaluators rank teachers on each of the two components according to the following scale:

Range	Rank
50 – 80 points	Below Standard
81 – 126 points	Developing
127 – 174 points	Proficient
175 – 200 points	Exemplary

State Department of Education evaluation handbook
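For readers who want to see the scale above as a lookup, here is a minimal sketch. The point bands are copied from the table; the function name `rank_for_points`, the whole-number assumption and the error for out-of-range values are illustrative choices, not part of the state handbook.

```python
# Point bands copied from the handbook scale quoted above.
# Assumption: points are whole numbers between 50 and 200.
BANDS = [
    (50, 80, "Below Standard"),
    (81, 126, "Developing"),
    (127, 174, "Proficient"),
    (175, 200, "Exemplary"),
]


def rank_for_points(points: int) -> str:
    """Return the rank whose band contains the given point total."""
    for low, high, rank in BANDS:
        if low <= points <= high:
            return rank
    # Hypothetical handling: the scale does not define values outside 50-200.
    raise ValueError(f"{points} is outside the 50-200 point scale")


if __name__ == "__main__":
    for pts in (80, 81, 174, 175):
        print(pts, rank_for_points(pts))
```

The band edges show how a single point separates two ranks, for example 174 versus 175.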

Taylor said teachers feel more pressure and may be more conservative in setting targets for their students because those scores are directly connected to the evaluations. “If we don’t meet the goals, it’s kind of a negative,” she said. “In the past, we kind of aimed high.”

The setting of growth targets for students “is, in most cases, inherently arbitrary,” the UConn study said. “Should 100 percent of students score a 70 percent on an exam or should 70 percent of students score at 100 percent? If half the students fall below a certain performance level at the beginning of the year, what percentage should reasonably be expected to meet it by the end of the year?” the report said.

To a large degree, the movement linking student test scores to teacher evaluations is rooted in the Obama administration’s education policy, first as an element in the Race to the Top competition for federal funds and later as a requirement for states seeking waivers from the rigid mandates of the federal No Child Left Behind law.

“The whole idea of using student test scores…is really tricky,” said David Title, Fairfield’s superintendent of schools. “It’s conceptually appealing, but it’s very difficult to do technically…There are so many different variables that impact student achievement….What you’re not able to do, in my view, is prove cause and effect.”

Nevertheless, the idea has gained traction. Four years ago, New Haven schools won national attention when the district and the teachers’ union developed an evaluation system that uses test results as a factor in rating teachers. Since then, dozens of teachers have resigned or been dismissed as a result of the evaluations. Last year, 20 teachers, about 1 percent of the workforce, left the district after receiving poor evaluations.

“I won’t say [the system] is perfect, but it’s light-years ahead of where we were,” said New Haven Federation of Teachers President David Cicarella.

Most teachers support the idea of removing colleagues who are found to be obviously ineffective, Cicarella said. “We all use the same litmus test…,” he said. “You hear teachers say, ‘Would I want my kid in that teacher’s room?’ If the answer is no, we’ve got to do something about it.”

He said that rigorous, multiple observations are key to the system. He agreed that the observations are time-consuming, and that principals can be overwhelmed if they have too large a caseload, but said, “Compared to what they used to do – go in there 15, 20 minutes, half an hour, write up three or four paragraphs and that’s your evaluation – that’s what got us in this boat to begin with. The [former] evaluations were so superficial.”

In New Haven, officials eased the burden by computerizing the reporting process and training veteran teachers as “instructional managers” to assist in conducting evaluations, he said.

As for using student data in the evaluations, Cicarella said the New Haven system, like the state model, requires teachers to set their own individual targets for student growth based on the composition of their classes and factors such as previous academic performance. “At the end of the day, we have a responsibility for students to learn,” he said.

In Connecticut, the movement to hold teachers more accountable for student growth gained momentum four years ago when lawmakers passed a sweeping school reform law calling for a new evaluation process. Today, 41 states have laws linking evaluations to student performance, according to the National Council on Teacher Quality, an advocacy group promoting the use of student performance as a measure of teacher quality. In 19 of those states, including Connecticut, student growth is the predominant factor in evaluations, the council reports.

“It’s a huge policy shift,” said Sandi Jacobs, a council official. “We think it’s absolutely appropriate that evidence of student learning should be part of teacher evaluation…but it can’t be the only thing.”

Graphic: How we got here with teacher evaluations (Jacqueline Rabe Thomas and Alvin Chang / CT Mirror)

A three-year study sponsored by the Bill & Melinda Gates Foundation concluded last year that it is possible to identify teacher quality by a combination of classroom observations, student surveys and student achievement growth. Officials who designed the Connecticut model – which includes a blend of observations, student growth and surveys of parents and students – cite the Gates study as part of the research supporting the new approach.

Nevertheless, the use of student testing gains in evaluating educators has drawn skeptics. Noted education historian Diane Ravitch called the practice “junk science” in a foreword to a recent special issue on high-stakes teacher evaluation in the academic journal “Teachers College Record.”

Ravitch, a professor at New York University, wrote that judging teachers largely on student test scores is an “untried theory…[that] wreaks havoc on our schools and our teachers…[and] will squander billions of dollars that should have been spent in classrooms, or in funding research that conceptualizes more fully the many and varied aspects of good teaching.”

Teacher ratings can fluctuate because of changes in the composition of classes, she said. “This year’s highly rated teacher is only average or worse than average next year. A district gives a teacher a bonus this year, then fires her the next,” she wrote.

In Connecticut, the state is changing its annual statewide test and is seeking federal approval to allow schools to delay until the 2015-2016 school year the use of that test in evaluations. This year, teachers and their evaluators must decide individually which other standardized tests or classroom exams they will use to measure their students’ academic growth. They also must predict how much progress they expect their students to make.

“How well are teachers really equipped to make these predictions?” asked Title, the Fairfield superintendent.

At Burr School, teachers are choosing among various standardized tests as part of their evaluations, but the school is changing some of its curriculum, and the tests “are not really caught up with that yet,” said Lisa Sherman, a math and science specialist who has worked with teachers on setting their evaluation goals.

“It’s really hard for teachers to write a goal based on a standardized assessment [that]…doesn’t really match the curriculum,” she said.

Nevertheless, Pryor, the education commissioner, said that allowing teachers to set individual classroom goals and student growth targets is the heart of the Connecticut model – unlike systems in other states that prescribe specific growth targets.

“What could be more important than the fact that teachers and supervisors are sitting down and giving appropriate time to the development of goals regarding the growth of youngsters’ achievement?” Pryor said. “That’s exactly what ought to be happening.”

What does ‘proficient’ mean?

Some educators question the accuracy of the system or fear that schools will be so focused on complying with the many evaluation requirements that they will lose sight of the real purpose – to improve teaching. “If you have to have the very latest software program just to calculate teacher effectiveness, that’s a signal right there you’re over-engineered,” said Title. “A false sense of precision comes with this.”

Matrix for determining summative teacher rating

The two major factors of a teacher’s evaluation, student growth/development and teacher observation ratings, are combined using this matrix. A “4” means exemplary, a “3” means proficient, a “2” means developing and a “1” means below standard.

Teacher Practice Related Indicators Rating (rows) by Student Outcomes Related Indicators Rating (columns)

Rating	4	3	2	1
4	Rate Exemplary	Rate Exemplary	Rate Proficient	Gather Further Info
3	Rate Exemplary	Rate Proficient	Rate Proficient	Rate Developing
2	Rate Proficient	Rate Proficient	Rate Developing	Rate Developing
1	Gather Further Info	Rate Developing	Rate Developing	Rate Below Standard

Source: The State Department of Education handbook on teacher evaluation
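The matrix can also be read as a two-key lookup. The sketch below is only an illustration of that reading: the cell values come from the handbook matrix, while the names `MATRIX` and `summative_rating` are hypothetical, and "Gather Further Info" is returned as a plain label because the article does not say how such cases are resolved.

```python
# Cell values follow the handbook matrix; the outer key is the teacher
# practice rating and the inner key is the student outcomes rating.
GATHER = "Gather Further Info"

MATRIX = {
    4: {4: "Exemplary", 3: "Exemplary", 2: "Proficient", 1: GATHER},
    3: {4: "Exemplary", 3: "Proficient", 2: "Proficient", 1: "Developing"},
    2: {4: "Proficient", 3: "Proficient", 2: "Developing", 1: "Developing"},
    1: {4: GATHER, 3: "Developing", 2: "Developing", 1: "Below Standard"},
}


def summative_rating(practice: int, outcomes: int) -> str:
    """Look up the summative rating for two component ratings (1 to 4)."""
    if practice not in MATRIX or outcomes not in MATRIX[practice]:
        raise ValueError("ratings must be whole numbers from 1 to 4")
    return MATRIX[practice][outcomes]


if __name__ == "__main__":
    print(summative_rating(3, 2))
    print(summative_rating(4, 1))
```

Only the two corners where the component ratings are furthest apart (4 and 1) lead to further information gathering; every other pairing yields a rating directly.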

The UConn report found wide variations in ratings of teacher quality among the 14 pilot districts studied last year. The small Columbia school system classified 76 percent of its teachers as “exemplary” and the rest as “proficient” but had no teachers in the categories “developing” or “below standard.” Meanwhile, schools in the Capitol Region Education Council (CREC) rated 15 percent of their teachers as exemplary, 76 percent as proficient, 9 percent as developing and less than 1 percent as below standard.

But does “proficient” mean the same thing in Columbia as it does in CREC schools? Or Norwalk? Or Bridgeport? Would a “developing” teacher in one district receive the same rating in another?

“There’s a concern about consistency,” said Columbia’s superintendent, Lol Fearon, who noted that school systems use different tests and that some test data was not yet available when the pilot districts’ results were reported to UConn.

With personnel decisions, including termination, on the line, the accuracy of evaluations could be crucial. One superintendent, New London’s Nicholas Fischer, has told state officials that the guidelines are too vague and could lead to legal challenges.

“The distinguishing characteristics between the [rating] levels are often fuzzy,” he said. “It would be very difficult for teachers to know what exactly it is you expect them to be showing you in the classroom.”

Overall, the 14 pilot districts in the UConn study rated almost no teachers below standard and just 4 percent in the developing category – figures that were questioned by Robert Rader, executive director of the Connecticut Association of Boards of Education.

“It seems to me in any occupation, [to say] that only 4 percent are in need of development…would be very rare,” Rader said. “I’m not looking to fire teachers – I think the vast majority are good – but I do think 4 percent is a very low number.”

Rader, a member of a state committee that developed the evaluation guidelines, said, “I want to make sure what we’re doing is really telling the true story of what’s happening in our schools.”

Looking for failure?

Teachers’ union officials on the same committee bristled at Rader’s question.

“Evaluation systems should not look for failure,” said Mark Waxenberg, executive director of the Connecticut Education Association, the state’s largest teachers’ union. “I don’t want us to [create] a system that inhibits teachers from taking risks for fear of being tainted by an evaluation tool.”

Among those who have fielded complaints from educators is state Rep. Andy Fleischmann, co-chairman of the legislature’s Education Committee. Fleischmann said he understands the recent decision to ease some of the evaluation system’s requirements, but he thinks teachers and administrators will become more comfortable with the process over time. He remains a supporter of rigorous evaluation.

“It makes sense to me over the long run that we’d make sure teachers are aware of their students’ progress, and that some portion of their evaluation would be based on student academic growth,” he said.

One of the biggest tasks for state officials, as the evaluation system unfolds, will be to convince educators that the new evaluations are, as Pryor says, “far better than many alternatives…[and] a big move in the right direction.”

At Burr School, Bluestein did not hide his frustration with the process. He said he believes in school reform and does not want to return to the status quo, but he added, “I think the system is just not doing what it’s intended to do.”

The UConn report concluded that the state’s evaluation model has the potential to improve schools but needs more work, including further examination of the relationship between evaluation and student achievement.

One obvious change has been the increased number of classroom observations by administrators. Lisa Sherman, the math and science specialist at Burr, said she hasn’t undergone any observations since she became tenured eight years ago. Like other specialists in Fairfield, such as guidance counselors and school psychologists, she is exempt from the process this year but will undergo observations starting next fall.

Although the amount of time and paperwork associated with the system has ratcheted up the stress among colleagues, Sherman said, “I do think the process of being observed and getting feedback and having conversations…is good.”

Younger teachers such as Taylor – who is in her third year of teaching and not yet tenured – have been accustomed to more frequent observations. After teaching her recent poetry lesson, Taylor met with Bluestein for a post-observation conference – something both she and Bluestein described as the most valuable part of the evaluation process.

In a friendly, collegial conversation in Bluestein’s office, the two exchanged ideas about what worked and what strategies Taylor might try in her class of 25 children.

Taylor said she was pleased with how the class responded to the poetry lesson but would like to find a way to spur even more interaction and conversation among the children themselves.

“I’ve been trying for a long time to have it be more them and less me,” she told Bluestein.

“It’s an experimentation thing,” Bluestein replied. “Every class is a little different,” he said, suggesting that she might get children to work more independently by posing a problem directly to them and asking, “How do you want to solve this?”

By the end of the year, Bluestein must rate Taylor on each of 17 factors identified in a state handbook on the evaluation process – factors such as “Promoting student engagement and shared responsibility for learning” and “Selecting appropriate assessment strategies.” Those ratings will be combined with ratings based on the progress of Taylor’s third-graders as part of a complex formula to give Taylor a final score on a 200-point scale.

As he wrapped up the 20-minute conference with Taylor, Bluestein reassured her.

“Nice job, as always,” he said.

“I have to fill out my form…” he told her. “I would say for your paperwork, just keep it very brief.” ♦

Robert A. Frahm, a former reporter for The Mirror and, before that, The Courant, has written about education for more than 40 years. His numerous writing awards include the nation’s top prize for education reporting from the Education Writers Association in 1983 and 1996 and the 1996 Master Reporter Award from the New England Society of Newspaper Editors. He is a former high school English teacher.
