How Do We Tell What's Working? Disrupting the Justice Evaluation Model

Angela Hawken at our conference on community justice in May 2018.

Identifying problems in the justice system and seeing whether reforms have actually made a difference require data and evaluation. But Angela Hawken, a professor of public policy at New York University, says too many programs are never properly tested, in part because the conventional evaluation process is too cumbersome and expensive. And she worries those barriers effectively give a small group of institutions and researchers a monopoly over deciding what becomes an "evidence-based" practice worthy of replication.

An example of a recent BetaGov trial.

Adopting the mindset of a more nimble startup, with support from private funders, Hawken founded BetaGov. It offers free and fast evaluations of public policy programs. Like the institutions conducting what she calls "Cadillac" research, BetaGov employs randomized controlled trials—the "gold standard" in evaluations—but unlike its Cadillac counterparts, BetaGov's experiments are designed to produce results quickly, and the ideas being tested generally come from inside the systems themselves—whether from practitioners, or even clients, such as incarcerated persons. Currently, BetaGov is supporting more than 200 trials across the country.

As part of our Short Answer series, Angela Hawken explains the role of randomized controlled trials.

Hawken, who describes herself as an impatient person, wants justice reformers to rediscover the value of learning from failure, launching multiple small-scale innovations, and taking note of where the results cluster, both positive and negative. "Some of my impatience is just making sure we're positioned in the right place at the right time to catch the learning."

The following is a transcript of the podcast:

MATTHEW WATKINS: Welcome to New Thinking from the Center for Court Innovation. I’m Matt Watkins. The focus of this podcast is generally on people taking innovative approaches to problems in the criminal justice system. But identifying those problems, and then seeing whether our solutions have actually helped, usually requires data. Our guest today, however, says too many programs and practices in the world of criminal justice have never been properly tested, and that’s in part because the evaluation process is too slow and expensive. Angela Hawken set out to solve this problem by founding BetaGov. With support from private funders, BetaGov offers free and fast evaluations of public policy programs. Right now it’s supporting more than 200 trials taking place across the country. Angela is also a professor of public policy at New York University and was a panelist at our recent conference on community justice in Birmingham, Alabama. Today, however, she joins me in the studio.

WATKINS: Angela, thanks very much for being here.

ANGELA HAWKEN: Thank you for having me, Matt.

WATKINS: So maybe just start by trying to introduce people to the BetaGov model, where it seems a little bit like you've set out to disrupt the previously cozy world of randomized controlled trials. You make a distinction, I think, between what you call Cadillac research and the work that BetaGov engages in. So could you just explain that distinction for us?

HAWKEN: Sure. So the concern initially when we thought about the BetaGov model was that too many research projects were of this Cadillac model. They were expensive, they took a long time to do, and the stakes were so high that if they found that what they were testing wasn't especially productive, they kept lumbering along. When research is expensive, it's actually harder to shut it down.

So we thought we would create an alternative track of research. Think of it as exploratory research. On the front lines of innovation, where the research is inexpensive to do and can be shut down very nimbly if the outcomes aren't moving in the intended direction. These are mostly RCTs, randomized controlled trials. Not all, but our work primarily involves randomized controlled trials. But these are really exploratory trials.

Our collaborators are practitioners in the public sector who have looked around their offices and their buildings and the clients that they were serving and asked themselves the what-if question. What if we did it differently? And we just give them a vehicle to put a decent quality test around that question, and if it shows promising outcomes, we quickly replicate it somewhere else to see if the finding holds.

WATKINS: And do you want to just explain for folks briefly, what an RCT, which is I guess the acronym we're going to be using, or a randomized controlled trial, is? And why it's so important, particularly in this world of criminal justice evaluation?

HAWKEN: So randomized controlled trials are often referred to as the gold standard of research. And the reason is that you can draw what are called causal claims. You can say, the reason we're seeing this improvement in outcomes or a worsening in outcomes is because of the intervention that's being tested.

So in a randomized controlled trial, you start with an eligible population. That could be people, it could be law enforcement vehicles, it could be probation officers, it could be courts. And you randomize that eligible population into one of two conditions. The intervention condition receives the new policy or practice or program that you're trying to test, and the control condition is usually business as usual, which we're going to compare the outcomes to.

But why RCTs, or randomized controlled trials, are so important within criminal justice is that criminal justice suffers from what we call strong selection biases. So you'll see many evaluations in criminal justice that compare, for example, treatment completers to people who did not go to treatment. The sorts of people who complete anything are different from the sorts of people who either don't start or drop out along the way.

And what that creates, then, is this problem that you really aren't comparing apples to apples. You're comparing different sorts of sub-populations. The nice thing about a randomized controlled trial is it levels the playing field, so that you really have an apples to apples comparison and can really compare the outcomes that you find.
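
The contrast Hawken draws, between comparing completers with everyone else and comparing randomized groups, can be illustrated with a small simulation. The Python sketch below is not BetaGov's randomizer or method. The hidden `motivation` score, the 0.5 completion cutoff, the program with no true effect, and the function names are all assumptions invented for illustration.

```python
import random
from statistics import mean


def randomize(eligible, rng):
    """Shuffle the eligible population and split it into two equal-sized arms."""
    shuffled = list(eligible)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (intervention, control)


def simulate(n=10_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical population: each person has a hidden "motivation" in [0, 1).
    motivation = [rng.random() for _ in range(n)]
    # Assumption: the program has NO true effect; success depends only on motivation.
    success = [1 if rng.random() < m else 0 for m in motivation]

    # Naive evaluation: the more motivated half completes the program, the rest do not.
    completers = [s for m, s in zip(motivation, success) if m >= 0.5]
    others = [s for m, s in zip(motivation, success) if m < 0.5]
    naive_gap = mean(completers) - mean(others)

    # RCT: chance, not motivation, decides who lands in the intervention arm.
    intervention, control = randomize(range(n), rng)
    rct_gap = mean(success[i] for i in intervention) - mean(success[i] for i in control)
    return naive_gap, rct_gap


naive_gap, rct_gap = simulate()
print(f"completers vs. others: {naive_gap:+.2f}")
print(f"randomized arms:       {rct_gap:+.2f}")
```

Under these assumptions, the completers-versus-others gap should come out large even though the simulated program does nothing, while the gap between the randomized arms should sit close to zero. That is the selection bias problem, and the apples to apples comparison, in miniature.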

WATKINS: It's my understanding that RCTs are sort of the gold standard in a sense precisely because they are really expensive, and they can be cumbersome. So I understand that you do see a role for what you're calling Cadillac research. But could we just get a little bit more into what you've done at BetaGov to make this model more nimble, as you see it?

HAWKEN: Sure. Well, we really looked to the private sector initially. If you look at well-performing companies like Amazon or Netflix or Google, they perform tens of thousands of experiments a year. And Jeff Bezos, the CEO of Amazon, said that he attributes the success of Amazon to the ability to do lots of tests. And he said the way they do that is by reducing the costs of tests so that you can do tens of thousands of them rather than dozens of them. And we really thought, well, let's learn from that model.

What makes a randomized controlled trial so strong is the ability to make that apples to apples comparison, not necessarily the cost involved. But what we do is we take a population, we have a BetaGov randomizer, so we oversee the randomizing process for our pracademics, who are central to the low cost of our approach. What a pracademic is, it's somebody working in a government organization who has a full-time job usually within that government organization who has raised their hand and said, “hey, I want to test this idea with you.” We very quickly put them through a webinar training event, explain to them what they can expect of us, so the resources we will provide them, and then what our expectations are of them.

But the result is, instead of me having a research team of 15 people, I now have this network of practitioners across the country who are part of the research team. And that no-cost labor (very in tune, very in touch, able to access the data; this is their trial in the end, not ours) is how you can have a model that can go to scale, producing relatively low-cost exploratory research, so that the stakes are low enough for the agencies to allow the research.

And that's what we want to do. We wanted to shatter the monopoly over who gets to decide what is being tested. We really saw that monopoly as counter-productive. Very few people typically get to weigh in on what will become an evidence-based program or practice. And it's the people who control the research resources.

And our fundamental mission at BetaGov is to create government organizations that become learning organizations. Where staff are empowered to think about what they're doing, think about processes, and have a mechanism for having those processes tested.

WATKINS: I mean, as opposed to this credentialed Cadillac world, you've even opened up RCTs not just to practitioners, but in some cases, even prisoners.

HAWKEN: Oh, those are our favorite sorts of projects. We call those collaborative design. And the first work I was doing on that was in Washington state when I was [working] in custody in a segregation unit. So in solitary confinement. And the gentlemen I was meeting with, they had so many ideas about how to improve the system. And they were all seemingly such worthy ideas.

Initially when I said I was going into solitary to solicit ideas from people incarcerated there, people thought that I'd be getting kind of trivial things. And everybody was going to complain about the food. And yes, they certainly were, and I don't blame them. But they really also were very thoughtful and submitted what became outstanding ideas. In fact, that then replicated from state to state, and that initial group meeting has turned into, I think, policy reform in several states.

People closest to the problems are also closest to the solutions, which means whether it's a front-line person in a government agency, a police officer, a corrections officer, or someone being served by the system, an arrestee or someone who's behind bars, they're closest to the problems and therefore closest to the good ideas. If we aren't listening to them, we're never going to make the sorts of progress that we should be making.

WATKINS: Are there a couple of examples of this sort of rapid cycle change that you'd like to highlight?

HAWKEN: Sure. Here's one that I think is a fun example of collaborative design, and when I mention collaborative design, I'm talking about the relevant stakeholders coming together to think strategically about a new initiative to test. There was a nice example of that in Pennsylvania, and this is in custody, where mental health staff, counselors, corrections officers, and the women living there (this was a women's facility) got together and planned a strategy for helping women deal with escalating anxiety in a corrections setting. And it's called the chill plan, and you can see it on our website.

And this collaborative strategy is a program that cost them no money to come up with. And it's reduced negative outcomes quite substantially. And it's replicated well. We've since tried it in another facility. But what is really nice about it is the people who had most at stake here, the corrections officers in that facility, the counselors, as well as the women living there, together worked on a plan to meet their needs.

So much of what we do in the criminal justice system is arbitrary. If we could just stop for a second and be more purposeful. For example, drug testing. What is the optimal number of drug tests? We drug test people a lot. And we don't know anything about what is the optimal number of drug tests to do, if we should be doing them at all. We don't know. Is zero the magic number? All of these things should be purposeful.

Custody stays, when we put people behind bars. Whether it's for three days or 30 days or 60 days or 90 days. Those aren't research-driven numbers. This is all just some determination that's relatively arbitrary. So we have right now in Wisconsin a pilot looking at the dose-effect of probation. How long should probation be? Is six months better than a year? People are often tripped up just because of a probation area experience. We should be more mindful about these numbers. These numbers affect resource allocation and they affect the people within the system, and we have to know how best to serve them. So all of these things can be subject to randomized controlled trials.

WATKINS: Yeah, I mean, numbers are very powerful, obviously, and I think can be scary for some people occupying positions in the system. Data can be used to force change, it can be used to shame organizations sometimes. So I'm wondering about the model that you guys use at BetaGov in terms of dealing with the fact that you will have people who are going to be scared of having their data known.

HAWKEN: I have to say, we've been very lucky in that our pracademics work with us, right? So we're within the organizations, and our goal is always to help them improve their practices and their procedures. We do not engage in gotcha research. So, if they're at the end of a pilot and for some reason they feel that they want to anonymize the jurisdiction, they have the opportunity to do that. They never do. They have that opportunity, and I have to say, what's nice about our pracademic network is that they have been so willing to follow the data, irrespective of what the data has concluded.

And something that's been really central to our work from the beginning is this idea, which is so unheard of in the public sector, that nothing good can happen without the ability to fail joyfully. And being willing to understand that the process is what matters. The learning process is what's going to improve outcomes, rather than the result of any particular pilot. If you can get folks on board in the learning process, they're more willing to engage with the data, whether it's a positive outcome or a negative outcome.

WATKINS: I'm interested a little bit in how you got into this work yourself. I saw that you started out as a labor economist, which isn't necessarily the most obvious stepping stone for trying to shake up the world of randomized controlled trials. So could you talk a little bit about how that took place?

HAWKEN: So in my previous, previous, previous history, I was working as an econometrician and mostly working with wonderful data sets and doing quasi-experimental research. I really became interested in these nimble RCTs having come off a very large study where we had found a really good result in one jurisdiction and absolutely did not replicate it when we tried it in other jurisdictions. And that had taken many, many years. And at the time, I thought, wow, this is really not how we should be learning. We shouldn't be doing these big, expensive studies in the front end. We need to be doing these microstudies, right? These exploratory studies. Let's make sure we're getting it right in a jurisdiction before we bring in the Cadillac research or bother to do that.

There was a paper, and this is going to sound very dreary, but there was a paper written by a roboticist from MIT in the 1980s and they talked about space exploration and the problem with space exploration. And the paper was really formative to the thinking behind BetaGov. What they said is, we have these missions going off into space. And the mission fails miserably, and then everyone's mad at this exploration endeavor.

And what these scientists at MIT said, instead, what we should do is come up with these little bugs, these kind of space exploration bugs and they weigh a few pounds each, and we blast thousands of them off into space. And some of them are going to crash and die, and who cares, because they cost a few thousand dollars. But some of them are going to raise their antennas and say beep, beep, beep, carbon, or whatever you're asking them to detect. And we can look out into space and see the signals of promise.

And I thought, wow, what if we did that in the government sector? We blasted out thousands of innovations and then let the promising ones cluster. And let's not even bother sending out the big mission in the area that's just dark and we're getting no signal from. And we do that. We tend to send out the big mission first, then find it's dark, and then get mad. We need to change how we do this. That doesn't mean there's not room for the traditional academic model, of course there is. But it's a parallel track.

WATKINS: Is there, though, a particular challenge to dealing with criminal justice data? I mean, I've heard a lot of complaints by people who work in this field about the quality of the data, the organization of the data, and as we know, there is no unitary criminal justice system per se. It's literally thousands of, often, overlapping agencies. That must be something of a challenge for what you're trying to do in terms of making things more nimble.

HAWKEN: It is. For us especially, because we do low-cost research. So we really do rely, primarily, on administrative data.

We have a project right now we're putting together with prosecutors, and it is shocking just how most, even large, criminal justice agencies have no idea who's in their system, how long they've been there, what they're there for, and can't even start to produce the most basic descriptive statistics on their caseloads.

I do think I'm feeling quite optimistic in this regard. I used to just be mad at it and shake my fist at the sky. Data's so bad, data's so bad. We really are getting to a point now where we have sophisticated new thinking and sophisticated new allies within the criminal justice research world. And I'm thinking of our team now: we've grabbed astrophysicists and computer scientists and unusual suspects, coming together to think it through. A, we have this messy data to begin with, so what can we learn from it responsibly? And then, what does good data management and record keeping look like moving into the future? And I think as some large philanthropists become interested in this issue, I'm really hoping that they'll start to orient their attention to this issue of data.

If we had well-behaved data management, we would be so much more productive than we are now, always having to grapple with the data issues.

WATKINS: At the same time, just to look at the other side of the coin I guess, is there a danger almost of overselling the promise of data? I mean, a kind of almost positivistic idea that if we just get all the right numbers, everything will make it all equal, everything will be completely transparent? So, I suppose that's my concern a little bit. Because not everything can be expressed in numbers. Numbers can only tell us so much, can only see so much, for example.

HAWKEN: Oh, absolutely. I think you always have to be mindful of that. And especially when we think of unintended consequences, right? We might see an outcome move, we might be moving the dial on one motor that looks really promising, but some equally important social factor that isn't in that data set might be moving in the opposite direction. We always want to be mindful of that.

And I think that's the importance of people, right? Even when we think about our work, we primarily do RCTs, not exclusively. Well, not everything can be solved by bringing an RCT to bear on the issue. We talk about our work in a few ways, and that is, we do the data-driven work where we let the data speak for itself. And then a lot of the work is person-driven. You really still have to be in the field, on the front lines, experiencing these programs and practices firsthand to get a better sense of what the potential spillover implications could be too.

But yes, I agree. Data's a piece of this. It's not everything.

WATKINS: The sense I have of you is that you're an impatient person in the best sense of the word, in the sense of being impatient to make change and make a difference. So do you have a vision of where you would like BetaGov to be five years from now or 10 years from now?

HAWKEN: So, some of this is speed. I think what is the primary distinguishing factor with BetaGov is the speed with which we're able to do some of this work. And that does probably extend from my somewhat impatient personality. But it's also because of the reality in the field. If you think about policy opportunities, there's this window in time where there's this opportunity to intervene, because the stakeholders are enthusiastic to do it now. And if you wait six months or a year or 18 months for the research funding to be in place, that window of opportunity is going to close. So some of my impatience is just making sure we're positioned in the right place at the right time to catch the learning. And if we can't orient quickly, we're going to lose that opportunity to learn. So some of that is that.

And we're now actually rebranding as BetaHub, which is going to include nonprofits. There are so many nonprofits doing really important work within criminal justice. Many of them really struggle with the data issue. Many of them have no idea whether their practices are optimized. So we'd like to work with them to improve their operational efficiency too.

So we have lots coming down the line. I know it's, as I said, I'm a terrible businesswoman. Everything we do is at no cost, so the onus is on me to keep this effort funded. And we hope to do so, because we love what we do. And it turns out that zero dollars is the right price. I know this sounds odd, but for exploratory research, if we're really going to get good at isolating those innovations that do make a difference, we have to do lots of tests to accommodate all the failure that's going to happen. Things that aren't going to make a difference, right? So we have to do lots of tests, and the only way to do lots and lots of tests is to make it inexpensive for the agencies to allow those tests to happen.

And when no money's at stake, people are also more willing to fail. And that's so important. Because there's failure all around us. Just no one's willing to ’fess up to it. And we'd like to change that.

WATKINS: Well, I really look forward to hearing more about what you change going forward. And thank you so much for joining me today.

HAWKEN: Thank you so much. It was fun to meet with you.

WATKINS: I’ve been speaking with Angela Hawken. Angela is the founder of BetaGov and a professor of public policy at New York University. You can learn more about her work and see a video of her speaking at our recent community justice conference at our website: courtinnovation.org. Technical support for today’s episode has been provided by the great Bill Harkins and our theme music is by Michael Aaron at quivernyc.com. This has been New Thinking from the Center for Court Innovation. I’m Matt Watkins. Thanks for listening.

June 2018