Let’s get tough on crime by solving more crimes

Choosing between certainty and punishment

Aug 03, 2025

It is hard to think of an idea with more staying power, and one with less empirical support, than the idea that the way to be tough on crime is to lock lots of people up. The idea is simple and intuitive: we should lock up the bad guys for as long as we can to keep our streets safe. The trick is that we are pretty bad at identifying the bad guys, so we spend a lot of money locking up a lot of people without making ourselves much safer. What I want to propose here is a much more effective way to be tough on crime.

Let me just give you the bottom line:

‘Lock ’em up’ requires that we have a system that is good both at identifying who commits a crime and at figuring out how to keep those folks from committing another crime. Our current system is not great at either.
Deterring crime is a key objective of the justice system, and among the possible deterrence approaches, improving the certainty of detection is consistently shown to be more effective than increasing punishment.
As we have become more punitive as a society, the certainty of detection (figuring out who committed a crime) has declined. In essence, at the highest levels of policy, the macro levels, we have chosen to pursue punishment over certainty.
A better way to be tough on crime is to focus more resources on identifying and catching criminals, and less on punishment.

How good is the justice system at figuring out who is a serious, chronic criminal and who is not?

I spent a couple of years as the visiting science director in NYC’s justice agency in the late 2010s. At the time, reducing the size of NYC’s jail, Rikers Island, was a high priority. Rikers, as has been well documented, has been the home of all sorts of injustices, and reducing its reach or closing it entirely is broadly popular in the city. By the late 2010s, the average daily population of Rikers had declined from about 15,000 a decade earlier to about 7,500.

Now, there was a big assumption associated with this decline, one that was widely held. And that was: as the population of the jail declined, those who remained were the worst of the worst. This was not political positioning; it was just a general belief. I heard it from every corner of the policy world. The reasoning went that, of course, as fewer people are held in jail, the system—the courts, the prosecutors, the police—will ensure that only the really bad people remain locked up. If we lock up fewer people, only the worst will continue to be locked up.

Whether this was true or not is knowable. So, we conducted a simple empirical test. We analyzed the criminal histories of the people who were in jail when the average daily population was 15,000 and compared them to the criminal histories of people detained when the average daily population was 7,500. And what we found was… no difference.

When we looked at the criminal history of the average person in jail, it did not matter if there were 15,000 people in jail or 7,500. Their rap sheets were, all else being equal, the same. The average criminal history was so similar that it looked for all the world as if the system had just randomly selected every other person from the 15,000 and assigned half to jail and half to not jail.

Anyone who has conducted social science research can point to the holes in the analysis. For one, the test itself is a little kludgy. You cannot prove a null hypothesis (here, that there is no difference between the two groups). You can only reject it or fail to reject it. We found no evidence that the two groups were different, so we failed to reject the null, but that is a different thing from proving they are the same.

Also, this was a test of one place, at one time, so generalizing it to the rest of the world and to other times requires a pretty big leap in logic.
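The first caveat, that failing to find a difference is not the same as showing two groups are identical, can be illustrated with a small sketch. This is not the analysis we ran in NYC. It is a generic permutation test on made-up "prior arrests" counts, and every name and number in it is a hypothetical chosen for illustration.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Synthetic, made-up prior-arrest counts; NOT the NYC data.
data_rng = random.Random(42)
jail_15k = [data_rng.randint(0, 10) for _ in range(200)]
jail_7500 = [data_rng.randint(0, 10) for _ in range(200)]

p = permutation_p_value(jail_15k, jail_7500)
print(f"p-value: {p:.3f}")
```

A large p-value here only says the data are consistent with no difference. It does not rule out a small real difference that the sample was too noisy to detect, which is exactly the limitation described above.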

All that said, and though it is not a perfect test, it is a helpful gauge of how effective the justice system is in sorting people into the ‘very bad’ bucket, or not.

In my view, the evidence was compelling that as the system shrank, NYC was not simply detaining the worst of the worst.

This is one critical reason why ‘let's get tough on crime by locking lots of people up’ is a poor policy choice. It assumes that the criminal justice system is really good at sorting people into the "bad guy" and "not bad guy" buckets. This analysis suggests the criminal justice system is not good at that sorting. Not at all. This is not good for equal justice, and it is not good for public safety.

But the fact remains that lots of people, maybe a majority of people, want a tough-on-crime policy. And this is not a traditional left-right split. I read a lot of YIMBY blogs, for instance, and these folks, who are definitely center-left on most issues (housing, state capacity, etc.), suddenly move center-right when crime enters the discussion. They are interested in policing solutions to crime problems and…that’s about it. These are folks who are very open to research-based best practice and empirical evidence on lots of issues, who suddenly get very closed-minded when crime comes up. And there are lots of people like this in all corners of policy discussion.

Finding an Effective Alternative to ‘Lock ’em up’

But if ‘lock ’em up’ does not work and people want to be tough on crime, it seems reasonable to try to find an alternative approach that is actually tough on crime.

And that approach should be to focus on solving crimes.

The second critical reason why locking lots of people up is an ineffective way to be tough on crime is that we do a very poor job in America of identifying the perpetrators of crime. In the majority of serious person crimes—assault, robbery, rape, shootings, and homicide—we never find the perpetrator. Obviously, we can’t lock people up if we never catch them.

So, how bad are we at finding perpetrators? Let’s look at two measures. First, what percentage of cases are ‘cleared’ (i.e., how often do the police know who did it)? Second, how well do peer nations perform at clearing cases?

Clearance Rates

I am going to tiptoe a little bit around clearance rates because they are the black hole of criminal justice statistics. There are two big problems. One is that the year a crime happens and the year a case is solved are often not the same, so you get some wild clearance rates (Jeff Asher has a nice discussion of this). The other is that cases can be cleared exceptionally, meaning that although there is no arrest, the police are sure they know who did it.
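To see how the first problem produces wild numbers, here is a minimal sketch. It assumes the rate is computed as clearances recorded in a year divided by offenses recorded in that same year, regardless of when the cleared offenses occurred. The agency and its counts are hypothetical.

```python
def clearance_rate(cleared_this_year, offenses_this_year):
    """Clearances recorded in a year divided by offenses recorded that year."""
    return cleared_this_year / offenses_this_year

# Hypothetical agency: 10 homicides occur this year and 6 of them are cleared.
# Detectives also clear 5 older cases from previous years.
offenses = 10
cleared_current = 6
cleared_older = 5

# The older clearances count in the numerator but not in the denominator.
reported = clearance_rate(cleared_current + cleared_older, offenses)
same_year_only = clearance_rate(cleared_current, offenses)

print(f"reported rate:  {reported:.0%}")
print(f"same-year rate: {same_year_only:.0%}")
```

In this made-up case the reported rate exceeds 100% even though only 6 of this year's 10 homicides were solved, which is why single-year rates for small agencies should be read with care.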

But still, the clearance rate is probably the best existing measure of police performance, and a fair measure of our collective commitment to solving crimes. Put another way, high clearance rates suggest that solving crimes is a priority; low clearance rates suggest it is not. Declining clearance rates suggest a declining commitment to certainty of detection as a policy priority.

Clearance rates are declining.

Source: Murder Accountability Project. https://public.tableau.com/app/profile/thomas.hargrove/viz/UCR1965-2019/HomicideClearances

Now, these data are just for homicide, but they do reflect more than one million homicides over the time series, so it is a big sample, albeit a specific one. The overall solve rate is about two-thirds but shows a decline from 80% of homicides cleared in 1965 to less than 60% in the 2020s.

Peer Nations

In the 2024 Annual Review of Criminology, Philip J. Cook and Ashley Mancik describe the empirical relationship between homicide clearance rates and the crime decline. It is a thoughtful and accessible place to rigorously dig into the issues I can only touch on here.

Cook and Mancik briefly address the question of how the US compares to international peers—a stubbornly difficult question—and conclude that the US compares poorly.

A careful analysis of four nations in Western Europe utilized harmonized data for the 5-year interval 2009–2014; it defined clearance as arrest or exceptional clearance. This study reported clearance rates of 98% for Finland, 95% for Switzerland, 83% for Sweden, and 77% for the Netherlands (Liem et al. 2019). Needless to say, all these rates are substantially higher than those in the United States for that period, which hovered around 65%, with a low of 62.5% in 2012.

Choosing Between Punishment and Certainty

Overall then, the US has done an increasingly poor job of solving crimes over the last few decades. What have we gotten better at? Incarceration. The graph below charts the number of Americans incarcerated (red line) over the homicide case clearance rate (blue). This is just a correlation, but the underlying numbers are not in dispute: the US prison population has more than quadrupled, and the homicide clearance rate has declined from about 80% to less than 60%.

Source (clearance rates): Murder Accountability Project. Source (incarceration rate): Bureau of Justice Statistics: National Prisoner Statistics, Annual Probation Survey (NPS), Annual Parole Survey, Annual Survey of Jails, Census of Jails, https://bjs.ojp.gov/document/keystatsupdate_2022.csv

This suggests we are having trouble walking and chewing gum at the same time. That is, as more (scarce) criminal justice system resources go to incapacitating both pre-trial and convicted offenders, the rate at which cases are solved has declined.

Cook and Mancik point out that over this time, the solvability of homicides—particularly gun homicides—has declined. That is, drive-bys and other shootings are particularly hard to investigate, and business as usual (what we do in a non-gun homicide) is probably not sufficient.

So far, our solution seems to be to shrug our shoulders and say, hey, these crimes are hard to solve, and then to lean more into punishment and away from certainty. Let me say a few words next about why this is not the most efficient approach.

Solution: Focus on Increasing the Certainty of Detection

I propose that if the goal is to get tough on crime, the better way to do it is by solving more crimes. This is doable. I went to a mid-sized southern city this past week and visited a police department laser-focused on this idea. They employed all sorts of smart technology—license plate readers (static and mobile), data scientists, crime scene evidence collectors, criminal analysts in a well-staffed data center—all directly connected to cops on the street. And all with one goal—catching as many perpetrators as possible.

A natural question is: aren’t all police departments laser-focused on solving crimes? And the answer is a resounding no. There isn’t a simple reason why this is so. Many departments are understaffed and under-resourced. Some are pulled in other directions for local political reasons. Some are stuck in the past and have failed to embrace modern policing methods. Almost all of them simply have too few investigators working on too many cases, and cases that are increasingly hard to solve. If you are of a certain demographic and enjoy watching police procedurals on CBS, you are watching science fiction, for the most part. But it does not have to be this way.

A (Very) Quick Primer on Deterrence

So, let’s get tough on crime by catching more perpetrators. The beauty of this approach is that it creates a virtuous cycle. Deterrence is a big goal of the criminal justice system because it is essentially a mechanism to reduce crime without spending any money. If people believe the costs of committing a crime outweigh the benefits, they will tend not to commit the crime. So, if you can raise the costs of committing a crime, you can nudge that calculus toward not committing a crime.

There are four ways to increase the cost of committing a crime: severity, celerity, certainty, and salience. You can raise the penalties (severity). You can catch people more quickly (celerity, or swiftness). You can make the punishment better fit the crime (salience). Or you can make it more likely that if you commit a crime, you will get caught (certainty). Of the four approaches, the one with—by far—the most empirical support is certainty.

The more likely I am to think that I will get caught if I commit a crime, the less likely I am to commit a crime. And because I need to get caught before I can get locked up, the prospect of getting caught—if we can make that prospect more certain—is a more effective deterrent than the prospect of getting locked up.
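The logic that getting caught comes before getting locked up can be put into a deliberately naive sketch. It assumes the perceived cost of a crime is simply the probability of being caught multiplied by the sentence. That is a textbook simplification, and the function names and numbers are hypothetical.

```python
def expected_penalty(p_caught, sentence_years):
    """Naive expected cost: probability of being caught times the sentence."""
    return p_caught * sentence_years

def extra_cost_of_one_more_year(p_caught, sentence_years):
    """How much one additional year of sentence adds to the expected cost."""
    longer = expected_penalty(p_caught, sentence_years + 1)
    current = expected_penalty(p_caught, sentence_years)
    return longer - current

# Hypothetical detection probabilities, chosen only for illustration.
for p in (0.2, 0.6):
    added = extra_cost_of_one_more_year(p, sentence_years=10)
    print(f"P(caught)={p:.0%}: one more year adds {added:.2f} expected years")
```

In this toy model every added year of sentence is discounted by the chance of being caught, so tougher sentences have little bite when detection is unlikely. The model does not by itself explain why certainty outperforms severity in the research; it only shows why severity depends on certainty.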

Coda

My economics training pushes me to think about the factors of production in clearing more cases. I won’t bore you with all those thoughts, but technology is one key factor of production that could be enhanced, and it is obvious that we are at a technological inflection point. The mid-sized city I mentioned has fully embraced technology, with real results. How widespread is that?

According to the measure below, not very. The Washington Post created this cool data visualization from data compiled by Anthropic, the makers of the Claude LLM. Basically, they reviewed millions of AI chats and categorized the requests as either asking Claude to do something for you (automation) or to have Claude do something with you (augmentation).

Then, you could put in your job to see if Claude is likely to replace you or help you, depending on the proportion of responses unique to your position.

I put “Criminal investigators and special agents” into the algorithm, and this is what I found: investigator queries were exclusively requests to have Claude automate their job, not augment. “For criminal investigators and special agents, AI is currently automating 18% of the job functions and augmenting 0% of them.”

Source: Youyou Zhou, “How AI is impacting 700 professions — and might impact yours,” Washington Post, https://www.washingtonpost.com/opinions/interactive/2025/ai-jobs-layoffs-tech/?itid=co_oppopular_3

So, anything I say from here on is speculation. But my guess is that what investigators are mainly using Claude for is to populate forms from their notes. Or write memos and emails from their notes. This is not a bad thing—in fact, it’s great. In theory, this frees investigators to do more investigating. But the fact that criminal investigators use Claude 0% of the time for augmentation makes me wonder if this is how they are using the time freed up from automation.

Anyway, the broader point is that there is a lot more that could be done to improve the certainty of detection. But we need a policy shift in that direction, and away from punishment, to get there.

The View from Outside the Ivory Tower

It is not often that I find non-scientists challenging scientific arguments. Yes, this type of criticism is widespread on Twitter, but almost all of it is performance art. It’s trolling, preening for the cameras. But finding uncredentialed (but not unqualified!) people willing to weigh in on scientific matters—in the scientific jargon—is pretty unusual.

This week, there were two excellent examples of writers who claimed no scientific training but wrote very effective critiques of complex science. Both are worth a read.

In his essay “The New York Fed's New Involuntary Commitment Paper is Methodologically Shaky as Well as Irresponsible”, Freddie DeBoer makes an extremely clear argument for why a complicated instrumental variables analysis might lead to unsupported conclusions. I am not saying he is correct and the underlying paper is wrong. I am simply highlighting an extremely cogent and accessible critique by a non-scientist that practicing scientists might learn from.

Now, I am very aware that Freddie is a controversial person, but I want to focus on the clarity and accessibility of this argument. If more professors taught in this language and more social science students were taught to present their arguments in this language—and both wrote more using this kind of language—there would be a lot less mystery around social science. And hopefully, much more uptake of robust papers.

The idea of the underlying paper is to test whether involuntary commitment leads to more (or less) suicide and violence among people who are marginally eligible for an involuntary placement. Since you obviously cannot randomly assign people to be involuntarily committed, the researchers use a clever instrumental variables design to approximate (quasi)-random assignment. The authors argue that since the assignment of the mental health provider who conducts the assessment is random, you can tease out effects similarly to a randomized trial. This kind of design is used widely in the economics of crime literature, particularly in papers examining the effects of sentencing or pre-trial detention, where the random assignment of a case to a judge becomes the instrument.

Here is DeBoer’s critique.

The authors claim that patients are randomly assigned to doctors based on a triage list, but they also admit that this process isn’t perfectly random. Sometimes specific doctors are assigned on purpose, for example, if a patient is a frequent visitor who’s been seen before, or if they’re a child or elderly and need a specialist. You can imagine all manner of casual, undocumented breaks from this “random” selection of doctors - maybe one doctor prefers patients from particular backgrounds and subtly pushes to see more of them, maybe a triage nurse has a grudge against a particular doctor, maybe another doctor takes strategic lunch breaks to avoid certain patients. Who knows? The Fed researchers certainly don’t. Even small breaks in the asserted randomness of doctors assigned to cases could potentially severely skew the results. If stricter doctors tend to get the most severe cases, or more experienced ones handle the hardest ones, then we can’t assume everything evens out in the wash, and note that this doesn’t have to be a formal policy of the institution for it to happen consistently. Just the fact that doctors have schedules can break this assertion of randomness; if stricter doctors are assigned to a particular time slot for nonrandom reasons (very possible) and particular patients are most likely to get evaluated at particular times of the day/week (almost certain) then hey presto, the whole design breaks and we have no practical ability to assess such tendencies in any robust way.
The second assumption is potentially even worse. Of course doctors affect outcomes in ways other than in decision to hospitalize! The whole logic of the study depends on doctors affecting outcomes only by deciding whether to hospitalize a patient or not. But doctors aren’t robots. They might interact differently with patients, give different advice, refer people to different outpatient services, or write different notes that influence how patients are treated later. Some doctors might have special relationships with other practitioners or programs that end up resulting in meaningful differences in treatment. If those things affect outcomes (and they do) then we can’t isolate the effect of hospitalization. We’re also picking up the effect of having a certain kind of doctor. This is what’s called a violation of the exclusion restriction, and it’s a common problem in instrumental variable designs.
If patients who end up with “lenient” or “strict” doctors are systematically different in ways we don’t see in the data then the whole thing collapses.

I am not saying Freddie is right or wrong or the Fed researchers are right or wrong. What I am saying is that the presentation of the critique of the method promises broader understanding than anything I have ever read about IV designs.

***

Finally, I’d like to just mention this article by Derek Thompson. Thompson is a full-throated advocate of regulation reduction and more housing construction. But he is aware that there is a split about how to increase the housing supply. YIMBYs (the Yes In My Backyard, pro-growth movement) tend to advocate for reducing the amount of regulation that restricts residential construction. Another faction believes that monopoly power among residential housing developers constrains America’s housing supply. For the uninitiated, the intensity of the fight may be surprising, but it is a pretty bitter battle.

Enter Derek Thompson, who takes the novel approach of reading the most commonly cited papers of the anti-monopoly folks and then calling the authors of the underlying studies to see if their work has been fairly and accurately represented in the summary articles. He finds, by and large, that it has not.

Anyone who has published an article that has been cited or referenced on any scale can tell you a story about someone misrepresenting their work, and the helplessness that results. I am a firm believer that once you produce anything—whether scholarship, art, or music—the interpretation of it belongs to the audience. That you should know going in that people will see your work differently than you imagined it. That, however, is a long way away from stating that you have said one thing when you have actually said something materially different.

Musical Interlude

By the time I was born, Tom Lehrer had walked away from performing in favor of teaching mathematics. But his album was a little bit hidden away in my parents’ record collection, and so I was enraptured and snuck a listen whenever the folks weren’t around. I was terribly sad to see him pass away, too soon, this week. But I was really delighted to see the huge outpouring of love for his satire that has followed.

Aug 4

Amen. In the broadest terms, this whole Substack project is intended to push forward the idea that data and theory for all-of-society (macro) crime measures are needed. You have identified two theories for why crime rises and falls nationally that should be tested. I couldn't agree more.


Sam antha

The lack of strategy and knowledge when it comes to DNA databases, and the lack of DNA lab funding overall, are contributors to low clearance rates. Many states still don't have strong (or any!) arrestee or convicted offender DNA laws, and usually no one is in charge of audits or compliance.

This ties into your deterrence discussion because DNA databases have been shown to deter/prevent crime, in addition to identifying or confirming named offenders after the crime occurs. https://manhattan.institute/article/the-deterrent-effects-of-dna-databases

I'd really love to see more research and exploration of this topic.

External Processing
