Times Insider explains who we are and what we do, and delivers behind-the-scenes insights into how our journalism comes together.
Five days a week, Andrea Michelson signs on to a New York Times online chat: “I’m logging on, let me know where I can help!” During her eight-hour shifts, Ms. Michelson might help track coronavirus tolls in nursing homes across the United States, or verify cases reported in other publications. Ms. Michelson, a freelance journalist and a recent graduate of Northwestern University, is part of a team working on The Times’s coronavirus tracking project, which seeks to count every case of the virus and every death from Covid-19, the disease it causes, in the country.
The data gathered by the team has been an engine for The Times’s coverage of the pandemic and has been used by medical researchers, federal offices, health care providers and nonprofit organizations.
Because The Times has collected coronavirus data using a consistent methodology since late January, it has one of the few complete data sets available.
And it all started with a single spreadsheet.
As the first U.S. cases were being reported in late January, Mitch Smith, a national correspondent who covers the Midwest, and Monica Davey, the Chicago bureau chief, created a Google spreadsheet to keep track of confirmed cases across the country. At first, they documented cases in Washington State and on cruise ships.
“Every case was a news event then,” Mr. Smith said.
In those early days, Mr. Smith or a team member would note an infected person’s age, gender and condition, in addition to a few other details, and add them to the sheet. A map on the Times website, built by the Graphics desk, pulled data from the sheet to show where outbreaks were appearing.
By late February, with outbreaks popping up across the country, the team realized that its database had no equivalent in the public sector. “We had a level of detail and a level of immediacy that the federal government wasn’t providing,” Mr. Smith recalled.
The Times’s tracking project grew to keep up with an epidemic that was rapidly spreading. Reporters from the National desk, editors from the Graphics desk, developers from Interactive News and Technology, news assistants, researchers and freelancers were all pitching in. To date, more than one hundred people have contributed to an effort that is active 18 hours a day.
Manual collection of every new reported case became impossible. In April, more than 30,000 new cases were being reported daily. The spreadsheet eventually grew so big that it broke. At nearly 44,000 rows, it just stopped loading. The Times needed a way to programmatically gather the data and a database in which to store it.
By then, the developers had joined.
A team led by Tiff Fehr, a lead developer on the Interactive News desk, wrote custom software that pulled confirmed case and death numbers from the websites of 56 states and territories every few hours and saved them to a Times database.
While some states provide new case and death numbers broken down by county, other states provide only state-level numbers. To maintain consistency with how The Times has collected data since the beginning of the pandemic, the developers have had to write programs that pull data from county health department websites as well. The Times now collects data from 228 unique websites in the United States. So far, more than 18 million data samples have been collected.
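For readers curious what this kind of collection might look like in practice, here is a minimal sketch in Python. It is not The Times's software: the table layout, the function names and the sample figures are all hypothetical, and the step that actually fetches numbers from a health department website is left out. It shows only the idea described above: each collection run is saved as a timestamped sample, and a state total is rebuilt from the most recent sample for each county.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical 'samples' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples ("
        " collected_at TEXT, state TEXT, county TEXT,"
        " cases INTEGER, deaths INTEGER)"
    )
    return conn


def store_samples(conn, collected_at, rows):
    """Save one collection run: rows are (state, county, cases, deaths)."""
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
        [(collected_at, state, county, cases, deaths)
         for (state, county, cases, deaths) in rows],
    )
    conn.commit()


def latest_state_totals(conn, state):
    """Sum the most recent sample from each county in the given state."""
    return conn.execute(
        "SELECT SUM(s.cases), SUM(s.deaths) FROM samples AS s"
        " WHERE s.state = ? AND s.collected_at = ("
        "   SELECT MAX(i.collected_at) FROM samples AS i"
        "   WHERE i.state = s.state AND i.county = s.county)",
        (state,),
    ).fetchone()


if __name__ == "__main__":
    conn = open_db()
    # Made-up figures for illustration only.
    store_samples(conn, "2020-04-01T08:00", [("Illinois", "Cook", 100, 2),
                                             ("Illinois", "Lake", 10, 0)])
    store_samples(conn, "2020-04-01T12:00", [("Illinois", "Cook", 120, 3)])
    print(latest_state_totals(conn, "Illinois"))
```

Keeping every sample, rather than overwriting the previous number, is one way to preserve a consistent record over time; whether the actual Times database works this way is not stated.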
Manual collection still plays an important role. Some states specify where clusters form in nursing homes, prisons and meatpacking plants. But many states don’t. That’s when reporters pick up the phone.
The Times is also counting cases around the world, and tracking how overall mortality has changed in 24 countries, with the hope that this data might provide a more accurate account of the virus’s toll where cases are underreported. All told, the coronavirus data collected informs over 70 different maps and articles live on the Times website. They are updated about every four hours.
In some ways, the Times newsroom has been preparing every four years for a project of this size, said Wilson Andrews, a graphics editor who is overseeing some aspects of the data collection. Presidential election results are a similar undertaking: Times data journalists — and the programs they write — must process large amounts of data from both state and local governments, then synthesize it for readers.
“It’s like that — basically reporting an election that never ends,” Mr. Andrews said.
After growth of the epidemic slowed in May, The Times counted record levels of new daily cases in June. More than 125,000 Americans had died as of Sunday. (The Times is including confirmed and probable cases in this count.)
The human toll can be hard to fathom. “All of these numbers represent people,” Mr. Andrews said. “And if I ever get a chance to step back and think about that, that’s incredibly devastating to comprehend.”
Prompted by requests from researchers, in late March, The Times publicly released the data set for anyone to use on GitHub, an online collaboration platform for developers. It has been used by public offices, economic groups and other news organizations including Kaiser Health News, the Google News Initiative and the Federal Reserve Bank of New York.
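As an illustration of how a researcher might work with a released data set like this one, the sketch below turns running totals into daily new cases. It rests on assumptions the article does not spell out: that the data arrives as CSV text with `date` and `cases` columns, and that `cases` is a cumulative count. The function name and the sample rows are hypothetical.

```python
import csv
import io


def daily_new_cases(csv_text):
    """Return (date, new_cases) pairs from cumulative counts.

    Assumes ISO-formatted dates (so sorting as text is chronological)
    and a cumulative 'cases' column.
    """
    rows = sorted(csv.DictReader(io.StringIO(csv_text)),
                  key=lambda row: row["date"])
    result = []
    previous = None
    for row in rows:
        total = int(row["cases"])
        if previous is not None:
            result.append((row["date"], total - previous))
        previous = total
    return result


if __name__ == "__main__":
    # Made-up figures for illustration only.
    sample = "date,cases\n2020-03-01,10\n2020-03-02,15\n2020-03-03,27\n"
    print(daily_new_cases(sample))
```

The first date produces no entry because there is no earlier total to compare it with.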
“There’s a hope that what we’re doing is a small piece of helping and providing at least an informational backbone to make decisions and maybe hopefully make advances against this or the next one,” Mr. Smith said. “And hopefully there’s never one.”