June 22, 2023
Africa-based data centers

COVID-19 illustrated once again the importance of localized epidemiological data, monitoring, and analysis.

By Michael Dumiak

University of Nairobi researchers analyzing a late-2019 cholera outbreak in Kenya were able to pick up details about the migratory patterns of infection using tried and tested tools: contact tracing and epidemiological modeling.


They were about to be tested by a much larger emergency, global in scope: the COVID-19 pandemic, which arrived just weeks later. Throughout the pandemic, the researchers were reminded again and again that even with the world's huge appetite and capacity for data modeling, local understanding is often essential to make sense of the numbers.

“If you look at the [COVID] models that were coming out of China and models that were coming out of Europe, they didn’t factor in our extremely youthful population and they didn’t factor that we’re quite a warm place,” says computer scientist Shikoh Gitau, chief executive of the software platform builder Qhala and a co-founder of CEMA, the Center for Epidemiological Modeling and Analysis at the University of Nairobi.

Infectious disease epidemiologist Thumbi Mwangi and infectious disease clinician Loice Ombajo at Kenyatta National Hospital are CEMA’s other co-founders. The center grew out of work on a digital tracing tool for cholera, developed just before the COVID-19 pandemic arrived. The early weeks of the pandemic then supercharged the founders’ drive to marshal data to help the government with its pandemic response.

CEMA formally launched in May 2021, though by then it had already been working for many months, and it quickly proved a valuable resource for the country’s health ministry. Its work was even cited on television by then-President Uhuru Kenyatta.

Large-scale data modeling is part of our daily lives: it is hugely influential in finance, in climate modeling and prediction, and, of course, in national health and security policy, as the COVID-19 pandemic vividly showed.

But modeling is still often a blunt instrument. The underlying data can be skewed by gender, tilted toward elites or against poorly represented groups, or, in the worst cases, plainly misleading. Local context and local understanding matter hugely in the struggle to make modeling useful.

Gitau sees this clearly in her efforts to track cholera outbreaks in Kenya. “You see there is hardly any cholera outbreak in Nairobi in December. Why? Because everybody has moved out of the city and gone to their local villages between December 12th and the new year.” Following rural-urban migration patterns over time helps clarify how cholera spreads within the country.

Informal, barely mapped urban areas or slums like Nairobi’s Kibera may not be factored into models at all unless an analyst is fully aware of them. “If you know, you can begin to connect these disease patterns, because you understand there is a cultural and social connotation, not just epidemiological data that is coming from testing or observation from far away,” she says.

And as shown during the pandemic, this can have broad policy repercussions: a country with a very youthful population may not have needed to focus all its limited health resources on COVID-19 vaccinations if doing so meant falling behind on routine child immunizations, for instance.

As with many public health efforts, more focused information is often better information. Although logistical and resource challenges remain in many places on the African continent, there is a growing network of laboratories and an even faster-growing pool of expertise in “big” data science to draw on for local insight. This is evident in more established modeling and data centers such as the South African Centre for Epidemiological Modelling and Analysis (SACEMA) at Stellenbosch University outside Cape Town, and AFENET, the African Field Epidemiology Network, a partnership started in 2005 and now active in 31 African countries.

SACEMA has supported data analysis in research on tuberculosis and HIV, as well as the modeling and analytics used for planning and budgeting during the COVID-19 pandemic. AFENET has the broader goal of strengthening public health surveillance and field epidemiology, and its work includes a strong digital component, such as its support for the post-Ebola field test of the Surveillance Outbreak Response Management and Analysis System (SORMAS) in Nigeria’s Kano State.

The Africa Centres for Disease Control and Prevention (Africa CDC) is also in the midst of a four-year collaboration with the European Union as part of its broader efforts to improve the sharing and use of data in networks across the continent. One of the programs in the €10 million (US$10.6 million) package is a pilot model for communicable disease surveillance and early warning. Last March, the Africa CDC also launched a new digital transformation strategy.

Digital analytics tools can already be applied to questions that, even in the recent past, might have seemed overambitious, and to places that were too difficult to reach effectively. Such work becomes even more feasible through collaboration with a growing number of local partner institutions and researchers.

Chiara Altare at Johns Hopkins University took this approach, working with colleagues at the humanitarian organization Action Contre la Faim in Bangui, Central African Republic, and the IMPACT CAR analysis team to sift through local COVID-19 data from the first year of the pandemic in the Central African Republic. IMPACT is a Geneva-based research group with a local presence in many conflict-affected areas. These include the Central African Republic, which has suffered for years, including throughout the pandemic, from extreme civil conflict and violent clashes among armed groups, making it difficult to learn about the direct and indirect effects of COVID-19 in the country.

With the help of household data collected by IMPACT CAR, the Altare-led project sketched a detailed picture of reduced health care use and underestimated infections in the capital, Bangui, during the first year of the pandemic.

Paul Spiegel, director of the Center for Humanitarian Health at Johns Hopkins, underscores the difference between the technical side of the borderless digital world, where massive computing, data sorting, and modeling can be done anywhere using data collected everywhere, and the collection and interpretation of that data. “The former can be done most anywhere, if the infrastructure and the expertise exist. But the collection and interpretation should be done locally,” he says. “Many decisions about the models require input and judgement calls about which data to include and various assumptions. This is best done by people as close as possible to where the study occurs.”

For this work, Spiegel looks to emerging research groups like CEMA in Nairobi and would like to see many more throughout Africa.

Continental Africa is huge and hugely complex: 1.2 billion people, 54 countries, and somewhere between 1,000 and 2,000 spoken languages, depending on how they are counted. Gathering health data digitally across the continent is still hard; in some settings, routine health data is collected on index cards and collated by hand. And as digital health data systems develop further, they will face the same privacy concerns and technical challenges as anywhere else.

But supercomputing is also coming to the continent. Two years ago, Mohammed VI Polytechnic University in Ben Guerir, Morocco, switched on a new petaflop-scale supercomputing center with an emphasis on putting its resources in the hands of African researchers, academics, and businesses. Petaflop-scale supercomputers are powerful enough to have been used at Los Alamos National Laboratory to solve classified military problems. The Ben Guerir machine, called Toubkal, is being used to model the genomes of African plants. It ranks among the world’s 200 most powerful computers. A new supercomputer set to come to Cape Town will be 30 times as powerful as Toubkal.

It is, of course, possible to book time on supercomputers anywhere in the world to run African data, but such bookings are hard to come by. As data infrastructure develops throughout Africa, small, young centers like CEMA can make sure there is a local focus to how this data is collected, analyzed, and interpreted, all while training new cadres of researchers to use it.

Michael Dumiak, based in Berlin, reports on global science, public health, and technology.
