
Study: Using Social Media Data to Forecast HIV and STI Rates

Monitoring new cases of HIV and STIs across the country has traditionally relied upon time-lagged government surveillance statistics that take years to compile. Big data from social media, however, may soon change the way researchers track – and even anticipate – regional patterns in these infections.

A study recently published in AIDS and Behavior found that Twitter data contain clues that can accurately anticipate regional trends in HIV and STI transmission rates. The study looked at more than three billion de-identified tweets collected from across the U.S. between 2009 and 2013. Researchers classified the tweets by county of origin and developed word clouds for each county. Certain words appeared more frequently in tweets from counties with high infection rates.

The words “mafia” and “gospel,” for example, were significantly more frequent in tweets posted from counties with a high rate of gonorrhea infection. By contrast, “gusting,” “mist,” and “southeast” were among the words significantly more prevalent in tweets from counties with a high rate of HIV transmission. Although none of these words is directly related to HIV or STIs, the study's authors explained that some words may indirectly reflect a community's social conditions, which in turn predict its health.

Twitter-based models will not replace existing surveillance systems any time soon, but they provide real-time signals that could help public health officials allocate scarce resources more efficiently to the regions with the greatest need. Overall, the study is a “proof of concept” for an indirect way of tracking infections based on online information, says Sally Chan, a psychologist at the University of Illinois and the study’s senior author.

“Without a doubt, the availability of information about the HIV/STI prevalence in a region is important for prevention and treatment efforts,” explained Chan. “When that information is not available, estimating it can help to fill a critical void.”

Delays in collecting and disseminating HIV and STI transmission data lead to missed opportunities for timely action by public health officials. When more direct measures of new diagnoses are unavailable, researchers and public health practitioners should be able to use estimates based on social media data, Chan added.

Scientists are increasingly using big data, particularly from social media, to understand public health problems like heart disease and the flu. A study published in 2015 found that counties where social media posts expressed more positive emotions had lower rates of heart disease mortality. Another article from that year described a statistical link between future-directed tweets and lower HIV prevalence in a region. Although social media “signals” may not explicitly mention disease, they can still point to increased risk in certain areas.

Outside of public health research, psychologists have spent years mining social media data for clues to personality traits. This area of research is now receiving increased scrutiny after reports that Cambridge Analytica used such data in an effort to influence the 2016 U.S. presidential election. When combined with precise geolocation data, aggregated social media posts can provide a powerful tool for understanding population-wide trends in real time.

The ever-present possibility of funding cuts to HIV and STI prevention services is driving public health officials to test less expensive methods for tracking infection rates. In the future, agencies could use social media models to understand and anticipate a region's HIV and STI trends at lower cost and with less delay.