On inflammation, machine learning and lung recovery after COVID-19

Having your work published is always a great honor and satisfaction, even when the results are not so optimistic. If you have been following my blog for a while, you may have noticed a post on the CovILD project, an observational cohort study which I've been curating as a data scientist. Quite recently, we managed to publish the follow-up quite prominently in eLife (Sonnweber et al.) and to develop a toolbox which may help us assess the risk of complicated recovery from COVID-19, at least when it comes to lung injury and functional deficits.

Lung injury following COVID-19 tends to be chronic and symptom-independent

As a reminder, at the onset of the SARS-CoV-2 pandemic in Tyrol, a cohort of 145 patients was recruited. It included individuals with mild COVID-19 managed by home isolation as well as those requiring hospitalization, including mechanical oxygen supply and intensive care. The intention of the study was to establish a long-term observational cohort to monitor long-term recovery from symptoms as well as from respiratory, neurological and mental health deficits. In the current work, we focused in particular on the participants' symptoms and lungs six months after COVID-19 and tried to identify constellations of factors linked to an incomplete recovery.

Figure 1. Percentages of individuals with any chest CT abnormalities, moderate-to-severe CT lesions (> 5 severity points) or impaired lung function.

As you may appreciate in Figure 1 above, a large proportion of the hospitalized participants still had lung abnormalities detected by CT, even though more severe lesions (graded with > 5 severity points) resolved quite significantly. Note: the number of participants who recovered from lung lesions at the 6-month follow-up was much lower than in the initial phase of COVID-19 convalescence. Although I cannot disclose hard figures now (working on the manuscript...), those patients who had some CT findings 180 days after the disease still have them at the one-year follow-up. Evidence from the SARS1 outbreak some two decades ago suggests that such lung alterations may be evident even years after the disease and go hand in hand with reduced respiratory function. Quite scarily, this is what we see in our cohort (Figure 2) and in similar recent reports. Given millions of COVID-19 hospitalizations, it is not hard to imagine the scale of the 'last wave' or 'long tail' of the pandemic and its effects on public health. You may argue that most of the affected individuals were elderly, multi-morbid and had quite limited life expectancy anyway. Well, the median age of our cohort was some 57 years, and hundreds of even younger, mostly unvaccinated patients populated Austrian hospitals during the past two years.

Figure 2. Study participants with CT abnormalities, impaired lung function and persistent symptoms 6 months after COVID-19. Numbers of participants are presented in a quasi-proportional Venn plot.

Anecdotal evidence and the image of a (long) COVID-19 convalescent presented by the media and patient advocacy groups suggest a person plagued by multiple symptoms including fatigue, sleep problems, cough and shortness of breath. Interestingly, the ongoing symptoms were not necessarily linked to reduced lung function and CT alterations (Figure 2).

Protracted inflammation and overshooting immunity accompany lung deficits

However, we could find some reliable predictors of lung injury in our study. If you don't have an immunology background, you may tend to regard anti-SARS-CoV-2 immunity (antibodies, B and T cells) as our toughest ally against the virus. In the end, this is why we get vaccinated, isn't it? However, immunity has two faces. A well-tempered one kills the pathogen, resolves when no longer needed and snoozes as memory cells until the bug appears again. An overshooting one may kill the virus together with the host. This is the reason why the majority of severe COVID-19 cases in fact need immunosuppressive therapy rather than immune-boosting drugs (a nice review here).

Figure 3. Five nearest neighbors of mild CT lung abnormalities (CT severity score 1 - 5), moderate-to-severe CT lung abnormalities (CT severity score > 5) and impaired lung function six months following COVID-19. Color and line length code for the distance between the features (simple matching distance or SMD).

Such exaggerated immunity obviously played a crucial role in the development and persistence of lung injury in our cohort. As we could observe by means of cluster analysis, abnormalities of lung function and structure were tightly associated with inflammatory markers during early convalescence (60 days post COVID-19): interleukin 6 (IL-6), C-reactive protein (CRP) and inflammatory anemia. In other words, patients who still had measurable inflammation 2 months after COVID-19 were the ones likely to have lung problems later on.
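For readers who wonder what the simple matching distance (SMD) from Figure 3 boils down to, here is a minimal sketch in plain Python. It is not the code used in the paper: the feature names and the 0/1 values below are hypothetical toy data, and the two helper functions are my own illustration. For two binary features recorded in the same participants, the SMD is simply the fraction of participants in whom the two features disagree.

```python
def simple_matching_distance(a, b):
    """Fraction of positions at which two equal-length binary vectors disagree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def nearest_neighbors(target, features, k=5):
    """Return the k (name, distance) pairs closest to `target` by SMD."""
    distances = {
        name: simple_matching_distance(features[target], values)
        for name, values in features.items()
        if name != target
    }
    return sorted(distances.items(), key=lambda item: item[1])[:k]


# Hypothetical toy data: one 0/1 entry per participant (not CovILD data)
toy_features = {
    "ct_abnormality": [1, 1, 0, 0, 1, 0, 1, 0],
    "elevated_il6":   [1, 1, 0, 0, 1, 0, 0, 0],
    "elevated_crp":   [1, 0, 0, 0, 1, 0, 1, 1],
    "cough":          [0, 1, 1, 0, 0, 1, 0, 1],
}

neighbors = nearest_neighbors("ct_abnormality", toy_features, k=2)
print(neighbors)
```

In this toy example the inflammatory markers end up closer to the CT abnormality than the symptom does, which is the kind of pattern a nearest-neighbor plot makes visible.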

AI may help us to identify COVID-19 patients at risk of incomplete lung recovery

The vision of Watson, Siri or Cortana diagnosing human diseases and deciding on the treatment is exciting and horrifying at the same time for many physicians and patients. As presented elsewhere in this blog, machine learning or artificial intelligence, even in its simple form, may greatly help us to get actionable knowledge from the flood of data and, in the long run, to combat the pandemic and its aftermath. This was also the idea behind the analyses in the paper which I personally found the most exciting. We cannot offer every severe COVID-19 survivor the full follow-up care including chest imaging, lung function testing, rehabilitation and therapy. We need an algorithm, or a bundle of them, to identify patients at risk of non-healing lung injury at an early time point after release from hospital or home isolation. Ideally, such algorithms should work with standard clinical and biochemical parameters, be cheap and high-throughput, and be available to a local GP.

Figure 4. Prediction of CT abnormalities six months after COVID-19 with non-CT clinical parameters recorded during early convalescence by machine learning. (A) Area under the curve (AUC), sensitivity and specificity of 5 machine learning algorithms and their ensemble (Ens). Performance of the algorithms in 10-fold cross-validation is presented. (B - D) The most important factors for CT abnormality prediction by Random Forest (RF), C5.0 decision tree and elastic net (glmNet) algorithms. Top 10 most influential features are shown.

In the paper, we show that such patients at risk of pulmonary problems may be identified as early as two months after the acute disease with a panel of approximately 50 easy-to-obtain clinical variables including demographic, medical history, symptom, inflammatory and immunity readouts. We accomplished this with 5 technically unrelated algorithms and their ensemble, and verified the results by cross-validation, i.e. repeated random splitting of the data set into a portion used for learning and a portion used for validating the results (Figure 4A). This way, we can expect that in an external patient collective approximately 75% (sensitivity) of the chest CT findings can be identified at the 2-month control visit just by feeding the results of a standard GP visit into an application: the details below!
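To make the cross-validation idea a bit more tangible, here is a small, self-contained sketch in plain Python. It is deliberately naive and is not the pipeline from the paper: the "model" is a hypothetical one-marker threshold rule, the marker values and labels are made-up toy data, and all function names are my own. What it does show is the general recipe: split the data into k folds, learn on k - 1 of them, predict the held-out fold, and compute sensitivity and specificity from the pooled out-of-fold predictions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(values, labels):
    """Toy 'model': midpoint between the class means of a single marker."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(values, labels, k=5, seed=0):
    """Pool out-of-fold predictions and return sensitivity and specificity."""
    tp = fn = tn = fp = 0
    for test_idx in k_fold_indices(len(values), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(values)) if i not in held_out]
        # Learn only from the training portion
        threshold = fit_threshold(
            [values[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        # Validate only on the held-out portion
        for i in test_idx:
            predicted = 1 if values[i] > threshold else 0
            if labels[i] == 1:
                tp, fn = tp + predicted, fn + (1 - predicted)
            else:
                fp, tn = fp + predicted, tn + (1 - predicted)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "n_tested": tp + fn + tn + fp,
    }


# Hypothetical toy data: an inflammatory marker and CT abnormality (0/1)
marker_values = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.6, 2.4, 3.1,
                 0.8, 1.5, 2.2, 2.8, 3.3, 3.9, 4.4, 5.0, 5.6, 6.1]
ct_abnormal = [0] * 10 + [1] * 10

cv_result = cross_validate(marker_values, ct_abnormal, k=5)
print(cv_result)
```

The important point is that every participant is predicted exactly once, by a model that has never seen that participant during learning. This is why cross-validated sensitivity is a more honest estimate of performance in new patients than the sensitivity computed on the training data.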

I was also quite excited to see what the AI regarded as important factors to tell the patients at risk from those with expected complete lung recovery. As shown in Figure 4B - D, this was not necessarily the ICU stay, oxygen therapy or age, as you may expect. Instead, anti-Spike antibodies (anti-S1/S2, Q1: lowest quartile, Q4: top quartile), inflammatory markers (IL-6 and CRP), a blood vessel damage readout (D-dimer), male sex and specific co-morbidities (asthma, pulmonary diseases (PD), cancer and chronic kidney disease (CKD)) were the key decisive variables for three unrelated machine learning algorithms. In particular, the inflammatory parameters and the anti-SARS-CoV-2 antibody levels suggest a strong interplay between uncontrolled immunity and post-COVID-19 lung injury.
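A side note on the Q1 to Q4 labels: stratifying a continuous readout such as an antibody titer by quartiles is easy to reproduce. The sketch below uses only the Python standard library; the titer values are hypothetical, and the cut-off method (the default of `statistics.quantiles`) is my assumption and not necessarily the one used in the paper.

```python
from bisect import bisect_left
from statistics import quantiles

# Hypothetical anti-S1/S2 titers, arbitrary units (not CovILD data)
titers = [40, 10, 80, 30, 20, 70, 50, 60]

# Three cut points that split the sample into four equally sized groups
cut_points = quantiles(titers, n=4)

# Q1 = lowest quartile, Q4 = top quartile
quartile_labels = [f"Q{bisect_left(cut_points, t) + 1}" for t in titers]

for titer, label in zip(titers, quartile_labels):
    print(titer, label)
```

Turning a titer into quartile labels like this lets an algorithm treat 'very low' and 'very high' antibody levels as separate yes/no features, which is how they show up in the importance plots.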

Our toolbox is for everybody: let's collaborate!

Finally, and this was actually a reviewer's idea, we decided to publish our tool set along with the article. We created a simple open-source on-line risk assessment tool which allows you to copy-paste the data from medical records into an Excel file, upload it and launch the analysis without fiddling about with parameters. The AI machinery trained with the CovILD cohort data will take care of the rest.

Figure 5. CovILD tools: a free on-line Shiny App to classify COVID-19 patients and assess the pulmonary deficit risk.

You may also run it locally: the source code is available for everybody at https://github.com/PiotrTymoszuk/COVILD-recovery-assessment-app. An R package is in development for those who intend to integrate the tools into their analysis pipelines. Updates will be posted on the welcome page of the app.

The idea behind the online tool is to validate and refine our findings in an external collective. At the moment, our knowledge on lung recovery after COVID-19 is, so to say, atomized in multiple small cohorts recruited at different stages of the pandemic. We, meaning the long COVID research community, definitely need more collaboration to get the bigger picture! If interested, simply contact me or the CovILD study team.

