Abstract

One of the critical challenges in brain disorder research is the early identification of individuals with mild cognitive impairment (MCI) who may later develop Alzheimer’s disease (AD) or other dementias. MCI in older adults is frequently associated with diminished social engagement and observable behavioral changes. With the rapid expansion of social media platforms and advances in machine learning (ML), digital traces of online activity offer a novel and scalable avenue for mental health screening. In particular, predictive ML approaches have shown strong potential for building reliable classifiers for the early detection of neurocognitive decline. This study introduces a framework that uses Facebook-derived digital biomarkers to predict cognitive impairment in adults aged 50 and above who completed the Montreal Cognitive Assessment (MoCA). Using natural language processing (NLP) techniques, including the text analysis application Linguistic Inquiry and Word Count (LIWC 2022) and the G-Emotion datasets, 170 psychological and linguistic features were extracted to train five supervised ML models: Random Forest, Gradient Boosting, Logistic Regression, Support Vector Machine (SVM), and XGBoost. The models were evaluated on 26,756 published Facebook posts from 247 participants drawn from a large community-based cohort of the Egyptian population. Prediction accuracy ranged from 69% to 78%, with Gradient Boosting, Logistic Regression, and XGBoost achieving the highest accuracy (78%). Notably, Random Forest and XGBoost achieved a sensitivity of 0.86, underscoring their ability to capture early-stage MCI cases. The results provide empirical evidence that social media-derived linguistic markers, when integrated into ML frameworks, can serve as cost-efficient, non-invasive, and scalable tools for detecting early cognitive decline.
This work contributes to precision public health by demonstrating how routinely available digital behavioral data can complement traditional neuropsychological testing, reduce system-level diagnostic barriers, and potentially improve long-term outcomes for individuals at risk of dementia.
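
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and sensitivity (recall for the MCI class) are computed from true and predicted labels. It is not the thesis code: the function name `accuracy_and_sensitivity` and the toy labels are hypothetical and were invented purely for illustration.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for binary labels.

    Sensitivity is TP / (TP + FN): the share of truly positive
    cases (here, MCI) that the classifier flags as positive.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    positives = true_pos + false_neg
    # Sensitivity is undefined when there are no positive cases.
    sensitivity = true_pos / positives if positives else float("nan")
    return accuracy, sensitivity


# Toy labels, invented for illustration only (1 = MCI, 0 = no impairment).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

In a screening context, sensitivity is often weighted more heavily than overall accuracy, because a missed MCI case (a false negative) is typically costlier than a false alarm that is resolved by follow-up neuropsychological testing.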

Keywords: Mild Cognitive Impairment; Machine Learning; NLP; Digital Biomarkers; Dementia; Public Health; Health System

School

School of Sciences and Engineering

Department

Institute of Global Health & Human Ecology

Degree Name

MA in Global Public Health

Graduation Date

Fall 2-15-2026

Submission Date

1-25-2026

First Advisor

Mohamed Salama

Second Advisor

Seif Eldawlatly

Committee Member 1

May Bakr

Committee Member 2

Sara Elfarrash

Committee Member 3

Maya Nicolas

Extent

121 p.

Document Type

Master's Thesis

Institutional Review Board (IRB) Approval

Approval has been obtained for this item

Disclosure of AI Use

Thesis text drafting; Thesis editing and/or reviewing; Data/results visualization; Other

Other use of AI

Text improvement and reference review
