DETECTION OF EPILEPTIC SPASMS USING FOUNDATIONAL AI AND SMARTPHONE VIDEOS
The journey to diagnosing rare neurological disorders often presents significant hurdles, leading to delays that can severely impact patient outcomes. This is particularly true for Infantile Epileptic Spasm Syndrome (IESS), a debilitating condition affecting infants. While the widespread availability of smartphones has offered a promising avenue for capturing crucial diagnostic information, the sheer volume of video data and the limited access to specialized medical professionals for review have hindered its full potential. However, groundbreaking research is now bridging this gap by leveraging the power of foundational artificial intelligence (AI) and the vast repositories of publicly available social media videos to accelerate the detection of epileptic spasms (ES).
THE DIAGNOSTIC DILEMMA: UNDERSTANDING INFANTILE EPILEPTIC SPASMS
Infantile Epileptic Spasm Syndrome (IESS) stands as a severe developmental and epileptic encephalopathy, typically emerging within an infant’s first year of life and affecting approximately 1 in 2000–2500 infants. While many paroxysmal movement events in infants are benign, a significant percentage can indicate serious underlying conditions. The hallmark seizures of IESS, known as epileptic spasms (ES), are characterized by stereotypical movements. Despite their distinct nature, timely diagnosis of IESS is frequently delayed, often by weeks or even months. This delay stems from two main factors:
* Misidentification of Symptoms: Parents and even physicians may mistake ES for benign physiological occurrences such as startle reflexes, colic, or normal infant movements, and fail to recognize the events as abnormal.
* Lack of Awareness: Insufficient recognition of the subtle signs of IESS contributes to delayed medical consultation.
These diagnostic delays carry severe long-term consequences, including poor cognitive outcomes, inadequate seizure control, increased disability, and higher healthcare costs. For conditions like IESS, where early intervention can significantly alter the developmental trajectory, such delays are particularly detrimental.
In recent years, the ubiquity of smartphones has provided a powerful tool for parents to capture their children’s suspicious movements. Videos recorded on smartphones have demonstrated their utility in enhancing diagnostic accuracy and informing clinical decision-making, simultaneously reducing patient and family stress. For children with IESS, these videos can facilitate earlier arrival at clinics, faster diagnostic electroencephalograms (EEGs), and improved treatment responses when reviewed by experts. In some instances, smartphone videos have even been found to be non-inferior to gold-standard video-EEG monitoring for initial diagnosis, offering a distinct advantage, especially in resource-limited settings.
However, a significant bottleneck remains: the scarcity of medical professionals, particularly neurologists specializing in pediatric epilepsy, who are available for the timely review and evaluation of these patient videos. This limitation restricts the broader applicability of smartphone video-based diagnosis, leaving many infants undiagnosed for extended periods.
HARNESSING THE POWER OF FOUNDATIONAL AI AND SOCIAL MEDIA
To address this critical clinical need, recent research has explored a novel approach: combining the immense capabilities of powerful foundational vision models with the wealth of publicly available video data on social media platforms. This innovative strategy directly tackles the dual challenges of timely diagnosis for rare conditions and the scarcity of large, labeled datasets traditionally required for robust AI model training.
FOUNDATIONAL VISION MODELS
At the core of this advancement are foundational vision models, particularly those based on transformer architectures. These AI models are pre-trained on vast, diverse datasets of images and videos sourced from the internet, enabling them to understand and interpret a wide range of visual information. For this study, the Hiera Vision Transformer, pre-trained on the Kinetics 400 Human Action Recognition dataset, was selected.
Why are these models so valuable for medical applications, especially in the context of video-based seizure detection?
* Robust Human Activity Recognition: Their extensive pre-training allows them to be highly robust in recognizing complex human activities, which is directly applicable to identifying subtle seizure semiologies.
* Adaptability to Real-World Conditions: They are inherently designed to handle variability in video quality, recording equipment, and patient demographics—factors commonly encountered in smartphone recordings outside of controlled clinical environments.
* Superior Feature Capture: Unlike traditional methods that might rely on skeleton landmark identification or simple frame-based analyses, vision transformers excel at capturing long-range dependencies within video sequences. This is crucial for recognizing the often subtle and repetitive motion patterns characteristic of epileptic spasms. They also demonstrate enhanced robustness to variations in camera angles and lighting conditions, further improving their practical utility.
By fine-tuning these generalist foundational models, researchers can significantly reduce the amount of highly specialized, labeled medical data needed for effective model training, making AI development feasible even for rare disorders like IESS.
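To make the parameter-efficiency argument concrete, here is a minimal sketch (dimensions are illustrative, not taken from the study) comparing the trainable parameters of a full fine-tune of one weight matrix against a LoRA-style low-rank update W + B·A:

```python
# Hedged sketch: why LoRA-style fine-tuning needs so few trainable
# parameters. We freeze a weight matrix W (d_out x d_in) and learn only
# a low-rank update B @ A, with A: (rank x d_in) and B: (d_out x rank).

def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters when W itself is updated."""
    return d_out * d_in

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters when only the low-rank factors are updated."""
    return rank * d_in + d_out * rank

# Illustrative dimensions for one projection in a ViT-style block
# (hypothetical; the study does not report the adapted layer sizes).
d, r = 768, 8
print(full_param_count(d, d))     # 589824 parameters
print(lora_param_count(d, d, r))  # 12288 parameters, about 2% of the above
```

With fewer than ~2% of the weights left trainable, far less labeled medical data is needed to adapt the pre-trained model without catastrophic overfitting.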
SOCIAL MEDIA AS A DATA SOURCE
One of the most formidable challenges in developing accurate AI models for rare conditions is the scarcity of large, diverse, and labeled datasets. Clinical data collection is often slow, expensive, and subject to strict regulatory constraints regarding patient privacy. This study pioneered a solution by leveraging social media platforms as an “untapped resource.”
* Addressing Data Scarcity: Social media platforms like YouTube have inadvertently become valuable repositories where users upload videos clearly demonstrating the semiology of stereotypical seizures, including epileptic spasms. This allowed researchers to curate a large, clinically relevant dataset of a rare neurological condition quickly and efficiently. For context, while a single tertiary medical center might see around 24 IESS cases per year, this approach enabled the identification of 167 infants with over 1000 ES, a dataset that would take many years to collect through traditional means.
* Diverse and Heterogeneous Cohort: Videos derived from social media exhibit considerable technical heterogeneity (resolution, bitrate, brightness, sharpness) and semiological diversity (flexor, extensor, mixed, subtle spasms). This inherent variability helps in training AI models that are more robust and generalizable, reducing the need for constant model retraining and boosting clinical applicability across different populations and real-world scenarios.
* Ethical Considerations: Researchers strictly adhered to ethical principles by using only publicly accessible content, collecting no personally identifiable information beyond the videos themselves, and avoiding the publication of source links or screenshots to protect privacy.
This innovative approach to data sourcing, combined with the power of foundational AI, paves the way for rapid and robust development of diagnostic tools for conditions that have historically suffered from data limitations.
THE STUDY: METHODOLOGY AND REMARKABLE RESULTS
The study involved a meticulously designed process, from data collection and annotation to rigorous model training and external validation, ensuring the reliability and generalizability of the AI model.
DATA COLLECTION AND ANNOTATION
For the *derivation dataset*, a systematic search was conducted on YouTube for videos published before 2022, using keywords like “infantile spasms,” “epileptic spasms,” and “West syndrome.” Videos were included if the subject appeared to be under 2 years old, displayed events consistent with ES semiology (confirmed independently by two expert neurologists), and was clearly visible with sufficient video quality. Videos with obstructed subjects or insufficient quality were excluded. Non-overlapping 5-second video segments were manually annotated as either containing ES or being non-seizure segments.
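The segmentation step described above is straightforward to reproduce. A minimal sketch (the study's exact handling of a trailing partial segment is an assumption here) computes non-overlapping 5-second windows from a clip's duration:

```python
# Hedged sketch: carve a clip into non-overlapping 5-second segments for
# annotation. Dropping a trailing remainder shorter than the segment
# length is an assumption, not confirmed by the study.

def five_second_segments(duration_s: float, seg_len: float = 5.0):
    """Return (start, end) times of consecutive non-overlapping windows."""
    n = int(duration_s // seg_len)
    return [(i * seg_len, (i + 1) * seg_len) for i in range(n)]

print(five_second_segments(23.0))
# [(0.0, 5.0), (5.0, 10.0), (10.0, 15.0), (15.0, 20.0)]
```

Each resulting window is then labeled by the annotators as an ES segment or a non-seizure segment.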
To enhance the training dataset and improve the model’s specificity, additional videos of normally behaving infants were incorporated from previously collected YouTube datasets. These control videos included a diverse range of movements to expose the AI to typical infant activity.
Ethical approval was obtained, and for social media videos, fair use doctrine was followed, using only publicly accessible content without publishing source links or personally identifiable information.
MODEL TRAINING AND VALIDATION
The AI model was trained using the Hiera Vision Transformer, which was fine-tuned using Low-Rank Adaptation (LoRA), a parameter-efficient technique. The final classification layer was adapted for binary classification (seizure vs. non-seizure) using a sigmoid function. A five-fold cross-validation approach was implemented on the derivation dataset, ensuring that segments from the same child were kept within the same fold to prevent data leakage and ensure unbiased evaluation. Additional data from 127 healthy infants were specifically included in the training set to boost the model’s specificity.
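The leakage guard described here, keeping all of a child's segments in a single fold, can be sketched with a simple group-aware split (a round-robin assignment over children stands in for whatever grouping scheme the study actually used):

```python
# Hedged sketch: subject-wise cross-validation folds. Every segment of a
# given child lands in exactly one fold, so the same infant never appears
# in both the training and test split of any fold.
from collections import defaultdict

def group_folds(child_ids, n_folds=5):
    """Assign segment indices to folds by child, round-robin over children."""
    by_child = defaultdict(list)
    for idx, cid in enumerate(child_ids):
        by_child[cid].append(idx)
    folds = [[] for _ in range(n_folds)]
    for k, cid in enumerate(sorted(by_child)):
        folds[k % n_folds].extend(by_child[cid])
    return folds

# Three children, five segments, two folds: no child is split across folds.
print(group_folds(["a", "a", "b", "b", "c"], n_folds=2))
# [[0, 1, 4], [2, 3]]
```

Splitting at the segment level instead would let near-identical clips of the same infant appear on both sides of a fold and inflate the apparent performance.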
For comprehensive out-of-sample evaluation, the final model, trained on the entire derivation dataset, was tested on three independent external datasets:
* Dataset 1: Smartphone videos of infants with ES, collected from YouTube and TikTok after 2022.
* Dataset 2: Additional smartphone videos of normally behaving infants, primarily from TikTok, specifically to assess the False Alarm Rate (FAR).
* Dataset 3: Gold-standard long-term video-EEG monitoring recordings from infants under 2 years old at a hospital, with seizure activity confirmed or ruled out by video-EEG. These videos were automatically cropped around the infant to focus analysis.
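The automatic cropping in Dataset 3 relies on an object detector locating the infant in the frame. The study's exact procedure is not reproduced here, but a generic sketch expands a detector's bounding box by a margin and clamps it to the frame bounds:

```python
# Hedged sketch: expand a detector bounding box (x, y, w, h) by a
# relative margin and clamp it to the frame, so the crop keeps some
# context around the infant. The 20% margin is an arbitrary choice,
# not a value reported by the study.

def crop_with_margin(frame_w, frame_h, box, margin=0.2):
    """Return the clamped crop rectangle (x, y, w, h)."""
    x, y, w, h = box
    dx, dy = w * margin, h * margin
    x0, y0 = max(0.0, x - dx), max(0.0, y - dy)
    x1 = min(float(frame_w), x + w + dx)
    y1 = min(float(frame_h), y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)

print(crop_with_margin(1280, 720, (100, 100, 200, 200)))
# (60.0, 60.0, 280.0, 280.0)
```

Focusing the analysis window on the child in this way reduces the influence of hospital-room clutter such as cribs, cables, and family members.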
PERFORMANCE METRICS AND KEY FINDINGS
The model’s performance was rigorously assessed using standard metrics: Area Under the Receiver-Operating-Characteristic Curve (AUC), sensitivity, specificity, accuracy, and False Alarm Rate (FAR).
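Apart from the threshold-free AUC, these metrics follow directly from confusion-matrix counts. A small self-contained sketch (plain Python, making no claim about the study's actual evaluation code) computes them from segment-level labels and predicted probabilities:

```python
# Hedged sketch: threshold predicted probabilities and derive the
# reported metrics from confusion-matrix counts. FAR is taken here as
# false positives over all non-seizure segments (i.e., 1 - specificity);
# the study's exact FAR definition may differ.

def binary_metrics(y_true, y_prob, threshold=0.5):
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),   # fraction of ES segments caught
        "specificity": tn / (tn + fp),   # fraction of non-seizure segments cleared
        "accuracy": (tp + tn) / len(y_true),
        "far": fp / (fp + tn),           # false alarms on non-seizure data
    }

m = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.2, 0.1, 0.6, 0.3])
print(m)  # sensitivity 2/3, specificity 2/3, accuracy 2/3, far 1/3
```

The 0.5 threshold mirrors the operating point reported for the derivation dataset below; shifting it trades sensitivity against the false alarm rate.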
The results were impressive, showcasing the model’s high accuracy and robustness:
* Derivation Dataset (Smartphone Videos): The model detected ES with an AUC of 0.96 (95% CI: 0.94–0.98). At a threshold of 0.5, it achieved 82% sensitivity (95% CI: 78–87%), 90% specificity (95% CI: 86–94%), and 85% accuracy (95% CI: 82–88%). Importantly, no statistically significant relationships were found between the technical characteristics of the videos (resolution, brightness, sharpness) and prediction performance, highlighting the model’s adaptability to varied input quality.
* External Validation (Smartphone-Based – Dataset 1 & 2):
* On Dataset 1 (26 infants with ES), the model maintained high performance: AUC 0.98 (95% CI: 0.94–1.0), sensitivity 89% (95% CI: 82–95%), and a remarkable 100% specificity (95% CI: 100–100%).
* On Dataset 2 (67 normally behaving infants), false detections occurred in only 0.75% (5/666) of evaluated video segments. Crucially, 62 out of 67 subjects had no false alarms, resulting in a low mean FAR per patient of 1.6%. Analysis of the few false positives indicated sudden bilateral arm extensions or camera movement artifacts as contributing factors.
* The model’s consistent performance across different social media platforms (YouTube and TikTok) and varied technical characteristics further underscored its robust generalizability.
* External Validation (Gold-Standard Video-EEG – Dataset 3):
* When applied to hospital-based video-EEG data, the model demonstrated robust transferability. For a child with ES, it achieved an AUC of 0.98, sensitivity of 80%, and specificity of 99%, closely matching smartphone-based performance.
* However, for 21 infants without seizures (10,860 segments), the total FAR increased to 3.4%. This higher FAR was systematically investigated and attributed to several factors inherent to the hospital setting:
* Lower Video Resolution: Only 2% of video segments had a resolution above 720p, and 21% were below 480p, significantly lower than smartphone videos.
* Night-Time Footage: Unlike the other datasets, Dataset 3 included night-time recordings. Excluding these reduced the FAR to 2.8%.
* Obstructions and Reduced Visibility: Long-term monitoring in hospital rooms often involves EEG caps, bed cribs, and family member interference. Object detection analysis confirmed that false positive videos had significantly lower confidence for infant detection, indicating reduced child visibility.
* Lower Image Sharpness and Bitrate: False positive videos showed significantly lower image sharpness and bitrate, suggesting that video quality profoundly impacts FAR.
* Lower Motion Intensity: This was particularly relevant for night-time videos, which are characterized by a paucity of movement.
These findings collectively suggest that while the model generalizes well to different camera sources, optimizing video quality and ensuring clear child visibility are essential for minimizing false alarms in diverse clinical settings.
IMPLICATIONS AND THE ROAD AHEAD FOR DIAGNOSTIC AI
This study represents a significant leap forward in addressing the diagnostic challenges of rare neurological disorders, demonstrating a novel approach for AI model development. By leveraging foundational AI and openly available social media data, the research team has created a high-performing video-AI model capable of identifying the subtle semiology of epileptic spasms from readily available smartphone videos.
ADVANCING SEIZURE DETECTION
This work notably contributes to the burgeoning field of video analysis for seizures in epilepsy. Unlike many previous studies that relied on specialized in-hospital cameras or hardware, this model was trained on diverse smartphone data, making it directly applicable to in-field settings. The study also draws on one of the largest datasets assembled for automated video analysis of seizures, with a particular focus on ES, a subtle seizure type that is often missed. The use of the vision transformer architecture is a novel contribution in this domain, offering superior capabilities over traditional methods in capturing complex motion patterns.
POTENTIAL CLINICAL IMPLEMENTATION
The robust performance and low False Alarm Rates (particularly on smartphone data) pave the way for practical clinical integration:
* For Parents and Caregivers: A smartphone application implementing this model could allow parents to record suspicious movements for rapid automated assessment. This immediate feedback could potentially expedite specialist referrals when warranted, significantly shortening the time to diagnosis.
* For Healthcare Providers: The model could be deployed through compliant cloud services or as locally installed software, seamlessly integrating with electronic health records. This would serve as a powerful decision support tool during initial evaluations, helping primary care physicians identify seizures that might otherwise be missed. Neurologists could also utilize the technology to objectively quantify treatment response in already diagnosed patients.
TRANSFORMATIVE BENEFITS
The potential benefits of such technology are multifaceted:
* Improved Patient Outcomes: For infants with IESS, earlier diagnosis and prompt treatment initiation have been consistently associated with improved seizure control, better cognitive development, and reduced long-term disability.
* Streamlined Healthcare Systems: Accelerated diagnosis could lead to a reduction in unnecessary specialist visits, decreased emergency room utilization, and lower long-term care costs associated with managing developmental delays.
It is crucial to emphasize, however, that while this AI tool can significantly accelerate the diagnostic pathway, video analysis alone is not sufficient for definitive ES diagnosis. Non-epileptic paroxysmal events or seizure mimics (e.g., benign sleep myoclonus, gastroesophageal reflux) cannot be reliably differentiated by visual assessment alone. Therefore, this AI approach should be viewed as a powerful screening and decision-support tool to expedite referral to gold-standard video-EEG diagnostic evaluation and appropriate medical treatment, rather than a standalone diagnostic replacement.
LIMITATIONS AND FUTURE DIRECTIONS
Like any pioneering research, this study has inherent limitations that point towards future avenues for development:
* Lack of Gold-Standard EEG in Derivation Data: The initial social media derivation dataset lacked electrophysiological confirmation of ES. This was mitigated by rigorous expert neurological review and, crucially, by validating the model on external datasets that included gold-standard video-EEG confirmed cases.
* Potential Biases: Due to the nature of social media data, detailed demographic or clinical information was unavailable, meaning potential biases related to age, sex, or ethnicity could not be fully excluded. Additionally, the number of video segments per participant varied, and balancing seizure and interictal segments at a 1:1 ratio was not always possible.
* Selection Bias: The study relied on videos voluntarily uploaded to social media, introducing a potential selection bias. Future prospective studies enrolling a broader, more representative patient population are needed to further establish generalizability.
* Scope of Seizure Types: While ES are primary for IESS, infants may have other seizure types. Expanding the model’s capabilities to detect additional seizure semiologies and thoroughly evaluating its performance against various ES mimics will be crucial for broader clinical applicability.
Future research should focus on:
* Prospective Validation: Conducting large-scale prospective studies in diverse clinical settings to quantify improvements in time-to-diagnosis, treatment initiation speed, and long-term patient outcomes.
* Expanded Seizure Coverage: Training the model on a wider range of seizure types to enhance its utility for generalized epilepsy diagnosis.
* Improved Generalizability: Incorporating video data from multiple camera sources (smartphones, hospital cameras, home monitoring systems) into training to further enhance transferability and robustness across different recording environments.
* Mitigating False Alarms: Further research into factors like low light conditions, camera movement, and non-target subject visibility to refine the model’s performance in challenging real-world scenarios.
This groundbreaking work lays a strong foundation for the future of AI in neurological diagnostics. By demonstrating the feasibility and high performance of AI-powered ES detection from accessible smartphone videos, this research offers a tangible path to accelerating diagnosis and improving the lives of infants affected by IESS, and potentially, other rare neurological disorders.