
Research environment
The study was conducted at the Level IV perinatal center at University Hospital Hamburg-Eppendorf, Germany. AI-based emotion detection technology was applied to analyse the responses of term neonates undergoing routine capillary blood sampling for metabolic screening. The study protocol (PV7384) received ethical approval from the Ethics Committee of the Hamburg Medical Association. The research was performed in accordance with the Declaration of Helsinki and relevant guidelines. Informed consent was obtained from all participants (healthcare providers) and from the legal guardians of each neonate filmed during the routine capillary blood sampling procedures.
Study framework
This research constitutes a secondary analysis of data initially gathered during an eye-tracking study designed to assess neonatal pain evaluations by healthcare professionals. The eye-tracking data were gathered using a Tobii Pro Fusion eye tracker (250 Hz, Tobii Technology, Sweden) coupled with an LG Electronics monitor (24BK550Y, 1920 × 1080). Parental consent was obtained to film the newborns under three conditions: (1) contact with a disinfectant-soaked swab (a non-noxious thermal stimulus), (2) a heel prick using an automatic lancet (SafetyLancet Neonatal, Sarstedt, Nümbrecht, Germany) (a short, noxious stimulus), and (3) pressure applied to facilitate blood flow during capillary sampling (a prolonged noxious stimulus). Filming took place in a quiet room. The infants were positioned supine, and both video and audio were recorded; the camera (GoPro HERO 6, San Mateo, CA, USA) and microphone (iGoMic Mini Shotgun, MicW, China) were mounted on a tripod (DigiCharge Octopus 10″ Tripod, Digital Accessories Ltd, Derbyshire, UK) above the infant's torso level. As per local guidelines for nonpharmacological pain management in neonates, a 0.1 mL/kg dose of 30% glucose solution (DOLCEO, Chiesi GmbH, Hamburg) was administered one minute before the procedures; no rescue doses were used.
Video analysis and pain assessment
The initial study involved presenting participants (healthcare providers) with 42 ten-second video sequences of 14 infants, each filmed in three distinct conditions: (1) a non-noxious thermal stimulus, (2) a short, noxious stimulus, and (3) a prolonged noxious stimulus. All infants were filmed on their second day of life. The filming followed a precise timeline according to the routine blood sampling procedure (Fig. 1A). The sequence began with the infant in a calm state, followed by the administration of glucose as described. Using the camera fixed above the infant, we filmed the patient undisturbed. The recording continued with the application of the disinfectant (thermal stimulus), the heel prick, and finally the pressure on the heel to draw blood. Detailed information about the non-noxious thermal stimulus is provided in the supplementary material (Fig. S1). The procedures were carried out calmly with adequate support from the medical staff. The events did not overlap, with a break of at least 30 s between consecutive stimuli. Exact information about the duration of the videos and the pauses between conditions can be found in the supplementary material (Fig. S2). The videos were edited in post-processing to distinguish the different procedures and to fit the 10-s duration; Fig. S2 clarifies how the stimulus was included within the selected 10 s. Seven infants were presented in a full-body view, while the other seven were shown in a face-only view (see Supplementary Material). Post-processing cutting of the videos was performed by AL using iMovie (Version 10.3.8, Apple, CA, USA) (see supplementary material for detailed information regarding the post-processing). The videos of the infants were then presented in random order, using Tobii Pro Lab (version 1.194, released June 7, 2022), to a representative sample of healthcare providers (participants) from the NICU.
Participants were partly blinded to the context: videos presented in full-body view revealed the stimulus, whereas videos presented in face-only view kept participants blinded to it. After viewing each video, the 46 healthcare providers rated the infant's perceived pain on a visual analogue scale (VAS) ranging from 1 (no pain) to 6 (maximum pain), as depicted in Fig. S1. The VAS was chosen to align with the eye-tracking paradigm, enabling healthcare professionals to assess the infants' pain using the same technology (Fig. 1B).

Representation of the procedure in the study. Panel (A) shows the filming sequence. Panel (B) shows the eye-tracking experiment in which the participants (healthcare providers) evaluated the videos; the red bar indicates the eye tracker.
FaceReader analysis
For the purposes of the current research, the videos underwent re-analysis with the AI-based FaceReader software (Version 9, Noldus, Amsterdam, Netherlands), using the Baby Face model. FaceReader is a facial analysis software capable of detecting facial expressions. The software, previously trained on adults (General Face model) and on infants up to six months old (Baby Face model), had not been used on newborns prior to this study. The process involves three key steps: face finding, face modelling, and face classification. Face finding employs a deep learning algorithm to identify potential facial regions in an image at various scales. Face modelling uses a deep-neural-network-based technique to create an artificial face model that locates 468 key points on the face, providing a comprehensive set of facial landmarks18. For face classification, neural networks analyse the image pixels to categorize facial expressions, drawing on a database of manually coded images14,17,18. FaceReader also provides detailed information on individual facial action units as per the Facial Action Coding System (FACS) developed by Ekman and colleagues18,19 and its derivative Baby FACS, as well as data on the activation level of each facial expression. In adults, FaceReader detects the following emotions: happy, sad, angry, surprised, scared, disgusted, and neutral. It also allows customized groups of action units to be defined, offering considerable flexibility. FaceReader further provides the aforementioned information on a continuum, where valence indicates the emotional polarity of the expression and arousal its intensity. In this specific case, using the Baby Cry module of FaceReader 9, valence was represented on a scale from 0 to 1, where higher values corresponded to a higher level of pain and scores close to 0 to a more neutral facial expression.
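The three-stage pipeline described above can be sketched schematically. FaceReader itself is closed-source, so the functions below are hypothetical placeholders standing in for its deep-learning components; only the overall structure (face finding, landmark-based face modelling with 468 key points, and classification into expression scores) mirrors the description in the text.

```python
import random

# Illustrative sketch of a FaceReader-style pipeline. All functions are
# trivial stand-ins for the proprietary deep networks; they reproduce the
# *shape* of the pipeline, not its actual behaviour.

def find_face(frame):
    """Stage 1, face finding: return one candidate bounding box (x, y, w, h).
    A real detector would scan the image at multiple scales."""
    h, w = len(frame), len(frame[0])
    return (w // 4, h // 4, w // 2, h // 2)

def model_face(frame, box):
    """Stage 2, face modelling: place 468 landmark points inside the box,
    mimicking the comprehensive landmark set the software produces."""
    x, y, w, h = box
    return [(x + random.random() * w, y + random.random() * h)
            for _ in range(468)]

def classify_face(landmarks):
    """Stage 3, face classification: map landmarks to continuous scores.
    Here a dummy rule replaces the neural-network classifier; valence is
    on a 0-1 scale, with higher values indicating more pain (Baby Cry
    module convention described above)."""
    return {"valence": 0.8, "arousal": 0.5}

frame = [[0] * 64 for _ in range(48)]   # dummy greyscale frame (48 x 64)
box = find_face(frame)
landmarks = model_face(frame, box)
scores = classify_face(landmarks)
```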
Statistical analysis
The analysis was conducted using R (R Foundation for Statistical Computing, Vienna, Austria), accessed through RStudio, and JASP (Version 0.18.3, JASP Team, 2024; University of Amsterdam). The facial action units (AUs) were depicted using bar graphs and boxplots for each category of stimulus: (1) non-noxious thermal stimulus, (2) short noxious stimulus, and (3) prolonged noxious stimulus. Line graphs were employed to visualize and analyse the effects of these stimuli on the AI-generated metrics arousal and valence.
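The quantities those bar graphs and boxplots display are per-condition summaries of AU activation. The original analysis was carried out in R and JASP; the Python sketch below, with invented activation values for a single AU, only illustrates the kind of per-condition mean ± SD summary being plotted.

```python
from statistics import mean, stdev

# Hypothetical activation scores (0-1) of one action unit across the three
# stimulus conditions; real values come from FaceReader's per-frame output.
au_activation = {
    "thermal": [0.10, 0.12, 0.08, 0.15],      # non-noxious thermal stimulus
    "heel_prick": [0.55, 0.62, 0.48, 0.70],   # short noxious stimulus
    "squeezing": [0.40, 0.35, 0.52, 0.44],    # prolonged noxious stimulus
}

# Per-condition mean +/- SD, i.e. what the bar graphs and boxplots summarise.
summary = {cond: (mean(vals), stdev(vals))
           for cond, vals in au_activation.items()}

for cond, (m, s) in summary.items():
    print(f"{cond}: {m:.2f} +/- {s:.2f}")
```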
Continuous variables are presented as means ± standard deviations (SD), while categorical variables are shown as counts and percentages. Pearson correlation analysis was applied to explore the relationships between arousal, valence, and expert assessments. An analysis of variance (ANOVA) was used to determine differences in AU activation across the three stimulus conditions, with the Tukey test serving as post-hoc analysis for multiple comparisons. F-values were reported together with η² as the effect-size estimate. A multivariate analysis of variance (MANOVA) was performed to examine the differences in selected AUs across conditions, employing Pillai's trace to assess the impact of contributing factors on the model. A delta was calculated as the difference between the minimum and maximum activity levels of each AU. An AU was deemed relevant if it met two key criteria: a mean activity level of at least 15% and a minimum delta of 50%. These thresholds were set to ensure that only AUs demonstrating both notable mean activity and substantial dynamic variability were selected for further analysis. Lastly, a network plot was drawn to visualise the correlations between the relevant AUs identified in this way.
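Three of the computations above (the Pearson correlation, the η² effect size, and the AU relevance criterion of mean ≥ 15% with delta ≥ 50%) can be written out explicitly. The sketch below uses plain Python with invented sample values; the actual analysis was performed in R and JASP.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def eta_squared(groups):
    """Effect size for a one-way ANOVA: eta^2 = SS_between / SS_total."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((v - grand) ** 2 for v in all_vals)
    return ss_between / ss_total

def is_relevant_au(activity_pct):
    """AU selection rule from the text: mean activity of at least 15% AND a
    delta (max minus min activity) of at least 50%, in percent units."""
    delta = max(activity_pct) - min(activity_pct)
    return mean(activity_pct) >= 15 and delta >= 50
```

For example, an AU whose activity ranges from 0% to 60% with a mean above 15% passes both thresholds, while an AU that is constantly near 60% fails the delta criterion despite its high mean.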