Cardio-respiratory signal extraction from video camera data for continuous non-contact vital sign monitoring using deep learning.
Chaichulee S., Villarroel M., Jorge J., Arteta C., McCormick K., Zisserman A., Tarassenko L.
Non-contact vital sign monitoring enables the estimation of vital signs, such as heart rate, respiratory rate and oxygen saturation (SpO2), by measuring subtle color changes on the skin surface using a video camera. For patients in a hospital ward, the main challenges in developing continuous and robust non-contact monitoring techniques are identifying suitable time periods and segmenting the skin regions of interest (ROIs) from which vital signs can be estimated. This paper presents two convolutional neural network (CNN) models. The first network was designed to detect the presence of a patient and segment the patient's skin area. The second network combined the output of the first network with optical flow to identify time periods of clinical intervention so that these periods can be excluded from the estimation of vital signs. Both networks were trained using video recordings from a clinical study involving 15 pre-term infants conducted in the high dependency area of the Neonatal Intensive Care Unit (NICU) of the John Radcliffe Hospital in Oxford, UK. Using two-fold cross-validation, the proposed methods achieved an accuracy of 98.8\% for patient detection, a mean intersection-over-union (IOU) score of 88.6\% for skin segmentation and an accuracy of 94.5\% for clinical intervention detection. Our deep learning models produced accurate results and were robust to different skin tones, changes in lighting conditions, pose variations and different clinical interventions by medical staff and family visitors. Finally, we show that cardio-respiratory signals can be continuously derived from the patient's skin during periods in which the patient is present and no clinical intervention is under way.
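The skin segmentation result above is reported as a mean intersection-over-union score. As a minimal sketch of how such a score can be computed, the Python below compares predicted and reference binary skin masks. The names `iou` and `mean_iou`, the list-of-lists mask representation, the averaging over frames, and the treatment of two empty masks as a perfect match are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence

# A 2-D binary mask: 1 marks a skin pixel, 0 marks a non-skin pixel.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumption: if neither mask contains skin, count it as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> float:
    """Average the per-frame IoU over a set of frames (assumed averaging scheme)."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [1, 0]]
    # One pixel is shared and three pixels are skin in either mask.
    print(iou(predicted, reference))
    print(mean_iou([predicted, reference], [reference, reference]))
```

In this toy case the two masks share one skin pixel out of three marked in either mask, so the per-frame score is one third. A score of 1.0 would mean the predicted and reference skin regions coincide exactly.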