This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Rehabilitation and Assistive Technology, is properly cited. The complete bibliographic information, a link to the original publication on http://rehab.jmir.org/, as well as this copyright and license information must be included.
Wearable sensors gather data that machine-learning models can convert into an identification of physical activities, a clinically relevant outcome measure. However, when individuals with disabilities upgrade to a new walking assistive device, their gait patterns can change, which could affect the accuracy of activity recognition.
The objective of this study was to assess whether we need to train an activity recognition model with labeled data from activities performed with the new assistive device, rather than data from the original device or from healthy individuals.
Data were collected from 11 healthy controls as well as from 11 age-matched individuals with disabilities who used a standard stance control knee-ankle-foot orthosis (KAFO), and then a computer-controlled adaptive KAFO (Ottobock C-Brace). All subjects performed a structured set of functional activities while wearing an accelerometer on their waist, and random forest classifiers were used as activity classification models. We examined both global models, which are trained on other subjects (healthy or disabled individuals), and personal models, which are trained and tested on the same subject.
Median accuracies of global and personal models trained with data from the new KAFO were significantly higher (61% and 76%, respectively) than those of models that use data from the original KAFO (55% and 66%, respectively) (Wilcoxon signed-rank test,
Our results suggest that when patients use a new assistive device, labeled data from activities performed with that specific device are needed to maximize activity recognition accuracy. Personal device-specific models yield the highest accuracy in such scenarios, whereas models trained on healthy individuals perform poorly and should not be used in patient populations.
Activity recognition (AR) has become an active area of research in the past decade, largely driven by the availability of low-cost wearable sensors and general purpose machine learning algorithms [
Rehabilitation is an area of health care that can largely benefit from AR [
The majority of wearable- and mobile phone–based AR studies have been conducted using healthy individuals, whereas relatively fewer studies are focused on people with disabilities [
Furthermore, gait patterns of individuals with disabilities can differ significantly from those of healthy individuals, and additional variability can arise when individuals who walk with an assistive device switch to a new device. Such variability can stem from differences in the mechanical design or in the way the new device is controlled, which often requires the person to learn new movement strategies [
In general, an AR model can be user specific (
Studies comparing personal with global models showed mixed results [
Here we focus on identifying physical activities using a waist-worn accelerometer in people walking with a leg orthosis, namely a knee-ankle-foot orthosis (KAFO). A KAFO is normally used by individuals who have sustained a traumatic or neurological injury, or who have a neuromuscular disease, causing weakness or partial paralysis of one or both legs [
After providing informed consent, 11 individuals with disabilities (3F, mean age 56.4 [SD 12.9] years) and 11 age-matched, able-bodied individuals (5F, mean age 49.2 [SD 19.4] years) participated in this study. Northwestern University’s Institutional Review Board approved the experimental procedures for the study, which took place at the Rehabilitation Institute of Chicago. For convenience, we also refer to our participants with disabilities as “patients.”
All patients required the use of a unilateral KAFO to ambulate due to either a neurological or traumatic injury or a neuromuscular disease causing muscular weakness in one leg (see
Demographics of participants with disabilities.
Subj # | Gender | Age, in years | Diagnosis | Control assistive device |
1 | M | 64 | Poliomyelitis | Freewalk - Ottobock |
2 | F | 59 | Spinal cord injury | SPL2 - Fillauer |
3 | M | 40 | Poliomyelitis | E-MAG - Ottobock |
4 | M | 64 | Poliomyelitis | E-MAG - Ottobock |
5 | F | 41 | Poliomyelitis | E-MAG - Ottobock |
6 | M | 35 | Spinal cord injury | E-MAG - Ottobock |
7 | M | 72 | Poliomyelitis | E-MAG - Ottobock |
8 | M | 68 | West Nile meningitis | E-MAG - Ottobock |
9 | F | 44 | Peripheral neuropathy | Becker Stride - Becker |
10 | M | 65 | Poliomyelitis | E-MAG - Ottobock |
11 | M | 68 | Spinal cord injury | E-MAG - Ottobock |
Each patient was fitted with, and trained to use, a passive stance-control KAFO as their control device and a microprocessor-controlled hydraulic KAFO, the C-Brace (Ottobock, Duderstadt, Germany), as their novel device. Participants used each device at home and in the community. Unlike traditional KAFOs, the C-Brace embeds a computer-controlled hydraulic unit that dynamically changes the impedance of the knee joint, using sensors in the knee and ankle joints to infer the slope of the ground surface and the user intent [
All subjects wore a triaxial accelerometer (Actigraph wGT3X-BT; Actigraph LLC, Pensacola, FL) that recorded data at a sampling frequency of 30 Hz and was strapped around their waist on the right side with a belt. We aimed to detect the following 5 functional activities: sitting, standing, walking, stair ascent, and stair descent. All subjects performed a scripted sequence containing the 5 activities, over 3 different sessions, which took place on separate days. Here, we define a single repetition of the sequence as a “session.” Recordings for each patient totaled 35 minutes on average.
During each session, subjects were asked to sit comfortably while talking, gesturing, or checking their phone. They were then asked to stand while washing their hands or pouring and drinking water. Participants then walked at a self-selected, comfortable pace, and finally ascended and descended at least one flight of stairs at a self-selected pace. Each activity was performed for at least 30 seconds to ensure that enough data were collected. For safety purposes, all individuals with disabilities were supervised by a physical therapist.
Healthy subjects performed the scripted activities 3 times during 1 session. Patients performed the scripted activities during clinical training. For this analysis, we used 3 sessions with the control assistive device and 3 with the novel assistive device. The sessions took place over a 3-week period on average. Due to comfort and safety issues related to their disability when using the new device, 2 patients could not ascend or descend stairs. A researcher observed the sessions and recorded the duration of each activity for subsequent data labeling. Furthermore, all patients completed the Orthotics Prosthetics Users Survey self-report questionnaire for lower extremity functional status (OPUS-LEFS) at the end of the study, to rate their level of comfort in using each KAFO. On average, patients rated the control and novel devices as equally comfortable.
Accelerometer data were downloaded on a personal computer using the Actigraph ActiLife software (Actigraph LLC, Pensacola, FL). Data windows of 6 seconds with 75% overlap were extracted from the raw acceleration data and a set of 131 features (
We selected random forest as it is relatively robust to overfitting, performs well in activity recognition problems [
List of features computed on the accelerometer data used for activity classification.
Description | Number of features |
Mean, range, interquartile range ( | 9 |
Moments: standard deviation, skew, kurtosis ( | 9 |
Histogram: bin counts of −2 to 1 | 12 |
Derivative of moments: mean, standard deviation, skew, kurtosis ( | 12 |
Mean of the squared norm | 1 |
Sum of axial standard deviations | 1 |
Pearson correlation coefficient, | 3 |
Mean cross products (raw and normalized), | 6 |
Absolute mean of cross products (raw and normalized) | 6 |
Power spectra: mean, standard deviation, skew, kurtosis ( | 12 |
Mean power in 0.5 Hz bins between 0 and 10 Hz ( | 60 |
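As an illustration of the windowing step, the following minimal Python sketch splits a 30 Hz triaxial recording into 6-second windows with 75% overlap and computes a small subset of the listed features. The function names are hypothetical, only 10 of the 131 features are shown, and the use of the population standard deviation is an assumption; this is not the code used in the study.

```python
import statistics

FS_HZ = 30        # sampling frequency reported in the text
WINDOW_S = 6      # window length reported in the text
OVERLAP = 0.75    # fractional overlap reported in the text


def sliding_windows(samples, fs_hz=FS_HZ, window_s=WINDOW_S, overlap=OVERLAP):
    """Split a list of (x, y, z) samples into overlapping windows."""
    size = int(fs_hz * window_s)          # 180 samples per window
    step = int(size * (1 - overlap))      # 45 samples between window starts
    last_start = len(samples) - size
    return [samples[i:i + size] for i in range(0, last_start + 1, step)]


def basic_features(window):
    """Per-axis mean, range, and standard deviation, plus mean squared norm."""
    features = []
    for axis in zip(*window):             # iterate over the x, y, z columns
        features.append(statistics.fmean(axis))
        features.append(max(axis) - min(axis))
        # Population standard deviation is an assumption of this sketch.
        features.append(statistics.pstdev(axis))
    features.append(statistics.fmean(x * x + y * y + z * z for x, y, z in window))
    return features
```

With these settings, each window holds 180 samples and consecutive windows start 45 samples (1.5 seconds) apart.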
We trained 5 classification models (
A. The two types of assistive devices (knee-ankle-foot orthosis, KAFO) used in the study. Patients performed activities with their control KAFO (passive stance-control orthosis) and then with the novel KAFO (Ottobock computer-controlled C-Brace). B. Experimental setup, data processing, and activity recognition steps (adapted with permission from [
Diagram depicting increasing specificity of classification models in terms of what groups of individuals (able-bodied or individuals with disabilities/patients) they are trained on. Patients are depicted using their control (black) or novel (red) assistive device. Each classification model is used to predict activities for the patient of interest (Test), walking with the novel assistive device. The top 3 layers of the pyramid contain global models, which are trained on individuals other than the one used to test the model. The 2 bottom layers of the pyramid contain personal models, which are trained and tested with data from the same individual.
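The selection of training data for the 5 models can be sketched as follows. This is a hedged illustration rather than the study's implementation: the record keys (`subject`, `group`, `device`, `session`) and model names are hypothetical, and holding out one session of novel-device data for the patient- and device-specific model is an assumption, as the cross-validation scheme for that model is not stated here.

```python
MODELS = (
    "global_healthy",
    "global_impairment_specific",
    "global_device_specific",
    "patient_specific",
    "patient_and_device_specific",
)


def training_records(records, model, test_subject, test_session=None):
    """Return the records used to train `model` for one test patient.

    Each record is a dict with hypothetical keys: 'subject',
    'group' ('healthy' or 'patient'), 'device' ('none', 'control',
    or 'novel'), and 'session'.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")

    def keep(r):
        own = r["subject"] == test_subject
        if model == "global_healthy":
            return r["group"] == "healthy"
        if model == "global_impairment_specific":
            # Other patients, walking with their control KAFO.
            return r["group"] == "patient" and r["device"] == "control" and not own
        if model == "global_device_specific":
            # Other patients, walking with the novel KAFO.
            return r["group"] == "patient" and r["device"] == "novel" and not own
        if model == "patient_specific":
            # The test patient's own control-device data.
            return own and r["device"] == "control"
        # patient_and_device_specific: own novel-device data, excluding the
        # held-out test session (an assumption of this sketch).
        return own and r["device"] == "novel" and r["session"] != test_session

    return [r for r in records if keep(r)]
```

In every case, the test data are the test patient's recordings with the novel device.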
As stair-climbing data are largely underrepresented, there is a significant class imbalance in the dataset. Because of that, we used the balanced accuracy (mean recall) as the metric to assess classifier performance, such that the error in each class receives equal weight. In scenarios with class imbalance, it is important to use an unbiased performance metric, such as the balanced accuracy or balanced error rate, to prevent drawing erroneous conclusions about the performance of the AR model [
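$$\text{Balanced accuracy} = \frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c + FN_c}$$
where C is the number of activity classes, and TP_c and FN_c are the numbers of true positives and false negatives for class c, so that each term of the sum is the recall for that class.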
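A minimal Python sketch of the balanced accuracy (mean recall) is given below for illustration. The function name and the label strings are hypothetical, and classes are taken to be those that appear in the true labels.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    totals = Counter(y_true)                                   # samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)
```

For example, a classifier that labels every window as walking on a dataset with 8 walking windows and 2 stair-climbing windows has a plain accuracy of 80% but a balanced accuracy of 50%, because recall is 100% for walking and 0% for stair climbing.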
To compare performance across models, we performed 4 Wilcoxon signed-rank tests, chosen because one of the distributions was non-normal (Shapiro-Wilk test). These 4 tests were performed sequentially, such that each classification model was compared with the next more specific model, with alpha=.05.
Whereas personal models are trained on data from a single subject, global models are trained on data from multiple subjects. As the number of subjects in the training dataset increases, the amount of training data increases, and the classification error of a global model will likely decrease. Therefore, we evaluated the balanced accuracy of both global models (healthy and impairment-specific) as a function of the number of training subjects. For each selected number of subjects, we ran 1000 training iterations, where in each iteration we randomly picked subjects to train on and one patient’s novel device data to test on. We chose 1000 iterations to account for a sufficient number of combinations of training and test subjects and for minor fluctuations in performance of the random forest. The largest number of training subjects for the impairment-specific model is the total number of patients minus 1, as 1 patient is always set aside for testing. For each set of models trained on a selected number of subjects, we inferred the mean and 95% confidence interval of the median balanced accuracy by bootstrap using 1000 repetitions.
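The resampling procedure can be sketched as follows, assuming a simple percentile bootstrap. The function names, the fixed seed, and the percentile method are assumptions of this sketch; the exact bootstrap variant is not specified in the text.

```python
import random
import statistics


def sample_split(patient_ids, n_train, rng):
    """Pick one test patient and n_train training patients from the rest."""
    test = rng.choice(patient_ids)
    pool = [p for p in patient_ids if p != test]
    return rng.sample(pool, n_train), test


def bootstrap_median_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Mean and percentile confidence interval of the bootstrapped median."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_boot)
    )
    lo_idx = int(n_boot * alpha / 2)          # index 25 for 1000 repetitions
    hi_idx = n_boot - lo_idx - 1              # index 974 for 1000 repetitions
    return statistics.fmean(medians), medians[lo_idx], medians[hi_idx]
```

Here `values` would hold the balanced accuracies obtained across the training iterations for one selected number of training subjects.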
We compared the performance of global and personal classifiers trained with either data from patients who used their control KAFO assistive device or the novel C-Brace assistive device. A global model trained on healthy subjects was included in the comparison, representing the least specific classification model. Models were compared based on their balanced accuracy. Global models were then compared in terms of the amount of training data (number of subjects) used to reach a certain level of accuracy.
To understand whether training data from the novel assistive device will improve performance of a global model, we compared the classification accuracy across the 3 global models (
We then examined whether training data from the novel device affected the accuracy of personal models. The patient-specific model, which is a personal model trained with a patient’s control device data and tested on the patient’s own novel device data, yielded a median balanced accuracy of 66%. However, the performance of this model varied drastically across patients (interquartile range, IQR=[47%-72%]), and overall there was no statistically significant improvement over the global device-specific model (
Conversely, a personal model trained with the novel device data (patient- and device-specific) yielded the highest median balanced accuracy (76%), providing a significant advantage over all the previous models (
As the results on the balanced accuracy do not reveal which activities are misclassified by each model, we analyzed the accuracy per class (recall) across the 5 activities for all models (
The global healthy model had the lowest recall for predicting walking (27.13%, 1337/4928), which was mostly misclassified as climbing upstairs (
On the other hand, recall for walking was significantly higher (79.26%, 3906/4928 and 91.61%, 4514/4928, respectively) in the impairment-specific and device-specific models (
Patient-specific models performed in between the global-healthy model and the global-patients’ models, with a recall of 64.33% (3170/4928) for walking and of 43.8% (273/623) for stair climbing up. Recall for stair climbing down was still low (17.2%, 100/582). Recognition of both stair-ascend and descend activities only improved with the patient- and device-specific model (43.1%, 83.7/194 and 48.0%, 99.7/207.7), although the recall was well below that for walking or other activities. Therefore, the main gain achieved by personal models trained with the new device data was on the recognition of stair-climbing activities.
The distribution of balanced accuracies for the 5 models. Each model is tested on each patient using the novel assistive device (C-Brace). Boxes represent the interquartile range (IQR), red lines are medians, and whiskers extend to 1.5 times the IQR. Red crosses are outliers.
As global models are trained with data from multiple subjects, we evaluated how many subjects are required to achieve a desired level of performance for each global model. As expected, the median balanced accuracy increased with the number of subjects for all 3 global models (
Confusion matrices for the 5 classification models, grouped by global and personal models. Numbers represent percentage of instances in that class.
Effect of number of subjects used to train each global model on the median accuracy for healthy (red), impairment-specific (blue), and device-specific (orange) global models. The maximum number of subjects for patient models is 10, as 1 patient is left out for testing (leave-one-subject-out cross-validation). Shaded areas represent the 95% confidence intervals on the medians obtained by bootstrap. The green line represents the median accuracy of the patient- and device-specific models (personal model).
We asked whether AR models for individuals walking with an assistive device (KAFO) require training data from the new KAFO (C-Brace) or whether data from their control KAFO will suffice. We found that both global and personal models performed significantly better when trained with data from the novel KAFO used by the subjects to perform the functional activities. Therefore, to maximize classification accuracy, an AR system should be trained with data specific to the assistive device in use.
We examined both global and personal models. Although global models were trained with about 16 times more samples than personal models, a personal model trained on the novel KAFO data (patient- and device-specific) largely outperformed all global models. Interestingly, this was not the case for a personal model trained on the control KAFO data (patient-specific), as the accuracy of this model was highly variable across subjects and overall not better than that of a global device-specific model. Therefore, in this scenario, a personal model might only help if trained with data from the specific assistive device used.
On the other hand, global models are arguably easier to deploy, as they do not require collecting data on each and every new patient [
Although the performance of the global-healthy model increased with the number of training subjects, this model was outperformed by global models trained on patients using the novel KAFO (device-specific). One reason is that gait patterns in individuals with disabilities can be markedly different from those of able-bodied subjects [
There were certain limitations to our study that we need to acknowledge. We only had a sample of 11 individuals with disabilities (patients) for training the global models; adding more subjects could increase the performance of these models, and should be explored in future studies. It has to be noted though that the accuracy of global models was dramatically lower than that of personal device-specific models. As reported by some prior studies, global models might not reach the performance of personal models even when a large number of subjects are used [
We asked our subjects to perform a structured set of activities in a lab setting and under the supervision of a clinician. Although specific instructions on how to perform activities were not provided (eg, washing hands or checking the phone), this scenario is still different from a natural environment. Previous studies showed that the accuracy of AR can drop significantly when the data collection is performed outside of a lab-controlled condition [
We compared performance of global models to that of personal models. However, one can also use intermediate approaches, where both data from other subjects and personal data are combined to train a new model. For example, activity-specific personal models from other subjects can be combined to fit a small dataset of labeled data from the target subject (semipopulation models) [
We used only one sensor (an accelerometer) attached to the participants’ belt to detect the activity performed. This solution is unobtrusive and well suited for a long-term monitoring scenario, particularly in disabled or elderly populations [
Guidelines on how to use wearable technology to track functional activities in populations other than young able-bodied individuals are still lacking [
AR: activity recognition
IQR: interquartile range
KAFO: knee-ankle-foot orthosis
OPUS-LEFS: Orthotics Prosthetics Users Survey for lower extremity functional status
This research was funded by Otto Bock Healthcare Products, GmbH (Grant: CBrace 80795). The sponsor had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.
None declared.