This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Rehabilitation and Assistive Technology, is properly cited. The complete bibliographic information, a link to the original publication on http://rehab.jmir.org/, as well as this copyright and license information must be included.
Performing physiotherapy exercises in front of a physiotherapist yields qualitative assessment notes and immediate feedback. However, practicing the exercises at home lacks feedback on how well patients are performing the prescribed tasks. The absence of proper feedback might result in patients performing the exercises incorrectly, which could worsen their condition. We present an approach to generate performance scores to enable tracking the progress by both the patient at home and the physiotherapist in the clinic.
This study aims to propose the use of 2 machine learning algorithms, dynamic time warping (DTW) and hidden Markov model (HMM), to quantitatively assess the patient’s performance with respect to a reference.
Movement data were recorded using a motion sensor (Kinect V2), capable of detecting 25 joints in the human skeleton model, and were compared with those of a reference. A total of 16 participants were recruited to perform 4 different exercises: shoulder abduction, hip abduction, lunge, and sit-to-stand exercises. Their performance was compared with that of a physiotherapist as a reference.
Both algorithms showed a similar trend in assessing participant performance. However, their sensitivity levels were different. Although DTW was more sensitive to small changes, HMM captured a general view of the performance, being less sensitive to the details.
The chosen algorithms demonstrated their capacity to objectively assess the performance of physical therapy. HMM may be more suitable in the early stages of a physiotherapy program to capture and report general performance, whereas DTW could be used later to focus on the details. The scores enable the patient to monitor their daily performance. They can also be reported back to the physiotherapist to track and assess patient progress, provide feedback, and adjust the exercise program if needed.
Rehabilitation is essential to regain lost or weakened functionality after injury or surgery. Although it is commonly initiated in a clinic and supervised by a physiotherapist, the prescribed therapeutic exercises will normally need to be practiced at home by patients on their own. Lack of motivation and compliance may hinder the healing process and, in some cases, even worsen the injury. Advances in virtual reality technologies have resulted in various virtual rehabilitation platforms introduced to address this issue [
The impact of serious games on physical therapy has been studied in terms of effectiveness [
The concept of exergaming enables exercising when playing games. For players, it is an opportunity to play games in a more active and less passive manner. For patients, it offers the opportunity to practice therapeutic tasks in a more playful and less repetitive manner. Exergames offer various activities, such as aerobic exercises and dancing; balance and stretching workouts; and recreational simulations, such as golf, skiing, and more. However, they require additional hardware and software. In terms of hardware, they require proper sensory equipment to track the user's motion. In terms of software, the game scenario must accommodate whole body interaction. There are various commercially available game consoles that enable exergames, including Xbox (Microsoft), PlayStation (Sony), and Wii (Nintendo). Each comes with its own dedicated input device for enabling user interaction with the games, that is, Kinect for Xbox, Move for PlayStation, and Remote Plus for Wii.
Among them, Kinect has gained higher popularity owing to its acceptable performance and versatility [
Kinerehab was introduced by Chang et al [
Commercially available rehabilitation platforms based on Kinect and exergames include MIRA [
Automatic performance evaluation of a user carrying out a task has always been a challenge among researchers in both medical and nonmedical domains. Such evaluations are usually subjective and, in the real world, are performed by judges who are experts in the given field. For example, evaluation of the quality of a dance, a gymnastic performance, or a physiotherapy/rehabilitation exercise may be performed by expert dancers, sport athletes, and professional therapists, respectively. Such evaluations require the presence of human specialists who may not be easily accessible or affordable. In addition, the fact that the assessment is subjective indicates that a different expert might have a different opinion. The ability to conduct objective automatic evaluations that are repeatable is thus highly desirable. A real-world example is a video game called Just Dance, which is developed by the French company Ubisoft for Microsoft Xbox. Using the Kinect sensor, the players must mimic the onscreen dancer's choreography to a chosen song. The system then continuously evaluates in real time the quality of a user's dance movements in terms of being “Ok,” “Good,” “Super,” or “Perfect” and reports a total numeric score at the end [
Studies in the literature concerned with automated evaluation of therapy motions are scarce [
Using KEHR, Su et al [
Similar techniques have also been employed in other domains, such as dance motion evaluation. Jang et al [
A common factor among all these efforts was that they focused on evaluating the incorrectness of the performance on the basis of subjective terms. In most cases, the method developed was used to sort multiple erratic performances with respect to a reference template. This approach motivated us to explore the use of DTW and HMM to generate a similarity score between a participant’s performance and a reference.
MIRA is a software platform that turns physiotherapy exercises into clinical exergames [
The MIRA system includes a Kinect V2 sensor (Microsoft Corp) connected to a computer running the MIRA program (
At the end of each session, the MIRA system reports various scores. Depending on the game, these reflect the extent to which the player met the game's objectives, for example, the number of fish caught and taken to the boat or the number of times the spaceship safely passed through the fire rings. Although the scores can indicate how well the user played the game, they have little value in a clinical context. The aim of this work is to introduce an objective evaluation method that is more meaningful and suitable for clinical evaluation.
Medical Interactive Recovery Assistant system including a Kinect motion sensor and software to match an exercise with a game. A child (left) and an elderly gentleman (right) playing the Atlantis game. The child is practicing an arm exercise, the gentleman a hip exercise.
A snapshot of the Medical Interactive Recovery Assistant program. Exercises are listed on the left and games are shown on the right. Multiple game options allow practicing the same exercise with different games, thus encouraging patients to cope with the prescribed exercise by discovering the various game scenarios.
We developed a program in the Unity 3D game engine (Unity Technologies) [
As 3D position coordinates are dependent on the user size and location in front of the Kinect camera, we decided to extract invariant features (joint angles) to describe each exercise optimally (
A motion trajectory T(l) is formed by the sequence of feature values within the time frame 0≤t≤l, where l is the execution time. T(l) is a matrix of size l×3 for the lunge exercise and l×2 for the rest.
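As a minimal sketch of how such a trajectory could be assembled, the snippet below computes the angle at a joint from three 3D joint positions and stacks the angles frame by frame. The joint names and the joint triplets are hypothetical placeholders; the angles actually used for each exercise are defined in the article's feature figure, not here.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def trajectory(frames, triplets):
    """Build an (l x n_features) matrix of joint angles.

    frames:   list of dicts mapping a joint name to its (x, y, z) position
    triplets: list of (a, b, c) joint names, one triplet per feature
              (hypothetical names, chosen by the caller)
    """
    return np.array([[joint_angle(f[a], f[b], f[c]) for a, b, c in triplets]
                     for f in frames])
```

Because an angle depends only on the directions of the two segments, translating or uniformly scaling the skeleton leaves the feature unchanged, which is the invariance the text relies on.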
Shoulder abduction: the arm should be kept close to the body. The exercise consists of raising the arm away from the side, keeping it in a straight line with the body.
Hip abduction: the leg should be held straight and on the ground. The exercise involves raising the leg away from the side, keeping it in a straight line with the body.
Lunge: stand straight facing forward with the spine and the pelvis in a neutral position. Take a step forward that is long enough so that when the knee bends, it does not go beyond the toes. Bend the back knee until it almost touches the floor, keeping both the torso and the spine in a neutral position. Return to the starting position.
Sit-to-stand: sit on a chair. Without using the hands for support, stand up and sit back down. Make sure each movement is slow and controlled.
Extracted features (joint angles) from the joint 3D positions for each type of exercise. Both 2D side view and 3D perspective view are provided for clarity. 2D: 2-dimensional; 3D: 3-dimensional.
Sample extracted features obtained from 3D position for all seven exercises: hip abduction left (hpl) and right (hpr), lunge left (lul) and right (lur), shoulder abduction left (shl) and right (shr), and sit-to-stand (sit).
DTW [
Although DTW was initially applied to speech recognition, it has also been widely used in gesture recognition [
We define DRP=DTW(TR,TP) as a distance measure between the reference (TR) and the participant (TP) trajectories. The MATLAB function
An HMM [
For performance evaluation, a single HMM, λR, is trained based on the reference motion trajectory TR. We then calculate the log likelihood of TP given the trained model as LRP=log(P(TP|λR))/lP. As with DTW, the lower and upper limits of the log likelihood need to be determined. The upper limit (Lu) is known and is equal to 0, as the highest probability is 1. However, the lower limit is unknown and can be any value less than zero. As before, we assumed that this lower limit reflects the worst possible performance captured by TW. Letting the lower limit be Ll=LRW=log(P(TW|λR))/lW, the similarity score 0≤SH≤100 corresponding to log likelihood Ll≤L≤Lu=0 is obtained by: SH=100×(L−Ll)/(Lu−Ll)=100×(Ll−L)/Ll.
Hidden Markov model with three states (q1 to q3) and five observation symbols (v1 to v5). The relationship among states is described by transition matrix A=[aij]3×3, and between states and discrete symbols by observation matrix B=[bij]3×5. It is assumed the system evolves through certain states whose relationship is to be studied.
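The likelihood-based score can be illustrated with a short sketch, assuming a discrete HMM whose parameters A, B, and an initial state distribution pi have already been trained on the reference (training itself is not shown). The function names are illustrative, and the scaled forward algorithm used here is the standard way to evaluate log P(O|λ); it is not claimed to be the authors' implementation.

```python
import numpy as np

def log_likelihood(obs, A, B, pi):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    obs: sequence of integer symbol indices
    A:   (N x N) state transition matrix
    B:   (N x M) observation matrix
    pi:  (N,) initial state distribution (an assumed parameter)
    """
    A, B, pi = (np.asarray(m, dtype=float) for m in (A, B, pi))
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:           # sequence impossible under the model
            return -np.inf
        log_p += np.log(scale)     # accumulate log of the scaling factors
        alpha = alpha / scale      # rescale to avoid numerical underflow
    return log_p

def normalized_log_likelihood(obs, A, B, pi):
    """Log likelihood divided by the sequence length."""
    return log_likelihood(obs, A, B, pi) / len(obs)

def hmm_score(L, L_lower):
    """S_H = 100 * (L_lower - L) / L_lower, with upper limit 0.

    L_lower must be finite and negative; the result is clipped to [0, 100].
    """
    return float(np.clip(100.0 * (L_lower - L) / L_lower, 0.0, 100.0))
```

With this mapping, a log likelihood of 0 gives a score of 100 and a log likelihood equal to the lower limit gives 0. If the worst-case sequence contains symbols the model never emits, its log likelihood is minus infinity, so in practice the observation matrix would need some smoothing for the lower limit to be usable.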
Standard hardware (computer, television, Kinect sensor) was used in combination with a Unity program to display the participant’s live performance on the screen and store the 3D position data along with a time stamp. As mentioned above, 4 types of exercises were chosen to be performed by the participants: shoulder abduction, hip abduction, lunge, and sit-to-stand exercise. Except for the sit-to-stand exercise, all exercises were performed for both the left and the right sides, resulting in a total of 7 exercises.
A total of 16 healthy participants, including 8 adult females (22 to 30 years), 6 adult males (22 to 40 years), and 2 school boys (12 and 17 years), were recruited for the study. Participants were asked to stand in front of the Kinect sensor and perform each of the 7 exercises for 20 seconds. They were told to repeat the chosen exercise at least five times with a short pause between each repetition. In addition to the 16 participants, the physiotherapist involved in the project (DS) was asked to perform the exercises as the reference performance. He repeated each of the 7 exercises at least five times during a period of 20 seconds each.
Of the 5 repetitions, 3 were extracted (ignoring the first and the last) for each participant. For DTW scores, the distance between each repetition of a participant and each repetition of the physiotherapist was calculated, yielding a total of 9 values. The final DTW score (SD) was obtained by taking the average of these values. For HMM scores, the likelihood of each repetition of a participant given the physiotherapist’s model was calculated, yielding a total of 3 values. The final HMM score (SH) was obtained by taking the average of these values.
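The pairwise DTW procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the local cost is the Euclidean distance between frames, the upper limit Du is taken from a worst-case (no movement) recording compared against the same reference repetitions, and the distance is mapped to a score as 100×(1−D/Du) so that a perfect match scores 100. That mapping is an assumption of this sketch, chosen to agree with the statement that larger distances mean lower similarity.

```python
import numpy as np

def _as_2d(t):
    """Treat a 1D sequence as a single-feature trajectory of shape (l, 1)."""
    t = np.asarray(t, dtype=float)
    return t[:, None] if t.ndim == 1 else t

def dtw_distance(x, y):
    """Classic dynamic time warping distance between two trajectories.

    x: (n, d) array, y: (m, d) array; local cost is the Euclidean
    distance between frames (an assumption of this sketch).
    """
    x, y = _as_2d(x), _as_2d(y)
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

def dtw_score(participant_reps, reference_reps, worst_reps):
    """Average all participant/reference pair distances, then map to 0-100."""
    d = np.mean([dtw_distance(p, r)
                 for p in participant_reps for r in reference_reps])
    d_upper = np.mean([dtw_distance(w, r)
                       for w in worst_reps for r in reference_reps])
    # Assumed mapping: distance 0 -> 100, distance d_upper -> 0.
    return float(np.clip(100.0 * (1.0 - d / d_upper), 0.0, 100.0))
```

With 3 participant repetitions and 3 reference repetitions, the first list comprehension produces the 9 pairwise distances described above. Because the mapping is linear, averaging the distances before scoring gives the same result as averaging 9 individual scores, as long as no value is clipped.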
Similarity scores obtained from applying dynamic time warping (SD, solid blue line) and hidden Markov model (SH, solid red line). Their difference, SH-SD, is shown by the black dashed line. Participant 17 is the physiotherapist and 18 is the worst performance, that is, making no movement. Error bars indicate standard error.
Several observations can be made from these plots in
Regarding the difference, SH−SD, it is difficult to observe any obvious pattern. On some occasions, the difference was positive and on some others it was negative. There were also cases where the difference was negligible. The difference could be as large as +18% (participant 2, lunge—left) or as small as −24% (participant 6, hip abduction—right). For the reference participant 17, the difference was generally very low (−1%, −1%, 4%, 1%, 2%, 1%, and 0%). However, this was not the case for the worst performance, participant 18. In most exercises, the difference was negative and was not negligible, indicating that the SD was usually larger than the SH for the worst performance. In addition, in most plots, SH was closer to 0% than SD for the worst performance.
Among the exercises, shoulder abduction was less challenging and easy to perform, which is reflected by participants performing well and achieving high similarity scores. In contrast, lunge was the most difficult and demanding exercise to perform, which is also reflected in the obtained similarity values.
Time domain plots of the best and worst performances can be used to visually examine the correlation between the trajectories and the calculated scores. For each of the 7 exercises, the trajectories of the best and the worst performances were plotted against the reference (
Except for the lunge exercise, for which 3 features (joint angles) were extracted, 2 features were obtained for all the other exercises. However, not all the extracted features had the same weight and importance. For instance, in hip abduction, referring to
Time-domain plots of the best and worst trajectories (excluding participants 17 and 18). X-axis indicates the time in seconds and Y-axis indicates the joint angle in degrees. The three repetitions of the reference trajectories are given in dashed blue lines and the chosen participant’s trajectories in solid red lines. All plots include two features, except for lunge where three features are presented.
The effect on the scores of removing minor features.
Although single-feature SD is clearly larger than full-feature SD, single-feature SH is almost the same as full-feature SH. Adding more details, that is, presenting additional minor features, increases the distance values (and hence decreases the similarity scores) obtained by applying DTW.
Although DTW is more sensitive to detail, HMM is more sensitive to the way the feature space is quantized. Quantization is a preprocess applied over the extracted features to segment them into several clusters for the purpose of training a discrete HMM. The boundaries and the number of clusters have an obvious effect on the HMM scores. This can be seen in
The effect of quantization on the calculated hidden Markov model scores.
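To make the role of quantization concrete, here is a minimal sketch that maps a single joint-angle feature to discrete symbols using equal-width bins. The equal-width scheme, the function name, and the angle range are assumptions for illustration; the article does not specify how its cluster boundaries were chosen.

```python
import numpy as np

def quantize(angles, n_symbols, lo, hi):
    """Map angles to integer symbols 0..n_symbols-1 with equal-width bins.

    lo, hi: assumed range of the feature in degrees; values outside the
    range fall into the first or last bin. Requires n_symbols >= 2.
    """
    # Interior boundaries only: n_symbols bins need n_symbols - 1 edges.
    edges = np.linspace(lo, hi, n_symbols + 1)[1:-1]
    return np.digitize(np.asarray(angles, dtype=float), edges)

angles = [0, 44, 45, 90, 179, 180]
four = quantize(angles, 4, 0, 180)   # boundaries at 45, 90, 135
two = quantize(angles, 2, 0, 180)    # single boundary at 90
```

The same angle sequence yields different symbol sequences for 4 and 2 symbols, so an HMM trained on one quantization sees different observations than one trained on the other, which is one way the number and placement of boundaries can shift the resulting scores.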
Worst performance is a key factor affecting the scores in both measures as it corresponds to the upper or lower boundary, as previously explained. This is evident from SD=100×(D−Dl)/(Du−Dl)=100×(D/Du) and SH=100×(L−Ll)/(Lu−Ll)=100×(Ll−L)/Ll. Both unknown limits (Du for DTW and Ll for HMM) are used as denominators to normalize the distance and likelihood values. The larger the denominator, the smaller the deviations (fluctuations) in the scores. This value can also be intentionally altered to adjust the sensitivity of the scores. With a larger denominator, the scores are smoother and the differences between participants become smaller. With a smaller denominator, the scores become sharper and the differences between participants are highlighted. As explained previously, we chose no movement as the worst performance for all exercises. Seemingly, a different worst performance can yield different scores if it generates different denominators. For example, one might say that closing the elbow in shoulder abduction could be worse than keeping it stretched (the current situation). An example of altering limits (multiplying and dividing Du and Ll by 2) for sit-to-stand is shown in
Several comparative tests were also conducted.
A comparison between the left and right performances, excluding sit-to-stand, is shown in
Furthermore, a comparison between female and male participants is shown in
Effect of altering limits (Du and Ll, obtained from the worst performance) on the scores: multiplied by 2 (left) and divided by 2 (right). Error bars indicate standard error.
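The effect of scaling the limit can be worked through for the HMM score. Writing SH=100×(Ll−L)/Ll as 100×(1−L/Ll) gives 100−SH=100×L/Ll, so replacing Ll with k×Ll yields a new score of 100−(100−SH)/k. The sketch below applies this relation; the function name is illustrative, and clipping to the 0 to 100 range is an assumption.

```python
def rescaled_score(score, k):
    """Score after the worst-case limit is multiplied by k (k > 0).

    Derived from S = 100 * (1 - L / L_lower): scaling L_lower by k
    divides the gap to 100 by k. Clipped to [0, 100] (assumed).
    """
    return max(0.0, min(100.0, 100.0 - (100.0 - score) / k))

# Two hypothetical participants scoring 60 and 80 under the original limit.
doubled = [rescaled_score(s, 2.0) for s in (60.0, 80.0)]
halved = [rescaled_score(s, 0.5) for s in (60.0, 80.0)]
```

Doubling the limit moves the two scores to 80 and 90, halving their gap, whereas halving the limit moves them to 20 and 60, doubling it. This matches the description of smoother scores with a larger denominator and sharper scores with a smaller one.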
The combined and averaged scores (left: SD and right: SH) for all seven exercises. Error bars indicate the standard error.
Comparison between left and right scores. Error bars indicate standard error.
Metric | Hip abduction | Lunge | Shoulder abduction |
SDa | .27 | .09 | .44 |
SHb | .85 | .25 | .45 |
aSD: dynamic time warping score.
bSH: hidden Markov model score.
We implemented and compared 2 commonly used machine learning algorithms, DTW and HMM, to objectively evaluate the performance of patients using a rehabilitation exergaming platform. 3D movement data were obtained using the Kinect depth camera, and invariant features (joint angles) describing each exercise were extracted. The extracted features are independent of body fit, size, and position and distance of the user to the Kinect. They are also independent of the hardware being used and can be adapted for any motion-sensing device capable of tracking human skeleton joints, such as those mentioned in the
Setting a physiotherapist performance as the
The application of these similarity scores is twofold. The scores can be used by the patients at home to encourage them to continue practicing the exergames to achieve higher similarity scores. In addition, the scores can be reported back to the physiotherapist to monitor patient progress and provide feedback. The exercise program can also be adjusted by the physiotherapist given the level of progress to better fit the patient’s needs and progression.
Our proposed method has the potential for significant impact in the context of rehabilitation exergames by enabling remote therapy home-based sessions where performance can still be adequately monitored. This can help better assess the quality of physical exercises performed by patients, fine-tune rehabilitation programs, and enhance the efficiency of home-based rehabilitation. In turn, cost reductions and freeing up of physiotherapy unit time may also be achieved.
Future work will include testing our proposed system on a public data set such as the University of Texas at Dallas-Multimodal Human Action Dataset [
Comparison between female and male participants, left: DTW scores (SD) and right: HMM scores (SH). Error bars indicate standard error.
3D: 3-dimensional
DTW: dynamic time warping
HMM: hidden Markov model
KEHR: Kinect-enabled home-based rehabilitation system
MIRA: Medical Interactive Recovery Assistant
The project was funded through MedCity’s Collaborate to Innovate program and received funding from the European Regional Development Fund and the Higher Education Funding Council for England. The authors acknowledge the support of MIRA Rehab Ltd as well as members of the Simulation and Modelling in Medicine and Surgery research group at the Imperial College London.
None declared.