Our project sets out to solve three problems that modern fitness applications typically have. Firstly, our application will make fitness insightful. We will keep a variety of statistics for each user to provide invaluable information about their fitness goals and their rate of progression. This makes each user’s fitness journey highly personalized.
The second problem we are attempting to solve is the consistency that physical fitness requires. Making it to the gym multiple times a week is a daunting task for busy individuals. This is where our application comes in: with it, you can exercise from any location where you have a computer and a webcam.
The last problem is that, for most people, exercising is difficult. Our application aims to make it easier through two key features. First, it monitors your exercises and checks that you are performing them correctly. Second, it tracks your live progress and rest times throughout a workout. Together, these features can make fitness considerably easier for those who find it difficult.
Convolutional Neural Networks
The traditional neural network is a powerful tool that can be used in a variety of ways, but it has limitations; one such limitation is the extremely costly computation required for large inputs such as images. Classifying multiple images in real time with millions, potentially billions, of parameters (depending on the resolution of the images and the number of layers and neurons used) is not practical. This is where convolutional neural networks (CNNs) come in handy. CNNs do three specific things to make image classification practical:
- Reduce the number of input nodes and parameters.
- Tolerate small shifts in where pixels are located in an image.
- Take advantage of the correlations that are observed in complex images.
The first thing a CNN does is apply a filter to the input image. Filters are also called kernels; this explanation uses the former term. A filter is a small matrix of values, typically around 3 x 3 in size. The filter values are initialized randomly and then learned through backpropagation.
The filter is overlaid onto the image, and the dot product of the filter and the image patch it covers is calculated. A bias is added to the result, and the sum is placed into a feature map.
This operation is what is meant by saying the filter is “convolved” with the input, which gives CNNs their name. The filter is then moved over by one pixel or more (the stride, chosen at the programmer's discretion) and the process is repeated. Once the feature map is finished, we run it through an activation function, typically ReLU. We then max pool the result to reduce its size further. Max pooling slides a window over the feature map and keeps only the maximum value in each window; the window typically does not overlap itself, so a 2 x 2 window moves 2 pixels at a time. Max pooling therefore keeps the spots where the filter best matched the input image. Alternatively, mean pooling can be used, which takes the mean of each window rather than the maximum.
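The steps described above (filter, bias, feature map, ReLU, max pooling) can be sketched in a few lines of NumPy. This is only an illustration under simple assumptions: a single-channel image, no padding, and hypothetical function names (`convolve2d`, `relu`, `max_pool`); it is not the code used in our network.

```python
import numpy as np

def convolve2d(image, kernel, bias=0.0, stride=1):
    """Slide `kernel` over `image` and build a feature map (no padding)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            # Dot product of the filter and the patch it covers, plus the bias.
            feature_map[i, j] = np.sum(patch * kernel) + bias
    return feature_map

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: the window moves `size` pixels each step."""
    out_h = feature_map.shape[0] // size
    out_w = feature_map.shape[1] // size
    trimmed = feature_map[:out_h * size, :out_w * size]
    windows = trimmed.reshape(out_h, size, out_w, size)
    return windows.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((6, 6))
kernel = rng.standard_normal((3, 3))  # filter values start out random
pooled = max_pool(relu(convolve2d(image, kernel, bias=0.1)))
print(pooled.shape)  # (2, 2)
```

A 6 x 6 image convolved with a 3 x 3 filter at stride 1 gives a 4 x 4 feature map, which 2 x 2 max pooling reduces to 2 x 2.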
Now we can either use the output of the final convolutional layer directly or flatten it into an n x 1 vector that is fed into dense layers, which produce the desired output for the task the network is designed for.
Our application is a basic program that captures the video feed from a user’s device. The video feed is passed to a component that generates a wireframe of the human body. The wireframe video is then sent to our state-of-the-art neural network, which determines whether the wireframe is moving in a way that corresponds to a correct push-up. If the network determines that the user has done a correct push-up, it increases the user’s score. We keep track of the number of correct push-ups the user has done and save that information to our database under the user’s credentials. The database sits behind an API that allows us to select and insert information easily and securely. Our website is also connected to this API and displays a user’s information based on their login credentials.
Pose Estimation Model Details
Our model architecture for pose estimation is a fully convolutional neural network, meaning that the output comes from a convolutional layer and is never flattened into a dense layer. As a result, the output is a set of 16 heatmaps of size 64 x 64 (16 x 64 x 64). The main building block of the network is a module called an hourglass module.
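One common way to turn a 16 x 64 x 64 heatmap stack into joint locations is to take the highest-scoring pixel of each heatmap. The sketch below assumes that approach; the function name `heatmaps_to_keypoints` is hypothetical, and the decoding step in our application may differ.

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    """Return (x, y, score) for each joint from a (joints, H, W) heatmap stack."""
    num_joints, height, width = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)
    best = flat.argmax(axis=1)                       # index of each heatmap's peak
    ys, xs = np.unravel_index(best, (height, width))
    scores = flat[np.arange(num_joints), best]
    return np.stack([xs, ys, scores], axis=1)

heatmaps = np.zeros((16, 64, 64))
heatmaps[0, 10, 20] = 0.9                            # joint 0 peaks at row 10, column 20
keypoints = heatmaps_to_keypoints(heatmaps)
print(keypoints[0])                                  # x = 20, y = 10, score = 0.9
```

The peak value can double as a confidence score, which is useful when deciding whether a joint is visible at all.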
There are numerous residual connections inside, before, and after each hourglass module. Inside each hourglass, features are downsampled and then upsampled, so that they are processed and consolidated at different resolutions. A smaller residual module is used as a repeated block within each hourglass. In addition to the residual connection around each residual block, there are residual connections from the encoder side to the decoder side of the hourglass. Each of these takes the output of a lower-resolution portion of the encoder, upsamples it, and adds it to its corresponding layer on the decoder side of the hourglass.
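The downsample, upsample, and add pattern can be illustrated with a recursive NumPy sketch. It assumes a single-channel feature map, a placeholder `residual_block` with no learned weights, and 2 x 2 max pooling with nearest-neighbour upsampling; all names here are hypothetical, and the sketch only shows how the resolutions line up, not our trained module.

```python
import numpy as np

def residual_block(x):
    # Stand-in for a learned residual module: output = input + f(input).
    return x + 0.1 * np.maximum(x, 0.0)

def downsample(x):
    # 2 x 2 max pooling halves the resolution.
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:h * 2, :w * 2].reshape(h, 2, w, 2).max(axis=(1, 3))

def upsample(x):
    # Nearest-neighbour upsampling doubles the resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def hourglass(x, depth):
    skip = residual_block(x)             # branch kept at the current resolution
    low = residual_block(downsample(x))  # encoder step at half resolution
    if depth > 1:
        low = hourglass(low, depth - 1)  # recurse to lower resolutions
    else:
        low = residual_block(low)        # lowest-resolution bottleneck
    low = residual_block(low)            # decoder step
    return skip + upsample(low)          # merge upsampled features with the skip branch

features = np.random.default_rng(0).random((64, 64))
out = hourglass(features, depth=4)
print(out.shape)  # (64, 64)
```

With a depth of 4, a 64 x 64 input is processed at 32, 16, 8, and 4 pixels before being brought back to its original size.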
A single hourglass module in a network already produces strong results. Our network goes further by stacking multiple hourglasses back to back, so that later hourglasses can re-evaluate the initial predictions of earlier ones. Stacking also allows us to apply a technique that is crucial to the success of the network: intermediate supervision. Between hourglasses, we map the output to 16 heatmaps so that a loss can be applied. The heatmaps are then mapped back to the original number of output filters and added to both the residual connection from before the current hourglass and the unmapped output of the hourglass.
Our network uses 8 stacked hourglass modules, giving 7 points of intermediate supervision and 1 final output to which we also apply loss. We trained with the Adam optimizer and MSE loss.
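Intermediate supervision can be sketched as one MSE term per stack, all measured against the same ground-truth heatmaps. The example below assumes the per-stack losses are simply summed with equal weight, which is a common choice but not something stated above; `supervised_loss` is a hypothetical name.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two heatmap stacks."""
    return float(np.mean((pred - target) ** 2))

def supervised_loss(stack_outputs, target):
    """Sum the MSE of every stack's heatmaps against the same ground truth."""
    return sum(mse(pred, target) for pred in stack_outputs)

rng = np.random.default_rng(0)
target = rng.random((16, 64, 64))
# Eight hypothetical stack predictions: 7 intermediate plus the final output,
# each off by a constant 0.1 so that every stack contributes an MSE of 0.01.
stack_outputs = [target + 0.1 for _ in range(8)]
loss = supervised_loss(stack_outputs, target)
print(round(loss, 4))  # 0.08
```

Because every stack contributes to the loss, earlier hourglasses receive a direct training signal instead of relying only on gradients from the final output.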
Exercise Recognition Experimentation
To detect exercises and check that they are performed correctly, the team used angles and trigonometric functions. Given the points generated by the wireframe model, trigonometric functions determine the angles at certain locations on the body. Specific angle ranges at the correct locations help identify the exercise and how well it is being done. Some exercises additionally require checking the distance between two points. The application currently checks for three exercises: push-ups, sit-ups, and squats. Each exercise is checked in its own method, and all three methods are called together so that the exercises are checked at the same time. If one of the three exercises is identified, the counter for that exercise is increased. Below are a few examples of the initial test angles that were used and built upon.
<Insert initial_squat.jpg, initial_situp.jpg, & initial_pushup.jpg in column format>
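The angle-based approach can be illustrated with a small sketch that computes the angle at a joint from three wireframe points and counts a repetition when that angle passes through a low and then a high threshold. The names `joint_angle` and `RepCounter` and the threshold values are assumptions made for this example, not the values used in our application.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    angle_ba = math.atan2(a[1] - b[1], a[0] - b[0])
    angle_bc = math.atan2(c[1] - b[1], c[0] - b[0])
    degrees = abs(math.degrees(angle_ba - angle_bc))
    return 360.0 - degrees if degrees > 180.0 else degrees

class RepCounter:
    """Counts a repetition when the angle drops below `low`, then rises above `high`."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.is_down = False
        self.count = 0

    def update(self, angle):
        if angle < self.low:
            self.is_down = True          # bottom of the movement reached
        elif angle > self.high and self.is_down:
            self.is_down = False         # back at the top: one full repetition
            self.count += 1
        return self.count

# Hypothetical squat thresholds on the hip-knee-ankle angle.
squats = RepCounter(low=100.0, high=160.0)
for knee_angle in [170, 150, 95, 120, 165, 172, 90, 168]:
    squats.update(knee_angle)
print(squats.count)  # 2
```

Using two thresholds instead of one keeps small jitters in the wireframe from being counted as extra repetitions.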
In the most up-to-date version of the exercise recognition, a few major features were added. First, a user interface was overlaid to make it more convenient to see the current count. Second, exercises can now be recognized regardless of which side of the body is facing the camera. Third, an “isInFrame” method was created to ensure that enough of the body is in frame for the wireframe to map onto it properly, which reduces the likelihood of error. Minor “checks” were also implemented to ensure a smooth exercise for the user.
In the end, it comes down to a balancing act. With too many checks, nothing is recognized because the wireframe is not stable enough to pass them all. With too few checks, nothing is recognized accurately.
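An in-frame check of the kind described above could look like the sketch below. It assumes each keypoint is an (x, y, score) tuple and that both the confidence threshold and the pixel margin are tunable; the function body and default values are illustrative guesses, not the actual “isInFrame” implementation.

```python
def is_in_frame(keypoints, width, height, min_score=0.5, margin=5):
    """True when every keypoint is confident enough and inside the frame."""
    for x, y, score in keypoints:
        if score < min_score:
            return False                 # joint not detected reliably
        inside_x = margin <= x < width - margin
        inside_y = margin <= y < height - margin
        if not (inside_x and inside_y):
            return False                 # joint too close to the edge or outside
    return True

body = [(32, 20, 0.9), (30, 40, 0.8), (34, 58, 0.7)]
print(is_in_frame(body, width=64, height=64))  # True
```

Raising `min_score` or `margin` makes the check stricter, which is one place where the balance between too many and too few checks shows up.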
There are four parts to our website development: registration, login, personal statistics, and global statistics. Registration was accomplished by storing a unique username and password hash in the database. If the passwords provided during registration do not match, the website does not allow the user to register until they do. If the username already exists in the database, the website notifies the user that the username is already taken.
Login was accomplished by taking the username and password entered by the user, passing the password to a hash function, and checking whether there is a row in the table with that username and password hash. If the query returns no result, the website displays “username and/or password is incorrect” to the user. If a user with the given username and password hash exists, the API returns a User object with the username and user ID and a Session object with the username, user ID, and session hash. The session information is inserted into the database for future use, and the website writes the returned session object to local memory to use for future automatic login attempts.
When the website is closed and reopened, if a session exists in local memory, the website sends a request to log in with that session. If the database contains a session with the provided user ID, username, and session hash, the login is considered valid. The API then generates a new session object, with a new session hash, username, and user ID, and sends it back to the website, which updates the session object stored in memory with the new session information. Logging out clears the saved session object so that there is no automatic login attempt once the page is reopened.
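The registration, login, and session renewal flow can be sketched with Python's standard `sqlite3`, `hashlib`, and `secrets` modules. Everything specific here is an assumption made for illustration: the table and column names, the use of SHA-256 as the hash function, and the choice to delete the old session when a new one is issued.

```python
import hashlib
import secrets
import sqlite3

def hash_password(password):
    # SHA-256 keeps the sketch short; the actual hash function is an assumption.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def create_tables(db):
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, "
               "username TEXT UNIQUE, password_hash TEXT)")
    db.execute("CREATE TABLE sessions (user_id INTEGER, username TEXT, session_hash TEXT)")

def register(db, username, password, confirm):
    if password != confirm:
        return "passwords do not match"
    try:
        db.execute("INSERT INTO users (username, password_hash) VALUES (?, ?)",
                   (username, hash_password(password)))
    except sqlite3.IntegrityError:       # UNIQUE constraint on username
        return "username is already taken"
    return "ok"

def new_session(db, user_id, username):
    session = {"user_id": user_id, "username": username,
               "session_hash": secrets.token_hex(32)}
    db.execute("INSERT INTO sessions VALUES (?, ?, ?)",
               (user_id, username, session["session_hash"]))
    return session

def login(db, username, password):
    row = db.execute("SELECT id FROM users WHERE username = ? AND password_hash = ?",
                     (username, hash_password(password))).fetchone()
    if row is None:
        return None                      # caller shows the "incorrect" message
    return new_session(db, row[0], username)

def login_with_session(db, session):
    row = db.execute(
        "SELECT 1 FROM sessions WHERE user_id = ? AND username = ? AND session_hash = ?",
        (session["user_id"], session["username"], session["session_hash"])).fetchone()
    if row is None:
        return None
    # Assumed behavior: the old session is removed when a fresh one is issued.
    db.execute("DELETE FROM sessions WHERE session_hash = ?", (session["session_hash"],))
    return new_session(db, session["user_id"], session["username"])

db = sqlite3.connect(":memory:")
create_tables(db)
register(db, "alex", "s3cret", "s3cret")
session = login(db, "alex", "s3cret")
renewed = login_with_session(db, session)
```

Issuing a new session hash on every automatic login limits how long a copied session remains useful. A hardened version would typically replace the plain hash with a salted key-derivation function such as `hashlib.pbkdf2_hmac`, looking up the stored hash by username and verifying it rather than matching it in the query.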
Personal statistics are determined by the user ID of the currently logged-in user. They are gathered and displayed automatically whenever the user is logged in. If the Workouts table contains workouts belonging to that user ID, the API returns those rows to the website to be displayed.
Global statistics is a table with a user on each row. The table has the total number of each exercise each user has done and is sorted in descending order of total exercises completed. This table is automatically refreshed when the website is opened.
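The global statistics table can be sketched as an aggregation over workout rows followed by a descending sort on the total. The row layout below (a `username` plus one count per exercise) is an assumption for this example and may not match the actual Workouts table.

```python
from collections import defaultdict

EXERCISES = ("push_ups", "sit_ups", "squats")

def global_statistics(workouts):
    """Aggregate per-user exercise totals and sort by overall total, descending."""
    totals = defaultdict(lambda: dict.fromkeys(EXERCISES, 0))
    for workout in workouts:
        for exercise in EXERCISES:
            totals[workout["username"]][exercise] += workout[exercise]
    rows = [{"username": user, **counts, "total": sum(counts.values())}
            for user, counts in totals.items()]
    return sorted(rows, key=lambda row: row["total"], reverse=True)

workouts = [
    {"username": "alex", "push_ups": 10, "sit_ups": 5, "squats": 0},
    {"username": "bo", "push_ups": 20, "sit_ups": 10, "squats": 15},
    {"username": "alex", "push_ups": 5, "sit_ups": 0, "squats": 10},
]
for row in global_statistics(workouts):
    print(row["username"], row["total"])
```

In a deployed system the same result would more likely come from a `GROUP BY` query inside the API, with the website only rendering the sorted rows.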
Python GUI Application Development
The Python application is the user’s front end to the wireframe model and exercise recognition functions. The user can also register, log in, and connect to the website through this GUI.
The application window is fixed at 700 x 510 because there is not much for the user to interact with, so taking up the full screen would only create unused space. The application also takes advantage of stacked widgets. A widget represents a front-end page, and each widget can be swapped in and out for viewing and interaction. In total there are five widgets: HomeWindow, RegisterWindow, WelcomeWindow, ResultWindow, and Result2Window. There are two result windows because a function within the result window can lead immediately back to the result window itself. To ensure that everything (e.g., exercise scores) is properly reloaded, that function switches to the second, otherwise identical result window. The two windows alternate for as long as the user is using the exercise application. All widgets connect and transition smoothly with one another.
The registration, login, and exercise data submission functions all rely on the Python API, which has methods set up specifically to send and receive the required information. All error handling is done locally in the application rather than in the back-end API or elsewhere in the process.
- Add Additional Exercises
- Compatible Mobile Development
- Improve Feedback Analytics
Alana Matheny: Alana Matheny is a Computer Science major with a concentration in Data Science and Artificial Intelligence and a minor in Mathematics in the department of Computer Science and Engineering at the University of Arkansas – Fort Smith. She has completed relevant coursework for the proposed project, including CS 3113 – Artificial Intelligence, CS 2033 – Web Systems, CS 2043 – Database Systems II, CS 3003 – Distributed Systems, CS 4003 – Software Engineering, CS 4033 – Ethics in Professional Practice, CS 3323 – Computer Graphics, CS 4043 – Formal Languages, CS 4323 – Data Analytics, and CS 4373 – Information Retrieval. Her responsibilities will include the development of a machine learning model, front-end development of the website, front-end development of the application, back-end development of the database, and back-end development of the API.
Sasha Lawson: Sasha Lawson is a Computer Science major with a concentration in Data Science and Artificial Intelligence and a minor in Mathematics in the department of Computer Science and Engineering at the University of Arkansas – Fort Smith. He has completed relevant coursework for the proposed project, including CS 3113 – Artificial Intelligence, CS 3323 – Computer Graphics, and CS 4333 – Machine Learning. He also obtained relevant experience through his internship at ArcBest Technologies as a Software Developer Intern. His responsibilities will include the development of a machine learning model to learn the proper actions for a given exercise. Additionally, he will develop the required back-end systems to support the dashboard webpage.
Sam Donaldson: Sam Donaldson is a Computer Science major with a concentration in Data Science and Artificial Intelligence and a minor in Mathematics in the department of Computer Science and Engineering at the University of Arkansas – Fort Smith. He has completed relevant coursework for the proposed project, including CS 3113 – Artificial Intelligence, CS 4343 – Natural Language Processing, CS 2033 – Web Systems, CS 4003 – Software Engineering, CS 3323 – Computer Graphics, and CS 2043 – Database Systems II. He also obtained relevant experience through his internship at ArcBest Technologies as a Software Developer Intern. His responsibilities will be the development of a machine learning model, front-end development of the website, and back-end development of the database and API.
Noah Buchanan: Noah Buchanan is a Computer Science major with a concentration in Data Science and Artificial Intelligence and a minor in Mathematics in the department of Computer Science and Engineering at the University of Arkansas – Fort Smith. He has completed relevant coursework for the proposed project, including CS 3113 – Artificial Intelligence, CS 4333 – Machine Learning, CS 2033 – Web Systems, CS 4003 – Software Engineering, CS 2043 – Database Systems II, CS 4373 – Information Retrieval, and CS 4363 – Internet of Things Development. His responsibilities will include the development of our proposed model, front-end development, back-end development, and development of our API.
Our project was supervised by Professor Israel Cuevas and Professor Andrew Mackey in the department of Computer Science and Engineering at the University of Arkansas – Fort Smith.