BeagleBoard/GSoC/2022 Proposal/Running Machine Learning Models on Bela


Running Machine Learning Models on Bela

Student: Ezra Pierce
Mentors: Jack Armitage, Victor Shepardson
Proposal:[1]

Proposal

All requirements listed on the ideas page have been completed. The PR for the cross-compilation task can be found here.

Status

This project is currently just a proposal.

About you

Github: ezrapierce000
School: [Carleton University]
Country: Canada
Primary language: English
Typical work hours: 9AM-6PM Eastern Standard Time
Previous GSoC participation: This would be my first time participating in GSoC.

About your project

Project name: Running Machine Learning Models on Bela

Introduction

The goal of this project is to improve the tooling for embedded machine learning on the BeagleBone Black (BBB)/Bela, helping its community experiment with machine learning in their projects. The developer tools chosen for this project are an inference benchmarking tool and a perf-based profiler, both developed for the BBB/Bela platform.

Bela is a platform built upon the BeagleBone Black, consisting of an audio cape and a custom real-time Linux image using the Xenomai framework. This platform provides a low-latency computing environment ideal for use in audio applications. There already exists a large community surrounding the Bela, as it is an increasingly popular platform for use in educational settings as well as musical instrument design and maker communities. This project aims to extend the Bela platform to include tools and documentation for machine learning projects, with the goal of simplifying the process of integrating machine learning models into real-time embedded Bela projects. As the Bela platform has been adopted by a wide range of users, from artists to engineers, this project will aim to provide tooling that caters to this broad userbase.

The use of machine learning in instrument design has grown in recent years, yet there have been few implementations in resource-constrained embedded contexts like the BBB/Bela. This can be attributed to the computational cost of machine learning, with many typical applications requiring GPUs, TPUs or other custom hardware accelerators. However, with growing industry interest in edge computing, an increasing number of projects, such as TinyML and TFLite, aim to optimize the whole machine learning pipeline for embedded devices. This project aims to leverage tools such as these to give Bela users the ability to deploy ML models to their devices.

One of the current challenges in doing so is the real-time nature of audio projects on the Bela, which is a key factor when developing instruments or interactive sensor systems. This imposes a strict latency requirement on any model being run, which in turn calls for performance analysis tools that can measure ML models and give the user feedback on the runtime costs their models incur. Thus, this project's main focus will be the development of performance analysis tools for running machine learning models on the Bela, in the form of a benchmarking tool and a profiling tool. The benchmarking tool will take latency, memory and accuracy measurements, and is meant for comparing different ML runtime components, model architectures and/or compilers. The profiler will be used to pinpoint bottlenecks during model development, allowing developers to discover slow operators and view CPU utilization. In addition to these tools, this project will also aim to build some example projects that document the setup of ML projects on the BBB/Bela, providing a starting point for users looking to explore this space. During development of both tools, a focus will be put on keeping all code portable to allow for use on future BeagleBoard/Bela platforms.

Implementation

ML Stack

This project will focus on a specific modeling language (PyTorch) and platform (BBB+Bela). In between there are a number of potential model formats, compiler/runtime frontends, and backend components. The analysis tools built during this project will aim to support multiple runtime frontends and backends to allow developers to compare performance results between them.

Summary of stack:

  • Modeling language: PyTorch (+ TensorFlow for converting to TFLite)
  • Model format: ONNX, TorchScript (+ TFLite)
  • Runtime frontends: libtorch, ONNX Runtime, SOFIE (+ TFLite)
  • Runtime backend components: ArmNN, XNNPACK, Eigen, BLAS
  • OS + Hardware: Bela + BBB

Some NN compiler projects will also be audited for potential BBB support:

Benchmarking Tool

This project will provide both a benchmarking tool and a profiling tool to be used to evaluate machine learning models on the BBB/Bela. The benchmarking tool will provide the following measurements:

  • Average latency
  • Maximum latency
  • Latency jitter
  • Average memory usage
  • Maximum memory usage
  • Accuracy

This will be done by providing a common frontend for the pre-existing frontends listed above, allowing developers to choose which runtime components they would like to test. This common frontend will take a latency measurement at each inference, while the benchmarking tool concurrently samples memory usage from a separate thread to produce the average and maximum memory measurements. The benchmarking tool should also allow developers to provide test data for accuracy measurements. The tool will also facilitate the loading of a model from the developer's host PC to the BBB/Bela, supporting both TorchScript and ONNX models.
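To make the latency measurements above concrete, here is a minimal sketch of how per-inference timing and the three latency statistics could be computed. It is an illustration only: the function name `benchmark_latency`, the `dummy_model` stand-in and the warm-up count are hypothetical, and jitter is assumed here to mean the standard deviation of the per-inference latency (peak-to-peak would be another reasonable definition).

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def benchmark_latency(
    infer: Callable[[Sequence[float]], Sequence[float]],
    inputs: Sequence[Sequence[float]],
    warmup: int = 5,
) -> Dict[str, float]:
    """Time each call to `infer` and summarise the latencies in microseconds."""
    # Warm-up runs are discarded so one-off costs (lazy allocation, cold
    # caches) do not skew the statistics.
    for x in inputs[:warmup]:
        infer(x)

    latencies_us: List[float] = []
    for x in inputs:
        start = time.perf_counter_ns()
        infer(x)
        latencies_us.append((time.perf_counter_ns() - start) / 1000.0)

    return {
        "mean_us": statistics.fmean(latencies_us),
        "max_us": max(latencies_us),
        # Assumption: jitter is reported as the population standard deviation.
        "jitter_us": statistics.pstdev(latencies_us),
    }


def dummy_model(x: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for a call into a real runtime frontend.
    return [2.0 * v for v in x]


if __name__ == "__main__":
    data = [[float(i)] * 64 for i in range(100)]
    print(benchmark_latency(dummy_model, data))
```

Because `infer` is just a callable, each runtime frontend could be wrapped behind the same signature and measured with identical code, which is the point of the common frontend.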

The benchmarking tool on the BBB/Bela will be written in C++ with a simple Python tool on the host PC for model loading and communication between the developer's PC and the BBB/Bela.
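The concurrent memory sampling mentioned above can be sketched as follows. The on-device tool is planned in C++, so this Python version only illustrates the idea; the class name `MemorySampler`, the helper `read_rss_kib` and the 10 ms sampling interval are assumptions made for this example. The default reader parses the `VmRSS` field of `/proc/self/status`, which is available on Linux only.

```python
import threading
from typing import Callable, List, Optional


def read_rss_kib() -> int:
    """Return this process's resident set size in KiB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


class MemorySampler:
    """Polls a memory reader from a background thread while work runs."""

    def __init__(
        self,
        reader: Callable[[], int] = read_rss_kib,
        interval_s: float = 0.01,
    ) -> None:
        self._reader = reader
        self._interval_s = interval_s
        self._samples: List[int] = []
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _run(self) -> None:
        # Take one sample immediately, then one per interval until stopped,
        # so at least one sample always exists after the thread is joined.
        while True:
            self._samples.append(self._reader())
            if self._stop.wait(self._interval_s):
                break

    def __enter__(self) -> "MemorySampler":
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    @property
    def mean(self) -> float:
        return sum(self._samples) / len(self._samples)

    @property
    def peak(self) -> int:
        return max(self._samples)
```

Wrapping the inference loop in `with MemorySampler() as mem:` and reading `mem.mean` and `mem.peak` afterwards would give the average and maximum memory figures. Sampling can miss short-lived peaks between polls, so the interval is a trade-off between resolution and overhead.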


[Figure: GsocDiagram2.png]

Profiling Tool

The profiling tool will aim to provide a graphical interface that displays CPU cycles per function call, thread utilization and the call stack. This tool will be built around the perf Linux utility, which is a statistical profiler based on CPU performance counters. To provide a more intuitive interface, this project will build a simple local webserver (similar to the Bela IDE or perhaps integrated into the Bela IDE) that will display the captured data in visual form. This will be done using the pprof profiling visualizer and the perf_data_converter tool. As an alternative, the perf-based hotspot tool will also be evaluated for use in this project. The profiler will have to run in a Linux thread as opposed to a real-time Xenomai thread, but the results should still be applicable for supporting model optimization work. Optimizations can then be tested with the benchmarking tool in a real-time Xenomai thread.

Figure: Example flamegraph from the hotspot profiling tool.
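As a rough illustration of what a flamegraph summarizes, sampled call stacks are often exchanged in a "folded" text form: one line per unique stack, with frames separated by semicolons and followed by a sample count. Assuming the perf data has been converted to that form, the sketch below totals samples per function. The function name `aggregate_folded` and all stack data are invented for this example and are not measurements from a Bela.

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate_folded(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (self_samples, total_samples) per function from folded stacks."""
    self_samples: Counter = Counter()
    total_samples: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Each line is "frame;frame;...;frame <count>".
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        frames = stack.split(";")
        # The last frame is the function that was executing when sampled.
        self_samples[frames[-1]] += count
        # Count each function once per stack so recursion is not double counted.
        for frame in set(frames):
            total_samples[frame] += count
    return self_samples, total_samples


# Invented sample data for illustration only.
EXAMPLE = """\
render;model_forward;conv1d 120
render;model_forward;lstm_cell;matmul 310
render;model_forward;lstm_cell;tanh 45
render;audio_io 25
"""

if __name__ == "__main__":
    self_s, total_s = aggregate_folded(EXAMPLE.splitlines())
    all_samples = sum(self_s.values())
    for name, n in self_s.most_common(3):
        print(f"{name:12s} {100.0 * n / all_samples:5.1f}% self")
```

Self samples point at the operators where time is actually spent, while total samples show which higher-level blocks contain them; a flamegraph presents both at once.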

Model Selection

Much work on embedded ML focuses on a narrow application domain (such as image classification) and a small set of canonical model architectures. But Digital Musical Instrument (DMI) design demands creative ways of processing data of various shapes and modalities, from high sample rate audio streams to heterogeneous sensor inputs. Rather than optimize for a particular method or use-case, this project will aim for wide coverage of PyTorch operators and fine-grained visibility into their performance. To that end, evaluation will focus on composable neural network blocks which are relevant to ML tasks like sequence modeling, classification and variational autoencoding (the building-blocks of musical applications like gesture recognition, audio synthesis and control mapping). These will include:

  • matrix-vector product (at various sizes, with quantization, sparsity)
  • 1D convolutional networks (with groups and dilation)
  • memory-cell RNNs (LSTM, GRU)
  • multi-layer perceptrons (with various activation functions)
  • transformer blocks (dynamic input sizes, batch normalization)
  • mixture-density heads (testing various elementwise, shape and reduction ops)
  • reparameterized sampling and KL divergence for normal distributions
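To illustrate the last item in the list, here is a small plain-Python sketch of reparameterized sampling and the closed-form KL divergence between a diagonal normal distribution and the standard normal. In the project itself these would be PyTorch operators; the function names and the log-variance parameterization used here are assumptions chosen for the example.

```python
import math
import random
from typing import List, Sequence


def reparameterized_sample(
    mu: Sequence[float], log_var: Sequence[float], rng: random.Random
) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    # sigma = exp(0.5 * log_var); the randomness lives only in eps, so in an
    # autodiff framework gradients can flow through mu and log_var.
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions."""
    # Per dimension: 0.5 * (mu^2 + sigma^2 - 1 - log(sigma^2)).
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -1.0], [0.0, -2.0]
    print(reparameterized_sample(mu, log_var, rng))
    print(kl_to_standard_normal(mu, log_var))
```

Even this small block exercises elementwise multiply, exponential, random number generation and a reduction, which is why such blocks are useful for checking operator coverage across runtimes.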

Example Projects

As an additional goal, if time permits, this project will also develop some example projects in Python and for Bela. These projects will serve as a learning tool for people looking to explore embedded ML on the BBB/Bela. They will cover the installation, configuration and use of the relevant tools as well as provide example code for building, training and running ML models on the BBB/Bela. They will also of course provide documentation for the use of the tools developed during this project. Based on the preliminary results recorded during this project, more specific example projects could be developed to target some potential use cases such as gesture recognition and mapping, neural audio synthesis or dimensionality reduction of incoming sensor data. These more targeted example projects would be valuable in providing concrete, applicable examples for the Bela community, inspiring new ideas and further development.

The development of example projects will also allow for the refinement of the performance analysis tools as it may inform the development of new features when used in real-world practice.

Timeline

Date Status Details
Presubmission
  • Research current embedded AI benchmarking tools, evaluate their feasibility for Bela-specific use cases
  • Research embedded Linux profiling techniques, options for visualizing profiling data
  • Background reading on the use of Machine Learning in musical instrument design
  • Background reading on currently available embedded AI libraries
May 20th - June 12th Community Bonding
June 13th Milestone #1
  • Compilation of different components on Bela, libtorch, TFLite, ArmNN
  • Start coding benchmarking suite for use in Bela projects
  • Implement simple Python library for loading models & test data from host PC to Bela and communicating benchmarking results from Bela to host PC
June 20th Milestone #2
  • Test out both pprof and hotspot profilers
  • Finish model loading feature of benchmarking tool
June 27th Milestone #3
  • Implement SOFIE wrapper for benchmarking tool
  • Implement libtorch wrapper for benchmarking suite
  • Research compatibility of different neural network compilers with the BBB/Bela
July 4th Milestone #4
  • Finish latency measurement feature of the benchmarking tool
  • Run benchmarking latency test on MLP model
July 11th Milestone #5
  • Finish accuracy feature of benchmarking suite
  • Begin working on profiler/benchmarking GUI
  • Build and run benchmarking on 1-D convolutional network model
July 18th Milestone #6
  • Finish memory usage feature of the benchmarking suite
  • Document findings thus far in blog post
July 25th Milestone #7
  • Submit Phase 1 evaluation
  • Finish work on profiler
  • Build, benchmark and profile memory-cell RNN model, document bottlenecks using profiling data
August 1st Milestone #8
  • Build, benchmark and profile transformer block model, document bottlenecks using profiling data
  • Tweak profiler/benchmarking GUI
August 8th Milestone #9
  • Build, benchmark and profile mixture density head, document bottlenecks using profiling data
  • Evaluate documented bottlenecks from past weeks and research potential optimization techniques
August 15th Milestone #10
  • Begin writing example project(s) based on most promising model(s)
  • Work on implementing any optimization techniques
August 22nd Milestone #11
  • Finish writing example project(s)
  • Build, benchmark and profile optimized model(s), compare benchmarks against past iteration(s)
August 29th Milestone #12
  • Submit final work product and final mentor evaluation
  • Complete YouTube video
  • Document work done, results and next steps in blog post
Sep. 5th Milestone #13
  • Completion of GSoC

Experience and approach

Through coursework and multiple co-op terms in industry, I've gained experience relevant to this project such as:

  • Benchmarked hardware peripherals on an embedded Linux system (RPi CM4) and TI C2000 platform for high-speed binary data transfer
  • Developed features and fixed bugs in C for embedded Linux TCP server used for sensor data acquisition
  • Built multiple Python testing systems for various software systems and hardware calibration protocols
  • Completed labs in an Intro to Machine Learning course using Keras to build, train and test models
  • Designed and implemented an audio plugin in C++ for translating audio data into haptic signals in real-time
  • Designed and implemented firmware for the Pi Pico in C++ and CircuitPython to interface with different connected modules using I2S, SPI & PWM peripherals

I believe the skills outlined above give me a strong technical base to draw from during this project, yet I am sure to come across new challenges and gaps in my knowledge along the way. To help navigate this, I will meet weekly with my mentors to discuss progress on the weekly milestones as well as new ideas or roadblocks. Both mentors have significant domain expertise in relevant areas such as Bela development, ML research and instrument design, making them a good source of advice and support during this project.

Contingency

If more support is needed during this project I'll reach out for help from various involved communities such as the BBB Slack chat, the Bela forum, the iil.is Discord or the PyTorch forum.

While writing this proposal I have also amassed some resources that may be useful during the project:

Benefit

This project will provide multiple benefits. Firstly, it will improve the development ecosystem surrounding the BBB/Bela by providing a new tool to measure the performance of different models. This will help those researching ML for use in embedded musical instrument design speed up their iteration cycle by providing measurements directly from the target hardware. In tandem with this benefit, the project will also give researchers and developers the ability to dig deeper into the details of their implementation and examine potential bottlenecks at the level of CPU cycles. This will greatly improve the understanding of what types of model architectures could be possible on this platform, help make the most of the computational resources available on the BBB and motivate future optimization work. Finally, this project will improve access to embedded ML on the BBB/Bela and potential future platforms like the BBAI. This will benefit instrument designers, artists and makers by providing them with example projects and documented tools, enabling new explorations of the applications of embedded machine learning.

I think flexible modeling languages like PyTorch, which let researchers iterate quickly at multiple levels of abstraction, have a lot to do with recent advances in machine learning. Being able to define and train models in PyTorch and move them quickly onto Bela, with visibility into their performance characteristics, would vastly accelerate efforts to use deep learning in an embedded musical context.

~ Victor Shepardson

Programming embedded systems for real-time musical interaction is extremely difficult, despite the existence of ground-breaking projects like Bela. Embedded machine learning has the potential to change that, allowing developers and musicians to take advantage of the entire ML ecosystem, to teach instruments to recognise gestures, synthesise unique sounds, and much else. This project would represent a major first step towards unlocking that potential.

~ Jack Armitage
