Difference between revisions of "BeagleBoard/GSoC/2022 Proposal/Running Machine Learning Models on Bela"
Revision as of 21:44, 18 April 2022
Running Machine Learning Models on Bela
Student: Ezra Pierce
Mentors: Jack Armitage, Victor Shepardson
Proposal:[1]
Proposal
All requirements listed on the ideas page have been completed. The PR for the cross-compilation task can be found here.
Status
This project is currently just a proposal.
About you
Github: ezrapierce000
School: Carleton University
Country: Canada
Primary language: English
Typical work hours: 9AM-6PM Eastern Standard Time
Previous GSoC participation: This would be my first time participating in GSoC.
About your project
Project name: Running Machine Learning Models on Bela
Introduction
The goal of this project is to improve the tooling surrounding embedded machine learning on the BeagleBone Black (BBB)/Bela to aid its community in experimenting with machine learning applications for their projects. The specific developer tools chosen for this project are an inference benchmarking tool and a perf-based profiler developed for the BBB/Bela platform.
Bela is a platform built upon the BeagleBone Black and consists of an audio cape and a custom real-time Linux image using the Xenomai framework. This platform provides a low-latency computing environment ideal for use in audio applications. There is already a large community surrounding the Bela, as it is an increasingly popular platform in educational settings as well as in musical instrument design and maker communities. This project aims to extend the Bela platform to include tools and documentation for machine learning projects, with the goal of simplifying the process of integrating machine learning models into embedded real-time Bela projects.
The use of machine learning in instrument design has grown in recent years, yet there have not been many implementations in more resource-constrained embedded contexts like the BBB/Bela. This can be attributed to the fact that machine learning can be very computationally expensive. The tools developed during this project aim to improve the workflow of those interested in exploring embedded machine learning and reduce some of the barriers that come with it. The benchmarking tool will take latency, memory and accuracy measurements, which are meant for comparing different ML runtime components and/or compilers. The profiler will be used to pinpoint bottlenecks during model development, allowing developers to discover slow operators and view CPU utilization. This project will also build up a model zoo for the BBB/Bela and build some example projects.
Implementation
ML Stack
This project will focus on a specific modeling language (PyTorch) and platform (BBB+Bela). In between, there are a number of potential model formats, compiler/runtime frontends, and backend components. The analysis tools built during this project will aim to support multiple runtime frontends and backends to allow developers to compare performance results between them.
Summary of stack:
- Modeling language: PyTorch (+TensorFlow for converting to TFLite)
- Model format: ONNX, TorchScript (+TFLite)
- Runtime frontends: libtorch, ONNX Runtime, SOFIE (+TFLite)
- Runtime backend components: ArmNN, XNNPACK, Eigen, BLAS
- OS + Hardware: Bela + BBB.
Some NN compiler projects will also be audited for potential BBB support:
- torch-MLIR (https://github.com/llvm/torch-mlir)
- PlaidML (https://plaidml.github.io/plaidml/)
- Glow (https://github.com/pytorch/glow)
- NNC (https://dev-discuss.pytorch.org/t/nnc-walkthrough-how-pytorch-ops-get-fused/125)
- Apache TVM (https://tvm.apache.org/)
- IREE (https://google.github.io/iree/)
Benchmarking Tool
This project will provide both a benchmarking tool and a profiling tool to be used to evaluate machine learning models on the BBB/Bela. The benchmarking tool will provide the following measurements:
- Average latency
- Maximum latency
- Average memory usage
- Maximum memory usage
- Accuracy
This will be done by providing a common frontend for the pre-existing runtime frontends listed above, allowing developers to choose which runtime components they would like to test. This common frontend will take a latency measurement at each inference, while the benchmarking tool concurrently samples memory usage from a separate thread to produce the average and maximum memory measurements. The benchmarking tool should also allow developers to provide test data for accuracy measurements.
The benchmarking tool on the BBB/Bela will be written in C++ with a simple Python tool on the host PC for communication between the developer's PC and the BBB/Bela.
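A minimal sketch of the measurement loop described above, written in Python for brevity even though the on-device tool is planned in C++. The `Runtime` protocol, the `benchmark` function and its parameters are hypothetical names chosen for illustration and are not part of any existing runtime API. Memory is read from `/proc/self/statm`, which assumes Linux, and accuracy measurement is left out.

```python
import os
import statistics
import threading
import time
from typing import Callable, Dict, Iterable, Protocol, Sequence


class Runtime(Protocol):
    """Hypothetical common frontend: every wrapped runtime exposes infer()."""

    def infer(self, x: Sequence[float]) -> Sequence[float]: ...


def read_rss_bytes() -> int:
    """Resident set size of this process, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def benchmark(
    runtime: Runtime,
    inputs: Iterable[Sequence[float]],
    read_memory: Callable[[], int] = read_rss_bytes,
    sample_period_s: float = 0.001,
) -> Dict[str, float]:
    """Time each inference while a second thread samples memory usage."""
    memory_samples = []
    stop = threading.Event()

    def sample_memory() -> None:
        # Always take one sample, then keep polling until the run is over.
        while True:
            memory_samples.append(read_memory())
            if stop.wait(sample_period_s):
                break

    sampler = threading.Thread(target=sample_memory, daemon=True)
    sampler.start()
    latencies_ns = []
    try:
        for x in inputs:
            start = time.perf_counter_ns()
            runtime.infer(x)
            latencies_ns.append(time.perf_counter_ns() - start)
    finally:
        stop.set()
        sampler.join()

    if not latencies_ns:
        raise ValueError("benchmark() needs at least one input")
    return {
        "avg_latency_ns": statistics.fmean(latencies_ns),
        "max_latency_ns": max(latencies_ns),
        "avg_memory_bytes": statistics.fmean(memory_samples),
        "max_memory_bytes": max(memory_samples),
    }
```

Any object with an `infer` method can be passed in, so wrappers around different runtimes would be measured the same way. Passing a different `read_memory` callable lets the sampler be exercised off-device.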
Profiling Tool
The profiling tool will aim to provide a graphical interface that displays CPU cycles per function call, thread utilization and the call stack. This tool will be built around the perf Linux utility, a statistical profiler based on CPU performance counters. To provide a more intuitive interface, this project will build a simple local web server (similar to the Bela IDE, or perhaps integrated into it) that displays the captured data visually. This will be done using the pprof profiling visualizer and the perf_data_converter tool. As an alternative, the perf-based hotspot tool will also be evaluated for use in this project. This tool will have to run in a regular Linux thread rather than a real-time Xenomai thread, but the results should still be applicable to model optimization work. Optimizations can then be tested with the benchmarking tool in a real-time Xenomai thread.
Example flamegraph from the hotspot profiling tool.
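The capture-and-visualize flow could look roughly like the sketch below, which only assembles the command lines rather than running them. It assumes `perf` and `pprof` are installed and that `perf_to_profile` (from perf_data_converter) is on the PATH so that pprof can open `perf.data` directly. The function names, the 99 Hz sampling rate, the port and the `./infer` binary are placeholders, not decisions made in this proposal.

```python
import shlex
from typing import List, Sequence


def perf_record_cmd(
    target: Sequence[str], out: str = "perf.data", freq_hz: int = 99
) -> List[str]:
    """argv that samples `target` with call graphs (-g) at `freq_hz` samples/s."""
    return ["perf", "record", "-F", str(freq_hz), "-g", "-o", out, "--", *target]


def pprof_serve_cmd(
    profile: str = "perf.data", addr: str = "localhost:8080"
) -> List[str]:
    """argv that opens pprof's web UI on `addr` for the recorded profile."""
    return ["pprof", f"-http={addr}", profile]


def as_shell(argv: Sequence[str]) -> str:
    """Render an argv list as a copy-pasteable shell command."""
    return shlex.join(argv)


if __name__ == "__main__":
    # "./infer model.onnx" stands in for whatever program runs the model.
    print(as_shell(perf_record_cmd(["./infer", "model.onnx"])))
    print(as_shell(pprof_serve_cmd()))
```

Keeping the commands as argv lists means the same helpers could later be handed to `subprocess.run` by a host-side script.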
Model Selection
Much work on embedded ML focuses on a narrow application domain (such as image classification) and a small set of canonical model architectures. But digital musical instrument (DMI) design demands creative ways of processing data of various shapes and modalities, from high sample rate audio streams to heterogeneous sensor inputs. Rather than optimize for a particular method or use case, this project will aim for wide coverage of PyTorch operators and fine-grained visibility into their performance. To that end, evaluation will focus on composable neural network blocks which are relevant to ML tasks like sequence modeling, classification and variational autoencoding (the building blocks of musical applications like gesture recognition, audio synthesis and control mapping). These will include:
- matrix-vector product (at various sizes, with quantization, sparsity)
- 1D convolutional networks (with groups and dilation)
- memory-cell RNNs (LSTM, GRU)
- multi-layer perceptrons (with various activation functions)
- transformer blocks (dynamic input sizes, batch normalization)
- mixture-density heads (testing various elementwise, shape and reduction ops)
- reparameterized sampling and KL divergence for normal distributions (in-graph RNG?)
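For the last item, both operations reduce to short formulas for a diagonal Gaussian: a sample is z = mu + sigma * eps with eps drawn from N(0, 1), and the KL divergence to a standard normal is 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2). The plain-Python sketch below is only meant to show what the corresponding operators compute. The function names are illustrative, and a PyTorch model would express the same arithmetic with tensor ops.

```python
import math
import random
from typing import List, Sequence


def reparameterized_sample(
    mu: Sequence[float], log_var: Sequence[float], rng: random.Random
) -> List[float]:
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) as the only random input."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )
```

The call to `rng.gauss` is the random draw that raises the in-graph RNG question above: at inference time the runtime has to supply that noise itself.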
Example Projects
As an additional goal, if time permits, this project will also develop some example projects in Python and for Bela.
Timeline
Date | Status | Details
---|---|---
Presubmission | |
May 20th - June 12th | Community Bonding |
June 13th | Milestone #1 |
June 20th | Milestone #2 |
June 27th | Milestone #3 |
July 4th | Milestone #4 |
July 11th | Milestone #5 |
July 18th | Milestone #6 |
July 25th | Milestone #7 |
August 1st | Milestone #8 |
August 8th | Milestone #9 |
August 15th | Milestone #10 |
August 22nd | Milestone #11 |
August 29th | Milestone #12 |
Sep. 5th | Milestone #13 |
Experience and approach
Through coursework and multiple co-op terms in industry, I've gained experience relevant to this project such as:
- Benchmarked hardware peripherals on an embedded Linux system (RPi CM4) and a TI C2000 platform for high-speed binary data transfer
- Developed features and fixed bugs in C for an embedded Linux TCP server used for sensor data acquisition
- Built multiple Python testing systems for various software systems and hardware calibration protocols
- Completed labs in an Intro to Machine Learning course using Keras to build, train and test models
- Designed and implemented an audio plugin in C++ for translating audio data into haptic signals in real time
- Designed and implemented firmware for the Pi Pico in C++ and CircuitPython to interface with different connected modules using I2S, SPI & PWM peripherals
Contingency
If I run into any roadblocks during this project, I'll first talk with my mentors to brainstorm potential solutions. If they are not available, I'll reach out to the community for help on platforms such as the BBB Slack chat, the Bela forum, the iil.is Discord or the PyTorch forum.
While writing this proposal I have also amassed some resources that may be useful during the project:
- MLPerf™ Tiny Deep Learning Benchmarks for Embedded Devices
- Installing C++ Distributions of PyTorch
- Xenomai docs
- EdgeAI TIDL tools and examples
- hotspot and heaptrack tools
- C++ Real-Time Audio Programming with Bela
- DeepLearningForBela
- "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"
- Various papers on machine learning in musical instrument design
Benefit
This project will provide multiple benefits. Firstly, it will give some baseline benchmark measurements for various machine learning model architectures on the BBB/Bela, which will help developers decide which models could be worth investigating for their use cases. Secondly, it will provide tools for developers to benchmark and profile new models on their BBB/Bela. Thirdly, it will provide some example projects for those looking to get started using machine learning in their Bela projects.
I think flexible modeling languages like PyTorch, which let researchers iterate quickly at multiple levels of abstraction, have a lot to do with recent advances in machine learning. Being able to define and train models in PyTorch and move them quickly onto Bela, with visibility into their performance characteristics, would vastly accelerate efforts to use deep learning in an embedded musical context.
~ Victor Shepardson
Programming embedded systems for real-time musical interaction is extremely difficult, despite the existence of ground-breaking projects like Bela. Embedded machine learning has the potential to change that, allowing developers and musicians to take advantage of the entire ML ecosystem, to teach instruments to recognise gestures, synthesise unique sounds, and much else. This project would represent a major first step towards unlocking that potential.
~ Jack Armitage