
Running Machine Learning Models on Bela

Student: Ezra Pierce
Mentors: Jack Armitage, Victor Shepardson
Proposal:[1]

Proposal

All requirements listed on the ideas page have been completed; the PR for the cross-compilation task can be found here.

Status

This project is currently just a proposal.

About you

Github: ezrapierce000
School: Carleton University
Country: Canada
Primary language : English
Typical work hours: 9AM-6PM Eastern Standard Time
Previous GSoC participation: This would be my first time participating in GSoC.

About your project

Project name: Running Machine Learning Models on Bela

Introduction

The goal of this project is to improve the tooling surrounding embedded machine learning on the BeagleBone Black (BBB)/Bela to aid its community in experimenting with machine learning applications for their projects. The specific developer tools chosen for this project are an inference benchmarking tool as well as a perf-based profiler developed for the BBB/Bela platform.

Bela is a platform built upon the BeagleBone Black, consisting of an audio cape and a custom real-time Linux image using the Xenomai framework. This platform provides a low-latency computing environment ideal for audio applications. Bela already has a large community, as it is an increasingly popular platform in educational settings as well as in musical instrument design and maker communities. This project aims to extend the Bela platform with tools and documentation for machine learning projects, with the goal of simplifying the process of integrating machine learning models into embedded real-time Bela projects.

The use of machine learning in instrument design has grown in recent years, yet there have not been many implementations in more resource-constrained embedded contexts like the BBB/Bela, largely because machine learning can be very computationally expensive. The tools developed during this project aim to improve the workflow of those interested in exploring embedded machine learning and reduce some of the barriers that come with it. The benchmarking tool will be used to take latency, memory and accuracy measurements, meant for comparing different ML runtime components and/or compilers. The profiler will be used to pinpoint bottlenecks during model development, allowing developers to discover slow operators and view CPU utilization. This project will also build up a model zoo for the BBB/Bela and build some example projects.

Implementation

ML Stack

This project will focus on a specific modeling language (PyTorch) and platform (BBB + Bela). In between, there are a number of possible model formats, compiler/runtime frontends, and backend components. The analysis tools built during this project will aim to support multiple runtime frontends and backends so that developers can compare performance results between them; a brief export sketch follows the stack summary below.

Summary of stack:

  • Modeling language: PyTorch (+ TensorFlow for converting to TFLite)
  • Model formats: ONNX, TorchScript (+ TFLite)
  • Runtime frontends: libtorch, ONNX Runtime, SOFIE (+ TFLite)
  • Runtime backend components: Arm NN, XNNPACK, Eigen, BLAS
  • OS + hardware: Bela + BBB
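
The stack above implies an export step from the PyTorch modeling language to the on-device model formats. As a minimal sketch (the tiny MLP below is only a placeholder, not one of the evaluation models), exporting a model to both TorchScript and ONNX might look like this:

  # Minimal sketch: export a PyTorch model to TorchScript and ONNX.
  # The tiny MLP is a placeholder; real models would come from the blocks
  # listed under Model Selection below.
  import torch
  import torch.nn as nn

  model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
  model.eval()
  example_input = torch.randn(1, 16)

  # TorchScript, consumed on-device by libtorch
  torch.jit.trace(model, example_input).save("model.pt")

  # ONNX, consumed on-device by ONNX Runtime or SOFIE
  torch.onnx.export(model, example_input, "model.onnx",
                    input_names=["input"], output_names=["output"])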

Some NN compiler projects will also be audited for potential BBB support.

Benchmarking Tool

This project will provide both a benchmarking tool and a profiling tool to be used to evaluate machine learning models on the BBB/Bela. The benchmarking tool will provide the following measurements:

  • Average latency
  • Maximum latency
  • Average memory usage
  • Maximum memory usage
  • Accuracy

This will be done by providing a common frontend wrapping the pre-existing runtime frontends listed above, allowing developers to choose which runtime components they would like to test. This common frontend will take latency measurements at each inference, while the benchmarking tool concurrently samples memory usage from a separate thread to produce average and maximum memory measurements. The benchmarking tool will also allow developers to provide test data for accuracy measurements.
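
The on-device tool itself will be written in C++ (noted below), but the measurement strategy can be sketched in a few lines of Python for illustration; run_inference here is a hypothetical placeholder for a call into one of the runtime frontends:

  # Illustration of the measurement strategy only: per-inference latency
  # plus memory sampled concurrently from a separate thread.
  import threading
  import time

  def read_rss_kb():
      # Resident set size of this process, in kB, read from /proc.
      with open("/proc/self/status") as f:
          for line in f:
              if line.startswith("VmRSS:"):
                  return int(line.split()[1])
      return 0

  def sample_memory(samples, stop_event, interval=0.005):
      # Take at least one sample, then poll until the benchmark finishes.
      while True:
          samples.append(read_rss_kb())
          if stop_event.wait(interval):
              break

  def benchmark(run_inference, n_runs=100):
      samples, stop = [], threading.Event()
      sampler = threading.Thread(target=sample_memory, args=(samples, stop))
      sampler.start()
      latencies = []
      for _ in range(n_runs):
          t0 = time.perf_counter()
          run_inference()                      # hypothetical frontend call
          latencies.append(time.perf_counter() - t0)
      stop.set()
      sampler.join()
      return {"avg_latency_s": sum(latencies) / len(latencies),
              "max_latency_s": max(latencies),
              "avg_mem_kb": sum(samples) / len(samples),
              "max_mem_kb": max(samples)}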

The benchmarking tool on the BBB/Bela will be written in C++, with a simple Python tool on the host PC handling communication between the developer's PC and the BBB/Bela.
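
A rough sketch of the host-side Python helper follows; the hostname, remote paths, binary name and JSON result format are assumptions for illustration rather than a fixed design:

  # Hypothetical host-side helper: copy a model and test data to the board,
  # run the on-device C++ benchmarking binary, and read back its report.
  import json
  import os
  import subprocess

  BELA_HOST = "root@bela.local"      # typical Bela address (assumption)
  REMOTE_DIR = "/root/benchmark"     # hypothetical working directory

  def run_benchmark(model_path, data_path):
      subprocess.run(["scp", model_path, data_path,
                      f"{BELA_HOST}:{REMOTE_DIR}/"], check=True)
      model_name = os.path.basename(model_path)
      result = subprocess.run(
          ["ssh", BELA_HOST,
           f"{REMOTE_DIR}/bela-benchmark --model {REMOTE_DIR}/{model_name}"],
          check=True, capture_output=True, text=True)
      return json.loads(result.stdout)

  report = run_benchmark("model.pt", "testdata.bin")
  print(report)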


[Figure: GsocDiagram2.png]

Profiling Tool

The profiling tool will aim to provide a GUI for displaying CPU cycles per function call, thread utilization and the call stack. This tool will be built around the perf Linux utility, a statistical profiler based on CPU performance counters. To provide a more intuitive interface, this project will build a simple local webserver (similar to the Bela IDE, or perhaps integrated into it) that displays the captured data visually. This will be done using the pprof profiling visualizer and the perf_data_converter tool. As an alternative, the perf-based hotspot tool will also be evaluated for use in this project. Profiling will have to be done in a Linux thread as opposed to a real-time Xenomai thread, but the results should still be applicable for supporting model optimization work. Optimizations can then be tested with the benchmarking tool in a real-time Xenomai thread.
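
As a sketch of the data-collection step only (the PID handling, duration and file paths are assumptions), the web backend could drive perf roughly as follows, with conversion for pprof via perf_data_converter, or loading into hotspot, happening afterwards:

  # Rough sketch: record call stacks of a running Bela process with perf
  # and dump the readable samples.
  import subprocess

  def record_profile(pid, seconds=10, out="perf.data"):
      # Sample the target process for a fixed duration.
      subprocess.run(["perf", "record", "-g", "-p", str(pid),
                      "-o", out, "--", "sleep", str(seconds)], check=True)

  def dump_samples(perf_data="perf.data"):
      # Human-readable sample dump; pprof/hotspot consume perf.data
      # (after conversion, in pprof's case).
      result = subprocess.run(["perf", "script", "-i", perf_data],
                              check=True, capture_output=True, text=True)
      return result.stdout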

[Figure: example flame graph from the hotspot profiling tool]

Model Selection

Much work on embedded ML focuses on a narrow application domain (such as image classification) and a small set of canonical model architectures. But digital musical instrument (DMI) design demands creative ways of processing data of various shapes and modalities, from high-sample-rate audio streams to heterogeneous sensor inputs. Rather than optimize for a particular method or use case, this project will aim for wide coverage of PyTorch operators and fine-grained visibility into their performance. To that end, evaluation will focus on composable neural network blocks relevant to ML tasks like sequence modeling, classification and variational autoencoding (the building blocks of musical applications like gesture recognition, audio synthesis and control mapping). These will include the following (minimal PyTorch sketches of a few of these blocks are given after the list):

  • matrix-vector product (at various sizes, with quantization, sparsity)
  • 1D convolutional networks (with groups and dilation)
  • memory-cell RNNs (LSTM, GRU)
  • multi-layer perceptrons (with various activation functions)
  • transformer blocks (dynamic input sizes, batch normalization)
  • mixture-density heads (testing various elementwise, shape and reduction ops)
  • reparameterized sampling and KL divergence for normal distributions (in-graph RNG?)
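
Minimal PyTorch sketches of a few of these blocks are shown below; the layer sizes are placeholders chosen for illustration, not the configurations that will be benchmarked:

  # Placeholder-sized examples of some of the blocks listed above.
  import torch
  import torch.nn as nn

  # Multi-layer perceptron with a configurable activation
  mlp = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 16))

  # 1D convolution with groups and dilation
  conv = nn.Conv1d(in_channels=8, out_channels=8, kernel_size=3,
                   groups=2, dilation=2)

  # Memory-cell RNNs
  lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
  gru = nn.GRU(input_size=16, hidden_size=32, batch_first=True)

  # Each block can be traced and exported as in the earlier sketch, e.g.
  # torch.jit.trace(mlp, torch.randn(1, 16)).save("mlp.pt")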

Example Projects

As an additional goal, if time permits, this project will also develop some example projects in Python and for Bela.

Timeline

Date / Status / Details
Presubmission
  • Research current embedded AI benchmarking tools, evaluate their feasibility for Bela-specific use cases
  • Research embedded linux profiling techniques, options for visualizing profiling data
  • Background reading on the use of Machine Learning in musical instrument design
  • Background reading on currently available embedded AI libraries
May 20th - June 12th Community Bonding
June 13th Milestone #1
  • Compilation of different components on Bela: libtorch, TFLite, Arm NN
  • Start coding benchmarking suite for use in Bela projects
  • Implement simple Python library for loading models & test data from host PC to Bela and communicating benchmarking results from Bela to host PC
June 20th Milestone #2
  • Test out both pprof and hotspot profilers
  • Finish model loading feature of benchmarking tool
June 27th Milestone #3
  • Implement SOFIE wrapper for benchmarking tool
  • Implement libtorch wrapper for benchmarking suite
  • Research compatibility of different neural network compilers with the BBB/Bela
July 4th Milestone #4
  • Finish latency measurement feature of the benchmarking tool
  • Run benchmarking latency test on MLP model
July 11th Milestone #5
  • Finish accuracy feature of benchmarking suite
  • Begin working on profiler/benchmarking GUI
  • Build and run benchmarking on 1-D convolutional network model
July 18th Milestone #6
  • Finish memory usage feature of the benchmarking suite
  • Document findings thus far in blog post
July 25th Milestone #7
  • Submit Phase 1 evaluation
  • Finish work on profiler
  • Build, benchmark and profile memory-cell RNN model, document bottlenecks using profiling data
August 1st Milestone #8
  • Build, benchmark and profile transformer block model, document bottlenecks using profiling data
  • Tweak profiler/benchmarking GUI
August 8th Milestone #9
  • Build, benchmark and profile mixture density head, document bottlenecks using profiling data
  • Evaluate documented bottlenecks from past weeks and research potential optimization techniques
August 15th Milestone #10
  • Begin writing example project(s) based on most promising model(s)
  • Work on implementing any optimization techniques
August 22nd Milestone #11
  • Finish writing example project(s)
  • Build, benchmark and profile optimized model(s), compare benchmarks against past iteration(s)
August 29th Milestone #12
  • Submit final work product and final mentor evaluation
  • Complete YouTube video
  • Document work done, results and next steps in blog post
Sep. 5th Milestone #13
  • Completion of GSoC

Experience and approach

Through coursework and multiple co-op terms in industry, I've gained experience relevant to this project such as:

  • Benchmarked hardware peripherals on an embedded Linux system (RPi CM4) and TI C2000 platform for high-speed binary data transfer
  • Developed features and fixed bugs in C for embedded Linux TCP server used for sensor data acquisition
  • Built multiple Python testing systems for various software systems and hardware calibration protocols
  • Completed labs in an Intro to Machine Learning course using Keras to build, train and test models
  • Designed and implemented audio plugin in C++ for translating audio data into haptic signals in real-time
  • Designed and implemented firmware for the Pi Pico in C++ and CircuitPython to interface with different connected modules using I2S, SPI & PWM peripherals

Contingency

If I run into any roadblocks during this project, I'll first talk with my mentors to brainstorm potential solutions. If they happen to not be available, I'll reach out for help from the community on platforms such as the BBB Slack chat, the Bela forum, the iil.is Discord or the PyTorch forum.

While writing this proposal, I have also amassed some resources that may be useful during the project.

Benefit

This project will provide multiple benefits. Firstly, it will give some base benchmark measurements for various machine learning model architectures on the BBB/Bela which will help developers decide which models could be worth investigating for their use cases. Secondly, it will provide tools for developers to benchmark and profile new models for their BBB/Bela. Thirdly, it will provide some example projects for those looking to get started using machine learning in their Bela projects.

I think flexible modeling languages like PyTorch, which let researchers iterate quickly at multiple levels of abstraction, have a lot to do with recent advances in machine learning. Being able to define and train models in PyTorch and move them quickly onto Bela, with visibility into their performance characteristics, would vastly accelerate efforts to use deep learning in an embedded musical context.

~ Victor Shepardson

Programming embedded systems for real-time musical interaction is extremely difficult, despite the existence of ground-breaking projects like Bela. Embedded machine learning has the potential to change that, allowing developers and musicians to take advantage of the entire ML ecosystem, to teach instruments to recognise gestures, synthesise unique sounds, and much else. This project would represent a major first step towards unlocking that potential.

~ Jack Armitage