Beagleboard:GSoC 2018 Proposal:Beaglebone GPU Offload

From eLinux.org
Revision as of 04:04, 27 March 2018 by UserSidharth (talk | contribs) (More balanced proposal)


Proposal for Beaglebone GPU offload


This project aims to use the GPU capabilities present on the Beagleboard.

Name : Sidharth Mohla
Student: UserSidharth
Mentors: Hunyue Yau, Robert Manzke
Wiki: https://elinux.org/BeagleBoard/GSoC/Ideas#BeagleBone_GPU_offload

Status

This project is currently just a proposal. It is planned to be implemented in three stages:
1. Headlessness: Set up a headless BeagleBone Black that can access the GPU through Mesa's GBM and render nodes as described here. Document the capabilities of the GPU by benchmarking (in windowed Qt mode), and find the supported extensions and Imagination Technologies' proprietary APIs such as the BufferClass API and the PVR2D API.
2. Easy access to FrameBuffer: Script whatever is done above and provide a function to initialize the OpenGL context headlessly. Build a library for accessing the frame buffer directly in the headless system.
3. Examples: Document the library and provide sample starter code to explain the basic concepts. Build a library for general matrix multiplication and for calculating eigenvalues and eigenvectors, and document it.
Stretch goals:

  1. Build a library which will take a basic function (like a kernel) as input and directly generate a shader for executing it. However, this library will be simple, so it will not cover the full set of datatypes supported by OpenGL. (It is basically intended for mathematical operations which can be parallelised easily, like those in BLAS/LAPACK/ATLAS.)
  2. Implement BLAS/LAPACK for the generalised matrix.
  3. Use these tools to implement a stereo correspondence algorithm on the GPU.
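
As a minimal sketch of the first stretch goal, assuming the "kernel" is supplied as a GLSL expression over one input texel named `x`, the shader source could be generated by plain string composition. The function name `makeFragmentShader`, the uniform `u_data` and the varying `v_uv` are hypothetical and only serve the illustration; a real generator would need to validate the expression.

```cpp
#include <string>

// Hypothetical helper: wraps a per-element expression over `x`
// (for example "x * 0.5") into GLSL ES 1.00 fragment shader source.
// The names u_data and v_uv are illustrative assumptions.
std::string makeFragmentShader(const std::string& kernelExpr) {
    return
        "precision mediump float;\n"
        "uniform sampler2D u_data;\n"   // input data stored as a texture
        "varying vec2 v_uv;\n"          // texel coordinate from the vertex shader
        "void main() {\n"
        "    vec4 x = texture2D(u_data, v_uv);\n"
        "    gl_FragColor = " + kernelExpr + ";\n"
        "}\n";
}
```

Calling `makeFragmentShader("x * 0.5")` yields source that could be handed to `glShaderSource`; the expression is inserted verbatim, so any syntax error surfaces only when the shader is compiled.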

Proposal

Test task completed here

About you

IRC: Sidharth
Github: sidmohla
School: Indian Institute of Technology Hyderabad
Country: India
Primary language : English
Typical work hours : 08:00 to 21:00 IST
Previous GSoC participation: No previous experience.
I primarily want to take part in GSoC because I am very excited to work in collaboration with an open source community and produce something that is useful to as many people as possible. I am also enthusiastic about working with GPUs, and this is an opportunity to do exactly that!

About your project

Project name: Beaglebone GPU offload

Description

All BeagleBone devices have a GPU from Imagination Technologies inside the TI ARM processor; on the AM335x this is a PowerVR SGX 530, with processing power comparable to a graphics card from around 2005. This is particularly useful when code has components that can be executed in parallel: running such computations on the GPU can be efficient and keeps the CPU free for other tasks. Of course, the GPU is too weak for ML or intensive computer vision algorithms, but many older computer vision algorithms and basic linear algebra calculations can be sped up, so people who wish to use the BeagleBone for robotics or automation can use the GPU. Also, since OpenGL ES has been standardized by the Khronos Group, users can rely on the generated benchmarks and documentation to write portable GPU code (if it is worth it; the benchmarks allow you to judge that) which works similarly on a mobile device, a Raspberry Pi or a BeagleBoard. Code developed for the Raspberry Pi GPU can thus be reused, albeit at much lower performance. Finally, this can be useful for users who wish to drive small LCD displays, since this will be more efficient.

However, this has not been done so far because libEGL is provided as a binary blob, PVR2D acceleration (also a binary blob) does not support X11, and the drivers themselves are binary-only.

The FBOObject library will provide the mentioned functionality as C++ classes. It should be noted that the main aim of this project is to document and demonstrate the use of the GPU for computations, so the library itself will be very basic, as will the tools made for the demo. This can, in theory at least, be mitigated by drawing on the large amount of OpenGL ES code available online, one example being an OpenCL library for the Raspberry Pi (of course, modifications for accessing the framebuffer will have to be made).


Timeline

Before Week 1: Set up the graphics SDK and the PVR SDK with Qt and a display running EGLFS/Wayland (whichever is more suitable for benchmarking). Make this setup on a separate SD card so that any shader can be tested here first before moving on to the headless system. There will be three SD cards: this one, one working headlessly (kept as a backup) and a test card.

Week 1 and 2: Next, we will run the GPU headlessly, trying many options such as EGLFS on Qt.
Possible problems: Many. Since the driver is not open source, we can't use Mesa's GBM to ensure headless operation. In the worst case, a virtual display driver will have to be used on top of Qt. This may well take longer than planned, so getting it right is essential; the risk will be mitigated by starting early (i.e. I will try to get to this point before it is officially due).
Week 3: Get a simple shader program running (like doing +1 to each element in 2D data and storing in frame buffer), as well as test proper operation of GPU in headless mode
Possible problems: A "hello GPGPU" program should not be a problem at all. This week is not troublesome, so it can be used to finish any outstanding problems.
Week 4: Benchmark the display setup to learn the capabilities of the GPU, test the proprietary extensions, and measure how much CPU is used as well (https://archive.fosdem.org/2015/schedule/event/gl_testing/attachments/slides/670/export/events/attachments/gl_testing/slides/670/slides.pdf). This is done late so as to leave some slack for the headless work, if worst comes to worst.
Possible problems: Since a stable driver will already be running, no major problems are expected.
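
For the CPU-usage part of the Week 4 benchmark, one possible approach is to compare wall-clock time with process CPU time around a piece of work: if the GPU does the heavy lifting, CPU time should stay well below wall time. The sketch below is only an assumption about how this could be measured; `Timing` and `measure` are hypothetical names, and `std::clock` reports CPU time for the whole process.

```cpp
#include <chrono>
#include <ctime>

// Wall-clock and CPU time spent in one piece of work, both in seconds.
struct Timing {
    double wallSeconds;
    double cpuSeconds;
};

// Runs `work` once and records both clocks around it.
template <typename Work>
Timing measure(Work&& work) {
    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();
    work();
    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();
    return Timing{
        std::chrono::duration<double>(wallEnd - wallStart).count(),
        static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC};
}
```

In a GPU benchmark the callable would wrap the upload, render and read-back of one frame, and the ratio `cpuSeconds / wallSeconds` gives a rough estimate of CPU load.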

Deliverables for this period : Benchmarks and headless operation of GPU (Since documentation will not be ready yet a disk image of the SD card can be provided for verification).

Week 5: Document and script how the headless operation was achieved so that it becomes useful for the community.
Possible problems: The scripts will be tested on a copy of the headless SD card, so that scripting errors do not waste the earlier effort. The operations must be documented well enough that even a new user can use the GPU.
Week 6, 7 and 8: Start working on the FBO abstraction class, similar to this one here.
Possible problems: Getting access to the framebuffer may progress slowly, since Imagination Technologies provides a binary blob for direct FB access. Otherwise it is just translation to OpenGL ES. If the appropriate extensions (render_to_texture, framebuffer_object) are supported, this might go faster, in which case we have the stretch goals.
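
Whether the needed extensions are present can be checked by searching the space-separated string returned by `glGetString(GL_EXTENSIONS)`. A minimal sketch follows; the helper name `hasExtension` is hypothetical. It matches whole tokens, because a plain substring search would wrongly accept a name that is only a prefix of a longer extension name.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in a space-separated
// extension list, such as the string from glGetString(GL_EXTENSIONS).
bool hasExtension(const std::string& extensionList, const std::string& name) {
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

The extension list actually reported by the SGX 530 driver is not assumed here; it has to be read from the running driver.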

Deliverables for this period : FBOClass, allowing multiple framebuffer objects to be used

Week 9: Document the library, and clearly document what is going on inside it, to make it easier for new users to understand the system in use.
Possible problems: None; this should not overrun the allocated time.
Week 10, 11: Implement the tutorials given here.
Possible problems: Since GLEW is not for GL ES, doing this manually may eat up some time. Also, if substitutes for extensions have to be found, it may drag on even more. That is why two weeks are allocated even though the source code is available.
Week 12: Make a general matrix multiplication utility function as well as an eigenvalue solver, adding more operations if time permits, and provide an easy tutorial on how this was accomplished.
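
A CPU reference is useful for checking the GPU versions of the matrix multiplication and eigenvalue utilities. The sketch below is such a reference under stated assumptions, not the planned GPU implementation: matrices are square and row-major, and the eigenvalue routine uses power iteration, which only finds the eigenvalue of largest magnitude and assumes it is real and strictly dominant. The names `Matrix`, `multiply` and `dominantEigenvalue` are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // n x n, row-major

// C = A * B for square row-major matrices of size n.
Matrix multiply(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Power iteration: repeatedly applies A to a vector and normalizes it.
// The Rayleigh quotient (v.Av)/(v.v) is returned as the eigenvalue estimate.
double dominantEigenvalue(const Matrix& a, std::size_t n, int iterations = 200) {
    std::vector<double> v(n, 1.0), w(n, 0.0);
    double lambda = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double vw = 0.0, vv = 0.0, ww = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            w[i] = 0.0;
            for (std::size_t j = 0; j < n; ++j) w[i] += a[i * n + j] * v[j];
            vw += v[i] * w[i];
            vv += v[i] * v[i];
            ww += w[i] * w[i];
        }
        lambda = vw / vv;
        if (ww == 0.0) break;  // A maps v to zero; nothing left to iterate
        const double norm = std::sqrt(ww);
        for (std::size_t i = 0; i < n; ++i) v[i] = w[i] / norm;
    }
    return lambda;
}
```

On the GPU each output element of `multiply` maps naturally to one fragment, and each power iteration step is a matrix-vector product, so both can reuse the same render-to-texture loop.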

Deliverables for this period : Mainly documentation. The products of this period will help the community use the GPU efficiently and understand how to write their own programs to run on the GPU.

Experience and approach

Experience
I started with embedded systems in our IoT course, where I made a smart room sensor as a project. Since then I have come a long way and have used the Arduino Uno and Nano, the Raspberry Pi and the mbed LPC1768, accessing low-power modes on the LPC board by manipulating registers as described in its manual. I also know VHDL, in which I implemented a simple processor that executes Brainf**k natively (no pipeline, debug or cache though). On the software side I have experience with OpenGL and Unity, C/C++, Python and MATLAB. I am interested in ML and CV, and I am currently taking courses on both. I am therefore confident I can do this project, as I have the knowledge and no other commitments for the summer holidays.

Approach
Strictly as per the timeline. Since I have highlighted the likely problems as well as the sources I will use, I should be able to avoid most of them while working on the project. I will also try to have stable drivers for the platform by working on that right now, and will go through the mentioned sources thoroughly beforehand, which should pinpoint specific issues before I get ambushed by them.

Contingency

What will you do if you get stuck on your project and your mentor isn’t around?
If my mentor is not around I will first use Google (I can handle many tabs at once!) as well as Stack Overflow and Reddit. If I still can't get it working, I will ask my brother, my professors or his professors. I will also use the TI forums for GPU-related issues and the Raspberry Pi community for embedded software issues, along with the BeagleBoard forums.

Benefit

There will be enough information for users to decide whether or not to use the GPU, as well as how to use it.
Mar 11 07:18:26 <ds2> a basic demo is a bunch of GL calls to set things up and create a shader follow by a send texture, render, read texture loop...that part isn't that complex
Mar 11 07:18:41 <ds2> but putting it together to show it being useful has value
Mar 11 07:22:09 <ds2> the GLES stuff is a bunch of binary blobs... on paper, it should be possible to use GLES with framebuffers, wayland, X, Android, etc
Mar 11 07:22:26 <ds2> in reality, only a few of those work (due to the binary libEGL)
Mar 11 07:22:45 <ds2> so it would be wise to plan on time to figure out which one of those work well enough for this
Mar 11 07:26:01 <ds2> if anything - getting it work w/framebuffer and X and other EGL flavors would be potentially useful
Mar 11 07:26:19 <ds2> I am personally interestedi n framebuffer as I suspect it is the lowest overhead
Mar 26 17:03:43 <ds2> Sidharth: trying for the framebuffer EGL flavor instead of Wayland may be desireable... not all BBB's have a display.... a plain FB one should suffice

Suggestions

Is there anything else we should have asked you?