LCNA 2015 Device Mainlining BOF meeting notes

From eLinux.org
Revision as of 10:03, 25 August 2015 by Tim Bird (first draft)

Device Mainlining Project BOF: Tim Bird (LF CE Workgroup), Mark Brown (Linaro), Kate Stewart (LF); 21 attendees.

[post-meeting notes by attendees are in brackets, like this]

Tim gave an overview of the problem and of the current activities of the device mainlining project (slides linked below).

He was looking for ideas and discussion on different concepts.

Slides will be available (Tim to post) [Tim: see http://elinux.org/images/0/08/Device-mainlining-LCNA-BOF-2015-08.pdf]

There are myriad ways of getting source code from product vendors; every vendor has its own system. One eLinux page lists where to get the source for different phones. The site also has some diffs posted, so you don’t have to grab the source yourself.

[Tim: Here’s the link: http://elinux.org/Phones_Processors_and_Download_Sites]

Anyone can add links, which will help when we want to gather stats for the next round of phones.

There are some specific technical projects already started:

1. Wireless drivers: The in-tree Broadcom wireless driver is not product grade. Google mandates the use of this chipset and integrates an out-of-tree code base for its downstream partners, so Broadcom has no incentive to push its driver upstream. Tim proposed to the CEWG that it fund a project to backport the Broadcom (brcm80211) wireless driver to 3.14, for use in the next generation of Android devices. Tim found that SUSE is running a backporting project (Luis’ project, which uses Coccinelle). This project seems advanced. It would be good to convince product vendors to test this driver and report issues, so that the mainline driver can be improved. [Tim: this seems like an action item]

2. USB: Integration between the USB driver and the charger isn’t in mainline. Linaro has posted patches for an interface between the charger and the USB driver. Another issue is that some USB pins (ID, VBUS) are not hooked up to the USB controller, so the mainline drivers don’t actually work for OTG switching. extcon seems to be the preferred upstream method to fix this. Sony is working on some enhancements here.

There are some institutional barriers, as well as process issues.

The project is looking for help identifying deficiencies. Tim encouraged others to run the upstream-analysis-tools, so they can see which areas are affected in newer kernels (3.10 or 3.14).

John Stultz: comparing 3.10 with 3.4, the amount of out-of-tree code is currently getting worse. Tim: we definitely want to re-run the stats on later kernels and see if the areas are the same. Mark: Vendors appear to be shipping one kernel version per SoC version. That is, they stay on one kernel version for a particular SoC, and switch when they introduce a new SoC to their customers.

Fork and forward port tends to be the pattern among manufacturers. [Tim: This means they’re carrying a lot of patches from each release to the next. This is true for Sony, and I’m sure other product vendors have the same problem.]

Someone noted that the raw lines of code might be misleading, due to a single source tree supporting a lot of different processors.

A better stat might be to count only the lines that are actually used, based on the product’s kernel config, and only diff those, i.e. strip the tree down to the source code that is actually built, and diff that. However, if DT is used, the kernel may contain code for lots of different processors.
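To make the "only diff what is built" idea concrete, here is a minimal Python sketch, not a tool the group has written. It assumes an out-of-tree build of the vendor kernel (`make O=<build dir>`) in which every `foo.c` that the product config compiles leaves a `foo.o` at the same relative path; headers, assembly and DT sources are ignored, and the function names `used_sources` and `changed_lines` are hypothetical. As noted above, a DT-based config can still build code for many processors, so this narrows the count rather than making it exact.

```python
import difflib
from pathlib import Path


def used_sources(build_dir: Path, src_dir: Path) -> list[Path]:
    """Return C sources (relative paths) that the product config actually built.

    Assumes an out-of-tree build (make O=build_dir) where src_dir/foo.c is
    compiled to build_dir/foo.o at the same relative path.
    Headers, assembly and DT sources are not considered.
    """
    used = []
    for obj in sorted(build_dir.rglob("*.o")):
        rel = obj.relative_to(build_dir).with_suffix(".c")
        if (src_dir / rel).is_file():
            used.append(rel)
    return used


def _lines(path: Path) -> list[str]:
    # A file missing from one tree is treated as empty (fully added or removed).
    return path.read_text(errors="replace").splitlines() if path.is_file() else []


def changed_lines(vendor: Path, mainline: Path, files: list[Path]) -> int:
    """Count added plus removed lines between two trees, for the given files only."""
    total = 0
    for rel in files:
        old, new = _lines(mainline / rel), _lines(vendor / rel)
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # 'replace', 'delete' and 'insert' contribute removed + added lines.
                total += (i2 - i1) + (j2 - j1)
    return total
```

A run would look like `changed_lines(vendor, mainline, used_sources(build, vendor))`, with the three paths pointing at the vendor source tree, a mainline tree of the same base version, and the vendor build directory.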

Mark B: It’s worth differentiating between plain device support and infrastructure. Solving these requires different approaches. Tim: It seems like many SoCs don’t have low-level stuff upstream (things like clocks, regulators, interprocessor communication systems, pinctrl, etc.), and hence there’s not a good foundation to build on. [Tim: note after the fact - it would be good to actually research where each SoC stands with mainline support, for each category of low-level support] Mark B: Qualcomm tried with NFD. [NFD? Tim: I’m trying to remember what this discussion was] Tim: I hate to admit it, but some drivers in mainline now can’t run, because dependencies are missing. But hopefully they’ll be able to run soon, once a bit more infrastructure code is there.

Tim: My hunch is that many SoCs don’t have good low-level interprocessor communication support upstream. rpmsg may not be adequate for the communication these chips need.

What is biggest problem people are seeing?

John Stultz: the mainline philosophy is about doing it right; the vendors’ answer is that it isn’t in the budget. Tim: the obsolescence cycle.

Discussion of the Free Electrons example: nine people make a lot of progress. Thomas Petazzoni has a diagram showing the value of doing this work. Thomas’ point was that because IP blocks get reused across an SoC family, the porting work does pay off over time. Some vendors have a different team for each SoC, and thus don’t see the long-term cost savings of mainlining.

Some out-of-tree drivers are written to a vendor framework so that one driver works on multiple OSes (e.g. Windows and Linux). Mainline won’t take these drivers: they have needless abstraction layers. The vendors’ rationale is that they can fix a bug or add a feature in one place for multiple OSes. However, it may actually be better to just maintain the driver separately for each OS. Linux-specific drivers tend to be much smaller and more performant, and a driver that is one tenth the size costs you less. Tim: Can this be measured to produce hard numbers showing the benefit of a Linux-specific/mainline driver? For example, take a sample of 10 out-of-tree drivers, and measure their size in tree after paring them down.

It’s not just lines of code: a Linux-specific driver also often performs much better. But this would be a hard stat to generalize across driver types.

Someone brought up the fact that, with separate drivers, bug fixes have to be applied to all of them. Vendors assume that a bugfix will cost less if they have a single driver. However, this may not be the case. Some bugs will only show up on a single OS. Also, you still have to do QA on each OS for the bugs you fix. The QA costs don’t go away with a single driver.

Abstraction in a driver adds risk as well (it increases the surface area for bugs), e.g. figuring out whether a fix for one OS may impact the others. Removing the abstraction sometimes fixes bugs the developers didn’t know were there.

We need a cogent argument, something like: “Linux-specific drivers are this much smaller. If you delete this code and just use the framework, it will fix this bug. Etc.”

[Tim: would be nice to have anecdata on specific bugs fixed when drivers were mainlined.]

Idea: multi-OS drivers are assumed to be easier, because the developers only have to know the single vendor abstraction layer, and not each individual OS’s frameworks. However, this may be false. You still might have to know the intricacies of each OS, in order to avoid bugs or to deal with each OS’s idiosyncrasies. This may mean that multi-OS drivers are actually more bug-prone to write.

Other obstacles: Kernel documentation is the pits. Developers don’t maintain docs very well. Sometimes the initial documentation provided for a framework is good, but it goes stale because nobody maintains it. The docs are there to help newcomers. [Tim: they can also help old-timers who switch between systems.]

Hypothesis measure: the number of patches that a vendor has to apply to make code usable in a product. That is, it would be interesting to see whether developers have to fix out-of-tree code more often than they have to fix mainline code. The metric would be patches applied to out-of-tree drivers compared to patches applied to in-tree drivers. Example: the Synaptics touchscreen driver, a 100K diff. Sony Mobile has lots of patches, but Synaptics wouldn’t take them, and obviously they weren’t applicable to mainline. Sony had no place to send the patches and had to maintain them itself. This is a risk factor as well. Mark Gross: Why didn’t Sony mainline this driver? Tim: It was on our list, but was lower priority than the SoC stuff. Also, the motivation to work on code that isn’t your own IP is really low.
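The hypothesis measure could be computed roughly as follows. This is a hypothetical Python sketch, not an existing tool: it assumes the vendor’s patch history has already been extracted (for example from `git log --name-only`) into a mapping from commit ID to the file paths each commit touches, that the set of files also present in mainline is known, and that line counts for each category are available so the two rates can be compared fairly. The name `patch_rates` and the category labels are made up for illustration.

```python
from collections import Counter


def patch_rates(
    patches: dict[str, list[str]],
    mainline_files: set[str],
    loc: dict[str, int],
) -> dict[str, float]:
    """Vendor patches per 1000 lines, split into in-tree vs. out-of-tree code.

    patches:        commit ID -> file paths that commit touches (hypothetical input)
    mainline_files: paths that also exist in the mainline tree
    loc:            {"in-tree": ..., "out-of-tree": ...} lines of code per category
    """
    counts: Counter[str] = Counter()
    for paths in patches.values():
        # A patch touching both kinds of file counts once in each category.
        categories = {"in-tree" if p in mainline_files else "out-of-tree" for p in paths}
        counts.update(categories)
    # Normalize by code volume; categories with no code are skipped.
    return {cat: 1000 * counts[cat] / n for cat, n in loc.items() if n > 0}
```

Normalizing by lines of code matters here because out-of-tree drivers such as the Synaptics one can be very large; a raw patch count alone would mostly reflect code volume rather than how often the code needed fixing.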

The situation with NFC support in the Linux kernel is pretty bad. NFC drivers for Android usually consist of a driver in userspace talking to a small in-kernel I2C shim. The shim was a 600-line driver, but it was impossible to get it mainlined. The maintainers said that allowing the shim would take away people’s incentive to do the right thing. However, the result was that the vendor was demotivated from doing anything in that area of the kernel.

How do we incentivize management to care about upstream? How about a wall of shame? [Tim: We discussed that at other meetings, and decided a reward rather than punishment would be better. Maybe we need a metric for “good mainline status”?]

Maybe produce a list of upstream-supported parts, rather than a wall of shame, … Back to the wall of shame. Idea: we could focus on a vendor’s out-of-tree items and try to demonstrate that they are a bigger security risk. Vendors might then try to get their names off a ‘security risk’ list.

Community is doing a pretty good job of backports.

What about closed-source user-space software? There is code that should be in the kernel, but people are using kernel helpers with drivers in user space. For now, this is outside the scope of this project. Examples of this are media drivers. These are mostly user-space drivers, as are the drivers for GPUs and comm processors. Note that an open source driver for the Adreno GPU is coming along nicely.

Tim: Plan is to attack the tractable problems first.

Make sure that people use good examples heavily. This has been used successfully with handset vendors: it works better and is less hassle.

Illustrate how much it actually costs over a period of time. This would be useful data for vendors to see.

We looked at the amount of contributions (commit counts) from various companies. Many companies have made big improvements (e.g. Samsung). Intel has big numbers, but most of its commits are not for mobile SoCs. TI turned up in Jon Corbet’s top 10 list of contributors, but this may have been deceptive, as it may have been due to a few “key people” rather than an institutional directive. The Free Electrons example was discussed: they have few people, but a lot of commits.

Tim: We have a device tree armageddon coming. Petazzoni’s slide shows review lagging behind DT submissions. Mark: It may not be as bad as it looks. Possibly more DT changes are coming through that don’t need DT review. [Tim: this implies that DT is stabilizing, which would be great. However, personally I’m not sure there’s enough evidence of this yet.]

In order to get good commit numbers, having key people is the way to do it. We need to be able to widen the pool. Mark: Whether a company has good commit numbers and gets stuff upstream depends on what it spends its time and effort on. It’s about actually caring enough to do it.

The method of outsourcing to Free Electrons doesn’t scale. If a company outsources its mainlining, the SoC vendor’s developers miss out on interaction with mainline. There are also development team and product team concerns.

Looking at mobile chipset commits, Intel SoCs have about the same types of things out of tree as ARM processors (Mark Gross agrees).

One suggestion: leverage staging more, for the short-term window while the devices are on the market.

MediaTek wrote lots of multi-OS code, and built its own frameworks and abstractions. Tim: Is it worth putting multi-OS drivers in staging if MediaTek is not on board? The general consensus was yes.

Tim: Some people might object to focusing forum efforts on a single vendor’s driver (Broadcom). Even Broadcom might object, if it prefers to work on its out-of-tree driver.

Broadcom: should we make the driver better against their wishes? Tim said he hadn’t talked to Broadcom. Someone said not to rule them out (they might be interested in improving the mainline driver).

Mark B: The boot part is really important; getting past it is a massive hurdle overcome. [Tim: I’m not sure - is this about making sure mainline can at least boot on each processor? Mark: Yes, being able to run at all]