These are minutes from the Automated Testing Summit, 2018.
The meeting was held on Thursday, October 25 in Edinburgh, Scotland.
= Sponsors =
The sponsors for the meeting were:
* the Core Embedded Linux Project (CELP) of the Linux Foundation
* Linaro
* DENSO TEN
= Attendees =
The following people attended the meeting:
Name                 Company             Project
-------------------- ------------------- ---------------------
Alejandro Hernandez  TI                  Opentest
Alice Ferrazzi       Gentoo              Gentoo kernel CI
Anders Roxell        Linaro              kselftest
Andrew Murray        ARM                 Witekio board farm
Carlos Hernandez     TI                  Opentest
Chris Fiege          Pengutronix         Labgrid, sdmux
Cyril Hrubis         SUSE                LTP
Dan Rue              Linaro              Kernel Validation
Daniel Sangorrin     Toshiba             Fuego, CIP
Dmitry Vyukov        Google              syzbot
Geert Uytterhoeven   Renesas             LTSI maintainer
Guenter Roeck        Google              kerneltests.org
Heiko Schocher       Denx                tbot
Hirotaka Motai       Mitsubishi Electric Fuego RT testing
Jan Lübbe            Pengutronix         Labgrid
Jan-Simon Moller     Linux Foundation    AGL
Kevin Hilman         BayLibre            KernelCI
Khiem Nguyen         Renesas             Fuego, LTSI testing
Li Xiaoming          Fujitsu             Fuego
Manuel Traut         Linutronix          ci-RT, r4d
Mark Brown           Linaro              KernelCI
Matt Hart            Linaro              LAVA
Michal Simek         Xilinx              Xilinx testing
Milosz Wasilewski    Linaro              LKFT
Nobuhiro Iwamatsu    Cybertrust Japan    Gentoo kernel CI
Pawel Wieczorek      Samsung             SLAV
Philip Li            Intel               0-day
Punnaiah Kalluri     Xilinx              Xilinx testing
Richard Purdie       Linux Foundation    Yocto Project
Sjoerd Simons        Collabora           KernelCI
Steve Rostedt        VMware              ktest, RT maintainer
Tim Bird             Sony                Fuego
Tim Orling           Intel               KCF, Yocto Project
Takao Koguchi        Hitachi             CELP
Tsugikazu Shibata    NEC                 LTSI, CELP
Yoshitake Kobayashi  Toshiba             CIP, CELP
Yuichi Kusakabe      DENSO TEN           Fuego, Sponsor
= Outline =
* welcome and introductions
* vision and problem definition
* glossary and diagram discussion
* test definition, build artifacts, execution API (E)
* run artifacts, results formats, parsers
* farm standards - DUT control drivers, board definitions
* wrap-up
= Minutes =
Minute conventions:
hash (#) prefixes speaker name (when known).
- Introductions - attendees introduced their name, company and project
# Tim:
- Not enough time today
- Goal: Get a description framework/terminology
- talk about common vocabulary
- value of collaboration/difficulty of going alone ("ribbon")
- we share a lot of code, but we do not share QA-tools
90% of the code is OSS, but only 5% of testing code is OSS
- Jenkins, LTP are a good start, but not enough
- Vision => promote the sharing of automated CI components as code is shared today
- Some shift to sharing components
- KernelCI uses lava
- Fuego doesn't have a board control layer
- Non goals for today:
- no standards or common APIs
- do not implement all nice features we hear about today in your test framework
=> rather find a way to share
Tim does not want to implement in Fuego:
- email based ci triggers
- SUT deployment abstractions
- DUT control drivers
- centralized results repos
- distributed results visualization
focus on:
- repo of test definitions
- sharing of pass criteria, testcase docs
- generalized output parsing system
current problems:
- many aspects are not shared, but nobody can do it all
- tests are treated as "secret sauce"
- similar to how embedded SW was viewed 20 years ago
- We have no place to share our tests
- LTP? kselftest?
- there are other, standalone, open source tests:
- cyclictest, syzkaller, iozone, lmbench
- often tests are lab- or target-specific
- makes HW testing difficult
- Tim said Sony has an internal USB multiplexing test rig
- tests depend on test framework
- TVs are switching to android, and some use android test framework
- file formats, APIs, architecture differ
- paradox of generalization and specialization
=> tests are hard to reuse
- need the ability to customize tests
- (skip lists, expected values, variants) and pass criteria
- benchmark value thresholds based on previous results
- different frameworks factor their data and services quite differently
- interfaces between services, central server
- where are operations performed?
- central server, local host, on-DUT, by the user
#?: LAVA had complaints that tests could only be run in LAVA, so it
    can now output a script which can be run interactively
- test definitions are split up into different files:
- customizable per test
- per board
- per lab (doesn't exist in fuego)
Tim's idea: fractal nature of testing
- individual tests, test suite, test plan: actual vs. expected results
- can do pass criteria, result analysis, reporting at each level
- the features are expressed differently at different levels
# Richard:
- another item missing from problem statements is modularization
- frameworks tend to be monolithic systems that are hard to
extract useful sub-systems from
- use of Docker is incompatible with Yocto (which wants to run on different distros)
- yocto doesn't have a good results parser
- maybe we can't directly use code, but we could start with
standards for interchange format?
# Tim: query about languages used
- many people use python, rest is diverse (go, perl, ruby, java, groovy)
- python = about 10 frameworks use it
- go = 3
- perl = 1
- ruby = 3
- bash = 5
- java = ones that use jenkins (about 4?)
- groovy - 1 used outside of jenkins
# TI:
- they built an abstraction on top of other tools
- aggregate tests from different levels: applications, linux, low-level
- below that: "execution layer": fuego, lava, ...
- abstraction is Java
# Tim:
- interchange options: TAP, JUnit, xUnit
- XML doesn't have good human readability
- xUnit (and JUnit?) don't show all testcases, only the failures
# Jan? - neither does JSON, unless pretty-printed
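As a side note on the readability point above, here is a minimal sketch (purely illustrative, not any framework's actual parser) of how little machinery TAP needs to be machine-parsed while staying human-readable:

```python
import re

# Minimal TAP (Test Anything Protocol) result-line parser -- an
# illustration, not a full TAP implementation (no plan/diagnostic handling).
TAP_LINE = re.compile(r"^(not )?ok\s+(\d+)\s*-?\s*(.*)$")

def parse_tap(text):
    """Return a list of (testcase name, passed) tuples from TAP output."""
    results = []
    for line in text.splitlines():
        m = TAP_LINE.match(line.strip())
        if m:
            not_ok, _num, name = m.groups()
            results.append((name, not_ok is None))
    return results

sample = """\
1..3
ok 1 - boot completed
not ok 2 - cyclictest latency under threshold
ok 3 - network up
"""
print(parse_tap(sample))
```

The same output stays readable in a terminal or an e-mail, which is the trade-off the discussion above contrasts with XML and non-pretty-printed JSON.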
# Richard:
- yocto doesn't want to run a board farm
- want to run a test and collect results
# Michal?:
- wants to have a central location of tests to run on the Linux kernel
- wants to share technology / devices used to run tests
== glossary review ==
clarification and definition of glossary terms
https://elinux.org/Test_Stack_Survey#Glossary
Tim presents his view
Boot: startup phase up until a test can be run
- (but can still be in the bootloader or whatever is to be tested)
- boot failure is still a relevant result
Deploy vs. provision:
- Tim: deploy: install SW under test
- Fuego uses "deploy" to refer to installing test program
- Tim: provision: install the more general test environment
-> labgrid only provisions: it sets up the HW around, and the SW on, the target;
   labgrid does not usually deploy a test program
DUT:
- sometimes there is a device under test
- sometimes there can be multiple devices under test
- Pools - how to describe identical boards used interchangeably?
# Dmitry?:
- we only test software, we do not have hardware
- the software is under test, and the virtual machine, or
hardware, is just another resource that is required for
the test
# Kevin:
- DUT is a term of art in embedded, and we took it for granted
- maybe need to find a more general term
- "board"? - but that's confusing for testers using a VM
- "target"?
- "system under test"
# Tim: "system under test" has the same acronym as software under test (SUT)
- you can't test the software without it running on something
# Cyril: survey took a long time because of unfamiliarity with the terms
test agent:
everything that gathers information on the DUT (e.g. syslogd, ssh for access, adb, ...)
notification:
# Manuel:
- can go 2 places:
1) lab technician, for lab failure
2) test initiator, for problem report
logs vs. run artifacts:
- log is usually text output from some element
- kernel, system log, test program, maybe tracer
- not every run artifact is a log
- examples: audio file, video file, binary traces, binary dumps
- also, can be text file that is not a log, like run meta-data
trigger:
- is thing that starts a test
- can be explicit, like manual user action or git commit hook
- can be implicit, like sending an e-mail to a kernel mailing list
test plan:
- is a list of tests to trigger at the same time
- most systems have this, some call it test plan
- Google uses directory structure to trigger a set of tests
- there was discussion about tests always being with the source
- for product testing, this isn't feasible, which source would
you associate a system integration test with?
# Tim:
- have to skip the slide for "candidate terms"
- skipped term details
== diagram review ==
# Tim: test-runner (suggested new box)
- missing test runner (or is another bullet in test scheduler)
# Kevin:
- maybe put as another bullet in "Test Scheduler" box
# Michal?: (cli box)
- cli tools are everywhere, not just at frontend
# Tim: (DUT control APIs)
- missing APIS inside the DUT control host box
# Kevin:
- there were too many of them
- need to have separate diagram just for DUT controller box
- is DUT controller software or hardware?
- unclear from diagram
- front ends:
- there can be multiple front ends
# Jan?: (code review box)
- this seems out of place
- should replace code review box with trigger
11:00 Pause
= 11:12 Test Definition (TD) =
== Test-definition ==
- storage format(s)
- repository API
- Elements
- Issues with this:
- what fields do people have? Why?
- can we interoperate?
- fields
- dependencies - lots of different kinds
- maybe separate dependencies by how handled?:
- exclude test
- install something (install package)
- change status (eg sudo root)
- some things can't be changed (amount of memory?, number of CPUs, kconfig)
- both build-time and run-time dependencies exist
# Tim: Here are Fuego test definition elements:
Fuego:
- meta-data: Maintainer, Version, license
- dependencies: What features on a DUT are required
- instructions: shell commands
- How to visualize
- A test can be a single test or a test suite
- source, or location of source
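A sketch of what such a test definition could look like as data; the field names below are hypothetical and only mirror the categories listed above (meta-data, dependencies, instructions, visualization, source), not Fuego's actual file format:

```python
# Illustrative test-definition record. Every field name here is an
# assumption for discussion purposes, not Fuego's real schema.
test_definition = {
    "name": "Benchmark.iozone",
    "maintainer": "Tim Bird",
    "version": "1.0",
    "license": "GPL-2.0",
    "dependencies": {"min_memory_mb": 64, "needs_root": True},
    "instructions": "./iozone -a -s 4096",          # shell commands
    "visualization": "chart",                        # how to display results
    "source": "https://example.org/iozone.tar.gz",   # tarball or git URL
}

REQUIRED = {"name", "maintainer", "version", "license", "instructions", "source"}

def validate(td):
    """Return the set of required fields missing from a test definition."""
    return REQUIRED - td.keys()

print(validate(test_definition))  # empty set means the definition is complete
```

A shared schema like this (whatever the exact fields end up being) is what would let frameworks exchange test definitions and validate them mechanically.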
#?:
Google filters tests by source code path (net/ip/tcp/...)
# Tim:
- where to get the tests? (git/tarball)
# Richard:
for yocto you need:
- install tests remotely
- parsable output
- simple dependencies
-> ptest
types of dependencies:
memory, packages, root, hardware, kernel config, files, features, permissions, lab-hardware
# it would be nice to have a standard place to find the kernel config (to check which features are available)
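A sketch of the kernel-config check mentioned above, assuming the config is available as plain text (e.g. from /proc/config.gz or /boot/config-*); the function names are illustrative:

```python
# Sketch: resolve a "kernel config" dependency before running a test.
# Assumes the config has already been read into a string.
def parse_kernel_config(text):
    """Map CONFIG_* option names to their values ('y', 'm', or a string)."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
    return opts

def has_feature(opts, option):
    """True if the option is built in ('y') or available as a module ('m')."""
    return opts.get(option) in ("y", "m")

sample = """\
# Automatically generated file; DO NOT EDIT.
CONFIG_PREEMPT=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_FTRACE is not set
"""
opts = parse_kernel_config(sample)
print(has_feature(opts, "CONFIG_HIGH_RES_TIMERS"))  # True
print(has_feature(opts, "CONFIG_FTRACE"))           # False
```

With a standard place to find the config, a framework could skip (rather than fail) tests whose kernel-feature dependencies are not met.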
# Tim: wants to define a "Test Execution API" (aka the famous interface 'E')
== build artifacts ==
# Richard:
- YP has ptest, which is a package that gets delivered to target
- would be nice to have a standard for "make test"
- "make install" used to have all kinds of problems, but it's better now
# Tim:
- Fuego just added a prototype feature for bundling the test program
- what is needed beyond just a manifest?
- standard location on DUT for test materials?
# ?:
Some DUTs have read-only filesystems
- (you can't have a single standard location)
# Tim:
- test packages need to be relocatable
- how to handle files outside the package's test location?
- e.g. modification to /etc or some other system path
# Richard:
- YP allows bundling a script with the test package to modify other
  areas of the filesystem
# Tim:
- what format are people using for build artifacts (test packages)?
- answers: tarball (fuego), cpio (0-day)
# Richard:
- YP can package in any of its supported formats:
- debian, ipkg, rpm
* During the lunch break: CELP brainstorming session
2:00 Back to ATS Summit
== Run Artifacts ==
# Tim O?:
- How do we get all the data out of the DUT without interference?
- We would like to have really everything we can get
- measurement is also an interference workload, so they piped it
out via the network (to avoid local storage)
#?:
- dmesg
- Power consumption
#? (opentest):
- Data that shows infrastructure failures
- Include the test definition and other metadata (version of testsuite, ...) with the results
# Richard:
- we need all the data (self documented!) to reproduce a test
# ?:
- downside to using lava features like overlay: it's harder to
reproduce manually
# ?:
- Testlink has relational tables to link test cases to executions
to performance metrics
- They have upward and downward translators for each execution engine
# Jan:
- we need a common format not only for the result but also for the
  test and metadata; otherwise we cannot make sense of the data
# Cyril:
- they are thinking about a more formal test description
# Richard:
- ptest is just pass/fail
# Carlos: shows TI Testlink:
https://en.wikipedia.org/wiki/TestLink
- Would like to have something above the test execution frameworks
- Would improve collaboration
- They have a Django App in opentest to link requirements from jira with a set of individual testcases
- result tables are generated from that
- use kibana to provide an overview of test racks (28+)
Result Analysis / Pass Criteria:
- the result may be a value instead of pass/fail, may need to be board specific
# Tim O:
- they use pytest. he does not like the idea of having the pass-criteria in the test-case
# Jan:
- you can add metadata to a result
# Richard:
- They have a mail report which includes performance graphs. That makes it
easy to find problems.
# Philip (0-day):
- They compare the results to the previous runs
- They determine threshold automatically
- They try to do automatic bisects
How do you identify false-failures:
# Philip:
- 0-day does automatic bisect. If they find the regression: fine;
If they don't: they drop the false failure.
# TI Opentest:
- They use a running average + stddev to compare the new result with history
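A minimal sketch of the history-based criterion TI described: flag a new benchmark result as a regression when it falls outside a band around the running average. The 2-sigma threshold factor here is an assumption for illustration, not TI's actual value:

```python
import statistics

def is_regression(history, new_value, n_sigma=2.0, higher_is_better=True):
    """Compare a new benchmark value against the mean +/- n_sigma * stddev
    of previous runs. Returns True if it looks like a regression."""
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)
    if higher_is_better:
        return new_value < mean - n_sigma * sigma
    return new_value > mean + n_sigma * sigma

history = [100.0, 102.0, 98.0, 101.0, 99.0]  # e.g. throughput in MB/s
print(is_regression(history, 99.0))   # within normal variation -> False
print(is_regression(history, 80.0))   # well below the band -> True
```

The same mechanism gives a board-specific pass criterion for free, since each board accumulates its own history.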
# Kevin:
- For aggregation on the test suite level, we need to document what the
expected or allowed failures are.
= Board farm standards =
# Tim:
Would be great if we had a standard API for board farm hardware
manufacturers
# Xilinx:
Just go 1:1 for corporate setups, to avoid the cable problem
# Chris:
Cables are not the problem, they can work fine. You need to have an
abstraction layer
# Mark:
Everyone needs to find out which HW works, which doesn't
# Tim Bird:
Would be nice to have a DUT Controller available from Seeed Studio
# Jan-Simon:
A network logging receiver needs to be supported; minimal impact on the DUT is important
- labgrid - what does it do for plugins?
- has a python API for modules
- example: there's a module for pdudaemon
- example: there's a module for web power switch (by Power Solutions, Inc.)
- observation: there are lots of different control points for the lab
and DUT.
# Tim: can we start with a single API, and expand from there
- suggest power control, as it's a single bit
- actually, it can have voltage, bounce, etc - it's more complicated
- decision(!!) to standardize on pdudaemon for power control
- put power control drivers in that project
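A hypothetical sketch of what a shared power-control driver interface could look like; this is not pdudaemon's actual driver API, only an illustration of the "single bit" abstraction discussed above and of why even that needs sequencing:

```python
from abc import ABC, abstractmethod

class PowerDriver(ABC):
    """Hypothetical common interface for per-port power control."""
    @abstractmethod
    def power_on(self, port: int) -> None: ...
    @abstractmethod
    def power_off(self, port: int) -> None: ...

class FakeSwitch(PowerDriver):
    """In-memory stand-in for a networked power switch, for testing."""
    def __init__(self, ports: int):
        self.state = {p: False for p in range(ports)}
    def power_on(self, port):
        self.state[port] = True
    def power_off(self, port):
        self.state[port] = False

def power_cycle(driver: PowerDriver, port: int) -> None:
    # Even the "single bit" case needs sequencing (off, settle, on);
    # a real driver would also insert a settle delay between the calls.
    driver.power_off(port)
    driver.power_on(port)

switch = FakeSwitch(ports=8)
power_cycle(switch, 3)
print(switch.state[3])  # True
```

Per the decision above, concrete drivers behind such an abstraction (whatever its final shape) would live in the pdudaemon project.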
# TI OpenTest:
OpenTest has a generic interface for Multimeters
# Geert:
- iiodaemon might be good for measurement API (measurement drivers for
things like power)
- also have gpiod
# Tim Orling:
We should document "Design for Test" best practices
# Kevin:
kernelci will become a LF project (more compute, more storage)
# Tim Orling:
has been talking with Kevin about "os-ci"
# ?:
decoupling their tests from lava was a lot of work
# Tim:
Hard to maintain SW/test for HW you don't have
# Mark:
This is a known problem for kernel maintainers ;)
== Wrap-up and overview ==
# ?:
Everyone should learn other people's systems
# Michal:
Would be great if every project had a document:
"how to 'hello world' on a BeagleBone"
- hello test, but with description of execution flow through framework
# Tim:
survey is a start at sharing information.
= future coordination =
- mailing list
# ?:
- how about a dedicated list, not yocto-project based?
- how about something on vger?
- lists don't handle html
- it may not be archived
- it's not specific to the kernel
- most attendees seemed to think it's OK to keep the current list
- there needs to be more activity on the list - it looks dead
- some of us only doing this part time
- everyone try to generate more traffic
# Tim:
Try more pin-pointed discussions on the list rather than large mails
- discussion on whether to keep the mailing list
- decision is to keep on YP list for now
- documents
- where to put documents?
- decision to put all stuff on elinux wiki for now
# Jan:
Maybe create the "Design for Test" document first
# Michal:
Create a list of test cases on the elinux wiki
- meetings
- when to meet again?
- what about at plumbers?
- don't need sponsor
- has a problem that it sells out too quickly
- not as many people visit it
- decision to do: ELCE 2019, Lyon France
- testing track, testing BOF
- private meeting again?
- not sure
- would need to get sponsorship again - may be
cheaper if only a meeting room
# Richard:
Do we have a way to speak with one voice?
=> one document endorsed by kernelci and yocto?
- do we need an organization?
- we are here as individuals, not speaking for our companies?
- can kernelci project be the "voice"
# Kevin:
- Kevin can't speak for board or for kernelci project
- it's not formed yet
- first projects:
- test definition
- Tim will do another survey
- not questions, but just ask for a link to an example of a test
- common run artifacts
- common results format
- just use xunit?
- who is doing this?
- test execution API (E)
- maybe start with list of phases
- stuff on wiki:
- main Automated Testing page
- list of links to test repositories
- get a URL for each test system
A picture was taken of summit attendees
= Decisions from the summit =
* pdudaemon will serve as our first DUT control API consolidation point
- please put your DUT power control driver in pdudaemon
* we will continue to use the current mailing list for discussions, for now
* we will save information and documents on the elinux wiki
* our next physical meeting(s) will be at ELCE 2019
= Action Items from the summit =
AI - refine glossary over time to remove ambiguity (??)
AI - modify diagram with discussed changes (Kevin?)
AI - create elinux wiki page for Automated Testing topic (Tim)
AI - create elinux wiki page for Test Systems (Tim)
- with links to repositories for each system
AI - collect Run Artifact fields (for possible RA standard) (??)
AI - collect Test Definition fields (for possible TD standard) (Tim)
AI - send a list of test phases to the list (start of API 'E' discussions) (Tim)
AI - create Debian package for pdudaemon (Tim Orling)
AI - create an automated test project in the Linux Foundation (Kevin)
- currently called KernelCI project
AI - arrange for sessions and meetings at ELCE 2019 (Tim)
AI - create "Design for Testing" document aimed at board hardware designers (??)