Commit a61f6155 authored by Dave Pigott

Add deployment and LAVA team roadmap

parent e7d16380
Merge request !58: Add deployment and LAVA team roadmap
Pipeline #55193 passed
---
title: Collabora Lava Lab device deployment plan
---
## In stock, ready for deployment
### Ampere emag servers
* Quantity: 4
* Ready to go in next batch
* 1 can go in straight away; it just needs some testing with DHCP so that the correct EFI GRUB is sent to it
* The 3 others also need some firmware flashing
* [T34634](https://phabricator.collabora.com/T34634)
## In stock, awaiting dependencies
### Chromebook Tomato (cherry) Acer
* Quantity: 12
* Ready to be deployed when new dispatchers are set up
* [T36522](https://phabricator.collabora.com/T36522)
### Renegade elite (rk3399)
* Quantity: 5
* Just needs a tweak to the docs to point to the right firmware, so that any units we add or re-flash end up set up the same way as the current ones
* [T38110](https://phabricator.collabora.com/T38110)
### Rock 5B
* Quantity: NA
* A couple that could go in but they are of a different spec. Lower priority
### Ampere Mt jade
* Quantity: 1(?)
* Awaiting confirmation we can use it in the Lab – unit we have was pre-release
## With engineer for integration
### Chromebook Berknip (zork) HP
* Quantity: 6
* One on staging so Laura can work on depthcharge
* [T40291](https://phabricator.collabora.com/T40291)
### Chromebook Dewatt (guybrush) Acer
* Quantity: 12
* In bring up with Laura/Lucas
* [T39039](https://phabricator.collabora.com/T39039)
### Apertis potential new renesas
* Quantity: 5
* Apertis working on roadmap for lab deployment
### Chromebook Kaisa (puff)
* Quantity: 12
* On its way to Laura for her to work on depthcharge
* [T40243](https://phabricator.collabora.com/T40243)
### Chromebook Volmar (brya) Acer
* Quantity: 12
* On its way to Laura for her to work on depthcharge
* [T40244](https://phabricator.collabora.com/T40244)
## Awaiting stock
### Chromebook arcada
* Quantity: NA
* Not with us yet. Nick is working on the customs invoice with Google
### Chromebook Volteer
* Quantity: 5 or 10
* Not with us yet. Nick is working on the customs invoice with Google
* Mesa would like some more of these before the split, so we are in communication with Google to source more.
* [T39591](https://phabricator.collabora.com/T39591)
## Potential but unknown
### TI AM62xx
* Quantity: 5 - 10
---
title: LAVA team roadmap
---
## LAVA Development
### Internals (T31327)
#### Review internal/external LAVA server-worker API
* Find differences between internal/external flow
* Verify if it can be unified
* Reduce the LAVA code base by reusing common components
#### Improve job logs
* Lower occurrences of "Listened connection for namespace '%s' for up to %ds" message (T37051)
* Consider `\r` as a valid line end marker when monitoring the DUT's console (T37054)
  - Issue reported upstream: https://git.lavasoftware.org/lava/lava/-/issues/561
* Allow keeping escape control characters (T37055)
  - Both items above resolved by: https://gitlab.collabora.com/lava/lava/-/merge_requests/120 (to be upstreamed)
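
The `\r` item above can be illustrated with a minimal sketch. This is not the code from the merge request; the function name `split_console_lines` is hypothetical and the sketch only shows the idea of treating a bare `\r` as a line end alongside `\n` and `\r\n`.

```python
import re

# Match "\r\n" first so a CRLF pair counts as one line end,
# then fall back to a bare "\r" or a bare "\n".
LINE_END = re.compile(r"\r\n|\r|\n")


def split_console_lines(chunk: str) -> tuple[list[str], str]:
    """Split console output into complete lines plus an unterminated remainder.

    The remainder is the trailing text with no line end yet; a caller would
    prepend it to the next chunk read from the console.
    """
    parts = LINE_END.split(chunk)
    return parts[:-1], parts[-1]
```

One limitation of this sketch: if a chunk ends with `\r` and the next one starts with `\n`, the pair is seen as two line ends and yields an extra empty line. A real implementation would need to handle that boundary case.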
#### Traffic reduction (T32184)
* Main goal is to provide new, more efficient ways of handling logs
* Remember to document any proposed solution that gets dropped
* Start by mimicking Open Build Service log handling
#### Benchmarks (T32182)
* Ping upstream for review, update demo-related branches across all relevant repositories (less than half day)
* Add benchmarks for frequently used API endpoints (less than quarter day)
* Enable benchmarking pipeline at least in the internal GitLab (less than half day)
* Extend benchmarking scenarios (for generated database and tests)
* Review bottlenecks found by benchmarks (preferably with solution proposals)
* Submit a blog post with rationale and implementation details
### Option for disabling viewinggroups
* [LAVA MR 1942](https://git.lavasoftware.org/lava/lava/-/merge_requests/1942)
* Awaiting approval or decline by LAVA Team
### Revise stats collection in the database
* Review index usage and look for little-used indexes, then drop them from Django or from Postgres
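
As a starting point for that review, the sketch below pairs a query against PostgreSQL's `pg_stat_user_indexes` statistics view with a small filter. The helper name `little_used` and the `max_scans` threshold are assumptions for illustration; the query is not taken from the LAVA code base.

```python
# Query against PostgreSQL's statistics view for user indexes.
# idx_scan counts index scans since the statistics were last reset.
INDEX_USAGE_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_relation_size(indexrelid) AS size_bytes
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, size_bytes DESC
"""


def little_used(rows, max_scans=0):
    """Keep result rows whose scan count (4th column) is at most max_scans."""
    return [row for row in rows if row[3] <= max_scans]
```

Rows would come from running `INDEX_USAGE_SQL` through any database cursor. An index with zero scans may still back a primary key or unique constraint, so each candidate needs a manual check before it is dropped.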
### Postgres Vacuum
* Periodic stall check. Kubernetes provides support for long-running crontabs
### DB Use cases
* Which package should it be put in? lava-dev, lava-debug? (latter does not exist yet)
### Job output compression
* Currently timing out; do a binary chop on the compression period
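
The binary chop could look roughly like the sketch below. It assumes the compression period is an integer (for example a number of days) and that behaviour is monotonic: if one period completes in time, every shorter one does too. The `completes` callable is hypothetical and stands in for whatever actually runs, or times, a compression pass.

```python
from typing import Callable


def largest_passing_period(low: int, high: int, completes: Callable[[int], bool]) -> int:
    """Return the largest period in [low, high] for which completes() is True.

    Returns low - 1 if even the shortest period does not complete.
    """
    best = low - 1
    while low <= high:
        mid = (low + high) // 2
        if completes(mid):
            best = mid       # mid works, try a longer period
            low = mid + 1
        else:
            high = mid - 1   # mid times out, try a shorter period
    return best
```

Each probe halves the remaining range, so a range of 1 to 90 needs at most seven compression attempts.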
## LAVA CI
* Results comparison using internal pytest-benchmark mechanism
## Security
### Codebase review
* Run as GitLab runners?
#### Automated scanning:
* [Verifying Django generated HTML](https://github.com/peterbe/django-html-validator)
* [Finding security flaws in python](https://pypi.org/project/bandit/)
* [Being fixed by LAVA team](https://git.lavasoftware.org/lava/lava/-/issues/584)
* [Python code quality checker](https://github.com/PyCQA/pyflakes)
## System administration
### Resource issues
* What if someone is unavailable - how do we mitigate - create a plan
### Alerting for predictable defects
* If support services are unavailable or are about to become unavailable, alert and remedy.
### Storing and extracting metadata: Loki, Prometheus/Victoria/Mimir
* Kubernetes only stores 10 MB of data, so with large logs we lose data. Develop a mitigation strategy.
* Sometimes Loki loses connection after upgrade. Investigate underlying causes
### Postgres optimization
* Use Unix sockets instead of TCP, outline comparison
* Find out what performance benefit, if any, would result
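
For the comparison, the two connection styles differ only in the `HOST` and `PORT` values of the Django database settings. The sketch below is illustrative: the helper name `postgres_settings`, the database name and the socket directory are assumptions, not values from the current deployment.

```python
def postgres_settings(name, user, socket_dir=None, host="127.0.0.1", port="5432"):
    """Build a Django DATABASES["default"] entry for TCP or a Unix socket."""
    settings = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": name,
        "USER": user,
    }
    if socket_dir is not None:
        # libpq treats a host that starts with "/" as the directory
        # containing the Unix socket file, so no TCP connection is made.
        settings["HOST"] = socket_dir
        settings["PORT"] = ""
    else:
        settings["HOST"] = host
        settings["PORT"] = port
    return settings


# Hypothetical values: "lavaserver" and "/var/run/postgresql" are examples only.
TCP = postgres_settings("lavaserver", "lavaserver")
SOCKET = postgres_settings("lavaserver", "lavaserver", socket_dir="/var/run/postgresql")
```

A Unix socket only works when the server process and Postgres share a host or a mounted socket directory, which is worth confirming for the Kubernetes setup before measuring anything.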
### Dispatcher version synchronisation
* Plan a move to lavapeur and automated upgrades
### Device controllers
#### Fleet management
#### Conserver, PDU control, etc.
* Analyse actual reasons for issue occurrences
#### Align deployment
* Docker image alignment with upstream
#### Consider Prometheus alternatives
* Investigate and produce a plan if suitable alternative found
## Monitoring
### Revisit db index usage
* [How often is it updated?](https://monitoring.core.collabora.dev/d/IDWko4VVk/postgresql-stats)
* Replace ratio value with cache misses
* Add Grafana alerts for potential defects
## LAVA Lab device integration and deployment
* See the deployment roadmap in GitLab
## Operator's perspective
### Hardware management
#### Configuration and fleet management (controller boards)
* Unify configuration management to use Ansible, e.g. for device configuration changes rollout (T21468)
* Move DUT controlling utilities (pdudaemon, conserver, etc.) from dispatcher to external [Target Managers](https://elinux.org/Test_Glossary)
#### Operator's routines
* Provide a list of _known failures_ (e.g. pending external support) to prevent ignoring new alerts
* Add a _"blame hardware"_ CronJob for issues resolved by reseating connections
### Administration and integration
#### Monitoring cloud-friendliness (T32181)
* Check which tracing solution (Sentry, Jaeger, etc.) fits best with current setup
* Provide minimal working setup for initial testing and change verification
* Add tracing service to the deployment
#### Investigate available storage solutions
* Take into account other products than Kubernetes volumes
* Compare benefit-to-cost ratios
* Keep in mind storage size reduction efforts (outdated jobs, job artifacts removal)
#### Component upgrades: Synchronize dispatcher version with server
* Determine how the dispatcher version is exposed and when upgrade should be enforced (half day)
* Verify if upstream approach with host daemon can be reused or improved (half day - a day)
* Verify dispatcher upgrade mechanism with Kubernetes-based server (half-day)
#### Component upgrades: Extend component version management
* Set up mirror repository with a CI job triggered by a new tag (less than half day)
* Rebase staging branch on the new release assuming no merge conflicts - to be reviewed manually (half day)
* Determine which components might need version pinning/manual upgrades (if any)
#### Batch processing
* Parse job output from [lava-gitlab-runner](https://gitlab.collabora.com/lava/lava-gitlab-runner)