Commit a61f6155 authored by Dave Pigott

Add deployment and LAVA team roadmap

---
title: Collabora Lava Lab device deployment plan
---
## In stock, ready for deployment
### Ampere emag servers
* Quantity: 4
* Ready to go in the next batch
* 1 can go in straight away; it just needs testing to confirm that DHCP sends it the correct EFI GRUB
* The other 3 also need firmware flashing
* [T34634](https://phabricator.collabora.com/T34634)
## In stock, awaiting dependencies
### Chromebook Tomato (cherry) Acer
* Quantity: 12
* Ready to be deployed when new dispatchers are set up
* [T36522](https://phabricator.collabora.com/T36522)
### Renegade Elite (rk3399)
* Quantity: 5
* Just needs a tweak to the docs so they point to the right firmware, so that any units we add or re-flash are set up the same way as the current ones
* [T38110](https://phabricator.collabora.com/T38110)
### Rock 5B
* Quantity: NA
* A couple could go in, but they are a different spec. Lower priority
### Ampere Mt. Jade
* Quantity: 1(?)
* Awaiting confirmation that we can use it in the Lab – the unit we have is pre-release
## With engineer for integration
### Chromebook Berknip (zork) HP
* Quantity: 6
* One on staging so Laura can work on depthcharge
* [T40291](https://phabricator.collabora.com/T40291)
### Chromebook Dewatt (guybrush) Acer
* Quantity: 12
* In bring-up with Laura/Lucas
* [T39039](https://phabricator.collabora.com/T39039)
### Apertis: potential new Renesas
* Quantity: 5
* Apertis working on roadmap for lab deployment
### Chromebook Kaisa (puff)
* Quantity: 12
* On its way to Laura so she can work on depthcharge
* [T40243](https://phabricator.collabora.com/T40243)
### Chromebook Volmar (brya) Acer
* Quantity: 12
* On its way to Laura so she can work on depthcharge
* [T40244](https://phabricator.collabora.com/T40244)
## Awaiting stock
### Chromebook Arcada
* Quantity: NA
* Not with us yet. Nick is working on the customs invoice with Google
### Chromebook Volteer
* Quantity: 5 or 10
* Not with us yet. Nick is working on the customs invoice with Google
* Mesa would like more of these before the split, so we are in contact with Google to source more.
* [T39591](https://phabricator.collabora.com/T39591)
## Potential but unknown
### TI AM62xx
* Quantity: 5 - 10
---
title: LAVA team roadmap
---
## LAVA Development
### Internals (T31327)
#### Review internal/external LAVA server-worker API
* Find differences between the internal and external flows
* Verify whether they can be unified
* Reduce the LAVA code base by reusing common components
#### Improve job logs
* Reduce occurrences of the "Listened connection for namespace '%s' for up to %ds" message (T37051)
* Treat `\r` as a valid line-end marker when monitoring the DUT's console (T37054)
  - Issue reported upstream: https://git.lavasoftware.org/lava/lava/-/issues/561
* Allow keeping escape control characters (T37055)
  - Both items above are resolved by: https://gitlab.collabora.com/lava/lava/-/merge_requests/120 (to be upstreamed)
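A minimal sketch of the `\r` handling (T37054), assuming the console monitor receives output as text chunks. `split_console_lines` is a hypothetical helper, not LAVA's actual implementation: `\r\n`, a bare `\n` and a bare `\r` all end a line, and a trailing `\r` is held back in case its `\n` arrives in the next read.

```python
import re

# "\r\n" is listed first so it is consumed as one terminator, not two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def split_console_lines(buffer: str) -> tuple[list[str], str]:
    """Split buffered console text into complete lines plus an unterminated rest.

    Hypothetical helper: a real monitor would prepend `rest` to the next read.
    """
    held = ""
    # A trailing "\r" may be the first half of a "\r\n" split across two reads.
    if buffer.endswith("\r"):
        buffer, held = buffer[:-1], "\r"
    parts = _LINE_END.split(buffer)
    return parts[:-1], parts[-1] + held


lines, rest = split_console_lines("a\r\nb\rc\nd")
print(lines, repr(rest))  # ['a', 'b', 'c'] 'd'
```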
#### Traffic reduction (T32184)
* The main goal is to provide new, more efficient ways of handling logs
* Document any solution proposals that are dropped
* Start by mimicking Open Build Service log handling
#### Benchmarks (T32182)
* Ping upstream for review, update demo-related branches across all relevant repositories (less than half a day)
* Add benchmarks for frequently used API endpoints (less than a quarter of a day)
* Enable the benchmarking pipeline, at least in the internal GitLab (less than half a day)
* Extend benchmarking scenarios (for generated database and tests)
* Review bottlenecks found by benchmarks (preferably with solution proposals)
* Submit a blog post with rationale and implementation details
### Option for disabling viewinggroups
* [LAVA MR 1942](https://git.lavasoftware.org/lava/lava/-/merge_requests/1942)
* Awaiting approval or rejection by the LAVA team
### Revise stats collection in the database
* Review index usage and look for little-used indexes – drop them from Django or from Postgres
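A starting point for the index review, assuming direct SQL access to the LAVA database. `pg_stat_user_indexes` and `pg_index` are standard PostgreSQL catalogs; the thresholds in the hypothetical `drop_candidates` helper are placeholders to tune. `idx_scan` counts scans since the last statistics reset, so the numbers are only meaningful after stats have been collected over a representative period.

```python
# Unique (and therefore primary-key) indexes are excluded: they enforce
# constraints, so low scan counts do not make them safe to drop.
UNUSED_INDEX_SQL = """
SELECT s.indexrelname, s.idx_scan, pg_relation_size(s.indexrelid) AS size_bytes
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
WHERE NOT i.indisunique
ORDER BY s.idx_scan, size_bytes DESC;
"""


def drop_candidates(rows, max_scans=50, min_size_bytes=1 << 20):
    """Return names of indexes scanned at most `max_scans` times and at least
    `min_size_bytes` large (hypothetical thresholds).

    rows: (indexrelname, idx_scan, size_bytes) tuples from UNUSED_INDEX_SQL.
    """
    return [
        name
        for name, scans, size in rows
        if scans <= max_scans and size >= min_size_bytes
    ]
```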
### Postgres Vacuum
* Periodic stall check; Kubernetes provides support for long-running crontabs
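One possible shape for the stall check, assuming it runs as a periodic job (for example a Kubernetes CronJob) that reads `last_vacuum` and `last_autovacuum` from the standard `pg_stat_user_tables` view. The two-day threshold and the `stale_vacuum_tables` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Standard PostgreSQL statistics view; both columns are NULL if never vacuumed.
STATS_SQL = "SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;"


def stale_vacuum_tables(rows, now, max_age=timedelta(days=2)):
    """Return tables whose most recent (auto)vacuum is older than `max_age`
    or that were never vacuumed.

    rows: (relname, last_vacuum, last_autovacuum) tuples from STATS_SQL;
    `now` must be timezone-aware like the timestamps Postgres returns.
    `max_age` is a hypothetical threshold.
    """
    stale = []
    for relname, last_vacuum, last_autovacuum in rows:
        runs = [t for t in (last_vacuum, last_autovacuum) if t is not None]
        if not runs or now - max(runs) > max_age:
            stale.append(relname)
    return stale
```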
### DB Use cases
* Which package should it be put in? lava-dev, lava-debug? (the latter does not exist yet)
### Job output compression
* Currently timing out – do a binary chop on the compression period
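The binary chop could look like this, assuming the compression period is an integer (e.g. the number of days of job output handled per run) and that a shorter period never takes longer than a longer one. `completes_in_time` is a hypothetical callback that runs compression for a given period and reports whether it finished before the timeout.

```python
def largest_working_period(lo, hi, completes_in_time):
    """Binary chop for the largest period in [lo, hi] that completes in time.

    Assumes `lo` is known to complete and that success is monotonic: if a
    period completes, every shorter period does too. Each probe halves the
    search range, so about log2(hi - lo) compression runs are needed.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if completes_in_time(mid):
            lo = mid          # mid works: the answer is mid or larger
        else:
            hi = mid - 1      # mid times out: the answer is below mid
    return lo
```

With a one-year upper bound this needs at most nine probes, which matters when each probe is a real compression run.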
## LAVA CI
* Compare results using pytest-benchmark's built-in comparison mechanism
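pytest-benchmark can already compare saved runs (`--benchmark-compare`, with `--benchmark-compare-fail` to fail on regressions). If CI needs a custom gate instead, a sketch over two reports written with `--benchmark-json` might look like this; the 10% threshold and the helper names are assumptions.

```python
import json


def load_means(path):
    """Map benchmark name -> mean time in seconds from a pytest-benchmark JSON report."""
    with open(path) as f:
        report = json.load(f)
    return {b["name"]: b["stats"]["mean"] for b in report["benchmarks"]}


def regressions(baseline, current, threshold=0.10):
    """Return sorted names whose mean got slower by more than `threshold`
    (a fraction, 10% by default). Benchmarks missing from the baseline are skipped."""
    return sorted(
        name
        for name, mean in current.items()
        if name in baseline and mean > baseline[name] * (1 + threshold)
    )
```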
## Security
### Codebase review
* Run as GitLab runners?
#### Automated scanning
* [Verifying Django generated HTML](https://github.com/peterbe/django-html-validator)
* [Finding security flaws in python](https://pypi.org/project/bandit/)
* [Being fixed by LAVA team](https://git.lavasoftware.org/lava/lava/-/issues/584)
* [Python code quality checker](https://github.com/PyCQA/pyflakes)
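If the scanners run as GitLab CI jobs, a minimal wrapper could look like the following. It assumes `bandit` and `pyflakes` are installed in the job image and that the code lives under a `lava/` directory (hypothetical path). Bandit's `-f json` report lists findings under `results`, each with an `issue_severity`.

```python
import json
import subprocess
import sys
from collections import Counter


def run_scanners(path="lava"):
    """Run bandit (JSON report on stdout) and pyflakes over `path`.

    Both tools exit non-zero when they find issues, so check=False is used and
    the output is inspected instead. `path` is a hypothetical source directory.
    """
    bandit = subprocess.run(
        ["bandit", "-r", path, "-f", "json"],
        capture_output=True, text=True, check=False,
    )
    pyflakes = subprocess.run(
        [sys.executable, "-m", "pyflakes", path],
        capture_output=True, text=True, check=False,
    )
    return json.loads(bandit.stdout), pyflakes.stdout.splitlines()


def severity_counts(bandit_report):
    """Count bandit findings per severity level (e.g. LOW, MEDIUM, HIGH)."""
    return Counter(r["issue_severity"] for r in bandit_report["results"])
```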
## System administration
### Resource issues
* Create a mitigation plan for when someone is unavailable
### Alerting for predictable defects
* If support services are unavailable or are about to become unavailable, alert and remedy.
### Storing and extracting metadata: Loki, Prometheus/Victoria/Mimir
* Kubernetes only stores 10 MB of data, so with large logs we lose data. Develop a mitigation strategy
* Loki sometimes loses its connection after an upgrade. Investigate the underlying causes
### Postgres optimization
* Use Unix sockets instead of TCP; outline a comparison
* Find out what performance benefit, if any, would result
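For the comparison, the Django side only needs a different `HOST`: for PostgreSQL, a value that is an absolute path is treated by libpq as the directory holding the server's Unix socket, while `127.0.0.1` forces TCP. The database name, user and socket directory below are assumptions, and `mean_query_latency` is a hypothetical helper for a crude round-trip measurement with any DB-API cursor.

```python
import time

# Hypothetical Django DATABASES entries; only HOST differs between transports.
TCP = {
    "ENGINE": "django.db.backends.postgresql",
    "NAME": "lavaserver",
    "USER": "lavaserver",
    "HOST": "127.0.0.1",  # TCP over loopback
    "PORT": "5432",
}
# libpq treats an absolute path as the directory containing the Unix socket.
UNIX_SOCKET = {**TCP, "HOST": "/var/run/postgresql"}


def mean_query_latency(cursor, n=1000):
    """Average round-trip time in seconds of a trivial query over n runs."""
    start = time.perf_counter()
    for _ in range(n):
        cursor.execute("SELECT 1")
        cursor.fetchone()
    return (time.perf_counter() - start) / n
```

Running `mean_query_latency` once per configuration (for example with `django.db.connection.cursor()`) gives a first-order comparison; real query mixes may behave differently.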
### Dispatcher version synchronisation
* Plan a move to lavapeur and automated upgrades
### Device controllers
#### Fleet management
#### Conserver, PDU control, etc.
* Analyse the actual causes of issues
#### Align deployment
* Docker image alignment with upstream
#### Consider Prometheus alternatives
* Investigate, and produce a plan if a suitable alternative is found
## Monitoring
### Revisit db index usage
* [How often is it updated?](https://monitoring.core.collabora.dev/d/IDWko4VVk/postgresql-stats)
* Replace the ratio value with cache misses
* Add Grafana alerts for potential defects
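`blks_read` in `pg_stat_database` is a cumulative count of blocks that were not found in shared buffers, so a miss rate can be derived from two samples. A sketch, assuming samples of `(unix_timestamp, blks_read)`; `cache_misses_per_second` is a hypothetical helper, and a Prometheus `rate()` over the same counter would do the equivalent in Grafana. A block counted here may still have been served from the OS page cache.

```python
def cache_misses_per_second(prev, curr):
    """Shared-buffer misses per second between two (timestamp, blks_read) samples.

    blks_read is a cumulative counter, so the rate is the delta over the interval.
    """
    (t0, r0), (t1, r1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in chronological order")
    if r1 < r0:
        # Counter went backwards (e.g. after pg_stat_reset()): count from zero.
        r0 = 0
    return (r1 - r0) / (t1 - t0)


print(cache_misses_per_second((0, 1000), (60, 4000)))  # 50.0
```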
## LAVA Lab device integration and deployment
* See the deployment roadmap in GitLab
## Operator's perspective
### Hardware management
#### Configuration and fleet management (controller boards)
* Unify configuration management to use Ansible, e.g. for device configuration changes rollout (T21468)
* Move DUT controlling utilities (pdudaemon, conserver, etc.) from dispatcher to external [Target Managers](https://elinux.org/Test_Glossary)
#### Operator's routines
* Provide a list of _known failures_ (e.g. pending external support) so that new alerts are not ignored
* Add a _"blame hardware"_ CronJob for issues resolved by reseating connections
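A minimal sketch of how a known-failures list could keep alerts meaningful, assuming alerts carry a device name and an alert name. Matching on the pair rather than the device alone means a different failure on a known-bad device still surfaces. The `Alert` type, device names and alert names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    device: str  # hypothetical alert fields
    name: str


# Hypothetical known-failures list: (device, alert name) -> reason it is expected.
KNOWN_FAILURES = {
    ("rk3399-01", "health-check-failed"): "pending vendor firmware fix",
}


def new_alerts(alerts, known=KNOWN_FAILURES):
    """Drop alerts that match a documented known failure, keeping everything else."""
    return [a for a in alerts if (a.device, a.name) not in known]
```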
### Administration and integration
#### Monitoring cloud-friendliness (T32181)
* Check which tracing solution (Sentry, Jaeger, etc.) fits best with current setup
* Provide minimal working setup for initial testing and change verification
* Add tracing service to the deployment
#### Investigate available storage solutions
* Consider products other than Kubernetes volumes
* Compare benefit-to-cost ratios
* Keep in mind storage size reduction efforts (outdated jobs, job artifacts removal)
#### Component upgrades: Synchronize dispatcher version with server
* Determine how the dispatcher version is exposed and when an upgrade should be enforced (half a day)
* Verify whether the upstream approach with a host daemon can be reused or improved (half a day to a day)
* Verify the dispatcher upgrade mechanism with a Kubernetes-based server (half a day)
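Once it is clear how the dispatcher version is exposed, the enforcement check itself is small. A sketch, assuming LAVA-style `YYYY.MM` version strings with optional suffixes such as `.post1`, and a hypothetical policy that a dispatcher older than the server must be upgraded; `version_key` and `needs_upgrade` are illustrative helpers, not existing LAVA functions.

```python
import re


def version_key(version):
    """Turn e.g. '2023.10' or '2023.10.post1' into (2023, 10).

    Only the leading dotted numeric components are used; suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(dispatcher_version, server_version):
    """Hypothetical policy: upgrade when the dispatcher is older than the server."""
    return version_key(dispatcher_version) < version_key(server_version)
```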
#### Component upgrades: Extend component version management
* Set up a mirror repository with a CI job triggered by a new tag (less than half a day)
* Rebase the staging branch on the new release, assuming no merge conflicts; to be reviewed manually (half a day)
* Determine which components might need version pinning/manual upgrades (if any)
#### Batch processing
* Parse job output from [lava-gitlab-runner](https://gitlab.collabora.com/lava/lava-gitlab-runner)