- Jun 06, 2024
-
-
Ryan Gonzalez authored
Apparently, while copying the contents of the old aptly-repository over to here, I accidentally included this middleware file that was sticking around in my local copy from back when we were deciding how exactly to set up authentication. The original idea was that we'd use oathkeeper as a decision maker only, rather than a proxy as it is now, but this was changed in order to avoid dealing with CRDs. Of course, this file got stuck in here, which spoiled that goal just a tiny bit. In fact, I *literally reviewed changes to it* in !13 that were explicitly tied to the exact CRD issues we wanted to avoid in the first place. https://phabricator.apertis.org/T10135 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- May 23, 2024
-
-
Pablo Vigo Mas authored
The aptly API image was declared outside the API-specific section. Since the statefulset file has two containers with different images, this made the image configuration confusing. The image name now lives within the API-specific section, making the values file easier to follow. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- May 10, 2024
-
-
Pablo Vigo Mas authored
Currently, the publish Docker image is hardcoded in the statefulset file. Parameterizing it in the values file makes it possible to use a different Docker image when needed: some instances of aptly require OpenID Connect authentication and therefore need a specific image with support for it. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Apr 23, 2024
-
-
Ryan Gonzalez authored
That way it's clear that it only affects the API container and also sits right next to extraEnvVars. https://phabricator.apertis.org/T10425 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Apr 16, 2024
-
-
Pablo Vigo Mas authored
A new block is needed so that additional volumes can be defined from the values file without modifying the original Helm chart. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Apr 09, 2024
-
-
Ryan Gonzalez authored
mod_dir is needed to redirect directory URLs to include the trailing slash; without that, following links in the index will *overwrite* the trailing path component instead of appending to it, leading to 404s. https://phabricator.apertis.org/T10135 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
This includes support for injecting custom Apache configuration blocks by editing the values file, which should make it possible to add authentication later on. This simplifies the switchover of deployments that are currently using Apache + mod_auth_oidc to guard their published repositories. https://phabricator.apertis.org/T10135 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Apr 05, 2024
-
-
This is needed to configure the Go runtime. https://phabricator.apertis.org/T10420 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Mar 28, 2024
-
-
Sometimes it is necessary to run the service with additional arguments, for example to use opportunistic locking on the database to allow for maintenance tasks via the CLI. Currently, the Helm chart is not configured to allow the addition of new arguments. With this modification, it is now possible to add arguments to the aptly-api container from the values file using the `extraArgs` value. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Mar 21, 2024
-
-
We need to deploy an instance of `aptly` in the `core` cluster, but there is an incompatibility with Traefik's CRD API group. The Traefik Kubernetes CRD API group currently used in the `core` cluster is older than what the latest aptly Helm chart requires, so the chart needs to be modified to stay compatible with the current CRD API version. The ultimate solution is to upgrade Traefik; however, since that upgrade takes longer to carry out, it was decided to temporarily modify the Helm chart to enable a faster deployment while the upgrade is planned.
-
- Feb 12, 2024
-
-
In current aptly, each repository and snapshot has its own reflist in the database. This brings a few problems with it:
- Given sufficiently large repositories and snapshots, these lists can get enormous, reaching >1MB. This is a problem for LevelDB's overall performance, as it tends to prefer values around the configured block size (which defaults to just 4KiB).
- When you take these large repositories and snapshot them, you have a full, new copy of the reflist, even if only a few packages changed. This means that having a lot of snapshots with a few changes causes the database to basically be full of largely duplicate reflists.
- All the duplication also means that many of the same refs are being loaded repeatedly, which can cause some slowdown but, more notably, eats up huge amounts of memory.
- Adding on more and more new repositories and snapshots will cause the time and memory spent on things like cleanup and publishing to grow roughly linearly.

At the core, there are two problems here:
- Reflists get very big because there are just a lot of packages.
- Different reflists tend to duplicate much of the same contents.

*Split reflists* aim at solving this by separating reflists into 64 *buckets*. Package refs are sorted into individual buckets according to the following system (a sketch of this bucketing rule appears after this entry):
- Take the first 3 letters of the package name, after dropping a `lib` prefix. (Using only the first 3 letters will cause packages with similar prefixes to end up in the same bucket, under the assumption that packages with similar names tend to be updated together.)
- Take the 64-bit xxhash of these letters. (xxhash was chosen because it has relatively good distribution across the individual bits, which is important for the next step.)
- Use the first 6 bits of the hash (range [0:63]) as an index into the buckets.

Once refs are placed in buckets, a sha256 digest of all the refs in the bucket is taken. These buckets are then stored in the database, split into roughly block-sized segments, and all the repositories and snapshots simply store an array of bucket digests.

This approach means that *repositories and snapshots can share their reflist buckets*. If a snapshot is taken of a repository, it will have the same contents, so its split reflist will point to the same buckets as the base repository, and only one copy of each bucket is stored in the database. When some packages in the repository change, only the buckets containing those packages will be modified; all the other buckets will remain unchanged, and thus their contents will still be shared. Later on, when these reflists are loaded, each bucket is only loaded once, short-cutting the loading of many megabytes of data. In effect, split reflists are essentially copy-on-write, with only the changed buckets stored individually.

Changing the disk format means that a migration needs to take place, so that task is moved into the database cleanup step, which will migrate reflists over to split reflists, as well as delete any unused reflist buckets.

All the reflist tests are also changed to additionally test out split reflists; although the internal logic is all shared (since buckets are, themselves, just normal reflists), some special additions are needed to have native versions of the various reflist helper methods.

In our tests, we've observed the following improvements:
- Memory usage during publish and database cleanup, with `GOMEMLIMIT=2GiB`, goes down from ~3.2GiB (larger than the memory limit!) to ~0.7GiB, a decrease of ~4.5x.
- Database size decreases from 1.3GB to 367MB.

*In my local tests*, publish times had also decreased to mere seconds, but the same effect wasn't observed on the server, with the times staying around the same. My suspicion is that this is due to I/O performance: my local system is an M1 MBP, which almost certainly has much faster disk speeds than our DigitalOcean block volumes. Split reflists have the side effect of requiring more random accesses from reading all the buckets by their keys, so if your random I/O performance is slower, it might cancel out the benefits. That being said, even in that case, the memory usage and database size advantages still persist. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
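A minimal sketch of the bucketing rule above (illustrative only, not aptly's actual code; it assumes the `github.com/cespare/xxhash/v2` package and reads "first 6 bits" as the most-significant 6 bits):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/cespare/xxhash/v2"
)

// bucketIndex sorts a package name into one of 64 buckets: drop a "lib"
// prefix, keep the first 3 letters of the name, hash them with 64-bit
// xxhash, and use 6 bits of the hash as an index in [0, 63].
func bucketIndex(packageName string) int {
	name := strings.TrimPrefix(packageName, "lib")
	if len(name) > 3 {
		name = name[:3]
	}
	return int(xxhash.Sum64String(name) >> 58) // top 6 bits of the hash
}

func main() {
	// Packages with the same prefix land in the same bucket, so e.g.
	// libssl3 and libssl-dev share a bucket.
	for _, p := range []string{"libssl3", "libssl-dev", "openssl"} {
		fmt.Printf("%-11s -> bucket %2d\n", p, bucketIndex(p))
	}
}
```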
-
The previous reflist logic would early-exit the loop body if one of the lists was empty, but that skips the compacting logic entirely. Instead of doing the early-exit, we can leave a list's ref as nil when the list end is reached and then flip the comparison result, which will essentially treat it as being greater than all others. This should preserve the general behavior without omitting the compaction. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
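As a rough illustration of that pattern (a generic sorted-list merge, not aptly's reflist code): an exhausted input keeps a nil ref that compares as greater than any real ref, so the shared per-item logic, here a simple de-duplication standing in for the compaction, still runs for the remaining tail.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeCompacting merges two sorted ref lists while de-duplicating the
// output. When one list runs out, its current ref stays nil and is treated
// as greater than everything, so the loop body keeps running instead of
// early-exiting and skipping the compaction step.
func mergeCompacting(a, b [][]byte) [][]byte {
	var out [][]byte
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		var ra, rb []byte
		if i < len(a) {
			ra = a[i]
		}
		if j < len(b) {
			rb = b[j]
		}

		var cmp int
		switch {
		case ra == nil:
			cmp = 1 // list a is exhausted: treat its ref as greater
		case rb == nil:
			cmp = -1 // list b is exhausted
		default:
			cmp = bytes.Compare(ra, rb)
		}

		var next []byte
		switch {
		case cmp < 0:
			next, i = ra, i+1
		case cmp > 0:
			next, j = rb, j+1
		default:
			next, i, j = ra, i+1, j+1
		}

		// "Compaction": skip refs identical to the previous output entry.
		if len(out) == 0 || !bytes.Equal(out[len(out)-1], next) {
			out = append(out, next)
		}
	}
	return out
}

func main() {
	a := [][]byte{[]byte("pkg-a"), []byte("pkg-c")}
	b := [][]byte{[]byte("pkg-a"), []byte("pkg-b"), []byte("pkg-c"), []byte("pkg-d")}
	for _, ref := range mergeCompacting(a, b) {
		fmt.Println(string(ref))
	}
}
```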
-
The output doesn't actually depend on the reflists, and loading them for every published repo starts to take substantial time and memory. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Getting Go 1.21 (required by newer aptly) on bookworm requires utilizing the backports repository; it's easier to just rely on the official images instead. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Emanuele Aina authored
Always include annotations in the Deployment object. See merge request !8
-
Checksums in annotations ensure that the pod restarts when an object is updated. Previously, checksums were not included when there were no other annotations on the object, so they had no effect. Instead, always include annotations with at least the checksums, so they are used regardless. Fixes: 45bed51a ("Ensure the pod restarts when the K8s secret with the config is updated") Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Jan 30, 2024
-
-
Emanuele Aina authored
Reduce docker build memory usage and rename the image. See merge request !9
-
- Jan 17, 2024
-
-
Ryan Gonzalez authored
This will help to avoid accidentally deploying changes that aren't on the production branch. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
The builds are apparently now OOM-ing on the lightweight runner, and `--compressed-caching=false` reduces memory usage from >2GiB to <0.1GiB in exchange for only a few seconds of slowdown. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ensure the Pod restarts automatically whenever the ConfigMap or Secret object is updated. A hash of the object is now included in the StatefulSet. Pods only restart if the checksum of the ConfigMap or Secret differs, which means that the content of the object has changed; they remain running if the content is the same. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Jan 12, 2024
-
-
Ryan Gonzalez authored
This imports the docker & helm setup, since having it all in one repo makes the update process a bit smoother. There are a few changes to the original docker setup:
- The startup script has several improvements:
  - It actually forwards command-line arguments to aptly.
  - APTLY_PROFILE can be set at runtime to enable profiling, writing the data to /aptly/data/profile.
- The dockerfile can build aptly w/ debugging enabled if APTLY_DEBUG=true is given, which can be passed over via GitLab CI variables.
- GOFLAGS will be forwarded to the builder stage in the dockerfile, which is useful for passing down some development-related flags.

The latter two points in particular make it easier to build and run versions of aptly w/ profiling enabled, for debugging performance and resource usage issues. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Reflists are basically stored as arrays of strings, which are quite space-efficient in MessagePack. Thus, using zero-copy decoding results in nice performance and memory savings, because the overhead of separate allocations ends up far exceeding the overhead of keeping the original slice alive. With the included benchmark run for 20s with -benchmem, the runtime, memory usage, and allocations go from ~740us/op, ~192KiB/op, and 4100 allocs/op to ~240us/op, ~97KiB/op, and 13 allocs/op, respectively. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
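A toy illustration of why zero-copy decoding pays off here (this is not MessagePack and not aptly's decoder; the `offsets` slice stands in for the decoder's parse positions): the zero-copy variant returns sub-slices that alias the encoded buffer, so decoding N refs costs one slice header per ref instead of one allocation per ref.

```go
package main

import "fmt"

// decodeRefsCopying allocates and copies every ref out of buf: one
// allocation per ref, like the non-zero-copy decode path.
func decodeRefsCopying(buf []byte, offsets []int) [][]byte {
	refs := make([][]byte, 0, len(offsets)-1)
	for i := 0; i+1 < len(offsets); i++ {
		ref := make([]byte, offsets[i+1]-offsets[i])
		copy(ref, buf[offsets[i]:offsets[i+1]])
		refs = append(refs, ref)
	}
	return refs
}

// decodeRefsZeroCopy returns sub-slices that alias buf directly: far fewer
// allocations, at the cost of keeping the original buffer alive.
func decodeRefsZeroCopy(buf []byte, offsets []int) [][]byte {
	refs := make([][]byte, 0, len(offsets)-1)
	for i := 0; i+1 < len(offsets); i++ {
		refs = append(refs, buf[offsets[i]:offsets[i+1]:offsets[i+1]])
	}
	return refs
}

func main() {
	buf := []byte("Pamd64 foo 1.0Pamd64 bar 2.0")
	offsets := []int{0, 14, 28}
	fmt.Println(len(decodeRefsCopying(buf, offsets)), len(decodeRefsZeroCopy(buf, offsets)))
}
```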
-
Ryan Gonzalez authored
The cleanup phase needs to list out all the files in each component in order to determine what's still in use. When there's a large number of sources (e.g. from having many snapshots), the time spent just loading the package information becomes substantial. However, in many cases, most of the packages being loaded are actually shared across the sources; if you're taking frequent snapshots, for instance, most of the packages in each snapshot will be the same as other snapshots. In these cases, re-reading the packages repeatedly is just a waste of time. To improve this, we maintain a list of refs that we know were processed for each component. When listing the refs from a source, only the ones that have not yet been processed will be examined. Some tests were also added specifically to check listing the files in a component. With this change, listing the files in components on a copy of our production database went from >10 minutes to ~10 seconds, and the newly added benchmark went from ~300ms to ~43ms. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
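A minimal sketch of that bookkeeping (hypothetical names, not aptly's code): a per-component set of already-processed refs means packages shared between many snapshots are only loaded once.

```go
package main

import "fmt"

// collectComponentFiles walks every source's refs but skips refs already
// processed for this component, so packages shared between snapshots are
// only loaded once. loadFiles stands in for the real package lookup.
func collectComponentFiles(sources [][]string, loadFiles func(ref string) []string) []string {
	seen := make(map[string]struct{})
	var files []string
	for _, refs := range sources {
		for _, ref := range refs {
			if _, done := seen[ref]; done {
				continue // already examined via another source
			}
			seen[ref] = struct{}{}
			files = append(files, loadFiles(ref)...)
		}
	}
	return files
}

func main() {
	sources := [][]string{
		{"Pamd64 foo 1.0", "Pamd64 bar 2.0"},
		{"Pamd64 foo 1.0", "Pamd64 baz 3.0"}, // foo is shared, loaded once
	}
	loads := 0
	files := collectComponentFiles(sources, func(ref string) []string {
		loads++
		return []string{ref + ".deb"}
	})
	fmt.Println(len(files), "files,", loads, "loads")
}
```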
-
Ryan Gonzalez authored
When merging reflists with ignoreConflicting set to true and overrideMatching set to false, the individual ref components are never examined, but the refs are still split anyway. Avoiding the split when we never use the components brings a massive speedup: on my system, the included benchmark goes from ~1500 us/it to ~180 us/it. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
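As a small sketch of that fast path (illustrative only; aptly's real ref format and merge code differ): when ignoreConflicting is set and overrideMatching is not, whole-ref ordering is all that is needed, so the per-ref component split can be skipped entirely.

```go
package main

import (
	"bytes"
	"fmt"
)

// compareRefs compares two refs. Only the slow path splits a ref into its
// components; the fast path used with ignoreConflicting && !overrideMatching
// compares the raw bytes directly.
func compareRefs(a, b []byte, ignoreConflicting, overrideMatching bool) int {
	if ignoreConflicting && !overrideMatching {
		return bytes.Compare(a, b) // components are never examined
	}
	// Slow path (illustrative): split into fields and compare piecewise.
	fa, fb := bytes.Fields(a), bytes.Fields(b)
	for i := 0; i < len(fa) && i < len(fb); i++ {
		if c := bytes.Compare(fa[i], fb[i]); c != 0 {
			return c
		}
	}
	return len(fa) - len(fb)
}

func main() {
	a, b := []byte("Pamd64 foo 1.0"), []byte("Pamd64 foo 1.1")
	fmt.Println(compareRefs(a, b, true, false)) // fast path: raw comparison
}
```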
-
Ryan Gonzalez authored
In some local tests w/ a slowed-down filesystem, this cut the time to clean up a repository by ~3x, bringing the total 'publish update' time from ~16s down to ~13s. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
There's no apparent reason to disallow this type of character in the distribution name; aptly will just create the proper subdirectories for that path. Signed-off-by: Ariel D'Alessandro <ariel.dalessandro@collabora.com>
-
Ryan Gonzalez authored
This adds support for storing packages directly on Azure, with no truly "local" (on-disk) repo used. The existing Azure PublishedStorage implementation was refactored to move the shared code to a separate context struct, which can then be re-used by the new PackagePool. In addition, the files package's mockChecksumStorage was made public so that it could be used in the Azure PackagePool tests as well. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
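Roughly, the refactor looks like the sketch below (hypothetical names, not aptly's actual types): the Azure-specific plumbing shared by both backends lives in one context struct that each of them embeds.

```go
package main

// azContext holds the pieces shared by the published storage and the new
// package pool: container name plus (in the real code) client handles and
// credentials. The names here are hypothetical.
type azContext struct {
	accountName string
	container   string
}

// PublishedStorage publishes repository indexes and packages to Azure.
type PublishedStorage struct {
	az azContext
	// publishing-specific state would go here
}

// PackagePool stores package files directly on Azure, with no on-disk pool.
type PackagePool struct {
	az azContext
	// pool-specific state would go here
}

func main() {
	ctx := azContext{accountName: "example", container: "aptly"}
	_ = PublishedStorage{az: ctx}
	_ = PackagePool{az: ctx}
}
```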
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Several sections of the code *required* a LocalPackagePool, but they could still perform their operations with a standard PackagePool. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
The information returned by `os.Stat` is rather specific to local package pools, but the method sits in the generic PackagePool interface. This moves it to LocalPackagePool, and the common use case of simply finding a file's size is delegated to a new, more generic PackagePool.Size() method. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
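A minimal sketch of the resulting split (illustrative signatures, not aptly's exact interfaces): Stat stays on the local, filesystem-backed pool, while the generic pool only promises a size lookup.

```go
package main

import "os"

// PackagePool is the generic pool interface; callers that only need a
// file's size use Size instead of a full Stat.
type PackagePool interface {
	Size(path string) (int64, error)
}

// LocalPackagePool is the filesystem-backed pool, where returning a full
// os.Stat result still makes sense.
type LocalPackagePool interface {
	PackagePool
	Stat(path string) (os.FileInfo, error)
}

func main() {}
```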
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Before, a "partial" URL (either "localhost:port" or an endpoint URL *without* the account name as the subdomain) would be specified, and the full one would automatically be inferred. Although this is somewhat nice, it means that the endpoint string doesn't match the official Azure syntax: https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string This also raises issues for the creation of functional tests for Azure, as the code to determine the endpoint string needs to be duplicated there as well. Instead, it's just easiest to follow Azure's own standard, and then sidestep the need for any custom logic in the functional tests. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
None of the commands' output is ever treated as binary, so we can just always decode it as text. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
read_path() can read in binary, which the S3 tests don't support (simply because they don't need it)...but it needs to be able to take the `mode` argument anyway. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Nov 23, 2023
-
-
Paul Cacheux authored
-
Paul Cacheux authored
-
Paul Cacheux authored
-
Paul Cacheux authored
-