- Jun 06, 2024
-
-
Ryan Gonzalez authored
Apparently, while copying the contents of the old aptly-repository over to here, I accidentally included this middleware file that was sticking around in my local copy from back when we were deciding how exactly to set up authentication. The original idea was that we'd use oathkeeper as a decision maker only, rather than a proxy as it is now, but this was changed in order to avoid dealing with CRDs. Of course, this file got stuck in here, which spoiled that goal just a tiny bit. In fact, I *literally reviewed changes to it* in !13 that were explicitly tied to the exact CRD issues we wanted to avoid in the first place. https://phabricator.apertis.org/T10135 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- May 23, 2024
-
-
Pablo Vigo Mas authored
The aptly API image was declared outside the API-specific section. Since the statefulset file has two containers with different images, this made the image configuration confusing. The image name now lives within the API-specific section, making the values file easier to follow. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- May 10, 2024
-
-
Pablo Vigo Mas authored
Currently, the publish Docker image is hardcoded in the statefulset file. Parameterizing it in the values file makes it possible to use a different Docker image when needed: some instances of aptly require OpenID Connect authentication and therefore need a specific image with support for it. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Apr 23, 2024
-
-
Ryan Gonzalez authored
That way it's clear that it only affects the API container and also sits right next to extraEnvVars. https://phabricator.apertis.org/T10425 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Apr 16, 2024
-
-
Pablo Vigo Mas authored
A new block is needed so that additional volumes can be defined from the values file without modifying the original Helm chart. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Apr 09, 2024
-
-
Ryan Gonzalez authored
mod_dir is needed to redirect directory URLs to include the trailing slash; without that, following links in the index will *overwrite* the trailing path component instead of appending to it, leading to 404s. https://phabricator.apertis.org/T10135 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
This includes support for injecting custom Apache configuration blocks by editing the values file, which should make it possible to add authentication later on. This simplifies the switchover of deployments that are currently using Apache + mod_auth_oidc to guard their published repositories. https://phabricator.apertis.org/T10135 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Apr 05, 2024
-
-
This is needed to configure the Go runtime. https://phabricator.apertis.org/T10420 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Mar 28, 2024
-
-
Sometimes it is necessary to run the service with additional arguments, for example to use opportunistic locking on the database to allow for maintenance tasks via the CLI. Currently, the Helm chart is not configured to allow the addition of new arguments. With this modification, it is now possible to add arguments to the aptly-api container from the values file using the `extraArgs` value. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Mar 21, 2024
-
-
We need to deploy an instance of `aptly` in the `core` cluster, but there is an incompatibility with Traefik's CRD API group. The Traefik Kubernetes CRD API group currently used in the `core` cluster is older than what the latest aptly Helm chart requires, so the chart needs to be modified to stay compatible with the current CRD API version. The ultimate solution is to upgrade Traefik; however, since that upgrade takes longer to carry out, it was decided to temporarily modify the Helm chart to enable a faster deployment while the upgrade is planned.
-
- Feb 12, 2024
-
-
In current aptly, each repository and snapshot has its own reflist in the database. This brings a few problems with it:
- Given sufficiently large repositories and snapshots, these lists can get enormous, reaching >1MB. This is a problem for LevelDB's overall performance, as it tends to prefer values around the configured block size (which defaults to just 4KiB).
- When you take these large repositories and snapshot them, you have a full, new copy of the reflist, even if only a few packages changed. This means that having a lot of snapshots with a few changes causes the database to basically be full of largely duplicate reflists.
- All the duplication also means that many of the same refs are being loaded repeatedly, which can cause some slowdown but, more notably, eats up huge amounts of memory.
- Adding on more and more new repositories and snapshots will cause the time and memory spent on things like cleanup and publishing to grow roughly linearly.

At the core, there are two problems here:
- Reflists get very big because there are just a lot of packages.
- Different reflists tend to duplicate much of the same contents.

*Split reflists* aim at solving this by separating reflists into 64 *buckets*. Package refs are sorted into individual buckets according to the following system (a sketch of this bucketing rule appears after this entry):
- Take the first 3 letters of the package name, after dropping a `lib` prefix. (Using only the first 3 letters will cause packages with similar prefixes to end up in the same bucket, under the assumption that packages with similar names tend to be updated together.)
- Take the 64-bit xxhash of these letters. (xxhash was chosen because it has relatively good distribution across the individual bits, which is important for the next step.)
- Use the first 6 bits of the hash (range [0:63]) as an index into the buckets.

Once refs are placed in buckets, a sha256 digest of all the refs in the bucket is taken. These buckets are then stored in the database, split into roughly block-sized segments, and all the repositories and snapshots simply store an array of bucket digests.

This approach means that *repositories and snapshots can share their reflist buckets*. If a snapshot is taken of a repository, it will have the same contents, so its split reflist will point to the same buckets as the base repository, and only one copy of each bucket is stored in the database. When some packages in the repository change, only the buckets containing those packages will be modified; all the other buckets will remain unchanged, and thus their contents will still be shared. Later on, when these reflists are loaded, each bucket is only loaded once, short-cutting the loading of many megabytes of data. In effect, split reflists are essentially copy-on-write, with only the changed buckets stored individually.

Changing the disk format means that a migration needs to take place, so that task is moved into the database cleanup step, which will migrate reflists over to split reflists, as well as delete any unused reflist buckets.

All the reflist tests are also changed to additionally test out split reflists; although the internal logic is all shared (since buckets are, themselves, just normal reflists), some special additions are needed to have native versions of the various reflist helper methods.

In our tests, we've observed the following improvements:
- Memory usage during publish and database cleanup, with `GOMEMLIMIT=2GiB`, goes down from ~3.2GiB (larger than the memory limit!) to ~0.7GiB, a decrease of ~4.5x.
- Database size decreases from 1.3GB to 367MB.

*In my local tests*, publish times had also decreased to mere seconds, but the same effect wasn't observed on the server, with the times staying around the same. My suspicion is that this is due to I/O performance: my local system is an M1 MBP, which almost certainly has much faster disk speeds than our DigitalOcean block volumes. Split reflists have the side effect of requiring more random accesses from reading all the buckets by their keys, so if your random I/O performance is slower, it might cancel out the benefits. That being said, even in that case, the memory usage and database size advantages still persist. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
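A minimal sketch of the bucketing rule above (illustrative only, not aptly's actual code; it assumes the `github.com/cespare/xxhash/v2` package and reads "first 6 bits" as the most-significant 6 bits):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/cespare/xxhash/v2"
)

// bucketIndex sorts a package name into one of 64 buckets: drop a "lib"
// prefix, keep the first 3 letters of the name, hash them with 64-bit
// xxhash, and use 6 bits of the hash as an index in [0, 63].
func bucketIndex(packageName string) int {
	name := strings.TrimPrefix(packageName, "lib")
	if len(name) > 3 {
		name = name[:3]
	}
	return int(xxhash.Sum64String(name) >> 58) // top 6 bits of the hash
}

func main() {
	// Packages with the same prefix land in the same bucket, so e.g.
	// libssl3 and libssl-dev share a bucket.
	for _, p := range []string{"libssl3", "libssl-dev", "openssl"} {
		fmt.Printf("%-11s -> bucket %2d\n", p, bucketIndex(p))
	}
}
```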
-
The previous reflist logic would early-exit the loop body if one of the lists was empty, but that skips the compacting logic entirely. Instead of doing the early-exit, we can leave a list's ref as nil when the list end is reached and then flip the comparison result, which will essentially treat it as being greater than all others. This should preserve the general behavior without omitting the compaction. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
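As a rough illustration of that pattern (a generic sorted-list merge, not aptly's reflist code): an exhausted input keeps a nil ref that compares as greater than any real ref, so the shared per-item logic, here a simple de-duplication standing in for the compaction, still runs for the remaining tail.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeCompacting merges two sorted ref lists while de-duplicating the
// output. When one list runs out, its current ref stays nil and is treated
// as greater than everything, so the loop body keeps running instead of
// early-exiting and skipping the compaction step.
func mergeCompacting(a, b [][]byte) [][]byte {
	var out [][]byte
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		var ra, rb []byte
		if i < len(a) {
			ra = a[i]
		}
		if j < len(b) {
			rb = b[j]
		}

		var cmp int
		switch {
		case ra == nil:
			cmp = 1 // list a is exhausted: treat its ref as greater
		case rb == nil:
			cmp = -1 // list b is exhausted
		default:
			cmp = bytes.Compare(ra, rb)
		}

		var next []byte
		switch {
		case cmp < 0:
			next, i = ra, i+1
		case cmp > 0:
			next, j = rb, j+1
		default:
			next, i, j = ra, i+1, j+1
		}

		// "Compaction": skip refs identical to the previous output entry.
		if len(out) == 0 || !bytes.Equal(out[len(out)-1], next) {
			out = append(out, next)
		}
	}
	return out
}

func main() {
	a := [][]byte{[]byte("pkg-a"), []byte("pkg-c")}
	b := [][]byte{[]byte("pkg-a"), []byte("pkg-b"), []byte("pkg-c"), []byte("pkg-d")}
	for _, ref := range mergeCompacting(a, b) {
		fmt.Println(string(ref))
	}
}
```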
-
The output doesn't actually depend on the reflists, and loading them for every published repo starts to take substantial time and memory. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Getting Go 1.21 (required by newer aptly) on bookworm requires utilizing the backports repository; it's easier to just rely on the official images instead. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Emanuele Aina authored
Always include annotations in the Deployment object. See merge request !8
-
Checksums in annotations ensure that the pod restarts when an object is updated. Previously, checksums were not included when there were no other annotations on the object, so they had no effect. Instead, always include annotations with at least the checksums, so they are used regardless. Fixes: 45bed51a ("Ensure the pod restarts when the K8s secret with the config is updated") Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Jan 30, 2024
-
-
Emanuele Aina authored
Reduce docker build memory usage and rename the image. See merge request !9
-
- Jan 17, 2024
-
-
Ryan Gonzalez authored
This will help to avoid accidentally deploying changes that aren't on the production branch. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
The builds are apparently now OOM-ing on the lightweight runner, and `--compressed-caching=false` reduces memory usage from >2GiB to <0.1GiB in exchange for only a few seconds of slowdown. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ensure the Pod restarts automatically whenever the ConfigMap or Secret object is updated. A hash of the object is now included in the StatefulSet. Pods only restart if the checksum of the ConfigMap or Secret differs, which means that the content of the object has changed; they remain running if the content is the same. Signed-off-by: Pablo Vigo <pvigo@collabora.com>
-
- Jan 12, 2024
-
-
Ryan Gonzalez authored
This imports the docker & helm setup, since having it all in one repo makes the update process a bit smoother. There are a few changes to the original docker setup:
- The startup script has several improvements:
  - It actually forwards command-line arguments to aptly.
  - APTLY_PROFILE can be set at runtime to enable profiling, writing the data to /aptly/data/profile.
- The dockerfile can build aptly w/ debugging enabled if APTLY_DEBUG=true is given, which can be passed over via GitLab CI variables.
- GOFLAGS will be forwarded to the builder stage in the dockerfile, which is useful for passing down some development-related flags.

The latter two points in particular make it easier to build and run versions of aptly w/ profiling enabled, for debugging performance and resource usage issues. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Reflists are basically stored as arrays of strings, which are quite space-efficient in MessagePack. Thus, using zero-copy decoding results in nice performance and memory savings, because the overhead of separate allocations ends up far exceeding the overhead of keeping the original slice alive. With the included benchmark run for 20s with -benchmem, the runtime, memory usage, and allocations go from ~740us/op, ~192KiB/op, and 4100 allocs/op to ~240us/op, ~97KiB/op, and 13 allocs/op, respectively. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
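A toy illustration of why zero-copy decoding pays off here (this is not MessagePack and not aptly's decoder; the `offsets` slice stands in for the decoder's parse positions): the zero-copy variant returns sub-slices that alias the encoded buffer, so decoding N refs costs one slice header per ref instead of one allocation per ref.

```go
package main

import "fmt"

// decodeRefsCopying allocates and copies every ref out of buf: one
// allocation per ref, like the non-zero-copy decode path.
func decodeRefsCopying(buf []byte, offsets []int) [][]byte {
	refs := make([][]byte, 0, len(offsets)-1)
	for i := 0; i+1 < len(offsets); i++ {
		ref := make([]byte, offsets[i+1]-offsets[i])
		copy(ref, buf[offsets[i]:offsets[i+1]])
		refs = append(refs, ref)
	}
	return refs
}

// decodeRefsZeroCopy returns sub-slices that alias buf directly: far fewer
// allocations, at the cost of keeping the original buffer alive.
func decodeRefsZeroCopy(buf []byte, offsets []int) [][]byte {
	refs := make([][]byte, 0, len(offsets)-1)
	for i := 0; i+1 < len(offsets); i++ {
		refs = append(refs, buf[offsets[i]:offsets[i+1]:offsets[i+1]])
	}
	return refs
}

func main() {
	buf := []byte("Pamd64 foo 1.0Pamd64 bar 2.0")
	offsets := []int{0, 14, 28}
	fmt.Println(len(decodeRefsCopying(buf, offsets)), len(decodeRefsZeroCopy(buf, offsets)))
}
```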
-
Ryan Gonzalez authored
The cleanup phase needs to list out all the files in each component in order to determine what's still in use. When there's a large number of sources (e.g. from having many snapshots), the time spent just loading the package information becomes substantial. However, in many cases, most of the packages being loaded are actually shared across the sources; if you're taking frequent snapshots, for instance, most of the packages in each snapshot will be the same as other snapshots. In these cases, re-reading the packages repeatedly is just a waste of time. To improve this, we maintain a list of refs that we know were processed for each component. When listing the refs from a source, only the ones that have not yet been processed will be examined. Some tests were also added specifically to check listing the files in a component. With this change, listing the files in components on a copy of our production database went from >10 minutes to ~10 seconds, and the newly added benchmark went from ~300ms to ~43ms. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
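A minimal sketch of that bookkeeping (hypothetical names, not aptly's code): a per-component set of already-processed refs means packages shared between many snapshots are only loaded once.

```go
package main

import "fmt"

// collectComponentFiles walks every source's refs but skips refs already
// processed for this component, so packages shared between snapshots are
// only loaded once. loadFiles stands in for the real package lookup.
func collectComponentFiles(sources [][]string, loadFiles func(ref string) []string) []string {
	seen := make(map[string]struct{})
	var files []string
	for _, refs := range sources {
		for _, ref := range refs {
			if _, done := seen[ref]; done {
				continue // already examined via another source
			}
			seen[ref] = struct{}{}
			files = append(files, loadFiles(ref)...)
		}
	}
	return files
}

func main() {
	sources := [][]string{
		{"Pamd64 foo 1.0", "Pamd64 bar 2.0"},
		{"Pamd64 foo 1.0", "Pamd64 baz 3.0"}, // foo is shared, loaded once
	}
	loads := 0
	files := collectComponentFiles(sources, func(ref string) []string {
		loads++
		return []string{ref + ".deb"}
	})
	fmt.Println(len(files), "files,", loads, "loads")
}
```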
-
Ryan Gonzalez authored
When merging reflists with ignoreConflicting set to true and overrideMatching set to false, the individual ref components are never examined, but the refs are still split anyway. Avoiding the split when we never use the components brings a massive speedup: on my system, the included benchmark goes from ~1500 us/it to ~180 us/it. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
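As a small sketch of that fast path (illustrative only; aptly's real ref format and merge code differ): when ignoreConflicting is set and overrideMatching is not, whole-ref ordering is all that is needed, so the per-ref component split can be skipped entirely.

```go
package main

import (
	"bytes"
	"fmt"
)

// compareRefs compares two refs. Only the slow path splits a ref into its
// components; the fast path used with ignoreConflicting && !overrideMatching
// compares the raw bytes directly.
func compareRefs(a, b []byte, ignoreConflicting, overrideMatching bool) int {
	if ignoreConflicting && !overrideMatching {
		return bytes.Compare(a, b) // components are never examined
	}
	// Slow path (illustrative): split into fields and compare piecewise.
	fa, fb := bytes.Fields(a), bytes.Fields(b)
	for i := 0; i < len(fa) && i < len(fb); i++ {
		if c := bytes.Compare(fa[i], fb[i]); c != 0 {
			return c
		}
	}
	return len(fa) - len(fb)
}

func main() {
	a, b := []byte("Pamd64 foo 1.0"), []byte("Pamd64 foo 1.1")
	fmt.Println(compareRefs(a, b, true, false)) // fast path: raw comparison
}
```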
-
Ryan Gonzalez authored
In some local tests w/ a slowed-down filesystem, this cut the time to clean up a repository by ~3x, bringing the total 'publish update' time from ~16s down to ~13s. https://github.com/aptly-dev/aptly/pull/1222 Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
There's no apparent reason to disallow this type of character in the distribution name; aptly will just create the proper subdirectories for that path. Signed-off-by: Ariel D'Alessandro <ariel.dalessandro@collabora.com>
-
Ryan Gonzalez authored
This adds support for storing packages directly on Azure, with no truly "local" (on-disk) repo used. The existing Azure PublishedStorage implementation was refactored to move the shared code to a separate context struct, which can then be re-used by the new PackagePool. In addition, the files package's mockChecksumStorage was made public so that it could be used in the Azure PackagePool tests as well. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
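Roughly, the refactor looks like the sketch below (hypothetical names, not aptly's actual types): the Azure-specific plumbing shared by both backends lives in one context struct that each of them embeds.

```go
package main

// azContext holds the pieces shared by the published storage and the new
// package pool: container name plus (in the real code) client handles and
// credentials. The names here are hypothetical.
type azContext struct {
	accountName string
	container   string
}

// PublishedStorage publishes repository indexes and packages to Azure.
type PublishedStorage struct {
	az azContext
	// publishing-specific state would go here
}

// PackagePool stores package files directly on Azure, with no on-disk pool.
type PackagePool struct {
	az azContext
	// pool-specific state would go here
}

func main() {
	ctx := azContext{accountName: "example", container: "aptly"}
	_ = PublishedStorage{az: ctx}
	_ = PackagePool{az: ctx}
}
```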
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Several sections of the code *required* a LocalPackagePool, but they could still perform their operations with a standard PackagePool. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
The information returned by `os.Stat` is rather specific to local package pools, but the method sits in the generic PackagePool interface. This moves it to LocalPackagePool, and the common use case of simply finding a file's size is delegated to a new, more generic PackagePool.Size() method. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
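A minimal sketch of the resulting split (illustrative signatures, not aptly's exact interfaces): Stat stays on the local, filesystem-backed pool, while the generic pool only promises a size lookup.

```go
package main

import "os"

// PackagePool is the generic pool interface; callers that only need a
// file's size use Size instead of a full Stat.
type PackagePool interface {
	Size(path string) (int64, error)
}

// LocalPackagePool is the filesystem-backed pool, where returning a full
// os.Stat result still makes sense.
type LocalPackagePool interface {
	PackagePool
	Stat(path string) (os.FileInfo, error)
}

func main() {}
```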
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
Before, a "partial" URL (either "localhost:port" or an endpoint URL *without* the account name as the subdomain) would be specified, and the full one would automatically be inferred. Although this is somewhat nice, it means that the endpoint string doesn't match the official Azure syntax: https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string This also raises issues for the creation of functional tests for Azure, as the code to determine the endpoint string needs to be duplicated there as well. Instead, it's just easiest to follow Azure's own standard, and then sidestep the need for any custom logic in the functional tests. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
None of the commands' output is ever treated as binary, so we can just always decode it as text. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
Ryan Gonzalez authored
read_path() can read in binary, which the S3 tests don't support (simply because they don't need it)...but it needs to be able to take the `mode` argument anyway. Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
-
- Nov 23, 2023
-
-
Paul Cacheux authored
-
Paul Cacheux authored
-
Paul Cacheux authored
-
Paul Cacheux authored
-