Artifact upload may have a practical limit we will hit in some circumstances #6

@briantist

Part of splitting out the various pieces of a docs build/publish is ensuring that we use least privilege and prevent arbitrary code from having access to privileged tokens or secrets. As a result, the build portion of this process (which runs arbitrary code from contributors in the case of PRs) runs in its own job, with no secrets or write access. But it must make the rendered docsite files available to other jobs that might do something useful with them.

Because each job runs in its own virtual machine, these files cannot be accessed directly by other jobs. The content is too large to pass via job outputs in practice. The supported way to transfer data like this is to upload build artifacts. Artifacts can be downloaded by other jobs in the same workflow run, so that is how we handle this, using the upload-artifact action.

This action is convenient, in that we can point it at a directory, and it will upload the whole tree. As described in the readme though, each file is uploaded individually, and for large numbers of files, this can result in hitting API/concurrency limits, making uploads slow and/or error-prone, possibly untenable. Their suggested solution is to tar the files beforehand, and upload a single file.

I anticipate that we may hit such limits in larger collections, or in use cases for the docs build process that build a docsite for multiple collections, so I would like to get ahead of that and implement pre-tarring as an option.

Note: compression is handled by the uploading process, so it's not necessary to also compress the files.
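As a rough illustration of the pre-tarring step, here is a minimal Python sketch that packs a rendered docsite directory into a single uncompressed tar before upload. The function name `tar_docsite` and the paths in the usage note are hypothetical, and the real implementation could just as well be a plain `tar` invocation in a workflow step.

```python
import tarfile
from pathlib import Path


def tar_docsite(source_dir: str, tar_path: str) -> int:
    """Pack source_dir into an uncompressed tar; return the number of files added.

    tar_path should live outside source_dir so the archive does not include itself.
    """
    source = Path(source_dir)
    count = 0
    # Mode "w" writes a plain tar with no compression, since the upload
    # step is described above as handling compression on its own.
    with tarfile.open(tar_path, "w") as archive:
        for path in sorted(source.rglob("*")):
            # Store paths relative to the docsite root so the archive
            # unpacks cleanly into any target directory.
            arcname = path.relative_to(source).as_posix()
            archive.add(path, arcname=arcname, recursive=False)
            if path.is_file():
                count += 1
    return count
```

Calling something like `tar_docsite("build/html", "docsite.tar")` (both paths are placeholders) would leave a single file to hand to the upload step, instead of one upload per rendered page.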

There are caveats. Anyone who downloads the build artifact will get a .zip (downloads are only offered this way, even though each file is uploaded individually), and inside it will be a tar that the end user has to expand. In addition, any action or workflow of ours that consumes the artifact (for example, to publish it) must also be aware of this so that it can properly untar the downloaded artifact.
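On the consuming side, the extra untar step might look like the following Python sketch. The name `untar_docsite` is hypothetical. Because the tar comes from a job that runs contributor code, the sketch rejects entries whose paths would land outside the destination; that check is an assumption about what a privileged consumer would want, not something this issue specifies, and it does not cover every tar hazard (symlinks, for instance).

```python
import tarfile
from pathlib import Path


def untar_docsite(tar_path: str, dest_dir: str) -> list[str]:
    """Extract a docsite tar into dest_dir; return the names of the regular files."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, "r") as archive:
        members = archive.getmembers()
        for member in members:
            # Refuse entries such as "../x" that would resolve outside dest.
            target = (dest / member.name).resolve()
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    return [m.name for m in members if m.isfile()]
```

A publishing job would run this after downloading the artifact, then work with the extracted tree exactly as it would have with individually uploaded files.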
