@tw4l tw4l commented Sep 23, 2025

Fixes #2648

Replaces #2805

This PR introduces a preferSingleWACZ query parameter to the /all-crawls/<crawl_id>/download and /crawls/<crawl_id>/download endpoints. When it is set to true, these endpoints create a multi-WACZ only when a crawl has more than one WACZ file; otherwise they stream the original crawl WACZ.
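
As a rough illustration of the branching described above, here is a minimal Python sketch. It is not the code in this PR: `DownloadPlan` and `plan_download` are hypothetical names, and the real endpoints stream file contents rather than return a plan object.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DownloadPlan:
    """Hypothetical result type: how a download request would be served."""

    mode: str  # "single" = stream the original WACZ, "multi" = build a multi-WACZ
    filenames: List[str]


def plan_download(
    wacz_filenames: List[str], prefer_single_wacz: bool = False
) -> DownloadPlan:
    """Pick the download mode for a crawl (illustrative sketch only)."""
    if not wacz_filenames:
        raise ValueError("crawl has no WACZ files to download")

    # Only skip repackaging when the caller opted in AND there is exactly one file.
    if prefer_single_wacz and len(wacz_filenames) == 1:
        return DownloadPlan(mode="single", filenames=list(wacz_filenames))

    # Default behavior stays the same as before the flag existed.
    return DownloadPlan(mode="multi", filenames=list(wacz_filenames))
```

With the flag left at its default, a one-file crawl still yields a multi-WACZ, which is what keeps existing API clients unaffected.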

The flag is off by default to avoid a breaking change to the API, but the frontend is updated to use it wherever it seemed appropriate.
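
For API clients that want to opt in, a sketch of building the download URL might look like the following. `build_download_url`, the `base_url` argument, and the `all_crawls` switch are assumptions made for illustration; only the two endpoint paths and the preferSingleWACZ parameter name come from this PR.

```python
from urllib.parse import quote, urlencode


def build_download_url(
    base_url: str,
    crawl_id: str,
    prefer_single_wacz: bool = False,
    all_crawls: bool = True,
) -> str:
    """Build a download URL, optionally opting in to single-WACZ downloads.

    base_url is a hypothetical API prefix supplied by the caller.
    """
    prefix = "all-crawls" if all_crawls else "crawls"
    url = f"{base_url.rstrip('/')}/{prefix}/{quote(crawl_id, safe='')}/download"

    # Omit the parameter entirely when not opting in, matching the default.
    if prefer_single_wacz:
        url += "?" + urlencode({"preferSingleWACZ": "true"})
    return url
```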

A new backend test is also added to account for the change.

Comments and suggestions on other ways to implement this behavior are very welcome!

For crawls and all-crawls endpoints, this commit adds an optional
preferSingleWACZ query param which will download an archived item
as a single WACZ file when possible instead of repackaging single
WACZs into multiWACZs.
@tw4l tw4l requested review from ikreymer and SuaYoo September 23, 2025 20:03
@tw4l tw4l changed the title from "Downlaod archived items as single WACZ file when possible" to "Download archived items as single WACZ file when possible via new download endpoint query parameter" Sep 23, 2025
Development

Successfully merging this pull request may close these issues.

[Change]: If only one WACZ file is available to download from archived items, provide a regular WACZ