Improve file delete performance for expire_snapshots #26230

Open · tbaeg wants to merge 1 commit into master from expire-snapshots-parallel

Conversation

@tbaeg (Member) commented Jul 17, 2025

Description

Improve file delete performance for expire_snapshots.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Improve performance of expire_snapshots procedure. ({issue}`26230`)

@cla-bot cla-bot bot added the cla-signed label Jul 17, 2025
@github-actions github-actions bot added the iceberg (Iceberg connector) label Jul 17, 2025
@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from 9e77d23 to 9ffdd87 on July 17, 2025 16:42
@tbaeg tbaeg requested a review from raunaqmorarka July 18, 2025 20:17
@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch 2 times, most recently from 896fd0d to e0cd470 on July 19, 2025 14:16
@raunaqmorarka raunaqmorarka requested review from sopel39 and Copilot July 19, 2025 15:57
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR improves the performance of the expire_snapshots procedure in Iceberg by introducing parallel file deletion capabilities and delegating file cleanup operations to Iceberg's native bulk delete functionality.

Key changes include:

  • Introduces a new configurable thread pool for file deletion operations
  • Replaces custom file deletion logic with Iceberg's native bulk delete operations
  • Updates thread pool sizing defaults with reasonable upper bounds

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Summary per file:
- ForIcebergFileDelete.java: Adds a new binding annotation for file delete executor dependency injection
- IcebergConfig.java: Adds configuration for the file delete thread pool size
- IcebergExecutorModule.java: Creates a new executor service for file deletion operations
- IcebergMetadata.java: Replaces custom file deletion with Iceberg's native bulk delete operations
- IcebergMetadataFactory.java: Integrates the file delete executor into the metadata factory
- TestIcebergConfig.java: Updates tests for the new configuration and thread pool defaults
- BaseIcebergMinioConnectorSmokeTest.java: Updates test expectations for bulk delete behavior
- Various test files: Add the file delete executor parameter to test constructors
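For illustration, here is a minimal sketch of how a dedicated file-delete executor could be wired up with a Guice binding annotation, loosely following the file names listed above (ForIcebergFileDelete, IcebergExecutorModule). The class layout, the daemonThreadsNamed pattern, and the getFileDeleteThreads() getter are assumptions for the sketch, not the PR's exact code.

// Hypothetical binding annotation (its own file) marking the executor used for file deletes
import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

import com.google.inject.BindingAnnotation;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;

@Retention(RUNTIME)
@Target({FIELD, PARAMETER, METHOD})
@BindingAnnotation
public @interface ForIcebergFileDelete {}

// Hypothetical executor module (separate file) providing the bounded thread pool
import static io.airlift.concurrent.Threads.daemonThreadsNamed;
import static java.util.concurrent.Executors.newFixedThreadPool;

import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.Provides;
import com.google.inject.Singleton;
import java.util.concurrent.ExecutorService;

public class IcebergFileDeleteExecutorModule
        implements Module
{
    @Override
    public void configure(Binder binder) {}

    @Provides
    @Singleton
    @ForIcebergFileDelete
    public ExecutorService createFileDeleteExecutor(IcebergConfig config)
    {
        // getFileDeleteThreads() is an assumed getter for the new iceberg.file-delete-threads property
        return newFixedThreadPool(config.getFileDeleteThreads(), daemonThreadsNamed("iceberg-file-delete-%s"));
    }
}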

@raunaqmorarka (Member) commented:

@tbaeg are you able to share any benchmark results from trying this change?

@tbaeg (Member, Author) commented Jul 20, 2025

> @tbaeg are you able to share any benchmark results from trying this change?

At the moment, I do not have any benchmarks to share. For our testing environment I'd first need to backport the change. I can try to get something this week.

@ebyhr (Member) left a comment

Looks good except for missing documentation.

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from e0cd470 to c0402c7 on July 21, 2025 13:01
@github-actions github-actions bot added the docs label Jul 21, 2025
@findinpath (Contributor) commented Jul 21, 2025

@tbaeg if it is not too much trouble, could you add an informal benchmark of expire_snapshots with and without parallelism to the PR description?
I'd be interested to see how much of a speedup we can achieve by using parallelism.

@@ -228,6 +228,9 @@ implementation is used:
- Number of threads used for retrieving metadata. Currently, only table loading
is parallelized.
- `8`
* - `iceberg.file-delete-threads`
- Number of threads to use for deleting files when running `expire_snapshots` procedure.
- Double the number of processors on the coordinator node.
A contributor left a review comment:

Misc question: Do we have any way to restrict certain settings to only the coordinator?

This is outside the scope of the current PR.
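For illustration, a minimal sketch of how the `iceberg.file-delete-threads` property documented above might be exposed through Airlift config, using the documented default of double the number of available processors. It is shown as a standalone class with assumed names; in the PR the property lives in IcebergConfig.

import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;
import jakarta.validation.constraints.Min;

public class FileDeleteConfigSketch
{
    // Default mirrors the documented value: double the number of processors on the node
    private int fileDeleteThreads = Runtime.getRuntime().availableProcessors() * 2;

    @Min(1)
    public int getFileDeleteThreads()
    {
        return fileDeleteThreads;
    }

    @Config("iceberg.file-delete-threads")
    @ConfigDescription("Number of threads to use for deleting files when running the expire_snapshots procedure")
    public FileDeleteConfigSketch setFileDeleteThreads(int fileDeleteThreads)
    {
        this.fileDeleteThreads = fileDeleteThreads;
        return this;
    }
}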

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from c0402c7 to 4dc4d6c on July 21, 2025 17:43
// ForwardingFileIo handles bulk operations so no separate function implementation is needed
table.expireSnapshots()
        .expireOlderThan(session.getStart().toEpochMilli() - retention.toMillis())
        .executeDeleteWith(fileDeleteExecutor)
A contributor left a review comment:

I wonder whether we could somehow ensure that there are no regressions in terms of sending bulk requests to AWS S3.

@tbaeg (Member, Author) replied Jul 21, 2025

Assuming something needs to be deleted in each category (data, manifest, manifest list, statistics), you technically have a higher minimum number of requests. But the improvements should really shine as more files need to be deleted. So in a sense this is a "regression", although I'd categorize it more as a trade-off for throughput.

One aspect I imagine could become an issue is excessive 500 responses from S3, since each element in a batch delete request technically counts as a request against the rate-limiting quota. This would require testing at a certain scale to verify.

@tbaeg (Member, Author) replied Jul 21, 2025

Apologies, I was wrong about the parallelization in Iceberg. Looking at the code more closely, it only parallelizes when the FileIO implementation does NOT support bulk operations. So there is effectively no performance change.

There would need to be a change in Iceberg to support parallelized bulk operations, or the ForwardingFileIo would need to handle the parallelization.

Old:
10 deletes per request (manually set delete batch size) - 412 total requests

2025-07-21T13:00:29.337-0600	INFO	ForkJoinPool-1-worker-1	stdout	5024 metadata files found
2025-07-21T13:00:29.338-0600	INFO	ForkJoinPool-1-worker-1	stdout	1002 data files found
2025-07-21T13:00:32.795-0600	INFO	ForkJoinPool-1-worker-1	stdout	Time for expire_snapshots time: 3457
2025-07-21T13:00:32.845-0600	INFO	ForkJoinPool-1-worker-1	stdout	2011 metadata files found after expire_snapshots
2025-07-21T13:00:32.845-0600	INFO	ForkJoinPool-1-worker-1	stdout	2 data files found after expire_snapshots

New:
10 deletes per request (manually set delete batch size) - 413 total requests

2025-07-21T12:53:18.326-0600	INFO	ForkJoinPool-1-worker-1	stdout	5024 metadata files found
2025-07-21T12:53:18.326-0600	INFO	ForkJoinPool-1-worker-1	stdout	1002 data files found
2025-07-21T12:53:21.775-0600	INFO	ForkJoinPool-1-worker-1	stdout	Time for expire_snapshots time: 3448
2025-07-21T12:53:21.819-0600	INFO	ForkJoinPool-1-worker-1	stdout	2011 metadata files found after expire_snapshots
2025-07-21T12:53:21.819-0600	INFO	ForkJoinPool-1-worker-1	stdout	2 data files found after expire_snapshots

A member replied:

In our case SupportsBulkOperations is always implemented because we use ForwardingFileIo.
So we have to implement the parallelism in io.trino.plugin.iceberg.fileio.ForwardingFileIo#deleteFiles and skip using executeDeleteWith.
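As a rough sketch of that idea (not the PR's actual implementation), a forwarding FileIO could partition the incoming paths into fixed-size batches and submit each batch to the file-delete executor. The batch size, error handling, and the deleteBatch delegate below are illustrative assumptions.

import com.google.common.collect.Iterables;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;

// Standalone illustration, not Trino's ForwardingFileIo
class ParallelBulkDeleteSketch
{
    private static final int DELETE_BATCH_SIZE = 1000; // assumed batch size

    private final ListeningExecutorService fileDeleteExecutor;

    ParallelBulkDeleteSketch(ExecutorService fileDeleteExecutor)
    {
        this.fileDeleteExecutor = MoreExecutors.listeningDecorator(fileDeleteExecutor);
    }

    void deleteFiles(Iterable<String> paths)
    {
        List<ListenableFuture<?>> futures = new ArrayList<>();
        for (List<String> batch : Iterables.partition(paths, DELETE_BATCH_SIZE)) {
            // Each batch becomes one bulk delete request executed on the shared executor
            futures.add(fileDeleteExecutor.submit(() -> deleteBatch(batch)));
        }
        // Wait for all batches and propagate the first failure, if any
        Futures.getUnchecked(Futures.allAsList(futures));
    }

    private void deleteBatch(List<String> batch)
    {
        // Placeholder for the delegate bulk delete, e.g. a file system deleteFiles(locations) call
    }
}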

@tbaeg (Member, Author) replied:

Another crude benchmark.

2025-07-22T12:50:13.288-0600	INFO	ForkJoinPool-1-worker-1	stdout	5024 metadata files found
2025-07-22T12:50:13.289-0600	INFO	ForkJoinPool-1-worker-1	stdout	1002 data files found
2025-07-22T12:50:15.741-0600	INFO	ForkJoinPool-1-worker-1	stdout	Time for expire_snapshots time: 2452
2025-07-22T12:50:15.797-0600	INFO	ForkJoinPool-1-worker-1	stdout	2011 metadata files found after expire_snapshots
2025-07-22T12:50:15.797-0600	INFO	ForkJoinPool-1-worker-1	stdout	2 data files found after expire_snapshots

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from 4dc4d6c to 5bc97de on July 21, 2025 17:53
@tbaeg (Member, Author) commented Jul 21, 2025

> @tbaeg if it is not too much trouble, could you add an informal benchmark of expire_snapshots with and without parallelism to the PR description? I'd be interested to see how much of a speedup we can achieve by using parallelism.

I was trying to get something informal together on a test cluster with more substantial data, but it looks like it might not be possible. I can write a simple test and profile it locally, but I'm not sure how worthwhile that is.

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch 4 times, most recently from dc1c0d2 to 8845baa on July 22, 2025 16:44
@tbaeg (Member, Author) commented Jul 22, 2025

I did a POC that parallelizes ForwardingFileIo.deleteFiles and introduces a new FileIoFactory, so the ExecutorService does not need to be injected every time a new ForwardingFileIo object is created. If you could take a look again, I'd appreciate it.

@raunaqmorarka @findinpath @chenjian2664 @ebyhr
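A minimal sketch of that factory idea, under assumed names: the file-delete executor is injected once into the factory (using the binding annotation sketched earlier), and callers ask the factory for FileIO instances instead of wiring the executor everywhere. The FileIoHandle stand-in below is purely illustrative.

import com.google.inject.Inject;
import java.util.concurrent.ExecutorService;

// Hypothetical factory owning the shared file-delete executor
class ForwardingFileIoFactorySketch
{
    // Stand-in for the real forwarding FileIO, only here to keep the sketch self-contained
    record FileIoHandle(Object fileSystem, ExecutorService fileDeleteExecutor) {}

    private final ExecutorService fileDeleteExecutor;

    @Inject
    ForwardingFileIoFactorySketch(@ForIcebergFileDelete ExecutorService fileDeleteExecutor)
    {
        this.fileDeleteExecutor = fileDeleteExecutor;
    }

    // Callers pass only the file system; the executor travels with the factory
    FileIoHandle create(Object fileSystem)
    {
        return new FileIoHandle(fileSystem, fileDeleteExecutor);
    }
}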

@raunaqmorarka (Member) left a comment

.

import com.google.inject.Module;
import com.google.inject.Scopes;

public class FileIoModule
A member asked:

Why do we need a new Module now? It doesn't seem justified with just one binding.

@tbaeg (Member, Author) replied:

Mostly because the lakehouse connector was using it. If I place the binding in the IcebergModule it doesn't work. Open to suggestions on where to place it.

@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch 2 times, most recently from 8685530 to d39a0a4 on July 23, 2025 14:43
- Add ForwardingFileIoFactory so the delete-specific ExecutorService is not injected in multiple places
- Remove direct usage of ForwardingFileIo in favor of a FileIoFactory implementation
- Add parallelized deletes in ForwardingFileIo
@tbaeg tbaeg force-pushed the expire-snapshots-parallel branch from d39a0a4 to 1be0c5a on July 23, 2025 15:02