-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Allow using temporary staging path in Iceberg for writing sorted files #26172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
606ea70
to
9280531
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds support for a configurable temporary staging directory when writing sorted files in Iceberg, improving performance on object stores.
- Introduces
temporaryStagingDirectoryEnabled
andtemporaryStagingDirectoryPath
inIcebergConfig
and exposes them as session properties. - Updates
IcebergSessionProperties
andIcebergPageSink
to select between the staging directory and default temp path per session. - Expands tests and documentation to cover the new staging directory behavior.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java | Add new config properties and their setters/getters |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSessionProperties.java | Register new session properties and provide accessors |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java | Implement logic to choose staging directory or default temp path |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSinkProvider.java | Remove unused SortingFileWriterConfig injection |
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/*.java | Update tests to include SortingFileWriterConfig and new session props |
docs/src/main/sphinx/connector/iceberg.md | Document the new temporary staging directory properties |
Comments suppressed due to low confidence (2)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java:90
- [nitpick] The field name includes the 'is' prefix which duplicates the boolean getter style. Consider renaming it to
temporaryStagingDirectoryEnabled
to avoid confusion between field and getter naming.
private boolean isTemporaryStagingDirectoryEnabled;
docs/src/main/sphinx/connector/iceberg.md:220
- [nitpick] The list formatting uses
* -
which differs from surrounding items. Align these entries with the existing Sphinx list style for consistency and readability.
* - `iceberg.temporary-staging-directory-enabled`
if (location.scheme().isPresent()) { | ||
return location; | ||
} | ||
return Location.of("file:///" + location.path()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Concatenating the scheme with the path may produce extra slashes (e.g., file:////tmp/...
). Consider using a URI builder or checking if the path already starts with a '/' to normalize to a single 'file://' prefix.
return Location.of("file:///" + location.path()); | |
String normalizedPath = location.path().startsWith("/") ? "file://" + location.path() : "file:///" + location.path(); | |
return Location.of(normalizedPath); |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical to the function in HiveWriterFactory:
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveWriterFactory.java
Lines 685 to 691 in 5b81408
static Location setSchemeToFileIfAbsent(Location location) | |
{ | |
if (location.scheme().isPresent()) { | |
return location; | |
} | |
return Location.of("file:///" + location.path()); | |
} |
9280531
to
1c37cdf
Compare
1c37cdf
to
80d1253
Compare
Description
This change adds support for using temporary staging directory during write operations involving sorted tables. Writes to sorted tables will utilize this path for staging temporary files during sorting operation. When disabled, the target storage will be used for staging while writing sorted tables which can be inefficient when writing to object stores like S3.
Adding session properties for these, as well as for the properties that control the number and target size of buffer files. This is useful because files of different sizes/number can be configured per query, so they can be tuned for the type of data in a given table.
Additional context and related issues
Fixes #24376
Similar to functionality added for Hive in #3434
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: