
Commit cc7f631

TensorFlow Datasets Team authored and copybara-github committed
Update TFDS to 4.5.0
This is the last version of TFDS supporting Python 3.6. Future versions will use Python 3.7.

* Better split API:
  * Splits can be selected using shards: `split='train[3shard]'`
  * Underscore supported in numbers for better readability: `split='train[:500_000]'`
  * Select the union of all splits with `split='all'`
  * [`tfds.even_splits`](https://www.tensorflow.org/datasets/splits#tfdseven_splits_multi-host_training) is more precise and flexible:
    * Returns splits of exactly the same size when passed `tfds.even_splits('train', n=3, drop_remainder=True)`
    * Works on subsplits `tfds.even_splits('train[:75%]', n=3)` or even nested
    * Can be composed with other splits: `tfds.even_splits('train', n=3)[0] + 'test'`
* FeatureConnectors:
  * Faster dataset generation (using tfrecords)
  * Features now have `serialize_example` / `deserialize_example` methods to encode/decode an example to proto: `example_bytes = features.serialize_example(example_data)`
  * `Audio` now supports `encoding='zlib'` for better compression
  * Features specs exposed in proto for better compatibility with other languages
* Better testing:
  * Mock dataset now supports nested datasets
  * Customize the number of sub examples
* Documentation update:
  * Community datasets: https://www.tensorflow.org/datasets/community_catalog/overview
  * New [guide on TFDS and determinism](https://www.tensorflow.org/datasets/determinism)
* [RLDS](https://github.com/google-research/rlds):
  * Nested datasets features are supported
  * New datasets: Robomimic, D4RL Ant Maze, RLU Real World RL, and RLU Atari with ordered episodes
* Misc:
  * Create beam pipeline using TFDS as input with [tfds.beam.ReadFromTFDS](https://www.tensorflow.org/datasets/api_docs/python/tfds/beam/ReadFromTFDS)
  * Support setting the file formats in `tfds build --file_format=tfrecord`
  * Typing annotations exposed in `tfds.typing`
  * `tfds.ReadConfig` has a new `assert_cardinality=False` to disable cardinality
  * Add a `tfds.display_progress_bar(True)` for functional control
  * Support for huge number of shards (>99999)
  * DatasetInfo exposes `.release_notes`

And of course, new datasets, bug fixes,...

Thank you to all our contributors for improving TFDS!

PiperOrigin-RevId: 424105312
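To make the `drop_remainder=True` behaviour of `tfds.even_splits` described above more concrete, here is a minimal pure-Python sketch of the index arithmetic behind an even split. It does not import or reproduce TFDS itself: the helpers `even_split_bounds` and `to_split_specs` are hypothetical names introduced for illustration, and the choice to hand leftover examples to the first splits when `drop_remainder=False` is an assumption, not something the commit message states.

```python
from typing import List, Tuple


def even_split_bounds(
    num_examples: int, n: int, drop_remainder: bool = False
) -> List[Tuple[int, int]]:
    """Return n contiguous (start, end) index ranges over num_examples.

    Hypothetical helper, not part of TFDS. With drop_remainder=True every
    range has exactly num_examples // n items and the leftover examples
    are discarded. Otherwise (assumed policy) the first
    `num_examples % n` ranges each take one extra example.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    base, remainder = divmod(num_examples, n)
    bounds = []
    start = 0
    for i in range(n):
        extra = 0 if drop_remainder else (1 if i < remainder else 0)
        size = base + extra
        bounds.append((start, start + size))
        start += size
    return bounds


def to_split_specs(name: str, bounds: List[Tuple[int, int]]) -> List[str]:
    """Format index ranges as slice-style split strings such as 'train[0:3]'."""
    return [f"{name}[{lo}:{hi}]" for lo, hi in bounds]


if __name__ == "__main__":
    # 10 examples over 3 hosts: with drop_remainder all splits have equal size.
    print(to_split_specs("train", even_split_bounds(10, 3, drop_remainder=True)))
    print(to_split_specs("train", even_split_bounds(10, 3)))
```

With 10 examples and `n=3`, the `drop_remainder=True` case yields three ranges of 3 examples each and leaves one example unused, which is the property that matters for multi-host training where every host must see the same number of examples.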
1 parent b9128ca commit cc7f631


3 files changed: +837 -250 lines changed

