Commit cc7f631
Update TFDS to 4.5.0
This is the last version of TFDS supporting Python 3.6. Future versions will require Python 3.7.
* Better split API:
  * Splits can be selected using shards: `split='train[3shard]'`
  * Underscores are supported in numbers for better readability: `split='train[:500_000]'`
  * Select the union of all splits with `split='all'`
  * [`tfds.even_splits`](https://www.tensorflow.org/datasets/splits#tfdseven_splits_multi-host_training) is more precise and flexible:
    * Returns splits of exactly the same size when passed `tfds.even_splits('train', n=3, drop_remainder=True)`
    * Works on subsplits, e.g. `tfds.even_splits('train[:75%]', n=3)`, or even nested
    * Can be composed with other splits: `tfds.even_splits('train', n=3)[0] + 'test'`
* FeatureConnectors:
  * Faster dataset generation (using tfrecords)
  * Features now have `serialize_example` / `deserialize_example` methods to encode/decode an example to proto: `example_bytes = features.serialize_example(example_data)`
  * `Audio` now supports `encoding='zlib'` for better compression
  * Feature specs are exposed in proto for better compatibility with other languages
* Better testing:
  * Mock dataset now supports nested datasets
  * Customize the number of sub examples
* Documentation update:
  * Community datasets: https://www.tensorflow.org/datasets/community_catalog/overview
  * New [guide on TFDS and determinism](https://www.tensorflow.org/datasets/determinism)
* [RLDS](https://github.com/google-research/rlds):
  * Nested datasets features are supported
  * New datasets: Robomimic, D4RL Ant Maze, RLU Real World RL, and RLU Atari with ordered episodes
* Misc:
  * Create a beam pipeline using TFDS as input with [tfds.beam.ReadFromTFDS](https://www.tensorflow.org/datasets/api_docs/python/tfds/beam/ReadFromTFDS)
  * Support setting the file format in `tfds build --file_format=tfrecord`
  * Typing annotations exposed in `tfds.typing`
  * `tfds.ReadConfig` has a new `assert_cardinality=False` option to disable the cardinality assertion
  * Add `tfds.display_progress_bar(True)` for functional control of the progress bar
  * Support for huge numbers of shards (>99999)
  * DatasetInfo exposes `.release_notes`
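To illustrate the split strings listed under "Better split API", here is a minimal sketch. The helpers `slice_split`, `shard_split`, and `load_split` are hypothetical names introduced for this example and are not part of TFDS; only the resulting strings and the `tfds.load(..., split=...)` call come from the library. TFDS is imported lazily so the string helpers can be tried without it installed.

```python
from typing import Optional


def slice_split(name: str, start: Optional[int] = None,
                stop: Optional[int] = None) -> str:
    """Build an absolute slice such as 'train[:500_000]' (hypothetical helper)."""

    def fmt(value: Optional[int]) -> str:
        # The '_' format option groups digits by thousands with underscores.
        return "" if value is None else f"{value:_}"

    return f"{name}[{fmt(start)}:{fmt(stop)}]"


def shard_split(name: str, shard: int) -> str:
    """Build a shard selector such as 'train[3shard]' (hypothetical helper)."""
    return f"{name}[{shard}shard]"


def load_split(dataset_name: str, split: str):
    """Load a dataset with the given split string (requires TFDS >= 4.5.0)."""
    # Imported lazily: an assumption made so the helpers above work without TFDS.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name, split=split)


first_examples = slice_split("train", stop=500_000)  # 'train[:500_000]'
one_shard = shard_split("train", 3)                  # 'train[3shard]'
everything = "all"                                   # union of all splits
```

Any of these strings can then be passed as `split`, for example `load_split("mnist", first_examples)`, where `"mnist"` is only a placeholder dataset name.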
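The effect of `drop_remainder=True` in `tfds.even_splits` can be pictured with plain arithmetic. The sketch below is not the TFDS implementation. `even_ranges` is a hypothetical helper, and it assumes that, when the remainder is kept, the extra examples go to the earliest splits.

```python
from typing import List, Tuple


def even_ranges(num_examples: int, n: int,
                drop_remainder: bool = False) -> List[Tuple[int, int]]:
    """Return n contiguous (start, stop) index ranges over num_examples.

    Illustrative only: assumes leftover examples are assigned to the
    first splits when drop_remainder is False.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(num_examples, n)
    if drop_remainder:
        extra = 0  # every split gets exactly `base` examples
    ranges = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# 11 examples over 3 hosts: sizes 4, 4, 3 when the remainder is kept.
kept = even_ranges(11, 3)
# With drop_remainder=True every host sees exactly 3 examples; 2 are dropped.
dropped = even_ranges(11, 3, drop_remainder=True)
# Each range maps onto an absolute split string.
specs = [f"train[{a}:{b}]" for a, b in dropped]
```

Equal-sized splits matter in multi-host training, where every host is expected to process the same number of examples per epoch.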
And of course, new datasets, bug fixes,...
Thank you to all our contributors for improving TFDS!
PiperOrigin-RevId: 424105312
1 parent b9128ca commit cc7f631
3 files changed, +837 -250 lines changed