Read performance with/without missing

(Motivated by https://github.com/Alexander-Barth/NCDatasets.jl/issues/227#issuecomment-2323915997)

Create a synthetic dataset with some compression:

```julia
using NCDatasets
A = rand(Float32, 5000, 5000)    # 100MB uncompressed
sort!(vec(A))                    # make it somewhat compressible

ds = NCDataset("test.nc", "c")
defVar(ds, "data", A, ("x", "y"), attrib = Dict("_FillValue"=>NaN32), deflatelevel=3)
close(ds)
```

The file is now 24.8MB on disk, so the compression factor is ~4x. Now benchmark read + decompression:

```julia
using NCDatasets, BenchmarkTools
ds = NCDataset("test.nc")
```

1. Read and uncompress the raw data, ignoring any missing values, with `.var`

```julia
julia> @btime A = $ds["data"].var[:];
  533.270 ms (54 allocations: 95.37 MiB)
```

So almost 200MB/s and it only allocates that 100MB that the uncompressed array requires.
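As a quick sanity check on that throughput figure, the arithmetic can be written out. This is only a sketch: `throughput_MBps` is a hypothetical helper name, and the inputs are the array size and the `@btime` timing reported in (1).

```julia
# Hypothetical helper: uncompressed throughput in MB/s (1 MB = 10^6 bytes)
throughput_MBps(nbytes, seconds) = nbytes / 1e6 / seconds

nbytes = 5000 * 5000 * sizeof(Float32)   # 100_000_000 bytes uncompressed
t_raw  = 533.270e-3                      # seconds, timing reported in (1)

println(round(throughput_MBps(nbytes, t_raw), digits = 1), " MB/s")
```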

2. Read and uncompress the default way, returning `Matrix{Union{Missing, Float32}}`

```julia
julia> @btime A = $ds["data"][:];
  641.664 ms (55 allocations: 214.58 MiB)
```

Only a bit slower, but it requires more than twice the memory.
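The extra memory is consistent with how Julia stores arrays of small unions: as far as I know, an `Array{Union{Missing, Float32}}` keeps one additional type-tag byte per element next to the 4-byte payload. Assuming (not verified against the NCDatasets source) that the default read allocates the raw `Float32` buffer plus one such union array, the arithmetic looks like this:

```julia
n   = 5000 * 5000                            # number of elements
MiB = 2^20

raw_bytes   = n * sizeof(Float32)            # plain Float32 buffer
union_bytes = n * (sizeof(Float32) + 1)      # payload + 1 type-tag byte per element (assumed layout)

raw_MiB   = raw_bytes / MiB
total_MiB = (raw_bytes + union_bytes) / MiB

println(round(raw_MiB, digits = 2), " MiB raw, ",
        round(total_MiB, digits = 2), " MiB raw + union")
```

This gives 95.37 MiB and 214.58 MiB, which matches the allocations reported in (1) and (2).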

3. Read and uncompress via `nomissing(::CFVariable)`

```julia
julia> @btime A = nomissing($ds["data"])
```

Takes absolutely forever, don't do this. See https://github.com/Alexander-Barth/NCDatasets.jl/issues/227#issuecomment-2323915997 -- maybe add a warning or remove the `nomissing(::CFVariable)` method?

4. Read and uncompress via `nomissing(::Array)`

```julia
julia> @btime A = nomissing($ds["data"][:]);
  712.682 ms (57 allocations: 309.95 MiB)
```

A bit slower again, and about 3x the allocated memory of (1).
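The conversion step in (4) can be mimicked with Base Julia alone, which makes the extra copy visible. This is a sketch, not what `nomissing` does internally: it uses `coalesce` with `NaN32` as a stand-in fill value on a small made-up matrix.

```julia
# Small stand-in for the Matrix{Union{Missing, Float32}} returned in (2)
A = Union{Missing, Float32}[1.0f0 missing; 3.0f0 4.0f0]

# Replace missing by the fill value; this allocates a new Matrix{Float32}
B = coalesce.(A, NaN32)

println(typeof(B))
println(sizeof(B), " bytes for the additional plain copy")
```

On the 5000x5000 array such an additional plain copy is another 95.37 MiB, which would account for the step from 214.58 MiB in (2) to roughly 309.95 MiB here.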

5. Read and uncompress via `Array(::CFVariable)`

```julia
julia> @btime A = Array($ds["data"])
  496.846 ms (64 allocations: 214.58 MiB)
```

Same as (2) but faster?

6. Read and uncompress via `Array{T}(::CFVariable)` but providing target type `T`

```julia
julia> @btime A = Array{Float32}($ds["data"])
```

Don't do this either, it also takes forever, probably for the same reason as (3).

Read performance with/without missing #264

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Read performance with/without missing #264

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions