Read performance with/without missing #264

Open
@milankl

(Motivated by #227 (comment))

Create a fake dataset with some compression like this:

using NCDatasets
A = rand(Float32, 5000, 5000)    # 100MB uncompressed
sort!(vec(A))                    # make it somewhat compressible

ds = NCDataset("test.nc", "c")
defVar(ds, "data", A, ("x", "y"), attrib = Dict("_FillValue"=>NaN32), deflatelevel=3)
close(ds)

This file is now 24.8 MB on disk, so a ~4x compression factor. Now benchmark the read plus decompression:

using NCDatasets, BenchmarkTools
ds = NCDataset("test.nc")
  1. Read and uncompress raw data, ignore any missing with .var
julia> @btime A = $ds["data"].var[:];
  533.270 ms (54 allocations: 95.37 MiB)

So almost 200 MB/s, and it only allocates the 100 MB that the uncompressed array requires.

  2. Read and uncompress with default indexing, returning Matrix{Union{Missing, Float32}}
julia> @btime A = $ds["data"][:];
  641.664 ms (55 allocations: 214.58 MiB)

Only a bit slower, but it requires more than twice the memory.

  3. Read and uncompress via nomissing(::CFVariable)
julia> @btime A = nomissing($ds["data"])

This takes absolutely forever; don't do this. See #227 (comment). Maybe add a warning or remove the nomissing(::CFVariable) method?

  4. Read and uncompress via nomissing(::Array)
julia> @btime A = nomissing($ds["data"][:]);
  712.682 ms (57 allocations: 309.95 MiB)

A bit slower again, and more than 3x the allocated memory of the raw read (309.95 MiB vs 95.37 MiB).

  5. Read and uncompress via Array(::CFVariable)
julia> @btime A = Array($ds["data"])
  496.846 ms (64 allocations: 214.58 MiB)

Same allocated memory as (2), but faster?

  6. Read and uncompress via Array{T}(::CFVariable) but providing target type T
julia> @btime A = Array{Float32}($ds["data"])

Don't do this either; it also takes forever, probably for the same reason as (3).
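
The allocation figures from the benchmarks that finished can be roughly reproduced with some arithmetic. This is only a sketch under two assumptions: that the default read first fills a raw Float32 buffer and then copies it into a Matrix{Union{Missing, Float32}} (not checked against the NCDatasets source), and that nomissing(::Array) allocates one more plain Float32 array on top of that. It uses the fact that Julia stores an array of Union{Missing, Float32} as 4 data bytes plus 1 type-tag byte per element. The names raw_bytes, union_bytes and mib are made up for this illustration.

```julia
# Back-of-the-envelope estimate of the allocations seen in the benchmarks.
n = 5000 * 5000                             # number of elements in the test array

raw_bytes   = n * sizeof(Float32)           # plain Matrix{Float32}
union_bytes = n * (sizeof(Float32) + 1)     # Union{Missing, Float32}: data + 1 tag byte per element

# Convert bytes to MiB, rounded like the @btime output
mib(b) = round(b / 2^20, digits = 2)

println("raw read via .var:        ", mib(raw_bytes), " MiB")
println("default read / Array(v):  ", mib(raw_bytes + union_bytes), " MiB")   # assumed: raw buffer + Union copy
println("nomissing(::Array):       ", mib(2 * raw_bytes + union_bytes), " MiB") # assumed: one extra Float32 array
```

The first two estimates match the reported 95.37 MiB and 214.58 MiB, and the third lands very close to the reported 309.95 MiB, which suggests the extra memory comes from intermediate copies rather than from decompression itself.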
