Description
(Motivated by #227 (comment))
Create a fake dataset with some compression, for example:
using NCDatasets
A = rand(Float32, 5000, 5000) # 100MB uncompressed
sort!(vec(A)) # make it somewhat compressible
ds = NCDataset("test.nc", "c")
defVar(ds, "data", A, ("x", "y"), attrib = Dict("_FillValue"=>NaN32), deflatelevel=3)
close(ds)
This file is now 24.8MB on disk, so a ~4x compression factor. Now benchmark the read + decompression:
using NCDatasets, BenchmarkTools
ds = NCDataset("test.nc")
- Read and uncompress the raw data with .var, ignoring any missing values
julia> @btime A = $ds["data"].var[:];
533.270 ms (54 allocations: 95.37 MiB)
So almost 200MB/s (100MB in 0.53s), and it allocates only the 100MB that the uncompressed array requires.
- Read and uncompress with the default indexing, returning Matrix{Union{Missing, Float32}}
julia> @btime A = $ds["data"][:];
641.664 ms (55 allocations: 214.58 MiB)
Only a bit slower, but it allocates more than twice the memory.
- Read and uncompress via nomissing(::CFVariable)
julia> @btime A = nomissing($ds["data"])
Takes absolutely forever, don't do this. See #227 (comment). Maybe add a warning or remove the nomissing(::CFVariable) method?
- Read and uncompress via nomissing(::Array)
julia> @btime A = nomissing($ds["data"][:]);
712.682 ms (57 allocations: 309.95 MiB)
A bit slower again, and about 3x the allocated memory of the raw read.
- Read and uncompress via Array(::CFVariable)
julia> @btime A = Array($ds["data"])
496.846 ms (64 allocations: 214.58 MiB)
Same result as the default read (ds["data"][:]) but faster?
- Read and uncompress via Array{T}(::CFVariable), providing the target type T
julia> @btime A = Array{Float32}($ds["data"])
Don't do this; it also takes forever, probably for the same reason as nomissing(::CFVariable).
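The allocation figures above look consistent with how Julia stores a Matrix{Union{Missing, Float32}}: the 4-byte value plus one type-tag byte per element. Below is a rough sketch of that arithmetic using only Base. The helper name `mib` is made up for this example, and attributing the totals to "raw buffer + converted copy" is an assumption, not something checked against the NCDatasets source.

```julia
# Hypothetical helper for this sketch: convert bytes to MiB.
mib(bytes) = bytes / 2^20

n = 5000 * 5000                           # number of elements in the test array

raw_bytes   = n * sizeof(Float32)         # plain Matrix{Float32}
union_bytes = n * (sizeof(Float32) + 1)   # Matrix{Union{Missing, Float32}}: value + 1 tag byte

# Assumed breakdown: one raw buffer plus one Union-typed copy.
default_read = raw_bytes + union_bytes
# Assumed breakdown: the above plus one more plain Float32 copy from nomissing(::Array).
with_nomissing = default_read + raw_bytes

println("raw read:            ", round(mib(raw_bytes), digits = 2), " MiB")
println("default read:        ", round(mib(default_read), digits = 2), " MiB")
println("default + nomissing: ", round(mib(with_nomissing), digits = 2), " MiB")
```

Under these assumptions the estimates come out at roughly 95.4, 214.6 and 309.9 MiB, close to the 95.37, 214.58 and 309.95 MiB reported by @btime.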
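Given these timings, one way to get a plain Matrix{Float32} without the Union{Missing, Float32} intermediate is to read through .var and rely on the NaN32 fill value already stored in the data. This is a minimal sketch: the helper name `read_raw` and the file name `small.nc` are made up for this example, and it assumes the variable has no scale_factor or add_offset, since .var skips those CF transformations.

```julia
using NCDatasets

# Hypothetical helper (not part of NCDatasets): read a 2-D variable without
# CF transformations, so fill values stay as stored (NaN32 in this issue's file).
function read_raw(path::AbstractString, name::AbstractString)
    NCDataset(path) do ds
        ds[name].var[:, :]   # raw read, as in the first benchmark
    end
end

# Small self-contained check with a temporary file.
path = joinpath(mktempdir(), "small.nc")
A = rand(Float32, 20, 10)
A[1, 1] = NaN32              # stored value equals the _FillValue
NCDataset(path, "c") do ds
    defVar(ds, "data", A, ("x", "y"), attrib = Dict("_FillValue" => NaN32), deflatelevel = 3)
end

B = read_raw(path, "data")
```

Because the fill value here is NaN32, missing entries simply show up as NaN in the result, which is what nomissing(ds["data"][:], NaN32) would be expected to produce as well.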