Description
Why is this feature important?
Computing Min/Max statistics on the CPU is considerably slower than on the GPU. Producing output data on the CPU is nevertheless relevant for us: we need to transform our data before output and generally have no GPU memory to spare for this.
Timings from the IO plugin of PIConGPU when creating a checkpoint of about 34 GiB (single-GPU run, no MPI parallelization in this test; times include data preparation and output):
- BP4, StatsLevel=0: 1min 6sec 208msec
- BP4, StatsLevel=1, Threads=1: 1min 21sec 25msec
- BP4, StatsLevel=1, Threads=8: 1min 9sec 881msec
- BP5, StatsLevel=0: 1min 0sec 324msec
- BP5, StatsLevel=1: 1min 15sec 727msec
Detailed statistics (profiling.json) are at the end.
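To make the overhead explicit, the timings listed above can be compared against the respective StatsLevel=0 run of the same engine. The following is only a small arithmetic sketch; the helper names (`to_seconds`, `overhead`) are made up for illustration, and the numbers are copied from the list above.

```python
# Wall-clock timings from the list above, converted to seconds.
def to_seconds(minutes, seconds, millis):
    return minutes * 60 + seconds + millis / 1000


timings = {
    "BP4, StatsLevel=0": to_seconds(1, 6, 208),
    "BP4, StatsLevel=1, Threads=1": to_seconds(1, 21, 25),
    "BP4, StatsLevel=1, Threads=8": to_seconds(1, 9, 881),
    "BP5, StatsLevel=0": to_seconds(1, 0, 324),
    "BP5, StatsLevel=1": to_seconds(1, 15, 727),
}

# Each engine is compared against its own run without statistics.
baselines = {"BP4": "BP4, StatsLevel=0", "BP5": "BP5, StatsLevel=0"}


def overhead(label):
    """Return (extra seconds, extra percent) relative to the no-stats run."""
    engine = label.split(",")[0]
    base = timings[baselines[engine]]
    extra = timings[label] - base
    return extra, 100 * extra / base


for label in timings:
    extra, percent = overhead(label)
    print(f"{label}: +{extra:.3f} s ({percent:.1f} %)")
```

With these numbers, serial statistics add roughly 15 seconds in both engines, while 8 threads in BP4 bring the added time down to under 4 seconds.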
Describe the solution you'd like and potential required effort
There were two suggestions on how to deal with this:
- By @pnorbert: Implement threaded CPU Min/Max computation in BP5 (the feature already exists in BP4). This would let PIConGPU users (and probably others) have statistics at a reasonable overhead.
- Some weeks ago, I talked to @anagainaru about an option to compute statistics only for GPU variables, i.e. something like StatsLevel=GpuOnly. This would allow us to specify statistics more selectively to only include low-overhead computations by default.
Possible downsides: Users might be confused about why some variables have statistics and others do not.
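For the first suggestion, the general idea of a threaded Min/Max computation is a chunk-and-reduce: split the buffer into contiguous chunks, compute a partial Min/Max per chunk in a worker thread, then reduce the partial results. The sketch below only illustrates that structure and is not how BP4 implements it; `chunk_bounds` and `threaded_minmax` are hypothetical names. In pure Python the GIL prevents a real speedup, so this shows the decomposition rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into at most n_chunks contiguous, non-empty slices."""
    n_chunks = max(1, min(n_chunks, n_items))
    step, rest = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        # The first `rest` chunks take one extra element each.
        stop = start + step + (1 if i < rest else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def threaded_minmax(data, threads=8):
    """Compute (min, max) of a non-empty sequence using per-chunk partial results."""
    if len(data) == 0:
        raise ValueError("cannot compute Min/Max of an empty buffer")

    def partial(bound):
        chunk = data[bound[0]:bound[1]]
        return min(chunk), max(chunk)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(partial, chunk_bounds(len(data), threads)))

    # Final reduction over the per-chunk results.
    return min(lo for lo, _ in partials), max(hi for _, hi in partials)


if __name__ == "__main__":
    print(threaded_minmax([3.0, -1.5, 7.25, 2.0, 9.5, 0.0, 4.0], threads=3))
```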
Describe alternatives you've considered and potential required effort
We currently deactivate statistics by default in openPMD and users need to opt-in to using them.
What is the potential impact of this feature in the community?
PIConGPU users can activate statistics for ADIOS2 without sacrificing too much performance.
Additional context
PIConGPU uses the Span API for Put() operations.
The test case used a KelvinHelmholtz simulation with launch parameters picongpu -d 1 1 1 -g 192 512 192 -s 100 --openPMD.ext bp4 --openPMD.period 100:100 --openPMD.json '{"adios2":{"engine":{"parameters":{"InitialBufferSize": "35Gb", "StatsLevel": 0, "Threads": 8}}}}'
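The different configurations vary only in the file extension and the engine parameters passed via --openPMD.json. A minimal sketch for generating such command lines could look as follows; the helper names `openpmd_json` and `launch_command` are hypothetical, and whether Threads was set for the BP5 runs is an assumption (it is left out here).

```python
import json
import shlex


def openpmd_json(stats_level, threads=None, initial_buffer_size="35Gb"):
    """Build the JSON string passed to --openPMD.json."""
    params = {"InitialBufferSize": initial_buffer_size, "StatsLevel": stats_level}
    if threads is not None:
        params["Threads"] = threads
    return json.dumps({"adios2": {"engine": {"parameters": params}}})


def launch_command(ext, stats_level, threads=None):
    """Assemble the picongpu command line with shell-safe quoting."""
    args = [
        "picongpu", "-d", "1", "1", "1", "-g", "192", "512", "192", "-s", "100",
        "--openPMD.ext", ext,
        "--openPMD.period", "100:100",
        "--openPMD.json", openpmd_json(stats_level, threads),
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # (extension, StatsLevel, Threads); Threads=None for BP5 is an assumption.
    for ext, stats_level, threads in [
        ("bp4", 0, 8), ("bp4", 1, 1), ("bp4", 1, 8), ("bp5", 0, None), ("bp5", 1, None),
    ]:
        print(launch_command(ext, stats_level, threads))
```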
Detailed performance statistics:
{
"bp4_no_stats": [
{
"rank": 0,
"start": "Fri_Apr_04_16:48:52_2025",
"threads": 8,
"bytes": 36223373054,
"mkdir_mus": 74,
"aggregation_mus": 0,
"meta_sort_merge_mus": 226,
"minmax_mus": 0,
"memcpy_mus": 0,
"buffering_mus": 14643165,
"transport_0": {
"type": "File_POSIX",
"close_mus": 5,
"write_mus": 22837467,
"open_mus": 69
},
"transport_1": {
"type": "File_POSIX",
"close_mus": 0,
"write_mus": 46,
"open_mus": 111
}
}
],
"bp4_with_stats_serial": [
{
"rank": 0,
"start": "Fri_Apr_04_16:40:50_2025",
"threads": 1,
"bytes": 36223379678,
"mkdir_mus": 76,
"aggregation_mus": 0,
"meta_sort_merge_mus": 217,
"minmax_mus": 15412132,
"memcpy_mus": 0,
"buffering_mus": 14641027,
"transport_0": {
"type": "File_POSIX",
"close_mus": 5,
"write_mus": 23368561,
"open_mus": 70
},
"transport_1": {
"type": "File_POSIX",
"close_mus": 0,
"write_mus": 28,
"open_mus": 104
}
}
],
"bp4_with_stats_8_threads": [
{
"rank": 0,
"start": "Fri_Apr_04_16:45:10_2025",
"threads": 8,
"bytes": 36223381318,
"mkdir_mus": 78,
"aggregation_mus": 0,
"meta_sort_merge_mus": 242,
"minmax_mus": 3244702,
"memcpy_mus": 0,
"buffering_mus": 14659454,
"transport_0": {
"type": "File_POSIX",
"close_mus": 5,
"write_mus": 23244426,
"open_mus": 69
},
"transport_1": {
"type": "File_POSIX",
"close_mus": 0,
"write_mus": 29,
"open_mus": 97
}
}
],
"bp5_no_stats": [
{
"rank": 0,
"start": "Fri_Apr_04_16:19:45_2025",
"PDW_mus": 22933110,
"PDW": {
"mus": 22933110,
"nCalls": 14
},
"ES_mus": 395,
"ES": {
"mus": 395,
"nCalls": 1
},
"PP_mus": 1,
"PP": {
"mus": 1,
"nCalls": 3
},
"ES_meta1_mus": 18,
"ES_meta1": {
"mus": 18,
"nCalls": 1
},
"ES_meta2_mus": 75,
"ES_meta2": {
"mus": 75,
"nCalls": 1
},
"ES_close_mus": 243,
"ES_close": {
"mus": 243,
"nCalls": 1
},
"ES_AWD_mus": 56,
"ES_AWD": {
"mus": 56,
"nCalls": 1
},
"databytes": 0,
"metadatabytes": 0,
"metametadatabytes": 0,
"transport_0": {
"type": "File_POSIX",
"wbytes": 36223352672,
"close": {
"mus": 4,
"nCalls": 1
},
"write": {
"mus": 21495245,
"nCalls": 39
},
"open": {
"mus": 82,
"nCalls": 1
}
},
"transport_1": {
"type": "File_POSIX",
"wbytes": 24736,
"close": {
"mus": 1,
"nCalls": 1
},
"write": {
"mus": 24,
"nCalls": 5
},
"open": {
"mus": 90,
"nCalls": 1
}
}
}
],
"bp5_with_stats": [
{
"rank": 0,
"start": "Fri_Apr_04_16:24:24_2025",
"PDW_mus": 38394962,
"PDW": {
"mus": 38394962,
"nCalls": 14
},
"ES_mus": 402,
"ES": {
"mus": 402,
"nCalls": 1
},
"PP_mus": 0,
"PP": {
"mus": 0,
"nCalls": 3
},
"ES_meta1_mus": 20,
"ES_meta1": {
"mus": 20,
"nCalls": 1
},
"ES_meta2_mus": 77,
"ES_meta2": {
"mus": 77,
"nCalls": 1
},
"ES_close_mus": 246,
"ES_close": {
"mus": 246,
"nCalls": 1
},
"ES_AWD_mus": 56,
"ES_AWD": {
"mus": 56,
"nCalls": 1
},
"databytes": 0,
"metadatabytes": 0,
"metametadatabytes": 0,
"transport_0": {
"type": "File_POSIX",
"wbytes": 36223352672,
"close": {
"mus": 5,
"nCalls": 1
},
"write": {
"mus": 20963340,
"nCalls": 39
},
"open": {
"mus": 63,
"nCalls": 1
}
},
"transport_1": {
"type": "File_POSIX",
"wbytes": 25768,
"close": {
"mus": 1,
"nCalls": 1
},
"write": {
"mus": 25,
"nCalls": 5
},
"open": {
"mus": 87,
"nCalls": 1
}
}
}
]
}
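The counters in the profiling output above can be used to isolate the cost of the statistics. The sketch below copies the relevant values by hand; it assumes that the growth of the BP5 `PDW_mus` timer between the two BP5 runs is dominated by the Min/Max computation, which the profile itself does not state.

```python
MUS_PER_SECOND = 1_000_000

# "minmax_mus" values from the three BP4 runs above.
bp4_minmax_mus = {
    "bp4_no_stats": 0,
    "bp4_with_stats_serial": 15412132,
    "bp4_with_stats_8_threads": 3244702,
}

# "PDW_mus" values from the two BP5 runs above.
bp5_pdw_mus = {
    "bp5_no_stats": 22933110,
    "bp5_with_stats": 38394962,
}

# Speedup of the BP4 Min/Max computation when going from 1 to 8 threads.
bp4_speedup = (
    bp4_minmax_mus["bp4_with_stats_serial"]
    / bp4_minmax_mus["bp4_with_stats_8_threads"]
)

# Extra time in the BP5 PDW timer once statistics are enabled.
bp5_extra_mus = bp5_pdw_mus["bp5_with_stats"] - bp5_pdw_mus["bp5_no_stats"]

print(f"BP4 Min/Max speedup with 8 threads: {bp4_speedup:.2f}x")
print(f"BP5 extra PDW time with stats: {bp5_extra_mus / MUS_PER_SECOND:.2f} s")
```

The BP5 difference is close to the serial BP4 `minmax_mus` value, which is consistent with BP5 computing the statistics on a single thread.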