Performance considerations for CPU-side Min/Max computation (StatsLevel=1) #4502

@franzpoeschel

Description

Why is this feature important?
Computing Min/Max statistics on the CPU is considerably slower than on the GPU. Writing data from CPU memory matters for us because we need to transform our data before output and generally do not have GPU memory to spare for this.

Timings from the IO plugin of PIConGPU when creating a 34 GiB checkpoint (single-GPU run, no MPI parallelization in this test; times include data preparation and output):

  • BP4, StatsLevel=0: 1min 6sec 208msec
  • BP4, StatsLevel=1, Threads=1: 1min 21sec 25msec
  • BP4, StatsLevel=1, Threads=8: 1min 9sec 881msec
  • BP5, StatsLevel=0: 1min 0sec 324msec
  • BP5, StatsLevel=1: 1min 15sec 727msec

Detailed statistics (profiling.json) at the end.
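To put the timings above into relative terms, the overhead of StatsLevel=1 over the StatsLevel=0 baseline can be computed directly from the listed wall times. The following is a small sketch of that arithmetic; the helper names `to_seconds`, `overhead_percent` and `print_overheads` are made up for illustration and are not part of ADIOS2, openPMD or PIConGPU.

```cpp
#include <cstdio>

// Convert a reading of the form "Xmin Ysec Zmsec" to seconds.
constexpr double to_seconds(int min, int sec, int msec) {
    return min * 60.0 + sec + msec / 1000.0;
}

// Relative overhead of `with_stats` over `baseline`, in percent.
constexpr double overhead_percent(double baseline, double with_stats) {
    return (with_stats - baseline) / baseline * 100.0;
}

// Wall times copied from the list above.
constexpr double bp4_no_stats = to_seconds(1, 6, 208);
constexpr double bp4_stats_1_thread = to_seconds(1, 21, 25);
constexpr double bp4_stats_8_threads = to_seconds(1, 9, 881);
constexpr double bp5_no_stats = to_seconds(1, 0, 324);
constexpr double bp5_stats = to_seconds(1, 15, 727);

// Print the overhead of each StatsLevel=1 run relative to its own engine's baseline.
void print_overheads() {
    std::printf("BP4, Threads=1: %.1f %%\n",
                overhead_percent(bp4_no_stats, bp4_stats_1_thread));
    std::printf("BP4, Threads=8: %.1f %%\n",
                overhead_percent(bp4_no_stats, bp4_stats_8_threads));
    std::printf("BP5:            %.1f %%\n",
                overhead_percent(bp5_no_stats, bp5_stats));
}
```

With these numbers, serial statistics cost roughly 22 % (BP4) and 26 % (BP5) of the total time, while BP4 with 8 threads stays below 6 %.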

Describe the solution you'd like and potential required effort

There were two suggestions on how to deal with this:

  • By @pnorbert: Implement threaded CPU Min/Max computation in BP5 (the feature already exists in BP4). This would enable PIConGPU users (and probably others) to have statistics with a reasonable overhead.
  • Some weeks ago, I talked to @anagainaru about an option to compute statistics only for GPU variables, i.e. something like StatsLevel=GpuOnly. This would allow us to specify statistics more selectively to only include low-overhead computations by default.
    Possible downside: Users might be confused about why some variables have statistics and others do not.
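For illustration, the first suggestion boils down to splitting a buffer into chunks, computing min/max per chunk on separate threads, and reducing the partial results. The following is a minimal sketch of that general technique using only the C++ standard library. It is not the BP4 implementation and does not use any ADIOS2 API; the function name `threaded_minmax` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not ADIOS2 code): returns {min, max} of `data`,
// computed with up to `nthreads` worker threads.
template <typename T>
std::pair<T, T> threaded_minmax(const std::vector<T>& data, unsigned nthreads) {
    assert(!data.empty());
    const std::size_t n = data.size();
    // Use at least one thread and never more threads than elements,
    // so that every chunk is non-empty.
    const std::size_t nt =
        std::max<std::size_t>(1, std::min<std::size_t>(nthreads, n));

    std::vector<std::pair<T, T>> partial(nt);
    std::vector<std::thread> workers;
    workers.reserve(nt);
    const T* base = data.data();

    for (std::size_t t = 0; t < nt; ++t) {
        // Contiguous chunk [begin, end) handled by thread t.
        const std::size_t begin = n * t / nt;
        const std::size_t end = n * (t + 1) / nt;
        workers.emplace_back([base, &partial, t, begin, end] {
            auto mm = std::minmax_element(base + begin, base + end);
            partial[t] = {*mm.first, *mm.second};
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    // Serial reduction over the per-thread results.
    std::pair<T, T> result = partial[0];
    for (std::size_t t = 1; t < nt; ++t) {
        result.first = std::min(result.first, partial[t].first);
        result.second = std::max(result.second, partial[t].second);
    }
    return result;
}
```

A call such as `threaded_minmax(buffer, 8)` corresponds conceptually to the Threads=8 setting in the BP4 runs. Each thread writes only to its own slot in `partial`, so no locking is needed. On some toolchains the program must be linked with `-pthread`.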

Describe alternatives you've considered and potential required effort
We currently deactivate statistics by default in openPMD and users need to opt-in to using them.

What is the potential impact of this feature in the community?
PIConGPU users can activate statistics for ADIOS2 without sacrificing too much performance.

Additional context
PIConGPU uses the Span API for Put() operations.
The test case used a KelvinHelmholtz simulation with launch parameters picongpu -d 1 1 1 -g 192 512 192 -s 100 --openPMD.ext bp4 --openPMD.period 100:100 --openPMD.json '{"adios2":{"engine":{"parameters":{"InitialBufferSize": "35Gb", "StatsLevel": 0, "Threads": 8}}}}'

Detailed performance statistics:

{
  "bp4_no_stats": [
    {
      "rank": 0,
      "start": "Fri_Apr_04_16:48:52_2025",
      "threads": 8,
      "bytes": 36223373054,
      "mkdir_mus": 74,
      "aggregation_mus": 0,
      "meta_sort_merge_mus": 226,
      "minmax_mus": 0,
      "memcpy_mus": 0,
      "buffering_mus": 14643165,
      "transport_0": {
        "type": "File_POSIX",
        "close_mus": 5,
        "write_mus": 22837467,
        "open_mus": 69
      },
      "transport_1": {
        "type": "File_POSIX",
        "close_mus": 0,
        "write_mus": 46,
        "open_mus": 111
      }
    }
  ],
  "bp4_with_stats_serial": [
    {
      "rank": 0,
      "start": "Fri_Apr_04_16:40:50_2025",
      "threads": 1,
      "bytes": 36223379678,
      "mkdir_mus": 76,
      "aggregation_mus": 0,
      "meta_sort_merge_mus": 217,
      "minmax_mus": 15412132,
      "memcpy_mus": 0,
      "buffering_mus": 14641027,
      "transport_0": {
        "type": "File_POSIX",
        "close_mus": 5,
        "write_mus": 23368561,
        "open_mus": 70
      },
      "transport_1": {
        "type": "File_POSIX",
        "close_mus": 0,
        "write_mus": 28,
        "open_mus": 104
      }
    }
  ],
  "bp4_with_stats_8_threads": [
    {
      "rank": 0,
      "start": "Fri_Apr_04_16:45:10_2025",
      "threads": 8,
      "bytes": 36223381318,
      "mkdir_mus": 78,
      "aggregation_mus": 0,
      "meta_sort_merge_mus": 242,
      "minmax_mus": 3244702,
      "memcpy_mus": 0,
      "buffering_mus": 14659454,
      "transport_0": {
        "type": "File_POSIX",
        "close_mus": 5,
        "write_mus": 23244426,
        "open_mus": 69
      },
      "transport_1": {
        "type": "File_POSIX",
        "close_mus": 0,
        "write_mus": 29,
        "open_mus": 97
      }
    }
  ],
  "bp5_no_stats": [
    {
      "rank": 0,
      "start": "Fri_Apr_04_16:19:45_2025",
      "PDW_mus": 22933110,
      "PDW": {
        "mus": 22933110,
        "nCalls": 14
      },
      "ES_mus": 395,
      "ES": {
        "mus": 395,
        "nCalls": 1
      },
      "PP_mus": 1,
      "PP": {
        "mus": 1,
        "nCalls": 3
      },
      "ES_meta1_mus": 18,
      "ES_meta1": {
        "mus": 18,
        "nCalls": 1
      },
      "ES_meta2_mus": 75,
      "ES_meta2": {
        "mus": 75,
        "nCalls": 1
      },
      "ES_close_mus": 243,
      "ES_close": {
        "mus": 243,
        "nCalls": 1
      },
      "ES_AWD_mus": 56,
      "ES_AWD": {
        "mus": 56,
        "nCalls": 1
      },
      "databytes": 0,
      "metadatabytes": 0,
      "metametadatabytes": 0,
      "transport_0": {
        "type": "File_POSIX",
        "wbytes": 36223352672,
        "close": {
          "mus": 4,
          "nCalls": 1
        },
        "write": {
          "mus": 21495245,
          "nCalls": 39
        },
        "open": {
          "mus": 82,
          "nCalls": 1
        }
      },
      "transport_1": {
        "type": "File_POSIX",
        "wbytes": 24736,
        "close": {
          "mus": 1,
          "nCalls": 1
        },
        "write": {
          "mus": 24,
          "nCalls": 5
        },
        "open": {
          "mus": 90,
          "nCalls": 1
        }
      }
    }
  ],
  "bp5_with_stats": [
    {
      "rank": 0,
      "start": "Fri_Apr_04_16:24:24_2025",
      "PDW_mus": 38394962,
      "PDW": {
        "mus": 38394962,
        "nCalls": 14
      },
      "ES_mus": 402,
      "ES": {
        "mus": 402,
        "nCalls": 1
      },
      "PP_mus": 0,
      "PP": {
        "mus": 0,
        "nCalls": 3
      },
      "ES_meta1_mus": 20,
      "ES_meta1": {
        "mus": 20,
        "nCalls": 1
      },
      "ES_meta2_mus": 77,
      "ES_meta2": {
        "mus": 77,
        "nCalls": 1
      },
      "ES_close_mus": 246,
      "ES_close": {
        "mus": 246,
        "nCalls": 1
      },
      "ES_AWD_mus": 56,
      "ES_AWD": {
        "mus": 56,
        "nCalls": 1
      },
      "databytes": 0,
      "metadatabytes": 0,
      "metametadatabytes": 0,
      "transport_0": {
        "type": "File_POSIX",
        "wbytes": 36223352672,
        "close": {
          "mus": 5,
          "nCalls": 1
        },
        "write": {
          "mus": 20963340,
          "nCalls": 39
        },
        "open": {
          "mus": 63,
          "nCalls": 1
        }
      },
      "transport_1": {
        "type": "File_POSIX",
        "wbytes": 25768,
        "close": {
          "mus": 1,
          "nCalls": 1
        },
        "write": {
          "mus": 25,
          "nCalls": 5
        },
        "open": {
          "mus": 87,
          "nCalls": 1
        }
      }
    }
  ]
}
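
As a rough reading of the BP4 profiles above, the `minmax_mus` and `bytes` fields give an effective Min/Max throughput and the speedup from threading. The sketch below assumes that the `_mus` suffix denotes microseconds; the helper names `throughput_gb_per_s` and `speedup` are invented for this example.

```cpp
#include <cstdint>

// Throughput in GB/s (1 GB = 1e9 bytes) for `bytes` processed in `micros`
// microseconds. Assumes the "_mus" fields of the profile are microseconds.
constexpr double throughput_gb_per_s(std::uint64_t bytes, std::uint64_t micros) {
    return (static_cast<double>(bytes) / 1e9) /
           (static_cast<double>(micros) / 1e6);
}

// Speedup of the threaded run over the serial run.
constexpr double speedup(std::uint64_t serial_micros,
                         std::uint64_t parallel_micros) {
    return static_cast<double>(serial_micros) /
           static_cast<double>(parallel_micros);
}

// Values copied from "bp4_with_stats_serial" and "bp4_with_stats_8_threads".
constexpr std::uint64_t serial_bytes = 36223379678ULL;
constexpr std::uint64_t serial_minmax_mus = 15412132ULL;
constexpr std::uint64_t threaded_bytes = 36223381318ULL;
constexpr std::uint64_t threaded_minmax_mus = 3244702ULL;

constexpr double serial_gbps =
    throughput_gb_per_s(serial_bytes, serial_minmax_mus);
constexpr double threaded_gbps =
    throughput_gb_per_s(threaded_bytes, threaded_minmax_mus);
constexpr double minmax_speedup =
    speedup(serial_minmax_mus, threaded_minmax_mus);
```

Under that assumption, serial Min/Max runs at about 2.35 GB/s and the 8-thread run at about 11.2 GB/s, a speedup of roughly 4.75x. The BP5 profile has no separate Min/Max entry, but `PDW_mus` grows by about 15.5 s between `bp5_no_stats` and `bp5_with_stats`, which is close to the serial BP4 `minmax_mus` value.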
