Commit fd599cc

add ldms (via ovis-hpc) (#46)
* add ldms (via ovis-hpc)
* wrong url and remove non used prefix
* default completions should be 0 (unset)

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
1 parent 1a57bc0 commit fd599cc

10 files changed, +980 -20 lines changed

.github/workflows/main.yaml

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ jobs:
 test: [["perf-hello-world", "ghcr.io/converged-computing/metric-sysstat:latest", 60], # performance test
 ["io-host-volume", "ghcr.io/converged-computing/metric-sysstat:latest", 60], # storage test
 ["io-fio", "ghcr.io/converged-computing/metric-fio:latest", 120], # storage test
+["app-ldms", "ghcr.io/converged-computing/metric-ovis-hpc:latest", 120], # standalone app test
 ["app-amg", "ghcr.io/converged-computing/metric-amg:latest", 120], # standalone app test
 ["app-kripke", "ghcr.io/converged-computing/metric-kripke:latest", 120], # standalone app test
 ["app-pennant", "ghcr.io/converged-computing/metric-pennant:latest", 120], # standalone app test

docs/_static/data/metrics.json

Lines changed: 8 additions & 0 deletions
@@ -23,6 +23,14 @@
 "image": "ghcr.io/converged-computing/metric-lammps:latest",
 "url": "https://www.lammps.org/"
 },
+{
+"name": "app-ldms",
+"description": "provides LDMS, a low-overhead, low-latency framework for collecting, transferring, and storing metric data on a large distributed computer system.",
+"family": "performance",
+"type": "application",
+"image": "ghcr.io/converged-computing/metric-ovis-hpc:latest",
+"url": "https://github.com/ovis-hpc/ovis"
+},
 {
 "name": "app-pennant",
 "description": "Unstructured mesh hydrodynamics for advanced architectures ",

docs/getting_started/metrics.md

Lines changed: 28 additions & 19 deletions
@@ -11,7 +11,7 @@ Each of the above is a metric design, which is primarily represented in the Metr
 there are different families of metrics (e.g., storage, network, performance, simulation) shown in the table below as the "Family" column.
 We likely will tweak and improve upon these categories.
 
-<iframe src="../_static/data/table.html" style="width:100%; height:850px;" frameBorder="0"></iframe>
+<iframe src="../_static/data/table.html" style="width:100%; height:900px;" frameBorder="0"></iframe>
 
 
 ## Implemented Metrics
@@ -21,7 +21,7 @@ family once we decide on a more final set.
 
 ### Performance
 
-These metrics are intended to assess application performance.
+These metrics are intended to assess application performance, where they run alongside an application of interest.
 
 #### perf-sysstat
 
@@ -32,7 +32,7 @@ These metrics are intended to assess application performance.
 This metric provides the "pidstat" executable of the sysstat library. The following options are available:
 
 
-|Name | Description | Type | Default |
+| Name | Description | Type | Default |
 |-----|-------------|------------|------|
 | color | Set to turn on color parsing | Anything set | unset |
 | pids | For debugging, show consistent output of ps aux | Anything set | unset |
@@ -82,7 +82,7 @@ Options you can set include:
 |Name | Description | Type | Default |
 |-----|-------------|------------|------|
 |testname | Name for the test | string | test |
-| blocksize | Size of block to write. It dfaults to 4k, but can be set from 256 to 8k. | string | 4k |
+| blocksize | Size of block to write. It defaults to 4k, but can be set from 256 to 8k. | string | 4k |
 | iodepth | Number of I/O units to keep in flight against the file. | int | 64 |
 | size | Total size of file to write | string | 4G |
 | directory | Directory (usually mounted) to test. | string | /tmp |
@@ -105,9 +105,11 @@ This is the "iostat" executable of the sysstat library.
 
 This is good for mounted storage that can be seen by the operating system, but may not work for something like NFS.
 
-
 ### Standalone
 
+Standalone metrics can take on many designs, from a launcher/worker design to test networking, to running
+a metric across nodes to assess the node performance.
+
 #### network-netmark
 
 - [Standalone Metric Set](user-guide.md#application-metric-set)
@@ -505,24 +507,31 @@ ex3_colored-indexset_solution ex6_stencil-offset-layout_solution ex9_matrix-tr
 (meaning on the PATH in `/opt/Kripke/build/bin` in the container).
 For apps / metrics to be added, please see [this issue](https://github.com/converged-computing/metrics-operator/issues/30).
 
-## Containers
+#### app-ldms
 
-The following tools are folded into the metrics above. Often, one tool can be built into one container and used across multiple metrics.
+- [Standalone Metric Set](user-guide.md#application-metric-set)
+- *[app-ldms](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/app-ldms)*
 
-### Sysstat
 
-- [ghcr.io/converged-computing/metric-sysstat](https://github.com/converged-computing/metrics-operator/pkgs/container/metric-sysstat)
+LDMS is "a low-overhead, low-latency framework for collecting, transferring, and storing metric data on a large distributed computer system"
+and is packaged alongside [ovis-hpc](https://github.com/ovis-hpc/ovis). While there are complex aggregator setups we could run,
+for this simple metric we simply run (on each separate pod/node). The following variables are supported:
 
-Sysstat is stored as a general metrics analyzer, as it provides several different metric types; It generally provides utils to monitor system performance and usage, including:
+|Name | Description | Type | Default |
+|-----|-------------|------|------|
+| command | The command to issue to ldms_ls (or that) |string | (see below) |
+| workdir | The working directory for the command | string | /opt |
+| completions | Number of times to run metric | int32 | unset (runs for lifetime of application or indefinitely) |
+| rate | Seconds to pause between measurements | int32 | 10 |
+
+
+The following is the default command:
 
-- *iostat* reports CPU statistics and input/output statistics for block devices and partitions.
-- *mpstat* reports individual or combined processor related statistics.
-- *pidstat* reports statistics for Linux tasks (processes) : I/O, CPU, memory, etc.
-- *tapestat* reports statistics for tape drives connected to the system.
-- *cifsiostat* reports CIFS statistics.
+```bash
+ldms_ls -h localhost -x sock -p 10444 -l -v
+```
 
-## LLNL Storage / Filesystems
+## Containers
 
-- NFS
-- Vast
-- Lustre
+To see all associated app containers, look at the [converged-computing/metrics-container](https://github.com/converged-computing/metrics-containers)
+repository (with `Dockerfile`s and automation) and associated packages.
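
Review note: the `completions` and `rate` options documented for app-ldms amount to a simple polling loop around the command. The following is a minimal sketch of that behavior, not the operator's actual entrypoint: `run_metric` and its argument order are hypothetical names introduced here for illustration, and a `completions` value of 0 is assumed to mean "unset", per the commit message. It only covers the "indefinitely" case of an unset value, not tying the loop to an application's lifetime.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run_metric is an illustrative helper, not part of
# the metrics-operator. It repeats a command, pausing between runs.
# Usage: run_metric "<command>" [completions] [rate]
run_metric() {
  local command="$1"
  local completions="${2:-0}"  # assumed: 0 means unset, so loop indefinitely
  local rate="${3:-10}"        # seconds to pause between measurements
  local count=0

  while true; do
    # Word splitting is intentional so the command string expands to arguments.
    $command
    count=$((count + 1))
    # Stop once the requested number of completions has been reached.
    if [ "$completions" -gt 0 ] && [ "$count" -ge "$completions" ]; then
      break
    fi
    sleep "$rate"
  done
}

# Example (assumes ldms_ls is installed and a daemon is listening on port 10444):
# run_metric "ldms_ls -h localhost -x sock -p 10444 -l -v" 3 10
```

With `completions` left at 0 the loop never exits on its own, so in this sketch the container would need to be stopped externally.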
