
Commit 0c362a8

Added comparing measurements and statistics with expert mode; Added container power estimations
1 parent 4d98518 commit 0c362a8

File tree: 6 files changed (+128 / -1 lines)

content/en/docs/measuring/comparing-measurements.md

Lines changed: 63 additions & 1 deletion
@@ -24,13 +24,22 @@ Comparison is currently possible for measurements that are:
  - Determine the impact of new features or refactors
* Different repositories
  - Which 3rd party dependency meets your requirements better and at what cost?
* Different Usage Scenarios
  - How resource intense are certain processes of your software?
* Different Usage Scenario Variables
  - A modular approach to comparison; effectively a mix of all of the above is possible through this
* Different machines
  - Understand how the software behaves on different hardware configurations

The tool will let you know if you try to compare measurements that can't be compared.

To trigger a comparison in the frontend, just tick the boxes of the runs you wish to compare and click the *Compare Runs* button.

<img class="ui centered rounded bordered" src="/img/measuring/triggering_compare_mode.webp" alt="Triggering Compare mode">

Example of a comparison display

<img class="ui centered rounded bordered" src="/img/overview/comparison.webp" alt="Comparison">

When comparing measurements, you will see the standard deviation on the key metrics
@@ -42,3 +51,56 @@ Graphs will also include the confidence interval.
<img class="ui centered rounded bordered" src="/img/overview/compare_charts.webp" alt="Graphs with confidence interval when comparing measurements">
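
How exactly the deviation and interval are derived is not spelled out on this page, but conceptually these are the usual sample statistics. A minimal sketch with SciPy, assuming a 95% confidence level and made-up energy values:

```python
# Hedged sketch: sample standard deviation and a 95% confidence interval
# for one repeated metric. The values and the 95% level are illustrative
# assumptions, not GMT's exact computation.
import numpy as np
from scipy import stats

energy_j = np.array([101.4, 99.8, 102.1, 100.7, 101.0])  # e.g. machine energy in Joules

mean = energy_j.mean()
std_dev = energy_j.std(ddof=1)  # sample standard deviation shown on the key metrics
ci_low, ci_high = stats.t.interval(
    0.95, len(energy_j) - 1, loc=mean, scale=stats.sem(energy_j)
)

print(f"mean = {mean:.2f} J, std dev = {std_dev:.2f} J, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```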
Comparing measurements should help raise awareness of software energy use over time.
## Expert compare mode

In some instances the GMT will not allow certain comparisons, for instance when you compare different machines and also
different repositories.
A comparison like this *sounds unreasonable* to the GMT, as machine and repository should not be changed simultaneously.

But still there might be instances where you want to force a certain comparison type, for instance when the repository
is basically the same as the old one and has merely been renamed. GMT currently does not understand repository renaming.

In this or similar cases you can override the default *comparison mode auto detection* and use the *Expert Mode*.
Navigate to *Settings* and toggle *Expert compare mode*.

A new box will appear when comparing runs, where you can force a mode. For instance, treating runs from different
repositories and branches with different commits, which however have run on the same machine, as a *Machine* comparison
will effectively compare them as being just repeated runs on the same machine.

<img class="ui centered rounded bordered" src="/img/measuring/expert_compare_mode.webp" alt="Expert compare mode">

Your runs must in any case have one common denominator that has at most two values. For instance (see the sketch below the examples):
- Different repositories and branches but run on one machine
- Many different machines and branches, but only two different repositories
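
A minimal sketch of this rule (the field names and run structure are illustrative, not GMT's API schema):

```python
# Hedged sketch: across the selected runs, at least one dimension must
# have no more than two distinct values for a forced comparison to work.
runs = [
    {"repo": "repo-a", "branch": "main", "machine": "m1"},
    {"repo": "repo-a", "branch": "dev",  "machine": "m2"},
    {"repo": "repo-b", "branch": "fix",  "machine": "m3"},
]

dimensions = ["repo", "branch", "machine"]
# "repo" has only two distinct values here, so a forced comparison is possible
comparable = any(len({run[d] for run in runs}) <= 2 for d in dimensions)
print("Forced comparison possible:", comparable)
```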
## Statistical significance
### Comparing runs with differentiating features
When running a comparison between different commits, different machines etc. the GMT will
also compute a *T-test* for the two samples.

It will calculate the *T-test* for the means of two independent samples of scores without assuming equal variances. (Some might also know this test as *Welch's t-test* or the *Welch test*.)

If the *p-value* is lower than **0.05**, GMT will show the result as significant.
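
GMT's own implementation is not reproduced here, but such a test can be illustrated with SciPy (the two sample arrays are made-up example measurements):

```python
# Hedged sketch: Welch's t-test for two independent samples, illustrating
# the comparison described above. The numbers are invented, not GMT output.
from scipy import stats

energy_run_a = [101.4, 99.8, 102.1, 100.7, 101.0]  # e.g. Joules, variant A
energy_run_b = [104.9, 105.3, 103.8, 106.1, 104.4]  # e.g. Joules, variant B

# equal_var=False selects Welch's t-test (no equal-variance assumption)
result = stats.ttest_ind(energy_run_a, energy_run_b, equal_var=False)

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("significant" if result.pvalue < 0.05 else "not significant")
```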

GMT will provide the *p-value* directly in the API output of the comparison.
In the frontend it will be shown with a green / red indicator for the significance, green meaning significant.
Or it will tell you if a comparison could not be made in case there were too many missing values or the metric was not present in all runs.

<img class="ui centered rounded bordered" src="/img/measuring/gmt_t_test_two_samples.webp">
### Comparing repeated runs

When running a comparison of repeated runs with no differentiating criteria like different commits, repos etc. the GMT will run a *1-sample T-test*.
Effectively answering the question: "Did the last run in the set of repeated runs have a significant variation compared to the ones before?"

This question is very typical: you have measured a set of a couple of runs, then you come back to your code and re-measure. The value is now different and you want to tell if it is *significantly different*.

If the *p-value* is lower than **0.05**, GMT will show the result as significant.
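
A minimal sketch of such a test with SciPy; treating the earlier runs as the sample and the newest value as the hypothesised mean is an assumption made for illustration, not necessarily GMT's exact orientation:

```python
# Hedged sketch: one-sample t-test checking whether the earlier repeated
# runs are consistent with the newest measurement. Values are invented.
from scipy import stats

previous_runs = [100.2, 101.1, 99.7, 100.5, 100.9]  # e.g. Joules
latest_run = 104.3

result = stats.ttest_1samp(previous_runs, popmean=latest_run)

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("significant deviation" if result.pvalue < 0.05 else "no significant deviation")
```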

GMT will provide the *p-value* directly in the API output of the comparison.
In the frontend it will be shown with a green / red indicator for the significance, green meaning significant.
Or it will tell you if a comparison could not be made in case there were too many missing values or the metric was not present in all runs.
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
---
title: "Estimating Containers"
description: ""
lead: ""
date: 2025-05-25T01:49:15+00:00
weight: 842
toc: true
---

GMT is container native when it comes to orchestrating the application and capturing performance metrics.
But typically the energy reported by the [Metric Providers]({{< relref "/docs/measuring/metric-providers/" >}}) is captured at the system level.

If you want to drill down the energy to a per-container level, GMT offers to create an estimation based on the CPU utilization of the system.

### Setting up container estimations

Prerequisites:
- You must have a PSU [Metric Provider]({{< relref "/docs/measuring/metric-providers/" >}}) activated
- You must have a System level CPU Utilization [Metric Provider]({{< relref "/docs/measuring/metric-providers/" >}}) activated
- You must have a Container level CPU Utilization [Metric Provider]({{< relref "/docs/measuring/metric-providers/" >}}) activated

GMT will then:

- Take the baseline energy value of the machine
- Take the runtime energy value of the machine
- Calculate the difference
- Split the resulting difference proportionally to each individual container's CPU% in relation to the other containers' CPU% share

Example:

<img class="ui centered rounded bordered" src="/img/measuring/container_power_attribution.webp" alt="Container Power Attribution">

The difference between *Container Power* and *Container Power (+Baseline Share)* is that *Container Power* is only the overhead / additional power on top of the power that the system was already drawing during the baseline, when no containers were launched.

*Container Power (+Baseline Share)* also includes the attributed share of the baseline load. It is likewise split according to the container's CPU% in relation to the machine's CPU% share.
This means that a container that has 20% CPU utilization in comparison to the other containers will also get 20% of the baseline power draw attributed.
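
The following sketch only illustrates the attribution arithmetic described above with invented numbers; it is not GMT's actual code:

```python
# Hedged sketch of the container power attribution with illustrative values.
baseline_power_w = 20.0  # machine power during baseline (no containers running)
runtime_power_w = 50.0   # machine power during the measured run
cpu_share = {"db": 0.2, "app": 0.5, "worker": 0.3}  # each container's CPU% share

dynamic_power_w = runtime_power_w - baseline_power_w  # difference attributed to containers

for name, share in cpu_share.items():
    container_power = dynamic_power_w * share                   # "Container Power"
    with_baseline = container_power + baseline_power_w * share  # "Container Power (+Baseline Share)"
    print(f"{name}: {container_power:.1f} W / {with_baseline:.1f} W incl. baseline share")
```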

#### Requirements for reproducibility

This method only works if the baseline is long enough (the cluster ensures this with a 60-second timeframe) and the CPU is set to a fixed frequency without Hyper-Threading and Turbo Boost. Otherwise you will get incorrect allocations, as, for example, "30% CPU utilization" no longer maps to a clear amount of work when the available cycle budget is not fixed.

However, it is still a good approximation.

#### Limitations

The value is still somewhat shaky: although utilization is more stable with a controlled cluster setup, it is not as precise a proxy as, for example, CPU instructions (which would require PMU sampling), and non-CPU energy is only considered indirectly.

So if you execute unusual CPU instructions such as AVX, if you experience CPU steal time, or if you have a hard drive that executes asynchronous workloads like TRIM independently of CPU instructions, this will distort the energy evaluation.

#### Addon: Detailed CPU energy drill-down

We currently have a beta feature, to be launched in Summer 2025, that utilizes AMD RAPL per-core energy registers and core pinning to report individual container CPU energy metrics.

The GMT will utilize `taskset` to pin each container to a distinct core. Since no other processes are running on your benchmarking systems in the [Green Metrics Tool Cluster Hosted Service →]({{< relref "/docs/measuring/measuring-service/" >}}), the values are very reliable.

The private beta opens in Summer 2025. If you are interested, shoot us an email at [info@green-coding.io](mailto:info@green-coding.io)

- The energy of the browser is measured to display and render the page
- The network transfer energy that was needed to download the HTML and page assets is measured

To isolate this as best as possible, GMT orchestrates a reverse proxy, warms up the cache by pre-loading the full page once, and only then does the final measurement.
4 binary files changed (not shown); sizes shown: 5.58 KB and 31.8 KB.
