Your runs must in any case share one common denominator, which may have at most two values. For instance:
- Different repositories and branches but run on one machine
- Many different machines and branches, but only two different repositories
## Statistical significance
### Comparing runs with differentiating features
When running a comparison between different commits, different machines etc., GMT will also compute a *T-test* for the two samples.

It calculates the *T-test* for the means of two independent samples without assuming equal variances. (Some might know this test as *Welch's t-test* or the *Welch test*.)
If the *p-value* is lower than **0.05**, GMT will show the result as significant.
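As a minimal sketch of what this computation looks like, here is a hypothetical example using SciPy. The sample values and variable names are made up; this is not GMT's actual implementation:

```python
from scipy import stats

# Hypothetical energy samples (in Joules) from two sets of runs,
# e.g. commit A vs. commit B
runs_a = [108.4, 110.1, 109.7, 111.0, 108.9]
runs_b = [112.3, 113.0, 111.8, 112.9, 113.4]

# Welch's t-test: two independent samples, equal variances NOT assumed
t_stat, p_value = stats.ttest_ind(runs_a, runs_b, equal_var=False)

print(f"p-value: {p_value:.4f}")
print("significant" if p_value < 0.05 else "not significant")
```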
GMT will provide the *p-value* directly in the API output of the comparison.
In the frontend it is shown with a green / red significance indicator, green meaning significant.
Alternatively it will tell you that a comparison could not be made, in case there were too many missing values or the metric was not present in all runs.
### Comparing repeated runs without differentiating features

When running a comparison of repeated runs with no differentiating criteria like different commits, repos etc., GMT will run a *1-sample T-test*.
It effectively answers the question: "Did the last run in the set of repeated runs vary significantly from the ones before?"
This question is very typical: you measure a set of runs once, then come back to your code later and re-measure. The value is now different and you want to tell whether it is *significantly different*.
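A hypothetical sketch of such a check with SciPy (again, not GMT's actual code) tests whether the mean of the earlier runs differs from the newly measured value:

```python
from scipy import stats

# Hypothetical energy values (in Joules): earlier repeated runs
# and a new re-measurement
previous_runs = [108.4, 110.1, 109.7, 111.0, 108.9]
latest_run = 114.2

# 1-sample t-test: does the mean of the earlier runs differ
# significantly from the latest value?
t_stat, p_value = stats.ttest_1samp(previous_runs, popmean=latest_run)

print(f"p-value: {p_value:.4f}")
```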
If the *p-value* is lower than **0.05**, GMT will show the result as significant.

GMT will provide the *p-value* directly in the API output of the comparison.
In the frontend it is shown with a green / red significance indicator, green meaning significant.
Alternatively it will tell you that a comparison could not be made, in case there were too many missing values or the metric was not present in all runs.
## Per-container energy estimation

GMT is container native when it comes to orchestrating the application and capturing performance metrics.
But typically the energy values from the [Metric Providers]({{< relref "/docs/measuring/metric-providers/" >}}) are captured at the system level.
If you want to drill down into the energy on a per-container level, GMT offers to create an estimation based on the CPU utilization of the system.
### Setting up container estimations
Prerequisites
- You must have a PSU [Metric Provider]({{< relref "/docs/measuring/metric-providers/" >}}) activated
- You must have a System level CPU Utilization [Metric Provider]({{< relref "/docs/measuring/metric-providers/" >}}) activated
- You must have a Container level CPU Utilization [Metric Provider]({{< relref "/docs/measuring/metric-providers/" >}}) activated
GMT will then:

- Take the baseline energy value of the machine
- Take the runtime energy value of the machine
- Compute the difference
- Split the resulting difference proportionally to each individual container's CPU% in relation to the other containers' CPU% share (see the numeric sketch below)
Example:
<img class="ui centered rounded bordered" src="/img/measuring/container_power_attribution.webp" alt="Container Power Attribution">
The difference between *Container Power* and *Container Power (+Baseline Share)* is that *Container Power* is only the overhead: the additional power on top of what the system was already drawing during the baseline, when no containers were launched.

*Container Power (+Baseline Share)* also includes the attributional share of the baseline load. It is likewise split according to the container's CPU% in relation to the machine's CPU% share.
This means that a container that has 20% CPU utilization in comparison to the other containers will also get 20% of the baseline power draw attributed.
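To make the arithmetic concrete, here is a numeric sketch with hypothetical values (60 J baseline, 100 J runtime, two containers). It only illustrates the attribution described above and is not GMT's actual code:

```python
baseline_energy = 60.0   # J: machine energy during the idle baseline
runtime_energy = 100.0   # J: machine energy while the containers ran
diff = runtime_energy - baseline_energy  # 40 J attributable to the workload

# Each container's CPU% share relative to all containers (hypothetical)
cpu_share = {"container_a": 0.20, "container_b": 0.80}

for name, share in cpu_share.items():
    container_power = diff * share                              # "Container Power"
    with_baseline = container_power + baseline_energy * share  # "+ Baseline Share"
    print(f"{name}: {container_power:.1f} J / {with_baseline:.1f} J incl. baseline share")
```

With these numbers, `container_a` gets 8 J (20 J including its baseline share) and `container_b` gets 32 J (80 J including its baseline share); the attributed values sum back to the measured machine energy.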
#### Requirements for reproducibility
This method only works if the baseline is long enough (the cluster ensures this with a 60-second timeframe) and the CPU is set to a fixed frequency with Hyper-Threading and Turbo Boost disabled. Otherwise you will get incorrect allocations, as, for example, "30% CPU utilization" no longer has a clear meaning when the available cycle count is not fixed.
Under these controlled conditions, however, it is a good approximation.
#### Limitations
The value is still an approximation: although utilization is more stable with a controlled cluster setup, it is not as accurate as, for example, counting CPU instructions (which would require PMU sampling), and non-CPU energy is only considered indirectly.
So if you execute unusual CPU instructions such as AVX, encounter CPU steal time, or have a hard drive that executes asynchronous workloads like TRIM independently of CPU instructions, this will distort the energy attribution.
#### Addon: Detailed CPU energy drill-down
We currently have a beta feature, to be launched in Summer 2025, that utilizes AMD RAPL per-core energy registers and core-pinning to report individual container CPU energy metrics.
GMT will use `taskset` to pin the container to a distinct core. Since no other processes are running on your benchmarking systems in the [Green Metrics Tool Cluster Hosted Service →]({{< relref "/docs/measuring/measuring-service/" >}}), the values are very reliable.
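For illustration only, here is a minimal sketch of what reading such a per-core energy counter could look like on an AMD Zen CPU. The register addresses come from AMD's documentation, but this is not GMT's implementation; it requires the `msr` kernel module and root privileges:

```python
import struct

AMD_ENERGY_UNIT_MSR = 0xC0010299  # RAPL Power Unit register
AMD_CORE_ENERGY_MSR = 0xC001029A  # Core Energy Status register

def read_msr(core: int, reg: int) -> int:
    # Read a 64-bit model-specific register of the given core
    with open(f"/dev/cpu/{core}/msr", "rb") as f:
        f.seek(reg)
        return struct.unpack("<Q", f.read(8))[0]

core = 3  # the core the container was pinned to, e.g. via taskset
esu = (read_msr(core, AMD_ENERGY_UNIT_MSR) >> 8) & 0x1F  # energy status unit
joules_per_tick = 0.5 ** esu
counter = read_msr(core, AMD_CORE_ENERGY_MSR) & 0xFFFFFFFF
print(f"Core {core} energy counter: {counter * joules_per_tick:.3f} J")
```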
The private beta opens in Summer 2025. If you are interested, shoot us an email at [info@green-coding.io](mailto:info@green-coding.io).
## Measuring web pages

When GMT measures a web page, two components are captured:

- The energy of the browser needed to display and render the page
- The network transfer energy needed to download the HTML and the page assets
To isolate these as well as possible, GMT orchestrates a reverse proxy, warms up the cache by pre-loading the full page once, and only then does the final measurement.
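Conceptually the pattern is a warm-up load followed by the measured load, as in this hypothetical sketch (the URL and the orchestration hooks are assumptions, not GMT's actual code):

```python
import requests

URL = "http://localhost:8080/"  # hypothetical reverse-proxy endpoint

requests.get(URL)  # warm-up: fills the proxy cache, not measured

# ... start metric providers here ...
response = requests.get(URL)  # the actual measured page load
# ... stop metric providers and collect energy values ...

print(f"Measured load transferred {len(response.content)} bytes")
```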