Skip to content

chore: add canister message queue latency metric #1510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

maksymar
Copy link
Contributor

@maksymar maksymar commented Sep 16, 2024

This PR adds a metric to measure the queue latency of canister messages for both ingress and request messages.

For ingress messages, since the start time is not stored, the start time is estimated using the expiry time and the current expiration limit. The metric also tracks the call depth for request messages, as it is relevant for them.

@github-actions github-actions bot added the chore label Sep 16, 2024
@maksymar maksymar changed the title chore: add ingress message queue latency metric chore: add canister message queue latency metric Sep 16, 2024
@maksymar maksymar marked this pull request as ready for review September 16, 2024 13:40
@maksymar maksymar requested review from a team as code owners September 16, 2024 13:40
let message_type_label = "request";
let call_tree_depth_label = &metadata.call_tree_depth().to_string();
let latency = current_time()
.saturating_duration_since(*metadata.call_tree_start_time())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the call tree start time is the time when the message at the root of the call tree started execution. Regardless, it has to do with the call tree, not with the individual message being executed. So a message enqueued in the same block (i.e. at the same time) when it's executed could have a call tree start time that is arbitrarily old.

let now = current_time();
let expiry_duration = expiry_time_from_now().saturating_duration_since(now);
let start = ingress.expiry_time.saturating_sub(expiry_duration);
let latency = now.saturating_duration_since(start).as_secs_f64();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your PR description you are very likely aware of this, but I will still mention that the expiry time can be set by the caller to an arbitrary value, i.e. much lower than the maximum 5 minutes. So this calculation may also be way off.

(In fact, I seem to remember subnets where this is consistently significantly less than 5 minutes. There's an mr_unreliable_induct_ingress_message_duration_seconds metric that uses similar logic to estimate ingress induction latency. You can go here, unhide the first query and try selecting various subnets.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants