-
Notifications
You must be signed in to change notification settings - Fork 355
chore: add canister message queue latency metric #1510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
let message_type_label = "request"; | ||
let call_tree_depth_label = &metadata.call_tree_depth().to_string(); | ||
let latency = current_time() | ||
.saturating_duration_since(*metadata.call_tree_start_time()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the call tree start time is the time when the message at the root of the call tree started execution. Regardless, it has to do with the call tree, not with the individual message being executed. So a message enqueued in the same block (i.e. at the same time) when it's executed could have a call tree start time that is arbitrarily old.
let now = current_time(); | ||
let expiry_duration = expiry_time_from_now().saturating_duration_since(now); | ||
let start = ingress.expiry_time.saturating_sub(expiry_duration); | ||
let latency = now.saturating_duration_since(start).as_secs_f64(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your PR description you are very likely aware of this, but I will still mention that the expiry time can be set by the caller to an arbitrary value, i.e. much lower than the maximum 5 minutes. So this calculation may also be way off.
(In fact, I seem to remember subnets where this is consistently significantly less than 5 minutes. There's an mr_unreliable_induct_ingress_message_duration_seconds
metric that uses similar logic to estimate ingress induction latency. You can go here, unhide the first query and try selecting various subnets.)
This PR adds a metric to measure the queue latency of canister messages for both
ingress
andrequest
messages.For
ingress
messages, since the start time is not stored, the start time is estimated using the expiry time and the current expiration limit. The metric also tracks the call depth forrequest
messages, as it is relevant for them.