-
Notifications
You must be signed in to change notification settings - Fork 373
Revise MSI v2 token revocation spec to focus on certificate revocation #5498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Updated the document to reflect changes from 'Short-Lived Credential (SLC)' to 'Certificate' terminology and clarified the handling of certificate revocation scenarios.
@@ -1,46 +1,46 @@ | |||
# Short-Lived Credential (SLC) Revocation Specification | |||
# Certificate Revocation Specification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the file should also be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
docs/msi_v2/slc_revocation_spec.md
Outdated
|
||
## Overview | ||
|
||
This document outlines the design and implementation details for short-lived credential (SLC) revocation in MSI V2 scenarios. | ||
This document outlines the design and implementation details for **certificate revocation** in MSI V2 scenarios. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have 1 document that discusses all credential revocation ? (cert and token)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
docs/msi_v2/slc_revocation_spec.md
Outdated
MSAL ->> IMDS: 2. Request Short-Lived Credential (SLC) | ||
IMDS -->> MSAL: 3. Return SLC | ||
MSAL ->> eSTS: 4. Exchange SLC for Access Token | ||
MSAL ->> IMDS: 2. Request Credential (certificate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safe to use "certificate" everywhere. I don't see a use case of another type of credential (jwt).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
docs/msi_v2/slc_revocation_spec.md
Outdated
- MSAL will **relay the response to IMDS as-is**, ensuring support for future suberror codes without requiring modifications. | ||
1. Call the certificate minting endpoint **`/issuecredential?bypass_cache=true`** to force a **new certificate** (ignore any cached/invalid state). | ||
2. Replace the current certificate with the newly issued one. | ||
3. **Retry** the token request with eSTS using the new certificate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please specify retry policy. IMO, retry once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Robbie-Microsoft keeping this as STS retry
docs/msi_v2/slc_revocation_spec.md
Outdated
| **Telemetry Validation** | Ensure `MsalMsiCounter` correctly logs telemetry tags such as `MsiSource`, `TokenType`, `bypassCache`, and `CredentialOutcome`. | Telemetry records correct values for each token acquisition attempt, including failures. | | ||
| **Test Case** | **Description** | **Expected Outcome** | | ||
|-------------------------------------------------|---------------------------------------------------------------------------------------------------------|----------------------| | ||
| **AADSTS1000610–14 Auto-Remediation** | eSTS returns 401 `invalid_client` with any of 1000610–1000614. | MSAL calls `/issuecredential?bypass_cache=true`, obtains a new certificate, retries; success or deterministic failure surfaced. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would split this into 2 - success after retry / failure after retry to make it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with comments
7. **Platform** | ||
The runtime/OS environment. | ||
- Examples: `"net6.0-linux"`, `"net472-windows"` | ||
Each time we record `MsalMsiCounter`, include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point in time, I do not feel that the client telemetry is significant. We can add a feature request for this, but I would not prioritize it today.
Is there any server telemetry that we want to capture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there is server-side telemetry that we send and also we know mTLS was used from ESTS
docs/msi_v2/slc_revocation_spec.md
Outdated
|
||
1. **MsiSource** — `"AppService"`, `"CloudShell"`, `"AzureArc"`, `"ImdsV1"`, `"ImdsV2"`, `"ServiceFabric"` | ||
2. **TokenType** — `"Bearer"`, `"POP"`, `"mtls_pop"` | ||
3. **bypassCache** — `"true"` / `"false"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this be captured in the server telemetry? In other words, given a correlation ID, can we identify the call made to the credential endpoint and to the token endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
docs/msi_v2/slc_revocation_spec.md
Outdated
3. **bypassCache** — `"true"` / `"false"` | ||
4. **CertType** — `"Platform"`, `"inMemory"`, `"UserProvided"` | ||
5. **CredentialOutcome** — `Not found` / `Retry Failed` / `Retry Succeeded` / `Success` | ||
6. **MsalVersion** — e.g., `"4.61.0"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version and platform are included by default in the counter, I think,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about telemetry. Maybe start with having those values logged. Let's focus more on server telemetry.
docs/msi_v2/msiv2_revocation_spec.md
Outdated
participant eSTS | ||
|
||
Application ->> MSAL: 1. Request Access Token | ||
MSAL ->> IMDS: 2. Request Certificate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CSR metadata request is part of the flow. I think you should add it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
docs/msi_v2/msiv2_revocation_spec.md
Outdated
participant eSTS | ||
|
||
Application ->> MSAL: 1. Request Access Token | ||
MSAL ->> IMDS: 2. Request Certificate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention /issuecredential
like you do below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
MSAL ->> eSTS: 4. Exchange Certificate for Access Token | ||
eSTS -->> MSAL: 5. Response (HTTP 200 / error) | ||
|
||
alt Token Revoked / Attestation invalid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both alts are the same except for 5a and 5b. Do you need to repeat the entire portion of the graph after those sections? Can you just group 5a and 5b together on the graph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are two different error conditions
participant Application | ||
participant MSAL | ||
participant IMDS | ||
participant eSTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a "IMDS/eSTS" section? Why not point towards the already existing "IMDS" and "eSTS" sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one is for cert revocation, that MSAL handles internally. The other is for token revocation.
// Certificate/attestation validation failures | ||
if (aadsts is 1000610 or 1000611 or 1000612 or 1000613 or 1000614 || aadsts == null) | ||
{ | ||
// Force a NEW certificate and retry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to force a second cert refresh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you already do it above if claims are provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are two different use cases.
| **AADSTS1000610–14 Auto-Remediation (Retry Fails)** | Same initial condition as above. New cert minted, retry still fails deterministically (e.g., repeated same code). | Failure surfaced after retry. (`CredentialOutcome=Retry Failed`) | | ||
| **Unspecified Credential Issue** | eSTS returns `invalid_client` without codes. MSAL forces new certificate and retries. | Token succeeds or failure surfaced (assert correct `CredentialOutcome`). | | ||
| **Claims Challenge Path** | Resource 401 with claims; app supplies claims; MSAL re-mints cert (`bypass_cache=true`) and retries with claims. | New token with claims. (`CredentialOutcome=Success`) | | ||
| **IMDS/IssueCredential Failure Path** | `/issuecredential` call fails (network / service / malformed). | Failure returned; no infinite retry. (`CredentialOutcome=Retry Failed` if after a retry attempt) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify which retry policy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I follow this comment
Updated revocation scenarios and clarified certificate request process.
Updated the document to reflect changes from 'Short-Lived Credential (SLC)' to 'Certificate' terminology and clarified the handling of certificate revocation scenarios.
Fixes - Spec update
Changes proposed in this request
This pull request updates the documentation for MSI V2 credential revocation, clarifying and expanding the specification to focus on certificate revocation (rather than short-lived credentials) and aligning terminology, flows, error handling, and acceptance tests with current implementation and Azure AD error codes. The changes provide detailed guidance on how MSAL should handle certificate revocation, claims challenges, and telemetry.
Key documentation improvements:
Terminology and Flow Updates:
Error Handling and Remediation:
Claims Challenge Handling:
Acceptance Tests and Telemetry:
MsalMsiCounter
to reflect the new tags and expected values for improved diagnostics.Testing
n/a
Performance impact
n/a
Documentation
n/a