gen - add support for mixed precision operators #1853

Status: Open. zatkins-dev wants to merge 5 commits into main from zach/mixed-precision.
Conversation

zatkins-dev (Collaborator)

Adds support for mixed precision operators, set at creation time, for the CUDA gen backend. This is made possible by defining a second scalar type CeedScalarCPU, which is always double precision, and changing the CeedScalar type during the JiT pass if the CEED_JIT_MIXED_PRECISION define is set.

The inputs to the device kernels are always CeedScalarCPU arrays, to avoid having to muck around with multiple pointers and such in a CeedVector. In gen, we only do things to the input and output arrays at the beginning and end of the kernel, so all of the computation will be happening with the single precision CeedScalar arrays we copy values into. This approach minimizes the code differences between mixed and full precision runs, essentially just requiring the helper functions to have extra template parameters to ensure the input types are correct.

The support for mixed precision operators is at the backend level, while the actual usage of mixed precision operations is defined per-operator to provide maximal flexibility.

This can be extended to the CUDA ref backend too, though the benefits there will likely be smaller.

@jeremylt and @nbeams, does this seem like a reasonable approach?

zatkins-dev force-pushed the zach/mixed-precision branch 2 times, most recently from 84d2396 to c08437d on July 11, 2025 at 15:35
zatkins-dev (Collaborator, Author)

Should we change the way flops are reported (say, cut them in half) for mixed precision operators?

zatkins-dev self-assigned this on Jul 11, 2025
jrwrigh (Collaborator) commented on Jul 11, 2025

> Should we change the way flops are reported (say, cut them in half) for mixed precision operators?

I don't think so; a FLOP is a FLOP. There can be some other mechanism for adjusting that if the end-user wants to, but by default I don't think the count should be changed.


@ref User
**/
int CeedOperatorSetMixedPrecision(CeedOperator op) {
Collaborator

This should take a bool if we want to go with this interface.

Collaborator

I think a better interface would specify the precision that the operator will use internally via an enum of some sort. This would make it more future-proof in case we want to experiment with other precisions. Something like CEED_PRECISION_SINGLE, CEED_PRECISION_DOUBLE, etc. (if such an enum doesn't already exist).

Collaborator (Author)

Yeah, I agree. I'll think about the best way to support it; I can't imagine a use case for using a higher precision for the operator than for the entire solve, and it needs to be a compile-time constant for the JiT to work.

Collaborator (Author)

Okay, I think changing that makes the interface much clearer. Also, I can think of a use case: numerically unstable implementations of QFunctions that people aren't interested in fixing.

Collaborator (Author)

Just updated, let me know what you think.

@@ -1303,7 +1303,7 @@ int CeedSymmetricSchurDecomposition(Ceed ceed, CeedScalar *mat, CeedScalar *lamb

// Reduce sub and super diagonal
CeedInt p = 0, q = 0, itr = 0, max_itr = n * n * n * n;
- CeedScalar tol = CEED_EPSILON;
+ CeedScalar tol = 10 * CEED_EPSILON;
Collaborator

Stray temp fix?

Collaborator (Author)

No, that's a real fix. The iteration isn't able to reach machine precision now that CEED_EPSILON is machine precision.

Member

Probably worth putting into a separate PR if it applies to main?

Collaborator (Author)

It technically only applies here, since CEED_EPSILON is a bit bigger on main, but fair.

zatkins-dev force-pushed the zach/mixed-precision branch from 4a7a755 to 767aa77 on July 11, 2025 at 21:42
3 participants