gen - add support for mixed precision operators #1853
base: main
Conversation
84d2396 to c08437d
Should we change the way flops are reported (say, cut them in half) for mixed precision operators?
I don't think so; a FLOP is a FLOP. There can be some other mechanism for adjusting that if the end-user wants to, but by default I don't think the count should be changed.
interface/ceed-operator.c (Outdated)
  @ref User
**/
int CeedOperatorSetMixedPrecision(CeedOperator op) {
This should take a bool if we want to go with this interface.
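A minimal sketch of that shape (not the actual libCEED API; the stand-in typedef and the extra argument are illustrative only):

```cuda
// Sketch of the reviewer's suggestion, not merged libCEED code.
typedef struct CeedOperator_private *CeedOperator;  // stand-in for the real opaque handle

// Suggested shape: an explicit flag rather than an enable-only setter
int CeedOperatorSetMixedPrecision(CeedOperator op, bool use_mixed_precision);
```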
I think a better interface would specify the precision that the operator will use internally via an enum of some sort. This will make it more future-proof in case we want to experiment with other precisions. Something like CEED_PRECISION_SINGLE, CEED_PRECISION_DOUBLE, etc. (if such an enum doesn't already exist).
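A sketch of what that enum-based interface might look like. The constant names follow the comment above; the type name CeedPrecisionType and the function name CeedOperatorSetPrecision are hypothetical:

```cuda
// Hypothetical interface sketch, not existing libCEED API.
typedef enum {
  CEED_PRECISION_SINGLE = 0,  // compute internally in single precision
  CEED_PRECISION_DOUBLE = 1,  // compute internally in double precision
} CeedPrecisionType;  // name is an assumption

typedef struct CeedOperator_private *CeedOperator;  // stand-in for the real opaque handle

// Replaces the bool setter above and leaves room for other precisions later
int CeedOperatorSetPrecision(CeedOperator op, CeedPrecisionType precision);
```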
Yeah, I agree. I'll think about the best way to support it; I can't imagine a use case for using a higher precision for the operator than for the entire solve, and it needs to be able to be a compile-time constant for the JiT to work.
Okay, I think changing that makes the interface much clearer. Also, I can think of a use case: numerically unstable implementations of QFunctions that people aren't interested in fixing.
Just updated, lmk what you think
@@ -1303,7 +1303,7 @@ int CeedSymmetricSchurDecomposition(Ceed ceed, CeedScalar *mat, CeedScalar *lamb
   // Reduce sub and super diagonal
   CeedInt p = 0, q = 0, itr = 0, max_itr = n * n * n * n;
-  CeedScalar tol = CEED_EPSILON;
+  CeedScalar tol = 10 * CEED_EPSILON;
Stray temp fix?
No, that's a real fix. It isn't able to get to machine precision now that CEED_EPSILON is machine precision.
probably worth putting into a separate PR if it applies to main?
It technically only applies here, since CEED_EPSILON is a bit bigger on main, but fair.
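For intuition on the tolerance change above, here is a toy model (plain host code, not libCEED's actual sweep; the decay factor and noise term are made up) of why a stopping criterion of exactly machine epsilon may never be met while 10 * CEED_EPSILON is:

```cuda
#include <cstdio>
#include <cfloat>

#define CEED_EPSILON DBL_EPSILON  // assumption: CEED_EPSILON is double machine epsilon here

int main() {
  // Off-diagonal "norm" of a matrix scaled to O(1); the update below is a stand-in
  // for one sweep of the reduction in CeedSymmetricSchurDecomposition.
  double       off_diag = 1.0;
  const double tol      = 10 * CEED_EPSILON;  // relaxed tolerance from the diff
  int          itr = 0;
  const int    max_itr = 1000;

  while (off_diag > tol && itr < max_itr) {
    // Geometric decay plus rounding noise at the scale of the diagonal: the noise
    // leaves a floor of a few epsilon, so in this model a tolerance of exactly
    // CEED_EPSILON is never reached and the loop would run to max_itr.
    off_diag = 0.25 * off_diag + 2.0 * CEED_EPSILON;
    itr++;
  }
  std::printf("stopped after %d sweeps, off_diag = %.1f * eps\n", itr, off_diag / CEED_EPSILON);
  return 0;
}
```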
4a7a755 to 767aa77
Adds support for mixed precision operators, set at creation time, for the CUDA gen backend. This is made possible by defining a second scalar type, CeedScalarCPU, which is always double precision, and changing the CeedScalar type during the JiT pass if the CEED_JIT_MIXED_PRECISION define is set.

The inputs to the device kernels are always CeedScalarCPU arrays, to avoid having to muck around with multiple pointers and such in a CeedVector. In gen, we only do things to the input and output arrays at the beginning and end of the kernel, so all of the computation happens with the single precision CeedScalar arrays we copy values into. This approach minimizes the code differences between mixed and full precision runs, essentially just requiring the helper functions to have extra template parameters to ensure the input types are correct.

The support for mixed precision operators is at the backend level, while the actual usage of mixed precision operations is defined per-operator to provide maximal flexibility.
This can be extended to the CUDA ref backend too, though the benefits will likely be more mild.
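A minimal sketch of the copy-in/copy-out idea, assuming the JiT pass supplies the CEED_JIT_MIXED_PRECISION define and the CeedScalarCPU typedef described above. The kernel and host driver are made-up stand-ins for illustration, not code from this PR:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef double CeedScalarCPU;   // always double precision, as in the description above
#ifdef CEED_JIT_MIXED_PRECISION
typedef float CeedScalar;       // internal compute precision when mixed precision is on
#else
typedef double CeedScalar;
#endif

// Toy stand-in for a generated gen kernel: the arguments stay CeedScalarCPU, and
// all intermediate work happens in CeedScalar after an explicit copy-in.
__global__ void ToyApply(const CeedScalarCPU *u, CeedScalarCPU *v, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;

  CeedScalar u_local = (CeedScalar)u[i];            // copy in (narrowing when mixed)
  CeedScalar v_local = u_local * (CeedScalar)2.0;   // stand-in for the QFunction/basis work
  v[i] = (CeedScalarCPU)v_local;                    // copy out (widening back to double)
}

int main() {
  const int      n = 8;
  CeedScalarCPU *u, *v;
  cudaMallocManaged(&u, n * sizeof(CeedScalarCPU));
  cudaMallocManaged(&v, n * sizeof(CeedScalarCPU));
  for (int i = 0; i < n; i++) u[i] = 1.0 / (i + 1);

  ToyApply<<<1, n>>>(u, v, n);
  cudaDeviceSynchronize();

  for (int i = 0; i < n; i++) std::printf("v[%d] = %.17g\n", i, v[i]);
  cudaFree(u);
  cudaFree(v);
  return 0;
}
```

Compiling with -DCEED_JIT_MIXED_PRECISION switches the intermediate arithmetic to single precision while the kernel arguments stay double, which mirrors the behavior described for the JiT'd gen kernels.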
@jeremylt and @nbeams, does this seem like a reasonable approach?