No longer need `alloca`s for consuming `Result<!, i32>` and similar #144347
Some changes occurred in compiler/rustc_codegen_ssa |
#[no_mangle]
pub fn make_unmake_result_never(x: i32) -> i32 {
    // CHECK-LABEL: define i32 @make_unmake_result_never(i32 %x)
    // CHECK: start:
    // CHECK-NEXT: ret i32 %x

    let y: Result<i32, Never> = Ok(x);
    let Ok(z) = y;
    z
}

#[no_mangle]
pub fn extract_control_flow_never(x: ControlFlow<&str, Never>) -> &str {
    // CHECK-LABEL: define { ptr, i64 } @extract_control_flow_never(ptr align 1 %x.0, i64 %x.1)
    // CHECK: start:
    // CHECK-NEXT: %[[P0:.+]] = insertvalue { ptr, i64 } poison, ptr %x.0, 0
    // CHECK-NEXT: %[[P1:.+]] = insertvalue { ptr, i64 } %[[P0]], i64 %x.1, 1
    // CHECK-NEXT: ret { ptr, i64 } %[[P1]]

    let ControlFlow::Break(s) = x;
    s
}
Compare today: https://rust.godbolt.org/z/9dezqPq6n
define i32 @make_unmake_result_never(i32 %x) unnamed_addr {
start:
%y = alloca [4 x i8], align 4
store i32 %x, ptr %y, align 4
%z = load i32, ptr %y, align 4
ret i32 %z
}
define { ptr, i64 } @extract_control_flow_never(ptr align 1 %0, i64 %1) unnamed_addr {
start:
%x = alloca [16 x i8], align 8
store ptr %0, ptr %x, align 8
%2 = getelementptr inbounds i8, ptr %x, i64 8
store i64 %1, ptr %2, align 8
%s.0 = load ptr, ptr %x, align 8
%3 = getelementptr inbounds i8, ptr %x, i64 8
%s.1 = load i64, ptr %3, align 8
%4 = insertvalue { ptr, i64 } poison, ptr %s.0, 0
%5 = insertvalue { ptr, i64 } %4, i64 %s.1, 1
ret { ptr, i64 } %5
}
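For anyone who wants to reproduce the comparison locally, here is a self-contained sketch of the two functions from the test. `Never` is assumed to be an empty enum (the excerpt above does not show its definition), and the uninhabited arm is spelled out with an empty `match` rather than the irrefutable `let` used in the test, so the snippet does not depend on a recent toolchain accepting that pattern.

```rust
use std::ops::ControlFlow;

// Assumed stand-in for the test file's `Never`: an enum with no variants,
// so no value of this type can ever be constructed.
pub enum Never {}

pub fn make_unmake_result_never(x: i32) -> i32 {
    let y: Result<i32, Never> = Ok(x);
    match y {
        Ok(z) => z,
        // `never` cannot exist, so the empty match covers every case.
        Err(never) => match never {},
    }
}

pub fn extract_control_flow_never(x: ControlFlow<&str, Never>) -> &str {
    match x {
        ControlFlow::Break(s) => s,
        // Same idea: the `Continue` variant is uninhabited here.
        ControlFlow::Continue(never) => match never {},
    }
}
```

Both enums have only one variant that can actually hold a value, which is why the payload can be passed straight through without a stack slot.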
In optimized builds GVN gets rid of these already, but in `opt-level=0` we actually make `alloca`s for this, which particularly impacts `?`-style things that use actually-only-one-variant types like this.
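To illustrate why `?` runs into this, here is a hand-written model on stable Rust of roughly what `?` does for a `Result<u32, i32>`. The free functions `branch` and `from_residual` are local stand-ins that mirror the shape of the unstable `Try`/`FromResidual` traits, and `checked_double` is a made-up caller; treat it as a sketch of the desugaring, not the compiler's exact output.

```rust
use std::convert::Infallible;
use std::ops::ControlFlow;

// Stand-in for `Try::branch`: split a Result into "keep going" or "bail out".
// The residual `Result<Infallible, i32>` can only ever be `Err`.
fn branch(r: Result<u32, i32>) -> ControlFlow<Result<Infallible, i32>, u32> {
    match r {
        Ok(v) => ControlFlow::Continue(v),
        Err(e) => ControlFlow::Break(Err(e)),
    }
}

// Stand-in for `FromResidual::from_residual`: consume the one-variant residual.
fn from_residual(residual: Result<Infallible, i32>) -> Result<u32, i32> {
    match residual {
        Err(e) => Err(e),
        // `Ok` would need an `Infallible` value, which cannot exist.
        Ok(never) => match never {},
    }
}

// Roughly what `Ok(r? * 2)` expands to under this model.
pub fn checked_double(r: Result<u32, i32>) -> Result<u32, i32> {
    let v = match branch(r) {
        ControlFlow::Continue(v) => v,
        ControlFlow::Break(residual) => return from_residual(residual),
    };
    Ok(v * 2)
}
```

Every early return goes through a value whose type has a single inhabited variant, so consuming it is the same kind of operation as in the two test functions.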
@bors try @rust-timer queue |
Finished benchmarking commit (88eefbd): comparison URL.
Overall result: no relevant changes - no action needed
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
@bors rollup=never
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): Results (primary -5.4%, secondary -2.6%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -0.4%, secondary -0.6%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 463.694s -> 465.155s (0.32%)
Those html5ever binary size regressions are 336 bytes, 328 bytes, 328 bytes, and 328 bytes, which I'm not worried about. Thus I would consider this very perf-neutral and thus good to go. (I was hoping for a bit better than that, but for neutral I think it's fine, and we can try things like With no icount changes I'll demote it to |
@bors r+ |
☀️ Test successful - checks-actions |
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing 052114f (parent) -> 86ef320 (this PR)
Test differences: 7 test diffs (Stage 1, Stage 2). Additionally, 4 doctest diffs were found. These are ignored, as they are noisy.
Test dashboard: Run
cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 86ef32029427cfc4161a3fd7a51992302f7c5552 --output-dir test-dashboard
Job duration changes
How to interpret the job duration changes? Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (86ef320): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): Results (secondary 3.6%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -1.1%, secondary -5.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.0%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 467.835s -> 467.671s (-0.04%)
(Those cycle changes look fake in the couple I checked.) |
In optimized builds GVN gets rid of these already, but in `opt-level=0` we actually make `alloca`s for this, which particularly impacts `?`-style things that use actually-only-one-variant types like this.

While doing so, rewrite `LocalAnalyzer::process_place` to be non-recursive, solving a 6+ year old FIXME.

r? codegen
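As a rough picture of what "non-recursive" means for walking a place, here is a toy model. None of these types are rustc's: `Proj`, `Place`, and both functions are hypothetical, and the real `LocalAnalyzer::process_place` does considerably more than count derefs. The sketch only shows the general shape of replacing "peel the last projection and recurse on the prefix" with a single loop over the projections.

```rust
// Toy model only: hypothetical types, not rustc's `Place` or `ProjectionElem`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Proj {
    Deref,
    Field(usize),
}

pub struct Place {
    pub local: usize,
    pub projection: Vec<Proj>,
}

// Recursive shape: handle the outermost projection, then recurse on the prefix.
pub fn count_derefs_recursive(projection: &[Proj]) -> usize {
    match projection.split_last() {
        None => 0,
        Some((last, prefix)) => {
            let here = usize::from(*last == Proj::Deref);
            here + count_derefs_recursive(prefix)
        }
    }
}

// Iterative shape: visit the same projections, outermost first, in one loop.
pub fn count_derefs_iterative(place: &Place) -> usize {
    let mut derefs = 0;
    for elem in place.projection.iter().rev() {
        if *elem == Proj::Deref {
            derefs += 1;
        }
    }
    derefs
}
```

The two versions visit the projections in the same order; the loop just avoids one stack frame per projection element.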