-
Notifications
You must be signed in to change notification settings - Fork 283
Add definition splitting to transpiler #1477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
16f6f85 to
4ffc28d
Compare
|
Not against this approach per se, just wanted to bring up the alternative approach of using the existing |
d2cf708 to
5b517cb
Compare
I discussed this with @thedataking in a call. One problem is that we only have the The other problem is that even given said annotations, we only know the start of the C definition, not its entire bounds (nor bounds of any relevant preceding comments). Most of the logic here is in getting precise bounds from the information in the Clang AST, which is otherwise lost. So we would need to add that logic somewhere even if we were going to use the |
e751b54 to
e51268a
Compare
|
We can't practically integrate the Rust splitting/merging tools into our |
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't practically integrate the Rust splitting/merging tools into our
Cargo.tomlbecause they use Rust 2024 and have dependencies that also do. But I don't think it's a problem to just have them live in their own subdirectories for now; they just have to be built before the c2rust-postprocess tool can invoke them.
That seems fine for now. When we get around to #1227, we can add it back.
85b2bc4 to
4fe4d8d
Compare
e51268a to
edcb0bf
Compare
|
Hm, I guess this somehow upsets the |
|
I pulled down this branch and built it on $ c2rust transpile ~/Work/lua/compile_commands.json --binary lua --emit-c-decl-map
warning: unknown warning option '-Wlogical-op'; did you mean '-Wlong-long'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-aggressive-loop-optimizations' [-Wunknown-warning-option]
warning: unknown warning option '-Wlogical-op'; did you mean '-Wlong-long'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-aggressive-loop-optimizations' [-Wunknown-warning-option]
2 warnings generated.
Transpiling lcode.c
failed to convert top-level decl CDeclId(15970)!
failed to convert top-level decl CDeclId(15996)!
failed to convert top-level decl CDeclId(17033)!
thread 'main' panicked at 'slice index starts at 50147 but ends at 36146', library/core/src/slice/index.rs:92:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtraceupdate: I tried the in-tree lua version too: $ c2rust transpile tests/integration/tests/lua/compile_commands.json --binary lua --overwrite-existing --emit-c-decl-map
Transpiling lbaselib.c
failed to convert top-level decl CDeclId(3263)!
thread 'main' panicked at 'slice index starts at 1031 but ends at 903', library/core/src/slice/index.rs:92:5also tried libxml2 in-tree: hacky workaround got me some diff --git a/c2rust-transpile/src/c_ast/mod.rs b/c2rust-transpile/src/c_ast/mod.rs
index 2041ac32c..3d60bbe6d 100644
--- a/c2rust-transpile/src/c_ast/mod.rs
+++ b/c2rust-transpile/src/c_ast/mod.rs
@@ -310,6 +310,10 @@ impl TypedAstContext {
for (decl_id, decl) in &self.c_decls {
let begin_loc: SrcLoc = decl.begin_loc().expect("no begin loc for top-level decl");
let end_loc: SrcLoc = decl.end_loc().expect("no end loc for top-level decl");
+ if begin_loc.line > end_loc.line {
+ eprintln!("Skipping invalid source range for decl {decl_id:?}: begin {begin_loc:?} > end {end_loc:?}");
+ continue;
+ }
// If encountering a new file, reset end of last top-level decl.
if prev_src_loc.fileid != begin_loc.fileid { |
|
I just pushed a couple commits that fix the comments test failure and the begin/end ordering issue; I had expected the initial traversal of decls was srcloc-sorted, but it wasn't. libxml2 seems to go through without issue now, but upon inspecting the JSON it looks like definitions are overlapping others; debugging now. |
|
I think tip-of-tree lua uses GNU labels as values which makes the c2rust transpiler barf and in turn, some of the json files emitted are not valid json. Please consider interactions between unsuccessful transpiles and this feature. |
I think you have to add (It would also be good to have a README in the tools dir showing how to build, run, and test the tools.) |
I wasn't able to reproduce this--which JSON output is malformed, and how? It's surprising that we would see invalid JSON, because we serialize it with Any tips reproducing the failure? I tried: git clone https://github.com/lua/lua
cd lua
bear -- make -j9
c2rust-transpile --overwrite-existing --emit-build-files --emit-c-decl-map compile_commands.jsonFrom the transpiler, I got this output: c2rust-transpile outputbut all the for i in *.c_decls.json; do echo -n "$i: "; json_verify < $i; done |
485b683 to
6f349c1
Compare
|
We now seem to extract correct source ranges for all the definitions in libxml2 and lua in the testsuite. |
c56e84d to
9d65be7
Compare
|
|
||
| ## Building | ||
|
|
||
| These tools rely on rust-analyzer's libraries, so they need a relatively recent version of Rust. Run `cargo build --release` in their respective directories to build them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, that picks up the c2rust nightly toolchain and fails because it is too old. I had to run cargo +stable build --release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's because I was running actual cargo, not rustup aliased to cargo. In the latter case (which is probably more common) your command is the right one; I'll clarify. Or alternately, rustup might pick up a rust-toolchain.toml if we placed one somewhere.
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I test this? Also, do we need to merge the --emit-c-decl-map changes to c2rust-transpile/c2rust-ast-exporter at the same time as the new tools/? The former looks mostly good to me, but I haven't reviewed the latter. Also, is it worth it to review split_ffi_entry_points, as FWIU, it's already in crisp?
c2rust-transpile/src/c_ast/mod.rs
Outdated
| decls_sorted.sort_by(|v1, v2| { | ||
| self.c_decls[v1] | ||
| .begin_loc() | ||
| .cmp(&self.c_decls[v2].begin_loc()) | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| decls_sorted.sort_by(|v1, v2| { | |
| self.c_decls[v1] | |
| .begin_loc() | |
| .cmp(&self.c_decls[v2].begin_loc()) | |
| }); | |
| decls_sorted.sort_by_key(|v| self.c_decls[v].begin_loc()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also move the sorting to before name_loc_map and prev_src_loc, which are only used in the loop?
| ), | ||
| }; | ||
|
|
||
| match serde_json::ser::to_writer(file, &decl_map) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably best to just write to a Vec<u8>/String and then write the file all at once. You could also write to a buffered writer, but that's easy to forget, like here.
| "Unable to write C declaration map to file {}: {}", | ||
| output_path.display(), | ||
| e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "Unable to write C declaration map to file {}: {}", | |
| output_path.display(), | |
| e | |
| "Unable to write C declaration map to file {}: {e}", | |
| output_path.display(), |
| } | ||
|
|
||
| let file_content = | ||
| std::fs::read(&t.ast_context.get_file_path(t.main_file).unwrap()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a .expect() here or fs_err? A file missing is a fairly common error, so better context is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good idea, will do.
| let mut begin_offset = src_loc_to_byte_offset(&line_end_offsets, begin); | ||
| let mut end_offset = src_loc_to_byte_offset(&line_end_offsets, end); | ||
| assert!(begin_offset <= end_offset); | ||
| const VT: u8 = 11; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's VT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vertical tab?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, ASCII VT, which has gone out of fashion so Rust lacks an escape for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const VT: u8 = 11; | |
| const VT: u8 = 11; // Vertical Tab |
| }) | ||
| .collect::<HashMap<_, _>>(); | ||
|
|
||
| // Generate a map from Rust items to the source code of their C declarations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we put this in its own function? I think it'd be easier to follow that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, an emit_c_decl_map function would probably be good to avoid bloating fn translate.
| .collect::<HashMap<_, _>>(); | ||
|
|
||
| // Generate a map from Rust items to the source code of their C declarations. | ||
| let decl_map = if let Some(decl_source_ranges) = decl_source_ranges { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decl_source_ranges.map(|decl_sources_ranges| {})?
| assert!(begin_offset <= end_offset); | ||
| const VT: u8 = 11; | ||
| /* Skip whitespace and any trailing semicolons after the previous decl. */ | ||
| while let Some(b'\t' | b'\n' | &VT | b'\r' | b' ' | b';') = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use u8::is_ascii_whitespace + ;?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't include \v, but this could be expressed using that function if we wanted. I don't know if it's clearer, given the specific set of characters we care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. is_ascii_whitespace also handles FF but not VT. Is VT worth handling at all? I guess it doesn't matter that much. I don't imagine anything will actually produce VTs or FFs.
The flag is independent from In addition to the testsuite, I've been using the following test case which I'll probably add as a snapshot test: #include <stddef.h>
#include <stdint.h>
int a;int bb;/*comment for cc*/int ccc;
/*
123456789012345678901234567890
*/
typedef uintptr_t ngx_uint_t;
void f(void){}void g(void){}/*comment for h*/void h(void){}int d;int e;;;;;;;;int another;
//comment for ngx_hash_init
void
ngx_hash_init(void)
{
size_t len;
ngx_uint_t align = len & ~sizeof(void *);
}
int
anotherfunc(void)
{
int var = 0;
return 4 + var;
}
int
funcholdingdefine(void)
{
#define FOO 4000+50
return FOO;
}
void forwarddeclbeforeotherdef(int x);
void funcafterdecl(int n) {}
/*real defn*/
void forwarddeclbeforeotherdef(int x) { } |
34658b1 to
d08939a
Compare
…tem to the corresponding C definition
this is being used for local definitions instead of the `static` keyword, so it should encompass the newer helpers we've defined recently
Clang's SourceRange ends at the beginning of the final token, but this is not useful if we want to look at every character of a declaration, as we may when considering the spelling of the corresponding C definition
otherwise, we treat them as one character, which means the next definition's source range will expand to include them as if they were leading comments for it
otherwise, debug builds panic here
this can occur if a macro is defined inside a top-level function
otherwise, inner decls like local variables confuse our tracking of definition bounds
d08939a to
018a315
Compare
|
Should be good to go as long as CI passes. Still needs automated test coverage but that can be a follow-up. |
this does seem to be benign in practice, so for now only emit it at debug logging level
Currently this is just the C-side implementation; I'll add commits moving the impl of Rust splitting and merging from the crisp repository tomorrow.