-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
libutil: Use Boost.URL for URI parsing #13445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot!
But going to leave for others to review a bit just case there is some issues e.g. with how much Boost or compat I didn't think of
ab42ce4
to
77750fd
Compare
7140696
to
e76b2d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed some issue in to_string(), let me know what you think.
These cases do not seem to be covered by the test suite at all.
This matcher is useful for checking error messages, which always contain ANSI escapes.
The myriad of hand-rolled URL parsing and validation code is a constant source of problems. Regexes are not a great way of writing parsers and there's a history of getting them wrong. Boost.URL is a good library we can outsource most of the heavy lifting to.
The default comparison operator can be generated by the compiler since C++20.
Boost.URL is a significantly more RFC-compliant parser than what libutil currently has a bundle of incomprehensible regexes. One aspect of this change is that RFC4007 ZoneId IPv6 literals are represented in URIs according to RFC6874 [1]. Previously they were represented naively like so: [fe80::818c:da4d:8975:415c\%enp0s25]. This is not entirely correct, because the percent itself has to be pct-encoded: > "%" is always treated as an escape character in a URI, so, according to the established URI syntax [RFC3986] any occurrences of literal "%" symbols in a URI MUST be percent-encoded and represented in the form "%25". Thus, the scoped address fe80::a%en1 would appear in a URI as http://[fe80::a%25en1]. [1]: https://datatracker.ietf.org/doc/html/rfc6874 Co-authored-by: Jörg Thalheim <joerg@thalheim.io>
7d12190
to
a54284c
Compare
Ran the whole |
One concern I have is the flake regression suite only tests public flakes, and I suspect some flakes out there will no longer work correctly with new parsing semantics.
…On Sat, Jul 19, 2025, at 4:52 PM, Sergei Zimmerman wrote:
*xokdvium* left a comment (NixOS/nix#13445) <#13445 (comment)>
Ran the whole `flake-regressions` suite and `hydraJobs.tests`. Best I can tell this doesn't regress anything. What's great is that this also fixed a long-standing issue #10898 <#10898>.
—
Reply to this email directly, view it on GitHub <#13445 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAASXLH67Q2DEPSFXGBVUSL3JKVYVAVCNFSM6AAAAACBHRJBRKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTAOJSGU3DKNJWG4>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
I did see that Determinate runs GHA CI with a private flake-regressions-data repository.
Any particular issues you can think of? Did some of the prior non-compliance get ossified into lock files? |
Motivation
Boost.URL is a significantly more RFC-compliant parser
than what libutil currently has a bundle of incomprehensible
regexes.
One aspect of this change is that RFC4007 ZoneId IPv6 literals
are represented in URIs according to RFC6874 1.
Previously they were represented naively like so:
[fe80::818c:da4d:8975:415c\%enp0s25]
.This is not entirely correct, because the percent itself has to be pct-encoded:
Context
Starting to pay off the tech debt #9603.
Fixes #10898.
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.