-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add cut_on_char and rcut_on_char to String/Bytes/StringLabels #14068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
API bloat.
I disagree, it's not because you don't have time that we should add bloat to the stdlib. In fact I have a TODO to upstream the various extremely useful1 substring functions I have in my extended I didn't yet because upstream would likely find naive search distasteful and I have been too lazy to implement something better (that was my candidate) and for my light usages naive string search is entirely "good enough". But perhaps something I'd be willing to procrastinate on for the next release if I feel some good code vibes from upstream. Footnotes
|
Given (facetiously), that Especially as at the moment, if we were to add just a function to cut at a string separator we'd have something of an inconsistency between (incidentally, it does suggest that perhaps the functions should be |
stdlib/stringLabels.mli
Outdated
@@ -207,6 +207,18 @@ val split_on_char : sep:char -> string -> string list | |||
|
|||
@since 4.04 (4.05 in StringLabels) *) | |||
|
|||
val cut : sep:char -> string -> (string * string) option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this would be a good first function to use a labelled tuple in the result 🫣 (@ccasin, @goldfirere, @OlivierNicole - any JS examples, even potentially against doing this?) e.g. key/value, first/second, prefix/suffix, left/right?
If we were to go with a labelled tuple, I commit to sorting out #11792 (i.e. making syncing the docs work for this...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this it's reasonably obvious which element is what: the first element is the first part of the string, the second is the second part. I would maybe reserved high-tech tuple labels for cases where it's easier to be confused about the meaning of each part (in particular bool
values must be labelled or named somehow). But I would be tempted to suggest making them records in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my, new hammers. Please don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think labeled tuples are a fine idea to consider whenever you have a tuple with multiple components of the same type. (Or not the same type, but less so.) That said, given that there is a pretty clear ordering involved in this particular case, I don't personally think labels are called for here.
I'm talking about API and conceptual bloat. Adding more than one function that does essentially the same thing is bloat. Do we want to have to define two naming schemes for each type of separator ? Do we want users to have to choose between two functions that do the same thing ? Do we want users to have to think about which kind of separator they are going to abstract over (and break all their users when they will realize they abstracted over the wrong one) ?
As usual, being consistent with poor previous design choices should not be a priority. In fact I fought hard for |
Maybe there is a reasonable string-search implementation in OCaml somewhere out there (does not need to be the most elaborate algorithm ever), that could be used to propose a string version right now? I think the discussion would be easier if it was between two existing versions, rather that "accepting what has been proposed today vs. asking for unclear amount of work to be done in the future". |
I don't understand why this considered to be the choice here. Stdlib users have worked for the past 29 years without these two functions. Is there is an urge to add these limited interfaces versus doing the right thing? It seems to me that the stdlib is being developped on the ground of doing the right thing, not urgency. And yes I have been annoyed dozens of times that
Naive string search works perfectly well if your needle is short in fact that's what |
I think that's fair. I've renamed the functions to that. As an addition to the list in the PR description, i had forgotten to check the compiler itself:
|
My current thinking:
It is tempting to look online for a string-search implementation in OCaml that is reasonably simple while reasonably fast at the same time, benchmark it against the single-char implementation to make sure we are not imposing an unreasonable cost for this common corner case, and submit it as an alternative to the current proposal. But I haven't had the time to consider doing this recently, and apparently I'm not alone in this predicament. |
String.cut
is a function i recently had to reimplement in every single one of my personal [1] and work projects.I feel like this is especially useful when for example parsing
Unix.environment
and similar things, whereString.split_on_char
is inappropriate at worst and inefficient at best, given that the right side of an environment binding can very well contain multiple=
characters.In terms of naming, looking at sherlocode.com quickly:
opam
hasOpamStd.String.cut
andrcut
: https://github.com/ocaml/opam/blob/8c0e68e7eb527223353cca1926b433c664f729c4/src/core/opamStd.ml#L694patch
hasLib.cut
: https://github.com/hannesm/patch/blob/fe7077c7e5e55721e77e7dbc2af2c044851cef20/src/lib.mli#L4base
hasBase.String.lsplit2
: https://github.com/janestreet/base/blob/01857ea5364018edb77460517872164f501d1091/src/string_intf.ml#L334omod
hasString.cut
andrev_cut
: https://github.com/dbuenzli/omod/blob/51430c07bfe46cd46edb785715c8f49e06dec546/src/omod.ml#L14topkg
hasTopkg_string.cut ?rev
: https://github.com/dbuenzli/topkg/blob/5118b194ad08df97298b206648dd2f521b65d9e3/src/topkg_string.mli#L24flow
used to haveString_utils.split2
: https://github.com/facebook/flow/blob/cdc8cc8dbe3f3813908d95d9ce02476f87669632/src/hack_forked/utils/string/string_utils.mli#L72[1]: https://codeberg.org/kit-ty-kate/ocaml-posixutils/src/commit/49ba0e1b3c6ccd0421047bccaf96a6e92031f647/src/env.ml#L1 and 2 other unpublished projects.