-
Notifications
You must be signed in to change notification settings - Fork 714
Replace cabal project parsing with Parsec #8889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@ulysses4ever as announced, here is an early PR of the changes. Still lots of work, but I have a rough outline of the ticket. I would be glad if you would take a look! |
402f607
to
9016a3a
Compare
@grayjay I know you’re more of a solver person, but if you happen to have some time for advising and feel comfortable in this part of the code, your contribution would be priceless. |
@jgotoh there's a bi-weekly cabal devs meeting, where, I am sure, people would be delighted to hear your experience so far. The closest one is this Thursday (Apr 20th), 1 PM Eastern Time (US). The link to a Jitsi video call is posted before the meeting on #hackage at libera.chat (can be browsed using Element/Matrix or an IRC client). Are you interested? |
@ulysses4ever Thank you very much for you invitation! I've already wondered about the cabal devs main communication channel. I am definitely interested and will be glad to attend :) |
@jgotoh cool! let me know if you want me to mail you the jitsi link beforehand. |
I'm not familiar with this part of the code, but I tried to answer a couple questions based on my understanding of the parser behavior. |
c5f7509
to
5fd9791
Compare
@jgotoh Is there anyway I can help you with this? |
@andreabedini many thanks for your offer! I will push some changes this week adding Also ongoing small reviews of the code I push are helpful if you spot anything. |
Thank you for the invite, but unfortunately that would be 3:00 am here 😄 😞 I might just go through the PR in my own time and leave some comment (if there's anything I can comment on). |
96ef42c
to
e671cb1
Compare
9e0127d
to
c75f6e2
Compare
7380431
to
39c4038
Compare
I updated based on your comments @ulysses4ever @geekosaur and @philderbeast . Thanks a lot. |
The new parser replicates the grammar of the legacy parser while providing better error reporting and more maintainable code structure. The fallback strategy ensures smooth transition while the legacy parser is phased out. The flag `--project-file-parser` allows you to select which project file parser to use. * `legacy` - the old parser (will be removed in a future release) * `default` - the default parser (uses `fallback` unless compiled with `-f+legacy-comparison`) * `parsec` - the new parser using Parsec * `fallback` - the new parser using Parsec, but falling back to the old parser if it fails * `compare` - the new parser using Parsec, but comparing the results with the old parser When `cabal-install` is compiled, then the `-f+legacy-comparision` flag can be passed which changes the default parser mode to `compare`. Fixes haskell#6101 haskell#7748 haskell#10611
39c4038
to
ef1a05a
Compare
I hope to get to looking at this closer but release and other things took precedence so far. So don't feel compelled to wait for my review... I hope to get more time for cabal next week though. |
cabal-install/src/Distribution/Client/CmdInstall/ClientInstallFlags.hs
Outdated
Show resolved
Hide resolved
Co-authored-by: Phil de Joux <philderbeast@gmail.com>
- dir-y/a.config:123:1: Invalid subsection "-" | ||
- dir-x/a.config:3:1: Invalid subsection "-" | ||
- dir-x/a.config:2:1: Invalid subsection "-" | ||
- dir-x/a.config:1:1: Invalid subsection "-" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we stick to listing the errors in top to bottom order?
- dir-x/a.config: Unrecognized section '-' on line 1 | ||
- dir-x/a.config: Unrecognized section '-' on line 2 | ||
- dir-x/a.config: Unrecognized section '-' on line 3 | ||
- dir-y/a.config: Unrecognized section '-' on line 123 | ||
- x.config: Unrecognized section '-' on line 1 | ||
- x.config: Unrecognized section '-' on line 2 | ||
- x.config: Unrecognized section '-' on line 3 | ||
- y.config: Unrecognized section '-' on line 123 | ||
- y.config:123:1: Invalid subsection "-" | ||
- x.config:3:1: Invalid subsection "-" | ||
- x.config:2:1: Invalid subsection "-" | ||
- x.config:1:1: Invalid subsection "-" | ||
- dir-y/a.config:123:1: Invalid subsection "-" | ||
- dir-x/a.config:3:1: Invalid subsection "-" | ||
- dir-x/a.config:2:1: Invalid subsection "-" | ||
- dir-x/a.config:1:1: Invalid subsection "-" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say what the sort ordering is now? It looks like it was lexical before.
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two trailing newlines. Are these both necessary?
parse error | ||
Error encountered when parsing cabal file pkg.cabal: | ||
|
||
pkg.cabal:0:0: error: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where an expected field is missing, does it make sense to report the location when that field is not expected at location :0:0
?
dir-elif/elif.config:4:6: error: | ||
unexpected SecArgName (Position 4 6) "_" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these lines also be indented?
imported by: cyclical-same-filename-out-out-backback.config | ||
imported by: cyclical-same-filename-out-out-backback.project | ||
Error: [Cabal-7167] | ||
Error encountered when parsing project file same-filename/cyclical-same-filename-out-out-backback.config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry a little that the longer description will lead to more line wrapping, especially for oldies like me that up the font size.
- Error parsing project file ...
+ Error encountered when parsing project file
Error: [Cabal-7167] | ||
Error encountered when parsing project file cyclical-0-self.project: | ||
|
||
cyclical-0-self.project:3:1: error: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we reduce the lines required for the error reports? Perhaps lead with the error line and append the error code and problem description the way GHC does it?
[ 37 of 150] Compiling Distribution.Types.DumpBuildInfo ( src/Distribution/Types/DumpBuildInfo.hs, interpreted ) [Source file changed]
src/Distribution/Types/DumpBuildInfo.hs:25:31: error: [GHC-88464]
Variable not in scope: boolean :: Bool
|
25 | boolToDumpBuildInfo bool = if boolean then DumpBuildInfo else NoDumpBuildInfo
| ^^^^^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would give us:
cyclical-0-self.project:3:1: error: [Cabal-7090]
cyclical import of cyclical-0-self.project;
cyclical-0-self.project
imported by: cyclical-0-self.project
1 | packages: .
2 |
3 | import: cyclical-0-self.project
| ^
Do we even need to say that we found the problem while parsing?
1 | packages: . | ||
2 | | ||
3 | -- This is an error; a trailing `:` is syntax for a field, not a stanza! | ||
4 | source-repository-package: | ||
| ^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need all of lines 1..4? Could we number only the problem line and include the lines above and below, unnumbered. This is what GHC does, isn't it?
@@ -56,7 +59,8 @@ defaultProjectFlags = | |||
{ flagProjectDir = mempty | |||
, flagProjectFile = mempty | |||
, flagIgnoreProject = toFlag False | |||
-- Should we use 'Last' here? | |||
, -- Should we use 'Last' here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @philderbeast, all useful comments, I will get back to this next week. |
Implements #6101, #7748. Finally ready for review!
Please include the following checklist in your PR:
In this PR I implemented #6101 to replace the legacy cabal.project parser (
module Distribution.Client.ProjectConfig.Legacy
) with an implementation based on Parsec.My implementation is based heavily on the existing Parsec parser of .cabal files, see module
Distribution.PackageDescription.Parsec
.My goal was to recreate the exact grammar the Legacy Parser parses using the modern Parsec framework with FieldGrammars etc.
This means the Legacy Parser and Parsec Parser should always return the same ProjectConfig value for a file (at least in this PR).
About CI:
All of the validation checks of Ubuntu are running, unfortunately windows-latest ghc-9.10.1 fails, but windows ghc-9.8.2 succeeds.
The PR consists of several main parts:
Main Entrypoint Distribution.Client.ProjectConfig.readProjectFileSkeleton
The main entrypoint into the Parsec parser is in function
readProjectFileSkeleton
inDistribution.Client.ProjectConfig
and parses a cabal.project file.Note that the legacy parser can be executed still via
readProjectFileSkeletonLegacy
in the same file, I want to remove it but in another ticket because this PR is huge already.My current implementation of
readProjectFileSkeleton
also callsreadProjectFileSkeletonLegacy
to compare the Parsec value to the legacy value, throwing errors if they are not equal.Currently I use it to verify that the parsers' grammars match.
I will delete this functionality when the PR is ready to merge, but for now I will let the functionality stay in the PR so reviewers can try to build other projects to verify the parsers' values to not diverge.
Note that because .project files are parsed by both the Legacy parser and the Parsec parser, all warnings are currently emitted twice - once by each parser.
The function
readProjectFileSkeleton
currently uses an old variant ofreadAndParseFile
fromCabal/src/Distribution/Simple/PackageDescription.hs
(new variant is here) that is not based on SymbolicPaths yet but uses FilePath instead.If we want to migrate it to use SymbolicPaths I need some help here :) Maybe this should be part of a future ticket too.
readProjectFileSkeleton
callsDistribution.Client.ProjectConfig.Parsec.parseProject
which leads me to the next main part:Module Distribution.Client.ProjectConfig.Parsec
This module contains the Parsec parser of ProjectConfigs.
Function
parseProject
is a copy of functionparseProject
of the Legacy parser (Distribution.Client.ProjectConfig.Legacy.parseProject
) but it now calls the Parsec version ofparseProjectSkeleton
.parseProjectSkeleton
parseProjectSkeleton
is a port of functionparseProjectSkeleton
of moduleDistribution.Client.ProjectConfig.Legacy
.It does not use the deprecated type
Field
fromDistribution.Deprecated.ParseUtils
anymore butDistribution.Fields.Field
instead.I tried to change as little as possible here, so we still parse the fields/sections into ProjectConfigs, import and parse other cabal.project files and process the conditional structure.
An interesting bit includes the new processing of imports (Fields with name equaling "import"):
We need to use
liftPR
to be able to composeParseResults
that involveIO
actions (for example downloading imported cabal.project files via HTTP).Composing two actions involves executing a
ParseResult
resulting in aPRState
and executing anotherParseResult
passing in the previousPRState
.Unfortunately the
PRState
that is generated when executing aParseResult
does not contain the file source where the warnings/errors came from.So we need to print any warnings/errors that came up when parsing an imported file before we return the ParseResult, otherwise we lose the source file of the warnings/errors.
I added the implementation of liftPR for Parsec ParseResults to module
Distribution.Fields.ParseResult
because I needed its constructor.Please pay special attention to reviewing this implementation, as it was quite complex to develop.
It works in the tests :)
Another interesting bit is function
fieldsToConfig
inparseProjectSkeleton
. Here we produce ProjectConfig values by parsing the current fields with aFieldGrammar
. Afterwards we parse the sections (such as source-repository-package, ...) with functiongoSections
.FieldGrammar
The
ProjectConfig
FieldGrammar is defined in moduleDistribution.Client.ProjectConfig.FieldGrammar
.It took me some while to reverse engineer all the possible field names and find out their named field equivalents in the
ProjectConfig
record, but it should be complete now.Parsing of sections such as
source-repository-package
is not done in here, see the next point below.There are some fields such as
projectConfigDryRun
,projectConfigOnlyDeps
that afaik can not be specified in a cabal.project file and need to be passed in via command line flags.They are marked by comments in the following form:
-- cli flag: projectConfigDryRun
.I also needed to add some
Parsec
instances that I added to the modules defining the respective type.For example the Parsec instance of
OptimisationLevel
is in moduleDistribution.Simple.Compiler
directly below the type definition.Parsing of Sections
Parsing of sections happens in a
StateT
monad (SectionParser
) modifying theProjectConfig
we got out of the FieldGrammar.For the SectionParser I took a lot of inspiration again from the parser of .cabal files in module
Distribution.PackageDescription.Parsec
, see typeSectionParser
.Currently I need to implement parsing of
repository
sections here as I've missed it until now but I hope it is the last type of section that is missing.Integration Tests in cabal-testsuite/PackageTests/ProjectConfig/Parsec/cabal.test.hs
A suite of integration tests.
Parsing values and comparing them to expectations.
Note that I also run the legacy parser here to make sure my expected values do not differ from the non-Parsec implementation, this will be removed in the future.
Furthermore, I've tested the implementation by successfully building the following Haskell repositories: http2, cabal, tls, hashable, hspec, ghc-lib-parser, texmath, text, lens, megaparsec.
Future PRs
There are also other aspects that I want to address, but I want to do this in future PRs because the current one is too big already:
Cabal-tests/tests/ParserTests.warningTests
to test the correct output of warnings/errors of the Parsec parserDistribution/Client/ProjectConfig/Legacy.hs