Open
Description
I think the regular expression is wrong.
TOKENS_ALPHANUMERIC = '[A-Za-z0-9]+(?=\s+)'
Doesn't this mean that a token is only matched if it consists solely of alphanumeric characters and is followed by whitespace?
Example:
WORD1,WORD2, WORD3, WORD4 Word5
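In the sentence above, only WORD4 would be matched as a token, and Word5 would be matched too only if it is followed by whitespace such as a trailing newline. The other words are immediately followed by a comma, so the lookahead fails and they are not treated as valid tokens.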
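A quick way to check this is to run the pattern over the example sentence. This is only a minimal sketch: it assumes the pattern is applied with Python's `re.findall`, which the code in question may or may not do, and the variable names `text`, `tokens` and `tokens_with_newline` are made up for illustration.

```python
import re

# Pattern copied from the code under discussion (written as a raw string here).
TOKENS_ALPHANUMERIC = r'[A-Za-z0-9]+(?=\s+)'

# Example sentence from this issue (hypothetical input, no trailing whitespace).
text = "WORD1,WORD2, WORD3, WORD4 Word5"

# Only alphanumeric runs that are immediately followed by whitespace match.
tokens = re.findall(TOKENS_ALPHANUMERIC, text)
print(tokens)  # ['WORD4']

# With a trailing newline the last word is followed by whitespace as well.
tokens_with_newline = re.findall(TOKENS_ALPHANUMERIC, text + "\n")
print(tokens_with_newline)  # ['WORD4', 'Word5']
```

WORD1, WORD2 and WORD3 never appear in the result, because each of them is followed directly by a comma rather than by whitespace.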