Skip to content

Stopped abnf regex from halting when it encounters a line terminator in the fragment#100

Merged
sigmavirus24 merged 2 commits into
python-hyper:mainfrom
JohnJamesUtley:fix_fragment_cutoff
May 18, 2023
Merged

Stopped abnf regex from halting when it encounters a line terminator in the fragment#100
sigmavirus24 merged 2 commits into
python-hyper:mainfrom
JohnJamesUtley:fix_fragment_cutoff

Conversation

@JohnJamesUtley
Copy link
Copy Markdown
Contributor

fixes #99

@sigmavirus24
Copy link
Copy Markdown
Collaborator

DOTALL is a sledgehammer. We should use (?s:.*) for the fragment matching if we actually want to allow newlines there. I'm not convinced we should.

@JohnJamesUtley
Copy link
Copy Markdown
Contributor Author

DOTALL is a sledgehammer. We should use (?s:.*) for the fragment matching if we actually want to allow newlines there. I'm not convinced we should.

You are absolutely correct that DOTALL is too much and the inline flag is a much cleaner solution. I've switched the pull request to amend that just now.

The change is necessary because in other similar sections of the URL, such as the query and the path, line termination characters are accepted and are converted into percent-encoded characters when the path and query are returned. It is inconsistent that the fragment section treats these same characters differently.

I would expect line termination characters in the fragment to be percent-encoded like they are in the query and the path.

Additionally, the other Python URL parsers handle line termination characters in the fragment section by converting them to percent-encoded characters in their output. Ending the parsing at line termination characters in the fragment section instead of percent-encoding them is strange behavior in my opinion. Especially since line termination characters elsewhere in the URL do not end the parsing.
(I tested Furl, Hyperlink, Urllib, and Yarl).

@sigmavirus24
Copy link
Copy Markdown
Collaborator

Can you add a test case or two?

@JohnJamesUtley
Copy link
Copy Markdown
Contributor Author

@sigmavirus24 Made the changes you requested 👍

@sigmavirus24 sigmavirus24 merged commit 3c78778 into python-hyper:main May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parsing of fragments through ParseResult.fromString can be cut short by a newline character

3 participants