|
msg271702 - (view) |
Author: Björn Lindqvist (Björn.Lindqvist) |
Date: 2016-07-30 19:57 |
This affects both Python 2 and 3. This is as expected:
>>> urlparse('abc:123.html')
ParseResult(scheme='abc', netloc='', path='123.html', params='', query='', fragment='')
>>> urlparse('123.html:abc')
ParseResult(scheme='123.html', netloc='', path='abc', params='', query='', fragment='')
>>> urlparse('abc:123/')
ParseResult(scheme='abc', netloc='', path='123/', params='', query='', fragment='')
This is NOT:
>>> urlparse('abc:123')
ParseResult(scheme='', netloc='', path='abc:123', params='', query='', fragment='')
The expected result is scheme='abc' and path='123', at least according to my reading of the RFC (https://tools.ietf.org/html/rfc1808.html).
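A short script along these lines can reproduce the report on a given interpreter. It is only a sketch: `describe` is a hypothetical helper name introduced here, and the import assumes Python 3 (on Python 2 the function lives in the `urlparse` module). No expected output is shown for `abc:123`, because the result depends on the Python version in use.

```python
from urllib.parse import urlparse  # Python 3; on Python 2 use `from urlparse import urlparse`


def describe(url):
    """Return (scheme, path) as parsed by the running interpreter."""
    parts = urlparse(url)
    return parts.scheme, parts.path


# The first two inputs are the cases reported as working; the last one is
# the case under discussion.
for url in ("abc:123.html", "abc:123/", "abc:123"):
    print(url, "->", describe(url))
```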
|
|
msg271703 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-07-30 21:12 |
See issue 14072. It may be time to look at this again, but we may still be constrained by backward compatibility.
|
|
msg271719 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-07-31 02:37 |
The main backward compatibility consideration would be Issue 754016, but I don’t agree with the changes made there, and would support reverting them. The original bug reporter wanted urlparse("1.2.3.4:80", "http") to be treated as the URL http://1.2.3.4:80, but the IP address was being parsed as a scheme, so the default “http” scheme was ignored.
The original fix (r83701) affected any URL that had a digit 0–9 immediately after the “scheme:” prefix. In such URLs, the scheme component was no longer parsed. A test case for “path:80” was added, and a demonstration of not parsing any scheme from www.cwi.nl:80/%7Eguido/Python.html was added in the documentation.
Later, the logic was altered to test if the URL looked like an integer (revision 495d12196487, Issue 11467). This restored proper parsing of clsid:85bbd92o-42a0-1o69-a2e4-08002b30309d and mailto:1337@example.org, although another URL given, javascript:123, remains misparsed. The documentation was subsequently adjusted in Issue 16932 to just demonstrate www.cwi.nl/%7Eguido/Python.html being parsed as a path.
The logic was watered down to its current form by revision 9f6b7576c08c, Issue 14072. Now it tests for a non-digit anywhere after the scheme, so that tel:+31641044153 is again parsed properly. But it was pointed out that tel:1234 remains misparsed.
What’s the next step in the watering-down process? All the attempts so far break valid URLs in favour of special-casing inputs that are not valid URLs.
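The current form of the heuristic, as described above, can be sketched roughly as follows. This is an illustration written from the description in this message, not the library's source: `split_scheme_legacy` and `SCHEME_CHARS` are hypothetical names, and the handling of an empty remainder is an assumption.

```python
import string

# Characters permitted in a URL scheme: letters, digits, "+", "-" and ".".
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")


def split_scheme_legacy(url):
    """Hypothetical helper mimicking the heuristic described above.

    Returns (scheme, rest). The scheme is dropped when everything after
    the first colon consists only of digits, i.e. it looks like a port.
    """
    i = url.find(":")
    if i <= 0 or any(c not in SCHEME_CHARS for c in url[:i]):
        return "", url
    rest = url[i + 1:]
    # Assumption: an empty remainder still counts as a scheme.
    if not rest or any(c not in string.digits for c in rest):
        return url[:i].lower(), rest
    return "", url
```

Under this sketch, `tel:+31641044153` and `abc:123/` keep their scheme because the remainder contains a non-digit, while `tel:1234` and `abc:123` lose it because the remainder is all digits.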
|
|
msg271738 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-07-31 14:02 |
I hate to say it, but this may require a python-dev discussion. We probably ought to be parsing valid URLs correctly as our top priority, but if that breaks our parsing of "reasonable" non-valid URLs (that existing code is depending on), it's going to be a backward compatibility problem.
|
|
msg271739 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-07-31 14:04 |
On second thought, what are the chances that special casing something that looks like an IP address in the scheme position would maintain backward compatibility?
|
|
msg271823 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-08-02 13:55 |
Depends on how you define “looks like an IP address”. Does the www.cwi.nl:80 case look like an IP address? What about “path:80” or “localhost:80”? If there is any code relying on the bug, it may just as easily involve a host name as a numeric IP address.
|
|
msg271824 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-08-02 14:07 |
Ah, good point, I misread the scope of the problem.
|
|
msg289557 - (view) |
Author: Tim Graham (Tim.Graham) * |
Date: 2017-03-14 01:34 |
Based on discussion in issue 16932, I agree that reverting the parsing decisions from issue 754016 (as Martin suggested in msg271719) seems appropriate. I created a pull request that does that.
|
|
msg354889 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2019-10-18 13:07 |
New changeset 5a88d50ff013a64fbdb25b877c87644a9034c969 by Senthil Kumaran (Tim Graham) in branch 'master':
bpo-27657: Fix urlparse() with numeric paths (#661)
https://github.com/python/cpython/commit/5a88d50ff013a64fbdb25b877c87644a9034c969
|
|
msg354894 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-10-18 13:24 |
New changeset 82b5f6b16e051f8a2ac6e87ba86b082fa1c4a77f by Miss Islington (bot) in branch '3.7':
bpo-27657: Fix urlparse() with numeric paths (GH-661)
https://github.com/python/cpython/commit/82b5f6b16e051f8a2ac6e87ba86b082fa1c4a77f
|
|
msg354903 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2019-10-18 15:23 |
New changeset 0f3187c1ce3b3ace60f6c1691dfa3d4e744f0384 by Senthil Kumaran in branch '3.8':
[3.8] bpo-27657: Fix urlparse() with numeric paths (GH-661) (#16839)
https://github.com/python/cpython/commit/0f3187c1ce3b3ace60f6c1691dfa3d4e744f0384
|
|
msg355320 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-10-24 10:31 |
This issue has been fixed, so I am closing it.
|
|
msg359273 - (view) |
Author: James Brown (roguelazer) |
Date: 2020-01-04 02:37 |
This is a surprising change to put in a minor release. It completely changes the semantics of parsing scheme-less URLs with ports in them and ended up breaking a significant amount of my software. It turns out that URLs like `example.com:80` are more common than one might hope, and a lot of software has always assumed that `example.com:80` would get parsed as the netloc and the software can guess the scheme based on the port...
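For code that needs to read `host:port` strings regardless of how the scheme heuristic behaves in a given release, one possible approach is to mark the authority explicitly with a leading `//` before parsing. The following is a minimal sketch under that assumption and is not something proposed in this thread; `parse_host_port` and its `default_port` parameter are hypothetical names.

```python
from urllib.parse import urlsplit


def parse_host_port(address, default_port=None):
    """Hypothetical helper: extract (hostname, port) from 'host:port' or a full URL."""
    # A leading "//" marks the start of the authority component, so the
    # host cannot be mistaken for a scheme or a path.
    if "://" not in address and not address.startswith("//"):
        address = "//" + address
    parts = urlsplit(address)
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port


print(parse_host_port("example.com:80"))
print(parse_host_port("http://localhost:8000/index.html"))
```

The scheme can then be guessed from the returned port by the caller, as the software described above does.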
|
|
msg359277 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2020-01-04 05:26 |
@James - Originally the issue was considered a revert and the versions were set for the merge, but I certainly recognize the problem when parsing can fail for simple URLs like `localhost:8000`, which are very common.
Another developer had raised concerns about the change in this PR: https://github.com/cdent/../../python/cpython/pull/16839#issuecomment-570758153
I am reopening this issue and will re-read the arguments to understand them and propose the next steps.
|
|
msg360196 - (view) |
Author: Chris Dent (Chris Dent) |
Date: 2020-01-17 15:21 |
Just to add to the list of places where this is causing a regression: it has broken the target host determination routines in gabbi (https://github.com/cdent/gabbi/issues/277).
While the original fix may have been strictly correct in some ways, it results in a terrible UX and, as several others have noted, violates backwards compatibility.
|
|
msg361815 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2020-02-11 13:20 |
Hi Łukasz / Ned:
I would like to revert the backports done in 3.8 and 3.7,
preferably in 3.8.2 and 3.7.7, so that this undesirable behavior exists only for a single release.
I have set this as a release blocker to get your attention.
|
|
msg362103 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2020-02-16 21:07 |
New changeset 505b6015a1579fc50d9697e4a285ecc64976397a by Senthil Kumaran in branch '3.7':
Revert "bpo-27657: Fix urlparse() with numeric paths (GH-661)" (#18526)
https://github.com/python/cpython/commit/505b6015a1579fc50d9697e4a285ecc64976397a
|
|
msg362107 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2020-02-16 21:47 |
New changeset ea316fd21527dec53e704a5b04833ac462ce3863 by Senthil Kumaran in branch '3.8':
Revert "[3.8] bpo-27657: Fix urlparse() with numeric paths (GH-16839)" (GH-18525)
https://github.com/python/cpython/commit/ea316fd21527dec53e704a5b04833ac462ce3863
|
|
| Date | User | Action | Args |
| 2020-02-16 21:47:25 | orsenthil | set | messages: + msg362107 |
| 2020-02-16 21:07:29 | orsenthil | set | messages: + msg362103 |
| 2020-02-16 18:19:45 | orsenthil | set | pull_requests: + pull_request17903 |
| 2020-02-16 18:17:09 | orsenthil | set | keywords: + patch; stage: commit review -> patch review; pull_requests: + pull_request17902 |
| 2020-02-11 13:20:49 | orsenthil | set | priority: deferred blocker -> release blocker; nosy: + lukasz.langa, benjamin.peterson, ned.deily; messages: + msg361815 |
| 2020-01-17 16:14:32 | vstinner | set | nosy: - vstinner |
| 2020-01-17 15:21:32 | Chris Dent | set | nosy: + Chris Dent; messages: + msg360196 |
| 2020-01-04 17:49:08 | ned.deily | set | keywords: + 3.7regression, 3.8regression, - patch; priority: normal -> deferred blocker |
| 2020-01-04 05:26:14 | orsenthil | set | status: closed -> open; messages: + msg359277; assignee: orsenthil; resolution: fixed ->; stage: resolved -> commit review |
| 2020-01-04 02:37:16 | roguelazer | set | nosy: + roguelazer; messages: + msg359273 |
| 2019-10-24 10:31:31 | vstinner | set | status: open -> closed; nosy: + vstinner; messages: + msg355320; resolution: fixed; stage: patch review -> resolved |
| 2019-10-18 15:23:21 | orsenthil | set | messages: + msg354903 |
| 2019-10-18 13:51:49 | orsenthil | set | pull_requests: + pull_request16388 |
| 2019-10-18 13:24:31 | miss-islington | set | nosy: + miss-islington; messages: + msg354894 |
| 2019-10-18 13:07:37 | miss-islington | set | keywords: + patch; pull_requests: + pull_request16382 |
| 2019-10-18 13:07:36 | orsenthil | set | messages: + msg354889 |
| 2018-03-15 18:57:46 | cheryl.sabella | set | stage: patch review; versions: + Python 3.7, Python 3.8, - Python 3.5, Python 3.6 |
| 2017-03-14 01:34:28 | Tim.Graham | set | nosy: + Tim.Graham; messages: + msg289557 |
| 2017-03-13 17:39:32 | Tim.Graham | set | pull_requests: + pull_request543 |
| 2016-08-02 14:07:03 | r.david.murray | set | messages: + msg271824 |
| 2016-08-02 13:55:05 | martin.panter | set | messages: + msg271823 |
| 2016-07-31 14:04:36 | r.david.murray | set | messages: + msg271739 |
| 2016-07-31 14:02:56 | r.david.murray | set | messages: + msg271738 |
| 2016-07-31 02:37:12 | martin.panter | set | nosy: + martin.panter, orsenthil; messages: + msg271719; versions: + Python 2.7, Python 3.5, Python 3.6 |
| 2016-07-30 23:52:19 | martin.panter | link | issue22891 dependencies |
| 2016-07-30 21:12:06 | r.david.murray | set | nosy: + r.david.murray; messages: + msg271703 |
| 2016-07-30 19:57:17 | Björn.Lindqvist | create | |