bpo-34979: fix "SyntaxError: Non-UTF-8 code start with \xe8..." caused by function decoding_fgets #9923

Closed
wants to merge 2 commits into from


@ausaki ausaki commented Oct 17, 2018

Please see issue-34979 for the details.

All Python 3 versions may be affected.

How to reproduce this issue

  1. Save the following source code to a file encoded as UTF-8.
# demo.py
s = '测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试测试'
  2. Run it.
$ python3 -V
Python 3.6.4

$ python3 demo.py
  File "demo.py", line 2
SyntaxError: Non-UTF-8 code starting with '\xe8' in file demo.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

How did it happen?

The function decoding_fgets reads one line of raw bytes into a buffer whose size is platform dependent; for example, it is 1024 bytes on macOS.

If the line is too long (for example, longer than 1023 bytes), it may be cut in the middle of a multibyte UTF-8 character, which causes the function valid_utf8 to fail.
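
The truncation can be illustrated in pure Python. This is only a sketch of the failure mode, not the tokenizer's C code: the names `BUF_SIZE` and `is_valid_utf8` are hypothetical, and the 1024-byte buffer (1023 usable bytes) is an assumption taken from the macOS figure above.

```python
# Assumed buffer size: 1024 bytes, so at most 1023 bytes of a line fit per read.
BUF_SIZE = 1024

# A single source line longer than the buffer, made of 3-byte UTF-8 characters.
line = ("s = '" + "测试" * 200 + "'\n").encode("utf-8")


def is_valid_utf8(chunk: bytes) -> bool:
    """Hypothetical stand-in for the validity check: True if chunk decodes as UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# What a fixed-size read would hand to the check: the first 1023 bytes only.
first_chunk = line[:BUF_SIZE - 1]

print(is_valid_utf8(line))         # the whole line is valid UTF-8
print(is_valid_utf8(first_chunk))  # the truncated chunk ends mid-character
```

The 5-byte ASCII prefix followed by 3-byte characters means byte 1023 falls on the first byte of a character, so the chunk is rejected even though the file itself is well formed.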

How to fix this issue?

  1. There is no need to check whether the encoding of the line is utf-8 or not.

  2. If no coding spec is found at the top of the source file, set the default encoding to utf-8 and always use the function fp_readl to read a line.
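
The idea behind the second point can be sketched in Python: when bytes pass through a stateful decoder instead of being validated chunk by chunk, a character split across a buffer boundary is carried over to the next chunk. This is an analogy under stated assumptions, not the patch itself; `CHUNK` is a hypothetical name and the 1023-byte chunk size is assumed from the description above.

```python
import codecs

# Assumed chunk size, matching the 1023 usable bytes mentioned above.
CHUNK = 1023

data = ("s = '" + "测试" * 200 + "'\n").encode("utf-8")

# An incremental decoder keeps incomplete trailing bytes between calls.
decoder = codecs.getincrementaldecoder("utf-8")()

pieces = []
for start in range(0, len(data), CHUNK):
    chunk = data[start:start + CHUNK]
    # No error here, even when the chunk ends in the middle of a character.
    pieces.append(decoder.decode(chunk))

# Flush the decoder; this would raise if the input ended mid-character.
pieces.append(decoder.decode(b"", final=True))

text = "".join(pieces)
print(text == data.decode("utf-8"))
```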

https://bugs.python.org/issue34979

@iritkatriel
Member

iritkatriel commented May 29, 2021

Closing the PR following the closure of its b.p.o issue (https://bugs.python.org/issue34979)
