Refactor struct module #2084

Merged: youknowone merged 2 commits into RustPython:main from BenLewis-Seequent:ascii_bytes_like on Oct 10, 2021

@BenLewis-Seequent

This accepts either a string containing only ASCII characters or any bytes-like object.
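As a rough illustration of that idea, here is a minimal, VM-free sketch. It is not the PR's actual code: `AsciiBytesLike`, `Arg`, `try_from_arg`, and `as_bytes` are hypothetical stand-ins for `PyAsciiBytesLike` and the Python object being converted.

```rust
/// Hypothetical, simplified stand-in for the PR's `PyAsciiBytesLike`:
/// either an ASCII-only string or an arbitrary byte buffer.
#[derive(Debug)]
enum AsciiBytesLike {
    Ascii(String),
    Buffer(Vec<u8>),
}

/// Hypothetical input type standing in for a Python object.
enum Arg {
    Str(String),
    Bytes(Vec<u8>),
}

impl AsciiBytesLike {
    /// Accept ASCII-only strings and any byte buffer; reject other strings.
    fn try_from_arg(arg: Arg) -> Result<Self, String> {
        match arg {
            Arg::Str(s) if s.is_ascii() => Ok(AsciiBytesLike::Ascii(s)),
            Arg::Str(_) => {
                Err("string argument should contain only ASCII characters".to_owned())
            }
            Arg::Bytes(b) => Ok(AsciiBytesLike::Buffer(b)),
        }
    }

    /// Both variants can be viewed uniformly as a byte slice.
    fn as_bytes(&self) -> &[u8] {
        match self {
            AsciiBytesLike::Ascii(s) => s.as_bytes(),
            AsciiBytesLike::Buffer(b) => b.as_slice(),
        }
    }
}

fn main() {
    let fmt = AsciiBytesLike::try_from_arg(Arg::Str("<hh".to_owned())).unwrap();
    assert_eq!(fmt.as_bytes(), b"<hh");

    let raw = AsciiBytesLike::try_from_arg(Arg::Bytes(vec![0xff])).unwrap();
    assert_eq!(raw.as_bytes(), &[0xffu8]);

    assert!(AsciiBytesLike::try_from_arg(Arg::Str("é".to_owned())).is_err());
}
```

Callers then only need `as_bytes()` and do not have to care which kind of argument was passed.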

Comment thread vm/src/byteslike.rs Outdated
))
}
}
Err(obj) => PyBytesLike::try_from_object(vm, obj).map(PyAsciiBytesLike::Buffer),
Member

You'd probably want to check that the bytes are all ASCII as well

Edit: hmm, although CPython doesn't actually care if there's non-ascii bytes in a bytestring, for both binascii and struct

Author

We don't need to check whether the bytes are ASCII: every usage errors out on any character it doesn't recognise, including bytes outside the ASCII range. That also means we could get away with not checking that the string is ASCII. The bytes are its UTF-8 encoding, so any byte below 128 is the matching ASCII character, and any byte of 128 or above is part of a non-ASCII character.
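To make the UTF-8 argument concrete, here is a small sketch. `check_format` and its set of accepted codes are hypothetical and only stand in for a consumer that rejects unrecognised bytes; they are not RustPython's parser.

```rust
/// Hypothetical recogniser: accepts only a few ASCII format codes and
/// rejects every other byte, including anything >= 128.
fn check_format(bytes: &[u8]) -> Result<(), String> {
    for &b in bytes {
        match b {
            b'<' | b'>' | b'h' | b'i' | b's' | b'0'..=b'9' => {}
            other => return Err(format!("unrecognised byte {:#04x}", other)),
        }
    }
    Ok(())
}

fn main() {
    let s = "<4sé";
    assert!(!s.is_ascii());

    // 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9, both >= 128,
    // while the ASCII characters keep their single-byte values.
    assert_eq!(s.as_bytes(), &[b'<', b'4', b's', 0xC3, 0xA9]);

    // The consumer rejects the string even without an up-front ASCII check.
    assert!(check_format(s.as_bytes()).is_err());
    assert!(check_format("<4sh".as_bytes()).is_ok());
}
```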

Comment thread vm/src/stdlib/binascii.rs Outdated
Comment on lines -21 to -41
impl TryFromObject for SerializedData {
    fn try_from_object(vm: &VirtualMachine, obj: PyObjectRef) -> PyResult<Self> {
        match_class!(match obj {
            b @ PyBytes => Ok(SerializedData::Bytes(b)),
            b @ PyByteArray => Ok(SerializedData::Buffer(b)),
            a @ PyString => {
                if a.as_str().is_ascii() {
                    Ok(SerializedData::Ascii(a))
                } else {
                    Err(vm.new_value_error(
                        "string argument should contain only ASCII characters".to_owned(),
                    ))
                }
            }
            obj => Err(vm.new_type_error(format!(
                "argument should be bytes, buffer or ASCII string, not '{}'",
                obj.class().name,
            ))),
        })
    }
}
Member

Maybe keep this? binascii does allow a bytearray, while struct doesn't:

>>> binascii.a2b_hex(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument should be bytes, buffer or ASCII string, not 'int'
>>> struct.Struct(bytearray())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Struct() argument 1 must be a str or bytes object, not bytearray

Author

PyAsciiBytesLike matches what binascii accepts more closely than this does; e.g. the following is allowed: binascii.a2b_base64(array.array('B', b'1111')). I'm also not sure whether struct.Struct(bytearray()) should be disallowed. There is a comment about allowing it: https://github.com/python/cpython/blob/67acf74c4eaf64a860cc1bcda6efe6e9cb01f89b/Modules/_struct.c#L1466

Member

The CPython source code seems to say that this is not intentional.

https://github.com/python/cpython/blob/v3.10.0/Modules/_struct.c#L1478

Member

Probably it does python/cpython#28805

Comment thread vm/src/stdlib/pystruct.rs Outdated
b'n' | b'N' | b'P' => std::mem::size_of::<usize>(),
c => {
panic!("Unsupported format code {:?}", c);
panic!("Unsupported format code {:?}", c as char);
Contributor

How about raising an exception (like ValueError) instead of panicking?
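A minimal sketch of that suggestion, without the VM: `format_code_size` is a hypothetical name, the arms other than `b'n' | b'N' | b'P'` are an illustrative subset rather than the real table, and a plain `String` error stands in for what would presumably become `vm.new_value_error(...)` in RustPython.

```rust
use std::mem::size_of;

/// Hypothetical helper: return the size of a format code, or an error
/// for an unsupported code instead of panicking.
fn format_code_size(code: u8) -> Result<usize, String> {
    match code {
        // Illustrative subset only; the real table has more codes.
        b'b' | b'B' => Ok(size_of::<u8>()),
        b'n' | b'N' | b'P' => Ok(size_of::<usize>()),
        // The caller can turn this into a Python exception.
        c => Err(format!("Unsupported format code {:?}", c as char)),
    }
}

fn main() {
    assert_eq!(format_code_size(b'P'), Ok(size_of::<usize>()));
    assert_eq!(
        format_code_size(b'z'),
        Err("Unsupported format code 'z'".to_owned())
    );
}
```

Returning a `Result` lets a bad format string surface as an ordinary error to the Python caller rather than aborting the interpreter.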

@youknowone youknowone force-pushed the ascii_bytes_like branch 2 times, most recently from 422b05f to bf0c712 Compare October 7, 2021 14:28
@youknowone youknowone mentioned this pull request Oct 9, 2021
@youknowone
Member

_struct seems to require additional work, so I split the binascii part out into #3258.
@Skinny121 do you have any advice for this PR?

@youknowone youknowone changed the title Add PyAsciiBytesLike Refactor struct module Oct 10, 2021
@youknowone youknowone merged commit a86769e into RustPython:main Oct 10, 2021