struct module documentation should have more predictable examples/warnings #99146

Open
smontanaro opened this issue Nov 5, 2022 · 7 comments
Labels
docs Documentation in the Doc dir

Comments

@smontanaro
Contributor

smontanaro commented Nov 5, 2022

Documentation

The documentation for the struct module isn't explicit about what's expected of the various examples. Working through that in a PR...

@mdickinson
Member

mdickinson commented Nov 6, 2022

+1 for the changes. The original text here, and the choice to use big-endian for the examples, dates from over 28 years ago. These days I suspect that it's rather rare that the endianness of the machine you currently happen to be working on is relevant to the data manipulation task at hand. As such, I'd consider it a best practice to always use a struct "sigil" as the first character in your format string, and if we follow that best practice in the docs then it'll help it propagate to struct users.
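
As a minimal sketch of that practice (the values 1, 2, 3 and the `hhl` layout are only illustrative), an explicit `>` or `<` prefix makes the packed bytes identical on every machine, so the expected output can be written down with confidence:

```python
import struct

# With an explicit byte-order prefix, standard sizes apply
# (h = 2 bytes, l = 4 bytes) and no alignment padding is inserted.
big = struct.pack('>hhl', 1, 2, 3)
little = struct.pack('<hhl', 1, 2, 3)

print(big)     # b'\x00\x01\x00\x02\x00\x00\x00\x03'
print(little)  # b'\x01\x00\x02\x00\x03\x00\x00\x00'

# Unpacking with the same prefix recovers the original values.
print(struct.unpack('>hhl', big))     # (1, 2, 3)
print(struct.unpack('<hhl', little))  # (1, 2, 3)
```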

@smontanaro
Contributor Author

smontanaro commented Nov 6, 2022

Thanks @mdickinson. That suggests that I should do a bit more tweaking of both the text and the examples in my PR.

@smontanaro
Contributor Author

smontanaro commented Nov 6, 2022

I think some consensus seems to have been reached in this thread. In particular, most examples should be explicit in their layout definitions.

@cameron-simpson's suggestion sums things up nicely:

... an example with
native byte order (no < or >) cannot work “as is” on all
platforms. And further, having it “just work” on the commonest
platform is actively misleading. I am AGAINST that.

I think the “just works” examples should all use < or >.

I think there needs to be at least one “native” example, and it should
be prefaced clearly that this may well not work identically on a
user’s machine because it is machine type (and compiler type) dependent.

And then it should be presented, with commentary.

I’d even advocate presenting the existing hhl example, with
contradicting example outputs from different platforms. So:

- keep the existing output, and explain the source platform and its unpadded behaviour
- add a current example (yours or any of mine) and explain its padding behaviour
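
One hedged way to present such a "native" example without promising a particular output is to compare sizes instead of bytes. The sketch below assumes nothing about the platform beyond what `struct` itself reports; the figures in the comments are examples, not guarantees:

```python
import struct
import sys

fmt = 'hhl'

# Standard size, no alignment: always 2 + 2 + 4 = 8 bytes.
standard_size = struct.calcsize('<' + fmt)

# Native size and alignment: depends on the C compiler's idea of `long`.
# For example, where long is 8 bytes and 8-byte aligned this is 16
# (2 + 2 + 4 pad bytes + 8); where long is 4 bytes it is 8.
native_size = struct.calcsize('@' + fmt)

print(sys.byteorder, standard_size, native_size)

# The packed length always matches calcsize() for the same format.
native_packed = struct.pack('@' + fmt, 1, 2, 3)
print(len(native_packed) == native_size)
```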

@smontanaro
Contributor Author

smontanaro commented Nov 6, 2022

As I'm working through some examples (and trying to update the documentation text), I find myself confused by some of the behavior. Consider these four similar struct.pack() examples (on an Apple M1 processor):

>>> import struct
>>> import sys
>>> sys.byteorder
'little'
>>> struct.pack('qqh', 1, 2, 3)
b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00'
>>> struct.pack('qqh0q', 1, 2, 3)
b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
>>> struct.pack('>qqh0q', 1, 2, 3)
b'\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03'
>>> struct.pack('<qqh0q', 1, 2, 3)
b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00'

I would have expected the 0q suffix on the format string to force padding of the output byte string to always be a multiple of eight bytes, but that's not the case:

>>> len(struct.pack('qqh', 1, 2, 3))
18
>>> len(struct.pack('qqh0q', 1, 2, 3))
24
>>> len(struct.pack('<qqh0q', 1, 2, 3))
18
>>> len(struct.pack('>qqh0q', 1, 2, 3))
18

If I don't understand what's going on here, anything I write will be gibberish...

(Edit: update my expectation to be more forceful)

@smontanaro
Contributor Author

smontanaro commented Nov 7, 2022

I switched to my Raspberry Pi, which is a little-endian 32-bit ARM machine. I get confusing (to me) results there as well.

>>> import struct
>>> import sys
>>> sys.byteorder
'little'
>>> sys.maxsize
2147483647
>>> struct.pack('llh0l', 1, 2, 3)
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
>>> struct.pack('>llh0l', 1, 2, 3)
b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03'
>>> struct.pack('<llh0l', 1, 2, 3)
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00'

If my reading of the documentation is correct:

To align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero.

the '0l' at the end of the format strings should force padding at the end of the byte string to that necessary for long (four bytes on the Raspberry Pi). It seems not to be working. What about after the 'h' but before another 'l'?

>>> struct.pack('<llh0ll', 1, 2, 3, 4)
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x04\x00\x00\x00'
>>> len(struct.pack('<llh0ll', 1, 2, 3, 4))
14

Again, I see nothing to suggest the growing byte string was padded out to a four-byte boundary before the trailing b'\x04\x00\x00\x00' was appended.

In general, the 0<char> format doesn't seem to do what the docs say it should. Looking at the code in Modules/struct.c, nothing jumped out at me that suggested it was handling that case, though I haven't convinced myself I've looked everywhere.

@mdickinson
Member

mdickinson commented Nov 7, 2022

So my understanding of what's going on is that for non-native packing and unpacking (e.g., your examples with < and >), alignment simply doesn't come into play at all, which is why it says "None" in the "Alignment" column of the table here.

E.g., we get a simple non-padded 9-byte output from the following (on any machine, in theory, but here on macOS / 64-bit Intel)

>>> import struct
>>> struct.pack('>lbl', 1, 2, 3)
b'\x00\x00\x00\x01\x02\x00\x00\x00\x03'

For the native examples, it looks to me as though you're getting the padding that you'd expect.
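
A small sketch of that contrast, using `struct.calcsize()` so it can be run on any machine. The `q_align` trick (size of `@bq` minus size of `@q`) is just one way to read off the native alignment of `q`; it is an illustration, not something the docs prescribe:

```python
import struct

# Standard modes (<, >, =, !): no alignment, so a trailing zero-count
# code contributes nothing and the size stays at 8 + 8 + 2 = 18.
assert struct.calcsize('<qqh0q') == struct.calcsize('<qqh') == 18
assert struct.calcsize('>qqh0q') == struct.calcsize('>qqh') == 18
assert struct.calcsize('=qqh0q') == 18

# Native mode (@, or no prefix): '0q' pads the end of the structure
# to the platform's alignment requirement for q.
# In '@bq' the q starts at the first properly aligned offset after the
# single byte, so the size difference is the native alignment of q.
q_align = struct.calcsize('@bq') - struct.calcsize('@q')
unpadded = struct.calcsize('@qqh')
padded = struct.calcsize('@qqh0q')

print(q_align, unpadded, padded)
```

On the Apple M1 machine from the earlier comment this corresponds to the 18-byte and 24-byte results already shown; other platforms may report different native figures.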

@mdickinson
Member

mdickinson commented Nov 7, 2022

Possibly we could reword note 3 to emphasize the contrast with the "non-native size and alignment" comment in note 2? Something like:

3. To pad the end of a structure to the alignment requirement of a particular type when using native size and alignment, end the format with the code for that type with a repeat count of zero. See [Examples](https://docs.python.org/3.11/library/struct.html#struct-examples).
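
For the non-native case, where a zero-count code has no effect, trailing padding has to be spelled out with the `x` pad-byte code. A minimal sketch, assuming a hypothetical helper named `pad_to` (it is not part of the `struct` module) that only pads the end of the format:

```python
import struct

def pad_to(fmt, boundary):
    """Hypothetical helper: append 'x' pad bytes so that
    struct.calcsize(fmt) becomes a multiple of `boundary`."""
    size = struct.calcsize(fmt)
    pad = -size % boundary          # bytes needed to reach the boundary
    return fmt + (f'{pad}x' if pad else '')

fmt = pad_to('<llh', 4)             # '<llh2x': 4 + 4 + 2 = 10, padded to 12
packed = struct.pack(fmt, 1, 2, 3)

print(fmt)     # <llh2x
print(packed)  # b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
```

The pad bytes are written as zeros when packing and skipped when unpacking, so `struct.unpack(fmt, packed)` still returns the three original values.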
