Skip to content
This repository has been archived by the owner. It is now read-only.

Re-enable AST_COMPARE #17

Merged
merged 4 commits into from Jan 4, 2018
Merged

Re-enable AST_COMPARE #17

merged 4 commits into from Jan 4, 2018

Conversation

@azz
Copy link
Member

azz commented Jan 3, 2018

Printers can now specify custom clean functions for AST massaging.

Fixes #12

@@ -21,5 +21,4 @@ before_install:
- pyenv install -s 3.6.3
- pyenv global 2.7.11 3.6.3
script:
- yarn lint

This comment has been minimized.

@azz

azz Jan 3, 2018 Author Member

FYI @j-f1 The linting is integrated into the Jest run.

"test": "jest"
"lint": "jest -c jest.eslint.config.js",
"test": "jest",
"prettier": "prettier --plugin=. --parser=python"

This comment has been minimized.

@azz

azz Jan 3, 2018 Author Member

Useful for testing. yarn prettier test.py or echo 'foo' | yarn prettier

This comment has been minimized.

@taion

taion Jan 3, 2018 Contributor

shouldn't need the explicit --parser?

This comment has been minimized.

@azz

azz Jan 4, 2018 Author Member

Not always, but --debug-check requires it, so it can't hurt.

"test": "jest"
"lint": "jest -c jest.eslint.config.js",
"test": "jest",
"prettier": "prettier --plugin=. --parser=python"

This comment has been minimized.

@taion

taion Jan 3, 2018 Contributor

shouldn't need the explicit --parser?

const parent = path.getParentNode();
// This is overly cautious, we may want to revisit and normalize more
// cases here.
return /[ex.]/i.test(parent.source) ? parent.source : `${n.n}`;

This comment has been minimized.

@taion

taion Jan 3, 2018 Contributor

there is probably no reason to ever normalize these. consider e.g. PEP 515 section dividers, plus in general you can't represent all python numbers exactly with the JS number type anyway

This comment has been minimized.

@azz

azz Jan 4, 2018 Author Member

Changed it to always return parent.source

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

probably just apply this check for Num then, and don't bother with the special-case number handling

e.g. you can't differentiate 1.0j from 1j in the AST anyway, but those should not be reprinted

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

i mean in astexport as well

@@ -305,7 +305,7 @@ function genericPrint(path, options, print) {
const parent = path.getParentNode();

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

Instead of reaching to the parent here, just drop visit_field_Num_n from astexport, then just print path.source in the handler for Num. As noted above, this is the right thing to do for complex values too.

In general it's probably best not to try to normalize numbers in Python. People do lots of numerical work, and if I'm writing out something like 1.00, it's probably for a good reason – like maybe it's np.array([0.99, 1.00, 1.01]) or something.

This comment has been minimized.

@azz

azz Jan 4, 2018 Author Member

Can't quite do that as complex is not JSON serializable. Having the source type information may be useful too, so I'll just coerce them to str on the Python side (assuming that is safe?).

@taion
taion approved these changes Jan 4, 2018
Copy link
Contributor

taion left a comment

modulo that one comment on visit_field_Num_n but it sort of doesn't matter

@@ -87,18 +87,18 @@ def visit_field_Num_n(self, val):
if isinstance(val, int):

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

ah, okay – this should probably just be

def visit_field_Num_n(val):
    return str(val)

this value won't actually be used in the printer (which as with the JS printer will only look at the raw string value), so anything JSON-serializable that will work for the AST comparison will be fine

This comment has been minimized.

@azz

azz Jan 4, 2018 Author Member

You lose a bit of information in doing this, does it hurt to keep around the source type?

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

i don't think you do, actually, relative to what's in the AST:

In [1]: str(1)
Out[1]: '1'

In [2]: str(1.)
Out[2]: '1.0'

In [3]: str(0xDEADBEEF)
Out[3]: '3735928559'

In [4]: str(1_000_001)
Out[4]: '1000001'

In [5]: str(5j)
Out[5]: '5j'

In [6]: str(3-4j)
Out[6]: '(3-4j)'

there might be some edge cases on float rounding, but you'd already encounter them with the above

let ppastMassaged;
let pperr = null;
try {
const ppast = parse(
prettyprint(source, path, mergedOptions),
mergedOptions
);
ppastMassaged = massageAST(ppast);
ppastMassaged = massageAST(ppast, normalizedOptions);

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

so this is definitely a value-add, but it's not an ideal API. for purposes of comparison, given a python AST, you can call

ast.dump(tree, annotate_fields=False)

this will give you a string representation of the AST – not the most useful thing for what we're formatting, but you can then just do string equality to compare.

e.g. for this python 2 example:

In [15]: print(s)

with A() as a, B() as b:
    pass


In [16]: print(rt)
with A() as a:
    with B() as b:
        pass


In [17]: print(ast.dump(ast.parse(s), annotate_fields=False))
Module([With(Call(Name('A', Load()), [], [], None, None), Name('a', Store()), [With(Call(Name('B', Load()), [], [], None, None), Name('b', Store()), [Pass()])])])

In [18]: print(ast.dump(ast.parse(rt), annotate_fields=False))
Module([With(Call(Name('A', Load()), [], [], None, None), Name('a', Store()), [With(Call(Name('B', Load()), [], [], None, None), Name('b', Store()), [Pass()])])])

(the two dumped AST strings above are identical)

the difference being in this case that we'd be delegating everything to the ground truth AST handler, such that if there were a bug with the AST -> JSON conversion, we would know

the sort of essential problem from a correctness perspective is that we're not really dealing with a canonical python AST here

funny enough you can't actually assert abstract equality between ASTs in python without dumping them – at not least until python/cpython#1368 lands

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

tl;dr it would be a better upstream API to have a parseAndMassage hook instead of a massageAST hook. this is because there's a built-in way to canonicalize a python AST to a string that can be used for equality comparisons, but the functionality to convert the python AST to JSON is code under test here

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

Actually, astor.strip_tree will clean up Python ASTs such that equality checks on them in Python are possible, but I guess there's not a good way to pretty-print the diff for test failures.

This comment has been minimized.

@azz

azz Jan 4, 2018 Author Member

The nested with case is something we wouldn't change in Prettier for JS. e.g. we would never change:

if (foo) {
  if (bar) {
  }
}

to

if (foo && bar) {
}

There's a reason we massage the AST after we have printed it, as the cleanup often removes properties or nodes that are critical information required to print the code.

This comment has been minimized.

@taion

taion Jan 4, 2018 Contributor

that was an illustrative example, showing how this process produces identical dumped ASTs even when the source is quite different in terms of concrete syntax

as it happens, for with, the python 2 AST doesn't distinguish at all between those cases – hence the heuristic in https://github.com/prettier/prettier-python/pull/15/files#diff-ab7972325670f1753462f9b0e6c57900R138

like literally those two things above are identical from the perspective of the parser in python 2 (though not so in python 3)

@azz azz merged commit a58a6ce into prettier:master Jan 4, 2018
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@azz azz deleted the azz:clean-ast branch Jan 4, 2018
@azz
Copy link
Member Author

azz commented Jan 4, 2018

If you want to use astor.dump_tree instead of/as well as massageAst, feel free to open a PR for that.

@taion
Copy link
Contributor

taion commented Jan 4, 2018

It wouldn't be straightforward to use massageAst, since dump_tree wants a Python AST. I could replace that area of the source comparison entirely. Does that make sense if run_spec gets pulled upstream, though? I guess as an extra check?

@azz
Copy link
Member Author

azz commented Jan 4, 2018

Not sure I'm following, why would it be difficult to send two strings (input and output) to Python to run parse and dump_tree on both, then send back the two trees as strings?

@taion
Copy link
Contributor

taion commented Jan 4, 2018

I mean with massageAstNode – that only gets the AST node and not the original source, no?

@azz
Copy link
Member Author

azz commented Jan 4, 2018

Correct. If we want this to work for both run_spec and --debug-check we'll need to add another hook to use a custom checker rather than the generic massageAst.

@taion
Copy link
Contributor

taion commented Jan 4, 2018

Yes – that's what I meant in prettier/prettier#3610 (comment).

Though it turns out the AST here does include the full original source code (usually many times over), but we should probably remove that behavior: 59dcdbe#r26632592.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.