BUG: Fix new DTypes and new string promotion when signature is involved #26744
| signature[1] == &PyArray_UnicodeDType && | ||
| signature[2] == &PyArray_UnicodeDType) { | ||
| /* Unicode forced, but didn't override a string input: invalid */ | ||
| return -1; |
This part makes me wonder if I should just check it after the promoter is done and invalidate the result if this is violated. But it is OK here also.
I agree it would be better to enforce that there, if only because IMO DType authors shouldn't have to worry about that case or add code to account for it to write a correct DType.
Indeed, it seems strange one would even get here if the signature is already clear that StringDType should not be involved.
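The alternative raised in this thread, validating the promoter's output centrally instead of inside each promoter, could look roughly like the sketch below. It is a pure-Python model, not NumPy code: `promoter_result_is_valid` and the two stand-in classes are hypothetical names, and `None` plays the role of a `NULL` signature entry.

```python
class UnicodeDType:
    """Stand-in for PyArray_UnicodeDType (illustrative only)."""


class StringDType:
    """Stand-in for PyArray_StringDType (illustrative only)."""


def promoter_result_is_valid(signature, new_op_dtypes):
    """Return True if the promoter honoured every fixed signature entry.

    A None entry in `signature` means the caller did not fix that slot,
    so any DType chosen by the promoter is acceptable there.
    """
    return all(
        fixed is None or fixed is chosen
        for fixed, chosen in zip(signature, new_op_dtypes)
    )


# Only the output was fixed (as with `dtype=`), and the promoter kept it.
ok = promoter_result_is_valid(
    (None, None, StringDType), (StringDType, StringDType, StringDType))

# Unicode was forced everywhere, but the promoter picked StringDType.
bad = promoter_result_is_valid(
    (UnicodeDType, UnicodeDType, UnicodeDType),
    (StringDType, StringDType, StringDType))
```

With a check like this in the dispatching code, a DType author's promoter would not need its own "Unicode forced" branch.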
I think it would be completely fine to just backport the string promotion changes, they are trivial and fix the issue. The rest just fixes some even more niche things and the bad error.
| if (add_promoter(umath, "add", out_strings_promoter_dtypes, 3, | ||
| all_strings_promoter) < 0) { | ||
| return -1; | ||
| } |
| assert_array_equal(arr + op, rresult) | ||
| # The promoter should be able to handle things if users pass `dtype=` | ||
| res = np.add("hello", string_list, dtype=StringDType) |
Probably not worth using the dtype fixture for this since na_object and coerce don't matter, but maybe worth making dtype a parameter of the test that can either be StringDType or StringDType(). I could also see perhaps defining a dtype_class_or_instance fixture and using that in a few other places in this file where we just test with StringDType() or "T".
That doesn't work. Signatures are DType classes. It should work at least for "T", but otherwise we would need to look into the logic to say "this is OK".
I am not sure it should be, but it is a different issue in either case.
Ah thanks for explaining. I saw that error before but thought this change made dtype instances OK.
Yeah, no, that happens much earlier. I explicitly allowed the singleton instances of legacy dtypes (or maybe all singleton instances, not sure), because otherwise things would be tricky.
But we now also have the "give me the DType" logic, which maybe (not sure!) makes "T" work. It should make "T" work in either case, though.
ngoldbaum left a comment
I looked at the refactor for allow_legacy_promotion for a while. I like the simplification of only needing to worry about that in dispatching.c and removing it from ufunc_object.c and I think that it should be the same behavior-wise for all callers, so I think the cleanup is OK and probably safe to backport.
mhvk left a comment
I agree the refactoring is an improvement regardless! But I do wonder about the changes to the promoter: it doesn't feel right for StringDType to decide on something that will not in the end lead to a StringDType result (but perhaps I am confused...)
| * a custom DType registered, and then we should use that. | ||
| * Further, `np.float64` is a double subclass, so must reject it. | ||
| */ | ||
| // TODO,NOTE: This function should be changed to do exact long checks |
Is there an issue for this? Otherwise, we'll probably find this for numpy 2.13 or so...
Created a specific one that is milestoned.
| if (op_dtypes[i] != NULL && !NPY_DT_is_legacy(op_dtypes[i]) && ( | ||
| signature[i] != NULL || // signature cannot be a pyscalar | ||
| !(PyArray_FLAGS(ops[i]) & NPY_ARRAY_WAS_PYTHON_LITERAL))) { | ||
| allow_legacy_promotion = NPY_FALSE; |
Might it be better to call this variable "legacy_promotion_is_possible"? (no big deal, obviously, but since it is now an internal variable, might as well make it as clear as possible)
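For readers following the condition in the hunk above, here is a minimal pure-Python model of it, using the name proposed in this comment. The `Operand` dataclass and its fields are hypothetical stand-ins for `op_dtypes[i]`, `NPY_DT_is_legacy(...)` and the `NPY_ARRAY_WAS_PYTHON_LITERAL` flag; which operands the real loop iterates over is not visible in the quoted diff, so the sketch simply checks whatever it is given.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    has_dtype: bool           # models op_dtypes[i] != NULL
    is_legacy: bool           # models NPY_DT_is_legacy(op_dtypes[i])
    was_python_literal: bool  # models NPY_ARRAY_WAS_PYTHON_LITERAL being set


def legacy_promotion_is_possible(operands, signature):
    """Mirror the quoted C condition; None models a NULL signature entry."""
    for op, sig in zip(operands, signature):
        if op.has_dtype and not op.is_legacy and (
                sig is not None  # a signature entry cannot be a Python scalar
                or not op.was_python_literal):
            return False
    return True


legacy_array = Operand(has_dtype=True, is_legacy=True, was_python_literal=False)
new_array = Operand(has_dtype=True, is_legacy=False, was_python_literal=False)
new_literal = Operand(has_dtype=True, is_legacy=False, was_python_literal=True)

only_legacy = legacy_promotion_is_possible(
    [legacy_array, legacy_array], [None, None])
with_new_array = legacy_promotion_is_possible(
    [legacy_array, new_array], [None, None])
literal_unfixed = legacy_promotion_is_possible(
    [legacy_array, new_literal], [None, None])
literal_fixed = legacy_promotion_is_possible(
    [legacy_array, new_literal], [None, object])
```

In this model a non-legacy DType that comes from a Python literal only blocks legacy promotion when the signature fixes that slot, which is the case the moved check is meant to cover.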
| if (allow_legacy_promotion && ((PyArray_NDIM(op1_array) == 0) | ||
| != (PyArray_NDIM(op2_array) == 0))) { | ||
| if ((PyArray_NDIM(op1_array) == 0) | ||
| != (PyArray_NDIM(op2_array) == 0)) { |
Fix the indentation while you are at it? Both of this line and of the actual clause.
| op_dtypes[1] != &PyArray_StringDType && | ||
| op_dtypes[2] != &PyArray_StringDType) { | ||
| /* | ||
| * This promoter was triggered with only unicode arguments, so use |
This seems confusing - should StringDType really decide the result for something that does not include itself? Shouldn't that be up to UnicodeDType? What happens if we return -1 here?
The operation would just fail; I am very sure I added it for a reason. The problem is that you would have to decide that UU->U is clearly better than UU->T. But there is nothing to decide that by, so the machinery prefers the promoter, since the promoter has the ability to resolve which one to actually use (and can also go to UU->U).
The other thing you might not like is that it matches at all, but that would need one of the following new features (which is fine):
- Always call promoters right away even if there may be much better matches and give them the ability to say "don't know" (i.e. a -1 return might be that without an error set).
- Allow some heuristic for it like, "must contain this DType" as an additional dtype.
- Allow "matching" via a second function.
The other solution for the particular case is that if there wasn't legacy promotion involved, I would like a default promoter that ensures that ufunc(..., dtype=X) will search for ufunc(..., signature=(X, X, X)) as well if there is otherwise no match. That would do the right thing here.
It was probably good to get this in, but it still feels weird to have a promoter for a given dtype return a result that does not involve that type at all: how can it decide for another type what is acceptable? Two of your solutions sound reasonable: returning the equivalent of NotImplemented (your -1 with no error set), or a default promoter. Maybe worth a new issue?
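The behaviour debated in this thread can be summarised with a small pure-Python model of the promoter, pieced together from the two C fragments quoted in this review. Several things are assumptions: the stand-in classes, the use of `None` for both a `NULL` entry and the `-1` return, the guess that the elided first line of each condition checks index 0 in the same way, and the final fall-through to `StringDType`, which the quoted diff does not show.

```python
class UnicodeDType:
    """Stand-in for PyArray_UnicodeDType (illustrative only)."""


class StringDType:
    """Stand-in for PyArray_StringDType (illustrative only)."""


def all_strings_promoter(op_dtypes, signature):
    """Return the promoted DTypes, or None to model the C `return -1`."""
    if all(dt is not StringDType for dt in op_dtypes):
        # Triggered with only unicode arguments, so use unicode.
        return (UnicodeDType, UnicodeDType, UnicodeDType)
    if all(sig is UnicodeDType for sig in signature):
        # Unicode forced, but it did not override a string input: invalid.
        return None
    # Assumed fall-through: promote everything to StringDType.
    return (StringDType, StringDType, StringDType)


unicode_only = all_strings_promoter(
    (UnicodeDType, UnicodeDType, None), (None, None, None))
forced_unicode = all_strings_promoter(
    (StringDType, UnicodeDType, None),
    (UnicodeDType, UnicodeDType, UnicodeDType))
mixed = all_strings_promoter(
    (UnicodeDType, StringDType, None), (None, None, StringDType))
```

The first branch is the one questioned here: the promoter registered for `StringDType` resolves a case that contains no `StringDType` at all. A "don't know" return, as suggested, would be a third outcome that this model does not include.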
3a9559d to da25df6
I'd like to see this merged to fix the bug. Let's figure out how to improve promoters in followups.
Actually @seberg could you rebase and then merge this assuming the tests pass? Looks like doctests are failing because the merge base is a couple weeks old. I'm sure everything is fine, I'd just rather not unintentionally break CI.
Moving it to later makes it much simpler to consider also the signature in the decision, which is necessary to get things right. In practice, it might also be possible to just reject it later, but this seemed actually simpler.
Also makes it reject forced unicode selection, since that doesn't work.
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
da25df6 to 1d1c0c0
Unfortunately, did a bit of cleanup to make this work, so this needs a bit of a careful look.
I.e. the first commit moves the "allow legacy promotion" logic to later, where the signature is more readily available. That shouldn't change anything else (the only reason it is used earlier is to decide if checking for "should use scalar" makes sense, but that code handles unknown dtypes just fine).
It then fixes the string add promoter to close the issue.
Closes gh-26735