
BUG: Fix new DTypes and new string promotion when signature is involved#26744

Merged
ngoldbaum merged 5 commits into
numpy:mainfrom
seberg:promote-new-dtypes
Jul 6, 2024

@seberg (Member) commented Jun 18, 2024

Unfortunately, I had to do a bit of cleanup to make this work, so this needs a careful look.

The first commit moves the "allow legacy promotion" logic to a later point, where the signature is readily available. That shouldn't change anything else (the only reason it was computed earlier is to decide whether checking for "should use scalar" makes sense, but that code handles unknown DTypes just fine).

It then fixes the string add promoter to close the issue.

Closes gh-26735

@seberg seberg requested a review from ngoldbaum June 18, 2024 15:14
@seberg seberg added the 09 - Backport-Candidate PRs tagged should be backported label Jun 18, 2024
signature[1] == &PyArray_UnicodeDType &&
signature[2] == &PyArray_UnicodeDType) {
/* Unicode forced, but didn't override a string input: invalid */
return -1;
Member Author

This part makes me wonder if I should just check it after the promoter is done and invalidate the result if this is violated. But it is also OK here.

Member

I agree it would be better to enforce that there, if only because IMO DType authors shouldn't have to worry about that case or add code to account for it to write a correct DType.

Contributor

Indeed, it seems strange one would even get here if the signature is already clear that StringDType should not be involved.
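The idea raised in this thread, validating once after the promoter runs instead of inside every promoter, can be sketched in plain Python. This is a hypothetical model, not NumPy's dispatching code: DTypes are stand-in strings ("U" for the fixed-width unicode DType, "T" for StringDType), `None` marks a signature slot the user left open, and `respects_signature`, `run_promoter`, and `naive_string_promoter` are illustrative names.

```python
from typing import Callable, Optional, Sequence, Tuple

# Stand-in DType labels (assumption): "U" = unicode DType, "T" = StringDType.
Sig = Sequence[Optional[str]]
Promoter = Callable[[Sig, Sig], Optional[Tuple[str, ...]]]


def respects_signature(signature: Sig, promoted: Sequence[str]) -> bool:
    """True if every DType fixed by the user's signature survived promotion."""
    if len(signature) != len(promoted):
        raise ValueError("signature and promoted DTypes differ in length")
    return all(fixed is None or fixed == got
               for fixed, got in zip(signature, promoted))


def run_promoter(promoter: Promoter, op_dtypes: Sig,
                 signature: Sig) -> Optional[Tuple[str, ...]]:
    """Call a promoter, then centrally discard results that break the signature."""
    promoted = promoter(op_dtypes, signature)
    if promoted is None or not respects_signature(signature, promoted):
        return None  # treated as "no matching loop" instead of a wrong loop
    return tuple(promoted)


def naive_string_promoter(op_dtypes: Sig, signature: Sig) -> Tuple[str, ...]:
    """Hypothetical promoter that ignores the signature and picks StringDType."""
    return ("T",) * len(op_dtypes)
```

In this model a promoter author never has to re-check the signature: with `signature=(None, None, "U")` the naive promoter's `("T", "T", "T")` is rejected by `run_promoter` itself.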

@seberg (Member Author) commented Jun 18, 2024

I think it would be completely fine to just backport the string promotion changes; they are trivial and fix the issue. The rest just fixes some even more niche things and the bad error.

if (add_promoter(umath, "add", out_strings_promoter_dtypes, 3,
all_strings_promoter) < 0) {
return -1;
}
Member

Good catch!

assert_array_equal(arr + op, rresult)

# The promoter should be able to handle things if users pass `dtype=`
res = np.add("hello", string_list, dtype=StringDType)
Member

@ngoldbaum ngoldbaum Jun 18, 2024

Probably not worth using the dtype fixture for this since na_object and coerce don't matter, but maybe worth making dtype a parameter of the test that can be either StringDType or StringDType(). I could also see defining a dtype_class_or_instance fixture and using it in a few other places in this file where we just test with StringDType() or "T".

Member Author

That doesn't work: signatures are DType classes. It should work at least for "T"; otherwise we would need to look into logic to say "this is OK".
I am not sure it should be, but it is a different issue in either case.

Member

Ah thanks for explaining. I saw that error before but thought this change made dtype instances OK.

Member Author

Yeah, no, that happens much earlier: I explicitly allowed the singleton instances of legacy dtypes (or maybe all singleton instances, not sure), because otherwise things would be tricky.

But we now also have "give me the DType", which maybe (not sure!) makes "T" work. It should make "T" work in either case, though.
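The rule described in this thread (signature entries are DType classes, and a singleton instance of a legacy dtype is accepted and mapped back to its class) can be modelled in a few lines. `DTypeClass`, `Descriptor`, and `normalize_signature_entry` are hypothetical stand-ins, not NumPy internals, and the model assumes only legacy singletons are accepted, which the comment above leaves open ("or maybe all singleton instances").

```python
from typing import Optional, Union


class DTypeClass:
    """Stand-in for a DType class such as Float64DType or StringDType."""

    def __init__(self, name: str, is_legacy: bool):
        self.name = name
        self.is_legacy = is_legacy
        # Assumption: only legacy DTypes get a canonical "singleton" instance.
        self.singleton: Optional["Descriptor"] = Descriptor(self) if is_legacy else None


class Descriptor:
    """Stand-in for a dtype instance belonging to some DType class."""

    def __init__(self, dtype_class: DTypeClass):
        self.dtype_class = dtype_class


SignatureEntry = Union[None, DTypeClass, Descriptor]


def normalize_signature_entry(entry: SignatureEntry) -> Optional[DTypeClass]:
    """Map a signature entry to a DType class, or reject it."""
    if entry is None or isinstance(entry, DTypeClass):
        return entry
    cls = entry.dtype_class
    if cls.is_legacy and cls.singleton is entry:
        return cls  # the singleton instance just stands for its class
    raise TypeError(
        f"signature entries must be DType classes, got an instance of {cls.name}")


FLOAT64 = DTypeClass("Float64DType", is_legacy=True)
STRING = DTypeClass("StringDType", is_legacy=False)
```

Under these assumptions an instance such as a StringDType descriptor is rejected, matching the error discussed above, while the class itself is always accepted.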

Member

@ngoldbaum ngoldbaum left a comment

I looked at the refactor for allow_legacy_promotion for a while. I like the simplification of only needing to worry about that in dispatching.c and removing it from ufunc_object.c and I think that it should be the same behavior-wise for all callers, so I think the cleanup is OK and probably safe to backport.

Comment thread numpy/_core/src/umath/stringdtype_ufuncs.cpp Outdated
Comment thread numpy/_core/src/umath/stringdtype_ufuncs.cpp Outdated
Contributor

@mhvk mhvk left a comment

I agree the refactoring is an improvement regardless! But I do wonder about the changes to the promoter: it doesn't feel right for StringDType to decide on something that will not in the end lead to a StringDType result (but perhaps I am confused...)

* a custom DType registered, and then we should use that.
* Further, `np.float64` is a double subclass, so must reject it.
*/
// TODO,NOTE: This function should be changed to do exact long checks
Contributor

Is there an issue for this? Otherwise, we'll probably only rediscover this in NumPy 2.13 or so...

Member Author

Created a specific one that is milestoned.
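The code comment above notes that `np.float64` subclasses Python's `float`, so a subclass-accepting check would treat it like a Python float, and the TODO asks for exact `long` checks. The difference between the two kinds of check can be shown in plain Python; in the CPython C API it corresponds to `PyFloat_Check` versus `PyFloat_CheckExact` and `PyLong_Check` versus `PyLong_CheckExact`. `Float64Like` is a hypothetical stand-in so the sketch runs without NumPy, while `bool` really is an `int` subclass.

```python
class Float64Like(float):
    """Hypothetical stand-in for np.float64, which subclasses Python's float."""


def is_float_loose(x) -> bool:
    return isinstance(x, float)  # accepts subclasses, like PyFloat_Check


def is_float_exact(x) -> bool:
    return type(x) is float      # exact type only, like PyFloat_CheckExact


def is_int_exact(x) -> bool:
    return type(x) is int        # rejects bool and other int subclasses
```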

Comment thread numpy/_core/src/umath/dispatching.c Outdated
if (op_dtypes[i] != NULL && !NPY_DT_is_legacy(op_dtypes[i]) && (
signature[i] != NULL || // signature cannot be a pyscalar
!(PyArray_FLAGS(ops[i]) & NPY_ARRAY_WAS_PYTHON_LITERAL))) {
allow_legacy_promotion = NPY_FALSE;
Contributor

Might it be better to call this variable "legacy_promotion_is_possible"? (no big deal, obviously, but since it is now an internal variable, might as well make it as clear as possible)
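The condition in the `dispatching.c` fragment above can be restated as a small Python model, using the variable name suggested in this comment. `DType` and `Operand` are hypothetical stand-ins: `is_legacy` plays the role of `NPY_DT_is_legacy`, `was_python_literal` the role of the `NPY_ARRAY_WAS_PYTHON_LITERAL` flag, and `None` a `NULL` entry. It models only the fragment shown, not the full dispatching logic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DType:
    name: str
    is_legacy: bool  # models NPY_DT_is_legacy()


@dataclass(frozen=True)
class Operand:
    dtype: Optional[DType]            # None models a NULL op_dtypes entry
    was_python_literal: bool = False  # models NPY_ARRAY_WAS_PYTHON_LITERAL


def legacy_promotion_is_possible(ops: Sequence[Operand],
                                 signature: Sequence[Optional[DType]]) -> bool:
    for op, fixed in zip(ops, signature):
        if op.dtype is None or op.dtype.is_legacy:
            continue  # NULL or legacy DTypes never block legacy promotion
        # A new-style DType blocks legacy promotion unless it comes from a
        # Python literal whose slot the signature leaves open.
        if fixed is not None or not op.was_python_literal:
            return False
    return True


FLOAT64 = DType("float64", is_legacy=True)
STRING = DType("StringDType", is_legacy=False)
PY_INT = DType("abstract Python int", is_legacy=False)  # hypothetical label
```

Because the signature is an argument here, the decision naturally lives where the signature is known, which is the point of moving this logic later in the first commit.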

Comment thread numpy/_core/src/umath/ufunc_object.c Outdated
if (allow_legacy_promotion && ((PyArray_NDIM(op1_array) == 0)
!= (PyArray_NDIM(op2_array) == 0))) {
if ((PyArray_NDIM(op1_array) == 0)
!= (PyArray_NDIM(op2_array) == 0)) {
Contributor

Fix the indentation while you are at it? Both on this line and in the actual clause.

op_dtypes[1] != &PyArray_StringDType &&
op_dtypes[2] != &PyArray_StringDType) {
/*
* This promoter was triggered with only unicode arguments, so use
Contributor

This seems confusing - should StringDType really decide the result for something that does not include itself? Shouldn't that be up to UnicodeDType? What happens if we return -1 here?

Member Author

The operation would just fail; I am very sure I added it for a reason. The problem is that you would have to decide that UU->U is clearly better than UU->T, but there is nothing to base that decision on. So the machinery prefers the promoter, since the promoter has the ability to resolve which loop to actually use (and it can also go to UU->U).

The other thing you might not like is that it matches at all, but avoiding that would need one of the following new features (which is fine):

  • Always call promoters right away even if there may be much better matches and give them the ability to say "don't know" (i.e. a -1 return might be that without an error set).
  • Allow some heuristic for it like, "must contain this DType" as an additional dtype.
  • Allow "matching" via a second function.
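The first option in this list, promoters that are always called first and may answer "don't know", can be sketched with Python's `NotImplemented` standing in for "-1 without an error set". Everything here is hypothetical: loops are tuples of stand-in DType labels ("U" for unicode, "T" for StringDType), a loop matches only when every known DType is identical (real dispatching also considers casting), and `find_loop`, `string_promoter`, and `resolve` are illustrative names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Loop = Tuple[str, ...]
Wanted = Sequence[Optional[str]]


def find_loop(wanted: Wanted, loops: Sequence[Loop]) -> Optional[Loop]:
    """Return the first registered loop agreeing with every known DType."""
    for loop in loops:
        if len(loop) == len(wanted) and all(
                w is None or w == have for w, have in zip(wanted, loop)):
            return loop
    return None


def string_promoter(wanted: Wanted):
    """Hypothetical StringDType promoter that declines calls without "T"."""
    if "T" not in wanted:
        return NotImplemented  # "don't know": leave the decision to others
    return ("T",) * len(wanted)


def resolve(wanted: Wanted, loops: Sequence[Loop],
            promoters: Sequence[Callable]) -> Loop:
    # Promoters are asked right away; NotImplemented is not an error.
    for promoter in promoters:
        result = promoter(wanted)
        if result is NotImplemented:
            continue
        loop = find_loop(result, loops)
        if loop is not None:
            return loop
    loop = find_loop(wanted, loops)
    if loop is None:
        raise TypeError(f"no loop found for {tuple(wanted)}")
    return loop


LOOPS: List[Loop] = [("U", "U", "U"), ("T", "T", "T")]
```

In this model the all-unicode call falls through to the ordinary UUU match, so the StringDType promoter never decides a result that does not involve StringDType.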

Another solution for this particular case: if no legacy promotion is involved, I would like a default promoter ensuring that ufunc(..., dtype=X) also searches for ufunc(..., signature=(X, X, X)) if there is otherwise no match. That would do the right thing here.
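The proposed default promoter can be sketched in a similar hypothetical model. It assumes, as for NumPy ufuncs, that `dtype=X` pins only the output, acting like `signature=(None, None, X)`, and it retries with `(X, X, X)` only when that finds no loop. `find_loop` and `resolve_with_dtype` are illustrative names, matching ignores casting, and the "no legacy promotion involved" condition is left out.

```python
from typing import Optional, Sequence, Tuple

Loop = Tuple[str, ...]


def find_loop(wanted: Sequence[Optional[str]],
              loops: Sequence[Loop]) -> Optional[Loop]:
    """Return the first loop agreeing with every known DType (None = open)."""
    for loop in loops:
        if len(loop) == len(wanted) and all(
                w is None or w == have for w, have in zip(wanted, loop)):
            return loop
    return None


def resolve_with_dtype(in_dtypes: Sequence[str], loops: Sequence[Loop],
                       dtype: Optional[str] = None) -> Loop:
    # dtype=X pins only the output slot, like signature=(None, ..., X).
    wanted = tuple(in_dtypes) + (dtype,)
    loop = find_loop(wanted, loops)
    if loop is None and dtype is not None:
        # Default-promoter fallback: retry as if signature=(X, X, ..., X).
        loop = find_loop((dtype,) * len(wanted), loops)
    if loop is None:
        raise TypeError(
            f"no loop found for inputs {tuple(in_dtypes)}, dtype={dtype!r}")
    return loop


LOOPS = [("U", "U", "U"), ("T", "T", "T")]
```

With these two loops, unicode inputs plus `dtype="T"` find no UUT loop and fall back to TTT, which is the behavior the new `np.add("hello", string_list, dtype=StringDType)` test expects.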

Contributor

It was probably good to get this in, but it still feels weird to have a promoter for a given dtype return a result that does not involve that type at all: how can it decide for another type what is acceptable? Two of your solutions sound reasonable: returning the equivalent of NotImplemented (your -1 with no error set), or a default promoter. Maybe worth a new issue?

@seberg seberg force-pushed the promote-new-dtypes branch from 3a9559d to da25df6 Compare July 4, 2024 14:31
@seberg seberg requested a review from ngoldbaum July 5, 2024 10:19
@ngoldbaum
Member

I'd like to see this merged to fix the bug. Let's figure out how to improve promoters in followups.

@ngoldbaum
Member

Actually, @seberg, could you rebase and then merge this, assuming the tests pass? It looks like doctests are failing because the merge base is a couple of weeks old. I'm sure everything is fine; I'd just rather not unintentionally break CI.

seberg and others added 5 commits July 6, 2024 11:33
Moving it to later makes it much simpler to consider also the
signature in the decision, which is necessary to get things right.

In practice, it might also be possible to just reject it later,
but this seemed actually simpler.
Also makes it reject forced unicode selection, since that doesn't
work.
Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@seberg seberg force-pushed the promote-new-dtypes branch from da25df6 to 1d1c0c0 Compare July 6, 2024 09:33
@ngoldbaum ngoldbaum merged commit 0bf9c46 into numpy:main Jul 6, 2024
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jul 7, 2024

Development

Successfully merging this pull request may close these issues.

BUG: new DType in signature not yet supported with np.strings.add

4 participants