BUG: Quantile closest_observation to round to nearest even order#26769

Merged
charris merged 3 commits into
numpy:mainfrom
aherbert:fix-closest-observation
Jul 14, 2024

Conversation

@aherbert
Contributor

Detection of an even order statistic (1-based) must check for an odd index due to use of 0-based indexing.

See #26656

@aherbert aherbert force-pushed the fix-closest-observation branch from 4ac0e4f to 4e2accb Compare June 21, 2024 10:44
@@ -0,0 +1,5 @@
`quantile` method ``closest_observation`` chooses nearest even order statistic
Contributor

Suggested change
- `quantile` method ``closest_observation`` chooses nearest even order statistic
+ `np.quantile` with method ``closest_observation`` chooses nearest even order statistic

@@ -0,0 +1,8 @@
from typing import Any
Contributor

These changes are not related to the PR, right?

Comment thread numpy/lib/_function_base_impl.py Outdated
def _weights_are_valid(weights, a, axis):
"""Validate weights array.

Contributor

I suspect these changes come from an autoformatter. To avoid churn, some projects prefer not to touch existing code even when it does not comply with the formatting conventions. I am not sure what the numpy policy is, though.

@aherbert
Contributor Author

I had to disable the setting in my text editor that strips trailing whitespace. All should be fixed now.

I am not sure why GH thinks I have changed numpy/typing/tests/data/pass/ma.py and numpy/ma/extras.pyi. The changes are from commit e158ef6. It seems I may have collected a commit from main when I did my checkout that has since been removed. The commit now seems to be dd450d9.

If I rebase on main, this is rectified. Do you want me to push a rebase?

@eendebakpt
Contributor

Yes, a rebase seems good here.

aherbert added 3 commits June 24, 2024 23:17
Detection of an even order statistic (1-based) must check for an odd
index due to use of 0-based indexing.

See numpy#26656
@aherbert aherbert force-pushed the fix-closest-observation branch from b5f3fef to 796b718 Compare June 24, 2024 22:17
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Jun 25, 2024
@aherbert aherbert requested a review from eendebakpt July 4, 2024 14:39
``g = (1 + int(index - j > 0)) / 2``
3. ``closest_observation``: ``m = -1/2`` and
- ``1 - int((g == 0) & (j%2 == 0))``
+ ``g = 1 - int((index == j) & (j%2 == 1))``
Contributor

Maybe I am not reading this correctly, but the value m=-1/2 does not make sense to me. Suppose we have n=4 samples and q=0. Then j = (q*n + m - 1) // 1 = (-3/2) // 1, so j=-2. The formula for the quantile, (1-g)*y[j] + g*y[j+1], then cannot be evaluated because neither index is valid: y[j]=y[-2] and y[j+1]=y[-1].
(Note: I am also struggling with this in the documentation for main.)

Contributor Author

Note that the docs state:

Note that indices j and j + 1 are clipped to the range 0 to n - 1 when the results
of the formula would be outside the allowed range of non-negative indices. 
The - 1 in the formulas for j and g accounts for Python’s 0-based indexing.

So with q=0 you can generate a negative index and the function will return the bound.
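To make the clipping concrete, here is a minimal pure-Python sketch of the index arithmetic from the quoted docs. It is an illustration only, not the NumPy implementation; the helper name `clipped_pair` is hypothetical.

```python
import math


def clipped_pair(q, n, m=-0.5):
    """Return (raw j, clipped j, clipped j + 1) for quantile q of n samples.

    Hypothetical helper: it follows the documented formula
    j = (q*n + m - 1) // 1 and then clips both indices to [0, n - 1].
    """
    j = math.floor(q * n + m - 1)

    def clip(k):
        # Keep the index inside the valid 0-based range.
        return min(max(k, 0), n - 1)

    return j, clip(j), clip(j + 1)


print(clipped_pair(0.0, 4))  # (-2, 0, 0): both indices fall back to the lower bound
print(clipped_pair(1.0, 4))  # (2, 2, 3): already in range, nothing is clipped
```

With q=0 and n=4 the raw index is -2, but both clipped indices are 0, so the result is the smallest observation.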

The value m=-1/2 is from Hyndman and Fan. The method closest_observation applies round-to-even-order behaviour when the real-valued index has a fractional part of 0.5. With m=-1/2, this case is identified by the shifted index being exactly an integer (index == j). The rounding can then be applied according to whether j is odd or even.
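As a rough illustration of that tie detection, the sketch below computes j, whether the shifted index is an exact integer, and the resulting g using the formula proposed in this PR. The function name `tie_and_gamma` is hypothetical, and exact floating-point equality is assumed to be acceptable for the simple inputs shown.

```python
import math


def tie_and_gamma(q, n):
    """Return (j, is_tie, g) for the closest_observation rule (sketch only)."""
    index = q * n - 0.5 - 1        # m = -1/2, and -1 for 0-based indexing
    j = math.floor(index)
    is_tie = index == j            # true when q*n has a fractional part of 0.5
    # Odd 0-based j is an even 1-based order statistic: stay on y[j] (g = 0).
    g = 1 - int(is_tie and j % 2 == 1)
    return j, is_tie, g


print(tie_and_gamma(0.375, 4))  # (0, True, 1): tie, j even, so move up to y[1]
print(tie_and_gamma(0.625, 4))  # (1, True, 0): tie, j odd, so stay on y[1]
print(tie_and_gamma(0.3, 4))    # (-1, False, 1): no tie, take y[j + 1] after clipping
```

Both ties for n=4 resolve to y[1], the second (even) order statistic.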

Contributor

What about n=4, q=0.3? Then we have j = (0.3*4 - 0.5 - 1) // 1 = -1. Both j and j+1 are clipped to zero. But the closest observation is the one corresponding to j=1.

Contributor Author

Since q=0.3 does not produce a fractional part of 0.5 in the real-valued index, the behaviour is unchanged from the current implementation:

>>> np.quantile([1, 2, 3, 4], 0.3, method='closest_observation')
1

Some quantile interpolation methods do not work very well on tiny arrays. The purpose of the PR is to clarify what the method is doing, not solve the inherent problems with it.

The reference implementation in R is the same:

> quantile(c(1, 2, 3, 4), 0.3, type=3)
30%
  1

With q=0.375 there is a difference when rounding the real-valued index of 0.375*4=1.5:

>>> np.quantile([1, 2, 3, 4], 0.375, method='closest_observation')
1
>>> np.quantile([1, 2, 3, 4], 0.375001, method='closest_observation')
2

R:

> quantile(c(1, 2, 3, 4), 0.375, type=3)
37.5%
    2

Here R rounds up to the even order statistic, numpy rounds down to the odd order statistic (order is 1-based).
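The difference can be reproduced with a small pure-Python sketch that applies either the previous tie rule or the one proposed in this PR, as quoted in the documentation diff. This is an illustration under stated assumptions, not the NumPy code; `closest_observation_sketch` and its `rule` parameter are hypothetical names.

```python
import math


def closest_observation_sketch(values, q, rule="new"):
    """Sketch of the closest_observation quantile with a selectable tie rule."""
    y = sorted(values)
    n = len(y)
    index = q * n - 0.5 - 1            # m = -1/2, and -1 for 0-based indexing
    j = math.floor(index)
    tie = index == j                   # real-valued order has fraction 0.5
    if rule == "new":
        # Odd 0-based j is an even 1-based order: keep y[j] on a tie.
        g = 1 - int(tie and j % 2 == 1)
    else:
        # Previous rule: checked for an even 0-based j instead.
        g = 1 - int(tie and j % 2 == 0)
    lo = min(max(j, 0), n - 1)         # clip j to the valid range
    hi = min(max(j + 1, 0), n - 1)     # clip j + 1 to the valid range
    return (1 - g) * y[lo] + g * y[hi]


data = [1, 2, 3, 4]
print(closest_observation_sketch(data, 0.375, rule="old"))  # 1
print(closest_observation_sketch(data, 0.375, rule="new"))  # 2
print(closest_observation_sketch(data, 0.3, rule="new"))    # 1
```

Only the tie at q=0.375 changes; values such as q=0.3 give the same result under both rules.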

@charris
Member

charris commented Jul 14, 2024

I wouldn't worry about removing white space here and there, just don't stray too far from the intended fix.

@charris charris merged commit 2093a6d into numpy:main Jul 14, 2024
@charris
Member

charris commented Jul 14, 2024

Thanks @aherbert .

@seberg
Member

seberg commented Jul 30, 2024

Not a big thing, but it may have been slightly better not to backport this, since the old behavior wasn't really wrong, just different from the literature.
(Some downstream projects test against us. They need to adapt anyway, but it's nicer not to require that in a bug-fix version.)
