BUG: Quantile closest_observation to round to nearest even order #26769
4ac0e4f to 4e2accb
| @@ -0,0 +1,5 @@ | |||
| `quantile` method ``closest_observation`` chooses nearest even order statistic | |||
| `quantile` method ``closest_observation`` chooses nearest even order statistic | |
| `np.quantile` with method ``closest_observation`` chooses nearest even order statistic |
| @@ -0,0 +1,8 @@ | |||
| from typing import Any | |||
These changes are not related to the PR, right?
| def _weights_are_valid(weights, a, axis): | ||
| """Validate weights array. | ||
|
|
I suspect these are changes from an autoformatter. To avoid churn, some projects prefer not to change existing code, even when it does not comply with the formatting conventions. I am not sure what the numpy policy is, though.
|
I had to disable the setting in my text editor that strips trailing whitespace. All should be fixed now. I am not sure why GH thinks I have changed If I rebase on |
|
Yes, a rebase seems good here. |
Detection of an even order statistic (1-based) must check for an odd index due to use of 0-based indexing. See numpy#26656
b5f3fef to 796b718
| ``g = (1 + int(index - j > 0)) / 2`` | ||
| 3. ``closest_observation``: ``m = -1/2`` and | ||
| ``1 - int((g == 0) & (j%2 == 0))`` | ||
| ``g = 1 - int((index == j) & (j%2 == 1))`` |
Maybe I am not reading this correctly, but the value m=-1/2 does not make sense to me. Suppose we have n=4 samples and q=0. Then j = (q*n + m - 1) // 1 = (-3/2) // 1, so j=-2. The formula for the quantile, (1-g)*y[j] + g*y[j+1], cannot then be evaluated because neither of the indices y[j]=y[-2] and y[j+1]=y[-1] is valid.
(Note: I am also struggling with this in the documentation for main.)
Note that the docs state:
Note that indices j and j + 1 are clipped to the range 0 to n - 1 when the results
of the formula would be outside the allowed range of non-negative indices.
The - 1 in the formulas for j and g accounts for Python’s 0-based indexing.
So with q=0 you can generate a negative index and the function will return the bound.
The value m=-1/2 is from Hyndman and Fan. The method closest_observation applies round-to-even order behaviour when the real-valued index has a fractional value of 0.5. With m=-1/2 this case is identified because the index is then exactly an integer (index == j). The rounding can then be applied appropriately depending on whether j is odd or even.
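As a rough illustration of the clipping described in the docs quote above, the following sketch traces the q=0, n=4 case by hand. It is not NumPy's implementation: the helper name `clipped_indices` is hypothetical, and it only assumes the formula j = (q*n + m - 1) // 1 quoted earlier in this thread.

```python
import math

def clipped_indices(n, q, m=-0.5):
    """Trace index, j and the clipped neighbour indices (all 0-based)."""
    index = q * n + m - 1           # real-valued index from the quoted formula
    j = math.floor(index)           # same value as (index // 1) for floats
    lo = min(max(j, 0), n - 1)      # j clipped to the range 0 .. n - 1
    hi = min(max(j + 1, 0), n - 1)  # j + 1 clipped to the range 0 .. n - 1
    return index, j, lo, hi

print(clipped_indices(4, 0.0))  # (-1.5, -2, 0, 0)
```

Both neighbours collapse to index 0, so under this reading q=0 returns the smallest sample rather than failing on a negative index.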
What about n=4, q=0.3? Then we have j = (0.3*4 - 0.5 - 1) // 1 = -1. Both j and j+1 are clipped to zero. But the closest observation is the one corresponding to j=1.
Since q=0.3 does not produce a fractional part of 0.5 in the real-valued index, the behaviour is unchanged from the current implementation:
>>> np.quantile([1, 2, 3, 4], 0.3, method='closest_observation')
1
Some quantile interpolation methods do not work very well on tiny arrays. The purpose of the PR is to clarify what the method is doing, not solve the inherent problems with it.
The reference implementation in R is the same:
> quantile(c(1, 2, 3, 4), 0.3, type=3)
30%
1
With q=0.375 there is a difference when rounding the real-valued index of 0.375*4=1.5:
>>> np.quantile([1, 2, 3, 4], 0.375, method='closest_observation')
1
>>> np.quantile([1, 2, 3, 4], 0.375001, method='closest_observation')
2
R:
> quantile(c(1, 2, 3, 4), 0.375, type=3)
37.5%
2
Here R rounds up to the even order statistic, numpy rounds down to the odd order statistic (order is 1-based).
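The difference can be reproduced without NumPy by writing both tie-breaking rules in plain Python. This is a minimal sketch under stated assumptions, not the library code: the function name `closest_observation_sketch` and the `round_to_even` flag are hypothetical, the "old" branch is my reading of the replaced expression `1 - int((g == 0) & (j%2 == 0))`, and the exact float comparison `index == j` is only reliable for inputs like these where the arithmetic is exact.

```python
import math

def closest_observation_sketch(data, q, round_to_even=True):
    """Hyndman and Fan type 3 style quantile using 0-based indices."""
    y = sorted(data)
    n = len(y)
    index = q * n - 0.5 - 1      # m = -1/2, and - 1 for 0-based indexing
    j = math.floor(index)
    on_tie = index == j          # q*n had a fractional part of exactly 0.5
    if round_to_even:
        # Rule from this PR: an odd 0-based j is an even 1-based order statistic.
        g = 1 - int(on_tie and j % 2 == 1)
    else:
        # Previous rule (assumed reading): picks the odd order statistic on a tie.
        g = 1 - int(on_tie and j % 2 == 0)
    lo = min(max(j, 0), n - 1)       # clip j to 0 .. n - 1
    hi = min(max(j + 1, 0), n - 1)   # clip j + 1 to 0 .. n - 1
    return (1 - g) * y[lo] + g * y[hi]

print(closest_observation_sketch([1, 2, 3, 4], 0.375, round_to_even=False))  # 1
print(closest_observation_sketch([1, 2, 3, 4], 0.375, round_to_even=True))   # 2
```

For q=0.3 there is no tie, so both branches return 1, which matches the outputs quoted above.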
|
I wouldn't worry about removing white space here and there, just don't stray too far from the intended fix. |
|
Thanks @aherbert. |
|
Not a big thing, but it may have been slightly better not to backport, since the old behavior wasn't really wrong, just different from the literature. |