the Creative Commons Attribution 4.0 License.
climQMBC: A package with multiple bias correction methods of GCM climatic variables at daily, monthly and annual scale, developed in Python, R and MATLAB
Abstract. Climate change projections are studied using General Circulation Models (GCMs). GCMs simulate climate on a broad scale, so they cannot be used directly in local impact studies such as hydrological studies. GCM outputs must first be downscaled, to adjust their results to the relevant spatial scale and to reduce their bias, before being used at the local scale. Quantile Mapping (QM) is one of the most widely used approaches for bias correcting GCM climate outputs. However, in its conventional formulation QM assumes a time-invariant correction function, which can introduce additional biases. This has motivated the development of trend-preserving variations, which account for a non-stationary correction function and aim to preserve the raw GCM signal. Unfortunately, choosing which variation to use is not straightforward. We present the climQMBC package (https://github.com/saedoquililongo/climQMBC or https://doi.org/10.5281/zenodo.18392900) as an easy-to-use tool to compare quantile mapping approaches. climQMBC is available in Python, R and MATLAB, and contains the classic QM method and four trend-preserving variations: Detrended Quantile Mapping (DQM), Quantile Delta Mapping (QDM), Unbiased Quantile Mapping (UQM) and Scaled Distribution Mapping (SDM). The package has a built-in summary report that lets users compare methods by how well they preserve raw GCM trends. A synthetic exercise showed that the most reliable methods are UQM and DQM.
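The minimal Python sketch below illustrates the abstract's central point using simple empirical (non-parametric) quantiles and synthetic gamma-distributed data. It is not climQMBC's implementation, which according to the reviews fits parametric distributions, and the function names `qm` and `qdm_multiplicative` are hypothetical. Classic QM applies one transfer function learned in the historical period. The multiplicative form of QDM instead re-applies, at each quantile, the relative change projected by the raw model.

```python
import numpy as np

def ecdf(sample, x):
    """Empirical non-exceedance probability of each x within sample."""
    return np.searchsorted(np.sort(sample), x, side="right") / len(sample)

def qm(obs, mod_hist, mod_fut):
    # Classic QM: one time-invariant transfer function built from the historical period.
    tau = ecdf(mod_hist, mod_fut)
    return np.quantile(obs, tau)

def qdm_multiplicative(obs, mod_hist, mod_fut):
    # Quantile probability of each value within its own (future) period.
    tau = ecdf(mod_fut, mod_fut)
    # Relative change projected by the raw model at that quantile.
    delta = mod_fut / np.quantile(mod_hist, tau)
    # Bias-correct the quantile with observations, then reapply the raw change.
    return np.quantile(obs, tau) * delta

rng = np.random.default_rng(42)
obs = rng.gamma(shape=4.0, scale=2.5, size=3000)        # synthetic "observations", mean ~10
mod_hist = rng.gamma(shape=1.5, scale=10.0, size=3000)  # biased model, mean ~15
mod_fut = 1.2 * mod_hist                                # raw model projects +20 % at every quantile

for name, corr in [("QM", qm(obs, mod_hist, mod_fut)),
                   ("QDM", qdm_multiplicative(obs, mod_hist, mod_fut))]:
    print(f"{name}: corrected/observed mean ratio = {corr.mean() / obs.mean():.3f}")
```

With these arbitrary settings, the QDM ratio stays close to the raw model's 1.2. The QM ratio generally differs from it, because the historical transfer function is reused unchanged.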
Status: open (until 03 Jun 2026)
- RC1: 'Comment on egusphere-2026-890', Anonymous Referee #1, 24 Apr 2026
- RC2: 'Comment on egusphere-2026-890', Anonymous Referee #2, 30 May 2026
The manuscript introduces climQMBC, a coding package implemented in Python, R, and MATLAB that bundles five univariate quantile mapping bias correction methods (QM, DQM, QDM, UQM, SDM) for point-based daily, monthly, and annual climate series. The authors emphasize three contributions: a unified mathematical formulation of the methods, identical functionality across three languages, and a built-in summary report function that compares methods by their ability to preserve raw GCM trends in the mean and standard deviation. Performance is assessed using a synthetic exercise in which gamma-distributed precipitation series are perturbed by controlled biases in the mean and standard deviation across a range of coefficients of variation and projected-change magnitudes. The authors conclude that UQM and DQM are the most reliable methods overall, while acknowledging that QM remains useful for its simplicity.
Major Comments
There is no real-world demonstration of bias correction. For a paper that motivates its tool by appealing to hydrology, wildfire, and heatwave applications, the complete absence of any application to actual GCM/RCM output paired with station or gridded observations is striking. Figure 4 shows a code snippet plotting a single monthly near-surface air temperature ('tas') series drawn from the package's bundled sample data; Figures 5 and 6 illustrate the report function on a precipitation example whose source, region, and driving model are not described. Neither constitutes a real-world application in the sense of documented climate-model output paired with observations for a stated location. Without a real case the reader cannot assess whether the package handles realistic complications such as elevation-dependent biases, monsoonal bimodality, drizzle-dominant tropical regimes, or arid regions where the random-near-zero replacement may dominate the corrected series. This omission also makes it impossible to evaluate computational scalability, which matters because the moving-window strategy multiplies the cost of each method substantially relative to the original single-window formulations.
NSE and KGE are inappropriate as primary metrics for distribution-based bias correction. NSE and KGE measure point-by-point temporal agreement. Quantile mapping methods, by construction, do not aim at point-by-point reconstruction of the "true" observed projected series; they aim at distributional consistency and trend preservation. The bias-corrected series is sequenced according to the modeled time series, so a high NSE only arises when the model already has the correct temporal sequencing of values, which has nothing to do with the bias correction itself. The authors should justify why these metrics are appropriate at all, or replace them with metrics that directly probe distributional similarity (Wasserstein distance, integrated quantile difference, tail-weighted scoring rules). As presented, the cumulative NSE/KGE curves in Figure 9 conflate distributional skill with the trivial fact that, in the synthetic design, the modeled and observed series share the same underlying random draws.
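A small Python sketch can check this argument. A series with exactly the observed distribution but a different time ordering gets a poor NSE, while a distributional distance is zero. The helper `wasserstein_1d` is an illustrative implementation for equal-size samples: the mean absolute difference of the sorted values, i.e. of matched quantiles. `scipy.stats.wasserstein_distance` computes the same first-order distance for general samples.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: a point-by-point, sequence-sensitive score."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def wasserstein_1d(a, b):
    """First-order Wasserstein distance for two equal-size samples:
    mean absolute difference between their sorted values."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=5.0, size=1000)
shuffled = rng.permutation(obs)  # identical distribution, different sequencing

print(nse(obs, obs), wasserstein_1d(obs, obs))
print(nse(shuffled, obs), wasserstein_1d(shuffled, obs))
```

For the shuffled series the NSE falls to roughly -1, while the Wasserstein distance stays exactly zero. This is the distinction the reviewer draws between temporal agreement and distributional skill.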
The novelty claim is thin for a methods paper at GMD. The five implemented methods are all from prior literature. The unified notation and the report function are useful but incremental. The strongest selling point, multi-language availability with identical functionality, is a software-engineering contribution rather than a scientific one. Major competing packages are dismissed in a single paragraph without quantitative comparison: xclim and ibicus are not benchmarked, and MBC is dismissed for lacking trend-preserving methods without acknowledging that it contributes the entire multivariate dimension that climQMBC lacks. A side-by-side benchmark on a common test case, demonstrating either superior accuracy, speed, or usability, would substantiate the contribution claim. Without it, the paper reads as documentation rather than as a research article.
The package is strictly univariate and point-based, which limits its relevance for modern impact assessment. Compound events (for example hot-dry conditions, wind-fire coupling, and precipitation-temperature co-variation in snowpack modelling) are increasingly recognized as the dominant climate-risk pathway, and they require methods that preserve inter-variable and inter-site dependence. The manuscript acknowledges this gap only obliquely. Users who adopt climQMBC and apply it independently to coupled variables risk introducing spurious decoupling, particularly in snow-rain transitions or fire-weather indices where multivariate consistency matters. This limitation deserves explicit discussion and a warning to users, not a passing remark.
Methodological choices in distribution selection and zero-precipitation handling are under-justified. Fitting candidate distributions by the method of moments and then selecting among them by the Kolmogorov-Smirnov statistic is internally inconsistent. KS is insensitive to tail behaviour, which is precisely where the extremes most relevant to impact studies live. There is also no discussion of fit-quality reporting to the user, so a user could obtain a heavily biased correction without diagnostic information. The random-near-zero replacement procedure, while pragmatic, introduces stochasticity that breaks reproducibility and whose sensitivity is admitted but never quantified. For arid regions, where the fraction of zero-precipitation days can exceed 90 percent, this procedure may govern the correction more than the actual distribution fitting.
The moving-window design is not stress-tested. The authors recommend windows of 20 to 40 years following Chadwick et al. (2023), but no sensitivity analysis is presented within this paper to validate that recommendation under the synthetic conditions explored. For long projections (for example to 2100), the window for the early decades of the projected period still contains historical data, which dampens any abrupt nonstationarity. The synthetic experiment, which uses stationary gamma distributions in the projected period, cannot detect this issue at all.
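The selection procedure criticized above can be sketched in a few lines of Python: moment-based fitting, Kolmogorov-Smirnov selection, and the kind of tail diagnostic the reviewer asks to be reported to users. The two-candidate set (normal and gamma) and the function names are illustrative assumptions. The candidate list that climQMBC actually uses is not given here.

```python
import numpy as np
from scipy import stats

def fit_by_moments(x):
    """Method-of-moments fits for two illustrative candidate distributions."""
    m, s = x.mean(), x.std(ddof=1)
    return {
        "normal": stats.norm(loc=m, scale=s),
        # Gamma moments: mean = k * theta, variance = k * theta**2
        "gamma": stats.gamma(a=(m / s) ** 2, scale=s ** 2 / m),
    }

def select_by_ks(x):
    """Return the candidate with the smallest one-sample KS statistic."""
    fits = fit_by_moments(x)
    ks = {name: stats.kstest(x, dist.cdf).statistic for name, dist in fits.items()}
    best = min(ks, key=ks.get)
    return best, fits[best], ks

def tail_check(x, dist, q=0.99):
    """Empirical vs fitted upper quantile: information the KS statistic does not convey."""
    return float(np.quantile(x, q)), float(dist.ppf(q))

rng = np.random.default_rng(1)
x = rng.gamma(shape=1.2, scale=8.0, size=2000)
best, dist, ks = select_by_ks(x)
print(best, ks)
print(tail_check(x, dist))
```

Reporting something like `tail_check` alongside the selected distribution would address the request for fit-quality diagnostics without changing the selection rule itself.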
The conclusions overgeneralize. Statements such as "UQM and DQM are the more stable ones" and "there is always a trend-preserving method that is more stable than the QM method" are presented as general findings, but they are conditional on the specific synthetic design. Users in a region where the GCM bias is concentrated in extreme quantiles rather than in the mean may find QDM or SDM more appropriate, and nothing in the paper would warn them otherwise. The recommendation should be reframed as "in our synthetic exercise targeting moment biases, methods that target moments perform best," which is closer to a tautology than to a usable guideline.
Minor Comments
Line 38: please check the citations. For example, "(Andres et al., 2014)" is inconsistent with the year given in the reference list (Line 520, dated 2013), and some author names are misspelled in more than one place (e.g., "Hageman" vs. "Hagemann").
Line 74: the statement "there used to be a research lab with a package available in two programming languages, but one of these is no longer available" is vague and unverifiable. Please specify the lab and the package, or remove this argument from the motivation.
Line 92: the report function is restricted to the monthly temporal scale, yet the package supports daily, monthly, and annual data. Please clarify why daily and annual scales are not covered by the report function, and whether this is a planned extension.
Line 157: "by default chooses the distribution considering the one with the lowest error in the Kolmogorov-Smirnov test" reads awkwardly; rephrase, for example, as "by default selects the distribution with the lowest Kolmogorov-Smirnov test statistic". The authors should also acknowledge that the KS test is insensitive to tail behaviour, which is the regime most relevant for climate-extreme assessment.
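The tail insensitivity of the KS statistic can be shown with a short NumPy-only sketch on synthetic data. The helper name `ks_two_sample` is illustrative. Doubling only the wettest 1 % of a sample keeps the two-sample KS statistic at or below 0.01, even though the upper quantiles double. The statistic measures the largest vertical gap between the two CDFs, and only 1 % of the probability mass moves.

```python
import numpy as np

def ks_two_sample(a, b):
    """Two-sample KS statistic: largest vertical gap between the empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(7)
base = rng.gamma(shape=2.0, scale=5.0, size=5000)
heavy = base.copy()
top = heavy > np.quantile(heavy, 0.99)
heavy[top] *= 2.0  # double only the wettest 1 % of values

print(ks_two_sample(base, heavy))                            # small: at most about 0.01
print(np.quantile(base, 0.995), np.quantile(heavy, 0.995))  # upper tail quantile doubles
```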
Line 167: "1 mm as default" lacks a time unit, whereas the SDM description (around Line 242) specifies "< 1 mm/day". Please make the units explicit and consistent at first mention, and clarify how this threshold is applied at coarser (monthly, annual) temporal resolutions.
Equation (13): the subscript "RD_mh" is missing the comma separator and is inconsistent with "RD_{m,p}" used in the same equation. Please make subscript notation consistent throughout.
Equations (20)–(21): these are algebraically equivalent to the more transparent expression x^b = μ^o(1 + Δμ_bias) + (x^o − μ^o)(1 + Δσ_bias), which makes explicit that the perturbation scales the mean by (1 + Δμ_bias) and the standard deviation by (1 + Δσ_bias). Rewriting in this form would improve readability and would make the connection to Major Comment 1 transparent.
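The Python sketch below takes the reviewer's rewritten form as given; the manuscript's Eqs. (20)–(21) are not reproduced here. It confirms two properties of that form. First, the perturbed series has its mean scaled by (1 + Δμ_bias) and its standard deviation scaled by (1 + Δσ_bias). Second, a strictly positive gamma series can acquire negative values whenever Δσ_bias exceeds Δμ_bias, for example when the mean is reduced while the spread is inflated. Those negatives are what the replacement step questioned under Figure 7 then handles. The parameter values and the names `perturb`, `d_mu` and `d_sigma` are illustrative.

```python
import numpy as np

def perturb(x_obs, d_mu, d_sigma):
    """Reviewer's form: x_b = mu_o * (1 + d_mu) + (x_o - mu_o) * (1 + d_sigma)."""
    mu = x_obs.mean()
    return mu * (1.0 + d_mu) + (x_obs - mu) * (1.0 + d_sigma)

rng = np.random.default_rng(3)
x_o = rng.gamma(shape=1.0, scale=10.0, size=10_000)  # illustrative series with CV = 1
x_b = perturb(x_o, d_mu=-0.3, d_sigma=0.5)

print(x_b.mean() / x_o.mean())  # ≈ 0.7: mean scaled by (1 + d_mu)
print(x_b.std() / x_o.std())    # ≈ 1.5: standard deviation scaled by (1 + d_sigma)
print((x_b < 0).mean())         # fraction of values pushed below zero
```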
Figure 4 (move to the Supplement). The worked example is presented as a screenshot of source code (panel a) and a single illustrative time series (panel b). Embedding code as an image is poor practice: it is not selectable, searchable, accessible, or verifiable. I recommend moving the worked example to the Supplement, and, if any code is retained in the manuscript, typesetting it as a formatted code listing rather than a screenshot, with the plot rendered as a separate figure. Panel (b) is visually cluttered, with the QM, DQM, and observed lines overlapping heavily, so a shorter representative window or subplots would aid comparison.
Figure 5: the "Future" header in the second column is not defined in the caption (it is explained only in the body text, around p. 15). Please define in the caption whether this column represents the ratio for the entire future period or for a specific centered period.
Figure 6(a), right panel: the time-series figure is essentially unreadable because all five corrected series and the modeled series are overplotted as vertical bars in saturated colours. Please redesign using thin lines with transparency, or display only a representative subset. In panels (b) and (c), the legends obscure the curves to varying degrees.
Line 437: "the closer to the right the curve is, the better" is technically correct for the chosen cumulative-probability convention but may confuse readers; suggest rephrasing as "curves whose mass is concentrated near NSE = 1 (that is, shifted toward the right edge of the plot) indicate better performance".
Figure 7: the in-cell percentages are described as "the percentage of synthetic time series that have some negative values that were replaced with random values below 0.01," which is unclear given that the synthetic series are gamma-distributed (strictly positive). Please clarify that the negative values arise after applying the bias formula (Eqs. 20–21), and specify the units of the 0.01 replacement threshold.
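The review asks for two things about the random-near-zero replacement: reproducibility and a quantified sensitivity. Both can be met cheaply, as this minimal Python sketch shows. It assumes negatives are replaced by uniform draws in [0, 0.01) in the units of the series. The function name, the uniform distribution and the `seed` argument are illustrative assumptions, not climQMBC's documented API.

```python
import numpy as np

def replace_negatives(x, upper=0.01, seed=None):
    """Replace negative values by uniform random values in [0, upper).
    Passing a fixed seed makes the replacement reproducible."""
    rng = np.random.default_rng(seed)
    out = np.asarray(x, dtype=float).copy()
    neg = out < 0
    out[neg] = rng.uniform(0.0, upper, size=int(neg.sum()))
    return out

x = np.array([3.2, -0.5, 0.0, -1.1, 7.4])
a = replace_negatives(x, seed=2026)
b = replace_negatives(x, seed=2026)
print(np.array_equal(a, b))  # True: same seed, same corrected series

# Sensitivity to the random draw: spread of a summary statistic over many seeds
means = [replace_negatives(x, seed=s).mean() for s in range(100)]
print(max(means) - min(means))
```

In an arid-region series, the same seed loop applied to the full corrected output (rather than this toy array) would show how much the replacement alone moves the results.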
Sections 4.3 and 5: the conclusion that UQM and DQM are the most reliable methods is conditional on the synthetic design, in which the bias is a simple shift in the first two moments; UQM and DQM are constructed precisely to preserve those moments, so the result is partly tautological. This caveat should be stated explicitly in the conclusion so that readers do not over-generalize the recommendation.
Code-availability section: the expected input-data format is not stated in the manuscript. Please state briefly whether the input must be a flat array, a CSV with specific column headers, or a time-indexed series, so that potential users can assess compatibility without inspecting the repository.
Citation: https://doi.org/10.5194/egusphere-2026-890-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 861 | 576 | 72 | 1,509 | 66 | 113 |
This article presents a software package that consolidates multiple quantile mapping methods into one common interface, and makes it available in the three most common coding languages used by climate scientists and others who use climate projections. While limited to quantile mapping type approaches, this will be a useful software package for many people exploring climate impacts on a local or regional scale. My suggestions below are largely aimed at providing a more complete context for this package relative to available resources and methods.
1) Line 34, following the sentence on RCMs, it might be worth mentioning that there are also hybrid methods (e.g., Walton et al., 2015, https://doi.org/10.1175/JCLI-D-14-00196.1) that combine dynamical and statistical downscaling.
2) Lines 34-36: while the software being presented covers only QM variations, the sentence listing “several bias correction (or bias adjustment) methods” should be more explicit about what these consist of (e.g., analogue methods). You might also mention that analogues would not be amenable to the approach of this software, where all methods essentially perform point-to-point bias correction. Machine-learning approaches to downscaling are also emerging (e.g., Soares et al., 2024, https://doi.org/10.5194/gmd-17-229-2024) and should be noted.
3) Lines 58-61, a little more discussion would help characterize the uncertainties. The Lafferty et al. (2023) reference would provide a good framing of this (https://doi.org/10.1038/s41612-023-00486-0).
4) Line 85: add a short paragraph discussing the availability of downscaled data, which is increasingly being used by stakeholders to avoid having to do downscaling at all. A few examples of CMIP6 global downscaled data sets are those from the Climate Impacts Lab (Gergel et al., 2024, https://doi.org/10.5194/gmd-17-191-2024), School of Geography and Environmental Science (2022, https://doi.org/10.5285/c107618f1db34801bb88a1e927b82317) and the NASA-NEX archive (Thrasher et al., 2022, https://doi.org/10.1038/s41597-022-01393-4). At regional/continental scales there are many more data sets. Of course, each uses its own observational baseline data, training period, etc., so attributing differences among them to specific sources is not possible, which means there is still value in facilitating statistical downscaling.
5) Line 285: the example script (Figure 4) and the linked example notebook on GitHub are helpful illustrations of how to use the package for a single point. Such an example may exist on the GitHub site, but I could not find one applying the package to a gridded (e.g., netCDF) data set. Since that would be a relatively common application, adding such an example would make this more complete.
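A minimal Python sketch can serve as a starting point for such an example. It applies any point-based correction cell by cell to arrays shaped (time, lat, lon). `correct_point` is a hypothetical stand-in (a plain empirical QM), to be replaced by the corresponding climQMBC function, whose exact name and signature are not given here. In practice, the arrays could be read from netCDF with, for example, `xarray.open_dataset(path)[var].values`.

```python
import numpy as np

def correct_point(obs, mod_hist, mod_fut):
    """Hypothetical stand-in for a point-based correction (here: empirical QM)."""
    tau = np.searchsorted(np.sort(mod_hist), mod_fut, side="right") / len(mod_hist)
    return np.quantile(obs, tau)

def correct_grid(obs, mod_hist, mod_fut, point_fn=correct_point):
    """Apply a 1-D correction to every grid cell of (time, lat, lon) arrays.
    Cells whose observations are all NaN (e.g. masked ocean) stay NaN."""
    out = np.full(mod_fut.shape, np.nan)
    _, ny, nx = mod_fut.shape
    for j in range(ny):
        for i in range(nx):
            o = obs[:, j, i]
            if np.all(np.isnan(o)):
                continue
            out[:, j, i] = point_fn(o, mod_hist[:, j, i], mod_fut[:, j, i])
    return out

rng = np.random.default_rng(5)
obs = rng.gamma(2.0, 5.0, size=(40, 2, 3))
obs[:, 0, 0] = np.nan  # masked cell
mod_hist = rng.gamma(2.0, 6.0, size=(40, 2, 3))
mod_fut = rng.gamma(2.0, 7.0, size=(50, 2, 3))
corrected = correct_grid(obs, mod_hist, mod_fut)
print(corrected.shape)  # (50, 2, 3)
```

The explicit loops keep the sketch dependency-free. xarray's `apply_ufunc` with `vectorize=True` can replace them when working directly with labelled netCDF data.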