Wikidata:SPARQL query service/WDQS backend update/Backend Replacement

From Wikidata

Proposal: New Database Backend for WDQS

This document is intended for community review. Please share all feedback on the WDQS Migration talk page.

Summary

This document outlines the proposed selection of QLever as the new backend database for the Wikidata Platform. Blazegraph and its known limitations cause much of the volatility of the Wikidata Query Service (WDQS), and Blazegraph is unable to scale to meet the growth targets of Wikidata. After testing deployments on AWS and on-premises infrastructure, running repeated ingestion and index update cycles, measuring latency and throughput on a limited set of publicly available and rewritten queries from October to March of FY26, and incorporating community feedback, we believe QLever will put us on the best path to ensuring reliable, sustainable, and scalable access to Wikidata now and into the future.

Context and Problem Statement

The Wikidata platform and the Wikidata Query Service (WDQS) have long been approaching the upper limits of their technical capacity, as shown by the frequency of SLO-impacting incidents (~1 per week) and erratic trends in performance indicators such as query latency and throughput. The platform uses Blazegraph, an open source project that has gone unmaintained since being acquired by Amazon in 2018, as the database for its knowledge graph infrastructure. Meanwhile, the number of users, requests, edits, and data points in the knowledge graph has continued to grow.

To ensure the sustainability of WDQS, the Wikidata Platform team was established to improve and maintain the performance and stability of the service. Previous stewards of the platform explored many solutions to the scaling issue, most notably splitting the graph to reduce the cost of query execution for WDQS. All solutions evaluated and implemented were acknowledged as insufficient to address the core issues with Blazegraph; they were designed to provide additional runway until a dedicated team could execute a migration to a new RDF database.

Assumptions

  • We used metrics in line with our migration goals to evaluate candidates. The tests we conducted evaluated a set of both quantitative (e.g. query latency) and qualitative (e.g. community support) metrics. We believe our methodology accurately captures the vendor qualities that will be most important in solving problems throughout the migration and beyond.
  • We were able to accurately assess the qualitative criteria for our choice. Some of our evaluation dimensions, as mentioned above, were focused on a vendor’s community activity and project governance. We believe we evaluated these dimensions as thoroughly as possible.
  • Replacing the backend database will drive meaningful improvement on our migration metrics and on the experience of users. Our data ingestion tests, runtime benchmarks, and production-replay traffic analyses (limited in scope, as noted above) validated this assumption, demonstrating that QLever delivered meaningful improvements over Blazegraph across all identified criteria.
  • The implementation of our proposed technical architecture will protect against similar abandonware problems in the future. As outlined in our design document for the technical architecture of our new endpoints, we plan to decouple the service and application layers of the Wikidata platform. Doing so will make future evaluations of backend replacements less dependent on large-scale changes.
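The design document itself is not reproduced here. As a minimal sketch under our own assumptions, the Python below shows one way a service layer can be kept independent of the RDF database behind it, so that swapping backends becomes a configuration change rather than a large-scale rewrite. `RdfBackend`, `QueryService`, `make_service`, `max_query_chars`, and both fake backends are hypothetical names invented for illustration; the fakes return canned rows instead of calling real Blazegraph or QLever endpoints.

```python
from typing import Dict, List, Protocol

# Hypothetical row type: variable name -> bound value, like a SPARQL result row.
Row = Dict[str, str]


class RdfBackend(Protocol):
    """Hypothetical contract the service layer relies on; any RDF store can implement it."""

    name: str

    def run_query(self, sparql: str) -> List[Row]:
        ...


class FakeBlazegraphBackend:
    """Stand-in for a Blazegraph adapter; a real one would call its HTTP endpoint."""

    name = "blazegraph"

    def run_query(self, sparql: str) -> List[Row]:
        return [{"item": "Q42"}]  # canned result, no network access


class FakeQLeverBackend:
    """Stand-in for a QLever adapter; a real one would call its HTTP endpoint."""

    name = "qlever"

    def run_query(self, sparql: str) -> List[Row]:
        return [{"item": "Q42"}]  # canned result, no network access


class QueryService:
    """Service layer: owns request policy and response shape, never database specifics."""

    def __init__(self, backend: RdfBackend, max_query_chars: int = 10_000) -> None:
        self.backend = backend
        self.max_query_chars = max_query_chars  # hypothetical service-level limit

    def handle(self, sparql: str) -> Dict[str, object]:
        # Policy checks live here, so they survive a backend swap unchanged.
        if len(sparql) > self.max_query_chars:
            raise ValueError("query exceeds service limit")
        rows = self.backend.run_query(sparql)
        return {"backend": self.backend.name, "results": rows}


# Swapping the database becomes a one-line configuration change.
BACKENDS = {"blazegraph": FakeBlazegraphBackend, "qlever": FakeQLeverBackend}


def make_service(backend_name: str) -> QueryService:
    return QueryService(BACKENDS[backend_name]())
```

For example, `make_service("qlever").handle("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1")` returns `{"backend": "qlever", "results": [{"item": "Q42"}]}` from the fake backend. A real deployment would also need per-backend query handling where engines differ, which this sketch does not attempt.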

Recommendation

After thorough evaluation, we recommend QLever as the new RDF database for the Wikidata Platform. Benchmarking and initial production-replay testing conducted from October to March of FY26 showed that the system can load the entire Wikidata dataset, support the existing functionality of our platform, and meet or exceed all target performance indicators relative to Blazegraph.

Risks and Mitigations

Internal Risks (Challenges to Assumptions)
  1. We did not use the right metrics to evaluate candidates
    • Mitigation: If we discover a blind spot in our analyses or identify performance indicators that better capture our target changes, our more dynamic platform infrastructure will let us adjust and iterate quickly. Additionally, the QLever team has demonstrated a willingness to prioritize features that support the Wikidata use case.
  2. We were not able to accurately assess the qualitative dimensions
    • Mitigation: If our assessments of the qualitative dimensions prove inaccurate, we will be able to work directly with vendors to influence development upstream or, if conditions become untenable, migrate to a new database more easily thanks to the architectural design choices underpinning our migration.
  3. Replacing the backend database will not drive meaningful improvement on these metrics and on the experience of users
    • Mitigation: We can be sure of improved stability (e.g. fewer incidents to manage) and growth potential (e.g. lower performance impact at the current growth rate of ~1B triples a year). If we do not see a meaningful change in other KPIs, we will still have built a solid foundation from which to keep working toward our vision. Additionally, we are conducting production-replay analyses this quarter (Q4 FY26) and throughout the migration period to de-risk assumptions about impact.
External Risks
  1. Material changes to the vendor’s business model emerge
    • Mitigation: If our chosen vendor announces large-scale changes to its business model that are not aligned with the WMF open source requirement, we will switch to a different system. Our proposed architecture will let us change our RDF database much more easily, and our months of evaluating the other top candidates will accelerate that decision-making process.
  2. Limited control over vendor feature development
    • Mitigation: The key aspect of our proposed tech design for the platform is the separation of service logic from the database layer. This approach will give us more independence when it comes to implementing features bespoke to the Wikidata use case.
  3. Unforeseen operational risks of running the largest QLever deployment to date
    • Mitigation: As part of the evaluation supporting this recommendation, we have begun learning about other large deployments of QLever (e.g. UniProt and DBLP). The insights gleaned from these discussions will help us anticipate and address issues we expect for our use case (e.g. refresh lag, indexing needs).
  4. Insufficient documentation and training resources for migration
    • Mitigation: We have brought on a contractor (April 2026) to assist with technical documentation and query rewriting. This dedicated resourcing will help us identify where more documentation or support is needed, and the staggered rollout of our migration requirements for the community will allow sufficient time to act on these signals.

Decision Criteria

Our evaluation methodology document defines success in terms of both quantitative factors, such as database performance, and qualitative aspects. Accordingly, our decision was guided by the following criteria:

  • Performance: indexing time, throughput, latency.
  • Features: SPARQL compliance, API quality, ability to tune the system, support for real-time indexing, ability to define custom SPARQL functions.
  • Operations: deployment ease, monitoring, maintenance overhead.
  • Community: activity, responsiveness, engagement and contribution path with upstream.

We experimented with and tested both databases on AWS (WE2.4.3) and in eqiad (WE2.5.1). QLever outperformed Blazegraph on all key indicators. Relative to Virtuoso, the deciding factors for moving forward are:

  1. Predictable performance and consistency across workloads.
  2. SPARQL 1.1 support with no feature gap between open source and commercial builds (specifically, GeoSPARQL).
  3. A more modern architecture and public testing infrastructure. QLever is also significantly easier to operate, less sensitive to tuning settings, and provides cleaner error reporting.

Community and alignment with our open source principles were the key deciding factors. In our interactions, the QLever team has been very responsive, detailed, and transparent in discussing tradeoffs, and willing to operate as a partner with us and the community. While we expect that a deployment as large as WDQS will uncover SPARQL compliance issues or performance bugs, several factors will help safeguard the migration and the system over its lifetime: QLever’s thorough, publicly reviewable continuous integration testing (with SPARQL compliance tested on each commit), its responsive developer community, and our focus on automating qualitative analysis and data quality checks.

Engineering effort & uncertainty

Integrating QLever with our infrastructure and the WDQS v2 architecture should not present any significant challenges. However, QLever shows performance degradation when real-time updates are applied over long periods of time. The QLever team has recommended approaches to address this, which we will implement, and is working on longer-term solutions.

Long-term maintainability

QLever does not introduce any new technical debt.

Options Considered

QLever and Virtuoso are the only open source RDF database systems among those we tested that are capable of loading 20B triples (2x the size of Wikidata’s main graph) while supporting the real-time index updates crucial to Wikipedia-facing workflows. Both capabilities are hard requirements for supporting WDQS use cases and future growth.
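A rough calculation makes the growth headroom behind this requirement concrete. The symbols are ours, not the evaluation's: C_tested is the 20B-triple load described above, N_now is the roughly 10B-triple main graph implied by the "2x" figure, and g is the ~1B-triples-per-year growth rate cited under Risks and Mitigations. The estimate also assumes growth stays linear, which is our simplification rather than an evaluation result:

```latex
t_{\text{headroom}} \approx \frac{C_{\text{tested}} - N_{\text{now}}}{g}
  = \frac{20\,\text{B} - 10\,\text{B}}{1\,\text{B/year}}
  = 10\ \text{years}
```

Under those assumptions, the tested capacity would cover on the order of a decade of main-graph growth; faster-than-linear growth would shorten that window.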

A. QLever

QLever is a research-driven RDF database developed at the University of Freiburg and optimized for large knowledge graphs. Its documentation reports strong read performance, and the project has very active open source development. The main concerns were limited operational maturity, a smaller track record in production at our scale, and open questions around streaming update integration. All of these concerns have been addressed in our evaluation phases and in conversations with the vendor.

B. Virtuoso

Virtuoso is a mature, battle-tested RDF database with ~26 years of development behind it, backed by OpenLink Software. It has a heavier operational footprint, more complex configuration tuning, and higher variance in results. The open source edition lacks feature parity with the closed source offering. That closed source offering is proven at scale and could give us a more defensible choice if we need commercial support or SLAs down the line.

Other Options

Status quo

The status quo would be continued support for Blazegraph, a database system that ceased development in 2016 after its team was acquired by Amazon. It is no longer maintained: no new features or improvements are being implemented, and known and newly discovered bugs are not fixed. At our scale, we encounter frequent stability issues that place a considerable operational burden on our team and on SRE, with an increased risk of SLO-impacting service disruptions.

Moreover, relying on unmaintained technology limits the scope of platform evolution and our ability to sustain Wikidata’s expanding data size and access patterns. For example, the Blazegraph data loading pipeline cannot reliably ingest Wikidata RDF dumps: the process can run for days and occasionally fails with non-deterministic errors.

Apache Jena and Oxigraph

Discarded from evaluation because they were unable to load Wikidata.

MillenniumDB

Discarded from evaluation because it lacks support for update semantics.