
Avoid querying database on every page view to check if page is in the user's reading list
Closed, Resolved · Public · 3 Estimated Story Points

Description

In the current implementation of ReadingLists on web, we query the database to determine:

  • if a page is in the user's reading list, so that the "save page" button has the "save" or "unsave" icon.
  • if it is in the reading list, then get the reading list entry id to provide to the JS
  • also get the reading list size for metrics.

It is problematic for scaling reading lists to query the reading_list database tables on x1 on every page view, and it is also unnecessary.

We need to determine if we still need the reading list size metric here, and if so, consider other approaches.

For the reading list entry id, there probably is a way to use the page id for this instead.

For determining if the page is in the reading list, Amir has a suggestion:

Build a bloom filter of existing reading list page ids for each user and put it in user_properties, backed by some cache. The bloom filter will take away 99% of the load, and even if it incorrectly says "this article is in the user's reading list", you can then query x1 to be sure, and that still won't cause any load issues. You can also put that behind memcached to make everything faster and avoid the local db query too.

This is the idea that I have wanted to implement for many years to remove the query of the watchlist table on every logged-in page view. If you can implement it for watchlist too, to improve performance (since it'll be backed by memcached), it would be even better!
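A minimal sketch of the suggested flow, in Python rather than the extension's PHP, and not the code from the merged patch or the pleonasm/bloom-filter API. The class `BloomFilter`, the helper `is_saved`, and the `query_db` callback are hypothetical names used only for illustration; the sizing uses the standard bloom filter formulas for a target false positive rate.

```python
import hashlib
import math
from typing import Callable, Iterator


class BloomFilter:
    """Tiny bloom filter over integer page ids (illustrative only)."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, page_id: int) -> Iterator[int]:
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(str(page_id).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, page_id: int) -> None:
        for pos in self._positions(page_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, page_id: int) -> bool:
        # False means "definitely not saved"; True means "possibly saved".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(page_id))


def is_saved(page_id: int, bloom: BloomFilter, query_db: Callable[[int], bool]) -> bool:
    """Check the filter first; only hit the database on a possible match."""
    if not bloom.might_contain(page_id):
        return False  # definite miss, no database query needed
    # Possible match (true hit or false positive): confirm against the database.
    return query_db(page_id)
```

Because a bloom filter never produces false negatives, the database is only queried for pages that are actually saved plus the small share of false positives, which is where the load reduction in the suggestion comes from.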

A less efficient approach could be just to have a list of page ids that are on the user's reading list and put it in memcached and check against that, vs a database query.
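This simpler alternative could look roughly like the following cache-aside sketch, again in Python for illustration only. A plain dict stands in for memcached, and the cache key format, `load_from_db` callback, and function names are all assumptions rather than anything in ReadingLists.

```python
from typing import Callable, FrozenSet, Iterable, MutableMapping


def _cache_key(user_id: int) -> str:
    # Hypothetical key format, not an actual ReadingLists cache key.
    return f"readinglists:saved-page-ids:{user_id}"


def is_page_saved(
    user_id: int,
    page_id: int,
    cache: MutableMapping[str, FrozenSet[int]],
    load_from_db: Callable[[int], Iterable[int]],
) -> bool:
    """Check membership against a cached set of the user's saved page ids."""
    key = _cache_key(user_id)
    ids = cache.get(key)
    if ids is None:
        # Cache miss: one database query loads the full list, then it is cached.
        ids = frozenset(load_from_db(user_id))
        cache[key] = ids
    return page_id in ids


def invalidate_saved_page_ids(user_id: int, cache: MutableMapping[str, FrozenSet[int]]) -> None:
    """Drop the cached list; call this whenever the user saves or unsaves a page."""
    cache.pop(_cache_key(user_id), None)
```

The trade-off against the bloom filter is size: the cached value grows linearly with the number of saved pages, whereas a bloom filter stays compact at the cost of occasional false positives.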

Event Timeline

aude renamed this task from Avoid querying database on every page view to check if page is in the user's reading list to SPIKE - Avoid querying database on every page view to check if page is in the user's reading list. (Feb 24 2026, 4:00 PM)
aude set the point value for this task to 3.

This can initially be a spike with a proof of concept that assesses the feasibility of the suggested approach.

Change #1245418 had a related patch set uploaded (by Aude; author: Aude):

[mediawiki/extensions/ReadingLists@master] WIP - Use bloom filter to reduce DB queries to check page bookmark status

https://gerrit.wikimedia.org/r/1245418

aude renamed this task from SPIKE - Avoid querying database on every page view to check if page is in the user's reading list to Avoid querying database on every page view to check if page is in the user's reading list. (Mar 4 2026, 11:54 PM)

@Ladsgroup I uploaded a new patch for this to gerrit:

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ReadingLists/+/1245418

One question with this is that I am using a library for the bloom filter (https://github.com/pleonasm/bloom-filter). This has been used by Wikimedia before, but maybe it still has to go through security review?

Or maybe a bloom filter is simple enough to implement ourselves, as a library or as something in MediaWiki?

Change #1248891 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/vendor@master] Add pleonasm/bloom-filter for ReadingLists

https://gerrit.wikimedia.org/r/1248891

To be carried forward to sprint 16 - also to be reviewed by a committee of Amir, Steph, and Anne (at minimum! feel free to also take a look)

@Ladsgroup would you be interested in helping with code review for the patch? Our team will also do a review since we are more familiar with how ReadingLists works.

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ReadingLists/+/1245418

For now, we are implementing the bloom filter (with the composer package) in ReadingLists. As a follow-up, we are definitely interested in getting a bloom filter into core and having it used for watchlist (and other similar use cases), since we want to be able to handle more logged-in users.

Thanks. I did a preliminary review. I also think we can start using bloom filters in a lot more areas: T419826: Use a bloom filter for looking up disambig pages

Change #1262251 had a related patch set uploaded (by Aude; author: Aude):

[mediawiki/extensions/ReadingLists@master] Split bloom filter cache-related code to BookmarkBloomFilterCache

https://gerrit.wikimedia.org/r/1262251

Change #1248891 merged by jenkins-bot:

[mediawiki/vendor@master] Add pleonasm/bloom-filter v1.0.4 for ReadingLists

https://gerrit.wikimedia.org/r/1248891

Change #1245418 merged by jenkins-bot:

[mediawiki/extensions/ReadingLists@master] Use bloom filter to reduce DB queries to check page bookmark status

https://gerrit.wikimedia.org/r/1245418

Change #1262251 merged by jenkins-bot:

[mediawiki/extensions/ReadingLists@master] Split bloom filter cache-related code to BookmarkBloomFilterCache

https://gerrit.wikimedia.org/r/1262251