In the current implementation of ReadingLists on web, we query the database to:
- determine whether the page is in the user's reading list, so that the "save page" button shows the "save" or "unsave" icon;
- if it is in the reading list, get the reading list entry ID to pass to the JS;
- get the reading list size for metrics.
Querying the reading_list database tables on x1 on every page view is a problem for scaling reading lists, and it is also unnecessary.
We need to decide whether we still need the reading list size metric here and, if so, consider other ways to collect it.
For the reading list entry ID, there is probably a way to use the page ID instead.
For determining whether the page is in the reading list, Amir has a suggestion:
Build a Bloom filter of existing reading list page IDs for each user and put it in user_properties, backed by some cache. The Bloom filter will take away 99% of the load, and even if it incorrectly says "this article is in the user's reading list", you can query x1 to be sure; again, that won't cause any load issues. You can also put it behind memcached to make everything faster and avoid the local DB query too.
I have wanted to implement this idea for many years to remove the watchlist table query on every logged-in page view. If you can implement it for the watchlist too, to improve performance (since it'll be backed by memcached), that would be even better!
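A minimal sketch of the Bloom filter idea, illustrated in Python and assuming integer page IDs and a 1% target false-positive rate (both assumptions). `PageIdBloomFilter`, `is_page_saved`, and the `query_db` callback are hypothetical names, not existing ReadingLists code, and how the bit array is serialized into user_properties or memcached is left out. A "no" answer is always correct, so the x1 query is skipped; only a "maybe" falls through to the database.

```python
import hashlib
import math
from typing import Callable, Iterator


class PageIdBloomFilter:
    """Hypothetical per-user Bloom filter over saved page IDs (sketch only)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, page_id: int) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(str(page_id).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, page_id: int) -> None:
        for pos in self._positions(page_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, page_id: int) -> bool:
        # False: definitely not saved. True: probably saved (may be a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(page_id))


def is_page_saved(bloom: PageIdBloomFilter, page_id: int,
                  query_db: Callable[[int], bool]) -> bool:
    if not bloom.might_contain(page_id):
        return False  # no x1 query needed
    return query_db(page_id)  # confirm a possible false positive against the database
```

With 1,000 saved pages this sizes the filter at about 9,600 bits (roughly 1.2 KB) with 7 hash functions, small enough to cache per user, and roughly 1 in 100 page views of unsaved pages would still reach x1.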
A less efficient approach would be to simply store the list of page IDs on the user's reading list in memcached and check against that instead of querying the database.
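A minimal cache-aside sketch of this simpler approach, in Python. A plain dict stands in for memcached, and the key format, the `ReadingListPageCache` class, and the `load_page_ids` loader are hypothetical. The whole set of saved page IDs is loaded once per user on a cache miss, and it must be invalidated whenever the user saves or unsaves a page.

```python
from typing import Callable, Dict, FrozenSet


class ReadingListPageCache:
    """Hypothetical cache-aside set of saved page IDs per user (dict stands in for memcached)."""

    def __init__(self, load_page_ids: Callable[[int], FrozenSet[int]]) -> None:
        self._cache: Dict[str, FrozenSet[int]] = {}  # stand-in for a memcached client
        self._load_page_ids = load_page_ids  # hypothetical: one DB query returning all saved page IDs
        self.db_loads = 0

    @staticmethod
    def _key(user_id: int) -> str:
        return f"readinglist:pageids:{user_id}"  # hypothetical key format

    def is_saved(self, user_id: int, page_id: int) -> bool:
        key = self._key(user_id)
        page_ids = self._cache.get(key)
        if page_ids is None:
            # Cache miss: load the full set once; later page views are served from the cache.
            page_ids = self._load_page_ids(user_id)
            self.db_loads += 1
            self._cache[key] = page_ids
        return page_id in page_ids

    def invalidate(self, user_id: int) -> None:
        # Call after a save/unsave so the next page view reloads a fresh set.
        self._cache.pop(self._key(user_id), None)
```

Compared with a Bloom filter, the cached value grows with the size of the reading list rather than staying a fixed number of bits, which is why this approach is less efficient for users with large lists.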