Instrument MediaWiki (probably in WikimediaEvents extensions) to capture the mobile behavior baseline described in https://docs.google.com/document/d/1ulTCYUC3Uhkf7tUfA3y1OSH7ahCMn_EvThAh6ROCFhI/edit
Description
Details
| Status | Subtype | Assigned | Task | ||
|---|---|---|---|---|---|
| Declined | nshahquinn-wmf | T235324 Answer key research questions to inform KaiOS app development | |||
| Resolved | SBisson | T238029 Code for InukaPageView instrumentation | |||
| Resolved | SBisson | T238028 Create InukaPageView schema |
Event Timeline
Change 551259 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/WikimediaEvents@master] Instrumentation for InukaPageView
@Catrope I imagine you're familiar with this already, which means either I misunderstood something, or there's a way you plan to work around this. If that is the case, consider this unblocked from MediaWiki perf perspective, and instead consider it blocked on CDN/Traffic perspective.
It is my understanding that we cannot allow currently regular anon page views to produce cookied/uncached responses even if we signal that correctly to the CDN layer, it as would break its ability to handle high traffic load for other users in ways that "Bad Things Might Happen". Take on further with Traffic to figure out.
My summary of the gerrit discussion:
The patch was written in a way that completely ignored the reality of the edge caching we have in production. It would have not worked and caused issues. Thanks @Krinkle for catching it and explaining how that works.
To implement event logging for anon users, the tracking script has to be sent to all users and the script itself has to figure out if a user should be included in the tracking or not. This is how other similar tracking (print, readingDepth, searchSatisfaction, etc) is implemented. This is problematic because it increases the payload and processing for a large number of users.
I have reworked my patch to work like other similar tracking. I would like to invite @Krinkle and @BBlack to voice any concern they have about adding an additional tracking script or the way to do it.
@SBisson I looked over the patch and the schema and I have a number of changes to suggest:
- My intention for the time_on_page field was to know the amount of time the user had the page visible/focused, but currently it just gives the absolute amount of time from page load to page unload. If it's feasible to track visibility time, could we add that as page_visible_time (and maybe rename time_on_page to page_open_time)?
- I just saw your comment about sections being open by default on mobile web; on Chrome for Android, I see them closed by default. Were you using the mobile website on a desktop browser? I think that when the screen size is tablet-size or higher, sections are opened by default, so they'd be closed by default for KaiOS devices. If that's the case, perhaps we can add a field like section_open_count which tracks the number of sections opened (ideally we'd track first time opens/unique section opens only, but it's fine if it's just a raw count with multiple opens of the same section included). This would also require a count of the number of sections on the page to contextualize.
- The dt field that's recorded by default will give the time that the event is received by the EventLogging server. Since we'll be holding events until the page unload, we should have a client_dt or load_dt field that gives the actual time the page was loaded so we can do our sessionizing accurately.
- The schema in its current form won't explicitly tell us if the user is on Special:Search. The fact that the page namespace will be -1 will be a strong signal, but is there any reason not to include an is_search_page variable?
I've added page_open_time and page_visible_time.
- I just saw your comment about sections being open by default on mobile web; on Chrome for Android, I see them closed by default. Were you using the mobile website on a desktop browser? I think that when the screen size is tablet-size or higher, sections are opened by default, so they'd be closed by default for KaiOS devices. If that's the case, perhaps we can add a field like section_open_count which tracks the number of sections opened (ideally we'd track first time opens/unique section opens only, but it's fine if it's just a raw count with multiple opens of the same section included). This would also require a count of the number of sections on the page to contextualize.
Right, this is responsive. They are initially collapsed on screens narrower than 720px.
I've added section_count and opened_section_count. It's a unique count. I think it tells us more about the "depth" of their interaction with the page.
- The dt field that's recorded by default will give the time that the event is received by the EventLogging server. Since we'll be holding events until the page unload, we should have a client_dt or load_dt field that gives the actual time the page was loaded so we can do our sessionizing accurately.
Sure, I can add that. What format do you want the date time in?
- The schema in its current form won't explicitly tell us if the user is on Special:Search. The fact that the page namespace will be -1 will be a strong signal, but is there any reason not to include an is_search_page variable?
DONE
Thank you for the quick responses, @SBisson!
ISO-8601 (i.e. 2019-10-02T16:15:30Z), please.
Nice! From the code, it looks like this count is the number of unique sections that the user ever opened, whether or not they're still open when the event is sent. Is that right? (I like that behavior.)
Thinking about sections made me think of another thing: our operating system filter will still catch Android and iOS tablets where the sections are big enough to be opened by default. The experience on these devices isn't really relevant to KaiOS, so I feel like we should use a screen-size filter to limit our data collection to phones only. @AMuigai, @SBisson what do you think?
I don't have any idea right now what exactly the cutoff would be, but if we decide we want to do this, I'll do some research.
@SBisson I think we're planning to use this same schema for KaiOS app events too; if so, we should probably change the os field to something like`platform` and have kaios-web and kaios-app as possible values.
I think the collection that is heavy on cookies and tracking should have been reviewed by our privacy engineer @JFishback_WMF .
@Neil_P._Quinn_WMF
<T238029+public+623660044037b780@phabricator.wikimedia.org> This is related
to the other ticket T242853 https://phabricator.wikimedia.org/T242853. I
wanted to see:
- If your query would show similar results or behaviour to the data we're
seeing in turnilo
- Determine if there are month on month fluctuations (given that I am
unable to confirm the numbers on turnilo due to the drop that we're yet to
make sense of).
Change 551259 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Instrumentation for InukaPageView
A very special thanks to @phuedx for reviewing and merging the patch!!
This task is now for @nshahquinn-wmf to test and resolve once the events are flowing in successfully and we are confident the data is usable.
Change 566391 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[operations/mediawiki-config@master] Enable InukaPageView instrumentation in labs
Change 566391 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable InukaPageView instrumentation in labs
@nshahquinn-wmf This is enabled in betalabs. After hacking my GeoIP cookie to make it look like I'm in India, I was able to see the events being sent from the browser. Do you happen to know where the betalabs events end up?
Please do search on wikitech where all analytics docs are: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/TestingOnBetaCluster
Change 566572 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/WikimediaEvents@master] InukaPageView: update schema version
Change 566572 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] InukaPageView: update schema version
Change 566579 had a related patch set uploaded (by Catrope; owner: Sbisson):
[mediawiki/extensions/WikimediaEvents@wmf/1.35.0-wmf.16] InukaPageView: update schema version
Change 566579 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.35.0-wmf.16] InukaPageView: update schema version
Mentioned in SAL (#wikimedia-operations) [2020-01-22T19:36:32Z] <catrope@deploy1001> Synchronized php-1.35.0-wmf.16/extensions/WikimediaEvents/: InukaPageView: update schema version (T238029) (duration: 01m 07s)
Change 566596 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/WikimediaEvents@master] InukaPageView: update schema version
Change 566596 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] InukaPageView: update schema version
Change 566600 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/WikimediaEvents@wmf/1.35.0-wmf.16] InukaPageView: update schema version
@nshahquinn-wmf I was able to test on betalabs. I discovered and fixed two issues: 1) there was a typo in the "client_type" field in the schema, 2) the referring_domain was mandatory but there are valid cases where the is just no referrer so I made it optional.
CentralNotice seem to be installed but window.Geo is not populated. I had to put a breakpoint at the beginning of our script so I can populated it before our script tries to read it. It is populated correctly in production so it should work.
It is populated alright. I deleted my cookies, refreshed, and it's there and working fine.
Change 566600 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.35.0-wmf.16] InukaPageView: update schema version
Mentioned in SAL (#wikimedia-operations) [2020-01-23T19:10:31Z] <urbanecm@deploy1001> Synchronized php-1.35.0-wmf.16/extensions/WikimediaMessages/extension.json: SWAT: 23a6f8e: InukaPageView: update schema version (T238029) (duration: 01m 05s)
Change 570381 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[operations/mediawiki-config@master] Enable InukaPageView logging on production Wikipedias
Change 570381 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable InukaPageView logging on production Wikipedias
Mentioned in SAL (#wikimedia-operations) [2020-02-05T19:19:41Z] <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T238029 Enable InukaPageView logging on production Wikipedias (duration: 01m 07s)
I'm seeing events flowing into the production database, so I think this is done! I'll do a deeper check next week, when more data is available (T244174).
SELECT COUNT(*) as events FROM inukapageview WHERE year > 0 events 77019