Wikipedia Topic Builder — Wiki Education

How it works

Topic Builder lets you use AI to collect a set of Wikipedia articles about any topic. You explain what the topic is, and what sorts of articles should be central to it, peripheral but still relevant, or outside the scope of your topic. The AI gets a menu of common exploration strategies, and handles the mechanical work — running searches, traversing categories, fetching descriptions.

Scope: Agree on what counts as "in": biographies, lists, popular-culture spinoffs, sister-language articles.

Reconnaissance: Survey categories, WikiProjects, index pages, and Wikidata classes.

Gather: Collect candidate articles from multiple strategies, with preview variants for risky pulls.

Review & score: Fetch short descriptions; optionally score articles 1–10 for topic centrality (core vs periphery).

Edge browse: Find articles structured strategies missed, and the ones you specifically expect to see.

Export: Download the final CSV for the Impact Visualizer.

Getting a more complete topic

The AI will produce a serviceable topic on its own, but it tends to stop where stopping is easy: round-number article counts, the obvious core, the first strategies it tried. The difference between a "fine" topic and a great one is mostly how involved you are at five high-leverage moments.

Push back during scoping. The AI will propose a default scope. State explicitly what's in and what's out — biographies of key people, sub-events, geographic or sister-language variants, "in popular culture" articles, lists. Ambiguity in scope becomes ambiguity in the corpus.
Don't accept the first "looks done." LLMs gravitate to round-number stopping points and pattern-complete to "this seems comprehensive." Ask "what kinds of articles haven't we tried to find yet?" — the AI usually has more strategies left when prompted.
Run complementary strategies and compare. If you've harvested via WikiProject, also try a category-tree or list-page pull. Articles that appear in only one of the two pulls are either noise in that pull or gaps in the other, so the diff surfaces both.
Spot-check before exporting. Name 3–5 articles you'd expect to see — ideally obscure ones, not the obvious core. If any are missing, ask the AI to investigate; one missed article often reveals a strategy gap affecting dozens of similar ones.
Bring your domain expertise to edge calls. For ambiguous articles, your judgment about whether something is "really" the topic is what the AI doesn't have. Don't outsource these calls — your time on edges is the highest-leverage time you'll spend.

The five recommendations above are the patterns we already know help. The built-in strategies — categories, WikiProjects, list pages, Wikidata, search — cover a lot, but they aren't the whole space. If you (or the AI) have other ideas for where articles might live — an authoritative reference list, a SPARQL query, the references section of a few key articles, a topic-specific infobox template — say so. The tools are general-purpose enough that most approaches can be worked in, and a domain expert's instincts often beat the standard menu.
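The comparison described in the third recommendation amounts to a set partition, the same only_a / only_b / both split that the topic_diff tool is described as returning. Below is a minimal Python sketch of that idea, assuming two plain lists of titles. The function name partition_titles and the sample titles are illustrative only and are not part of Topic Builder.

```python
def partition_titles(titles_a, titles_b):
    """Split two title collections into only_a / only_b / both."""
    a, b = set(titles_a), set(titles_b)
    return {
        "only_a": sorted(a - b),  # candidates: noise in A, or gaps in B
        "only_b": sorted(b - a),  # candidates: noise in B, or gaps in A
        "both": sorted(a & b),    # found by both strategies
    }


# Hypothetical pulls, for illustration only.
wikiproject_pull = ["Human trafficking", "Debt bondage", "Palermo protocols"]
category_pull = ["Human trafficking", "Debt bondage", "Forced labour"]

diff = partition_titles(wikiproject_pull, category_pull)
```

Titles in "both" were found independently by two strategies and are usually the safest; the two "only" buckets are where a human spot-check pays off.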

Sign in

Topic Builder uses your Wikimedia account to scope topics to you. Sign in once per device, paste the resulting token to your AI, and the rest of the conversation is yours.

Sign in: Visit /oauth/login and approve the Wikimedia consumer.

Copy the token line: You'll get a one-line message starting with "My Topic Builder token is…"; copy the whole line.

Paste it into the chat: The AI will call authenticate to bind your identity to the session, and may offer to remember the token for future chats.

Topics are private by default — only you can see or modify them. To share, ask the AI to set the topic's visibility to public_read (anyone reads, only you edit) or public_edit (any signed-in user reads + edits). If a token is ever leaked, ask the AI to call revoke_my_token and get a fresh one at /oauth/login.
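The three visibility tiers can be read as a small permission table. The Python sketch below is one interpretation of the wording above, not the server's actual logic. The function names can_read and can_edit are hypothetical, and one point is an assumption: the text only promises that signed-in users can read a public_edit topic, so the sketch does not grant anonymous reads on that tier.

```python
def can_read(visibility, owner, user):
    """user is a Wikimedia username, or None for an anonymous visitor."""
    if user is not None and user == owner:
        return True
    if visibility == "public_read":
        return True  # anyone reads
    if visibility == "public_edit":
        # Assumption: only signed-in users are promised read access here.
        return user is not None
    return False  # private: owner only


def can_edit(visibility, owner, user):
    """Only the owner edits, unless the topic is public_edit."""
    if user is None:
        return False
    return user == owner or visibility == "public_edit"
```

For example, can_edit("public_read", "alice", "bob") is False, while can_edit("public_edit", "alice", "bob") is True.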

Once signed in, visit /topics to see every topic you own and download a fresh simple or enriched CSV for any of them on demand.

Connect to Claude

Detailed setup for Claude on the web. Choose the path that matches your account.

Individual Claude account (Personal / Pro)

Open Settings: Go to claude.ai. Click your profile icon (bottom-left), then Settings.

Add the integration: Go to the Integrations tab. Click Add More, then Add custom integration. Enter this URL:

https://topic-builder.wikiedu.org/mcp

Name it Wikipedia Topic Builder. Set authentication to None — the sign-in happens in chat (see Sign in above), not in the connector.

Start a conversation: Start a new chat. Click the Integrations icon (plug) in the message bar and toggle Wikipedia Topic Builder on. Then try:

"I want to build a list of all Wikipedia articles about human trafficking. Let's start with reconnaissance."

Organization account: admin setup (Admin)

As an org admin, you add the connector once and it becomes available to all members.

Open Organization Settings: Go to claude.ai. Click your profile icon, then Settings, then navigate to your organization's admin settings.

Add a connector: Find the Connectors section. Click Add Connector. Enter:

https://topic-builder.wikiedu.org/mcp

Name it Wikipedia Topic Builder. Set authentication to None — each member signs in separately in chat (see Sign in above), not via the connector.

Done: The connector is now available to all members of your organization. They can enable it from the Connectors menu in any chat.

Organization account: using an enabled connector (Member)

Your admin has already added the Wikipedia Topic Builder connector. You just need to enable it in a chat.

Start a new chat: Go to claude.ai and start a new conversation.

Enable the connector: Click the Connectors menu (plug icon, or look for it below the message input). Find Wikipedia Topic Builder and toggle it on.

Start building: Tell Claude what topic you want to explore:

"I want to build a list of all Wikipedia articles about human trafficking. Let's start with reconnaissance."

Connect to ChatGPT

ChatGPT supports remote MCP servers as custom connectors. Availability and menu names vary by plan (Plus / Pro / Business / Enterprise / Edu) and have shifted as OpenAI has rolled out MCP support — if any step below doesn't match your UI, check OpenAI's MCP documentation.

ChatGPT custom connector

Enable developer / connector mode: Go to chatgpt.com. Open Settings → Connectors (or Apps & connectors). If custom connectors aren't visible, look under Advanced for a Developer mode toggle and turn it on. On Business/Enterprise, a workspace admin must enable custom connectors first.

Add a custom connector: Click Add custom connector (or Create). Enter:

https://topic-builder.wikiedu.org/mcp

Name it Wikipedia Topic Builder. Set authentication to No authentication — the sign-in happens in chat (see Sign in above), not in the connector. Save.

Use it in a chat: Start a new chat. Open the tools / connectors menu in the composer and enable Wikipedia Topic Builder. Then try:

"I want to build a list of all Wikipedia articles about human trafficking. Let's start with reconnaissance."

Tool-calling quality depends on the model you pick. For long multi-step topic builds, use a reasoning-capable model.

Other MCP clients

Any MCP-capable client (Cursor, Zed, Continue, custom agents built on the protocol) can connect to the URL above. Consult your client's documentation for how to add a remote MCP server with no connector-level authentication — user sign-in happens in chat via the steps under Sign in above.

Available tools

The AI drives these directly — you don't need to call them by hand.

start_topic: Begin a new topic build on a specific Wikipedia language edition; pass fresh=True to clear an existing topic.

resume_topic: Resume an existing topic by name; nudges for feedback after a long idle gap.

list_topics: List all saved topics with their article counts.

reset_topic: Clear the current topic's working list and start over.

get_status: Article count, score distribution, source breakdown, per-topic cost aggregate.

describe_topic: Shape-of-corpus overview: title lengths, top first-words, suspicious patterns, source-shape stats with triangulation %, redirect-collapse rate, attempted vs unused-but-applicable strategy moves, yield trend.

audit_progress: Read-only synthesis of corpus state, attempted moves, unused-but-applicable moves, detected failure modes, and a one-paragraph recommendation. Pre-export gate; mid-build pivot signal.

topic_diff: Partition two topics' titles into only_a / only_b / both. Use as a ratchet diagnostic against a frozen baseline, or to compare a topic against a curated blocklist or a parallel pull.

set_topic_rubric: Persist a three-tier CENTRAL / PERIPHERAL / OUT rubric drafted after scope confirmation. Mandatory before any gather call. Pass topic_profile to receive shape-keyed strategy guidance in the response.

get_topic_rubric: Re-read the current rubric mid-session or before export to sanity-check scope.

survey_categories: Survey Wikipedia's category tree without collecting yet.

check_wikiproject: Check whether a given WikiProject exists. Cross-wiki aware; returns the local title plus a tagging_mechanism field telling you whether members can be enumerated on this wiki.

find_wikiprojects: Discover WikiProjects by keyword: enwiki via prefixsearch, non-en via cached Wikidata cross-wiki sitelinks (~18% of enwiki projects have a non-en equivalent).

preview_wikiproject: Cheap metadata about an enwiki WikiProject (total article count + importance breakdown) via the Wikipedia 1.0 bot's assessment tables. Enwiki-only (the bot doesn't run elsewhere). Call BEFORE get_wikiproject_articles to avoid timeouts on huge projects.

find_list_pages: Search for Index, List, Outline, and Glossary pages.

wikidata_search_entity: Label-search Wikidata for the QID of a concept; call this before other Wikidata tools.

wikidata_entities_by_property: Find entities whose Wikidata property points to a value; returns QID + label + sitelink title.

preview_wikidata_property: Titles-only sibling: returns just QID + title + sitelink count, sorted by sitelink count desc. Use when the property has hundreds of well-attested entities and the full-body variant would overflow.

wikidata_query: Raw SPARQL against query.wikidata.org, for compound joins and multi-hop traversals.

petscan: Compound query in one HTTP call: combine categories (AND/OR/NOT), template membership (article OR talk-page), namespaces, and SPARQL constraints. Preview-then-commit flow. The right tool for category ∩ WikiProject without ingesting either side.

resolve_qids: Lazy-backfill Wikidata QIDs onto articles in the working list; enables cross-wiki tooling.

get_wikiproject_articles: Fetch all articles tagged by a WikiProject. Dispatches by per-wiki tagging mechanism (per-project banner on en/de/ru, parameterized banner on fr/es/it/pt, no banner on ja/pl/sv).

get_category_articles: Crawl a category tree and collect articles, with cooperative time budget.

preview_category_pull: Dry-run of get_category_articles: counts + sample without committing.

harvest_list_page: Extract links from an Index or List page; default skips navboxes and references.

preview_harvest_list_page: Dry-run of harvest_list_page; inspect before committing.

harvest_navbox: Extract the article list from a navbox / infobox template; great for award, franchise, and program shapes.

get_article_content: Plain-text extract of the canonical article (RTFA). Use as planning context before drawing the rubric, or to surface domain framing the structural signals miss.

get_article_links: Outgoing mainspace links from an article, i.e. the topic's first-degree neighborhood. Pairs with the other seed-mining tools.

get_article_see_also: Links from the article's editor-placed See also section, the curated semantic neighborhood. Higher precision than morelike: or full outgoing links on niche topics.

get_article_backlinks: Articles that link TO this one (“what links here”). Cap aggressively on prominent topics; the long tail is mostly trivial mentions.

get_article_categories: Categories the given article belongs to; each is a descent candidate for survey_categories / get_category_articles.

get_article_templates: Templates used on the article (filter to navbox / infobox / wikiproject). Each navbox is a harvest_navbox target.

wikidata_get_entity: Full property dump and sitelinks for a Wikidata QID. Reveals which properties are populated before you commit to targeted probes.

search_articles: Search Wikipedia using CirrusSearch operators; supports within_category scoping.

search_similar: Find articles similar to a given one via morelike: (great for filling gaps).

preview_search: Run a search and return titles + descriptions without adding to the list.

preview_similar: Read-only morelike: preview; critical for noisy similarity seeds.

browse_edges: Follow outgoing links from confirmed articles.

add_articles: Manually add articles the AI identifies outside the gather tools.

fetch_descriptions: Pull Wikidata short descriptions, with REST-intro fallback whenever the shortdesc is empty; auto-loops until done.

fetch_article_leads: Fetch the first N sentences of each article's body; use when a shortdesc looks thin or misleading and you need a richer read before scoring.

score_by_extract: Fetch article introductions so the AI can score centrality.

set_scores: Save centrality scores (1–10) for articles; 10 = canonical core, 1 = distant periphery.

auto_score_by_keyword: Bulk-score articles whose title or description contains any of the given keywords.

auto_score_by_description: Reject obvious noise via description markers (labeled axes + disqualifying), with dry-run preview.

score_all_unscored: Stamp remaining unscored articles at one centrality value. Use deliberately, not as a closing ceremony.

list_sources: Show every source label attached to articles, with counts.

get_articles_by_source: List articles from a source, optionally excluding overlap.

get_articles: List articles with filters: score, source, regex on title or description, source intersection.

remove_articles: Remove a specific list of articles by title.

remove_by_source: Undo a noisy pull by its source label, exact or prefix match.

remove_by_pattern: Bulk-remove articles by title or description pattern, with dry run.

resolve_redirects: Normalize corpus titles to canonical Wikipedia form. Safe / additive: no drops. Run early in every build to prevent redirect-source duplicates.

filter_articles: Resolve redirects; drop disambiguation, list, and year-prefixed pages. Refuses drops > max_drop_fraction (default 10%) without force=True.

reject_articles: Add titles to a sticky rejection list so future gathers won't re-introduce them.

list_rejections: List the topic's sticky rejections (title + reason + timestamp).

unreject_articles: Remove titles from the sticky rejection list.

set_topic_tags: Author or replace the topic's tag taxonomy. Tags stratify the corpus into named subsets (e.g. mitigation / adaptation, crew / ground-control). Tags may declare value-bearing properties (biography by gender + country). Destructive replacement: tags absent from the new list are dropped along with their membership.

get_topic_tags: Read the current tag taxonomy with property defs. For per-tag distribution counts, see audit_progress.

tag_articles: Apply a tag to a list of articles by title (AI judgment). Property values not set; use set_tag_property_values for that.

untag_articles: Remove a tag from a list of articles. Tag definition is not deleted.

tag_by_source: Bulk-apply a tag to every article carrying a source label. prefix_match=True for "everything from wikiproject:*". Idempotent.

untag_by_source: Symmetric removal: untag every article whose sources match.

tag_by_pattern: Bulk-apply a tag where title or description matches a case-insensitive regex (both AND when both set).

tag_by_wikidata: Membership + value capture in 1+N SPARQL queries, independent of corpus size. Predicates AND together; capture_properties pulls Wikidata values into article_tags.properties_json. Auto-applies: the predicate match IS the tag set.

set_tag_property_values: Set per-article values for a tag's properties without changing membership. Manual AI-judgment fill, or override of Wikidata-derived values.

untag_all: Wipe a tag's membership while keeping its definition. To delete the definition too, omit the tag from set_topic_tags.

export_csv: Export as CSV. Default = single-column titles, Impact-Visualizer-compatible; enriched=True = 6 columns incl. QID, description, score, sources.

prepare_iv_handoff: Preview an Impact Visualizer handoff package (config + first articles + centrality histogram) without committing. Show the user before calling publish_topic.

publish_topic: Mint a clickable Impact Visualizer import link. Snapshots the article list + IV config (dates, editor label, description, centrality) and returns a one-click URL the user opens to create the IV topic.

submit_feedback: Submit a retrospective at the end of a session.

fetch_task_brief: Entry point for benchmark / dogfood runs: returns the full prompt + target topic name for a given task_id. Not used in normal topic-building sessions.

list_tasks: List available benchmark / dogfood task briefs (for research runs).

list_exemplars: Free / preparatory: menu of authored worked-example exemplars from analogous benchmark topics (shape axes, summary, headline numbers, high-leverage moves). Call after rubric is set, before any metered tool call.

get_exemplar: Free / preparatory: full case study for one exemplar by slug, covering tool sequence, lessons, and anti-patterns. Pair with list_exemplars to pick which to read.

authenticate: Bind a Wikimedia OAuth token (from /oauth/login) to the session. AI offers to remember the token for future chats; sliding 30-day TTL means active users rarely re-paste.

whoami: Show the authenticated Wikimedia username for the current session, or "anonymous" if no token is bound.

revoke_my_token: Self-revoke a Topic Builder token. Use if a token was leaked or pasted somewhere it shouldn't be; get a fresh one at /oauth/login.

get_topic_visibility: Show the current topic's owner and visibility tier (private / public_read / public_edit).

set_topic_visibility: Change the current topic's visibility. Owner only. private = solo; public_read = anyone reads; public_edit = any signed-in user reads + edits.
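The filter_articles entry in the list above describes a safety guard: a drop that would remove more than max_drop_fraction of the list (default 10%) is refused unless force=True is passed. The Python sketch below illustrates that kind of guard in isolation. Only the parameter names max_drop_fraction and force come from the tool description; the function guarded_drop, the should_drop predicate, and the sample titles are hypothetical, and the real tool may behave differently in detail.

```python
def guarded_drop(titles, should_drop, max_drop_fraction=0.10, force=False):
    """Drop matching titles, refusing unexpectedly large drops unless forced."""
    to_drop = {t for t in titles if should_drop(t)}
    fraction = len(to_drop) / len(titles) if titles else 0.0
    if fraction > max_drop_fraction and not force:
        raise ValueError(
            f"would drop {fraction:.0%} of the list; pass force=True to proceed"
        )
    return [t for t in titles if t not in to_drop]


# Hypothetical working list: 2 of 10 titles are list pages (20%).
titles = ["List of A", "List of B"] + [f"Article {i}" for i in range(8)]


def is_list_page(title):
    return title.startswith("List of ")


# Without force=True this call would raise, because 20% > 10%.
kept = guarded_drop(titles, is_list_page, force=True)
```

The point of the guard is that a large drop is more often a sign of a bad pattern than of a noisy corpus, so it asks for explicit confirmation.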

Output format

By default, export_csv emits a single-column CSV — one article title per row, no header, plain UTF-8 — ready to feed into the Wiki Education Impact Visualizer, which reads exactly that shape.
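A short Python sketch of reading the default export, assuming the shape described above (one title per row, no header, UTF-8). The function name read_simple_export and the sample text are illustrative. Using the csv module instead of splitting on newlines is a precaution for titles that contain commas, which CSV writers normally quote.

```python
import csv
import io


def read_simple_export(text):
    """Return article titles from a single-column, headerless CSV string."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row[0] for row in reader if row]


# Hypothetical export contents, for illustration only.
sample = 'Human trafficking\r\n"Example title, with comma"\r\nDebt bondage\r\n'
titles = read_simple_export(sample)
```

When reading from a downloaded file, opening it with open(path, newline="", encoding="utf-8") and passing the file object to csv.reader works the same way.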

Pass enriched=True for a richer six-column variant with a header row: title, wikidata_qid, description, score, source_labels (pipe-separated), first_added_at. UTF-8 with BOM so Excel detects encoding. Useful for manual review, downstream tooling, or future Impact Visualizer filtering on topic centrality.
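The enriched variant can be parsed along the following lines. This is a sketch under stated assumptions: the column names come from the description above, but the sample row values (the placeholder QID, the source label spellings, the timestamp format) are invented for illustration, and the sketch assumes an unscored article has an empty score cell. The function name read_enriched_export is hypothetical.

```python
import csv
import io


def read_enriched_export(raw_bytes):
    """Parse the six-column export into a list of dicts."""
    # "utf-8-sig" strips the byte-order mark that is written for Excel.
    text = raw_bytes.decode("utf-8-sig")
    rows = []
    for row in csv.DictReader(io.StringIO(text, newline="")):
        # source_labels is pipe-separated; split it into a list.
        row["source_labels"] = [s for s in row["source_labels"].split("|") if s]
        # Assumption: an empty score cell means the article is unscored.
        row["score"] = int(row["score"]) if row["score"] else None
        rows.append(row)
    return rows


# Hypothetical file contents; every value below is a placeholder.
sample = (
    "\ufefftitle,wikidata_qid,description,score,source_labels,first_added_at\r\n"
    "Debt bondage,Q0,example description,8,"
    "wikiproject:Example|category:Example,2024-01-01T00:00:00Z\r\n"
    "Forced labour,,,,search:example,2024-01-01T00:00:00Z\r\n"
).encode("utf-8")

rows = read_enriched_export(sample)
```

Decoding with plain "utf-8" would leave the BOM attached to the first header name, so the first column would not be found under the key "title".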

For an end-to-end handoff that skips the CSV download, use publish_topic instead of export_csv. It mints an https://impact-visualizer.wmcloud.org/imports/<handle> URL for the user to click; Impact Visualizer then fetches the snapshot server-side from https://topic-builder.wikiedu.org/packages/<handle> and creates the Topic + ArticleBag in one transaction. Centrality scores ride along per article. The snapshot is frozen at publish time; re-publish to refresh it.

Build a Wikipedia topic, together with an AI.
