Wikidata:WikiProject Occupations/Roadmap

From Wikidata

Roadmap

I am sequencing publication of native-language occupation labels (and Q-item creation where needed) by country. Publication is currently paused pending Task 2 bot approval. Phase order reflects data readiness, meaning that GSCO already holds clean native-language labels from a state classifier with a valid ISCO-08 mapping, rather than community demand or language size.

Each phase begins with a manual-review pilot of 10 Q-items before scaling. If any edits are reverted at any point, operations pause until the concerns are addressed.

Publication phases (after Task 2 approval)
| Phase | Countries / sources (data ready) | Approximate label count | Status |
|---|---|---|---|
| Phase 0 (pilot) | Latvia (Profesiju klasifikators); 10 Q-items, manual review | 5–10 | planned, pending Task 2 |
| Phase 1 | Latvia, Moldova, Armenia, Azerbaijan, Georgia; clean datasets, ~99% ISCO-08 coverage | ~30,000 labels | planned |
| Phase 2 | Romania, Italy, Spain, Portugal, France, Germany, Poland; ~99% ISCO-08 coverage | ~50,000 labels | planned |
| Phase 3 | Turkey, Croatia, Bosnia & Herzegovina, Albania, Bulgaria, Hungary, Czechia, Slovakia, Slovenia | ~25,000 labels | planned |
| Phase 4 | Arabic (multiple classifiers including ILO ISCO-08 AR, Saudi SSCO-2024, Palestine ASCO-2016, Jordan JSCO, UAE), Korean, Thai, Vietnamese, Bengali | ~20,000 labels | data acquisition in progress |
| Phase 5 | Russia (OKPDTR; requires ISCO-08 crosswalk file before publishable), Brazil (CBO; partial ISCO mapping), India (NCO 2015), other partial-mapping classifiers | varies | requires crosswalk research |

Coverage check before each phase

Before opening a phase I verify:

  1. State classifier source is downloaded, parsed, and stored in gsco.io with per-entry provenance (publication date, source URL, language code).
  2. A Q-item for the classifier itself exists on Wikidata, or is created as a prerequisite (so that every label has a valid stated in (P248) reference).
  3. A sample of 10 Q-items has been drawn and its matching manually reviewed. I then run the pilot, wait 7 days for any community feedback, and only then scale up.
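
The three checks above could be expressed roughly as follows. This is a minimal Python sketch under stated assumptions, not the actual gsco.io tooling: the `LabelEntry` record shape and all function names are hypothetical, and the only values taken from this page are the pilot size of 10 and the 7-day feedback window.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

PILOT_SIZE = 10                      # pilot size stated in the roadmap
FEEDBACK_WINDOW = timedelta(days=7)  # waiting period stated in the roadmap


@dataclass
class LabelEntry:
    """One native-language label with its provenance (hypothetical record shape)."""
    isco08_code: str
    label: str
    language_code: Optional[str]
    source_url: Optional[str]
    publication_date: Optional[date]


def missing_provenance(entries):
    """Check 1: return entries lacking any of the three provenance fields."""
    return [
        e for e in entries
        if not (e.language_code and e.source_url and e.publication_date)
    ]


def phase_ready(entries, classifier_qid):
    """Checks 1 and 2: complete provenance and a known classifier Q-item."""
    return (
        bool(entries)
        and not missing_provenance(entries)
        and classifier_qid is not None
    )


def pilot_sample(entries, seed=0):
    """Check 3: draw a reproducible sample of up to PILOT_SIZE entries."""
    rng = random.Random(seed)
    return rng.sample(entries, min(PILOT_SIZE, len(entries)))


def may_scale_up(pilot_finished_on, today, open_concerns):
    """True once the feedback window has elapsed with no open concerns."""
    return today - pilot_finished_on >= FEEDBACK_WINDOW and open_concerns == 0
```

The sample is seeded so that the same 10 entries can be re-listed later if a reviewer asks which items were in the pilot. Counting open concerns is left to a human; the sketch only refuses to scale while that count is non-zero.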

Notes on Phase 5

Several large classifiers do not map one-to-one to ISCO-08 unit groups at the 4-digit level. Including them requires crosswalk research before anything is published on Wikidata:

  • Russia (OKPDTR 2025) — uses its own classification structure; ISCO-08 crosswalk exists for some categories but is not complete.
  • Brazil (CBO) — partial ISCO mapping; many CBO codes have no direct ISCO-08 equivalent.
  • India (NCO 2015) — adapted from ISCO-08 but with significant national-level extensions.
  • USA (O*NET/SOC) — separate SOC structure; ISCO-08 ↔ SOC crosswalks exist via BLS but require validation.

For these classifiers I will publish only after the crosswalk has been verified against the tables published by the relevant official statistical office.
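
One way to picture that verification step is the sketch below. It assumes, hypothetically, that both the working crosswalk and the official table have already been loaded as dictionaries mapping a national code to a set of ISCO-08 unit group codes. The function names and issue categories are illustrative and are not an existing tool.

```python
import re

# ISCO-08 unit groups are identified by 4-digit codes.
ISCO08_UNIT_GROUP = re.compile(r"\d{4}")


def crosswalk_issues(working, official):
    """Compare a working crosswalk against the official table.

    Both arguments map a national code to a set of ISCO-08 unit group codes.
    Returns the national codes that need attention, grouped by problem.
    """
    issues = {
        "malformed": [],            # a target is not a 4-digit code
        "missing_in_official": [],  # national code absent from the official table
        "mismatch": [],             # targets differ from the official table
        "not_one_to_one": [],       # maps to zero or several unit groups
    }
    for code in sorted(working):
        targets = working[code]
        if any(not ISCO08_UNIT_GROUP.fullmatch(t) for t in targets):
            issues["malformed"].append(code)
        if code not in official:
            issues["missing_in_official"].append(code)
        elif targets != official[code]:
            issues["mismatch"].append(code)
        if len(targets) != 1:
            issues["not_one_to_one"].append(code)
    return issues


def verified_codes(working, official):
    """National codes whose mapping is well-formed and agrees with the official table."""
    issues = crosswalk_issues(working, official)
    blocked = (
        set(issues["malformed"])
        | set(issues["missing_in_official"])
        | set(issues["mismatch"])
    )
    return sorted(set(working) - blocked)
```

In this sketch a one-to-many mapping is flagged for manual attention but does not block a code on its own, as long as it matches the official table. Whether such codes should be published at all is a policy decision the sketch does not settle.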