Wikidata:WikiProject Occupations/Roadmap
Roadmap
I am sequencing the publication of native-language occupation labels (and Q-item creation where needed) by country. Publication is currently paused pending Task 2 bot approval. Phase order reflects data readiness, meaning that GSCO has clean native-language labels from a state classifier with a valid ISCO-08 mapping, rather than community demand or language size.
Each phase begins with a manual-review pilot of 10 Q-items before scaling. If reverts arrive at any point, operations pause until concerns are addressed.
| Phase | Countries / sources (data ready) | Approximate label count | Status |
|---|---|---|---|
| Phase 0 — pilot | Latvia (Profesiju klasifikators) — 10 Q-items, manual review | 5–10 | planned, pending Task 2 |
| Phase 1 | Latvia, Moldova, Armenia, Azerbaijan, Georgia — clean datasets, ~99% ISCO-08 coverage | ~30,000 labels | planned |
| Phase 2 | Romania, Italy, Spain, Portugal, France, Germany, Poland — ~99% ISCO-08 coverage | ~50,000 labels | planned |
| Phase 3 | Turkey, Croatia, Bosnia & Herzegovina, Albania, Bulgaria, Hungary, Czechia, Slovakia, Slovenia | ~25,000 labels | planned |
| Phase 4 | Arabic (multiple classifiers including ILO ISCO-08 AR, Saudi SSCO-2024, Palestine ASCO-2016, Jordan JSCO, UAE), Korean, Thai, Vietnamese, Bengali | ~20,000 labels | data acquisition in progress |
| Phase 5 | Russia (OKPDTR — requires ISCO-08 crosswalk file before publishable), Brazil (CBO — partial ISCO mapping), India (NCO 2015), other partial-mapping classifiers | varies | requires crosswalk research |
Coverage check before each phase
Before opening a phase I verify:
- State classifier source is downloaded, parsed, and stored in gsco.io with per-entry provenance (publication date, source URL, language code).
- A Q-item for the classifier itself exists on Wikidata, or is created as a prerequisite (so that every label has a valid stated in (P248) reference).
- A pilot has been completed: I sample 10 Q-items, manually review the matching, run the pilot, and wait 7 days for community feedback before scaling up.
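The checklist above can be read as a gate that must return no blockers before a phase scales up. The following is a minimal sketch in Python, not the actual gsco.io tooling: the `Entry` fields, the function names, and the way reverts are counted are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

PILOT_SIZE = 10                      # pilot sample size from the checklist
FEEDBACK_WINDOW = timedelta(days=7)  # waiting period after the pilot


@dataclass
class Entry:
    """One classifier entry; the field names are hypothetical."""
    label: str
    publication_date: Optional[date]
    source_url: Optional[str]
    language_code: Optional[str]


def missing_provenance(entries):
    """Return the entries lacking any of the three provenance fields."""
    return [
        e for e in entries
        if not (e.publication_date and e.source_url and e.language_code)
    ]


def phase_blockers(entries, classifier_qid, pilot_done_on, today, reverts=0):
    """List the reasons a phase may not scale up yet (empty list = ready)."""
    blockers = []
    if missing_provenance(entries):
        blockers.append("entries without full provenance")
    if not classifier_qid:
        blockers.append("no Q-item for the classifier (needed for P248)")
    if pilot_done_on is None:
        blockers.append(f"pilot of {PILOT_SIZE} Q-items not run")
    elif today - pilot_done_on < FEEDBACK_WINDOW:
        blockers.append("7-day feedback window still open")
    if reverts > 0:
        blockers.append("reverts received; paused until addressed")
    return blockers
```

Returning a list of blockers rather than a single boolean keeps the reason for a pause visible, which matters when the pause has to be explained on a talk page.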
Notes on Phase 5
Several large classifiers do not map one-to-one to ISCO-08 unit groups at the 4-digit level. Including them requires crosswalk research before anything is published on Wikidata:
- Russia (OKPDTR 2025) — uses its own classification structure; ISCO-08 crosswalk exists for some categories but is not complete.
- Brazil (CBO) — partial ISCO mapping; many CBO codes have no direct ISCO-08 equivalent.
- India (NCO 2015) — adapted from ISCO-08 but with significant national-level extensions.
- USA (O*NET/SOC) — separate SOC structure; ISCO-08 ↔ SOC crosswalks exist via BLS but require validation.
I will publish labels from these classifiers only after each crosswalk has been verified against the tables published by the relevant official statistical office.
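One way to picture that verification is as a comparison between a candidate crosswalk and the official table, sorting each national code into a bucket. This is a hedged sketch only: the data shapes (a dict of national code to one ISCO-08 code, and an official dict of national code to a set of ISCO-08 codes), the bucket names, and the rule that one-to-many rows count as ambiguous rather than publishable are assumptions, not a description of any statistical office's file format.

```python
def is_unit_group(code):
    """True for a 4-digit string, the shape of an ISCO-08 unit group code."""
    return len(code) == 4 and code.isdigit()


def verify_crosswalk(candidate, official):
    """Sort national codes into buckets by agreement with the official table.

    candidate: dict of national code -> proposed ISCO-08 code (hypothetical shape)
    official:  dict of national code -> set of ISCO-08 codes from the official table
    """
    report = {"verified": [], "ambiguous": [], "mismatched": [], "unlisted": []}
    for national_code, isco in sorted(candidate.items()):
        targets = official.get(national_code)
        if not targets:
            # The official table has no row for this national code.
            report["unlisted"].append(national_code)
        elif not is_unit_group(isco) or isco not in targets:
            # Proposed mapping is malformed or disagrees with the official table.
            report["mismatched"].append(national_code)
        elif len(targets) > 1:
            # One-to-many row: assumed here to need manual research first.
            report["ambiguous"].append(national_code)
        else:
            report["verified"].append(national_code)
    return report
```

Under these assumptions only the codes in the `verified` bucket would be candidates for publication; everything else stays in the crosswalk-research queue.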