Population
File | Source(s) |
---|---|
population_xx.csv | DE: Rosés-Wolf database on regional GDP (version 6, 2020) (pre-1990) & Eurostat (post-1990); FR: INSEE; GB: Vision of Britain (pre-1981), ONS (post-1981), Wikipedia (Northern Ireland) and Census (London); US: Fabian Eckert, Andrés Gvirtz, Jack Liang, and Michael Peters, "A Method to Construct Geographical Crosswalks with an Application to US Counties since 1790," NBER Working Paper #26770, 2020 (1830-1970), Census (1970-2010) & NBER data (2010-2018) |
Coverage
Country | Geographical level | Period |
---|---|---|
DE | 2 (nuts2) | 1900-2017 |
FR | 3 (nuts3) | 1876-2017 |
GB | 2 (nuts2) | 1851-2017 |
US | 3 (county) | 1830-2018 |
Annual data
We linearly interpolate the population_raw column to obtain population data for each year, reported in the population column.
Variables
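The interpolation step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function name `interpolate_linear` and the sample figures are hypothetical.

```python
def interpolate_linear(raw):
    """Fill every year between observed years by linear interpolation.

    raw: dict mapping year (int) -> population in thousands (float),
         containing observed years only.
    Returns a dict covering every year from the first to the last observation.
    """
    years = sorted(raw)
    filled = {}
    for start, end in zip(years, years[1:]):
        span = end - start
        for year in range(start, end):
            # Share of the gap between the two observations already elapsed
            weight = (year - start) / span
            filled[year] = raw[start] + weight * (raw[end] - raw[start])
    # The last observed year is not covered by the loop above
    filled[years[-1]] = raw[years[-1]]
    return filled


# Hypothetical observations (in thousands), not taken from the dataset
population_raw = {1901: 100.0, 1911: 120.0, 1921: 110.0}
population = interpolate_linear(population_raw)
```

Observed years keep their original value; only the years in between are imputed.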
Variable | Description | Type |
---|---|---|
country_code | Country code | str |
statisticalAreaCode | Statistical area code (nuts/fips) | str |
statisticalAreaName | Statistical area name (literal) | str |
year | Year | int |
population | Population in the statistical area (in thousands) | float |
population_raw | Population in the statistical area before correction (in thousands). Relevant for GB only (see notes below) | float |
Focus on US data
We obtain post-1970 US population data by aggregating county data using David Dorn's crossover table.
Focus on GB data
GB population data are not available at a sufficiently detailed NUTS level over a long period, or at least we did not find any. For instance, Rosés and Wolf (2020) only provide data at the NUTS1 level for GB. Hence, we had to build the GB population data at the NUTS2 level ourselves. This involves three main stages: 1. Pre-1981 data collection, 2. Post-1981 data collection, 3. Data harmonization.
Focus on DE data
Following Rosés and Wolf (2020), we have merged the regions of Darmstadt and Giessen into one entity, and similarly Braunschweig and Hannover. While each of these areas corresponds to a NUTS2 region, the multiple border changes make it impossible to track population estimates over time without merging them.
Pre-1981 data collection
We use Vision of Britain (VoB) population data, except for London, where we use Census data. Some VoB geographic entities have no population data, however. In these cases, we did our best to reconstruct the data from smaller entities with known population data. The table below details how these entities are constructed.
VoB | Construction |
---|---|
Tweeddale | Peebles + Selkirkshire |
Roxburgh Ettrick and Lauderdale | Roxburghshire + Selkirkshire + Berwickshire/4 + Midlothian/4 |
Cheshire | Halton + Warrington + Cheshire East + Cheshire West and Chester |
Mid Glamorgan | Caerphilly/2 + Bridgend + Merthyr Tydfil + Rhondda, Cynon, Taff |
South Glamorgan | Vale of Glamorgan + Cardiff |
Clwyd | Flintshire + Wrexham + Denbighshire |
Dyfed | Carmarthenshire + Ceredigion + Pembrokeshire |
Gwent | Blaenau Gwent + Caerphilly/2 + Monmouthshire + Newport + Torfaen |
Vale of Glamorgan | Glamorganshire |
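Each row of the table above is a weighted sum of smaller entities, where a term such as Berwickshire/4 carries a weight of 0.25. A minimal sketch of that computation is given below. The names `RECIPES` and `build_entity` and the population figures are hypothetical, and only two rows of the table are encoded.

```python
# Weights encode the construction table: "X/4" becomes 0.25, a plain "X" becomes 1.0.
RECIPES = {
    "Tweeddale": {"Peebles": 1.0, "Selkirkshire": 1.0},
    "Roxburgh Ettrick and Lauderdale": {
        "Roxburghshire": 1.0,
        "Selkirkshire": 1.0,
        "Berwickshire": 0.25,
        "Midlothian": 0.25,
    },
}


def build_entity(recipe, populations):
    """Return the weighted sum of the component populations for one year."""
    return sum(weight * populations[name] for name, weight in recipe.items())


# Hypothetical populations for a single year (in thousands), not real data
populations = {
    "Peebles": 15.0,
    "Selkirkshire": 20.0,
    "Roxburghshire": 40.0,
    "Berwickshire": 24.0,
    "Midlothian": 400.0,
}
tweeddale = build_entity(RECIPES["Tweeddale"], populations)
roxburgh = build_entity(RECIPES["Roxburgh Ettrick and Lauderdale"], populations)
```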
Missing VoB data (concentrated in 1871, 1901 and 1941) are filled with linear interpolation.
Once we have data for all VoB entities (real or imputed), we aggregate them to obtain population data at the NUTS2 level using the conversion table reported in statisticalareasvob_gb.csv.
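The aggregation step amounts to a group-by sum over the conversion table. The sketch below illustrates the idea with in-memory dictionaries. The layout of statisticalareasvob_gb.csv is not described here, so the entity names, the region codes `REGION_1` and `REGION_2`, and the function `aggregate_to_nuts2` are all hypothetical placeholders.

```python
from collections import defaultdict


def aggregate_to_nuts2(entity_population, entity_to_nuts2):
    """Sum entity-level populations into NUTS2-level totals.

    entity_population: dict mapping (entity, year) -> population in thousands.
    entity_to_nuts2:   dict mapping entity -> NUTS2 code (the conversion table).
    Returns a dict mapping (nuts2_code, year) -> total population.
    """
    totals = defaultdict(float)
    for (entity, year), value in entity_population.items():
        totals[(entity_to_nuts2[entity], year)] += value
    return dict(totals)


# Hypothetical conversion table and populations, for illustration only
entity_to_nuts2 = {"Entity A": "REGION_1", "Entity B": "REGION_1", "Entity C": "REGION_2"}
entity_population = {
    ("Entity A", 1951): 100.0,
    ("Entity B", 1951): 50.0,
    ("Entity C", 1951): 75.0,
    ("Entity A", 1961): 110.0,
    ("Entity B", 1961): 55.0,
    ("Entity C", 1961): 80.0,
}
nuts2_population = aggregate_to_nuts2(entity_population, entity_to_nuts2)
```

An entity missing from the conversion table raises a `KeyError` in this sketch, which makes gaps in the mapping visible rather than silently dropping population.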
Post-1981 data collection
After 1981, the ONS provides data at the local authority level for each year. As before, we aggregate them to obtain population data at the NUTS2 level, this time using the conversion table reported in statisticalareaslau_gb.csv. The conversion table is based on the local authority to NUTS crossover table and the Scottish Review of NUTS boundaries.
Data harmonization
Because the pre-1981 data are constructed from a collection of sources, they may contain flaws or approximations. Hence, we found it desirable to compare the two datasets in 1981 (the only year of overlap) and compute a correction coefficient for each NUTS2 region: \(\frac{\text{population in 1981 using ONS data}_{NUTS2}}{\text{population in 1981 using VoB data}_{NUTS2}}\). We then apply this correction coefficient to all pre-1981 data so that the time series is consistent for each NUTS2 region despite the change in data source.
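For a single NUTS2 region, the harmonization can be sketched as below. The function name `harmonize` and the figures are hypothetical. The sketch assumes the uncorrected values correspond to population_raw and the corrected ones to population.

```python
def harmonize(vob_series, ons_1981):
    """Rescale pre-1981 VoB data so that it matches ONS data in 1981.

    vob_series: dict mapping year -> VoB-based population for one NUTS2 region;
                must contain the overlap year 1981.
    ons_1981:   ONS-based population of the same region in 1981.
    Returns (corrected pre-1981 series, correction coefficient).
    """
    coefficient = ons_1981 / vob_series[1981]
    corrected = {
        year: value * coefficient
        for year, value in vob_series.items()
        if year < 1981
    }
    return corrected, coefficient


# Hypothetical figures (in thousands) for one region, not real data
vob_series = {1961: 720.0, 1971: 760.0, 1981: 800.0}
corrected, coefficient = harmonize(vob_series, ons_1981=1000.0)
```

With these figures the coefficient is 1.25, so every pre-1981 value is scaled up by 25% and the series joins the ONS data in 1981 without a jump.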
Note that for East Wales and Scotland, 1981 (and 1971 for East Wales) data are missing from VoB. We used the 1971 data and applied the national population growth rate to (roughly) estimate the VoB data and hence the correction coefficient.
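A rough sketch of this fallback is shown below. It assumes that the "national population growth rate" is the ratio of national population in 1981 to that in 1971. The function name `estimate_vob_1981` and all figures are hypothetical.

```python
def estimate_vob_1981(vob_1971, national_1971, national_1981):
    """Project a region's 1971 VoB population to 1981 using national growth.

    Assumption: the region grew at the same rate as the country as a whole
    between 1971 and 1981.
    """
    national_growth = national_1981 / national_1971
    return vob_1971 * national_growth


# Hypothetical figures (in thousands), not real data
vob_1981_estimate = estimate_vob_1981(
    vob_1971=1000.0, national_1971=50000.0, national_1981=51000.0
)
# The estimate then stands in for the VoB denominator of the correction coefficient
ons_1981 = 1071.0
coefficient = ons_1981 / vob_1981_estimate
```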