Loading GNIS data
Loading GNIS (Geographic Name Information System)
You need 4 datasets:
1. World GNIS
- About 5,000,000 entries for everywhere but the USA and Antarctica
- You want the 250mb compressed file from here
2. USA GNIS
- About 2,000,000 entries in the USA (and protectorates)
- By state, you need all the states file (ie. "NY_DECI.TXT")
- Get them from here
- Do NOT get HIST_FEATURES_DECI.TXT or POP_PLACES_DECI.TXT
3. Antarctica (Optional)
- Get the zipfile from here
4. World Population:
- Small dataset with Cities/Country/Region location with population
- Available here
- Its a smallish dataset of unknown quality
- © by Stefan Helders www.world-gazetteer.com
Processing
Setup a Postgis database called "gnis"
1. World GNIS
i) uncompress the 250mb file (it'll be about 750mb)
ii) Use the attached perl script (parse_world.pl) to process the dataset
perl parse_world.pl geonames_dd_dms_date_20050801.txt > all_world.sql
iii) Load into your database
psql gnis -f all_world.sql > all_world.out
iv) load in the gnis_country.dump (attached to this document) file (this links the 2 letter country code to country name. This version has had 'extra' countries added to it.)
pgsql gnis -f gnis_country.dump
v) Add the feature code descriptions (attached to this document)
psql gnis -f feature_codes.sql
vi) load the sub-national unit names (ie. states/provinces/districts):
psql gnis -f subnational.sql
v) fluff the data up so that it is joinable with the USA data:
create index gnis_country_indx on gnis_country (country_code);
create index gnis_int_types_indx on gnis_int_types (type_code);
create index sub_nat_indx on gnis_intern_subnational (country_code,subnational_code);
create index gnis_country_indx2 on gnis_country ( bpchar(country_code) );
create index gnis_int_types_indx2 on gnis_int_types (bpchar(type_code));
create index sub_nat_indx2 on gnis_intern_subnational (bpchar(country_code),bpchar(subnational_code));
vacuum analyse gnis_country;
vacuum analyse gnis_intern_subnational;
vacuum analyse gnis_int_types;
alter table gnis_international add column country_name text;
alter table gnis_international add column sub_national text;
alter table gnis_international add column type text;
alter table gnis_international add column full_name_lc text;
update gnis_international set full_name_lc = lower(full_name_nd),
country_name = (select name_nd from gnis_country WHERE gnis_country.country_code =country),
sub_national = (select name_nd from gnis_intern_subnational WHERE country=country_code AND gnis_intern_subnational.subnational_code =gnis_international.sub_national_code ),
type = (select name_nd from gnis_int_types WHERE gnis_int_types.type_code = gnis_international.feature_designition_code)
;
alter table gnis_international add column est_pop int;
update gnis_international set est_pop = NULL;
2. USA
i) combine all (60) states into one big file:
cat *DECI.TXT > ../ALL_DECI.TXT
NOTE: do NOT include HIST_FEATURES_DECI.TXT or POP_PLACES_DECI.TXT in this!
ii) load into database
perl ../parse_state.pl <ALL_DECI.TXT > all_state.sql psql gnis -f all_state.sql >& all_state.out
NOTE: this converts non ASCII letters (original is likely iso-8859-1 (latin-1)) to the 'best' ASCII representation.
NOTE: the perl script is here parse_state.pl
iii) adjust unique ids
update gnis_usa set uniq_featcode = uniq_featcode+ 10000000;
iv) fluff up for capatibility
alter table gnis_usa add column country text; alter table gnis_usa add column country_name text; alter table gnis_usa rename column statecode to sub_national; alter table gnis_usa rename column feature_type to type; alter table gnis_usa add column full_name_lc text; update gnis_usa set country = 'US', country_name ='United States', full_name_lc = lower(full_name_nd); update gnis_usa set type = 'City/Town' where type ='ppl';
Combine
CREATE table gnis as
SELECT full_name_nd as full_name,
full_name_lc,
sub_national,
country_name,
country as country_code,
est_pop,
type,
uniq_featcode,
the_geom
FROM gnis_international;
;
INSERT into gnis
SELECT full_name_nd as full_name,
full_name_lc,
sub_national,
country_name,
country as country_code,
est_pop,
type,
uniq_featcode,
the_geom
FROM gnis_usa
;
World Population
perl parse_pop.pl < dataen.txt > out.sql
in postgis:
\i out.sql
update gnis_pop set the_geom= setsrid(the_Geom,4326);
create index gnis_pop_spatial on gnis_pop using gist (the_Geom gist_geometry_ops);
vacuum analyse;
insert into geometry_columns values ('','','gnis_pop','the_geom',2,4326,'POINT');
subnational.sql at: http://docs.codehaus.org/download/attachments/43149/subnational.sql
The attachments to this page are available here or hit the "Page Operations" link on the left column (near the bottom).
Comments ( Hide )
|
|
javaguru@ieee.org says:Jun 06, 2006 15:40 ( Permalink ) |
|
|
bourneaisyah says:Hi, I think the tutorial should be updated, because the link is broken for the subnational.sql And I could not continue the tutorial again because I could not have the subnationa.sql Thank you |
When updating the gnis_usa table to determine a city/town a better query would be
update gnis_usa set type = 'City/Town' where type ='ppl' and est_pop > 0;
because any place associated with people not in a specific category (i.e. school, park, etc.) is marked as ppl. an example would be subdivisions which typically have their name followed by "(subdivision)" and their type is ppl.