Loading GNIS data

Loading GNIS (Geographic Name Information System)

You need 4 datasets:

1. World GNIS

  • About 5,000,000 entries for everywhere but the USA and Antarctica
  • You want the 250mb compressed file from here

2. USA GNIS

  • About 2,000,000 entries in the USA (and protectorates)
  • By state, you need all the states file (ie. "NY_DECI.TXT")
  • Get them from here
  • Do NOT get HIST_FEATURES_DECI.TXT or POP_PLACES_DECI.TXT

3. Antarctica (Optional)

  • Get the zipfile from here

4. World Population:

  • Small dataset with Cities/Country/Region location with population
  • Available here
  • Its a smallish dataset of unknown quality
  • © by Stefan Helders www.world-gazetteer.com

Processing

Setup a Postgis database called "gnis"

1. World GNIS

i) uncompress the 250mb file (it'll be about 750mb)
ii) Use the attached perl script (parse_world.pl) to process the dataset

perl parse_world.pl geonames_dd_dms_date_20050801.txt > all_world.sql

iii) Load into your database
psql gnis -f all_world.sql > all_world.out

iv) load in the gnis_country.dump (attached to this document) file (this links the 2 letter country code to country name. This version has had 'extra' countries added to it.)
pgsql gnis -f gnis_country.dump

v) Add the feature code descriptions (attached to this document)
psql gnis -f  feature_codes.sql

vi) load the sub-national unit names (ie. states/provinces/districts):
psql gnis -f  subnational.sql

v) fluff the data up so that it is joinable with the USA data:
create index gnis_country_indx on gnis_country (country_code);
create index gnis_int_types_indx on gnis_int_types (type_code);
create index sub_nat_indx on gnis_intern_subnational (country_code,subnational_code);

create index gnis_country_indx2 on gnis_country ( bpchar(country_code) );
create index gnis_int_types_indx2 on gnis_int_types (bpchar(type_code));
create index sub_nat_indx2 on gnis_intern_subnational (bpchar(country_code),bpchar(subnational_code));

vacuum analyse gnis_country;
vacuum analyse gnis_intern_subnational;
vacuum analyse gnis_int_types;



alter table gnis_international add column country_name text;
alter table gnis_international add column sub_national text;
alter table gnis_international add column type text;
alter table gnis_international add column full_name_lc text;



update gnis_international set full_name_lc = lower(full_name_nd),
                              country_name = (select name_nd from gnis_country WHERE gnis_country.country_code =country),
                           sub_national = (select name_nd from gnis_intern_subnational WHERE country=country_code AND gnis_intern_subnational.subnational_code =gnis_international.sub_national_code ),
                           type = (select name_nd from gnis_int_types WHERE  gnis_int_types.type_code = gnis_international.feature_designition_code)
                          ;


alter table gnis_international add column est_pop int;
update gnis_international set est_pop = NULL;

2. USA

i) combine all (60) states into one big file:

cat *DECI.TXT > ../ALL_DECI.TXT

NOTE: do NOT include HIST_FEATURES_DECI.TXT or POP_PLACES_DECI.TXT in this!
ii) load into database

perl ../parse_state.pl <ALL_DECI.TXT  > all_state.sql
 psql gnis -f all_state.sql >& all_state.out

NOTE: this converts non ASCII letters (original is likely iso-8859-1 (latin-1)) to the 'best' ASCII representation.

NOTE: the perl script is here parse_state.pl

iii) adjust unique ids

update gnis_usa set uniq_featcode = uniq_featcode+ 10000000;

iv) fluff up for capatibility
alter table gnis_usa add column country text;
alter table gnis_usa add column country_name text;
alter table gnis_usa rename column  statecode to sub_national;
alter table gnis_usa rename column  feature_type to type;
alter table gnis_usa add column full_name_lc text;

update gnis_usa set country = 'US', country_name ='United States',  full_name_lc = lower(full_name_nd);


update gnis_usa set type = 'City/Town' where type ='ppl';

Combine

CREATE table gnis as
SELECT full_name_nd as full_name,
       full_name_lc,
       sub_national,
       country_name,
       country as country_code,
       est_pop,
       type,
       uniq_featcode,
       the_geom
       FROM gnis_international;

;

INSERT into gnis
SELECT  full_name_nd as full_name,
       full_name_lc,
       sub_national,
              country_name,
              country as country_code,
              est_pop,
              type,
              uniq_featcode,
              the_geom
       FROM gnis_usa
;

World Population

perl parse_pop.pl < dataen.txt > out.sql
in postgis:
   \i out.sql
   update gnis_pop set the_geom= setsrid(the_Geom,4326);
   create index gnis_pop_spatial on gnis_pop using gist (the_Geom gist_geometry_ops);
   vacuum analyse;
   insert into geometry_columns values ('','','gnis_pop','the_geom',2,4326,'POINT');

subnational.sql at: http://docs.codehaus.org/download/attachments/43149/subnational.sql
The attachments to this page are available here or hit the "Page Operations" link on the left column (near the bottom).

Added by dblasby, last edited by dave.blasby on Feb 04, 2007  (view change)

Comments

javaguru@ieee.org says:

When updating the gnis_usa table to determine a city/town a better query would be

update gnis_usa set type = 'City/Town' where type ='ppl' and est_pop > 0;

 because any place associated with people not in a specific category (i.e. school, park, etc.) is marked as ppl.  an example would be subdivisions which typically have their name followed by "(subdivision)" and their type is ppl.

bourneaisyah says:

Hi,

I think the tutorial should be updated, because the link is broken for the subnational.sql

And I could not continue the tutorial again because I could not have the subnationa.sql

Thank you




View Attachments (9) Info