Google's new Geo Search (formerly KML Search, since expanded) is a great way to get data onto the geospatial web. We want GeoServer instances that people stand up to be automatically searchable from Google Maps and Google Earth. So if you configure a layer in GeoServer and fill out all the appropriate metadata, it should be exposed to Geo Search.
The key to making every feature in GeoServer automatically exposed on Geo Search is known URLs. One thought is to make vector-based super overlays the way that crawling is done. This is nice, since the crawl gets a natural hierarchy, which is probably the one we want. The downside is that we don't yet know the best methods for vector super overlays, and where features sit in the hierarchy may change. So for now we should keep the concerns of crawling and display orthogonal. This document focuses on the problem of exposing data to Geo Search.
Our first aim should be to provide every feature with a canonical url. This should be stable over time. I like how atom does it with its 'self' relative url:
I believe we should include such a URL in almost all of our output. KML gives us a bit of a hook with its atom:link. They seem to use it to credit the URL where the content came from, but we can embed it as the canonical location of each feature. Ideally this lets the search engine know that it has crawled a feature before, which should be helpful when we get complex hierarchies of regions. We will want to return full features there, but we should have the rel link to notify the crawler that it may have already crawled them.
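As a rough sketch of what embedding the canonical URL could look like, the snippet below builds a KML Placemark carrying an atom:link with rel='self'. The function name and the sample feature are made up for illustration, and I'm assuming the KML 2.2 and Atom namespaces; whether a crawler actually treats rel='self' as canonical is not something this confirms.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: KML 2.2 and Atom.
KML_NS = "http://www.opengis.net/kml/2.2"
ATOM_NS = "http://www.w3.org/2005/Atom"

ET.register_namespace("", KML_NS)
ET.register_namespace("atom", ATOM_NS)


def placemark_with_self_link(name, canonical_url):
    """Build a Placemark whose atom:link points at the feature's canonical URL.

    Hypothetical helper, not an existing GeoServer function.
    """
    placemark = ET.Element(f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    ET.SubElement(
        placemark,
        f"{{{ATOM_NS}}}link",
        {"rel": "self", "href": canonical_url},
    )
    return placemark


placemark = placemark_with_self_link(
    "Road 1", "http://geoserver.org/geoserver/features/roads/1.kml"
)
print(ET.tostring(placemark, encoding="unicode"))
```

The same helper could be reused when a page of KML contains many features, so each Placemark still points back at its own URL.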
This will also let us return pages of kml that have many features in them. The crawler should hopefully become smart enough to index those as individual entities, instead of as a 'page'.
We should have a root document that is similar to a WFS Capabilities document, but is built to be crawled. It should be accessible in several different formats, just like FeatureServer. It should ideally also include all relevant meta information that's in the other capabilities documents. The default should probably be html, so it's readable by people. It can have a number of links to the various formats, and also links to represent itself as other formats:
Each representation of the root document should then in turn link down to the individual featureTypes/layers. These should link to the same features as the format it is served in. Those in turn can have rel='alternate' links to alternate format representations, if possible.
Each featureType/layer should have its own url:
These would also be available as the formats above
http://geoserver.org/geoserver/features/roads.html
http://geoserver.org/geoserver/features/roads.kml
http://geoserver.org/geoserver/features/roads.atom
http://geoserver.org/geoserver/features/roads.json
(For further thoughts on what the roads.html page should look like, see REST Overview Page)
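To make the per-format URLs and the rel='alternate' idea concrete, here is a small sketch that derives each representation's URL from the layer name and lists the alternates for whichever format is being served. The function names and the dictionary shape are assumptions for illustration only.

```python
# Base URL and format list taken from the examples on this page.
BASE = "http://geoserver.org/geoserver/features"
FORMATS = ("html", "kml", "atom", "json")


def representation_urls(layer, base=BASE, formats=FORMATS):
    """Map each format to the URL of that representation of the layer."""
    return {fmt: f"{base}/{layer}.{fmt}" for fmt in formats}


def alternate_links(layer, current_format):
    """List rel='alternate' links to every format except the current one."""
    return [
        {"rel": "alternate", "format": fmt, "href": url}
        for fmt, url in representation_urls(layer).items()
        if fmt != current_format
    ]


for link in alternate_links("roads", "kml"):
    print(link["rel"], link["href"])
```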
kml, atom and json should all just link to the first 100 or so features, with links to the following 100, etc. Each feature listed in that list of 100 should have a link to its canonical url, ideally with an atom:link rel='self' link if the format supports it.
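The paging described above could be sketched like this. I'm assuming 1-based start indexes (as in the startindex=11 example further down), a known total feature count, and made-up function names; a real implementation may not know the total up front.

```python
from urllib.parse import urlencode


def page_url(layer_url, start_index, page_size=100, fmt="kml"):
    """URL for one page of features, using the parameter names from this page."""
    query = urlencode(
        {"startindex": start_index, "maxfeatures": page_size, "format": fmt}
    )
    return f"{layer_url}?{query}"


def page_links(layer_url, total, start_index=1, page_size=100, fmt="kml"):
    """Return 'self' and, when more features remain, 'next' links for a page."""
    links = {"self": page_url(layer_url, start_index, page_size, fmt)}
    next_start = start_index + page_size
    if next_start <= total:  # start indexes are assumed to be 1-based
        links["next"] = page_url(layer_url, next_start, page_size, fmt)
    return links


roads = "http://geoserver.org/geoserver/features/roads"
print(page_links(roads, total=250, start_index=101))
```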
The top level features resource should also support the WFS key value pair parameters:
http://geoserver.org/geoserver/features/roads?maxfeatures=10
http://geoserver.org/geoserver/features/roads?maxfeatures=10&format=kml
http://geoserver.org/geoserver/features/roads?maxfeatures=10&startindex=11
http://geoserver.org/geoserver/features/roads?srsname=EPSG:900913
http://geoserver.org/geoserver/features/roads?propertyname=numlanes,geom
http://geoserver.org/geoserver/features/roads?featureversion=345
http://geoserver.org/geoserver/features/roads?sortby=numlanes
http://geoserver.org/geoserver/features/roads?bbox=0,0,10,10
http://geoserver.org/geoserver/features/roads?filter=population>8000000
Note that filters are CQL filters, not the normal OGC XML filters. Also note that 'format' is used instead of 'outputformat'. The rest should be similar to WFS calls. These should be able to be combined in any way. (Do we want to consider SLDs here? It'd be useful for the KML format.) Maybe for just this interface we could also consider PNG output, since it would just ask GeoServer to render the result. We should also probably alias start-index and max-results to startindex and maxfeatures, to be in line with the GData APIs.
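Here is a minimal sketch of how the key value pairs and the GData-style aliases might be normalized before being handed to the WFS machinery. The function name and the choice to silently drop unknown keys are assumptions, not decided behavior.

```python
from urllib.parse import parse_qs, urlsplit

# GData-style names mapped onto the names used on this page.
ALIASES = {"start-index": "startindex", "max-results": "maxfeatures"}
KNOWN = {
    "maxfeatures", "startindex", "format", "srsname", "propertyname",
    "featureversion", "sortby", "bbox", "filter",
}
INTEGER_PARAMS = ("startindex", "maxfeatures")


def normalize_query(url):
    """Parse a features URL into a dict of recognized, de-aliased parameters."""
    raw = parse_qs(urlsplit(url).query, keep_blank_values=True)
    params = {}
    for key, values in raw.items():
        name = ALIASES.get(key.lower(), key.lower())
        if name in KNOWN:  # assumption: unknown keys are ignored
            params[name] = values[-1]
    for name in INTEGER_PARAMS:
        if name in params:
            params[name] = int(params[name])
    return params


print(normalize_query(
    "http://geoserver.org/geoserver/features/roads?max-results=10&start-index=11&format=kml"
))
```

The CQL filter value is passed through untouched here; parsing it would be a separate step.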
With WFS we already have stable identifiers, they're just not listed as stable urls. This should be easy to do:
and of course
http://geoserver.org/geoserver/features/roads/1.kml
http://geoserver.org/geoserver/features/roads/1.atom
http://geoserver.org/geoserver/features/roads/1.json
This is the exact equivalent of the wfs call: http://geoserver.org/wfs?request=GetFeature&featureid=roads.1
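The mapping from a feature URL to its WFS equivalent is mechanical, so a sketch may help. The helper name is hypothetical, and I'm assuming that only the four extensions used on this page are stripped, so a feature id that itself contains a dot is left alone.

```python
from urllib.parse import urlencode, urlsplit

# Extensions used for the alternate representations on this page.
KNOWN_EXTENSIONS = {"html", "kml", "atom", "json"}


def wfs_equivalent(feature_url, wfs_endpoint="http://geoserver.org/wfs"):
    """Translate .../features/<layer>/<id>[.ext] into a WFS GetFeature URL."""
    segments = [s for s in urlsplit(feature_url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("expected a path ending in <layer>/<id>")
    layer, last = segments[-2], segments[-1]
    stem, dot, ext = last.rpartition(".")
    feature_id = stem if dot and ext in KNOWN_EXTENSIONS else last
    query = urlencode(
        {"request": "GetFeature", "featureid": f"{layer}.{feature_id}"}
    )
    return f"{wfs_endpoint}?{query}"


print(wfs_equivalent("http://geoserver.org/geoserver/features/roads/1.kml"))
```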
This page presents far more than we need to get started. All that we really need is KML, with no querying except maxfeatures and startindex, plus relative links in the KML to the next set of features.
notes: I think we could handle topp:restricted as topp/restricted. So you have /geoserver/features/restricted, which is a shortcut for /geoserver/features/topp/restricted. Whatever the default namespace is can be used without the extra /topp/ segment. We also might consider having two network links for each, one to be crawled and one to be viewed. The other thing we could do is have two top-level KML documents, one to be crawled and one to be viewed. We will also have /geoserver/features/archsites.kml be an alias for start-index=1&max-results=100, so we could consider just listing those nicer versions as the url.
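The namespace shortcut could be resolved along these lines. This is only a sketch: the function name is made up, and it assumes the server knows its set of namespace prefixes, which it would need anyway to tell /features/topp/restricted (namespace plus layer) apart from something like /features/roads/1 (layer plus feature id).

```python
def qualified_layer_name(path, known_namespaces, default_namespace):
    """Resolve a /features/... path to a prefix:layer name.

    A first segment that matches a known namespace is treated as the
    namespace; otherwise the default namespace is assumed.
    """
    segments = [s for s in path.split("/") if s]
    rest = segments[segments.index("features") + 1:]
    if not rest:
        raise ValueError("no layer in path")
    if len(rest) >= 2 and rest[0] in known_namespaces:
        return f"{rest[0]}:{rest[1]}"
    return f"{default_namespace}:{rest[0]}"


namespaces = {"topp", "sf"}
print(qualified_layer_name("/geoserver/features/restricted", namespaces, "topp"))
print(qualified_layer_name("/geoserver/features/topp/restricted", namespaces, "topp"))
```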
The result of following the archsites link would be as follows:
It would be cool if there was some sort of NetworkLink that we could put in for the 'next' set of features, but I couldn't find a way to do that without the Google Earth client attempting to follow it and resolve it. That would probably be fairly disastrous with huge datasets, though we could try it out. But it'd be nice if there was a link at the end of the dataset that let users hit 'next'. Also something to think about is how to go from one of the individual KML markers to the full set. We could perhaps do a link in the description that users would actively click on.
Actually, looks like we can do that really nicely with something like this:
That gives a little link to click on for each placemark, which gets really nicely turned into a network link. What we'll have to do is figure out a way to append that to 'description' outputs automatically, at least for the links that get crawled. Easiest would be a param like links=true, or maybe one of the format_options we use for KML? Perhaps we could have a nicer link like 'part of archsites dataset'.
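Appending the link automatically might look something like the sketch below. The helper name, the `<br/>` separator and the link label are all assumptions; how this would hook into a links=true param or format_options is left open.

```python
from html import escape


def description_with_dataset_link(description, dataset_url, label):
    """Append an HTML link to the full dataset to a placemark description."""
    link = f'<a href="{escape(dataset_url, quote=True)}">{escape(label)}</a>'
    return f"{description}<br/>{link}" if description else link


print(description_with_dataset_link(
    "Archaeological site 12",
    "http://geoserver.org/geoserver/features/archsites.kml",
    "part of archsites dataset",
))
```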
(The above is probably much more the way we want to do things. I'm just keeping this around for its info on sitemaps, as we still may want to autogenerate those, etc.)
For a first pass we don't want to pollute the Geo Search index with tons of data, so we'll just put a KML sitemap at the base of GeoServer that talks about the layers and provides links to the WMS for them (and perhaps super-overlays?). There is a Google Maps API [blog post| http://googlemapsapi.blogspot.com/2007/01/get-more-traffic-to-your-maps-api-site.html] on this:
- Identify those features on your maps site which can be displayed as KML features.
- Convert those features into KML equivalents and publish them within one or more KML files.
- Create a sitemap.xml file that identifies links to all of your KML files.
- Place the sitemap.xml in the root directory of your site.
http://www.google.com/apis/maps/sitemap.html has some good information on setting up a kml sitemap. To do it with kml it's pretty basic, I think we just need to put the following type of thing at the root:
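For autogenerating the sitemap, a sketch along these lines might do. It emits a plain sitemaps.org-style urlset listing one KML URL per layer; the function name and the per-layer URLs are assumptions, and any Google-specific extension elements for marking entries as KML are deliberately left out because I'm not sure of their exact form.

```python
import xml.etree.ElementTree as ET

# Namespace of the generic sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(kml_urls):
    """Return sitemap.xml content with one <url><loc> entry per KML file."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for kml_url in kml_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = kml_url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "http://geoserver.org/geoserver/features/roads.kml",
    "http://geoserver.org/geoserver/features/archsites.kml",
])
print(sitemap_xml)
```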
And then I believe we just need to do a KML file for each of our layers. Would be nice if we could somehow automatically exclude the demo files. Perhaps we need a 'publish' param that would expose the KML - and maybe even auto-generate a link from a wiki? And when someone hit the publish button we could do more checks to make sure they've filled out all their appropriate metadata?
I talked with the Geo Search PM, and we sketched out a rough version of what should be in the site for a GeoServer instance. Unfortunately I lost the paper that I had the notes on, but I think I can remember most. But the first pass would be to just do a KML file that contains meta information and links to other places.
- A KML polygon placemark of the bounding box for the dataset
- Name: The title of the featureType (could fall back on name if it's not a nice title? Better yet we should have our default title generation in featureTypes use the namespace and not append 'Type')
- Address: Use the contact info stuff that we gather in the web admin and use in WMS
- Phone Number: Same as address
- Snippet: Start of the Abstract? Keywords?
- description: The abstract for the layer, perhaps a preview png (can make a WMS call of the right bounds). And then consider at least one of a link to the superoverlay, or a link to the full WMS? We can make a network link to the WMS... Superoverlays look better, but are also a bigger performance hit, at least until we get JTileCache (after which they perform better than normal WMS)
- LookAt: Derive in same way we do in WMS results?
- Metadata: A describeFeatureType response? Or a link to the fgdc or iso metadata document if it's listed?
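To get a feel for the list above, here is a sketch that assembles such a placemark from layer metadata. The function signature, the 80-character snippet cutoff and the sample values are made up, I'm assuming the bounding box is (minx, miny, maxx, maxy) in lon/lat, and LookAt and Metadata are left out since how to derive them is still open.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumed KML version
ET.register_namespace("", KML_NS)


def _kml(tag):
    return f"{{{KML_NS}}}{tag}"


def layer_placemark(title, abstract, bbox, address=None, phone=None):
    """Build a polygon Placemark describing one layer (hypothetical helper)."""
    minx, miny, maxx, maxy = bbox
    pm = ET.Element(_kml("Placemark"))
    ET.SubElement(pm, _kml("name")).text = title
    if address:
        ET.SubElement(pm, _kml("address")).text = address
    if phone:
        ET.SubElement(pm, _kml("phoneNumber")).text = phone
    ET.SubElement(pm, _kml("Snippet")).text = abstract[:80]  # arbitrary cutoff
    ET.SubElement(pm, _kml("description")).text = abstract
    ring = ET.SubElement(
        ET.SubElement(ET.SubElement(pm, _kml("Polygon")), _kml("outerBoundaryIs")),
        _kml("LinearRing"),
    )
    # Counter-clockwise ring around the bounding box, closed on the first corner.
    corners = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]
    ET.SubElement(ring, _kml("coordinates")).text = " ".join(
        f"{x},{y}" for x, y in corners
    )
    return pm


pm = layer_placemark("Roads", "Road network for the demo area.", (0, 0, 10, 10))
print(ET.tostring(pm, encoding="unicode"))
```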