As part of the GeoTrac project ( http://projects.openplans.org/GeoTrac/ ), I set off to be able to query on tickets via shapefiles ( http://projects.openplans.org/GeoTrac/ticket/78 ). I didn’t know anything about shapefiles before beginning this, and I still don’t know much, except that I’m not a big fan. So I consulted the font of knowledge: http://en.wikipedia.org/wiki/Shapefile
Shapefiles are several files that have different extensions but the same “first name”. Its the kind of convention someone would come up with if they were being “clever” (see http://www.statusq.org/archives/2006/11/03/1193/ ). Three files are mandatory:
* .shp — shape format; the feature geometry itself
* .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
* .dbf — attribute format; columnar attributes for each shape, in dBase III format (from http://en.wikipedia.org/wiki/Shapefile )
Unfortunately, these three files contain no projection information and so do not actually give you enough information to tell where the geometry is on a map. There is an optional .prj file * .prj — projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format (from http://en.wikipedia.org/wiki/Shapefile ). So while we didn’t have a .prj file to start off with or know the projection that we needed to figure out where the geometry lived on the map, this seemed the right approach to specifying geometry.
We wrote a Trac (see http://trac.edgewall.org ) component for the GeoTicket plugin ( http://trac-hacks.org/wiki/GeoTicketPlugin ) that will support querying tickets based on region: http://trac-hacks.org/svn/geoticketplugin/0.11/geoticket/regions.py
The import of the shapefile is done with shp2pgsql ( http://postgis.refractions.net/docs/ch04.html ). The resulting SQL is then read and used to create a table in the PostgreSQL database. The administrator of the GeoTrac site is then prompted which piece of metadata attached to the polygons should be used for query. The import was fairly easy to do. Next came the query itself. A SQL query was used to select the tickets within the region:
SELECT ticket FROM ticket_location WHERE ST_CONTAINS((SELECT the_geom FROM georegions WHERE commdist=212),
st_pointfromtext('POINT(' || longitude || ' ' || latitude || ')'));
However, without specifying the SRIDs, this gives an error:
ERROR: Operation on two geometries with different SRIDs CONTEXT: SQL function "st_contains" statement 1
This may be corrected by providing the SRID for the ticket location point:
SELECT ticket FROM ticket_location WHERE ST_CONTAINS((SELECT the_geom FROM georegions WHERE commdist=212),
st_pointfromtext('POINT(' || longitude || ' ' || latitude || ')', 4326));
However, by default, the SRID of the_geom is -1. So this gives the same error, and the previous case silently “succeeds”, though of course the projection is wrong you won’t get any data back. You can see the SRID from the geometry_columns table:
trac_fleem=# select * from geometry_columns;
f_table_catalog | f_table_schema | f_table_name | f_geometry_column | coord_dimension | srid | type
----------------+----------------+--------------+-------------------+-----------------+------+-------------- |
public | georegions | the_geom | 2 | -1 | MULTIPOLYGON
(1 row)
As a makeshift solution, I’ve included a text input to specify the SRID:
Import Shapefiles Upload Shapefiles Shapefile shape format (.shp) ____________________ Shapefile shape index format (.shx) ____________________ Shapefile attribute format (.dbf) ____________________ SRID ____________________ Submit
So, given the SRID, you *should* be able to transform the_geom to the SRID you care about, using `shp2pgsql -s`. Maybe. I haven’t finished the query yet [TODO!].
So the problem remains of how to get the SRID from the .prj file. After asking, we were given the .prj file:
PROJCS["NAD_1983_UTM_Zone_18N",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000],PARAMETER["False_Northing",0],PARAMETER["Central_Meridian",-75],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0],UNIT["Meter",1]]
Ideally, in addition to allowing the GeoTrac administrator to specify the SRID (especially for the case where there is no .prj file), it would seem plausible to obtain the SRID from the .prj information.
So I looked around and tried a bunch of things and was refered to a project called GDAL: http://gdal.org/ . Specifically, I was told that ogr2ogr would be useful for shapefile vector data ( see http://gdal.org/ogr2ogr.html ). GDAL has python bindings ( http://trac.osgeo.org/gdal/wiki/GdalOgrInPython ), but unfortunately, it is not easy_install-able:
(Trac-2.4)> easy_install GDAL
Searching for GDAL
Reading http://pypi.python.org/simple/GDAL/
Reading http://www.gdal.org
Best match: GDAL 1.6.1
Downloading http://pypi.python.org/packages/source/G/GDAL/GDAL-1.6.1.tar.gz#md5=e8671d4a77041cf0f7a027f3f3e8280c
Processing GDAL-1.6.1.tar.gz
Running GDAL-1.6.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-jFd9bm/GDAL-1.6.1/egg-dist-tmp-MJllDq numpy include /home/jhammel/Trac-2.4/lib/python2.4/site-packages/numpy-1.3.0-py2.4-linux-i686.egg/numpy/core/include
Could not run gdal-config!!!!
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: Setup script exited with error:
command 'gcc' failed with exit status 1
I haven’t been able to compile GDAL by hand, though it is available on modern package systems. So at best, requiring GDAL makes a component optional (see http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional-features-with-their-own-dependencies and the terribly apt http://www.reddit.com/r/Python/comments/91bnb/the_terrible_magic_of_setuptools/c0b3lhe ).
So how do you use ogr? An example is given here: http://www.gis.usu.edu/~chrisg/python/2009/lectures/ospy_demo1.py
So we try this on our own shapefile:
>>> import ogr
>>> driver = ogr.GetDriverByName('ESRI Shapefile')
>>> ds = driver.Open("sen02.shp")
Docstrings are not available for much of the python bindings:
class DataSource |
Methods defined here: | |
CopyLayer(self, src_layer, new_name, options=[]) | |
CreateLayer(self, name, srs=None, geom_type=0, options=[]) | |
DeleteLayer(self, iLayer) | |
Dereference(self) | |
Destroy(self) | |
ExecuteSQL(self, statement, region='NULL', dialect='') | |
GetDriver(self) | Returns the driver of the datasource | |
GetLayer(self, iLayer=0) | Return the layer given an index or a name | |
GetLayerByName(self, name) | |
GetLayerCount(self) | Returns the number of layers on the datasource | |
GetName(self) | Returns the name of the datasource | |
GetRefCount(self) | |
GetSummaryRefCount(self) | |
Reference(self) | |
Release(self) | |
ReleaseResultSet(self, layer) | |
TestCapability(self, cap) |
Test the capabilities of the DataSource. |
See the constants at the top of ogr.py | |
__getitem__(self, value) | Support dictionary, list, and slice -like access to the datasource. |
ds[0] would return the first layer on the datasource. | ds['aname'] would return the layer named "aname".
| ds[0:4] would return a list of the first four layers. | |
__init__(self, obj=None) | | __len__(self) | Returns the number of layers on the datasource
So it is hard to tell how to get the SRID.
A manual solution is available at: http://help.nceas.ucsb.edu/PostGIS#Import_a_shapefile_into_a_PostGIS_database
The key is the first string, NAD_1983_UTM_Zone_18N. If I do a query like the one in http://help.nceas.ucsb.edu/PostGIS#Import_a_shapefile_into_a_PostGIS_database I get a row:
trac_fleem=# select * from spatial_ref_sys where srtext like '%NAD83 / UTM zone 18N%';
srid | auth_name | auth_srid | srtext | proj4text
-------+-----------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------
26918 | EPSG | 26918 | PROJCS["NAD83 / UTM zone 18N",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AUTHORITY["EPSG","26918"]] | +proj=utm +zone=18 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
(1 row)
In other words, I can tell, as a human being looking at the file, that the `WHERE srtext LIKE` clause should be transformed from the string “NAD_1983_UTM_Zone_18N” to the string %NAD83 / UTM zone 18N%’. But while you could tell a computer how to do this….it would either be hard, or I don’t know how to do it (and couldn’t find any good documentation).
One of the takeaways from this is that this is hard and it shouldn’t be. I don’t know enough about the Geo world to say exactly why this is. Maybe the .prj file should be able to support SRIDs. Maybe `shp2pgsql` should figure out the SRID from the .prj file. Maybe the string should be canonical so that select from spatial_ref_sys query could be more robust, assuming I figured out how to parse the .prj file.