rg-gray-sh.pngI attended a great repoze.bfg sprint last weekend. We discussed creating a foundation on top of repoze.bfg that would allow for a complex content management system to be built. Tres wrote down the requirements of an example system here


To summarize, in a typical large content management system, there are multiple sources of content. Some can come from rss feeds, others are authored, and still others are media on a filesystem. Frequently, the various types of content come from different departments, and they each have their own applications/workflows to work with their content.

cmf3k.png

We divided up the entire application into several sections: the content authoring environments, the content management system, and the retail, or client application.  There can be several content authoring environments, which represent the mini applications that are geared towards a particular type of content. The content repositories would expose a thin interface that would interact with content through content_ids unique across the system. The content management system manages how the content appears on the retail site. This involves url structure, layout of the content, and who gets to see what. And the retail application itself is the end user front of the entire application.

The code that was produced can be currently found here: http://svn.repoze.org/plite/trunk/

The name is not set in stone, so if it changes, then that link may break.

Creating a generic interface for a complex system is challenging. Going through interfaces that are too thin make client code more difficult, while making the interface too specific can preclude certain content repository implementations. We began creating the various pieces of the application in hopes that they would help drive certain interface decisions.  When it was time for me to go, a basic content repository was created that would store files/pages. The content management/retail aspects were yet to be fleshed out, although they’re probably further along now.

All in all, I found that I got a great deal out of the sprint. I personally haven’t had to work with sites of this size/complexity, so it was very useful for me to hear the different opinions on design tradeoffs.

The repoze.bfg framework itself had gotten better in a number of ways since I last looked at it. If you haven’t already, it’s definitely worth checking out. The code itself is of extremely high quality, being very clean and organized. Additionally, it’s very well tested.

I look forward to seeing everyone at the next sprint!

Filed August 25th, 2009 under sprint

As part of the GeoTrac project ( http://projects.openplans.org/GeoTrac/ ), I set off to be able to query on tickets via shapefiles ( http://projects.openplans.org/GeoTrac/ticket/78 ). I didn’t know anything about shapefiles before beginning this, and I still don’t know much, except that I’m not a big fan. So I consulted the font of knowledge: http://en.wikipedia.org/wiki/Shapefile

Shapefiles are several files that have different extensions but the same “first name”. Its the kind of convention someone would come up with if they were being “clever” (see http://www.statusq.org/archives/2006/11/03/1193/ ). Three files are mandatory:

* .shp — shape format; the feature geometry itself

* .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly

* .dbf — attribute format; columnar attributes for each shape, in dBase III format (from http://en.wikipedia.org/wiki/Shapefile )

Unfortunately, these three files contain no projection information and so do not actually give you enough information to tell where the geometry is on a map. There is an optional .prj file * .prj — projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format (from http://en.wikipedia.org/wiki/Shapefile ). So while we didn’t have a .prj file to start off with or know the projection that we needed to figure out where the geometry lived on the map, this seemed the right approach to specifying geometry.

We wrote a Trac (see http://trac.edgewall.org ) component for the GeoTicket plugin ( http://trac-hacks.org/wiki/GeoTicketPlugin ) that will support querying tickets based on region: http://trac-hacks.org/svn/geoticketplugin/0.11/geoticket/regions.py

The import of the shapefile is done with shp2pgsql ( http://postgis.refractions.net/docs/ch04.html ). The resulting SQL is then read and used to create a table in the PostgreSQL database. The administrator of the GeoTrac site is then prompted which piece of metadata attached to the polygons should be used for query. The import was fairly easy to do. Next came the query itself. A SQL query was used to select the tickets within the region:

SELECT ticket FROM ticket_location WHERE ST_CONTAINS((SELECT the_geom FROM georegions WHERE commdist=212),
st_pointfromtext('POINT(' || longitude || ' ' || latitude || ')')); 

However, without specifying the SRIDs, this gives an error:

ERROR: Operation on two geometries with different SRIDs CONTEXT: SQL function "st_contains" statement 1 

This may be corrected by providing the SRID for the ticket location point:

SELECT ticket FROM ticket_location WHERE ST_CONTAINS((SELECT the_geom FROM georegions WHERE commdist=212),
st_pointfromtext('POINT(' || longitude || ' ' || latitude || ')', 4326)); 

However, by default, the SRID of the_geom is -1. So this gives the same error, and the previous case silently “succeeds”, though of course the projection is wrong you won’t get any data back. You can see the SRID from the geometry_columns table:

trac_fleem=# select * from geometry_columns;
f_table_catalog | f_table_schema | f_table_name | f_geometry_column | coord_dimension | srid | type
----------------+----------------+--------------+-------------------+-----------------+------+-------------- |
                  public         | georegions   | the_geom          | 2               | -1   | MULTIPOLYGON
(1 row) 

As a makeshift solution, I’ve included a text input to specify the SRID:

Import Shapefiles Upload Shapefiles Shapefile shape format (.shp) ____________________ Shapefile shape index format (.shx) ____________________ Shapefile attribute format (.dbf) ____________________ SRID ____________________ Submit

So, given the SRID, you *should* be able to transform the_geom to the SRID you care about, using `shp2pgsql -s`. Maybe. I haven’t finished the query yet [TODO!].

So the problem remains of how to get the SRID from the .prj file. After asking, we were given the .prj file:

PROJCS["NAD_1983_UTM_Zone_18N",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000],PARAMETER["False_Northing",0],PARAMETER["Central_Meridian",-75],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0],UNIT["Meter",1]] 

Ideally, in addition to allowing the GeoTrac administrator to specify the SRID (especially for the case where there is no .prj file), it would seem plausible to obtain the SRID from the .prj information.

So I looked around and tried a bunch of things and was refered to a project called GDAL: http://gdal.org/ . Specifically, I was told that ogr2ogr would be useful for shapefile vector data ( see http://gdal.org/ogr2ogr.html ). GDAL has python bindings ( http://trac.osgeo.org/gdal/wiki/GdalOgrInPython ), but unfortunately, it is not easy_install-able:

(Trac-2.4)> easy_install GDAL
Searching for GDAL
Reading http://pypi.python.org/simple/GDAL/
Reading http://www.gdal.org
Best match: GDAL 1.6.1
Downloading http://pypi.python.org/packages/source/G/GDAL/GDAL-1.6.1.tar.gz#md5=e8671d4a77041cf0f7a027f3f3e8280c
Processing GDAL-1.6.1.tar.gz
Running GDAL-1.6.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-jFd9bm/GDAL-1.6.1/egg-dist-tmp-MJllDq numpy include /home/jhammel/Trac-2.4/lib/python2.4/site-packages/numpy-1.3.0-py2.4-linux-i686.egg/numpy/core/include
Could not run gdal-config!!!!
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: Setup script exited with error:
command 'gcc' failed with exit status 1 

I haven’t been able to compile GDAL by hand, though it is available on modern package systems. So at best, requiring GDAL makes a component optional (see http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional-features-with-their-own-dependencies and the terribly apt http://www.reddit.com/r/Python/comments/91bnb/the_terrible_magic_of_setuptools/c0b3lhe ).

So how do you use ogr? An example is given here: http://www.gis.usu.edu/~chrisg/python/2009/lectures/ospy_demo1.py

So we try this on our own shapefile:

>>> import ogr
>>> driver = ogr.GetDriverByName('ESRI Shapefile')
>>> ds = driver.Open("sen02.shp") 

Docstrings are not available for much of the python bindings:

class DataSource |
Methods defined here: | |
CopyLayer(self, src_layer, new_name, options=[]) | |
CreateLayer(self, name, srs=None, geom_type=0, options=[]) | |
DeleteLayer(self, iLayer) | |
Dereference(self) | |
 Destroy(self) | |
ExecuteSQL(self, statement, region='NULL', dialect='') | |
GetDriver(self) | Returns the driver of the datasource | |
GetLayer(self, iLayer=0) | Return the layer given an index or a name | |
GetLayerByName(self, name) | |
GetLayerCount(self) | Returns the number of layers on the datasource | |
GetName(self) | Returns the name of the datasource | |
GetRefCount(self) | |
GetSummaryRefCount(self) | |
Reference(self) | |
Release(self) | |
ReleaseResultSet(self, layer) | |
TestCapability(self, cap) |
Test the capabilities of the DataSource. |
See the constants at the top of ogr.py | |
__getitem__(self, value) | Support dictionary, list, and slice -like access to the datasource. |
ds[0] would return the first layer on the datasource. | ds['aname'] would return the layer named "aname".
| ds[0:4] would return a list of the first four layers. | |
__init__(self, obj=None) | | __len__(self) | Returns the number of layers on the datasource 

So it is hard to tell how to get the SRID.

A manual solution is available at: http://help.nceas.ucsb.edu/PostGIS#Import_a_shapefile_into_a_PostGIS_database

The key is the first string, NAD_1983_UTM_Zone_18N. If I do a query like the one in http://help.nceas.ucsb.edu/PostGIS#Import_a_shapefile_into_a_PostGIS_database I get a row:

trac_fleem=# select * from spatial_ref_sys where srtext like '%NAD83 / UTM zone 18N%';
srid   | auth_name | auth_srid | srtext                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | proj4text
-------+-----------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------
26918  | EPSG      | 26918     | PROJCS["NAD83 / UTM zone 18N",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AUTHORITY["EPSG","26918"]] | +proj=utm +zone=18 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
(1 row) 

In other words, I can tell, as a human being looking at the file, that the `WHERE srtext LIKE` clause should be transformed from the string “NAD_1983_UTM_Zone_18N” to the string %NAD83 / UTM zone 18N%’. But while you could tell a computer how to do this….it would either be hard, or I don’t know how to do it (and couldn’t find any good documentation).

One of the takeaways from this is that this is hard and it shouldn’t be. I don’t know enough about the Geo world to say exactly why this is. Maybe the .prj file should be able to support SRIDs. Maybe `shp2pgsql` should figure out the SRID from the .prj file. Maybe the string should be canonical so that select from spatial_ref_sys query could be more robust, assuming I figured out how to parse the .prj file.

Filed August 21st, 2009 under Uncategorized