We have expertise in a number of different frameworks, but one that hasn’t caught on much historically at topp has been django. I can think of a variety of reasons for this, but despite these, I think taking a fresh look at django is worthwhile for topp.

Django comes with a lot of batteries included. A lot of core decisions have been made about how things interact and connect, and what you gain by buying into those decisions is a whole lot of functionality: the core conventions allow many higher-level pieces to be written against them, and that adds up quickly.

On top of this, pinax has been gaining momentum as well. It adds another layer on top of the already rich django core, providing many of the features that typical web sites demand. There’s a good introductory talk on pinax here: http://blip.tv/file/1952623

I gave pinax an honest shot, and a couple of things impressed me about it. First of all, the build worked consistently, and so did all of the project templates I tried, probably because they’re based on pip and virtualenv :) Secondly, any kind of trivial modification that I attempted was in fact trivial to do. It almost felt like a cms to me: if I wanted to change some text, style, functionality, you name it, it was pretty easy to do so without having to understand any complex systems.

In the talk linked to above, it’s mentioned that one of the reasons for pinax’s good integration story is that most cms’s have an underlying system for everything, and you write your own hooks to plug into that system. Pinax inherits the core underlying system from django, but instead of building up its own system on top of that, it flips the perspective around: it tries to provide all the pieces that are needed, and leaves controlling how those pieces connect up to the application. In the end, most of the common changes are intuitive.

Specifically to topp, I think django/pinax is important for a number of reasons:

1. It’s popular, and it looks like that popularity is growing. If we want to build tools the masses will use, we should work with the masses, instead of trying to convince them otherwise.

2. It has a strong geo component. While it’s certainly possible with other frameworks and back-ends, django has very good geo integration in its orm (see the sketch after this list).

3. Not only are the batteries included, it comes with several flashlights to choose from.

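On point 2, here is a minimal GeoDjango-style sketch of what that orm integration looks like; the model and field names are mine, purely illustrative, not code from any of our projects.

from django.contrib.gis.db import models

class Site(models.Model):
    # hypothetical model: a geometry column is just another field
    name = models.CharField(max_length=100)
    location = models.PointField(srid=4326)

    # GeoManager enables spatial lookups through the normal ORM filter syntax
    objects = models.GeoManager()

# e.g. all sites that fall inside some polygon:
# Site.objects.filter(location__within=some_polygon)
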
It looks like we’re moving in the direction of creating more prototype-style web apps. With many of these apps, we try to focus on what makes each one unique, and don’t really care much about the other mundane details. Interestingly enough, this is in fact pinax’s tagline.

To sum up, topp is still imho a light, flexible organization capable of changing directions quickly. While we figure out the answers to some of our tougher internal questions, we may very well need to move quickly. Taking advantage of django/pinax can be one way for us to do so.

Filed November 3rd, 2009 under Uncategorized

= LDAP and Trac =

In desperation, since the problem-space is too vast to understand and
I am making no forward progress, I am taking to writing to hopefully
at least record my efforts.

My mission is to get Trac + AccountManager to authenticate using
LDAP. I know people do this, so….how hard can it be? Well, it’s
probably one of that vast majority of things that are easy if you know
what you’re doing and very hard if you don’t. I don’t know what I’m
doing, so I’ve devoted several hours to this so far.

So I go to the AccountManager wiki page and look at the section on
LDAP ( http://trac-hacks.org/wiki/AccountManagerPlugin#LDAP ).
It’s…well, after reading it I know I’m in for a ride:

"""
Check LDAPAuthStore regarding how to link LdapPlugin to AccountManagerPlugin.
"""

Well, it’d be cool if LDAPAuthStore linked to a wiki page…but it
actually links to http://trac-hacks.org/ticket/1147 , which is, as one
might expect, a very long ticket.

Aside: Don’t put documentation in tickets! It’s a horrible idea.
Documentation == something humans can read that lets them know how
things work. Tickets are conversations (like mailing lists and blog
posts). If I wanted to learn about cars, I probably wouldn’t start by
listening to Henry Ford’s conversations with Alexander Malcomson.

So I read the ticket, including the very helpful recollection of CC
changes and all of that. There is a python module — actually,
thirteen modules and patches — attached to the ticket that is an LDAP
authentication backend. It also requires the LdapPlugin (more on that
later).

So I read the ticket. And I read it again, trying to figure out what
to do. Well, one helpful tidbit:

"""
Leave the apache setting same as after AccountManager is
installed. Don’t follow LdapPlugin’s apache setting.

Follow LdapPlugin’s trac.ini setting. didn’t use its
Permission/Groups part. It requires customize attributes
(tracperm) to be added to the LDAP server schema.
"""

Several hundred lines later, a stnank is documented
(http://trac-hacks.org/ticket/1147#comment:11 and
http://trac-hacks.org/ticket/1147#comment:12) :

"""
01/27/09 09:53:41 changed by hoffmann@…

  • status changed from closed to reopened.
  • resolution deleted.

Replying to iamer@open-craft.com:

It is working for me, can you please check your trac
configuration, and try to describe the problem more clearly ? Also
turn on debugging and see if there are any related messages
there. I am not the original author of the patches, I just merged
them and did a little modification.

Same dor me, it is not working. I am getting ERROR: Skipping
"acct_mgr.ldap_store = acct_mgr.ldap_store": (can’t import "No module
named tracusermanager.api") inside my logfile. I am using trac 0.11.2
Might that bew the problem?
01/27/09 10:15:49 changed by anonymous ¶

  • status changed from reopened to closed.
  • resolution set to fixed.

Installing the UserManagerPlugin resolved the issue
"""

Ah, so I need the UserManagerPlugin too. Um, even though all I really
want (right now, anyway) is authentication. Hmm, well, good to know.

Way way way later on in the ticket (
http://trac-hacks.org/ticket/1147#comment:26 ) mgood rises to the
level of sanity (too bad his suggestion was never taken):

"""
05/20/09 06:25:17 changed by mgood ¶

  • status changed from reopened to closed.
  • type changed from defect to enhancement.
  • resolution set to wontfix.
  • summary changed from IndexError: list index out of range to Add LDAP authentication backend.

Please create a separate plugin for this backend. I’d rather not add
the extra dependencies that this requires, but it could benefit from
being in version control and having its own issue list. It should make
it easier if users can install that plugin rather than trying to keep
track of the all these patches.
"""

Yeah, that would be nice. Too bad after that things proceeded as
normal instead of discussing this further. The last comment on the
ticket ( http://trac-hacks.org/ticket/1147#comment:32 ) is pretty telling too (it reminds me of the boat I am in):

"""
08/15/09 23:17:14 changed by rgrant@…

Is there some concise list of tasks to perform on a new install of
TRAC to get AccountManager working with LDAP? This forum seems to be
focused on fixing bugs in existing installs.
"""

In an attempt to get out of further spelunking, I followed a
lead in a recent email to trac-users and looked at the
TracLDAPPlugin. This started off challenging and ended with no
results. Firstly, it’s not hosted on trac-hacks.org, so I had to
resort to google to find it. Turns out it is here:
http://pypi.python.org/pypi/TracLDAPAuth/ . Interestingly, the pypi
page says the homepage is on trac-hacks:
http://trac-hacks.org/wiki/LDAPAuthPlugin . But going to that page,
the first line is "I’m not the developer or maintainer of the
LDAPAuthPlugin, and this is not a real reference…". Wonderful. But
looking at the plugin itself, it looks promising. Instead of
being a UserManager tie-in (which is more than I need, at least right
now), the TracLDAPAuth plugin is only an IPasswordStore for
AccountManager, which is exactly what I want.

So I install the plugin, which goes rather smoothly except for installing
python-ldap, which of course fails to compile its C extensions due to
a missing .h file. python, as a language designed for human beings
rather than computers, should really solve this problem instead of
just invoking a C compiler and running off into the blue, or if not
"solve" it, at least "make it better". Like a nice warning: "Oh, you don’t
have this needed .h file. Well, you should get it here." That would
be friggin nice. But I digress. So I get bored and apt-get install
python-ldap, which of course isn’t listed as a dependency in setup.py
(and neither is AccountManager). Then installing the plugin works
smoothly. I fill out my [ldap] section (correctly, mind you, and yes,
I’m sure because I tested this later) and enable the plugin (correctly
again) and I go to login. Nothing. Denied. So I look at the
check_password code. It’s rather short:

def check_password(self, user, password):
    self.log.debug('LDAPAuth: Checking password for user %s', user)
    try:
        l = ldap.open(self.server_host)
        bind_name = (self.bind_dn % user).encode('utf8')
        self.log.debug('LDAPAuth: Using bind DN %s', bind_name)
        l.simple_bind_s(bind_name, password.encode('utf8'))
        return True
    except ldap.LDAPError:
        return False

Hell, short enough to debug by hand. So I do. The following results:

>>> l.simple_bind_s(bind_name, password.encode('utf8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/ldap/ldapobject.py", line 199, in simple_bind_s
    return self.result(msgid,all=1,timeout=self.timeout)
  File "/usr/lib/python2.4/site-packages/ldap/ldapobject.py", line 428, in result
    res_type,res_data,res_msgid = self.result2(msgid,all,timeout)
  File "/usr/lib/python2.4/site-packages/ldap/ldapobject.py", line 432, in result2
    res_type, res_data, res_msgid, srv_ctrls = self.result3(msgid,all,timeout)
  File "/usr/lib/python2.4/site-packages/ldap/ldapobject.py", line 438, in result3
    rtype, rdata, rmsgid, serverctrls = self._ldap_call(self._l.result3,msgid,all,timeout)
  File "/usr/lib/python2.4/site-packages/ldap/ldapobject.py", line 97, in _ldap_call
    result = func(*args,**kwargs)
ldap.CONFIDENTIALITY_REQUIRED: {'info': 'TLS confidentiality required', 'desc': 'Confidentiality required'}

Oh yeah, we have TLS, which the TracLDAPAuth plugin doesn’t
enable. But since it’s not on trac-hacks, my first step is to get it
there so I can ticket and/or make a vendor branch. So I’ll play with
TLS via the LDAP module. I look for the docs and find them at
http://www.python-ldap.org/ . The first question in the FAQ isn’t
exactly promising:

Q: Is python-ldap yet another abandon-ware project on SourceForge?

Yiy. Well, let’s ignore that for now and look at the API. There is
an ill-documented start_tls_s function that looks like what I want:
http://www.python-ldap.org/doc/html/ldap.html#ldap.LDAPObject.start_tls_s
. Let’s try it:

>>> l.start_tls_s()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/ldap/ldapobject.py", line 528, in start_tls_s
    return self._ldap_call(self._l.start_tls_s)
  File "/usr/lib/python2.4/site-packages/ldap/ldapobject.py", line 97, in _ldap_call
    result = func(*args,**kwargs)
ldap.CONNECT_ERROR: {'desc': 'Connect error'}

Hmm, well, that’s awful. But I try it on my server and it magically
works!

Well, what’s the difference? Running strace on both boxes, strace python -c
'import ldap; l = ldap.open("ldap-master.openplans.org"); l.start_tls_s()'
yields the difference that on the server there is an
open call that reads from /etc/openldap.ldap.conf and on my box it
reads from /etc/ldap/ldap.conf. It does not try the other file in
either case, and no other relevant configuration (that I can tell,
anyway) is read after the python is loaded. I installed all of the
openldap packages and still on my machine it tries to read from
/etc/ldap/ldap.conf.

I mailed the author of TracLDAPAuth and requested his permission to
put it on trac-hacks so that I may more sanely tackle this issue. I
also asked the mailing list (link not cited, as google groups fails to
put them in the email messages) what plugin I should use for LDAP. I
was nudged towards http://trac-hacks.org/ticket/1147 . So I decided
to go down that road again.

I follow mgood’s suggestion and make a real host-to-live plugin out of
it and install and enable it for my Trac instance. I spent several
hours trying to get this to work with my LDAP server. No go. Not
sure if the plugin is fuxored or if the LdapPlugin is fuxored or if I
just don’t know enough about LDAP to configure it. But at this
point, I had wasted three days on the problem and was tired of it. I
did do something nice and put my plugin on
http://trac-hacks.org/wiki/LdapAuthStorePlugin so that tickets could
be filed against it, someone could adopt it, it could be versioned,
and all of those nice things associated with sane software
development. If you want it, just mail me and it’s yours.

So figuring that http://pypi.python.org/pypi/TracLDAPAuth/ seemed to
magically work on the servers…wait, an aside. When things magically
work — that is, they work and I don’t understand why, or in this case
they work on our PRODUCTION SERVERS and not on my development and
testing box, I get really worried. Quick + dirty says, "It works,
don’t worry about it." But I’ve been doing this long enough to know
that a tiger lurks in that closet. Meaning: lots more hours when
things stop magically working, possible unknown side-effects, possible
security problems (I mean, we are dealing with LDAP here…like, you
know, that thing we keep our passwords in. Let’s be careful, shall
we?), etc. Well, back to topic, I could rant all day.

So figuring that http://pypi.python.org/pypi/TracLDAPAuth/ seemed to
magically work on the servers, I decided to put it on trac-hacks
(hopefully the author won’t complain too much) and then add the two
lines necessary to support TLS, plus any other fixes that might be
necessary. So I did: http://trac-hacks.org/wiki/TracLdapAuthPlugin
. And I emailed the author telling him what I did. I hope he’ll take
ownership or at least won’t mind. I tried to add python-ldap
(remember that messy thing?) to install_requires in the setup.py
file. Well, that didn’t work so well, so I had to add the following
code to the top of the file:

import sys

try:
    import ldap
except ImportError:
    print """The python-ldap package isn’t installed on your system
(import ldap failed). I would just put this in install_requires, but
I do have python-ldap on my system and I get this:

{{{
Processing dependencies for TracLDAPAuth==1.0
Searching for python-ldap==2.3.1
Reading http://pypi.python.org/simple/python-ldap/
Reading http://python-ldap.sourceforge.net/
Reading http://sourceforge.net/project/showfiles.php?group_id=2072&package_id=2011
No local packages or download links found for python-ldap==2.3.1
error: Could not find suitable distribution for Requirement.parse('python-ldap==2.3.1')
}}}

Well, that’s awful. If you know how to fix this, please file a ticket
at
http://trac-hacks.org/newticket?component=TracLdapAuthPlugin&owner=k0s
"""
    sys.exit(1)

Well, such is the state of python packaging (FIX IT! FIX IT! FIX
IT!). And I added the two lines that will add TLS (
http://trac-hacks.org/changeset/6606 ). So I’m going to mail TOPP
Labs Operations just to make sure we want to test and develop
code directly related to our security.

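Roughly, the change amounts to the following. This is a sketch rather
than the literal changeset (see the changeset linked above for the
real thing), and the use_tls option name here is my own, purely for
illustration.

import ldap

def check_password(self, user, password):
    # sketch of an AccountManager IPasswordStore: try to bind as the user
    try:
        l = ldap.open(self.server_host)
        # the essence of the TLS fix: negotiate TLS before binding, so
        # servers that require confidentiality will accept the bind
        if self.use_tls:  # hypothetical config flag
            l.start_tls_s()
        bind_name = (self.bind_dn % user).encode('utf8')
        l.simple_bind_s(bind_name, password.encode('utf8'))
        return True
    except ldap.LDAPError:
        return False
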
Looking at the "ldap" tag on trac-hacks ( http://trac-hacks.org/tags/ldap ), there are at least a
few other related resources in this twisted web:

  • http://trac-hacks.org/ticket/1600 : a patch for AccountManager that
    also claims to provide LDAP authentication. Haven’t looked at it
  • http://trac-hacks.org/wiki/AccountLdapPlugin : "Allows you to
    change your password defined in LDAP. Also moved the basic
    properties of LDAP (user and mail) to the corresponding properties
    in Trac". So it doesn’t claim to do auth, but it is heavily
    overlapping with the above patches and plugins that do
  • LdapPlugin : "LDAP support with group management has been added as
    a Trac extension. This extension enables to use existing LDAP
    groups to grant permissions rather than defining permissions for
    every single user on the system." So again no support for auth, but
    other plugins depend on it. Ho hum. In my opinion, auth should go
    with this plugin. If people don’t want to use the auth
    component, then don’t enable it.
  • http://trac-hacks.org/wiki/TracSysgroupsPlugin : "A permission
    system-group provider for Trac. It asks linux / unix system to
    which groups a validated user belongs. If one of this groups
    matches a permission group name created with trac-admin permission
    add command, these permissions are enabled for logged-on user."
    So I’m not sure why it’s tagged with "ldap", save for the loose
    association that unix can use LDAP for authentication and groups.

I also find it "funny" that AccountManager is tagged with "ldap" even
though it doesn’t support it OOTB.

So concludes my foray into LDAP + Trac. I’m still confused, but maybe,
just maybe, you’ll be less confused than you were before reading this.

Filed September 28th, 2009 under Uncategorized

As part of the GeoTrac project ( http://projects.openplans.org/GeoTrac/ ), I set off to be able to query on tickets via shapefiles ( http://projects.openplans.org/GeoTrac/ticket/78 ). I didn’t know anything about shapefiles before beginning this, and I still don’t know much, except that I’m not a big fan. So I consulted the font of knowledge: http://en.wikipedia.org/wiki/Shapefile

Shapefiles are several files that have different extensions but the same “first name”. It’s the kind of convention someone would come up with if they were being “clever” (see http://www.statusq.org/archives/2006/11/03/1193/ ). Three files are mandatory:

* .shp — shape format; the feature geometry itself

* .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly

* .dbf — attribute format; columnar attributes for each shape, in dBase III format (from http://en.wikipedia.org/wiki/Shapefile )

Unfortunately, these three files contain no projection information and so do not actually give you enough information to tell where the geometry is on a map. There is an optional fourth file for that: .prj — projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format (from http://en.wikipedia.org/wiki/Shapefile ). So while we didn’t have a .prj file to start off with, or know the projection, which we needed in order to figure out where the geometry lived on the map, this seemed the right approach to specifying geometry.

We wrote a Trac (see http://trac.edgewall.org ) component for the GeoTicket plugin ( http://trac-hacks.org/wiki/GeoTicketPlugin ) that will support querying tickets based on region: http://trac-hacks.org/svn/geoticketplugin/0.11/geoticket/regions.py

The import of the shapefile is done with shp2pgsql ( http://postgis.refractions.net/docs/ch04.html ). The resulting SQL is then read and used to create a table in the PostgreSQL database. The administrator of the GeoTrac site is then prompted for which piece of metadata attached to the polygons should be used for querying. The import was fairly easy to do. Next came the query itself. A SQL query was used to select the tickets within the region:

SELECT ticket FROM ticket_location WHERE ST_CONTAINS((SELECT the_geom FROM georegions WHERE commdist=212),
st_pointfromtext('POINT(' || longitude || ' ' || latitude || ')')); 

However, without specifying the SRIDs, this gives an error:

ERROR: Operation on two geometries with different SRIDs CONTEXT: SQL function "st_contains" statement 1 

This may be corrected by providing the SRID for the ticket location point:

SELECT ticket FROM ticket_location WHERE ST_CONTAINS((SELECT the_geom FROM georegions WHERE commdist=212),
st_pointfromtext('POINT(' || longitude || ' ' || latitude || ')', 4326)); 

However, by default, the SRID of the_geom is -1. So this gives the same error, while the previous case silently “succeeds”: the projection is wrong, so of course you won’t get any data back. You can see the SRID in the geometry_columns table:

trac_fleem=# select * from geometry_columns;
 f_table_catalog | f_table_schema | f_table_name | f_geometry_column | coord_dimension | srid | type
-----------------+----------------+--------------+-------------------+-----------------+------+--------------
                 | public         | georegions   | the_geom          |               2 |   -1 | MULTIPOLYGON
(1 row)

As a makeshift solution, I’ve included a text input to specify the SRID:

[Form: “Import Shapefiles” — upload fields for the shape format (.shp), shape index format (.shx), and attribute format (.dbf) files, plus a text input for the SRID, and a Submit button]

So, given the SRID, you *should* be able to transform the_geom to the SRID you care about, using `shp2pgsql -s`. Maybe. I haven’t finished the query yet [TODO!].

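For what it’s worth, here is roughly what I expect the finished query to look like. This is a sketch I haven’t run: it assumes the regions table was loaded with `shp2pgsql -s <srid>` so that the_geom carries a real SRID for ST_Transform to work from, and the cursor and the commdist filter are illustrative.

def tickets_in_region(cursor, commdist):
    # sketch: find tickets whose point falls inside a region, reprojecting
    # the region geometry to 4326 to match the ticket latitude/longitude
    query = """
        SELECT ticket FROM ticket_location
        WHERE ST_Contains(
            ST_Transform((SELECT the_geom FROM georegions WHERE commdist = %s), 4326),
            ST_PointFromText('POINT(' || longitude || ' ' || latitude || ')', 4326))"""
    cursor.execute(query, (commdist,))
    return [row[0] for row in cursor.fetchall()]
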
So the problem remains of how to get the SRID from the .prj file. After asking, we were given the .prj file:

PROJCS["NAD_1983_UTM_Zone_18N",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000],PARAMETER["False_Northing",0],PARAMETER["Central_Meridian",-75],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0],UNIT["Meter",1]] 

Ideally, in addition to allowing the GeoTrac administrator to specify the SRID (especially for the case where there is no .prj file), it would seem plausible to obtain the SRID from the .prj information.

So I looked around and tried a bunch of things and was referred to a project called GDAL: http://gdal.org/ . Specifically, I was told that ogr2ogr would be useful for shapefile vector data (see http://gdal.org/ogr2ogr.html ). GDAL has python bindings ( http://trac.osgeo.org/gdal/wiki/GdalOgrInPython ), but unfortunately, it is not easy_install-able:

(Trac-2.4)> easy_install GDAL
Searching for GDAL
Reading http://pypi.python.org/simple/GDAL/
Reading http://www.gdal.org
Best match: GDAL 1.6.1
Downloading http://pypi.python.org/packages/source/G/GDAL/GDAL-1.6.1.tar.gz#md5=e8671d4a77041cf0f7a027f3f3e8280c
Processing GDAL-1.6.1.tar.gz
Running GDAL-1.6.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-jFd9bm/GDAL-1.6.1/egg-dist-tmp-MJllDq numpy include /home/jhammel/Trac-2.4/lib/python2.4/site-packages/numpy-1.3.0-py2.4-linux-i686.egg/numpy/core/include
Could not run gdal-config!!!!
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: Setup script exited with error:
command 'gcc' failed with exit status 1 

I haven’t been able to compile GDAL by hand, though it is available in modern package systems. So at best, GDAL can only be an optional dependency, which makes any component that requires it optional too (see http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional-features-with-their-own-dependencies and the terribly apt http://www.reddit.com/r/Python/comments/91bnb/the_terrible_magic_of_setuptools/c0b3lhe ).

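For the curious, “optional” here means the setuptools extras mechanism from the first link. A sketch of what that might look like in a setup.py, with illustrative names:

from setuptools import setup, find_packages

setup(
    name='GeoTicketPlugin',   # name and version are illustrative
    version='0.0',
    packages=find_packages(),
    extras_require={
        # GDAL is only pulled in if you ask for the extra,
        # e.g. easy_install "GeoTicketPlugin[prj]"
        'prj': ['GDAL'],
    },
)
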
So how do you use ogr? An example is given here: http://www.gis.usu.edu/~chrisg/python/2009/lectures/ospy_demo1.py

So we try this on our own shapefile:

>>> import ogr
>>> driver = ogr.GetDriverByName('ESRI Shapefile')
>>> ds = driver.Open("sen02.shp") 

Docstrings are not available for much of the python bindings:

class DataSource
 |  Methods defined here:
 |  CopyLayer(self, src_layer, new_name, options=[])
 |  CreateLayer(self, name, srs=None, geom_type=0, options=[])
 |  DeleteLayer(self, iLayer)
 |  Dereference(self)
 |  Destroy(self)
 |  ExecuteSQL(self, statement, region='NULL', dialect='')
 |  GetDriver(self)
 |      Returns the driver of the datasource
 |  GetLayer(self, iLayer=0)
 |      Return the layer given an index or a name
 |  GetLayerByName(self, name)
 |  GetLayerCount(self)
 |      Returns the number of layers on the datasource
 |  GetName(self)
 |      Returns the name of the datasource
 |  GetRefCount(self)
 |  GetSummaryRefCount(self)
 |  Reference(self)
 |  Release(self)
 |  ReleaseResultSet(self, layer)
 |  TestCapability(self, cap)
 |      Test the capabilities of the DataSource.
 |      See the constants at the top of ogr.py
 |  __getitem__(self, value)
 |      Support dictionary, list, and slice -like access to the datasource.
 |      ds[0] would return the first layer on the datasource.
 |      ds['aname'] would return the layer named "aname".
 |      ds[0:4] would return a list of the first four layers.
 |  __init__(self, obj=None)
 |  __len__(self)
 |      Returns the number of layers on the datasource

So it is hard to tell how to get the SRID.

A manual solution is available at: http://help.nceas.ucsb.edu/PostGIS#Import_a_shapefile_into_a_PostGIS_database

The key is the first string, NAD_1983_UTM_Zone_18N. If I do a query like the one in http://help.nceas.ucsb.edu/PostGIS#Import_a_shapefile_into_a_PostGIS_database I get a row:

trac_fleem=# select * from spatial_ref_sys where srtext like '%NAD83 / UTM zone 18N%';
 srid  | auth_name | auth_srid | srtext | proj4text
-------+-----------+-----------+--------+-----------
 26918 | EPSG      | 26918     | PROJCS["NAD83 / UTM zone 18N",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AUTHORITY["EPSG","26918"]] | +proj=utm +zone=18 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
(1 row)

In other words, I can tell, as a human being looking at the file, that for the `WHERE srtext LIKE` clause the string “NAD_1983_UTM_Zone_18N” should be transformed into the string '%NAD83 / UTM zone 18N%'. But while you could tell a computer how to do this…it would either be hard, or I don’t know how to do it (and couldn’t find any good documentation).

One of the takeaways from this is that this is hard, and it shouldn’t be. I don’t know enough about the Geo world to say exactly why this is. Maybe the .prj file should be able to carry an SRID. Maybe `shp2pgsql` should figure out the SRID from the .prj file. Maybe the projection string should be canonical so that the select from spatial_ref_sys query could be more robust, assuming I figured out how to parse the .prj file.

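If I ever do get GDAL’s Python bindings to build, my best guess at the .prj-to-SRID step is something like the following. This is untested on my part, and AutoIdentifyEPSG is a best-effort matcher that can certainly fail on oddball .prj files:

import osr  # GDAL's spatial reference bindings (or: from osgeo import osr)

def srid_from_prj(prj_path):
    # sketch: try to turn an ESRI .prj file into an EPSG SRID
    wkt = open(prj_path).read()
    ref = osr.SpatialReference()
    ref.ImportFromESRI([wkt])        # .prj files are ESRI-flavored WKT
    if ref.AutoIdentifyEPSG() == 0:  # 0 means success
        return int(ref.GetAuthorityCode(None))
    return None                      # fall back to asking the administrator
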
Filed August 21st, 2009 under Uncategorized

I started working on a simple calendar for opencore called Henge. The idea is that projects could post a calendar of events. My initial version doesn’t even let you describe events — each event just has a link field that you can use.

I’ve decided to put the project on hiatus, because we’re considering rewriting opencore. I think we’re finally starting to figure out how to write applications in the dvhoster, openplans_hooks, transcluder, cookieauth, etc. framework. But being able to do things doesn’t mean being able to do them well or efficiently. While debugging henge, I did run into various issues caused by the stack: deliverance hides error messages; I couldn’t figure out what file I had to change to map URLs to project/x/calendar; debugging the project member http call was a hassle. And I didn’t even get to the opencore featurelet.

Also, I was starting to hit the limits of SQLObject pretty hard. I wasn’t doing anything especially complicated but I was trying to write an undo mode that made sense. The data model is:

Projects have many Events.

Events have many EventDates.

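For reference, the model boils down to something like this. This is a sketch with made-up column names, not Henge’s actual code; the soft-delete flags are where the undo headaches start.

from sqlobject import (SQLObject, ForeignKey, MultipleJoin,
                       StringCol, DateTimeCol, BoolCol)

class Project(SQLObject):
    name = StringCol()
    events = MultipleJoin('Event')

class Event(SQLObject):
    project = ForeignKey('Project')
    link = StringCol()                # each event just gets a link field
    deleted = BoolCol(default=False)  # soft-delete so the event can be restored
    dates = MultipleJoin('EventDate')

class EventDate(SQLObject):
    event = ForeignKey('Event')
    date = DateTimeCol()
    deleted = BoolCol(default=False)  # should be restored only if it was live
                                      # when its event was deleted
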
When an event is deleted, its associated dates should be deleted — this is trivial. But when an event is restored, its EventDates should be restored — and only those EventDates that it had when it was deleted should be restored (not any that might have been deleted earlier). I tried a bunch of strategies to make this work well, and ended up having to do both metaprogramming and hand-coding to make it work. I think a better programmer could have done it all through metaprogramming, but it would have taken me more time than it was worth. In particular, I wanted objects to have uuids as primary keys, and that was impossible. As it turned out, that wouldn’t have gotten me what I needed, but I still should have been able to do it.

What’s actually going on is that an Event is a composite object composed of some general data and a bunch of dates. But SQLObject doesn’t natively have a way to talk about a row and its associated rows from other tables. You can walk through the metadata and find one-to-many relationships, but that’s not the same thing as ownership. If there were a way to talk about this naturally, it would be easier to support things like undo and versioning of composite objects.

I think the lesson is that SQLObject is not really powerful enough for applications which want to support complex interactions with data. This isn’t a complaint about SQLObject — it’s not meant for these sorts of complex cases. I’m not sure if SQLAlchemy is the solution, but I’m certainly interested to find out.

Anyway, there are a few weeks of work left before I could put Henge live, and I find myself without the time to do them:

  • the aforementioned featurelet
  • events should have descriptions
  • there should be an agenda view
  • a javascript date picker
  • event permalinks which survive event deletion

Of course, there are a ton of ways to complexify things, many of which are obvious — presently, there is no interface to add multiple dates to an event. But really, the above is all that’s needed to launch.

Filed March 4th, 2009 under Uncategorized

You know how wikis inevitably tend to accrue pages that are out of date? So out of date that there’s not even a good reason for the page to exist anymore?

When I come across such a page, I’m tempted to edit it and/or delete it, but then I think about all the things that might link to it, and I’m not sure what they should link to now, and at that point I lose my motivation to clean up the wiki, so it keeps growing weeds.

I was inspired by thinking about the relative ease of closing tickets in a bug tracker. A couple clicks and you’re done, and links to the ticket from elsewhere in the bug tracker are now rendered in such a way that you know it’s a closed bug.

Could a wiki have a similar feature? Maybe there could be a “Mark this page as obsolete” button in the wiki UI. And a text input where you could optionally specify a new URL that supersedes the page.

Once marked as obsolete, the wiki software would:

  • Render a big “This page is obsolete” header at page top. If a replacement URL was specified, show a link to it.
  • Possibly gray out the remainder of the text, or just remove it and provide a link to the previous historical revision. (Don’t just delete it though. Sometimes people need old information. This is also why I wouldn’t want to just redirect.)
  • When rendering wiki links to such a page, style them differently somehow, so that you get some kind of warning that the link points to something obsolete… add a little icon or something? dunno. Or maybe old links could be updated to point to the replacement URL if one was provided.

Something like that. This would give the users support for one-click wiki cleanup.

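A sketch of how little state this actually needs (names are made up; any wiki engine would have its own way to store and render this):

# hypothetical names, just to show the shape of the feature
obsolete_pages = {}  # page name -> replacement URL (or None)

def mark_obsolete(page, replacement_url=None):
    obsolete_pages[page] = replacement_url

def render_banner(page):
    # called at the top of page rendering; links to the page would also
    # check this mapping to style themselves as "obsolete"
    if page in obsolete_pages:
        url = obsolete_pages[page]
        banner = 'This page is obsolete.'
        if url:
            banner += ' See: %s' % url
        return banner
    return ''
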

Filed February 24th, 2009 under Uncategorized

It has taken a long time since I started on trac work to get to the point where I know what my goals are for trac at TOPP and am able to articulate them.  I’ve decided to blog this at TOPP-engineering…if I put it in the wrong forum and this offends you, feel free to yell at me in private.

Firstly, to recap the question that seems to be asked every couple of weeks, why trac?  I’m not going to go into too much detail, but trac is nice because it is

 * mature

 * python

 * malleable

 * fast to develop in

 * open source

 * backed by a strong community

Trac out of the box is not an amazing issue tracker.  But with a little love, trac can be not only an amazing issue tracker but a huge part of the technical solution to software development communication (and beyond!).

Moving beyond advertisement (I really have no incentive to push for trac, only to figure out what sort of place trac development could have in our process), I have a few ideas that could synergistically improve both TOPP (I’m thinking only of TOPP labs concretely, though I think the benefits are more widely reaching) and trac as both software and community:

 * plugin maintenance/improvement:  TOPP (by which I mean, mostly me)
   has developed several Trac plugins that are widely used by both
   ourselves and the Trac community.  While the scope of a trac plugin
   is more confined than for a larger piece of software, most of these
   plugins need improvements to make them better both for TOPP and for
   the community at large.  Real time needs to be scheduled for this.
   This doesn’t have to be a primary goal, but I think not doing this,
   or doing it in a half-hearted way, is equivalent to abandoning
   these plugins and abandoning our stake in the trac community.

 * isolate TOPP trac needs and answer them:  I’m including
   projects.opengeo.org in this list assuming that infrastructure is
   available for TOPP labs personnel to take opengeo work.  So far most
   of what I have done for our trac instances has been done only well
   after the problem has become a blocking issue for several key
   people.  This hasn’t been done out of neglect or malice, but
   because there has been no meaningful dialog on what we want our
   infrastructure (pointed here to trac) to be like, and I refuse to
   guess.  This problem needs to be surmounted.  Firstly, what we want
   out of trac needs to be clearly defined in such a way that it can
   be easily prioritized, scheduled, and clarified.  Once this is
   done, we can start making our trac projects not just better, but
   awesome.  I think this can make a huge difference.

 * improve trac: trac is pretty awesome, but it does have some serious
   deficiencies.  Likewise, there are features that could be added to
   make it much smoother, in accordance with the idea of what trac is.
   This could be anything from new plugins, to things like TracLegos
   and supporting software, to taking on branch work in the effort to
   improve trac core.  As always, what is done here should be
   moderated by what our direct needs are and what place we want to
   take in the trac community.

These three things are conceptually the same in the sense that they all yield improvements along both axes: improving TOPP’s process (on the technical side and the policy side) and improving Trac and our ties to that community.  They are separate in the sense that, in order for any of them to happen, time has to be allocated to each goal.  There ain’t no free lunch.

All of this also ties in to DevCenter work.  Questions about the DevCenter have usually been framed in such a light as, “Why does TOPP need a DevCenter?” as if the DevCenter is a new piece of software.  In a sense this is true.  But ultimately, the DevCenter is a blanket project that ties together (among other things) our Trac needs (as outlined above) in a coherent way that enables us to use and mold process more efficiently.  I won’t go into technical details here, but to me the DevCenter has always been more about figuring out what process we want and implementing it in a coherent and configurable way.  Why write new software when you don’t have to?

It seems from this text that I am heavily invested in Trac.  While I believe that Trac is the right choice for TOPP, the scope of my investment is defined by what priorities are handed to me, so in that sense I am invested very little in Trac.  We have many Trac instances and projects.  We have much expertise in Trac.  Trac is open source software with a strong community; I’m not sure how important that is to TOPP’s objectives, but it’s a huge plus for me both technically and idealistically.  The prioritization that I would wish for is an organic growth out of the investments that have already been made (wisely, IMHO) now that we are getting returns on them.  In my observation, many of our process and infrastructure decisions have been made reactively.  We could do nothing, and devote time to maintaining the infrastructure we have now (this time loss is unavoidable, though because it’s unscheduled it somehow seems more forgivable).  Or we could devote resources (e.g. my time) to actively assessing and tackling these needs.  That isn’t my decision to make.  But I do hope that we can move beyond the point of discussing process and infrastructure to actually producing it.

Filed January 15th, 2009 under Uncategorized

I’m much better at writing for the public than writing for TOPP.  What’s up with that?

The question mark is genuine… really: what’s up with that?

Filed January 15th, 2009 under Uncategorized

I did a bit more work on the Python optimizer last week. This time, the problem was tuple assignment. Consider the code:

a, (b, c) = d, e = 1, (2, 3)

This would get translated to:

LOAD_CONST 1
LOAD_CONST (2, 3)
BUILD_TUPLE 2 # builds the tuple (1, (2, 3))
DUP_TOP       # duplicates the top item on the stack, since
              # it's going to be assigned to two targets
UNPACK_SEQUENCE 2 #now, push each element of the tuple we just
                  #built onto the stack
STORE_FAST a
UNPACK_SEQUENCE 2 #unpack (2, 3)
STORE_FAST b
STORE_FAST c
UNPACK_SEQUENCE 2 #unpack the duplicate
STORE_FAST d
STORE_FAST e

This is stupid, because it does a bunch of packing and unpacking of tuples. Tom Lee’s patch improves the situation by recognizing (1, (2, 3)) as a constant, and storing it in the constants table so that it does not need to be created on the fly. In other words, it replaces the first three operations of the above with:

LOAD_CONST (1, (2, 3))

This is much faster, but it’s still not optimally fast. The above Python code is equivalent to:

a = 1
d = 1
b, c = e = 2, 3

This produces the following bytecode:

LOAD_CONST 1
STORE_FAST a
LOAD_CONST 1
STORE_FAST d
LOAD_CONST (2, 3)
DUP_TOP
UNPACK_SEQUENCE 2
STORE_FAST b
STORE_FAST c
STORE_FAST e


My latest patch to the optimizer does this conversion. At a slight space cost, the sequence could be further reduced by replacing the DUP_TOP UNPACK_SEQUENCE with a LOAD_CONST for each element. I decided not to do this, because not everyone will want to make time-space tradeoffs, and I’m not 100% sure it would be faster.

 

Filed November 18th, 2008 under Uncategorized

I spent some time over the past few weeks looking into the internals of the Python compiler and bytecode interpreter.

First, some general impressions. The code is very clean C code. It actually uses Python objects internally for things that are annoying to safely represent in C, such as strings, vectors (Python lists), and hash tables (Python dicts). This means that readers can piggyback on their existing understanding of Python. The bytecode interpreter is also straightforward and readable.

The Python virtual machine is stack-based. Python bytecode is simply a list of operations with either zero or one sixteen-bit operand, depending on the operation. For example, a common section of code is:

...
LOAD_FAST 0
GET_ITER
FOR_ITER
STORE_FAST 1
...

LOAD_FAST and STORE_FAST give access to local variables. Every function has an array of locals, and the operand of LOAD_FAST/STORE_FAST is an index into that array. LOAD_FAST puts the value of the local onto the stack, and STORE_FAST pops a value off the stack and puts it into the local. GET_ITER gets an iterator for an iterable — it replaces the top of the stack with the iterator. FOR_ITER pushes the next value from the iterator on top of the stack on to the stack. You’re expected to pop it — the next operation is nearly always STORE_FAST. In fact, the Python bytecode interpreter has a predictor which quickly checks if the next operation is STORE_FAST, and skips some tests if so. So that snippet of code is the preamble of a for loop. Local variable 0 is the iterable, and local variable 1 is the iteration variable.

If you want to see the bytecode for a function, the dis module will print it. Unfortunately, that’s all it will do — it won’t give it to you in a manipulable form. If you want to write your own functions in Python bytecode, you need peak.util.assembler, which does not in any way interoperate with dis. To check how Python interprets a given opcode, check in the Python source code, Python/ceval.c.

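For example, you can see the loop preamble described above by disassembling a trivial function (the exact listing varies a bit between CPython versions):

import dis

def f(somelist):
    # somelist is local 0, x is local 1
    for x in somelist:
        pass

dis.dis(f)
# prints (roughly): SETUP_LOOP, LOAD_FAST 0 (somelist), GET_ITER,
# FOR_ITER, STORE_FAST 1 (x), JUMP_ABSOLUTE, POP_BLOCK, ...
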
The python compiler works in a fairly standard fashion — the source code is tokenized, the tokens become a concrete syntax tree, the CST becomes an abstract syntax tree, and the AST is compiled to bytecode. I started looking at the peephole optimizer, which operates on compiled bytecode (that is, an array of chars). I implemented a minor optimization for the case:

r = ...
return r

The bytecode looked like:

STORE_FAST n
LOAD_FAST n
RETURN_VALUE

My patch replaced it with just:

RETURN_VALUE

My patch was rejected, because apparently the additional six lines of code didn’t buy enough of a speed improvement for an uncommon case. In retrospect, however, it also didn’t work for all bytecode (although it may have worked for all bytecode produced by the Python compiler). Imagine if there were a jump to the LOAD_FAST from somewhere else in the code — for example, in the code

if x:
  r = 1
else:
  r = 2
return r

The last seven bytes of the bytecode would still initially look like:

...
STORE_FAST n
LOAD_FAST n
RETURN_VALUE

But if the store and load were removed, then the jump would lead directly to the return, which would result in a stack underflow.

This neatly illustrates why doing anything on the bytecode level is a huge hassle. I went through a number of schemes to try to eliminate unnecessary store/load pairs at the bytecode level, but trying to figure out variable lifespans in the presence of forward and backwards jumps is not easy.

I decided instead to take a suggestion made when I submitted my (broken) patch, and work at the AST level. Thomas Lee has been working on an AST-level optimizer. He wrote a set of functions that walk the AST and perform optimizations at each level. I also changed my focus, because I was a bit bored of thinking about local variables, and I wanted something easy for my first look at high-level C code in a long time.

Python bytecode includes at least one operation which is not directly accessible from Python code: LIST_APPEND, which pops a value and a list off the stack and appends the value to the list. This is used in generating code for list comprehensions:

[x for x in somelist]

generates the following code for the body of the loop:

...
LOAD_FAST 1 # a local is generated to hold the new list
LOAD_FAST 2 #the iteration variable, x
LIST_APPEND
...

Whereas if you were to try to write the comprehension with a for loop, you would get:

LOAD_FAST 1
LOAD_ATTR 0 # an index into the names table: "append"
LOAD_FAST 2
CALL_FUNCTION 1 #remove x and the append function from the top of the stack and push the result of the call
POP_TOP

This is much slower. In part, it’s the cost of doing the attribute lookup each time, which is why it’s common to see code like

a = []
ap = a.append
for x in .....:
  ap(x + ...)
return a

But the function call is naturally slower than the simpler LIST_APPEND operation. My new patch tries to determine statically whether a local in all circumstances holds a list, and if so, generates LIST_APPEND whenever user code would call local.append. It’s not perfect — in particular, I could track local types on a more fine-grained level than per-function. But it’s a start. I submitted it today.

Next, I think I’ll consider local elimination again — but at the AST level.

Filed November 3rd, 2008 under Uncategorized

The other day I saw that Rollie had tagged http://www.kottke.org/08/09/tech-conference-panels-suck for inclusion in the PlanetDev feed.

I found that seeing it there and reading it really upset me.  Jackie saw me get upset and helped me realize that there were actually some broadly useful points to share about why it did and what this means in a greater context, so at her suggestion I’m going to try to unpack them here.

The referenced post — and the post that *it* references — is basically contentless.  Some popular blogger is linking to someone’s writeup of their personal impressions of a single panel at a tech conference.  He’s not adding any particular commentary of his own so presumably it’s an approving reference.  So for the essential content we go to the  original reference itself.  That post is equally contentless: as Kottke correctly observes with his summary headline, the post is meant to put forth a general theory about technical conferences and the “geeks” who organize and attend them  via a proof by anecdote.  It says: “I attended this one event, I felt this way about it, therefore I can offer general observations on a whole class of events and people.”  Well, we all know how rigorous that line of argument is.

Just for fun, though, let’s take it to its logical conclusion.  What, actually, is being put forth here?  That panels at tech conferences suck — because some guy on the internet went to one conference panel with an absurdly broad topic and didn’t get anything out of it except a lousy blog post — and so, I suppose, we should conclude that we shouldn’t be spending our time, energy and money on tech conferences?  Is this really a conversation worth having?  Of course there are good conferences and less valuable conferences in every field and discipline, from geology to nursing to quilting.  And at each conference, the individual sessions and panels will have a whole range of quality and outcomes.  To draw any conclusions about the broad category of tech conferences from a single failed panel at a single event is ignoring this reality and generalizing to what should be an obvious point of absurdity.  Come on.

So, essentially, it feels like this tagged item simply doesn’t contribute anything to a conversation; it’s a dead-end post with no information to impart and no worthwhile lessons to be drawn.  Its appearance on PlanetDev is an invasion of our communal space, and we all individually waste our time discovering its lack of value.

The world is full of those little distractions, though, and while I don’t like them, I don’t usually get too emotional when I see a Google Sponsored Link.  But of course this post’s content is not just unproductive; it’s unabashedly, gleefully insulting, playing to offensive stereotypes.  Har, har, tech “geeks” have no social skills, have no grounding in reality, get excited about techno-fantastical topics, aren’t good at explaining themselves.  Oh, and let’s mock their various supposed developmental disorders and drug indulgences!

This type of content sort of bothers me personally, but it points to something broader than just that.  While I’m sure this was totally unintentional, by putting this on PlanetDev Rollie effectively just pushed this type of stereotyping and mockery out into our group.  And, honestly, I really hope that we’re better than that, that we can create a culture of respect and collaboration here at TOPP, a safe space where no one will be mocked by a coworker for his interests or through implied association with a stereotype.  I can’t imagine anyone here is interested in descending into petty warring tribes based on our job descriptions.  So, please, let’s not get into the business of trading insults between designers and engineers, or any other “subgroups” at TOPP; let’s be respectful of each other and of everyone’s individual skills and interests, let’s work together without rivalry, and let’s respect our public spaces.

Filed September 30th, 2008 under Uncategorized