New York, August 28, 2009 - New York City’s voters approved a city charter amendment in 1988 that required publishing a Public Data Directory detailing the city’s “computerized information.”

In 1993 the first, and so far only, edition of the Public Data Directory was published with details of 300 or so databases. In recent weeks we’ve been working with the NYC Open Government Coalition to help make a digital version of the paper Directory available. The thinking is that many of the databases still exist and that this will be a starting point for a more robust detailing of current city databases.

Most of the technology for the conversion was developed by the Transparency Corps, a project of the Sunlight Foundation, and by New York City’s civil society catalyst, The Open Planning Project.

The conversion is a multi-step process. First, the Directory’s 156 pages were scanned into digital images. Next, an optical character recognition (OCR) program read these digitized “pages” and converted the images into computer-readable characters. We’re now on the third step, which requires two human inputs: 1) copy the OCR text and paste it into the appropriate data fields, and 2) because OCR delivers only 99% accuracy (for example, it has trouble telling a g from a q), compare the pasted text with the printed Directory and make any necessary corrections. The correct reading is not always obvious, so each page is served and interpreted several times, with a Levenshtein (edit-distance) algorithm deciding on the correct version. The Transparency Corps has added a modicum of pleasure by incorporating a game-like scoring feature.
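
For readers curious how a Levenshtein comparison might settle on one version out of several transcriptions, here is a minimal Python sketch. It is not the Transparency Corps code. It assumes the winner is simply the submission with the smallest total edit distance to all the others, and the function names and sample strings are made up for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn one string into the other."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def pick_consensus(submissions: Sequence[str]) -> str:
    """Assumed rule: return the submission closest, in total edit
    distance, to every other submission for the same field."""
    if not submissions:
        raise ValueError("need at least one submission")
    return min(
        submissions,
        key=lambda s: sum(levenshtein(s, other) for other in submissions),
    )


# Hypothetical transcriptions of one field, one with the g/q confusion.
submissions = [
    "Housing Preservation",
    "Housinq Preservation",
    "Housing Preservation",
]
print(pick_consensus(submissions))  # prints: Housing Preservation
```

Under this rule a single stray reading, such as a q typed for a g, is outvoted by the versions that agree with each other, which is one reason serving each page several times helps.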

Visit the site and help with some of the conversions; each takes about 5 minutes. Because it asks for only a small commitment of time, this is an excellent example of an appropriate wiki task. 25% of the tasks were completed as of September 7, and 91% as of November 1.

