
last modified November 2, 2008 by russf

Overview

Plone and Zope find diverse applications, many of them related to document management and presentation.

There are significant opportunities to use existing, well-understood machine learning and text classification techniques to categorize and group documents and to improve the quality of document metadata within Plone.

 

Goals

Add a simple framework for defining classifiers that help generate metadata and group content. The underpinnings should eventually be useful for Zope3 as well as Plone. We would like our first customer to be the Plone Help Center on Plone.org, to facilitate selection of optimal keywords, sections, etc.

 

Planning

The current plan is a result of the classifier sprint at the Arlington Career Centre, near Washington DC, in October 2008.

  • complete the recipe for building svm on OSX etc. (depends on some changes to SWIG invocation??)
  • complete the clarity.classifier package
    • improve the preprocessor (lower case, remove singletons, remove high frequency words) (committed in revision 74887)
    • add tests (committed in revision 74887)
    • improve classifier parameters for simple tests, and provide a corpus for learning.
  • add a Plone integration package that uses the classifier and recommends keywords and other metadata based on existing content, with a goal of more consistently tagging articles in the Plone Help Center.
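
The preprocessing steps listed above (lower case, remove singletons, remove high frequency words) could look roughly like the following. This is only a sketch for discussion, not the code committed to the package: the function name `preprocess`, the `max_doc_fraction` threshold, and the simple regular-expression tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def preprocess(documents, max_doc_fraction=0.5):
    """Lowercase and tokenize each document, then drop singleton terms
    and terms that appear in too large a share of the documents.

    `max_doc_fraction` is a hypothetical tuning knob: a term found in
    more than this fraction of documents counts as "high frequency".
    """
    # Lower-case first, then split into alphanumeric tokens.
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in documents]

    # Total occurrences of each term across the whole corpus.
    term_counts = Counter(t for tokens in tokenized for t in tokens)
    # Number of documents in which each term appears at least once.
    doc_counts = Counter(t for tokens in tokenized for t in set(tokens))
    n_docs = len(tokenized)

    def keep(term):
        if term_counts[term] < 2:
            return False  # singleton: seen only once in the corpus
        if doc_counts[term] / n_docs > max_doc_fraction:
            return False  # high frequency: present in too many documents
        return True

    return [[t for t in tokens if keep(t)] for tokens in tokenized]


docs = [
    "How to install Plone on OSX",
    "Install the Plone buildout",
    "Plone theming how-to",
    "Writing a Zope buildout recipe",
]
cleaned = preprocess(docs)
for tokens in cleaned:
    print(tokens)
```

In this toy corpus "plone" is dropped because it appears in three of the four documents, and words such as "osx" and "zope" are dropped because they occur only once. On a real corpus such as the Help Center the threshold would need tuning, and a stop-word list may work better than a pure frequency cut-off.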

     

Links

SVM - the technology for the first classifier implementation

 

Developers

Getting started:

svn co https://svn.plone.org/svn/collective/clarity.classify
cd clarity.classify
python bootstrap.py
bin/buildout
bin/instance test -s clarity.classify