The problem

JavaScript usually applies behavior to elements on page load. Typically there’s a list or some sort of registry of behaviors, which is iterated through at document ready time and applied then. Elements that get created by JavaScript later on don’t get those behaviors automatically: the registry is only processed at document ready time, and new elements don’t get processed magically.

Opencore’s solution

We’ve handled this in the past by explicitly calling a generic function that would apply the behaviors to elements. So when a new element was added, we would follow up the addition with an explicit “breatheLife” call. This worked, but we had to remember to call the breatheLife function every time we added elements that we wanted particular behaviors attached to.

jQuery’s solution

jQuery has a solution for this problem for a particular subset of behaviors: live events. The solution, however, is a little different from ours. Instead of applying behaviors to new elements as they are attached to the DOM, jQuery catches the event as it bubbles up and then figures out whether it’s coming from an element that has special behavior. This technique is called event delegation. Currently not all event types are supported, but a common set is.

Other applications

Event delegation is also useful when a lot of elements need behavior applied to them. For example, if we had a huge table of links, or a large number of list items, we could add a handler to the table or unordered list and figure out the target of the event from there. Instead of adding an event handler to each element, we can get away with adding a single one that handles all of them.
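jQuery itself is JavaScript, but the bookkeeping behind delegation can be modeled in a few lines of any language. What follows is a toy Python sketch, not jQuery’s implementation: Node, on and dispatch are made-up names, and matching is done on the clicked element’s tag name only, where a real library would match CSS selectors.

```python
class Node:
    """A toy stand-in for a DOM element (hypothetical, not a real DOM API)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.handlers = []  # (tag_to_match, callback) pairs registered here
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def on(self, tag, callback):
        """Register one delegated handler for descendants with this tag."""
        self.handlers.append((tag, callback))


def dispatch(target):
    """Simulate a click on `target` bubbling up through its ancestors."""
    node = target
    while node is not None:
        for tag, callback in node.handlers:
            if target.tag == tag:
                callback(target)
        node = node.parent


clicked = []
table = Node("table")
table.on("a", clicked.append)  # a single handler on the container

first = Node("a", parent=Node("td", parent=table))
# This link is created after the handler was registered. It is still
# covered, because the lookup happens when the event bubbles up.
late = Node("a", parent=Node("td", parent=table))

dispatch(first)
dispatch(late)
dispatch(late.parent)  # a click on the td itself matches nothing
```

After the three simulated clicks, clicked holds the two links and nothing else. No per-element setup ran for either link, which is why this approach needs no “breatheLife” call.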

Filed June 26th, 2009 under best-practices

Almanac UI

We’re nearing the end of another community almanac development cycle, in which we changed the UI quite a bit. Here are some of my thoughts on it.

The operation that we wanted to optimize for was contributing a page to an almanac. We used to call them “stories”, but we now stress the book/page metaphor a little more. We also tried to make sure that adding a simple page was very easy, while still providing flexibility for the user.

Previously, we had taken a wizard-like approach for this case. A user was presented with a series of questions, taking them through the process of adding their story. This forced the user to go through the process of adding a story in a very strict way, and presented more questions to the user than was usually necessary, at least for the simple cases.

A different approach

This go-around, we took an approach that feels a bit like Basecamp. Instead of providing a wizard, or a series of questions that the user should answer, we present the user with a set of tools that control the type of content being contributed. This allows users to add the type of data that they are interested in, without forcing the concept of “metadata” on them.

An advantage of this model is that it offers more flexibility for creating pages. Users can add the types of data that they want, in a way that makes the page flow more naturally. They can now have several locations, descriptions, or none at all. It also feels a lot more like what the site should be all about: adding pages to a community’s almanac.

Under the hood

Implementation-wise, this means that we have to be able to generate lots of forms on demand through JavaScript. There are a couple of ways to manage this, but we decided to have the server generate these forms, with the client side issuing Ajax requests to send and receive all data. This keeps the JavaScript simple, with the server handling most of the logic.

This is similar to how we handled opencore’s javascript. The client sends a request to the server (usually getting the url from an anchor or the form that the object is in), and knows how to handle a couple of different responses in a generic fashion. Usually the variations are simply to take some html returned from the server and put it somewhere, either by replacing an element, or adding it somewhere else on the page.

There was an additional complication on the almanac side: we needed to apply some side effects for the content to be displayed properly. This is sort of like the “breathe life” problem, except that we needed it simply to display the content properly, as opposed to handling additional events. For example, when a map is returned, some JavaScript needs to run to display the map correctly, with the right features displayed on it. For audio playback, some flowplayer behavior needs to be applied before it can play in the browser through Flash. We also had to tackle adding behavior (so that the forms that got returned were submitted through Ajax), but we were able to use jQuery’s live events for a lot of it.

Always something

Since we’re loading a lot of JavaScript on page load as well, we ran into an issue where you could click on a link before it got its document.ready behavior applied to it. The user would then be presented with a download dialog box, asking to download the JSON return value. We worked around this by having empty onclick handlers in the markup itself, which prevented the dialog boxes. Ideally we would have non-JavaScript fallbacks everywhere, but we didn’t focus on that issue because a large portion of the site deals with map interactions, which aren’t really feasible without JavaScript. You could have a page load for each interaction, like panning or zooming, but drawing features on a map without JavaScript is tough.

Thoughts for the future

The similarities between the Ajax approaches of almanac and opencore got me wondering whether both are specific instances of a more general pattern. KSS comes to mind as a solution. What worries me about KSS, however, is that it seems focused on completely removing all JavaScript from the application. On the other hand, applying JS behavior to elements in the same way that style is applied to elements seems like a reasonable approach. KSS is a way of standardizing behaviors, in the same way that stylesheets standardize display. In fact, most of the JavaScript that I write involves selecting some elements and applying a simple behavior to them. Behavior will probably vary a bit more than style, but I definitely prefer having a solution for most cases to having no solution at all. In any case, I definitely think KSS is a cool concept.

Filed June 26th, 2009 under best-practices, programming, ui

The question has arisen on two different projects (one hypothetical, one already live) at TOPP in the past week:

Should we (continue to) use ZODB as the primary data store, or should we use … something else?

Well. I’ve been using ZODB in my daily work since at least 2001, so I’m pretty well familiar with it by now. By contrast, I’ve done a little bit of work with MySQL, a tiny bit with other relational systems (SQLite, SQL Server, and Oracle), none with PostgreSQL, and almost no work with any of the current crop of object-relational mappers … nor with any of the hip new kids on the block like CouchDB.

With that background, you’d expect me to be a fairly ignorant guy who always prefers ZODB just because he doesn’t know any better, right?

But there’s also the truism that familiarity breeds contempt. Consider yourself warned about the rest of this post ;-)

That said, I’ll start by listing some of the things I really do like about ZODB.

What’s Good about ZODB

  • Persistence is very transparent (to your application) … Getting started with ZODB is really, really easy. If your app is not performance-sensitive, you can get away with paying very little attention to storing your data; it mostly Just Works. And you hardly have to do a damn thing. Neat. (When performance is a problem, you have to start paying a little more attention - or maybe a lot.)
  • Schema, shmema. You just throw Python objects in there and they just stick. Neat.
  • Undo. When you can use it, this is fantastic to have built in.
  • Transactions that mostly just do what you want. Okay, this is really about the default transaction policy in Zope, not ZODB per se. But this is something that I really think Zope got right. (And thanks to repoze.tm, it’s something that any WSGI app can now do with any data store that supports transactions.)
  • Text Indexing … there are some nice full-text indexes available for ZODB. On the other hand, these work at the application layer - i.e. you have to do the work of updating the index yourself, it doesn’t happen magically when you save data. And they don’t play nice with Undo. I gather there are now some that you can use without the whole Zope 2 ball of wax, which is great.
  • Container hierarchies of arbitrary depth are trivial. This is really nice and something that’s easy for a ZODB user to take for granted. Doing the equivalent in an RDBMS is typically not so fun, I gather. (Haven’t had to do it yet myself, but some quick browsing suggests that it’s really pretty icky. I suspect though that traversing a ZODB object graph will have the same performance characteristics as the fairly simple “adjacency model” described in that article.)
  • Scaling is pretty transparent. For a while at least. ZEO is trivial to set up, and to the application, it looks no different than running against a local storage. Multiple mount points (analogous to mounting different physical storage at different directories on a filesystem) also help you scale transparently.
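For comparison with the container-hierarchy point above, here is a minimal sketch of the adjacency model in a relational store, using Python’s built-in sqlite3 module. The folder table, its columns and the sample rows are invented for illustration. Fetching a whole subtree uses a recursive common table expression, which SQLite supports; without one you would issue a query per level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folder ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES folder(id),"
    " name TEXT NOT NULL)"
)
# Each row only knows its direct parent: that is the adjacency model.
conn.executemany(
    "INSERT INTO folder (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "site"),
        (2, 1, "projects"),
        (3, 2, "almanac"),
        (4, 2, "opencore"),
        (5, 1, "people"),
    ],
)


def subtree_paths(conn, root_id):
    """Return the path of every folder at or below root_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, path) AS (
            SELECT id, name FROM folder WHERE id = ?
            UNION ALL
            SELECT folder.id, tree.path || '/' || folder.name
            FROM folder JOIN tree ON folder.parent_id = tree.id
        )
        SELECT path FROM tree ORDER BY path
        """,
        (root_id,),
    )
    return [path for (path,) in rows]


paths = subtree_paths(conn, 2)
```

In ZODB the equivalent is just attribute or item access on nested containers, which is the convenience the bullet above is describing.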

What’s Bad

  • Persistence is very opaque (to anything other than your application). Let’s unpack that a little:

    • No ad-hoc queries.

      This is more and more often the showstopper for me.

      With an RDBMS, or CouchDB for that matter, as long as the database server is running, you can poke around and see what’s there. With even a small bit of application knowledge, this can be enormously useful for troubleshooting, quickly repairing data problems, and some simple migrations, to say nothing of actual feature development. With ZODB, you have to know a lot more about the app just to look around and guess what you’re looking at.

    • You can’t even load the data without exactly the right application software installed.

      This is closely related to the previous point. It doesn’t bite you as often, but when it does, it is NO FUN.

      It’s not just that non-Python applications can’t talk to the database at all, ever. It’s that your database depends on your code too much.

      If you botch an install, or are trying to resurrect a really old one for some reason, sometimes you’ll make a mistake like having a slightly incorrect version of some dependency, such that some container class can’t be loaded. Since the ZODB is strictly a tree structure, there is no way to access any of the children of a broken container instance. Which could, if you are unlucky, translate to all your data.

      Think about that for a second: if you ever lose the ability to perfectly reconstruct your code stack, you also lose your data. Well, you could maybe try to parse something out of raw pickles, but that sure doesn’t sound fun!

      Of course, normally everything’s fine because you have the right software installed. But what if you’re doing forensic work and you don’t have enough information to know what that is? Or what if the build scripts that used to work perfectly no longer work just because some third-party upstream release is no longer compatible?

      Here’s a little story. I once did a quick job for a non-technical nonprofit that was in a bind. Their initial email went something like: “Hi, we hired a contractor to build our Plone site on a shoestring, and now he’s gone, and our production server crashed, and all we have is a .zexp export of the site, and a zipfile of the code but we’re not sure if it’s the same as the production version. Can you help us get Plone started or at least get the documents and images out so we can throw up some kind of temporary static site?”

      I gave it a go for a couple days, but I was thoroughly defeated. I felt so bad I only charged them for a couple hours and felt guilty for even doing that. I never want to put an employer or client in that position again, ever.

  • No non-container relations. Expressing something like a many-to-many relation in ZODB means writing the code yourself, or installing something like http://pypi.python.org/pypi/zc.relation … which presumably works fine, but I really can’t get excited about it: I’d rather spend my time learning relational technology that might actually be portable to other systems.
  • The ZEO server is still a single point of failure and a potential bottleneck, and there is no live replication. There’s no free workaround. There is an expensive solution from Zope Corp. RelStorage could theoretically solve this problem for free by deferring to the underlying RDBMS replication, but it’s apparently not been tested.

    It’s important to note that I have never actually run a site where the ZEO server was the bottleneck, but the sites I’ve worked on have relatively small user bases, and the largest-scale ZEO cluster I’ve heard of was a news site: very read-heavy with a relatively small user base doing relatively few writes. I’ve never heard of anybody doing a large site with lots of writes from lots of users. If your goal is to build the next Facebook or Wikipedia, I don’t think there are any relevantly large real-world ZODB case studies you can emulate.

    Most of us just make do without replication or failover of any kind, and hope we will never really need it.

  • No fine-grained control over which (and how much) data you retrieve.

    In any SQL database, you can trivially do “select foo from bar” and get only the values in the foo column, regardless of what other gunk is in each row. In ZODB, you get a whole object - think of this as the equivalent of every query starting with “select *”, so you always get the entire row(s). Results tend to be fat and you have no ad-hoc control over that, short of reorganizing the database. Which leads me to…
  • Migrations are inconvenient and expensive. Migrations with ZODB typically take the form of a script containing two functions: one which updates a single instance of a particular persistent class, and another function which finds all the instances to upgrade. The latter is the non-trivial part, because there’s actually no way to find all objects of a given type short of walking the entire object tree. If you’re using Zope 2, you may have a ZCatalog handy that you can use, if it knows about all the objects you want to upgrade; or you can use the old ZopeFind API which is just a convenient (and no less expensive) way to walk the entire tree.

    And you can’t really do a migration atomically on a live site, because you’re sure to get ConflictErrors if you try to do it in a single transaction. You can solve this by taking the site down for the duration of the migration. If that’s not an option, you have to try committing and starting a new transaction after every N objects touched, which practically speaking means you’re not going to want to undo your migration. AFAIK there is no existing infrastructure for the latter approach, which means you have to rewrite it in every migration script you ever write.

  • Often, you can’t actually use undo. If a transaction touches a frequently-updated object (like oh, say, the catalog indexes), you probably won’t be able to undo that transaction for very long, because other transactions will have since touched the same object, so undoing it would cause a conflict. A transaction is not a database-wide savepoint, like a revision in Subversion; rather, a transaction only knows about the objects that were changed during its lifetime. There’s no way to revert to an arbitrary point in the past.
  • Indexing is not transparent. I very often see code in Zope applications to ensure that some index is properly updated after some value changes. It gets tiresome. By contrast, indexes in an RDB typically require no attention from the developer… but they don’t serve the same purpose.
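The point above about needing exactly the right code installed comes from how pickles work: ZODB stores object state as Python pickles, and a pickle records a class by module path and name rather than carrying the class itself. Here is a small standard-library sketch of that failure mode, with no ZODB involved. The module name almanac_app_demo and the Folder class are made up, and deleting the module from sys.modules stands in for a server where the package was never installed.

```python
import pickle
import sys
import types

# Fake an installed application package (hypothetical name).
app = types.ModuleType("almanac_app_demo")


class Folder:
    def __init__(self, title):
        self.title = title


# Make the class look as if it were defined inside that package.
Folder.__module__ = "almanac_app_demo"
Folder.__qualname__ = "Folder"
app.Folder = Folder
sys.modules["almanac_app_demo"] = app

data = pickle.dumps(Folder("Annual report"))

# With the code importable, loading works.
restored = pickle.loads(data)

# Simulate a machine where the package is missing.
del sys.modules["almanac_app_demo"]
error = None
try:
    pickle.loads(data)
except ImportError as exc:
    # The bytes are intact, but the class they name cannot be found.
    error = exc
```

The stored bytes contain the module name, not the class definition, so the data is only as recoverable as the code stack that wrote it.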

What’s Debatable

This section could grow endlessly, but I’ll just list a couple items off the top of my head:

Speed

For years, the accepted wisdom was that ZODB was pretty fast for reads, and slow for writes. Some people claim that it’s actually fast for writes too. I don’t care much about raw benchmarks except insofar as they translate to real applications. The ZODB application I actually get to use the most - Opencore, built on Plone 3 - feels quite slow at the storage layer (some of this is catalog stress, and some of it is due to storing binary files in CMFEditions, which we now know was a terrible mistake).

I have no hard numbers to offer, hence putting this in the “Debatable” category.

Partitioning

Object traversal in Zope encourages you to map your ZODB tree directly to your URL space. I actually quite like this as it’s really easy to understand. But it makes it harder to reorganize your data for scalability reasons (eg. horizontal partitioning aka sharding) without also reorganizing the URL space and breaking links.

This is another one of those problems that’s mostly theoretical to me so far - I haven’t actually needed to do sharding on a ZODB app yet. And most people never will, but if you’re building something very ambitious, it’s something to be aware of.
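The coupling between tree layout and URL space described above can be sketched with plain nested dicts standing in for persistent containers. This is an illustration only, not Zope’s traversal machinery; traverse, site and the shard-1 name are invented for the example.

```python
def traverse(root, path):
    """Resolve a URL path by walking nested containers, one segment per level."""
    obj = root
    for segment in path.strip("/").split("/"):
        if segment:
            obj = obj[segment]  # KeyError if the URL has no matching object
    return obj


site = {
    "projects": {
        "almanac": {"title": "Community Almanac"},
        "opencore": {"title": "Opencore"},
    },
}

page = traverse(site, "/projects/almanac")

# Reorganizing storage, for example moving almanac into its own shard,
# changes the tree and therefore the URL that reaches the same object.
resharded = {"shard-1": {"almanac": site["projects"]["almanac"]}}
moved = traverse(resharded, "/shard-1/almanac")
```

The object is the same after the move, but /projects/almanac no longer resolves against the reorganized tree, which is the broken-link problem in miniature.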

As noted above, the ZODB can do one kind of partitioning by “mounting” databases in the object graph, like filesystems mounted in one Unix file tree; this is great and easy, but it’s only transparent if the mount point can replace an existing folder; it doesn’t help you with flat but dense data. Also, mounting multiple storages can be problematic when objects under one mount point refer to objects outside that mount point; see eg. the notes at the bottom of http://apidoc.zope.org/++apidoc++/Book/zodb/crossref/show.html

This is another case where I wonder if creative use of RelStorage might help, although I’ve no idea how you’d know where to split the partitions.

Optimizing is Weird

When addressing bottlenecks in an app written against a given RDBMS, there are typically pretty decent docs available that help even a novice get started with tuning their queries and setting up the proper indexes. If I google “mysql query optimization”, I find a lot of useful results on the first page. With ZODB, there are some general strategies that typically are learned the hard way. Good luck googling for docs or tips. One of the few things I found was from a presentation (PDF) that Chris McDonough made about ZODB: “The most important optimization you can perform is to write efficient code. Unfortunately, this is also the hardest way to optimize, because you need to manage all the details.”

Finally: You Can’t Take It With You

You may have noticed a theme running through some of the above.

I’m tired of feeling like I’m in a programming ghetto, and frankly ZODB feels rather marginalized. Not because of any of the things that I think are wrong with it, but just because almost nobody uses it. This has a lot of implications - lost opportunities for re-using innovative work and so forth. This is the danger that Mark Ramm recently warned the Django world about (video link, sorry; there’s no transcript anywhere AFAICT).

But more personally, I just don’t feel that I’m young enough to waste much more of my career on dead-end tech. Python is a plenty big pond for me to swim in; but most of the fish in that pond wouldn’t touch ZODB with a ten-foot pole. That’s a shame, maybe, but perception is reality, and if this was going to change, it would have changed by now. It’s been over 10 years now.

Given that, the time that I spend using ZODB could be better spent learning skills that I can more realistically apply on more future projects. Maybe even (gasp) non-Python projects.

It may seem selfish to harp on how this affects an individual’s career, but sometimes the events in your life push you in that direction. Having ZODB and Zope on my resumé gives me a certain amount of hireability right now, I think because I’m a relatively experienced fish in this tiny (and, AFAICT, not growing) pond where demand (for now) seems to slightly outstrip supply. Will ZODB be even that relevant in, say, 10 years?

Let’s ask:

Outlook not so good.

Okay, I’m putting my flame suit on now :)

(For a more generally enthusiastic take on ZODB, you should have a look at Chris McDonough’s blog post on ZODB compared to CouchDB.)

Filed March 20th, 2009 under best-practices, programming, plone

For all the web designers out there in the TOPPosphere…

From a designer’s POV, the most important files are templates, CSS, and javascript.  They are the three pillars of web design.  The sun rises and sets with them.  They are best friends.  So, ideally, they should be as close together as possible.

For example: here’s how files are laid out in a typical Pylons application:

/path/to/application/:

  public/
    images/
    javascripts/
    stylesheets/
  templates/
    index.mako
    specific-group-of-templates/
    etc.

This is a really nice layout: the files are close together, and there’s never any question where something is or where it should go.  New images go in public/images, new javascript goes in public/javascripts, etc.  This is the same from one application to another.  Ruby on Rails follows a similar pattern.

By contrast, here’s the file layout in the Almanac project:

/path/to/application/:

  parts/
    client/
      resource/
        lib/
          Almanac.js
        theme/
          almanac.css
          img/

  src/
    CommunityAlmanac/
      opengeo/
        almanac/
          templates/
            mytemplate.pt

When I first saw this I was like AAAAH!  Not to pick on the Almanac, though; I love the Almanac.  Really, I do.  But it can be frustrating to work with.  Not only are the files in far-flung branches of the filesystem, but they’re in separate checkouts.  This means lots of time spent switching back and forth between areas of the site (I ended up using two separate Coda profiles to work on these files), and two commits for nearly every change.

Could this have been avoided?  I’m not sure.  If we keep laying out apps this way, will the sky fall?  Probably not.  But if it’s possible, we should consider the plight of the poor, defenseless designer when laying out our apps in the future.

Filed November 6th, 2008 under web-design, best-practices, programming