The question has arisen on two different projects (one hypothetical, one already live) at TOPP in the past week:
Should we (continue to) use ZODB as the primary data store, or should we use … something else?
Well. I’ve been using ZODB in my daily work since at least 2001, so I’m pretty well familiar with it by now. By contrast, I’ve done a little bit of work with MySQL, a tiny bit with other relational systems (SQLite, SQL Server, and Oracle), none with PostgreSQL, and almost no work with any of the current crop of object-relational mappers … nor with any of the hip new kids on the block like CouchDB.
With that background, you’d expect me to be a fairly ignorant guy who always prefers ZODB just because he doesn’t know any better, right?
But there’s also the truism that familiarity breeds contempt. Consider yourself warned about the rest of this post.
That said, I’ll start by listing some of the things I really do like about ZODB.
What’s Good about ZODB
- Persistence is very transparent (to your application) … Getting started with ZODB is really, really easy. If your app is not performance-sensitive, you can get away with paying very little attention to how your data is stored; it mostly Just Works. And you hardly have to do a damn thing. Neat. (When performance is a problem, you have to start paying a little more attention - or maybe a lot.)
- Schema, shmema. You just throw Python objects in there and they just stick. Neat.
- Undo. When you can use it, this is fantastic to have built in.
- Transactions that mostly just do what you want. Okay, this is really about the default transaction policy in Zope, not ZODB per se. But this is something that I really think Zope got right. (And thanks to repoze.tm, it’s something that any WSGI app can now do with any data store that supports transactions.)
- Text Indexing … there are some nice full-text indexes available for ZODB. On the other hand, these work at the application layer - i.e. you have to do the work of updating the index yourself; it doesn’t happen magically when you save data. And they don’t play nice with Undo. I gather there are now some that you can use without the whole Zope 2 ball of wax, which is great.
- Container hierarchies of arbitrary depth are trivial. This is really nice and something that’s easy for a ZODB user to take for granted. Doing the equivalent in an RDBMS is typically not so fun, I gather. (Haven’t had to do it yet myself, but some quick browsing suggests that it’s really pretty icky. I suspect though that traversing a ZODB object graph will have the same performance characteristics as the fairly simple “adjacency model” described in that article.)
- Scaling is pretty transparent. For a while at least. ZEO is trivial to set up, and to the application, it looks no different than running against a local storage. Multiple mount points (analogous to mounting different physical storage at different directories on a filesystem) also help you scale transparently.
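For comparison with the container-hierarchy point above, here is a minimal sketch of the adjacency-list approach in an RDBMS, using Python’s built-in sqlite3 module. The table and column names (`node`, `parent_id`, `name`) and the sample rows are made up for illustration. Resolving a path costs one lookup per level, which is roughly the shape of walking containers one at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES node(id),"
    " name TEXT NOT NULL)"
)
# Each row only knows its direct parent: that is the whole "adjacency" model.
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "projects"),
        (3, 2, "opencore"),
        (4, 3, "wiki"),
    ],
)


def resolve(conn, path):
    """Walk a slash-separated path down from the root, one query per level."""
    node_id = conn.execute(
        "SELECT id FROM node WHERE parent_id IS NULL"
    ).fetchone()[0]
    for name in path.strip("/").split("/"):
        row = conn.execute(
            "SELECT id FROM node WHERE parent_id = ? AND name = ?",
            (node_id, name),
        ).fetchone()
        if row is None:
            raise KeyError(name)
        node_id = row[0]
    return node_id


wiki_id = resolve(conn, "/projects/opencore/wiki")
```

Anything fancier than a straight path lookup (say, fetching a whole subtree) needs more machinery than this, which is presumably where the ickiness comes in.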
What’s Bad
- Persistence is very opaque (to anything other than your application).
- No ad-hoc queries.
This is more and more often the showstopper for me.
With an RDBMS, or CouchDB for that matter, as long as the database server is running, you can poke around and see what’s there. With even a small bit of application knowledge, this can be enormously useful for troubleshooting, quickly repairing data problems, and some simple migrations, to say nothing of actual feature development. With ZODB, you have to know a lot more about the app just to look around and guess what you’re looking at.
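To make “poke around and see what’s there” concrete, here is a small sketch using Python’s built-in sqlite3 module. The `member` table and its rows are invented for the example; the point is only that every step after the setup works without any application code.

```python
import sqlite3

# Setup: stands in for a database some application created earlier.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE member (id INTEGER PRIMARY KEY, login TEXT, email TEXT);
    INSERT INTO member (login, email) VALUES ('alice', 'alice@example.org');
    INSERT INTO member (login, email) VALUES ('bob', NULL);
    """
)

# Step 1: ask the database itself which tables exist.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Step 2: look at the columns of a table that seems interesting.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(member)")]

# Step 3: run an ad-hoc query, e.g. to hunt for a data problem.
missing_email = [
    login
    for (login,) in conn.execute("SELECT login FROM member WHERE email IS NULL")
]
```

Other database servers have their own equivalents of steps 1 and 2, but the workflow is the same: discover the structure first, then query it.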
- You can’t even load the data without exactly the right application software installed.
This is closely related to the previous point. It doesn’t bite you as often, but when it does, it is NO FUN.
It’s not just that non-Python applications can’t talk to the database at all, ever. It’s that your database depends on your code too much.
If you botch an install, or are trying to resurrect a really old one for some reason, sometimes you’ll make a mistake like having a slightly incorrect version of some dependency, such that some container class can’t be loaded. Since you normally reach objects in the ZODB only by traversing down from the root, there is no easy way to get at any of the children of a broken container instance. Which could, if you are unlucky, translate to all your data.
Think about that for a second: if you ever lose the ability to perfectly reconstruct your code stack, you also lose your data. Well, you could maybe try to parse something out of raw pickles, but that sure doesn’t sound fun!
Of course, normally everything’s fine because you have the right software installed. But what if you’re doing forensic work and you don’t have enough information to know what that is? Or what if the build scripts that used to work perfectly no longer work just because some third-party upstream release is no longer compatible?
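Here is a rough illustration of that code dependency, using the standard library’s `pickle` as a stand-in rather than ZODB itself (ZODB’s own handling of missing classes differs in detail). The module name `hypothetical_legacy_app` and the `Folder` class are invented. A pickle records the class by module and name, so loading it requires that module to be importable.

```python
import pickle
import sys
import types

# Build a throwaway module to play the role of "the application code".
app = types.ModuleType("hypothetical_legacy_app")
exec(
    "class Folder:\n"
    "    def __init__(self, title):\n"
    "        self.title = title\n",
    app.__dict__,
)
sys.modules["hypothetical_legacy_app"] = app

# Store an object. The pickle refers to the class by module and class name.
blob = pickle.dumps(app.Folder("Annual report"))

# With the code installed, loading works.
title_with_code = pickle.loads(blob).title

# Simulate a machine where the application code is missing.
del sys.modules["hypothetical_legacy_app"]
load_error = None
try:
    pickle.loads(blob)
except ImportError as exc:
    load_error = exc

# The data is still in the raw bytes, but you have to dig it out yourself.
title_in_raw_bytes = b"Annual report" in blob
```

The last line is the “parse something out of raw pickles” option: possible for a simple string, much less fun for a whole site.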
Here’s a little story. I once did a quick job for a non-technical nonprofit that was in a bind. Their initial email went something like: “Hi, we hired a contractor to build our Plone site on a shoestring, and now he’s gone, and our production server crashed, and all we have is a .zexp export of the site, and a zipfile of the code but we’re not sure if it’s the same as the production version. Can you help us get Plone started or at least get the documents and images out so we can throw up some kind of temporary static site?”
I gave it a go for a couple days, but I was thoroughly defeated. I felt so bad I only charged them for a couple hours and felt guilty for even doing that. I never want to put an employer or client in that position again, ever.
- No non-container relations. Expressing something like a many-to-many relation in ZODB means writing the code yourself, or installing something like http://pypi.python.org/pypi/zc.relation … which presumably works fine, but I really can’t get excited about it: I’d rather spend my time learning relational technology that might actually be portable to other systems.
- The ZEO server is still a single point of failure and a potential bottleneck, and there is no live replication. There’s no free workaround. There is an expensive solution from Zope Corp. RelStorage could theoretically solve this problem for free by deferring to the underlying RDBMS replication, but it’s apparently not been tested.
It’s important to note that I have never actually run a site where the ZEO server was the bottleneck, but the sites I’ve worked on have relatively small user bases, and the largest-scale ZEO cluster I’ve heard of was a news site: very read-heavy with a relatively small user base doing relatively few writes. I’ve never heard of anybody doing a large site with lots of writes from lots of users. If your goal is to build the next Facebook or Wikipedia, I don’t think there are any relevantly large real-world ZODB case studies you can emulate.
Most of us just make do without replication or failover of any kind, and hope we will never really need it.
- No fine-grained control over which (and how much) data you retrieve.
In any SQL database, you can trivially do “select foo from bar” and get only the values in the foo column, regardless of what other gunk is in each row. In ZODB, you get a whole object - think of this as the equivalent of every query starting with “select *”, so you always get the entire row(s). Results tend to be fat and you have no ad-hoc control over that, short of reorganizing the database. Which leads me to…
- Migrations are inconvenient and expensive. Migrations with ZODB typically take the form of a script containing two functions: one which updates a single instance of a particular persistent class, and another function which finds all the instances to upgrade. The latter is the non-trivial part, because there’s actually no way to find all objects of a given type short of walking the entire object tree. If you’re using Zope 2, you may have a ZCatalog handy that you can use, if it knows about all the objects you want to upgrade; or you can use the old ZopeFind API which is just a convenient (and no less expensive) way to walk the entire tree.
And you can’t really do a migration atomically on a live site, because you’re sure to get ConflictErrors if you try to do it in a single transaction. You can solve this by taking the site down for the duration of the migration. If that’s not an option, you have to try committing and starting a new transaction after every N objects touched, which practically speaking means you’re not going to want to undo your migration. AFAIK there is no existing infrastructure for the latter approach, which means you have to rewrite it in every migration script you ever write.
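The two-function shape plus the commit-every-N idea can be sketched generically. This is not existing infrastructure, just an illustration: containers are modelled as plain dicts, `Page` and `upgrade_page` are hypothetical, and `commit` is any callable (in a ZODB app it would presumably be `transaction.commit`).

```python
def find_instances(root, cls):
    """Yield every instance of cls found by walking the whole tree.

    Containers are plain dicts here. Assumes the structure really is a tree;
    there is no protection against cycles.
    """
    stack = [root]
    while stack:
        obj = stack.pop()
        if isinstance(obj, cls):
            yield obj
        if isinstance(obj, dict):
            stack.extend(obj.values())


def migrate_in_batches(objects, upgrade, commit, batch_size=100):
    """Call upgrade(obj) on each object, committing every batch_size objects."""
    pending = 0
    total = 0
    for obj in objects:
        upgrade(obj)
        pending += 1
        total += 1
        if pending >= batch_size:
            commit()
            pending = 0
    if pending:
        commit()  # flush the final partial batch
    return total


class Page:
    """Hypothetical persistent class that needs upgrading."""

    def __init__(self, title):
        self.title = title
        self.schema_version = 1


def upgrade_page(page):
    page.schema_version = 2


site = {
    "a": Page("A"),
    "docs": {"b": Page("B"), "c": Page("C")},
    "misc": "not a page",
}
commits = []  # record each commit so the batching is visible
count = migrate_in_batches(
    find_instances(site, Page),
    upgrade_page,
    lambda: commits.append("commit"),
    batch_size=2,
)
```

Three pages with a batch size of two means two commits, and once the first batch is committed there is no single transaction left to undo.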
- Often, you can’t actually use undo. If a transaction touches a frequently-updated object (like oh, say, the catalog indexes), you probably won’t be able to undo that transaction for very long, because other transactions will have since touched the same object, so undoing it would cause a conflict. A transaction is not a database-wide savepoint, like a revision in Subversion; rather, a transaction only knows about the objects that were changed during its lifetime. There’s no way to revert to an arbitrary point in the past.
- Indexing is not transparent. I very often see code in Zope applications to ensure that some index is properly updated after some value changes. It gets tiresome. By contrast, indexes in an RDB typically require no attention from the developer… but they don’t serve the same purpose.
Let’s unpack that a little:
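As a rough illustration of the “not transparent” point, here is a toy comparison. The application-level index is just a Python dict (a stand-in, not the actual catalog API), and the RDBMS side uses the built-in sqlite3 module; the `document` table and its titles are invented.

```python
import sqlite3

# Application-level index: the app builds it and must keep it current.
documents = {1: {"title": "Budget"}, 2: {"title": "Minutes"}}
title_index = {doc["title"]: doc_id for doc_id, doc in documents.items()}

# Change the data but forget to reindex, and the index is now stale.
documents[1]["title"] = "Budget (revised)"
stale_hit = title_index.get("Budget")        # still points at document 1
missed = title_index.get("Budget (revised)")  # new value was never indexed

# RDBMS index: declared once, then maintained by the database on every write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX document_title ON document (title)")
conn.executemany(
    "INSERT INTO document (id, title) VALUES (?, ?)",
    [(1, "Budget"), (2, "Minutes")],
)
conn.execute("UPDATE document SET title = 'Budget (revised)' WHERE id = 1")
found = conn.execute(
    "SELECT id FROM document WHERE title = 'Budget (revised)'"
).fetchone()
```

Note that the SQL query would return the same row with no index at all, only more slowly on a big table. That may be part of what “they don’t serve the same purpose” means: in the RDBMS the index is an optimization, while in the dict version the index is the only way to find anything by title.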
What’s Debatable
This section could grow endlessly, but I’ll just list a couple items off the top of my head:
Speed
For years, the accepted wisdom was that ZODB was pretty fast for reads, and slow for writes. Some people claim that it’s actually fast for writes too. I don’t care much about raw benchmarks except insofar as they translate to real applications. The ZODB application I actually get to use the most - Opencore, built on Plone 3 - feels quite slow at the storage layer (some of this is catalog stress, and some of it is due to storing binary files in CMFEditions, which we now know was a terrible mistake).
I have no hard numbers to offer, hence putting this in the “Debatable” category.
Partitioning
Object traversal in Zope encourages you to map your ZODB tree directly to your URL space. I actually quite like this as it’s really easy to understand. But it makes it harder to reorganize your data for scalability reasons (e.g. horizontal partitioning, aka sharding) without also reorganizing the URL space and breaking links.
This is another one of those problems that’s mostly theoretical to me so far - I haven’t actually needed to do sharding on a ZODB app yet. And most people never will, but if you’re building something very ambitious, it’s something to be aware of.
As noted above, the ZODB can do one kind of partitioning by “mounting” databases in the object graph, like filesystems mounted in one Unix file tree. This is great and easy, but it’s only transparent if the mount point can replace an existing folder; it doesn’t help you with flat but dense data. Also, mounting multiple storages can be problematic when objects under one mount point refer to objects outside that mount point; see e.g. the notes at the bottom of http://apidoc.zope.org/++apidoc++/Book/zodb/crossref/show.html
This is another case where I wonder if creative use of RelStorage might help, although I’ve no idea how you’d know where to split the partitions.
Optimizing is Weird
When addressing bottlenecks in an app written against a given RDBMS, there are typically pretty decent docs available that help even a novice get started with tuning their queries and setting up the proper indexes. If I google “mysql query optimization”, I find a lot of useful results on the first page. With ZODB, there are some general strategies that typically are learned the hard way. Good luck googling for docs or tips. One of the few things I found was from a presentation (PDF) that Chris McDonough made about ZODB: “The most important optimization you can perform is to write efficient code. Unfortunately, this is also the hardest way to optimize, because you need to manage all the details.”
Finally: You Can’t Take It With You
You may have noticed a theme running through some of the above.
I’m tired of feeling like I’m in a programming ghetto, and frankly ZODB feels rather marginalized. Not because of any of the things that I think are wrong with it, but just because almost nobody uses it. This has a lot of implications - lost opportunities for re-using innovative work and so forth. This is the danger that Mark Ramm recently warned the Django world about (video link, sorry; there’s no transcript anywhere AFAICT).
But more personally, I just don’t feel that I’m young enough to waste much more of my career on dead-end tech. Python is a plenty big pond for me to swim in; but most of the fish in that pond wouldn’t touch ZODB with a ten-foot pole. That’s a shame, maybe, but perception is reality, and if this was going to change, it would have changed by now. It’s been over 10 years now.
Given that, the time that I spend using ZODB could be better spent learning skills that I can more realistically apply on more future projects. Maybe even (gasp) non-Python projects.
It may seem selfish to harp on how this affects an individual’s career, but sometimes the events in your life push you in that direction. Having ZODB and Zope on my resumé gives me a certain amount of hireability right now, I think because I’m a relatively experienced fish in this tiny (and, AFAICT, not growing) pond where demand (for now) seems to slightly outstrip supply. Will ZODB be even that relevant in, say, 10 years?

Okay, I’m putting my flame suit on now.
(For a more generally enthusiastic take on ZODB, you should have a look at Chris McDonough’s blog post on ZODB compared to CouchDB.)