2008/09/19

Anyway, you should stop finding bugs!

The title of this post is the phrase that Paolo (my brother) told me after the bug I discovered was actually confirmed on the Zope mailing list.

The bug I've found is something which has haunted me for many years; I'm not sure why I've experienced this bug, and other people didn't; I'm surprised about it, but what I know is that software usually is known to work in a sound way in a known field. Outside of that field, you can't be sure about what it happens. This has nothing to do with closed or open source software: software is complex, and you learn how it works by experience.

After this rant, let me show a small piece of software which can show the bug in action:

import os
import sys
import unittest

if __name__ == '__main__':
execfile(os.path.join(sys.path[0], '../framework.py'))

from Testing import ZopeTestCase

from OFS import Image

from Products.ZPsycopgDA.DA import manage_addZPsycopgConnection
from Products.ZSQLMethods import SQL


class DoubleTransactionTest(ZopeTestCase.ZopeTestCase):

def _add_big_image(self, value, data):
Image.manage_addFile(self.app, "f%06s" % value, data , "a title")

def test_showdouble(self):
manage_addZPsycopgConnection(self.app, "db_connection", "", "host=localhost user=postgres dbname=template1")
self.app._setObject('sql', SQL.SQL("sql", "", "db_connection", "", "select * from pg_tables"))
self.app.sql()
data = "*" * (1 << 20)
for x in range(1000):
self._add_big_image(x, data)
print "Added %s " % x
self.app.sql()

if __name__ == '__main__':
unittest.main()


This small piece of code (which is actually a ZopeTestCase) does the following things:




  1. creates a new database connection, using the ZPsycopg Database Adapter;

  2. creates a SQL method (named, with a lack of fantasy, "sql");

  3. runs the SQL method;

  4. adds 1000 "big" images;

  5. runs again the same SQL method.




In order to understand what is the problem, you should enable the logging of queries; you would end with something like this:

SET DATESTYLE TO 'ISO'
BEGIN; SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
statement: select * from pg_tables
SET DATESTYLE TO 'ISO'
BEGIN; SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
select * from pg_tables
ABORT
ABORT

Can you see the problem? There are two database transactions! For one transaction in Zope! Indeed, one run of the test can be considered as just one transaction of Zope.

How can it be?

Well, actually I've been able to dig this problem, even though, at the moment, I've not found a "real" solution.

The bug is triggered by the size of the file (the size had been chosen so that the bug is shown, of course): when a big image is loaded, the following path of code is executed inside the OFS.Image.File._read_data method:

      if size <= 2*n:
seek(0)
if size < n: return read(size), size
return Pdata(read(size)), size

# Make sure we have an _p_jar, even if we are a new object, by
# doing a sub-transaction commit.
transaction.savepoint(optimistic=True)


So, for a large enough file (actually, greater than 128k) the transaction.savepoint is called, which, in turns causes the ZODB.Connection.savepoint method to be called. Which, just before the end, does the following:

        # While the interface doesn't guarantee this, savepoints are
# sometimes used just to "break up" very long transactions, and as
# a pragmatic matter this is a good time to reduce the cache
# memory burden.
self.cacheGC()
return result

So, just before ending, it cause the cache to be minimized; in particular, this causes the database connection to be ghostified, i.e. to be removed from the memory; but this also means that its "volatile" attributes are lost; and what is among its volatile attributes? The actual connection to the database! So, just after, when we call again the SQL method, the object is restored in memory; since it hasn't a good connection anymore, it creates a new one, thus beginning a new transaction for the same Zope transaction!

Removing the self.cacheGC() calls removes the problem, but I'm not sure about the consequences; therefore, I'm avoiding it, at the moment. But I'll see what I should do about it.

Nessun commento: