Enterprise LAMP

Andy Todd: Weird easy_install Behaviour

Dear lazyweb, I unsubscribed from the distutils-sig mailing list a while back and consequently I’m not up to date with the latest to-ings and fro-ings. But I have a problem: as someone reported today, Gerald eggs won’t install on Windows.

Everything is fine on my Ubuntu virtual machine, but on my shiny new work laptop I have Python 2.6, and today I downloaded and installed setuptools version 0.6c11. When I try to install Gerald I get an error complaining about the lack of a setup.py file:

(TEST) C:\Work\virtualenvs\TEST>easy_install gerald
Searching for gerald
Reading http://pypi.python.org/simple/gerald/
Reading http://halfcooked.com/code/gerald/
Reading http://sourceforge.net/project/showfiles.php?group_id=53184&package_id=109623
Reading http://sourceforge.net/projects/halfcooked/files
Best match: gerald 0.3.5
Downloading http://sourceforge.net/projects/halfcooked/files/gerald/0.3.5/gerald-0.3.5-py2.6.egg/download
Processing download
error: Couldn't find a setup script in c:\docume~1\andy~1.tod\locals~1\temp\easy_install-woqly0\download
(TEST) C:\Work\virtualenvs\TEST>

The only thing that I can find different is that my Ubuntu virtual machine is running version 0.6c9 of setuptools. Has the function changed between two release candidates?

Needless to say this means that Gerald won’t install under Windows using easy_install until I figure this out. All help and suggestions warmly received.

Test Automation using Perl classes

The conference season is warming up and so I’ll start offering my
Test Automation using Perl training class so I can have an excuse to
go to the workshops and conferences. Here is the schedule:

March 8-11, Berlin, Germany, after CeBIT where we ha…

Richard Tew: Patching through code modification

Previous post: Tracking class instantiations

As I have been exploring patching __init__ of classes loaded by my code reloading framework so that I can track creation of instances, I’ve been considering other approaches.

In the previous post, where there was an existing __init__ method, I renamed it and had my replacement __init__ call it before registering the freshly created instance. But I can do better: if I modified the bytecode of the existing method, I could inject my registration call directly into it. As an optimisation, this does not add much in this case. But it is interesting to look into, and there is the possibility that this sort of functionality can be added in a more general way within the code reloading framework.
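The rename-and-wrap approach from the previous post can be sketched roughly as follows. This is a minimal illustration rather than the framework's actual code: track_instances, created and Widget are hypothetical names, and the syntax is current Python rather than the 2.6 used elsewhere in this post.

```python
import functools

# Hypothetical registry standing in for the reloading framework's tracker.
created = []


def track_instances(cls):
    """Replace cls.__init__ with a wrapper that records each new instance."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def tracking_init(self, *args, **kwargs):
        # Run the original initialiser first, then register the instance.
        original_init(self, *args, **kwargs)
        created.append(self)

    cls.__init__ = tracking_init
    return cls


class Widget(object):
    def __init__(self, name):
        self.name = name


track_instances(Widget)
w = Widget("spam")
```

Every instantiation pays for one extra Python-level call here, which is the overhead that injecting the registration into the existing method would avoid.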

I found three commonly mentioned bytecode manipulating frameworks:

  • bytecodehacks: No longer maintained and out of date for 2.6.
  • BytecodeAssembler: Lots of dependencies and it only allows creation of bytecode, not modification of existing bytecode.
  • byteplay: One file, allows modification of existing code, works out of the box.

byteplay looks like the only suitable candidate that I can just pick up and use.

Code to be modified:

>>> class Test:
...     def __init__(self):
...             if f():
...                     print 1
...                     return
...             if g():
...                     print 2
...                     return
...             print 3
...

I want to make my injected call after the logic in the function has been executed, but before it returns. In this function, there are multiple return points.

Passing the code into byteplay:

>>> import byteplay
>>> c = byteplay.Code.from_code(Test.__init__.func_code)
>>> print c.code

  3           1 LOAD_GLOBAL          f
              2 CALL_FUNCTION        0
              3 JUMP_IF_FALSE        to 13
              4 POP_TOP

  4           6 LOAD_CONST           1
              7 PRINT_ITEM
              8 PRINT_NEWLINE

  5          10 LOAD_CONST           None
             11 RETURN_VALUE
        >>   13 POP_TOP

  6          15 LOAD_GLOBAL          g
             16 CALL_FUNCTION        0
             17 JUMP_IF_FALSE        to 27
             18 POP_TOP

  7          20 LOAD_CONST           2
             21 PRINT_ITEM
             22 PRINT_NEWLINE

  8          24 LOAD_CONST           None
             25 RETURN_VALUE
        >>   27 POP_TOP

  9          29 LOAD_CONST           3
             30 PRINT_ITEM
             31 PRINT_NEWLINE
             32 LOAD_CONST           None
             33 RETURN_VALUE

Basically I want to inject my call before each LOAD_CONST None/RETURN_VALUE pair.

Code to inject:

>>> def f(self):
...     events.Register(self)

Passing the code into byteplay:

>>> c2 = byteplay.Code.from_code(f.func_code)
>>> print c2.code

  2           1 LOAD_GLOBAL          events
              2 LOAD_ATTR            Register
              3 LOAD_FAST            self
              4 CALL_FUNCTION        1
              5 POP_TOP

              6 LOAD_CONST           None
              7 RETURN_VALUE

Basically I want to select the bytecode entries matching displayed lines 1 through 7 and insert them in place of any existing pairs as described above. But something these bytecode listings do not show is that line numbers are also marked up with bytecode entries. So I need to make sure I do not obliterate existing line numbers in the code I am modifying, or copy over line numbers from the code I am injecting.

Injecting the call before the returns:

offset = len(c.code) - 1
lastInstruction = None
while offset >= 0:
    instruction, value = c.code[offset]
    if lastInstruction == byteplay.RETURN_VALUE and \
       instruction == byteplay.LOAD_CONST:
        c.code[offset:offset+2] = c2.code[1:]
    lastInstruction = instruction
    offset -= 1

The resulting bytecode:

>>> print c.code

  3           1 LOAD_GLOBAL          f
              2 CALL_FUNCTION        0
              3 JUMP_IF_FALSE        to 18
              4 POP_TOP

  4           6 LOAD_CONST           1
              7 PRINT_ITEM
              8 PRINT_NEWLINE

  5          10 LOAD_GLOBAL          events
             11 LOAD_ATTR            Register
             12 LOAD_FAST            self
             13 CALL_FUNCTION        1
             14 POP_TOP
             15 LOAD_CONST           None
             16 RETURN_VALUE
        >>   18 POP_TOP

  6          20 LOAD_GLOBAL          g
             21 CALL_FUNCTION        0
             22 JUMP_IF_FALSE        to 37
             23 POP_TOP

  7          25 LOAD_CONST           2
             26 PRINT_ITEM
             27 PRINT_NEWLINE

  8          29 LOAD_GLOBAL          events
             30 LOAD_ATTR            Register
             31 LOAD_FAST            self
             32 CALL_FUNCTION        1
             33 POP_TOP
             34 LOAD_CONST           None
             35 RETURN_VALUE
        >>   37 POP_TOP

  9          39 LOAD_CONST           3
             40 PRINT_ITEM
             41 PRINT_NEWLINE
             42 LOAD_GLOBAL          events
             43 LOAD_ATTR            Register
             44 LOAD_FAST            self
             45 CALL_FUNCTION        1
             46 POP_TOP
             47 LOAD_CONST           None
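The splice itself can be tried without byteplay installed. In the sketch below, plain (opcode, argument) tuples stand in for byteplay's code entries, so it only illustrates the backwards scan and slice replacement; inject_before_returns and the sample lists are hypothetical names, not part of byteplay.

```python
# Plain tuples stand in for byteplay's (opcode, argument) entries.
code = [
    ("LOAD_CONST", 1), ("PRINT_ITEM", None),
    ("LOAD_CONST", None), ("RETURN_VALUE", None),
    ("LOAD_CONST", 3), ("PRINT_ITEM", None),
    ("LOAD_CONST", None), ("RETURN_VALUE", None),
]
injected = [
    ("LOAD_GLOBAL", "events"), ("LOAD_ATTR", "Register"),
    ("LOAD_FAST", "self"), ("CALL_FUNCTION", 1), ("POP_TOP", None),
    ("LOAD_CONST", None), ("RETURN_VALUE", None),
]


def inject_before_returns(code, injected):
    """Replace every LOAD_CONST/RETURN_VALUE pair with the injected entries."""
    result = list(code)
    offset = len(result) - 1
    last = None
    # Walk backwards so a splice never shifts entries still to be visited.
    while offset >= 0:
        op, arg = result[offset]
        if last == "RETURN_VALUE" and op == "LOAD_CONST":
            result[offset:offset + 2] = injected
        last = op
        offset -= 1
    return result


patched = inject_before_returns(code, injected)
```

Like the loop it mirrors, this matches any LOAD_CONST that directly precedes a RETURN_VALUE, not only LOAD_CONST None, which is fine for an __init__ that can only return None.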

The next step is to make f, g and events, and to execute the modified bytecode.

Testing the bytecode:

>>> def f(): return False
...
>>> def g(): return False
...
>>> Test.__init__.im_func.func_code  = c.to_code()
>>> class Events:
...     def Register(self, instance):
...             print "REGISTERED", instance
...
>>> events = Events()
>>> t = Test()
3
REGISTERED <__main__.Test instance at 0x01D2DAD0>

Excellent. I’ll have to think about the possibilities for this. It has potential to allow the creation of all sorts of interesting features in a code reloading framework.
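For comparison, much the same effect can be had one level up, by rewriting the syntax tree before compilation instead of the bytecode afterwards. To be clear, this swaps in a different technique (the standard library ast module, in current Python) for the byteplay approach used above, and it needs the source text to be available. SOURCE, register and InjectBeforeReturn are hypothetical names for illustration.

```python
import ast
import textwrap

SOURCE = textwrap.dedent('''
    class Test(object):
        def __init__(self, flag):
            if flag:
                self.value = 1
                return
            self.value = 3
''')

registered = []


def register(instance):
    registered.append(instance)


class InjectBeforeReturn(ast.NodeTransformer):
    """Insert register(self) before each return and at the end of __init__."""

    def _call(self):
        # Build the statement "register(self)" as AST nodes.
        return ast.Expr(value=ast.Call(
            func=ast.Name(id="register", ctx=ast.Load()),
            args=[ast.Name(id="self", ctx=ast.Load())],
            keywords=[]))

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        # Cover the implicit return when the body does not end in "return".
        if node.name == "__init__" and not isinstance(node.body[-1], ast.Return):
            node.body.append(self._call())
        return node

    def visit_Return(self, node):
        # Returning a list replaces the statement with several statements.
        return [self._call(), node]


tree = InjectBeforeReturn().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
namespace = {"register": register}
exec(compile(tree, "<patched>", "exec"), namespace)
Test = namespace["Test"]
a = Test(True)
b = Test(False)
```

Both the early return and the fall-through path end up registering the instance, which is the property the bytecode version gets by patching each return point.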

Neil Schemenauer: Quixote 2.7b2 beta released

I released another Quixote beta about a week ago. I think it finally fixes the problem caused by Python 2.6’s breakage of the ihooks module. What happened was that when relative imports were implemented, the ihooks module got forgotten. If you use the 2.6 ihooks module and any package uses a relative import, you lose.

I’ve fixed the ihooks module in the SVN trunk and it will be fixed in Python 2.7. Quixote 2.7b2 works around the problem by shipping with its own ihooks module.

Carl Trachte: OpenBSD and Python

Last time we covered FreeBSD’s third party module, freebsd; this time we’ll take a quick look at the equivalent openbsd package for the OpenBSD operating system.

$ python2.5
Python 2.5.4 (r254:67916, Jul  1 2009, 11:37:21)
[GCC 3.3.5 (propolice)] on openbsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbsd
>>> dir(openbsd)
['__builtins__', '__doc__', '__file__', '__name__', '__path__', '_ifconfig', '_netstat', '_packetDescriptors', '_pcap', '_sysvar', 'arc4random', 'ifconfig', 'netstat', 'packet', 'pcap', 'utils']

Let’s see what all is hidden in that utils item:

>>> dir(openbsd.utils)
['DoubleAssociation', '__builtins__', '__doc__', '__file__', '__name__', 'cksum16', 'ethToBytes', 'ethToStr', 'findLongestSubsequence', 'getBlocks', 'ip6FromPrefix', 'ip6ToBytes', 'ip6ToStr', 'ipFromPrefix', 'ipToBytes', 'ipToStr', 'isIP6Addr', 'isIPAddr', 'isStringLike', 'multichar', 'multiord']

OK, a fair number of network addressing related functions.

>>> help(openbsd.utils.ipFromPrefix)

ipFromPrefix(prefix)
    Produce an IPv4 address (netmask) from a prefix length.

That sounds handy.  Let’s give it a shot:

>>> openbsd.utils.ipFromPrefix(24)
'255.255.255.0'
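For readers without an OpenBSD box handy, the conversion is easy to reproduce with the standard library. This is a sketch of the idea only, not the openbsd package's own implementation; ip_from_prefix is a hypothetical name.

```python
import socket
import struct


def ip_from_prefix(prefix):
    """Return the dotted-quad IPv4 netmask for a prefix length (0 to 32)."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix must be between 0 and 32")
    # Set the top `prefix` bits of a 32-bit word and clear the rest.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    # Pack as a big-endian unsigned int and format as a dotted quad.
    return socket.inet_ntoa(struct.pack("!I", mask))


netmask = ip_from_prefix(24)
```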

>>> help(openbsd.utils.DoubleAssociation)

Help on class DoubleAssociation in module openbsd.utils:

class DoubleAssociation(__builtin__.dict)
 |  A double-association is a broadminded dictionary – it goes both ways.
 |
 |  The rather simple implementation below requires the keys and values to
 |  be two disjoint sets. That is, if a given value is both a key and a
 |  value in a DoubleAssociation, you get unexpected behaviour.
 |
 |  Method resolution order:
 |      DoubleAssociation
 |      __builtin__.dict
 |      __builtin__.object
 |
 |  Methods defined here:
 |
 |  __init__(self, idict=None)
 |      # FIXME:
 |      #   While DoubleAssociation is adequate for our use, it is not entirely complete:
 |      #       – Deletion should delete both associations
 |      #       – Other dict methods that set values (eg. setdefault) will need to be over-ridden.

This one is kind of interesting – let’s have a look:

>>> d = {1:'a', 2:'b', 3:'c'}
>>> d.get(1)
'a'
>>> print d.get('a')
None
>>> da = openbsd.utils.DoubleAssociation(d)
>>> da.get(1)
'a'
>>> da.get('a')
1
1

Just like the doc described it.  Both the keys and the values are keys, if that makes sense.
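The behaviour is simple enough to imitate in a few lines. The class below is a hypothetical stand-in written in current Python, not the code shipped in openbsd.utils, and it shares the limitation the docstring mentions: keys and values must be disjoint, and deletion is not handled.

```python
class TwoWayDict(dict):
    """A dict that stores every pair in both directions."""

    def __init__(self, idict=None):
        super().__init__()
        for key, value in (idict or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Store key -> value and the reverse mapping value -> key.
        super().__setitem__(key, value)
        super().__setitem__(value, key)


d = {1: "a", 2: "b", 3: "c"}
da = TwoWayDict(d)
```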

Back up to the main modules of the openbsd package:

>>> help(openbsd.arc4random)

NAME
    openbsd.arc4random

FILE
    /usr/local/lib/python2.5/site-packages/openbsd/arc4random.so

FUNCTIONS
    getbytes(...)
        Get some random bytes.

And the result -

>>> bytesx = openbsd.arc4random.getbytes(10)
>>> [bytex for bytex in bytesx]
['\xb4', '\xd1', '\x86', '\xb7', 'g', '8', '\x10', '}', '\x8b', '\xe5']

One last module on a more common theme:

NAME
    openbsd.ifconfig – A Python module for querying and manipulating network interfaces.

FILE
    /usr/local/lib/python2.5/site-packages/openbsd/ifconfig.py

CLASSES
    __builtin__.int(__builtin__.object)
        FlagVal
    __builtin__.object
        Flags
        IFConfig
        Interface
        MTU
        Media
        Metric
    exceptions.Exception(exceptions.BaseException)
        _ifconfig.IfConfigError

    class FlagVal(__builtin__.int)
     |  Method resolution order:
(etc.)
    

>>> intx = openbsd.ifconfig.Interface('rl0')
>>> print intx
rl0: flags=8843 mtu 1500
         media: Ethernet autoselect
         link: 00:30:bd:72:6a:a0
         inet6: fe80:2::230:bdff:fe72:6aa0
         inet: 192.168.100.100
>>> dir(intx)
['Iftype', 'Name', '__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__', '_addrToStr', '_addrTypeLookup', '_getAddresses', '_getinfo', '_setflags', '_setmetric', '_setmtu', 'addAddress', 'addresses', 'delAddress', 'flags', 'media', 'metric', 'mtu', 'setAddress']
>>> intx.media
media: Ethernet autoselect
>>> intx.addresses
[{'address': {'sa_family': 18L, 'iftype': 'ETHER', 'address': '00:30:bd:72:6a:a0'}}, {'netmask': {'sa_family': 24L, 'address': 'ffff:ffff:ffff:ffff::'}, 'address': {'sa_family': 24L, 'address': 'fe80:2::230:bdff:fe72:6aa0'}}, {'netmask': {'sa_family': 0L, 'address': None}, 'dstaddr': {'sa_family': 2L, 'address': '192.168.100.255'}, 'address': {'sa_family': 2L, 'address': '192.168.100.100'}}]
>>>   

ifconfig available within Python – sweet.  rl0 is the ethernet device on my old Dell tower.

Examination of the openbsd package shows that it has quite a bit to offer.  If you’re using OpenBSD, there’s nothing stopping you from doing routine sysadmin tasks with Python.  If not, now you’ve got a reason to check it out.

Carl Trachte: Python Modules for the BSD’s

Well, for FreeBSD and OpenBSD, at least.  I can’t yet vouch for NetBSD and Dragonfly BSD.

First, FreeBSD – the port is named py-freebsd.  Once built, the module can be imported with “import freebsd”.

[carl@pcbsd]/usr/local/lib/python2.6/site-packages(158)% python
Python 2.6.2 (r262:71600, Jun 24 2009, 23:31:28)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
>>> import freebsd
>>> dir(freebsd)
['__doc__', '__file__', '__name__', '__package__', '__version__', 'chflags', 'const', 'fchflags', 'fstatfs', 'geom_getxml', 'getfsent', 'getfsfile', 'getfsspec', 'getfsstat', 'gethostname', 'getloadavg', 'getlogin', 'getosreldate', 'getpriority', 'getprogname', 'getpwent', 'getpwnam', 'getpwuid', 'getquota', 'getrlimit', 'getrusage', 'ifstats', 'ipstats', 'jail', 'kevent', 'kqueue', 'ktrace', 'lchflags', 'quotaoff', 'quotaon', 'quotasync', 'reboot', 'sendfile', 'sethostname', 'setlogin', 'setpriority', 'setproctitle', 'setprogname', 'setquota', 'setrlimit', 'statfs', 'sysctl', 'sysctldescr', 'sysctlmibtoname', 'sysctlnametomib', 'tcpstats', 'udpstats']



Not a bad collection of utilities.  Let’s take a couple for a test drive:

>>> freebsd.gethostname()
'pcbsd'

>>> freebsd.getprogname()
'python'
>>> help(freebsd.jail)
Help on built-in function jail in module freebsd:

jail(...)
jail(path, hostname, ip_number):
The jail() system call sets up a jail and locks the current process
in it. The "path" should be set to the directory which is to be
the root of the prison. The "hostname" can be set to the hostname
of the prison. This can be changed from the inside of the prison.
The "ip_number" can be set to the IP number assigned to the prison.

>>> # wow, you can set up a jail with python


>>> freebsd.ifstats()
>>> import pprint
>>> pprint.pprint(_)
{'bge0': {'addrlen': 6,
          'baudrate': 100000000L,
          'collisions': 0L,
          'flags': 34883,
          'hdrlen': 14,
          'hwassist': 7L,
          'ibytes': 19222590L,
          'ierrors': 0L,
          'imcasts': 577L,
          'ipackets': 19728L,
          'iqdrops': 0L,
          'metric': 0L,
          'mtu': 1500L,
          'name': 'bge0',
          'noproto': 0L,
          'obytes': 2009038L,
          'oerrors': 0L,
          'omcasts': 0L,
          'opackets': 13285L,
          'pcount': 0,
          'physical': 0,
          'snd_drops': 0,
          'snd_len': 0,
          'snd_maxlen': 511,
          'type': 6},


bge0 is the ethernet device on my Thinkpad.


>>> freebsd.getlogin()
'carl'
>>> freebsd.tcpstats()
>>> pprint.pprint(_)
{'accepts': 0L,
 'badsyn': 0L,
 'cachedrtt': 147L,
 'cachedrttvar': 150L,
 'cachedssthresh': 4L,
 'closed': 495L,
 'connattempt': 360L,
 'conndrops': 20L,
 'connects': 340L,
 'delack': 277L,
 'drops': 22L,
 'keepdrops': 0L,
 'keepprobe': 0L,
 'keeptimeo': 0L,
 'listendrop': 0L,
 'mturesent': 0L,
 'pawsdrop': 0L,
 'persistdrop': 0L,
 'persisttimeo': 0L,
 'predack': 0L,
 'preddat': 15226L,
 'rcvackbyte': 1093284L,
 'rcvackpack': 1848L,
 'rcvacktoomuch': 0L,
 'rcvafterclose': 7L,
 'rcvbadoff': 0L,
 'rcvbadsum': 0L,
 'rcvbyte': 16595286L,
 'rcvbyteafterwin': 0L,
 'rcvdupack': 232L,
 'rcvdupbyte': 88723L,
 'rcvduppack': 77L,
 'rcvoobyte': 1015050L,
 'rcvoopack': 919L,
 'rcvpack': 15882L,
 'rcvpackafterwin': 0L,
 'rcvpartdupbyte': 525L,
 'rcvpartduppack': 2L,
 'rcvshort': 0L,
 'rcvtotal': 18489L,
 'rcvwinprobe': 0L,
 'rcvwinupd': 3L,
 'rexmttimeo': 118L,
 'rttupdated': 1817L,
 'sc_aborted': 0L,
 'sc_added': 0L,
 'sc_badack': 0L,
 'sc_bucketoverflow': 0L,
 'sc_cacheoverflow': 0L,
 'sc_completed': 0L,
 'sc_dropped': 0L,
 'sc_dupsyn': 0L,
 'sc_recvcookie': 0L,
 'sc_reset': 0L,
 'sc_retransmitted': 0L,
 'sc_sendcookie': 0L,
 'sc_stale': 0L,
 'sc_unreach': 0L,
 'sc_zonefail': 0L,
 'segstimed': 1688L,
 'sndacks': 9261L,
 'sndbyte': 1098259L,
 'sndctrl': 697L,
 'sndpack': 1252L,
 'sndprobe': 0L,
 'sndrexmitbyte': 2252L,
 'sndrexmitpack': 2L,
 'sndtotal': 12381L,
 'sndurg': 0L,
 'sndwinup': 1169L,
 'timeoutdrop': 9L}

 
22 drops, 9 of them timeouts, and a bunch of other stuff too.

Enough for today.  Next time we’ll take a quick look at the Python module for OpenBSD.


phpUnderControl 0.5.1 released – Manuel Pichler

Today I released phpUnderControl version 0.5.1. It’s a bug fix release that closes several long-standing issues. First of all I would like to thank Sebastian Marek, who was the main contributor to this release, so a big thank you to you.

  • Now phpUnderControl should work with CruiseControl 2.8.3. Thanks to Mike van Riel who provided some hints on this issue in a blog comment.
  • Fixed #983: Graph unitests throw fatal error when ezComponents not available.
  • Fixed #966: phpcs-details.xsl not showing file name.
  • Closed #863: Destination option is now deprecated.
  • Fixed #862: Command line switches without parameter don’t work.
  • Fixed #861: Password is used as username in check outs. This patch was supplied by Thorsten Daners via e-mail.
  • Fixed #734: Now the build dropdown redirects to the correct build uri.
  • Implemented #703: PHPUnit test results are now the first entry on the project overview page.
  • Fixed #700: Throw an exception when the specified project does not exist.
  • Implemented #675: Use “php -l” for lint checking and not PHPUnit.
  • Implemented #625: Integrate PHP_Depend results.


TechniqueNW 10 – Stuart Herbert

Whilst everyone else was over at PHP Benelux 10 (which sounded like a great conference according to the Twitter feedback!), I was up in Morecambe, at the Technique|NorthWest training event organised by Northwest Vision and Media and run by The White Room.  A huge thanks to Paul Collins for inviting me up at the last minute to run the PHP workshop on the Saturday, and I’d love to be involved in further events like this.

I had a great time at the event, and I was delighted to see how the North West of England is trying to build and support a digital economy, instead of simply leaving it to chance.  If only South Wales had such an initiative!

Perhaps the most interesting thing I took from the weekend was the large disconnect between the people who attended and many of my friends on Twitter.  If you listen to the Twitterati, you’d think that Adobe Flash is a technology that has run its course and is now in terminal decline (mostly because the iPhone and iPad do not support it, plus Adobe is not exactly seen as a bastion of innovation these days).  And yet, by far the most popular workshop at Technique|NorthWest was the Flash workshop.  To these people, Flash is not only still relevant, but in their industry it is still the only real option for delivering online advertising campaigns.

Food for thought.

PS: I also took some photos of Morecambe before the Saturday workshops started.

Live exploiting from the attacker’s perspective (XSS / CSRF), talk at Mayflower Würzburg – ThinkPHP /dev/blog – PHP

Next Thursday, 4 February 2010, another public talk will take place at the Mayflower office in Würzburg (Pleichertorstrasse 2, 97070 Würzburg; the tram stop is Congress Centrum).
It starts at 18:00, and the topic of the talk is “Live exploiting aus Angreifersicht (XSS / CSRF)” (live exploiting from the attacker’s perspective).

Using interactive examples, Frank Ruske explains the security problems XSS (Cross-Site Scripting) and CSRF (Cross-Site Request Forgery). What dangers are there, and how are these holes exploited? That is the central topic of this talk.

The “Thursday talks” are held in both Würzburg and Munich. If you are interested, just keep an eye on the blog to stay up to date!
We look forward to seeing many attendees!


Matt Wilkes: Why WSGI?

Earlier today I tried to write an explanation of why WSGI technologies are useful to developers, but each attempt sounded too much like snake-oil.  When you start to list what advantages you get from applying WSGI best-practises to a problem, the results sound fantastic, and frankly unbelievable, but they’re all true.

Caveat emptor: I haven’t properly tested the code in this post, so it should all be treated as illustrative pseudo-code. Also, this is really long, so you might want to get a cup of tea.

Separation of front- and back-end

There has been a lot of noise about Deliverance in the Plone community recently: many consultancy companies have deployed Deliverance-based sites and even plone.org has a Deliverance front-end.  The most important advantage of this pattern is that it stops front-end developers being blocked by back-end considerations.

When developing a Plone site a designer can very rarely jump straight into modifying templates on day one.  For one thing, the markup isn’t all in one place, as with the portal_tabs navigation bar.  If custom markup is needed for these tabs, new viewlets and views first need to be created to override the templates being used, and a test instance set up to give the markup guys something to work with.  Otherwise there’d be another significant amount of work to do in integrating it later in the project.

Either way, nobody can see these new tabs in place until there has been some backend developer time allocated to the problem.  When you consider the sheer number of places markup changes happen, this is clearly an untenable situation.

By using a transformational middleware like Deliverance, developers can start writing production markup on day one of a project. This can (and does!) happen before the backend developers have even finished talking about what technologies to use. This extra time and freedom truly is invaluable, as it means at no point will the look and feel of the site block development of the functionality or vice-versa.
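To make the idea of a transformational layer concrete, here is a deliberately tiny sketch of the core move: take a static theme written by the designer and drop one element from the backend's output into it. It is not Deliverance's rule syntax or API; apply_theme, the element ids and both HTML snippets are made up for illustration, and it uses only the standard library on well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical designer-owned theme and backend response.
THEME = ('<html><body><div id="header">Designer header</div>'
         '<div id="main">placeholder</div></body></html>')
BACKEND = ('<html><body><div id="portal-tabs">tabs</div>'
           '<div id="content"><p>Hello from the backend</p></div></body></html>')


def apply_theme(theme_html, backend_html, content_id="content", slot_id="main"):
    """Copy the backend's content element into the theme's slot element."""
    theme = ET.fromstring(theme_html)
    backend = ET.fromstring(backend_html)
    source = backend.find(".//*[@id='%s']" % content_id)
    slot = theme.find(".//*[@id='%s']" % slot_id)
    # Replace the slot's placeholder text and children with the real content.
    slot.text = source.text
    slot[:] = list(source)
    return ET.tostring(theme, encoding="unicode")


themed = apply_theme(THEME, BACKEND)
```

The theme can be written and reviewed against placeholder content long before the backend produces anything real, which is the decoupling described above.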

Right from the start, functionality and UI can go to human testers who help develop the automated test suite.  As time progresses and the Deliverance theme starts incorporating more rules to integrate with the back-end, testing of the integration of the two can begin.  Without this separation, components of the UI can’t really be tested in isolation.

Reusable components

There are two very popular Plone packages providing CAPTCHA support, collective.captcha and collective.recaptcha.  To embed a CAPTCHA on your page you include:

<tal:captcha replace="here/@@captcha/image_tag" />
<input type="text" name="captcha" />

and in the code that verifies your form submission you extract the value from the request and pass it back to the captcha view:

captcha = context.REQUEST.form.get("captcha", None)
if not context.restrictedTraverse("@@captcha").verify(captcha):
    raise SpamBotException("CAPTCHA failed")

While there are convenience methods for integrating this with the various places forms are generated in Plone, this is really quite nasty.  While we have a nice, generic method for getting and verifying CAPTCHAs, it is very much tied to Plone and its various forms.

If we were to write this as a WSGI middleware we’d want to take the CAPTCHA generation and checking away from the application, so there are no import dependencies.  In essence, the CAPTCHA field is just a check if the user is human, the simplest one being a simple checkbox.  This isn’t going to filter out many spambots, but it does capture the essence of the problem, and is very simple to include in forms:

<input type="checkbox" name="isHuman" id="isHuman" />

This can be included in any forms readily, as any form library worth its salt can create a simple checkbox, then the backing code merely ensures that it’s a required field. Simples.

However, we’ve got all these nice CAPTCHA libraries, so we create a middleware that grabs this checkbox from outgoing responses and replaces it with the full CAPTCHA that we’d otherwise have pulled directly into the form.  When a request comes back through with a CAPTCHA response in it, we verify it’s correct and use that to set the value of isHuman.  That would look a little like this:

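import lxml.etree
from webob import Request  # assumed: the Request API used below matches WebOb's


class CAPTCHAMiddleware(object):
    # verify() and getCaptcha() are left to whichever CAPTCHA library is used.

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        req = Request(environ)
        # Incoming: swap the CAPTCHA answer for a plain isHuman value.
        # (Pseudo-code: WebOb's req.params is a read-only view, so a real
        # implementation would have to rewrite the submitted form data.)
        if "captcha" in req.params:
            value = req.params["captcha"]
            del req.params["captcha"]
            req.params["isHuman"] = self.verify(value)
        res = req.get_response(self.app)
        # Outgoing: replace the isHuman checkbox with the full CAPTCHA markup.
        XML = lxml.etree.XML(res.body)
        captcha = XML.xpath("//input[@name='isHuman']")
        if captcha:
            new = lxml.etree.XML(self.getCaptcha())
            captcha[0].getparent().replace(captcha[0], new)
            res.body = lxml.etree.tostring(XML)
        return res(environ, start_response)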

This method can then be tested to destruction to ensure that there are no ways of circumventing the CAPTCHA.  All the time the site still works with its naïve CAPTCHA in place.  Any updates to the CAPTCHA product can be done, tested and deployed separately to the main site.
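As a small demonstration of how testable this is, here is a sketch of just the outgoing half written against the bare WSGI interface, with no WebOb or lxml, plus a helper that calls it in-process. Everything named here (CheckboxToCaptcha, CAPTCHA_HTML, demo_app, call) is hypothetical, the CAPTCHA markup is a placeholder for whatever a real library would emit, and the response is assumed to be UTF-8 HTML.

```python
import re

# Matches the naive checkbox that forms are assumed to contain.
CHECKBOX = re.compile(r'<input[^>]*name="isHuman"[^>]*/?>')
# Placeholder markup; a real CAPTCHA library would supply this.
CAPTCHA_HTML = ('<img src="/captcha.png" alt="CAPTCHA" />'
                '<input type="text" name="captcha" />')


class CheckboxToCaptcha(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the status and headers until the body is rewritten.
            captured["status"] = status
            captured["headers"] = headers
            return lambda data: None  # legacy write() callable, unused here

        chunks = self.app(environ, capture)
        try:
            body = b"".join(chunks)
        finally:
            if hasattr(chunks, "close"):
                chunks.close()
        body = CHECKBOX.sub(CAPTCHA_HTML, body.decode("utf-8")).encode("utf-8")
        # The body length changed, so Content-Length must be recomputed.
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def demo_app(environ, start_response):
    body = b'<form><input type="checkbox" name="isHuman" id="isHuman" /></form>'
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, environ):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body


status, headers, body = call(CheckboxToCaptcha(demo_app), {"REQUEST_METHOD": "GET"})
```

Because the middleware only ever sees an HTTP response, the test needs no Plone instance and no form library, only a stub application that returns a form.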

Let’s do a feature comparison:

                                      @@captcha   CAPTCHAMiddleware
Can switch to other implementations   ✔           ✔
Works without customising a form      ✘           ✘
Works with any form library OOTB      ✘           ✔
No vendor lock-in                     ✘           ✔

Seamless upgrades of legacy sites

So, there’s a distinct advantage shown above, but it’s not mind-blowing.  The second row still bothers me.  It means that although it’s easy to create new sites that use this CAPTCHA middleware, it’s not easily backported.  While having this discussion with Alan Hoey of Team Rubber, he suggested that he’d not use a new input method, but have a configuration file listing which forms need CAPTCHAs added, and simply shield the backend from them.

At first, this sounded like a simple disagreement on implementation, until I realised that it could be elegantly implemented as a second middleware!

In this case, you’d not only have the CAPTCHA middleware I describe above, but a second middleware lower down in the stack which takes a configuration file and adds isHuman checkboxes to other forms.  If the isHuman value in the request is set to False for one of these forms it will simply raise an error.
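A rough sketch of the request-checking half of that second middleware is below; adding the checkbox to outgoing forms is left out. The configuration is reduced to a set of paths, and both that format and the choice to answer with a 403 rather than raise an exception are assumptions made for the example. RequireHuman, PROTECTED_PATHS, ok_app and post are hypothetical names.

```python
import io
from urllib.parse import parse_qs

# Hypothetical configuration: form targets that must carry an isHuman value.
PROTECTED_PATHS = {"/comments/add"}


class RequireHuman(object):
    def __init__(self, app, protected_paths=PROTECTED_PATHS):
        self.app = app
        self.protected_paths = protected_paths

    def __call__(self, environ, start_response):
        if (environ.get("REQUEST_METHOD") == "POST"
                and environ.get("PATH_INFO") in self.protected_paths):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            raw = environ["wsgi.input"].read(length)
            # Put the body back so the wrapped application can still read it.
            environ["wsgi.input"] = io.BytesIO(raw)
            fields = parse_qs(raw.decode("utf-8"))
            if fields.get("isHuman", [""])[0] not in ("on", "True", "true", "1"):
                body = b"CAPTCHA check failed"
                start_response("403 Forbidden", [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ])
                return [body]
        return self.app(environ, start_response)


def ok_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"saved"]


def post(app, path, data):
    """Send a form-encoded POST to a WSGI app in-process."""
    environ = {"REQUEST_METHOD": "POST", "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(data)), "wsgi.input": io.BytesIO(data)}
    seen = []
    body = b"".join(app(environ, lambda status, headers, exc_info=None:
                        seen.append(status)))
    return seen[0], body


app = RequireHuman(ok_app)
rejected = post(app, "/comments/add", b"text=hi")
accepted = post(app, "/comments/add", b"text=hi&isHuman=on")
untouched = post(app, "/search", b"q=plone")
```

The wrapped application never learns that a check happened, which is what makes this layer easy to bolt onto an older site.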

There is, however, good reason to keep the original architecture of searching for the isHuman checkbox alongside this: it allows applications to signal that they’re aware of how CAPTCHAs work and handle them elegantly.  It also allows careful positioning of CAPTCHAs in forms, the opportunity to test earlier that they are shown everywhere they are needed, and easier handling of dynamic forms, such as comments.

This new middleware, however, would make integration into older sites easy, and the feature matrix would start to look like:

                                      @@captcha   CAPTCHAMiddleware   Two Middlewares
Can switch to other implementations   ✔           ✔                   ✔
Works without customising a form      ✘           ✘                   ✔
Works with any form library OOTB      ✘           ✔                   ✔
No vendor lock-in                     ✘           ✔                   ✔

We have a winner! By breaking the problem down into small sections we now have a very extensible system that could be applied to new and old sites with a minimum of effort.  We’re not tied to Plone, or even Zope, and both the backend application and the CAPTCHA system can be readily tested in isolation.

It’s this kind of modularisation that provides the great wins for WSGI.  This originally came out of discussion in #repoze in which it was suggested that the environ could hold a special environment variable and call a special function to add a CAPTCHA to a form.  Such a system would fail all 4 of my tests above, and unfortunately many popular WSGI middlewares are written this way.  Taking the extra time and fitting your problem to normal HTTP requests and responses makes for much more re-usable middlewares.

Testing

I’ve mentioned this a few times, but it’s worth really emphasising. Many WSGI middlewares will be barely a hundred lines long, and many will be much shorter. The size of the tests for these can be orders of magnitude greater than the code itself. Many people claim that 100% test coverage just isn’t possible, or isn’t a very good idea, but when writing this kind of function you can easily write a very comprehensive set of tests in a short period of time.

As the middleware can evolve on its own, separately from the application that spawned it, it can get new tests and releases as part of the development of other sites. This means that old applications see benefits of new client work faster and more often than monolithic applications.

Outside the process

Matt Hamilton’s excellent Lipstick on a Pig talk is a great example of this. WSGI middlewares can be layered onto HTTP proxies, allowing them to work on any backend, Python or not. wsapi4plone can work in the same way; it is completely agnostic of whether it is plumbed directly into Zope by a WSGI stack, or if it’s being proxied out to a different instance.

By following the WSGI best-practises of depending on HTTP requests and responses rather than making Python calls between layers, this kind of flexibility is built directly into your system. This means a legacy site can easily have WSGI middlewares layered on top of it without even restarting the Zope process.

Cool, huh?
