Enterprise LAMP

Catching up on ZendCon – Echolibre

Last week our very own Helgi was at ZendCon in San José, California. He was there as a speaker to talk about Frontend Caching and “PEAR2 and Pyrus”.

The first talk I gave was about frontend caching and how you can get the most speed out of your website by optimizing the various bits of the frontend.

Make sure to catch “The aftermath” on Helgi’s blog, as you may get a better idea of what we do at conferences and what happens in general! :)

Christian Scholz: Georg Greve’s Keynote at the Plone Conference 2009

Georg Greve of the FSF at the Plone Conference 2009

Georg Greve from the Free Software Foundation Europe was talking about decision making. He started explaining how he moved from being a software developer to being a politician.

He takes the relicensing policy as an example and wonders how we reached a decision on it. What is a majority? How do we define the qualified majority needed? Here the Plone Foundation is one important piece. If we didn’t have the Plone Foundation it would come down to consensus. This works well in certain groups, but sometimes it also leads to shoutocracy, where the people who shout the loudest keep doing so and the others at some point simply give up.

This is not applicable in many cases. E.g. in the Plone Foundation it is about copyright and you need unanimous agreement, you need everybody to say “Yes”.

Copyright Law

Copyright laws are man-made, created by man-made processes. So where do they come from?
It starts at the national level, where some form of majority decides. In a bi/multilateral situation it is unanimity, and in the United Nations it’s consensus.

This sounds as if the bi/multilateral decisions are easier to make than the UN ones but that’s not necessarily true. There are a lot of rules involved and it might come down to harsher negotiations. It is also complicated with certain regimes where they technically have 100% of their people behind them.

But why do copyright laws happen? There are always some reasons behind those laws. In copyright it is about “fairness”, politicians say. But what does “fair” mean? Also, politicians are not experts in this field, but they still need to make decisions. So they rely on other people to help with those decisions.

He gives TRIPS (Trade-Related Aspects of Intellectual Property Rights) as an example. Originally it was driven by a very targeted agenda of large US corporations, by a coalition of the willing. Only those who wanted the maximum went along. And now governments need to enforce something which was created by a group of corporations. He mentions the U.S. Special 301 list with the “naughty” countries who do not enforce that.

He mentions a study (LAWRENCE) where 3 professors found out that lobbying expenditures can yield a 22,000 percent return on investment. This is for tax regulations; we don’t know about copyright lobbying. So it’s a worthwhile thing to do for companies.

On which side do you wanna be? What’s your perspective of this?

“Politics is too serious a matter to be left to politicians” (Charles de Gaulle)

The only way for your voice to be heard is to be at the table where decisions are made. Plone is affected by this too, for example by software patents, web standards, etc.

“Those who are too smart to engage in politics are punished by being governed by those who are dumber” (Plato)

Inactivity is not an option. There are moments where we have to act!

Governance?

This goes in two directions: the external governance between Plone and the outside world, and the inside governance, i.e. how the community works in itself.

Plone already decided to be a community instead of being a single vendor. He thinks that this is the more sustainable approach. The single-vendor approach is complicated because it’s only one company you don’t know the future about.

There are also other issues that come up. How do you address growth? This is one of the most challenging issues. How do we keep the substance strong as people come in, and how do we ensure that our community remains able to make decisions? No one has found the answer to that yet, though.

What about the structural “bottleneck bugs”?

There are things in the plumbing which you might want to redo at some point. KDE has just gone through such a process, which was rather painful. Many projects encounter such situations. It is very hard to find volunteers to do that work. You also cannot find a customer to pay for all of this, because there is no visual change.

How do we find ways to address those problems? He knows that the Plone community is thinking about this, and he also helped the Open Database Alliance to find a solution for this. There the membership fees are handled like this: 50% allocated by members, 20-30% allocated by the board. Members can allocate funds to certain projects. All the members interested in a project can pool their funds for it.

In order for this to work you need the right choices. The ODBA has certain seats for users, developers etc.

Food for thought

  1. Get active. Find ways to express your political agenda! Just being a small foundation of a few 100 people is not an excuse. You can make a difference!
  2. The most important asset of Plone is YOU! In the end, whether the PF is structured this or that way, the strategic choices still need to express what the community wants. He was positively impressed by what he experienced at the conference: a positive and lively community. Keep it that way!
  3. Coordinate with your allies, like FSF. Talk to them as often as you can. Find common priorities and build alliances on them.

This live blogging report is without any guarantees of correctness and corrections are very welcome in the comments!

S. Lott: Painful Python Import Lessons

Python’s packages and modules are — generally — quite elegant.

They’re relatively easy to manage. The __init__.py file (to make a module into a package) is very elegant. And stuff can be put into the __init__.py file to create a kind of top-level or header module in a larger package of modules.
To a limit.
It took hours, but I found the edge of the envelope. The hard way.
We have a package with about 10 distinct Django apps. Each Django app is — itself — a package. Nothing surprising or difficult here.
At first, just one of those apps used a couple of fancy security-related functions to assure that only certain people could see certain things in the view. It turns out that merely being logged in (and a member of the right group) isn’t enough. We have some additional context choices that you must make.
The view functions wind up with a structure that looks like this.
@login_required
def someView( request, object_id, context_from_URL ):
    no_good = check_other_context( context_from_URL )
    if no_good is not None: return no_good
    still_no_good = check_session()
    if still_no_good is not None: return still_no_good
    # you get the idea

At first, just one app had this feature.
Then, it grew. Now several apps need to use check_session and check_other_context.
Where to Put The Common Code?
So, now we have the standard architectural problem of refactoring upwards. We need to move these functions somewhere accessible. It’s above the original app, and into the package of apps.
The dumb, obvious choice is the package-level __init__.py file.
Why this is dumb isn’t obvious — at first. This file is implicitly imported. Doesn’t seem like a bad thing. With one exception.
The settings.
If the settings file is in a package, and the package-level __init__.py file has any Django stuff in it — any at all — that stuff will be imported before your settings have finished being imported. Settings are loaded lazily — as late as possible. However, in the process of loading settings, there are defaults, and Django may have to use those defaults in order to finish the import of your settings.
This leads to the weird situation that Django is clearly ignoring fundamental things like DATABASE_ENGINE and similar settings. You get the dummy database engine. Yet a basic from django.conf import settings; print settings.DATABASE_ENGINE shows that you should have your expected database.
Moral Of the Story
Nothing with any Django imports can go into the package-level __init__.py files that may get brought in while importing settings.

The Real Time Web explained with a Real World Example – Christian Stocker

Last week I gave 2 techtalks (one in Fribourg and one in Zurich) about how we use some of the “Real Time Web” technologies in Flux CMS and its related services. The graph for this looks like this:

Myrealtimeweb-1

And the details to this can be seen in the slides.

I also plan to write some blogposts about some of the technologies used as the blue ones in the graph are written by us and Open Source (see the links at the end of the slides). And I really hope it will end up in more than just a plan :)

If you want to know something in particular before I write those posts, just ask in the comments.

Eli Bendersky: Handling out-of-memory conditions in C

We’ve all been taught that when malloc returns 0, it means the machine ran out of memory. This case should be detected and "handled" by our application in some graceful manner. But what does "handled" mean here? How does an application recover from an out of memory (OOM) condition? And what about the increased code complexity of checking all those malloc return values and passing them around?

In this article I want to discuss the common policies of handling OOM conditions in C code. There is no single right approach. Therefore, I will review the code of several popular applications and libraries, to find out how they do it in order to gain useful insights for my own programming.

Note that I focus on desktop & server applications here, not embedded applications, which deserve an article of their own.

The policies

Casting minor variations aside, it’s safe to say there are three major policies for handling OOM:

recovery

The recovery policy is the least commonly used because it’s the most difficult to implement, and is highly domain-specific. This policy dictates that an application has to gracefully recover from an OOM condition. By "gracefully recover", we usually mean one or more of:

  • Release some resources and try again
  • Save the user’s work and exit
  • Clean up temporary resources and exit

Recovery is hard. To be certain that your application recovers correctly, you must be sure that the steps it takes don’t require any more dynamic memory allocation. This sometimes isn’t feasible and is always difficult to implement correctly. Since C has no exceptions, memory allocation errors should be carefully propagated to the point where they can be recovered from, and this sometimes means multiple levels of function calls.
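To make the propagation point concrete, here is a minimal sketch of what "multiple levels of function calls" tends to look like. The record type and the functions dup_string, fill_record, new_record and free_record are hypothetical names made up for this illustration, not taken from any of the projects discussed here. Each level checks the result, releases what it already owns, and hands the failure to its caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, for illustration only. */
typedef struct {
    char *name;
    char *value;
} record;

/* Lowest level: returns NULL on OOM and never aborts. */
char *dup_string(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    return copy;
}

/* Middle level: returns 0 on success, -1 on OOM.
 * On failure nothing allocated here stays allocated. */
int fill_record(record *r, const char *name, const char *value)
{
    r->name = dup_string(name);
    if (r->name == NULL)
        return -1;

    r->value = dup_string(value);
    if (r->value == NULL) {
        free(r->name);      /* undo the first allocation */
        r->name = NULL;
        return -1;
    }
    return 0;
}

/* Top level: returns NULL on OOM; the caller decides what to do. */
record *new_record(const char *name, const char *value)
{
    record *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    if (fill_record(r, name, value) != 0) {
        free(r);
        return NULL;
    }
    return r;
}

void free_record(record *r)
{
    if (r == NULL)
        return;
    free(r->name);
    free(r->value);
    free(r);
}
```

Even in this tiny example, every allocation adds a check and a cleanup path, which is the complexity cost the other two policies try to avoid.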

abort

The abort policy is simple and familiar: when no memory is available, print a polite error message and exit (abort) the application. This is the most commonly used policy – most command-line tools and desktop applications use it.

As a matter of fact, this policy is so common that most Unix programs use a gnulib library function xmalloc instead of malloc:

void *
xmalloc (size_t n)
{
  void *p = malloc (n);
  if (!p && n != 0)
    xalloc_die ();
  return p;
}

When this function is called, its return value isn’t checked, reducing the code’s complexity. Here’s a representative usage from the find utility:

cur_path = xmalloc (cur_path_size);
strcpy (cur_path, pathname);
cur_path[pathname_len - 2] = '/';

segfault

The segfault policy is the most simplistic of all: don’t check the return value of malloc at all. In case of OOM, a NULL pointer will get dereferenced, so the program will die in a segmentation fault.

If there are proponents of this policy, they’d probably say – "Why abort with an error message, when a segmentation fault would do? With a segfault, we can at least inspect the core dump and find out where the fault was".

Examples – libraries

In this section, I present the OOM policies of a couple of well-known libraries.

Glib

Glib is a cross platform utility library in C, used most notably for GTK+. At first sight, Glib’s approach to memory allocation is flexible. It provides two functions (with several variations):

  • g_malloc: attempts to allocate memory and exits with an error if the allocation fails, using g_error [1]. This is the abort policy.
  • g_try_malloc: attempts to allocate memory and just returns NULL if that fails, without aborting.

This way, Glib leaves the programmer the choice – you can choose the policy. However, the story doesn’t end here. What does Glib use for its own utilities? Let’s check g_array for instance. Allocation of a new array is done by means of calling g_array_maybe_expand that uses g_realloc, which is implemented with the same abort policy as g_malloc – it aborts when the memory can’t be allocated.

Curiously, Glib isn’t consistent with this policy. Many modules use g_malloc, but a couple (such as the gfileutils module) use g_try_malloc and notify the caller on memory allocation errors.

So what do we have here? It seems that one of the most popular C libraries out there uses the abort policy for memory allocations. Take that into account when writing applications that make use of Glib – if you’re planning some kind of graceful OOM recovery, you’re out of luck.

SQLite

SQLite is an extremely popular and successful embedded database [2]. It is a good example to discuss, since high reliability is one of its declared goals.

SQLite’s memory management scheme is very intricate. The user has several options for handling memory allocation:

  • A normal malloc-like scheme can be used
  • Allocation can be done from a static buffer that’s pre-allocated at initialization
  • A debugging memory allocator can be used to debug memory problems (leaks, out-of-bounds conditions, and so on)
  • Finally, the user can provide his own allocation scheme

I’ll examine the default allocation configuration, which is a normal system malloc. The SQLite wrapper for it, sqlite3MemMalloc defined in mem1.c is:

static void *sqlite3MemMalloc(int nByte){
  sqlite3_int64 *p;
  assert( nByte>0 );
  nByte = ROUND8(nByte);
  p = malloc( nByte+8 );
  if( p ){
    p[0] = nByte;
    p++;
  }
  return (void *)p;
}

malloc is used to obtain the memory. Moreover, the size of the allocation is saved right in front of the block. This is a common idiom for allocators that can report the size of blocks allocated when passed the pointers [3].

As you can see, the pointer obtained from malloc is returned. Hence, SQLite leaves it to the user to handle an OOM condition. This is obviously the recovery policy.
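The size-in-front-of-the-block idiom mentioned above can be sketched in a standalone form. This is not SQLite's code: sized_malloc, sized_size and sized_free are hypothetical names, and the union header is just one assumed way of keeping the user pointer aligned. Like the SQLite wrapper, the sketch returns NULL on failure and leaves the decision to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored directly in front of every block. The extra union
 * members only pad the header so that the user pointer following it
 * stays suitably aligned for ordinary types. */
typedef union {
    size_t size;
    long double pad_ld;
    long long pad_ll;
    void *pad_ptr;
} block_header;

/* Allocates n bytes plus a hidden header; returns NULL on OOM. */
void *sized_malloc(size_t n)
{
    block_header *h;

    assert(n > 0);
    if (n > SIZE_MAX - sizeof *h)   /* header + n would overflow */
        return NULL;
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                   /* user data starts after the header */
}

/* Reports the size that was requested for a block from sized_malloc. */
size_t sized_size(const void *p)
{
    if (p == NULL)
        return 0;
    return ((const block_header *)p - 1)->size;
}

/* Steps back to the header and frees the whole allocation. */
void sized_free(void *p)
{
    if (p != NULL)
        free((block_header *)p - 1);
}
```

A pointer returned by sized_malloc must only be released with sized_free, since the address handed to the caller is not the one malloc returned.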

Examples – applications

OOM handling in a few relatively popular applications.

Git

Distributed version control is all the rage nowadays, and Linus Torvalds’ Git is one of the most popular tools used in that domain.

Git defines its own xmalloc wrapper:

void *xmalloc(size_t size)
{
      void *ret = malloc(size);
      if (!ret && !size)
              ret = malloc(1);
      if (!ret) {
              release_pack_memory(size, -1);
              ret = malloc(size);
              if (!ret && !size)
                      ret = malloc(1);
              if (!ret)
                      die("Out of memory, malloc failed");
      }
#ifdef XMALLOC_POISON
      memset(ret, 0xA5, size);
#endif
      return ret;
}

When it runs out of memory, Git attempts to free resources and retries the allocation. This is an example of the recovery policy. If the allocation doesn’t succeed even after releasing the resources, Git aborts.
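The release-and-retry idea can be separated from Git's internals. The following is a minimal sketch under my own assumptions: set_release_hook, try_malloc_retry, xmalloc_retry and the drop_cache example hook are hypothetical names, and the single static cache stands in for whatever droppable memory an application happens to hold. Unlike Git's wrapper, the sketch requires a nonzero size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hook supplied by the application. It frees droppable memory and
 * returns nonzero if it managed to release anything. */
typedef int (*release_fn)(size_t needed);

static release_fn release_hook = NULL;

void set_release_hook(release_fn fn)
{
    release_hook = fn;
}

/* Try once; on failure let the hook free something and try again.
 * Returns NULL only if the allocation still fails. */
void *try_malloc_retry(size_t size)
{
    void *p;

    assert(size > 0);   /* sketch: callers never ask for zero bytes */
    p = malloc(size);
    if (p == NULL && release_hook != NULL && release_hook(size))
        p = malloc(size);
    return p;
}

/* Recovery first, abort as the last resort. Never returns NULL. */
void *xmalloc_retry(size_t size)
{
    void *p = try_malloc_retry(size);
    if (p == NULL) {
        fputs("Out of memory, malloc failed\n", stderr);
        abort();
    }
    return p;
}

/* Example hook: a cache the application can live without. */
static void *cache = NULL;

int drop_cache(size_t needed)
{
    (void)needed;       /* this simple hook drops everything it has */
    if (cache == NULL)
        return 0;       /* nothing left to give back */
    free(cache);
    cache = NULL;
    return 1;
}
```

An application would call set_release_hook(drop_cache) once at startup and then use xmalloc_retry everywhere it would otherwise call malloc.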

lighttpd

Lighttpd is a popular web server, notable for its speed and low memory footprint.

There are no OOM checks in Lighttpd – it’s using the segfault policy. Following are a few samples.

From network_server_init:

srv_socket = calloc(1, sizeof(*srv_socket));
srv_socket->fd = -1;

From rewrite_rule_buffer_append:

kvb->ptr = malloc(kvb->size * sizeof(*kvb->ptr));

for(i = 0; i < kvb->size; i++) {
        kvb->ptr[i] = calloc(1, sizeof(**kvb->ptr));

And there are countless other examples. It’s interesting to note that Lighttpd uses the lemon parser generator, a library which itself adheres to the abort policy. Here’s a representative example:

PRIVATE acttab *acttab_alloc(void){
  acttab *p = malloc( sizeof(*p) );
  if( p==0 ){
    fprintf(stderr,"Unable to allocate memory for a new acttab.");
    exit(1);
  }
  memset(p, 0, sizeof(*p));
  return p;
}

Redis

Redis is a key-value database that can store lists and sets as well as strings. It runs as a daemon and communicates with clients using TCP/IP.

Redis implements its own size-aware memory allocation function called zmalloc, which returns the value of malloc without aborting automatically when it’s NULL. All the internal utility modules in Redis faithfully propagate a NULL from zmalloc up to the application layer. When the application layer detects a returned NULL, it calls the oom function which does the following:

/* Redis generally does not try to recover from out
 * of memory conditions when allocating objects or
 * strings, it is not clear if it will be possible
 * to report this condition to the client since the
 * networking layer itself is based on heap
 * allocation for send buffers, so we simply abort.
 * At least the code will be simpler to read... */
static void oom(const char *msg) {
    fprintf(stderr, "%s: Out of memory\n",msg);
    fflush(stderr);
    sleep(1);
    abort();
}

Note the comment above this function [4]. It very clearly and honestly summarizes why the abort policy is usually the most logical one for applications.

Conclusion

In this article, the various OOM policies were explained, and many examples were shown from real-world libraries and applications. It is clear that not all tools, even the commonly used ones, are perfect in terms of OOM handling. But how should I write my code?

If you’re writing a library, you most certainly should use the recovery policy. Aborting or dumping core in case of an OOM condition is impolite at best, and renders your library unusable at worst. Even if the application that includes your library isn’t some high-reliability life-support controller, it may have ideas of its own for handling OOM (such as logging it somewhere central). A good library does not impose its style and idiosyncrasies on the calling application.

This makes the code a bit more difficult to write, though not by much. Library code is usually not very deeply nested, so there isn’t a lot of error propagation up the calling stack to do.

For extra points, you can allow the application to specify the allocators and error handlers your library will use. This is a good approach for ultra-flexible, customize-me-to-the-death libraries like SQLite.
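Here is a rough sketch of what letting the application specify the allocators could look like. The library name mylib and everything in it (mylib_allocator, mylib_set_allocator, mylib_strdup, mylib_free) are invented for this example, as is the counting_alloc function that plays the application's side. It is not modeled on the API of any real library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library "mylib": the application may install its own
 * allocation functions before using the library. */
typedef struct {
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
} mylib_allocator;

/* Default to the system allocator. */
static mylib_allocator current = { malloc, free };

void mylib_set_allocator(const mylib_allocator *a)
{
    assert(a != NULL && a->alloc != NULL && a->release != NULL);
    current = *a;
}

/* Library routine: returns a heap copy of s, or NULL on OOM.
 * It never aborts; the caller chooses the policy. */
char *mylib_strdup(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;
    copy = current.alloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    return copy;
}

void mylib_free(void *p)
{
    if (p != NULL)
        current.release(p);
}

/* Application side: an allocator that counts calls and can
 * simulate an out-of-memory condition. */
static int alloc_calls = 0;
static int simulate_oom = 0;

void *counting_alloc(size_t size)
{
    alloc_calls++;
    return simulate_oom ? NULL : malloc(size);
}
```

Because every allocation in the library goes through one table, an application can add logging, pooling, or fault injection in one place, which also makes the OOM paths testable.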

If you’re writing an application, you have more choices. I’ll be bold and say that if your application needs to be so reliable that it must recover from OOM in a graceful manner, you are probably a programmer too advanced to benefit from this article. Anyway, recovery techniques are out of scope here.

Otherwise, IMHO the abort policy is the best approach. Wrap your allocation functions with some wrapper that aborts on OOM – this will save you a lot of error checking code in your main logic. The wrapper does more: it provides a viable path to scale up in the future, if required. Perhaps when your application grows more complex you’ll want some kind of gentle recovery like Git does – if all the allocations in your application go through a wrapper, the change will be very easy to implement.

[1] The documentation of g_error states:

A convenience function/macro to log an error message. Error messages are always fatal, resulting in a call to abort() to terminate the application. This function will result in a core dump; don’t use it for errors you expect. Using this function indicates a bug in your program, i.e. an assertion failure.

[2] Embedded in the sense that it can be embedded into other applications. Just link to the 500K DLL and use the convenient and powerful API – and you have a fast and robust database engine in your application.
[3] Here’s the size-checking function from the same file:
static int sqlite3MemSize(void *pPrior){
  sqlite3_int64 *p;
  if( pPrior==0 ) return 0;
  p = (sqlite3_int64*)pPrior;
  p--;
  return (int)p[0];
}
[4] I’ve reformatted it to fit on the blog page without horizontal scrolling.

what is pod weaver? (pt. 1: secret origins)

One or two people who write Pod regularly said, “Yeah, I saw you blogging about
that Pod thing. I had no idea what you were talking about.” A few other
people said, “neat, but how do I use it?” Its documentation is getting better,
but here’s a crash…

Ted Leung: The LumaLoop

Back in September, my friend James Duncan Davidson stopped to visit me and the family here on Bainbridge Island. Duncan has been working on a new design for a camera strap, and during that visit he showed me one of the prototypes of the LumaLoop. I spent a good portion of our time playing with the strap, and was quite taken with the design. Needless to say, I didn’t really want to give it back to him when it was time for him to go.

The following week at DjangoCon, I lost the strap portion of my Upstrap quick release strap. I liked the Upstrap, but it wasn’t ideal. The Upstrap was great because of the non stick rubber pad that they use – it really won’t move. But like most other camera straps, I found that I was constantly getting it fouled in my arms or something, especially between landscape and portrait modes.

Duncan had promised me one of the early prototypes of the LumaLoop, so I put the official black and neon yellow strap on the D3 and waited patiently. Yesterday, my LumaLoop arrived, and I quickly installed it in place of the Nikon strap. The LumaLoop is a “sling strap” similar to the Black Rapid R-Straps that have become popular recently. The Black Rapid straps screw into the tripod socket on your camera, which is a problem if you have any kind of heavy duty tripod plate mounted on your camera, or if you shoot vertically a lot (this is even more of a problem if you have small hands and a camera with a battery grip). The LumaLoop attaches to one of the regular strap mounts on your camera, and once attached, you can slide the camera up and down the strap. The mounting loop is attached with a quick release clip, so swapping cameras/straps is easy as well. Duncan has a series of blog posts that detail the reasoning behind the design:

Here’s a quick snapshot of mine:

My Luma Labs LumaLoop camera strap

You can see the loop part that goes on the camera, as well as the quick release between the loop and the rest of the strap. It’s a bit harder to see the padded non-slip shoulder pad.

The LumaLoop is going to be available from Luma Labs sometime very soon (Duncan gave me permission to talk about the LumaLoop in advance of its general availability). You can follow Luma Labs on Twitter to keep up with all of the news and the official announcement. I’m excited to have a strap that both holds my camera securely and stays out of my way when the action gets going.

(Unmoderated) manual notes are bad, mmkay? – Adam Harvey

I’ve had a couple of whinges on IRC lately about why I’m not thrilled with having user notes in their current form in the PHP manual; we get entirely too many questions in ##php from people who’ve copied code out of a note and are then annoyed when it turns out the code is wrong, broken, horrible, or all of the above.

I present this example from the DateTime::getTimestamp() manual page. It’ll be disappearing from the mirrors over the next few hours, because I’ve deleted it (and posted a much simpler note in its place), so here was its content, for posterity:

If you are using PHP < 5.3.0 you can use this function instead:

<?php
function DateTime_getTimestamp(&$dt) {
$dtz_original = $dt -> getTimezone();
$dtz_utc = new DateTimeZone("UTC");
$dt -> setTimezone($dtz_utc);
$year = intval($dt -> format("Y"));
$month = intval($dt -> format("n"));
$day = intval($dt -> format("j"));
$hour = intval($dt -> format("G"));
$minute = intval($dt -> format("i"));
$second = intval($dt -> format("s"));
$dt -> setTimezone($dtz_original);
return gmmktime($hour,$minute,$second,$month,$day,$year);
}
?>

It’s fair to say that’s an interesting approach. The normal way of doing it would be:

<?php $timestamp = $dt->format('U'); ?>

I don’t know what the answer is — moderation has its own problems to do with workload, as PEAR can attest — but a system that’s letting that go up as recommended practice (and stay up for a month) has to be looked at.

Q&A: Answering Some Questions About Object-Oriented Programming – Brandon Savage

Last week I wrote about five tips to improve object-oriented code. This generated a number of important questions, which I will attempt to answer for those who asked them.

“Often times when a developer gives each object only one responsibility, they tightly couple objects together.” Can you explain?
There are two major pitfalls in object-oriented programming: trying to do too much with an object, and trying to couple a number of objects too closely together. For this example, we’ll use the engine metaphor.

<?php

class Engine {
    protected $crankshaft;
    protected $pistons;
    protected $radiator;

    public function __construct()
    {
        $this->crankshaft = new Crankshaft($this);
        $this->pistons[] = new Piston($this);
        $this->pistons[] = new Piston($this);
        $this->pistons[] = new Piston($this);
        $this->pistons[] = new Piston($this);
        $this->radiator = new Radiator($this);
    }
}

In the above example, the Engine has one job: to direct the function of the crankshaft, pistons and radiator in order to move an automobile along. But we have a couple of fatal flaws. The first fatal flaw is that we pass the Crankshaft, Pistons and Radiator a copy of the Engine; while this might make development easy at some point (because the Radiator can send messages to the Engine) it is a poor design. It’s a poor design because the Engine controls the components, not the other way around. Additionally, we are instantiating the Crankshaft, Pistons and Radiator inside our constructor method, which means that the same Crankshaft, Piston and Radiator classes must be available wherever we go. Let’s decouple this and fix it.

<?php

class Engine {
    protected $crankshaft;
    protected $pistons;
    protected $radiator;

    public function __construct(CrankshaftI $crankshaft, array $pistons, RadiatorI $radiator)
    {
        $this->crankshaft = $crankshaft;
        foreach ($pistons as $piston) {
            // The array type hint cannot check elements, so verify each piston by hand.
            if (!($piston instanceof PistonI)) {
                throw new EngineException('Improper piston type used');
            }
            $this->pistons[] = $piston;
        }
        $this->radiator = $radiator;
    }
}

How is this an improvement? First, we make use of dependency injection in our constructor: rather than automatically creating new objects, we expect that the objects have already been created and are being “installed” (injected) into our Engine. Second, we require that the objects utilize a certain interface. Now, some might argue that this is still tight coupling but I disagree: while you must have the interface available to you, interfaces do not define function. They simply define the methods that are available and must have been defined.

In this case, you can pull these interfaces and the Engine class out and build the library elsewhere, so long as you include the various methods of CrankshaftI, PistonI and RadiatorI. This is acceptable because every Piston will have certain methods (fire, upSwing, downSwing, injectFuel) but the functionality can be different depending on the type of piston you want.

This also gives every object its own job. The Piston is only responsible for doing Piston-related tasks; the Engine is responsible for controlling the Piston, but not executing its specific job functions. The Piston doesn’t have to be smart about the Engine or know about the Radiator; it only needs to know about its job. And so, each object has only one clearly defined role.

I do not know about dependency injection – do you have any links that do not require subscription?
I’ll talk about Dependency Injection here. It’s actually a really simple topic.

Since PHP 5, when you pass an object into a function, class, or by value, what you’re actually doing is passing it in a reference-like state. I say “reference-like” because it behaves somewhat like a reference but somewhat not (to learn more read this). When you act on the object, you change the object globally – that is, all instances of the object are changed. This is because when you pass an object by value you’re not copying the object; you’re simply passing the internal PHP value tha

Truncated by Planet PHP, read more at the original (another 3230 bytes)
