PHP 5.3 e seu Garbage Collector

Written by Pedro Mendes · 3 min read >

O PHP tem(ou tinha) um pequeno problema para tratar referências circulares, principalmente entre objetos. No blog do Alex Netkachov, um ZCE desconhecido da vida que as vezes posta umas coisas legais, ele fez o seguinte benchmark :

class Node {
public $parentNode;
public $childNodes = array();
function Node() {
$this->nodeValue = str_repeat('0123456789', 128);
function createRelationship() {
$parent = new Node();
$child = new Node();
$parent->childNodes[] = $child;
$child->parentNode = $parent;
echo 'Initial: ' . number_format(memory_get_usage(), 0, '.', ',') .
" bytesn";
for($i = 0; $i < 10000; $i++) {
echo 'Peak: ' . number_format(memory_get_peak_usage(), 0, '.', ',') .
" bytesn";
echo 'End: ' . number_format(memory_get_usage(), 0, '.', ',') .
" bytesn";

E o output disso foi:

Initial: 62,224 bytes
Peak: 34,905,216 bytes
End: 34,905,016 bytes

Muito interessante o resultado não ? 34mb para 10.000 requisições. Segundo o próprio Alex, isso seria o equivalente a “Modern systems with 1-2Gb can run 30-60 similar processes at the same time. But in shared hosting environment or when support of 100 simultaneous connections is a system requirement, such memory consumption is inadequate.”

Pois bem, um cientista da computação de algum lugar do mundo chamado David Wang escreveu um patch para o garbage collector do PHP sanando este problema com as referências circulares. Ele mesmo confessa que o PHP usa “a simple garbage collection system”, precisando de alguns improvements. Ele publicou um resumo do de suas idéias relativas a este patch e ao problema em si no, mas eu trago abaixo uma cópia da mensagem dele:

Hello, everyone. My name is David Wang and I am one of the students
participating in Google Summer of Code this year. As you may remember,
my project is to implement a garbage collector for circular references
in PHP. As the midterm for Summer of Code is coming up, my mentor,
Derick Rethans, thought it would be a good idea if I shared my
progress with the community.

As you know, PHP uses reference counting as a simple garbage
collection system. One of the primary weaknesses of reference counting
systems is that objects that refer indirectly or directly to
themselves, i.e. reference cycles, are not collected. The accumulation
of unreferenced memory that is only deallocated at the end of a
request can be prohibitively expensive. The cycle collector I am
implementing directly addresses this problem.

The cycle collector is, of necessity, a type of tracing garbage
collector. However, it uses reference count information to accelerate
the collection process. Essentially, it works by removing the internal
reference counts (that is, references from an object within the cycle
to another object within the cycles )from candidate cycles. In a cycle
that can be collected, after all internal reference counts have been
removed, the reference count of all objects within the cycle should be
0. The particular algorithm that I am using is the synchronous cycle
collector described by David Bacon and V. T. Rajan in Concurrent Cycle
Collection in Reference Counted Systems (see the article for further
details). Various optimizations allow the cycle collector to be run
relatively infrequently and execute speedily when it does run.

Note that because PHP is designed to be single-threaded, a synchronous
algorithm is used which does pause program execution for cycle
collection when cycle collection becomes necessary. If a program has
no reference cycles, the cycle collector does not run at all.

The initial phase of implementation is complete, and I am currently
profiling, optimizing and trying the modified version of PHP on
various test programs in an effort to find bugs. Meanwhile, here are
some recent benchmarks.

Some aspects of the eZ Components (,
available from SVN with the instructions here:
testing suite use reference cycles heavily. The Template test uses
circular references the most frequently, the Graph test also uses
circular references, albeit less heavily.

On the Graph test, maximum memory usage with unmodified PHP was 133.9
MB with an execution time of 8 seconds.
On the Graph test, maximum memory usage with gc was 51.6 MB with an
execution time of 9 seconds.
On the Template test, maxmium memory usage with unmodified PHP was 1.5
GB with an execution time of 30 seconds.
On the Template test, maxmium memory usage with gc was 67.3 MB with an
execution time of 1 minute.
On the whole suite of tests (which includes the Graph and Template
tests), execution time with unmodified PHP was 12:03. With cycle
collection, it was 12:43.
These tests were conducted on my dual core AMD X2 4400+ desktop with
./configure --with-gd --with-jpeg-dir --with-zlib. As you can see,
there is the classic time vs. memory trade-off.

My project is currently being hosted on the xdebug CVS. You can get
the latest version with the following commands:

cvs -d login
CVS password: srmread
cvs -d co circular
Note that I'm implementing my project on top of a CVS version of PHP
5.2 that is a couple of weeks old. However, the cycle collector can be
ported into other versions of PHP fairly easily as it does not affect
the existing Zend engine code too much.

If anyone has any questions, I'll be more than happy to answer them!


Yiduo (David) Wang

Fico feliz por essa melhoria, pois é uma pequena alteração que faz muita diferença. Só espero que não tenha seus efeitos colaterais…

Magento em um ambiente distribuído

Pedro Mendes in PHP
  ·   4 min read

Factory Pattern e PHP

Pedro Mendes in PHP
  ·   9 sec read

Deixe um comentário

O seu endereço de e-mail não será publicado.