Evolving CA Synchronization - The End (for now)

This post is long overdue.

I finished this project (and with it my MSc in Software Engineering at CIT) last September (2010). I am attaching the final thesis and appendix with the code (also available in a github repository).

The project went well from an academic standpoint (got a 90/100) but since I only had 6 months in total I had to focus on getting 'results' in terms of algorithmic efficiency (mainly performance improvements on the original EvCA for both majority classification and global synchronization). On the other end I was only able to get a glimpse at more interesting topics, such as the identification of a pattern in famous Majority Classification rules that could lead to shrinking the search space (as discussed in previous posts).

I would advise anyone interested in the project to have a look at the Future Research section of the thesis for a summary of interesting topics that I could not cover and the Conclusions for a high level view of the project and a summary of what has been achieved. 

As a conclusive thought, it would be awesome if someone could take the famous majority classification rules I collected and do some pattern extraction on them (there's a bunch here plus another one here). In that case please let me know if anything comes up! 

Either way, I am hoping to be able to get back to this research sooner rather than later, but I am not sure when this will be possible since I am starting work on a new topic (will start a blog soon on that and will probably advertise here).

Click here to download:
EvolvingCASynchronization.pdf (471 KB)
(download)

Click here to download:
AppendixA.pdf (70 KB)
(download)

Best known CA rule for Majority Classification

I managed to miss it in a previous post with the best rules for the MC task, this is the best known rule for CA density classification (Wolz, de Oliveira - 2008) - fitness 88.9% calculated over 10^4 ICs.

Here's the rule (in Wolfram notation):

11111101 11111100 11001100 11110000 11111110 00100000 10000100 11101000 
11111101 01110011 00000000 11110000 00111110 11100000 10000110 00101000

Mind blowing - the high level (or low pass) strategy seems to be very similar to rules with lower fitness (even GKL) but there seems to be almost a higher 'fractal dimension', as if (speculation) the same strategy was being applied over and over at different scales.

You can see the rule in all its majesty in the sample runs below.

(download)

Evolving Cellular Automata Synchronization - Git Repository With Matlab code

I created an open git repository called EvolvingCASynchronization with all the Matlab code written so far. 

You will find it's not a lot of code, as I am leveraging the Matlab GA toolkit, so the only things I had to write were fitness functions and functions to generate initial population with uniform density distribution. Fitness functions for global synchronization and majority classification can be found in their own folders (see list below), while the function to generate the initial population plus a number of helper functions reside in the Common folder. I also left a few logs in just give an idea of what the output of the fitness function is.

Brief list of contents by folders:
  • Common: a bunch of helper functions and classes, including a function to generate initial population for the GA
  • MajorityClassification: fitness function for majority classification + ad-hoc code to visualize/evaluate rules
  • GlobalSynchronization: same as majority classification but for the global synchronization task
The code is inherently ugly and not particularly optimized (as kind people pointed out before)  and this mostly coming from my lack of familiarity with mat-lab (and also from my lack of time to go back and make everything nice once it worked), so any advice is welcome (I doubt I'll change anything for a while but you can always fork it!).

Hopefully this will be helpful to someone.

P.S. 
It occurred to me someone may just want play with CA in Matlab, in that case I'll spare you going through all the code, here is sample code to run a rule of your choice for a 1D radius 1 CA (running Wolfram rule 90):

Effective Complexity and Compressibility of Majority Classification rules in the literature

In the previous post I listed a group of well known rules for majority classification, and put forward the thought that finding a pattern might lead to enormously shrinking the search space for the problem of evolving majority classification rules. 

 
I did not find a pattern so far - it would be interesting to spend time on it but also it kind of clashes with the scope of this project - but I went through the exercise of compressing the rules (thanks to Tom for the idea and to Tarelli for helping with a handy compression tool) and after that I tried to calculate their "effective complexity", with interesting results.
 
Effective Complexity was calculated applying a loose definition of Effective Complexity by Murray Gell-Mann, basically "the length of the description of the regularities" in a given string is used as a measure of complexity. I described "regularities" for the given rules in dry natural language, trying to be consistent in terms of formulation (using same conventions, abbreviations etc). Keeping in mind my descriptions of the rules are arguable and could be easily challenged (this is particularly valid for the last one, where I probably used some imagination, but the rule came out of a coevolution run and the structure seems a bit scrambled), results show that EC (just the length of the description itself) for the given set of rules increases with fitness:
 
GKL (Fitness: 81.6% - EC: 82)
 
2nd half is the same as the 1st except for 2nd and 6th blocks, which are inverted.
 
11111010 11111111 11111010 00000000 11111010 11111111 11111010 00000000
11111010 00000000 11111010 00000000 11111010 00000000 11111010 00000000
 
Davis (Fitness: 81.8% - EC: 118)
 
2nd half is the same as the 1st except for 2nd and 6th blocks. 2nd block has 2 inverted digits. 6th block is inverted. 
 
11111000 11111111 11111000 00000000 11111010 00111111 111101000 0000000
11111000 11110011 11111000 00000000 11111010 11000000 111101000 0000000
 
Das (Fitness: 82.17% - EC: 132)
 
2nd half is the same as the 1st except for 3rd and 8th blocks. 3rd block has 3 inverted digits while 8th block has 1 inverted digit.
 
11111111 11110000 10001100 11110000 11111111 11100000 00000000 11110000
11111111 11110000 00000000 11110000 11111111 11100000 00000000 11100000
 
Andre, Bennet, Koza (Fitness: 82.3% - EC: 149)
 
2nd and 6th blocks are the same between the 2 halves. 1st half is self similar with 2 alternating patterns. 2nd half is self similar with 3 patterns.
 
11111111 10101010 11111111 10101010 11111111 10101010 11111111 10101010
10100000 10101010 00000000 10100000 10100000 10101010 00000000 10100000
 
Ferreira 1 (Fitness: 82.5% - EC: 257)
 
2nd, 4th, 6th and 8th blocks are the same. 1st and 3rd blocks half digits are the same, the rest of the digits are inverted. 5th and 7th blocks are inverted. 1st half is self-similar with 3 patterns. 2nd half shows composite self-similarity with 3 patterns. 
 
11111111 10101010 11111111 10001000 11111111 10101010 11111111 10001000
11110000 10101010 11110000 10001000 00000000 10101010 00000000 10001000
 
Ferreira 2 (Fitness: 82.6% - EC: --)
 
same as previous but shifted - we can assume the same level of effective complexity.
 
11101110 11111111 10101010 11111111 11101110 11110000 10101010 11110000
11101110 00000000 10101010 00000000 11101110 00000000 10101010 00000000
 
Juillé and Pollack 1 (Fitness: 85.1% - EC: 301)
 
1st blocks are the same. The rest of the blocks often exhibit similarity on half of the digits of each block. Each block also often exhibit similarity between half of the digits in the non matching half block. Half of the non matching half block is often inverted compared to the 1st half of the rule.
 
11101010 10011111 10111100 10001111 11101000 11111111 00101101 10100000
11101010 10011100 11110000 10001000 11101011 00001100 00101000 10000000
 
Juillé and Pollack 2 (Fitness: 85% - EC: --)
 
same as previous but shifted - we can assume the same level of effective complexity.
 
11111010 11110011 11001010 11110000 11111010 11111111 10001000 11101000
11111010 01110011 00001010 00000000 00111010 00001100 10001010 00101000
 
 
And here's the results for rule compression with LZW, showing that for small increase in fitness compressibility seems to drop quite quickly:

Majorityclassificationrules_co

 
In summary, looks like compressibility is decreasing with increasing fitness and Effective Complexity increases with fitness
 
Another interesting test would be to calculate the Algorithmic Information Content of the regularities in the same rules (again, out of scope for me now), defined as the shortest program that produces a given string with the same regularities. This is in fact a more rigorous definition for effective complexity, though still not precisely what the autor (Gell-Mann) shows in his paper. 
 
Even though it is proven that calculated values for both Effective Complexity and AIC are "unprovable" (you can never be sure you've got the shortest definition or the shortest program), nonetheless they appear as useful tools for coarse grained estimates of complexity.

Majority Classification Rules in the Literature: Looking for a pattern

This is a collection of famous cellular automata (N=149, k=2, r=3) majority classification rules (in Wolfram notation) from relevant literature. Fitness is calculated over 10^4 random (uniformly distributed) ICs. 

Have a good look at them (respective sample runs are shown in the gallery below), as here
 obviously lies some kind of a pattern to be discovered:
 
GKL (1978) - human design - 81.6% (Fig.1)

11111010 11111111 11111010 00000000 11111010 11111111 11111010 00000000
11111010 00000000 11111010 00000000 11111010 00000000 11111010 00000000

Davis (1995) 
- human design - 81.8% 
(Fig.2)
 
11111000 11111111 11111000 00000000 11111010 00111111 111101000 0000000
11111000 11110011 11111000 00000000 11111010 11000000 111101000 0000000
 

Das (1995) 
- human design -  82.178% 
(Fig.3)
 
11111111 11110000 10001100 11110000 11111111 11100000 00000000 11110000
11111111 11110000 00000000 11110000 11111111 11100000 00000000 11100000
 
 
Andre, Bennet, Koza (1999) - genetic programming - 82.326% 
(Fig.4)
 
11111111 10101010 11111111 10101010 11111111 10101010 11111111 10101010
10100000 10101010 00000000 10100000 10100000 10101010 00000000 10100000

Ferreira (2001) - gene expression programming 1 - 82.5% 
(Fig.5)
 
11111111 10101010 11111111 10001000 11111111 10101010 11111111 10001000
11110000 10101010 11110000 10001000 00000000 10101010 00000000 10001000

Ferreira (2001) 
- gene expression programming 2 - 82.6% 
(Fig.6)
 
11101110 11111111 10101010 11111111 11101110 11110000 10101010 11110000
11101110 00000000 10101010 00000000 11101110 00000000 10101010 00000000

Juillé and Pollack (1998) - coevolution 1 - 85.1% 
(Fig.7)

11101010 10011111 10111100 10001111 11101000 11111111 00101101 10100000
11101010 10011100 11110000 10001000 11101011 00001100 00101000 10000000

Juillé and Pollack (1998) - coevolution 2 - 86.0% 
(Fig.8)

11111010 11110011 11001010 11110000 11111010 11111111 10001000 11101000
11111010 01110011 00001010 00000000 00111010 00001100 10001010 00101000
(download)
When 
found, a
 pattern would allow us to impose constraints on rule evolution that would greatly shrink the huge search space for this problem (on a CA with radius 3 we have a neighborhood of 7 cells, which means 2^7 = 128 bits rules, which means 2^128 = 3.40282367 × 10^38  possible combinations, far too many for a systematic thorough exploration). 

 

The issue of broken symmetry in evolving majority classification rules

The issue of symmetry breaking during evolution for the majority classification task has been discussed at length in the EvCA papers (see the 1994 one in particular as reference). 

It is intuitive to understand why symmetry is necessary to achieve high fitness rules - if a rule is indeed expected to correctly classify a majority of white cells in the same way as a majority of black cells, a spatial symmetry in the rule is to be expected. At the same time, less intuitive but equally important, if a rule is to perform well on both low and high densities of black (or white) cells, again symmetry in the rule configuration is certainly to be expected. 
 
But why is symmetry broken? Simmetry breaking, with the creation of specialist rules for low or high densities by selection pressures is a powerful attractor for the genetic algorithm. It is very easy for the algorithm to increase fitness by creating low/high density specialists, making it quite hard for the evolution to escape such deep basins of local minima where the population quickly gets absorbed between relatively high-fitness density asymmetric rules - such as the block expanding ones around 66% fitness as reported in a previous post. These rules represent in fact a significant improvement over the 50% fitness of the initial generations that comes out of random (high-density) configurations, representing a tricky (to say the least) obstacle for long term improvement of fitness. 
 
The considerations above are thought to be applicable to biological evolution - where symmetry breaking is not difficult to find, with highly specialized organisms thriving in all sorts of niches. The short term gain will allow gateway events and evolutionary jumps that will provide huge benefit in the short term, preventing further jumps in the long terms. In a sense we could look at local minima basins as ecological niches. Evolution will fill the niche and the only chance to get out of it is by building on top of past "mistakes" (mutation must have a big role in this).
 
If it is true that genetic algorithms are known to escape local minima easy enough compared to greedy search algorithms, in our scenario this does not happen because the local minima represents too deep a basin for evolution to be able to randomly walk out of it easily, so the genetic algorithm behaves "as designed" and it is up to ourselves to understand inherent limitations of the search space and try to put constraints on the schema in order to overcome asymmetric dead ends.
 
This problem could for example be overcome by penalizing asymmetric rules or by building symmetry into the representation, which is probably not as trivial as it sounds. Let's just have a look at the well known GKL rule (81.6% fitness, shown in Wolfram notation) to see what kind of symmetry it exhibits:
Gkl_simmetry
The GKL rule certainly exhibits some symmetry - but definitely not a parochial one. This symmetry is usually defined as composite, since it is neither simple nor specular. So it is easy to see from this simple example that devising a way of enforcing symmetry on the transformation rules is not a trivial effort, probably not particularly difficult but - again - not trivial and worthy of a dedicated study (for all I know someone could have already investigated it but I could not find reference to any work on this particular subject). I regret not having time to explore this fascinating area (it would definitely be a nice detour), since my project is due next summer. 

EvCA majority classification and first tweaks

I recently managed to run the first batch of experiments reproducing the EvCA setup for majority classification using Matlab GA Toolkit (which I have to say is pretty good). 

I ran some tests on an Amazon EC2 heavy computation unit (8 CPUS, 2.4Ghz each) which got the job done but turned out being too expensive (it's 0.6€ per hour taking ~ 2.5 days for each run, do the math), so I am now running the experiments on my personal machine (iMac QuadCore i7 2,8 Ghz - runs 8 matlab labs with hyper-threading, which is a tad slower but not too bad).
 
From the fitness curve below from one of the runs (fig.2) you can see that the fitness ends up oscillating between 0.7 and 0.8 (calculated over 100 ICs). If ran over 10^4 uniformly distributed ICs - I like to call this measurement deep fitness - the fitness obtained for one of the best performing rules in the first few runs is around 66%, lousy but in line with the simple block expansion rules discovered by the EvCA group reported as 65.2% in fitness (see the 1995 paper The Evolution of Emergent Computation for reference, which also reports best fitness of 76.9% for one of the particle-based rules discovered over different GA runs, which my few runs so far did not discover). I am showing below the fitness curve (fig.2, best fitness and mean on the top refer to the last generation) and one of the simple expanding blocks rules discovered (fig.1):
(download)
After the standard runs, I started tweaking the fitness function for speed in the most obvious way - by cutting down the number of time steps for fitness evaluation. The standard time steps are 320 (in the original setup a Poisson distribution centered at 320 is used, but I am simply fixing it to 320), I thought I'd try to make an early decision adding as a constraint that after half time steps the classification should be at least 5% under way in the correct direction, so for example if after 160 t steps a rule is classifying the IC in the wrong direction I am assuming it won't converge, and going to the next IC (again, a single rule is ran over 100 ICs to evaluate fitness while running the algorithm).
 
This led to no particularly good results, 100 generations are overall evolved faster but we get basically to the same range of fitness much slower. I am obviously forcing the potentially good but slow rules to die off in the beginning, favoring the fast but no so good ones and compromising diversity, and if that's not enough the problem is known to be inherently "difficult" to solve in a short amount of time steps (the GKL rule takes around 90 steps to correctly classify 81.5% of the times). Nonetheless, an interesting behavior can be observed, have a look at the fitness curve shown in figure 3 above.
 
What I find interesting is the accumulation of "potential" that can be observed by the gradual improvement in the mean of fitness values. This accumulation pairs up with the slow erosion (or drift) of the population by mutation (rate is 1.56% of bit changed per offspring) which doesn't alter the expression of the best performing phenotypes but prepares for a "gateway event" resulting in cascading sudden improvements, which can be observed as a very steep variation on the fitness curve, representing the avid descent of the population in the newly discovered fitness basin. Would increasing the mutation rate cause the gateway event to be anticipated? Worth a shot - but what makes this behavior fascinating to me regardless is that it is often observed in biological evolution of complex adaptive systems.

 

Consolidation on GKL Fitness

I recently consolidated the work on the majority classification fitness function and tested it on the famous GKL rule over 10^4 initial configurations. I already talked about this, but the previous test was less controlled and the fitness function I am using has changed quite a bit since then.

I got 81.56% convergence for the best run, in line with figures given by the 1996 Andre, Bennet III, Koza paper (Discovery by Genetic Programming of a Cellular Automata Rule that is Better than any Known Rule for the Majority Classification Problem), which indicates 81.6% convergence over 10^6 ICs, so the numbers are giving me a bit of confidence.
 
Every time GKL misclassifies an initial configuration my heart skips a beat. Truth is that for a density around 50% of the lattice size this rule is no better than the toss of a coin (the interval of misclassification is supposed to shrink for lattice size --> infinity but this has never been proved). This one for example with 74 black cells out of 149 should have converged to all white but "decided" to go black instead:
Gklmisclassification
It's interesting to note that GKL will converge in around 90 time steps on average, correct or not. 
 
I will be using the current version of the fitness function as input for the Matlab GA Toolkit, which I have to say is quite nice. Almost everything is customizable and it provides 90% of the scaffolding required for GA investigations out-of-the-box. I also implemented a function to generate the initial population to guarantee maximum diversity, which can be provided to the GA Toolkit as well. The first few test runs were promising - time permitting, I will probably report here in some detail about the integration with matlab GA Toolkit.

What randomness looks like

I implemented my fitness function for majority classification (which still needs to be optimized, will probably cover soon) and successfully integrated it with the Matlab GA Toolkit (part of the Global Optmization Toolkit) but cannot run full evolutions at the moment as I am being slowed down by some technical problems with my machine, which is currently crippled (thanks Dell). 

Out of boredom I just ran a random CA initial configuration with a random rule. I believe this is a good picture of what randomness looks like (and don't tell me that 1% of this is actually the afterglow radiation from the big bang):
Randomness

It's also easy to see - in front of such a scenario - how a single change in the initial configuration could lead to completely different results, even if the system spatial configuration evolves in time according to a simple fixed rule (which is in this case a random one). If this stands valid on such a small 1-D problem, just imagine on a real-world one. This is a core concept in the study of complex systems, by definition highly sensible to initial conditions.

Hadn't really thought of it in these terms before - I start to understand the nature of what fascinated Stephen Wolfram. 

Also, this would make for a beautiful gigantic poster!

GKL rule performance fitness

In a previous post I explored a demonstration of the GKL rule for majority classification. Remind that the GKL rule is the best known rule of human design for the majority classification task, but just how good is this rule?

To answer this question, in this post I shall introduce the concept of performance fitness, as used by the EvCA group in their experiments, and I will then show my results in terms of performance fitness for the GKL rule.

The concept of performance fitness is a very simple one - a given number of random CA Initial Configurations (ICs) are evaluated for a given rule. The rule is designed to achieve a given final configuration. Performance fitness is defined as the ratio of successful runs (the CA reaches the desired state within a given number of steps) divided by the number of total runs. Another common way of evaluating fitness for CA rules is proportional fitness, which we are not interested in (you can intuitively guess from the name what's the difference).

Back to our point, I was curious about the actual performance fitness of the GKL rule so I put together a loop that would run the GKL rule over a number of random configuration and calculate the ratio between successful runs (majority classification is performed correctly) and total number of runs. You can have a look at the entire matlab code here. You can see how I am using a standard lattice size of 149 (odd number chosen to avoid having the same number of 0s and 1s in the ICs) and 100 ICs to test the rule, half with more 0s and the other half with more 1s (that stuff is taken care of in a dedicated function). ICs are randomly generated as uniform distributions of [0,1] which means that a plot of the rules would be strongly peaked toward half 0s and half 1s. Everything setup, again, as per EvCA experiments. 

These are my results for 10 runs (very low):

run 1 - fitness: 0.82
run 1 - Elapsed time is 83.646629 seconds.
run 2 - fitness: 0.81
run 2 - Elapsed time is 160.758902 seconds.
run 3 - fitness: 0.84
run 3 - Elapsed time is 149.284595 seconds.
run 4 - fitness: 0.84
run 4 - Elapsed time is 84.475817 seconds.
run 5 - fitness: 0.85
run 5 - Elapsed time is 76.301021 seconds.
run 6 - fitness: 0.81
run 6 - Elapsed time is 76.448351 seconds.
run 7 - fitness: 0.84
run 7 - Elapsed time is 85.560581 seconds.
run 8 - fitness: 0.88
run 8 - Elapsed time is 80.356827 seconds.
run 9 - fitness: 0.84
run 9 - Elapsed time is 79.166180 seconds.
run 10 - fitness: 0.82
run 10 -Elapsed time is 85.654584 seconds.

avg fitness: 0.835
Elapsed time avg: 96.16 seconds

After the 10 runs above, I tried to run the same GKL rule over 10^4 ICs in one single batch and got - again - similar results (around 0.815). These results are in line with the results given by this EvCA paper (Evolving cellular automata to perform computations: Mechanisms and impediments - 1994) which states that the fitness of the GKL rule should be around 0.972 when averaged on 10^4 random uniform ICs, organized in 19 equally spaced concentration bins (which means that in the bins with high 0/1 concentration the fitness is very high while in the ones where the ratio is close to one [half 0s -half 1s] fitness is significantly lower). Note that I am not using the concept of bins (most of which have very high fitness, exception made for the few bins with concentration of 1s very close to 50%) so my fitness is significantly lower (because calculated differently). 

Anyway - the results that I got from running GKL fitness, gave me a pretty good idea of how long it will take to run my experiments (an average of 1.5 mins for running a set of 100 ICs) and at this stage my code is very close to what I envision to be my fitness function for rule evolution - I will just have to modify it to make it more general so that it takes 128 bits (the rule itself - where CA radius = 3) as input and it spits out a scalar as output, representing the performance fitness for the given input rule. I will then plug-in this function into the matlab GA Toolkit and reproduce the EvCA experiments as they are, before starting to tweak my fitness function (I have a few ideas for this, focusing on using particular strategies to generate ICs rather than picking at random) in the hope of finding a better strategy for rule evolution.