In the previous post I listed a group of well known rules for majority classification, and put forward the thought that finding a pattern might lead to enormously shrinking the search space for the problem of evolving majority classification rules.
I did not find a pattern so far - it would be interesting to spend time on it but also it kind of clashes with the scope of this project - but I went through the exercise of compressing the rules (thanks to
Tom for the idea and to
Tarelli for helping with a handy compression tool) and after that I tried to calculate their "effective complexity", with interesting results.
Effective Complexity was calculated applying a loose definition of
Effective Complexity by Murray Gell-Mann, basically "the length of the description of the regularities" in a given string is used as a measure of complexity. I described "regularities" for the given rules in dry natural language, trying to be consistent in terms of formulation (using same conventions, abbreviations etc). Keeping in mind my descriptions of the rules are arguable and could be easily challenged (this is particularly valid for the last one, where I probably used some imagination, but the rule came out of a coevolution run and the structure seems a bit scrambled), results show that EC (just the length of the description itself) for the given set of rules increases with fitness:
GKL (Fitness: 81.6% - EC: 82)
2nd half is the same as the 1st except for 2nd and 6th blocks, which are inverted.
11111010 11111111 11111010 00000000 11111010 11111111 11111010 00000000
11111010 00000000 11111010 00000000 11111010 00000000 11111010 00000000
Davis (Fitness: 81.8% - EC: 118)
2nd half is the same as the 1st except for 2nd and 6th blocks. 2nd block has 2 inverted digits. 6th block is inverted.
11111000 11111111 11111000 00000000 11111010 00111111 111101000 0000000
11111000 11110011 11111000 00000000 11111010 11000000 111101000 0000000
Das (Fitness: 82.17% - EC: 132)
2nd half is the same as the 1st except for 3rd and 8th blocks. 3rd block has 3 inverted digits while 8th block has 1 inverted digit.
11111111 11110000 10001100 11110000 11111111 11100000 00000000 11110000
11111111 11110000 00000000 11110000 11111111 11100000 00000000 11100000
Andre, Bennet, Koza (Fitness: 82.3% - EC: 149)
2nd and 6th blocks are the same between the 2 halves. 1st half is self similar with 2 alternating patterns. 2nd half is self similar with 3 patterns.
11111111 10101010 11111111 10101010 11111111 10101010 11111111 10101010
10100000 10101010 00000000 10100000 10100000 10101010 00000000 10100000
Ferreira 1 (Fitness: 82.5% - EC: 257)
2nd, 4th, 6th and 8th blocks are the same. 1st and 3rd blocks half digits are the same, the rest of the digits are inverted. 5th and 7th blocks are inverted. 1st half is self-similar with 3 patterns. 2nd half shows composite self-similarity with 3 patterns.
11111111 10101010 11111111 10001000 11111111 10101010 11111111 10001000
11110000 10101010 11110000 10001000 00000000 10101010 00000000 10001000
Ferreira 2 (Fitness: 82.6% - EC: --)
same as previous but shifted - we can assume the same level of effective complexity.
11101110 11111111 10101010 11111111 11101110 11110000 10101010 11110000
11101110 00000000 10101010 00000000 11101110 00000000 10101010 00000000
Juillé and Pollack 1 (Fitness: 85.1% - EC: 301)
1st blocks are the same. The rest of the blocks often exhibit similarity on half of the digits of each block. Each block also often exhibit similarity between half of the digits in the non matching half block. Half of the non matching half block is often inverted compared to the 1st half of the rule.
11101010 10011111 10111100 10001111 11101000 11111111 00101101 10100000
11101010 10011100 11110000 10001000 11101011 00001100 00101000 10000000
Juillé and Pollack 2 (Fitness: 85% - EC: --)
same as previous but shifted - we can assume the same level of effective complexity.
11111010 11110011 11001010 11110000 11111010 11111111 10001000 11101000
11111010 01110011 00001010 00000000 00111010 00001100 10001010 00101000
And here's the results for rule compression with LZW, showing that for small increase in fitness compressibility seems to drop quite quickly:
In summary, looks like compressibility is decreasing with increasing fitness and Effective Complexity increases with fitness.
Another interesting test would be to calculate the
Algorithmic Information Content of the regularities in the same rules (again, out of scope for me now), defined as the shortest program that produces a given string with the same regularities. This is in fact a more rigorous definition for effective complexity, though still not precisely what the autor (Gell-Mann) shows in his paper.
Even though it is proven that calculated values for both Effective Complexity and AIC are "unprovable" (you can never be sure you've got the shortest definition or the shortest program), nonetheless they appear as useful tools for coarse grained estimates of complexity.