Profiling ECA Rule 30 algorithms for O(n^y) where y < 2 (allegedly?)
I’ve made it my mission in life to create objectively the world’s least interesting headline. But I think (or sincerely hope) this post is worth the read. I talk a bit here about some new and interesting (to me at least) patterns I’ve found, in addition to summarising my profile testing of the ECA algorithm I’ve been developing.
In my previous post on reducing algorithmic effort for Elementary Cellular Automata (ECA), I explored using a pregenerated lookup table to skip rows when generating ECA output, so you could get from row 1 to row 100 in 20 steps rather than 100. I also identified the patterns that make this work, and put together some simple formulae to calculate the algorithmic complexity.
In that exercise I kind of got excited and submitted, then withdrew a paper when I realised (as some kind folk on reddit pointed out) that I hadn’t achieved the results I thought I did. I’ll just put that one on the backburner for now heh.
That said, I managed to achieve a fairly uniform constant-factor improvement in theoretical complexity, from O(n²) to a simple fraction of it — still quadratic in nature, but a practical improvement nonetheless.
Can one find the nth value with computational effort O(n^γ) for any γ<2 (I don’t know any method to achieve this)? — Stephen Wolfram
In my most recent tests it looks as though, practically, I’ve been able to reduce effort to O(n²/5) (measured directly by computation time in this instance, rather than mathematically derived per my last post).
To tie a nice ribbon around this particular avenue, I wanted to run some more robust practical tests, and also explore the dynamic lookup table option to see if that would improve performance. I also wanted to add infilling so that we can generate a full string of values from the central column up to any specified row n, and test the rhombus or diamond methods across all of these.
In any case, I tested up to ~50,000 rows for the various approaches. My PC is struggling to generate much more; I am definitely feeling the reality of quadratic-time algorithms, and I have restarted Chrome more times in the past 24 hours than I have in my entire career as a developer. I think the program itself could be optimised a lot, but the algorithm itself perhaps not so much.
(And I should switch my development environment to my local box — what can I say I was in the zone.)
Algorithms in the diagram
- What I labelled “standard” — Standard application of ECA Rule 30, giving roughly O(n²) complexity and commensurate run time
- “standard rhombus” — As above, but narrowing the environment during the second half of computation to reduce effort as we approach row n
- “pregen x” — pregenerating a lookup table that allows x generations to be skipped at a time (e.g. pregen 5 lets us generate every 5th row); the neighbourhood width for each table entry is 2x+1 cells
- “pregen x rhombus” — applying the rhombus optimisation to every xth row for the second half of generation as we approach n
- “dynamic” — generating a lookup table of increasing radius dynamically as rows are generated, rather than pregenerating it
- “dynamic rhombus” — applying the rhombus optimisation to the dynamic method
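To make the “pregen x” idea concrete, here’s a minimal sketch (my own illustration, not the linked implementation, and unoptimised): every (2x+1)-cell pattern is evolved x plain steps, shrinking by one cell per side each step since the outer cells’ neighbours are unknown, until only the centre cell’s value x rows down remains.

```javascript
const RULE = 30;

// One plain ECA step over the interior of a row: each output cell needs
// both neighbours, so the row shrinks by one cell on each side.
function shrinkStep(row) {
  const next = [];
  for (let i = 1; i < row.length - 1; i++) {
    const idx = (row[i - 1] << 2) | (row[i] << 1) | row[i + 1];
    next.push((RULE >> idx) & 1);
  }
  return next;
}

// Map every (2x+1)-bit pattern to the bit that appears under its centre
// x rows later; with this table, generation can jump x rows at a time.
function buildTable(x) {
  const width = 2 * x + 1;
  const table = new Array(1 << width);
  for (let p = 0; p < 1 << width; p++) {
    let row = [];
    for (let i = width - 1; i >= 0; i--) row.push((p >> i) & 1);
    for (let s = 0; s < x; s++) row = shrinkStep(row);
    table[p] = row[0]; // the single surviving cell
  }
  return table;
}
```

For x=1 this is just the rule’s own 8 entries; for x=5 it’s a 2^11-entry table. That 2^(2x+1) growth is also why the step size can’t simply keep increasing.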
I spent a while tweaking the rate at which entries are generated dynamically with various exponents, wait cycles etc, and it seems the limit for the dynamic approach is ~O(n²/5).
Basically, if you generate too many lookup entries too quickly, the algorithm bogs down generating lookup tables; too slowly, and it falls behind the more naive algorithms. Maybe there are more tweaks to be made, but I don’t think any of them will flip the graph, and I’d rather step away from the code at this point and figure out the limits mathematically.
Anyway, after all this brute-force testing it appears the pregenerated lookup table with step size x=5 is the best option in all but small numbers of generations (at which point generating the lookup table is a net cost).
Also, applying the “rhombus” optimisation worked well on the basic Rule 30 implementation, but didn’t provide any tangible benefit to the dynamic or pregenerated lookup tables — perhaps because they’re already skipping many lookups in the in-between rows, so it doesn’t make much of a difference.
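For intuition on the rhombus shape, here’s a sketch (my own illustration, not the linked code) of computing just the centre cell of row n with the plain rule. Row r only needs columns within min(r, n − r) of the centre: cells outside the expanding light cone from the seed are all zero, and cells outside the contracting dependency cone of the target can’t influence it. The computed region is therefore a diamond rather than a full triangle, roughly halving the work while staying quadratic.

```javascript
const RULE = 30;

// Centre cell of row n of rule 30 from a single-1 seed, computing only
// the diamond of cells that can actually influence it.
function centreCell(n) {
  let row = [1]; // row 0: the single seeded cell at column 0
  let left = 0;  // column index corresponding to row[0]
  for (let r = 1; r <= n; r++) {
    // Columns needed at row r: within r of the centre (light cone) AND
    // within n - r of the centre (dependency cone of the target cell).
    const bound = Math.min(r, n - r);
    const get = (c) => {
      const i = c - left;
      // Cells outside the stored band are only ever requested when they
      // lie outside the light cone, where they are genuinely 0.
      return i >= 0 && i < row.length ? row[i] : 0;
    };
    const next = [];
    for (let col = -bound; col <= bound; col++) {
      const idx = (get(col - 1) << 2) | (get(col) << 1) | get(col + 1);
      next.push((RULE >> idx) & 1);
    }
    row = next;
    left = -bound;
  }
  return row[0];
}
```

The expanding half does ~n²/8 cell updates and the contracting half the same, versus ~n²/2 for the full triangle — a constant-factor saving, consistent with the profiling above.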
Dynamic Algorithm Explained
Essentially we start our lookup table (or tree) with the basic 8 patterns of the given rule set (in this case rule 30), such as this one:
We keep track of which pattern set is our largest active one, and progressively fill in the blanks for the next skip step size.
To generate every 2nd row, you’d need to fill a radius of 2 (or a pattern width of 5 cells), e.g.
Having filled out all entries for the next step size, it becomes the current one, and the algorithm uses it to jump that many rows at a time. There are 2^PatternWidth entries needed for each step size.
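One way the growth step can work is sketched below (again my own illustration with assumed representations, not the linked code): `tables[x]` holds the completed step-x entries, and each step-(x+1) entry is filled by applying one plain ECA step to the wider pattern, then consulting the step-x table for the remaining drop.

```javascript
const RULE = 30;

// tables[x]: (2x+1)-bit pattern (integer, leftmost cell = highest bit)
// -> the bit under the pattern's centre x rows later.
// tables[1] is just the rule's own 8 entries.
const tables = { 1: Array.from({ length: 8 }, (_, p) => (RULE >> p) & 1) };

// Fill in the blanks for the next skip step size using the current one:
// one plain ECA step shrinks a (2x+3)-cell pattern to its (2x+1)-cell
// interior, which the already-complete step-x table finishes off.
function growTable(x) {
  const width = 2 * (x + 1) + 1;
  const next = new Array(1 << width);
  for (let p = 0; p < 1 << width; p++) {
    let inner = 0;
    for (let i = width - 2; i >= 1; i--) {
      const triple = (p >> (i - 1)) & 7; // cell i with its two neighbours
      inner = (inner << 1) | ((RULE >> triple) & 1);
    }
    next[p] = tables[x][inner];
  }
  tables[x + 1] = next;
}
```

Each growth step costs on the order of 2^(2x+3) small lookups, which is the trade-off described above: grow too eagerly and table-building dominates; too lazily and you fall behind the naive loop.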
One interesting optimisation
I’ve become intrigued by the relationships between the different step-size patterns described above, as there appears to be real structure to them.
Take a look at a parallel coordinates graph traversing higher order rules back to 0 and 1:
You can see that each level basically flips or inverts the last, which is interesting: in another previous post I felt there was some funky dynamic between odd and even rows of data.
If I open this graph up in an image editor I can overlay generation 5 and 4 to highlight the effect:
While the number of relationships is “fanning out” exponentially in each generation, the overall pattern they follow is identical. You can play with it yourself here.
Also, back on that inversion between generations? Let’s take a closer look at the first half and second half of the rule mappings for step size 5, notably the terminal bits on the right hand side of each pattern.
Notice how the terminal bit on the right hand side (which would be the value generated by row n + step size) are the inverse of each other?
Then scan left: the next order of patterns differs only by the first bit being inverted, and so on (compare patterns side by side, say, Pattern 0 with Pattern 1024).
1. Pattern 0 terminates in 0, Pattern 1024 terminates in 1
2. Next step up, Pattern 0 is 000, Pattern 1024 is 100
3. …and so on
Now that may seem obvious or coincidental, but take a look through the rest for yourself. I can post an excel file with the mappings if anyone is interested.
What this means in effect is that we can generate the second half of the pattern map from the first half simply by inverting the first bit, and mapping to its inverted counterpart. This works for any order (even the basic 8 patterns of rule 30 follow this pattern).
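This checks out mechanically for rule 30: its rule can be written as left XOR (centre OR right), so flipping the left neighbour always flips the output, and that flip rides down the left edge of the dependency cone to the centre cell. Here’s a brute-force check of the half-table inversion (my own sketch) for small step sizes:

```javascript
const RULE = 30;

// Brute-force step-x entry: (2x+1)-bit pattern -> centre bit x rows down,
// shrinking the row by one cell per side each step.
function stepXBit(pattern, x) {
  const width = 2 * x + 1;
  let row = [];
  for (let i = width - 1; i >= 0; i--) row.push((pattern >> i) & 1);
  for (let s = 0; s < x; s++) {
    const next = [];
    for (let i = 1; i < row.length - 1; i++)
      next.push((RULE >> ((row[i - 1] << 2) | (row[i] << 1) | row[i + 1])) & 1);
    row = next;
  }
  return row[0];
}

// The claim: the entry for (pattern with its top bit set) is the inverse
// of the entry for the same pattern with the top bit clear.
function inversionHolds(x) {
  const half = 1 << (2 * x); // patterns 0 .. half-1 have the top bit clear
  for (let p = 0; p < half; p++)
    if (stepXBit(p | half, x) !== 1 - stepXBit(p, x)) return false;
  return true;
}
```

For x=5 that `half` is 1024, matching the Pattern 0 / Pattern 1024 pairing above. Note the property seems specific to rules where the left neighbour enters as a pure XOR — rule 90 (left XOR right) shares it, while rule 110 does not.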
In itself it’s only a slight optimisation, and it still only works in the downward direction (we still need to calculate the lookup table values). However, I am very intrigued by this and by the parallel coordinates graph above, because while the effect of rule 30 between rows seems chaotic, when unfolded like this it does follow a pattern. Can I generate other sub-regions in a similarly hand-wavy way, just as I can brush off the second half of the table?
Whatever the “twist” is, it looks to be repeating itself all the way down the line, and maybe this way of looking at it will prove fruitful.
What’s in the code?
The current implementation (link below) leans heavily on string manipulation, so it could be sped up considerably through the use of bitsets and bitwise operations — though I imagine the performance differences between the algorithms would remain proportionally the same.
That said, there are three main pieces to the code that can potentially be reused.
- class ElementaryAutomata — a trivial class that can be instantiated to generate any ECA rule’s output from any given input.
- class LookupNode — just a very basic linked list-styled struct to help navigate between higher-order patterns and their descendants — note these entries are also flatly indexed in a map in the DynamicLookupTable class
- class DynamicLookupTable — wraps functionality around generating lookup tables either dynamically or pre-allocated. Instantiate it with your desired ECA rule number (0–255)
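As a flavour of the first of these, here’s a minimal stand-in for what such a trivial string-based ECA class might look like. This is my own sketch under assumptions about the representation (rows as "0"/"1" strings, zero-padded edges), not the linked code, and the actual class’s API may differ:

```javascript
// Hypothetical minimal stand-in for a string-based ECA class.
class ElementaryAutomata {
  constructor(rule) {
    this.rule = rule; // Wolfram rule number, 0-255
  }

  // Next row from the given row; the row grows by one cell per side,
  // since a single seed's light cone widens one cell per generation.
  next(row) {
    const padded = "00" + row + "00";
    let out = "";
    for (let i = 1; i < padded.length - 1; i++) {
      // Read the 3-cell neighbourhood as a binary index into the rule.
      const idx = parseInt(padded.slice(i - 1, i + 2), 2);
      out += (this.rule >> idx) & 1;
    }
    return out;
  }
}
```

The string slicing and `parseInt` here are exactly the kind of work the bitset rewrite mentioned below would eliminate.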
Below this is more of a bespoke one-off testing scenario, but you could take a look at the profiling code to see how the above classes are used — it should be fairly straightforward.
There are a bunch of toggles that turn different profile tests on and off, a list of sample sizes, then the tests themselves.
I only tested in Chrome! I have trimmed back the test sizes, but if you’re feeling adventurous, uncomment the big “sampleSizes” array on line 301 (it took about an hour to process on my lappy — open your console to see progress if it looks like your machine has halted):
While I’m a relative newcomer to this type of research I do hope my meanderings are of use to someone out there who may just be getting started.
I think with some effort one could make or remake my implementation into something much, much faster — although I don’t think I’ll get much better than O(n²/5) algorithmic complexity (which is still quadratic, asymptotically, anyway).
That said, a faster implementation would let me generate more output more quickly, which would help reveal any long-term trends either in performance or in the data. But I do have some other threads I would like to pluck at, so I’d like to park it here for now.
If you’re still reading *raises coffee cup tiredly* I would love to hear your thoughts.