Thinking not computation
The organism is blocked and we must find the way around.
I’ve been working for some months on a small AI project: a network that moves autonomously in a simple environment, not because we’ve told it to move or behave, but because its internal state and senses cause it to act. It is designed the way a small organism is designed, with sensors and a need for energy, which it gets from virtual food. The food replenishes its energy, and that creates positive feedback internally, strengthening connections and improving future behaviour. That is the idea.
The biggest challenge has been making it all work from internal data only, without smuggling in knowledge about the world. It’s actually remarkably difficult not to smuggle things in because, as the person running the experiment, you know things and you also know what you want a successful experiment to look like. And because you’re writing a computer program, you are perfectly capable of smuggling that information in, in a way you couldn’t if you were experimenting with real organic subjects such as C. elegans.
Recently I succeeded in creating a network that operates completely internally. It receives no knowledge of the world beyond its sensors, and its sensors are very simple. They don’t smuggle in architect-type knowledge of the world; they sense food and physical contact. The organism also has internal feedback from the amount of energy the system has, and therefore the amount it needs, and this changes its behaviour. It’s quite successful, but one thing I noticed on visual review of the organism in environments with food and barriers is that the behaviour is extremely mechanistic. It behaves like a computer. Every run is the same.
It’s quite interesting to review these things visually as a human being. It moves very obviously like a machine, and when I did some ablation studies and follow-up experiments, I confirmed that its movements are computed; they’re mechanistic. I dug into it and of course it makes perfect sense. It is a computer; it is computing internally what it should do. That computation is complicated, involves lots of steps and moves through a 300-node network, but it’s still maths all the way down. If you start in a virtual environment that isn’t stochastic, you will get exactly the same result every time, because at each tick the same functions run on the same inputs and compute the same numbers, and so the behaviour is exactly the same.
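Here is a toy illustration of why every run repeats. It is only a sketch: `step` and `run` are hypothetical stand-ins for the organism's per-tick computation, not the project's code, and the one-dimensional world and the exponential falloff are assumptions made to keep it short. The point is just that with no random source anywhere, the same starting state gives the same trajectory.

```python
import math

def step(position, food_position, weight):
    """One hypothetical tick: sense food, compute a movement, apply it.

    A stand-in for the real per-tick computation. It uses no random source.
    """
    distance = food_position - position
    activation = math.exp(-abs(distance))  # closer food gives a stronger signal
    movement = weight * activation * math.copysign(1.0, distance)
    return position + movement

def run(ticks, start=0.0, food_position=2.0, weight=0.1):
    """Run the toy organism and return every position it visited."""
    positions = [start]
    for _ in range(ticks):
        positions.append(step(positions[-1], food_position, weight))
    return positions

first = run(50)
second = run(50)
print(first == second)  # True: the two trajectories are identical
```

Any variation between runs would have to come from somewhere: a stochastic environment, noise inside the network, or a different starting state.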
It has an internal network which it uses, and connections are strengthened or weakened over time as it experiences the world. The internals look like this: `food evidence -> food sensor row -> internal relay path -> motor score surface -> movement -> contact/intake consequence -> connection strengthening`
Here, for example, is a run in which the organism moves towards food and consumes it.
Tick 0:
The organism senses food to the east. That appears internally as `food 0`, node `0`, activation `0.6456`. That node is active because the only food source is at body-relative bearing `0`.
That activates the selected path:
food sensor node 0
-> connection 0
-> internal relay node 28
-> internal relay node 33
-> connection 512
-> motor node 288

Motor node `288` is `motor 0`, also east. It wins the motor surface with score `0.247477`, above the next option by `0.030191`, so the organism moves east by `0.041306`.
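"Winning the motor surface" can be read as picking the highest-scoring motor and noting its margin over the runner-up. The sketch below assumes exactly that. `select_motor` is a hypothetical helper, and only the winning score and the margin come from the trace above; the other scores are invented for illustration. How the movement magnitude is derived from the score is not shown here because the trace does not say.

```python
def select_motor(scores):
    """Return (winning index, winning score, margin over the runner-up)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner, scores[winner], scores[winner] - scores[runner_up]

# Motor 0's score is taken from the tick 0 trace. The second value is the
# runner-up implied by the reported margin. The rest are made up.
scores = [0.247477, 0.217286, 0.150000, 0.120000]
winner, score, margin = select_motor(scores)
print(winner, score, round(margin, 6))
```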
At this stage there is no learning write. The organism has evidence and acts, but there is no contact/intake consequence yet.
By Run 1 tick 8, the same path is stronger in effect because the food evidence is stronger as the organism gets closer:
food 0 activation: 0.924831
selected motor: 0
selected motor score: 0.346688
movement: east, 0.054480
contact: false
intake: false

Still no learning write, because it has not fed.
The key transition is Run 1 tick 9. The organism reaches local food contact:
food 0 activation: 0.960000
selected motor: 0
selected motor score: 0.359183
local contact: true
selected internal intake: true
actual intake: 0.080000

Now the consequence arrives. Energy rises from `0.788368` to `0.883460`; need pressure falls from `0.909` to `0.830`. That is the organism’s internal “this action worked” signal.
That consequence strengthens three kinds of connections:
food-action/intake path:
connection 0, food 0 -> internal 0
weight 0.012000 -> 0.017800
body-state consequence path:
e.g. connection 384, body 0 -> internal 9
weight strengthened by 0.003269
motor-expression path:
connection 512, internal 5 -> motor 0
weight 0.010500 -> 0.013453

The organism sees food, moves, then after intake the network strengthens the sensor/action/body/motor associations that preceded the useful consequence.
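The exact update rule behind those weight changes isn't spelled out here, so the following is a sketch of one common shape for this kind of learning write rather than the project's rule: a reward-modulated Hebbian update, where each connection on the path grows in proportion to the consequence and to the activity at both of its ends. The function name `strengthen`, the learning rate and the internal node activations are all assumptions. The starting weights, the `food 0` activation, the motor score and the intake value are borrowed from the trace, but the resulting weights are not expected to match it.

```python
def strengthen(weights, path, activations, consequence, learning_rate=0.05):
    """Reward-modulated Hebbian update over the connections on a path.

    weights:     dict mapping (source, target) -> weight
    path:        list of (source, target) pairs that carried the action
    activations: dict mapping node name -> activation at action time
    consequence: scalar "this worked" signal, here the intake amount
    """
    for source, target in path:
        delta = learning_rate * consequence * activations[source] * activations[target]
        weights[(source, target)] += delta
    return weights

weights = {("food 0", "internal 0"): 0.012, ("internal 5", "motor 0"): 0.0105}
before = dict(weights)
path = list(weights)
# Internal activations are invented; the other two values are from the trace.
activations = {"food 0": 0.96, "internal 0": 0.5, "internal 5": 0.4, "motor 0": 0.359183}
strengthen(weights, path, activations, consequence=0.08)
for connection in path:
    print(connection, before[connection], "->", round(weights[connection], 6))
```

With a consequence of zero nothing changes, which matches the earlier ticks where the organism acts but no learning write happens.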
This is the organism’s “thinking”:
current sensory rows
+ body state
+ action state
+ learned network weights
-> motor score surface
-> selected action
-> body/world consequence
-> weight/action-state update

However, Houston, there is a problem: the network is not really doing the work. Most of the decision-making is actually done by the `motor score surface`. It turns out this was created, naively, to help the organism decide what to do. It doesn’t use external data; it calculates using only internal data, including the network. But it is a computer nonetheless, and it computes what to do on each tick. That is why the organism is both wholly internally motivated and also a deterministic machine:
world/body pose
-> raw food evidence rows
-> body state: need, energy, movement funding
-> current network snapshot
-> motor scoring function
-> selected motor bearing
-> movement magnitude
-> world applies movement
-> contact/intake consequence
-> learning writes update network weights

So now the work goes back to figuring out how to make this decision-making work with only the network doing the “thinking”. That requires working out how to let a consequence credit the network path that produced the action, when that action may have happened some time before, taken time to complete, and been complicated by dead ends and side-quests. We already have some consequence working, but it’s time-limited. So I’m confident I can get the network to make decisions now, but I bet getting longer-term consequence to work properly is not going to be easy.
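For what it's worth, one standard technique for delayed credit in reinforcement learning is the eligibility trace: each connection keeps a fading memory of how recently it was used, and a later consequence is shared out in proportion to what remains. This is a minimal sketch of that idea, not something this project is confirmed to use. The class name, decay rate and learning rate are assumptions, and the connection numbers are reused from the trace above only as labels.

```python
class EligibilityTraces:
    """Decaying per-connection memory of recent use, for delayed credit."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.traces = {}

    def tick(self, active_connections):
        """Fade all traces, then mark the connections used this tick."""
        for connection in self.traces:
            self.traces[connection] *= self.decay
        for connection in active_connections:
            self.traces[connection] = 1.0

    def credit(self, weights, consequence, learning_rate=0.05):
        """Strengthen each connection in proportion to its remaining trace."""
        for connection, trace in self.traces.items():
            old = weights.get(connection, 0.0)
            weights[connection] = old + learning_rate * consequence * trace
        return weights

traces = EligibilityTraces(decay=0.9)
traces.tick([0])    # connection 0 was used one tick earlier
traces.tick([512])  # connection 512 was used on the tick before feeding
weights = traces.credit({0: 0.012, 512: 0.0105}, consequence=0.08)
print(weights)
```

The decay rate sets how far back a consequence can reach. It does nothing to separate the useful path from dead ends and side-quests that happened to be recent, so that part of the problem stays open.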
