Distributed Peer-to-peer Viruses and Evolving an AI

October 31, 2008

This is an idea that I am having a hard time with. It is sprawling and vast, and I have real trouble fitting all parts of it into my thoughts at once. I know that may sound weird. It is like a broad landscape, and I must travel it to see all its pieces, and only when I am in one place can I see any real detail, and then only detail in that single place, while other regions lose focus with distance. So, in an effort to describe it, I will by requisite be long-winded. Allow me first a very brief introduction and then some background material.

A while ago I was discussing AI (that is, Artificial Intelligence) with my friend, and I was wondering what people like Turing would have done if they had had access to the kind of computing power we have today. We base so much on his work, but how much of it could he truly express in the language of the time: how much did he see without ever expressing it? I wondered if those first AI researchers, given our technology, would have had a better or worse time approaching AI. Anyway, the discussion wandered, and suddenly I had an interesting thought. What if one were to create a distributed neural network throughout the internet, perhaps with peer-to-peer technology, and then allow it to “learn” over time: what would happen?

Anyway, I had actually forgotten that idea until last night, when it suddenly recurred to me while I was watching something online about AI and complex systems. But immediately I wondered: what if those individual nodes were not static, but were able to “evolve” by some selective process, changing over time?

Ok, first I want to mention a few ideas that are well-known, just to put your mind in the kind of frame I was in at the time this came to me.

First: the common DDoS (Distributed Denial of Service) attacks, where a person can call on hundreds or thousands of individual computers, all infected by a particular virus, to simultaneously send packets upon packets of information to a single server, thus overloading that computer. These attacks are difficult to fight, because the packets are indistinguishable from “normal” traffic, and they originate from all over the network rather than from a single source. The individual viruses running on unsuspecting users’ computers simply log into a server and await commands. Once activated, they direct a stream of packets wherever they are told. A single one of these, or even a dozen, would be ineffectual, but taken all together, they are quite powerful.

Second: many very complex systems are actually governed from the bottom up: that is, governance is distributed to individual and simple units, the components of the system. It is the simple behavior of those individual units that creates complex behavior in the system as a whole. Consider an ant colony, where all the individual ants go about their simple little tasks, governed by fairly basic rules, and yet somehow, as a whole, the entire colony is capable of almost miraculous feats. Or consider the human brain: each neuron on its own simply transfers chemicals and electrical charge from one end to the other and on to neighboring neurons, and yet combined into a brain, the system is capable of all that we are as humans. If you were to run a program on individual computers all around the world, each with a very simple function (like passing information along through nodes in a peer-to-peer network), what would that system as a whole be capable of? I’m pretty sure that even with every single computer on the internet today connected in such a way, it would still not match the volume of interconnection or sheer mass of a human brain: but what could it do? Could it match a rat’s brain? Or a monkey’s?
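This idea, simple units combining into something no single unit can do, has a classic small-scale demonstration. Here is a toy sketch (entirely illustrative, not part of any proposed system): three identical threshold units, each doing nothing but summing weighted inputs and firing past a threshold. No single such unit can compute XOR, but a three-unit network can.

```python
def unit(inputs, weights, threshold):
    """A 'neuron': fire (1) if the weighted sum of inputs meets the threshold."""
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

def xor_network(a, b):
    # Hidden layer: an OR-like unit and an AND-like unit.
    h_or = unit([a, b], [1, 1], 1)    # fires if either input fires
    h_and = unit([a, b], [1, 1], 2)   # fires only if both inputs fire
    # Output: fires for OR-but-not-AND, i.e. exactly one input active.
    return unit([h_or, h_and], [1, -1], 1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

The interesting behavior lives in the wiring between the units, not in any unit itself, which is the point of the brain and ant-colony analogies above.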

Third: how would the evolutionary process, or natural selection of mutable genes affect a computer virus? This is where my idea actually starts to become a little unwieldy for me.

If a program is designed with genetic components that can be passed along through cloning (or even “sexual” reproduction: i.e. the sharing of genetic material from two “parents”), how would they be mutable? How would generational and random mutation affect them? How do we ensure that we don’t limit the program’s ability to change by our design of the progenitor? If we consider the evolution of life on Earth as a model, we might ask how much variability has been excised by the mere fact that body plans departing from the simple head+body+limbs model established in the Cambrian explosion were pruned away by later extinctions. What would life look like today if that underlying precept were absent? How might a program be designed so that it was not inherently limited in its ability to change?

I want to assume that the program will be built upon “genes” or gene-like arrangements, because if we simply allowed the kind of translation errors that occur in DNA to happen by random chance in the machine code of the program, 99.99% of all offspring would be born dead: that is, unable to function. So perhaps it would be better to abstract the mutable pieces of the program into some kind of genetic code. This is actually where I have a lot of trouble imagining a system that might govern it, but perhaps it helps to think of the kinds of things that biological creatures do: the most important being to interact with their environment.
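To make the abstraction concrete, here is a toy sketch (the gene names and rates are invented for illustration only): behavior is encoded as a sequence of symbolic genes drawn from a fixed pool, so mutation and crossover can only ever produce syntactically valid offspring, unlike flipping raw bits of machine code.

```python
import random

# Purely illustrative gene pool: each symbol stands in for some valid behavior.
GENE_POOL = ["scan_memory", "hide", "migrate", "replicate", "signal_peers"]

def mutate(genome, rate=0.05, rng=random):
    """Copy a genome, replacing each gene with a random valid one at `rate`."""
    return [rng.choice(GENE_POOL) if rng.random() < rate else gene
            for gene in genome]

def crossover(parent_a, parent_b, rng=random):
    """'Sexual' reproduction: splice two parent genomes at a random point."""
    point = rng.randrange(1, min(len(parent_a), len(parent_b)))
    return parent_a[:point] + parent_b[point:]

parent_a = ["scan_memory", "hide", "replicate"]
parent_b = ["migrate", "signal_peers", "hide"]
child = mutate(crossover(parent_a, parent_b))
```

Because every gene in the pool maps to something runnable, the 99.99% dead-on-arrival problem disappears by construction; the cost is that the progenitor’s gene pool bounds what can ever be expressed, which is exactly the design-limits worry raised above.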

I want to give the program the ability to mutate the manner in which it interacts with its environment from the progenitor. So given that starting place, how would a program interact with its environment, or put more simply: in what kind of environment does a program exist?

The program should have the ability to “see” memory addresses, to detect program code, and to detect “dead” programs (those that have been unloaded from memory but still exist in residue). It should be able to see connections and routes of movement, through networks and onto physical media (disks and RAM drives). It should be able to sense when its constituent parts are being altered or endangered (consumed, overwritten, etc.). It should also have the ability to move within this environment to avoid those dangers, or to hide, or to propagate. It should recognize its own “species”.

Now, the specific design of such a program isn’t something I have worked out. But these are some of the things I would like to see it able to do. And besides all this, it should be mutable.

I should take a moment to return to what I discounted above: the strictly random variability that would arise by transcription error, the changing of 0.001% of the 0s and 1s when a clone or offspring is made. While, as I stated above, 99.99% of such offspring would be immediately non-viable, this would in effect create truly changeable programs. If some check were made on the offspring upon conception, testing its ability to “run”, to exist in memory, to be interpretable by a computer system, then the parent program could select only viable offspring for copying. I think the biggest drawback to this kind of replication would be the time constraint, the theoretical “millions of years” it would take for such random errors to “discover” adaptation. I will discuss selective forces below.
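The check-upon-conception idea can be sketched in miniature (everything here is illustrative: a tiny stand-in “progenitor” program, character-level mutation in place of bit flips, and a parse check standing in for a real viability test). Most mutants fail the check, which is exactly the 99.99% waste being discussed; the parent simply discards them.

```python
import random

# A tiny stand-in for the progenitor's "machine code".
PROGENITOR = "x = 1\ny = x + 2\nprint(y)\n"
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 =+\n"

def mutate_raw(source, rate=0.02, rng=random):
    """Blind transcription error: swap each character with small probability."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in source)

def viable(source):
    """Viability test at 'conception': does the mutant still parse at all?"""
    try:
        compile(source, "<mutant>", "exec")
        return True
    except SyntaxError:
        return False

def spawn_viable(parent, attempts=1000, rng=random):
    """Generate mutants, keeping only one that passes the viability check."""
    for _ in range(attempts):
        child = mutate_raw(parent, rng=rng)
        if viable(child):
            return child
    return None
```

Note that passing the parse check only means the offspring can run, not that it does anything useful; discovering useful change this way is what would take the “millions of years”.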

Now, biological evolution as a process took millions of years, and the 99.99% of mutations with deleterious effects didn’t survive past the first transcription error. Only a small few positive changes allowed species to very slowly change over time, and this brought us to where we are today. I am not sure creating such a time-intensive model would be best for my program. Also, we have to determine the selective pressures that drive its change. Evolution only works via selection: that is, the better-suited variants survive and reproduce, while the ill-suited do not. So what selective pressures would exist in the wilderness of the distributed internet? And what pressures could we present in order to encourage our program to evolve?

There would of course be such things as survival: if the program is discovered and a user tries to shut it down or delete it, those versions of the program that can avoid this would survive better than others. Likewise, those versions that could bypass firewalls to communicate with other members of their species, or to propagate, would survive better. I think “community” should also be a selecting factor (perhaps, on the model of biology, we necessitate sexual reproduction and competition for mates). A program’s ability to maintain itself within the peer-to-peer network, to be friendly, and to benefit the overall system would determine its survival. There might also be selective pressures based on the nature of the environment itself. If this program were loosed on the internet, the variability of user systems would mean that versions of the program better at spreading would survive; inversely, versions that were too aggressive (making a nuisance of themselves, or overwhelming the network) would be “hunted” by anti-virus software and IT personnel, while quieter variants would escape notice.
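Those pressures amount to a fitness function, and the generational loop they drive can be sketched abstractly (all trait names and weights here are invented for illustration): stealth and the ability to spread are rewarded, aggression is penalized because noisy variants get hunted down, and each generation the fitter half survives to reproduce.

```python
import random

def fitness(traits):
    """Invented weights: reward stealth and spread, punish aggression."""
    return traits["stealth"] + traits["spread"] - 2.0 * traits["aggression"]

def perturb(traits, rng=random):
    """Mutate one trait by a small random amount, clamped to [0, 1]."""
    t = dict(traits)
    key = rng.choice(sorted(t))
    t[key] = min(1.0, max(0.0, t[key] + rng.uniform(-0.1, 0.1)))
    return t

def generation(population, rng=random):
    """Truncation selection: keep the fitter half, refill with their mutants."""
    survivors = sorted(population, key=fitness, reverse=True)[:len(population) // 2]
    return survivors + [perturb(rng.choice(survivors), rng) for _ in survivors]

population = [{"stealth": random.random(),
               "spread": random.random(),
               "aggression": random.random()} for _ in range(20)]
for _ in range(50):
    population = generation(population)
# Selection drifts the population toward stealthy, low-aggression variants.
```

The point of the toy is the shape of the loop, not the numbers: nobody tells any individual to become stealthy; the “hunting” pressure in the fitness function does it.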

It might be interesting to see a program of this type evolve on its own, to discover its own ends the way biological systems have done, but I think we might want to create an inherent purpose for it. I want to stress that I don’t mean it should be designed so that its evolutionary process has a goal: that is not how evolution works. Selective forces allow the best-adapted version of a thing to survive, and this creates the appearance of goals: but evolution does not say, “hey, let’s try to make a whale in the next two million years” and then work toward that. By purpose, I mean that there must be an ecology, something that the program does with its life. This is where the peer-to-peer network and the super-system come in. That network could be designed to learn, perhaps as a kind of “racial memory” from which individual units could draw limited information. As a neural network, it would have the ability to create associations (which might help in self-directed mutation, having learned which alterations are ultimately dead ends or deleterious in the long run). It could also help maintain the population, as the system would know where there have been extinctions or discoveries of new territory (if the program that started on Windows platforms learns how to infect Macintoshes or Linux machines). We could give it certain drives, such as the desire to survive, the desire to mate, and the desire for its offspring to survive (à la biology), as well as more “e” drives, like the ability to save itself when a system is powered off and then return to life when that system comes back online.

My original thought was to create an AI with the distributed neural network that all these individual components might form, but I wonder if perhaps that goal itself should not also be evolvable. Or perhaps we do not “feed” it any information, but simply let it soak up what it finds in its environment and develop on its own from there. It has been said that as human beings, coming from a world with only one sample of known intelligence, we might not recognize another intelligence when we meet it. It would be fascinating to watch this system evolve and change and learn, and to see what comes out of it.

I am certain there are flaws in my thinking, and I am confident I have not thought of every problem or every possible solution. But this idea is fascinating to me. I do not have the technical expertise to pursue it save as an intellectual exercise. And even if I did, I’m not sure I would want to release such a program onto the internet. I don’t actually want to be responsible for creating Skynet.