In an episode from Star Trek: The Next Generation, two characters enter a holodeck, a room that projects realistic, interactive 3-D images, a kind of supercharged virtual reality simulator.
Data, an android, and his human friend Geordi programmed the holodeck to stage a Sherlock Holmes mystery they could solve. Unfortunately, since Data had memorized every one of Sir Arthur Conan Doyle’s stories, he already knew the outcome.
So Geordi asked the computer to create a completely new story with a character capable of defeating the brilliant Data. To accomplish this task, the computer birthed a program that became a sentient being with a consciousness of its own.
Strangely enough, we’re approaching an era in which the events of that 1988 TV episode are becoming increasingly possible.
“Treats” for robots
Pieter Abbeel, who directs the Robot Learning Lab at UC Berkeley, is one of the world’s foremost experts on robotics and AI.
Using a technique called deep reinforcement learning, Abbeel has developed algorithms that “reward” robots for behaviors that help them accomplish specific tasks. The result is robots that can independently learn new things from past experiences, or even from the experiences of other robots.
Theoretically, Abbeel told me, a robot could develop a consciousness if it concluded that doing so would allow it to best accomplish its task.
“If you think about reinforcement learning, it’s about optimizing reward,” Abbeel said. “Think about humans. We don’t want to be hungry. We don’t want to be thirsty. We want to reproduce. We want to be admired by our peers. There are all kinds of rewards that we optimize for. So why do humans have consciousness? It’s ultimately related to optimizing these rewards.”
“So for robots, the question is how you create an environment in which the only way for it to maximize reward is to require it to have consciousness?” he said.
So just as the computer in Star Trek created a sentient being so it could defeat Data, the AI software and deep neural networks that power robots could decide that the robot needs to “come alive” in order to hit a goal.
Not as smart as we think
From Skynet of the Terminator movies to HAL in 2001: A Space Odyssey, humans fear our synthetic creations will replace us as the dominant species on Earth. Some of our smartest people, including the late physicist Stephen Hawking and Tesla founder Elon Musk, have warned us about this very scenario.
But in reality, the most advanced AI program can’t match the intelligence of a toddler.
“I would argue that once you can build the intelligence of a three-year-old, you’re pretty much done anyway,” Abbeel said. “Because once you’re that smart, everything else follows.”
That’s because intelligence isn’t just a matter of what we know, but of how we come to know it.
AI programs, including the ones that power robots, essentially just memorize large amounts of data and then spot patterns.
So in order to teach a robot to recognize a cat, you have to feed it enough images of cats for it to tell the difference between a cat and a similar-looking animal, a tiger for example. In other words, the AI is only as smart as the amount and quality of information researchers give it, which requires a good deal of time and effort.
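As a rough, hypothetical illustration of that pattern-matching idea (not Abbeel’s code, and vastly simpler than a real vision system), imagine each image boiled down to two made-up numbers, say ear roundness and body size. The program never understands what a cat is; it simply labels a new animal by the closest example it has already memorized:

```python
# Toy sketch of learning from labeled examples. The feature values are invented;
# a real system learns from millions of raw pixels, but the principle holds:
# the program only knows what researchers have already shown and labeled for it.

labeled_examples = [
    ((0.9, 0.2), "cat"),    # (ear roundness, body size) -> label
    ((0.8, 0.3), "cat"),
    ((0.4, 0.9), "tiger"),
    ((0.5, 0.8), "tiger"),
]

def classify(features):
    """Label a new animal by the nearest memorized example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(labeled_examples, key=lambda ex: distance(features, ex[0]))
    return label

print(classify((0.85, 0.25)))  # cat
print(classify((0.45, 0.85)))  # tiger
```

If a new animal doesn’t resemble anything in that memorized pile, the program has no way to reason its way to an answer, which is why so much human time goes into collecting and labeling the data.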
Trial and error
But thanks to Abbeel and other Berkeley researchers, robots are teaching themselves how to do things instead of relying on humans to spoon-feed them information. The Berkeley team has created algorithms that incorporate the principles of deep reinforcement learning and utility theory.
If you want a dog to roll over or play dead, you give it a treat. The same idea applies to robots. But instead of a bacon-flavored dog biscuit, the algorithm rewards robots with incentives represented by numbers: the higher the number, the better the reward. So like a player earning points in a video game, the robot learns which actions it can take to maximize its rewards.
Say you want a robot to run across the room. Through trial and error, the robot must figure out the right amount of torque it should apply to its legs to accomplish its goal.
If the robot applies the incorrect amount of torque, which causes it to fall down or veer off course, it earns few or no points. The robot will keep experimenting until it discovers the amount of torque needed to successfully run across the room and thus earn the greatest number of points.
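Here’s a deliberately simple sketch of that feedback loop in Python. Everything in it is invented for illustration: the “simulator” just assumes some ideal torque and awards more points the closer a guess gets to it, and the search is plain trial and error rather than the deep reinforcement learning Abbeel’s lab actually uses. But the loop is the same: try an action, score it, keep whatever earns more points.

```python
import random

IDEAL_TORQUE = 4.2   # hidden "right answer" our toy simulator assumes

def run_across_room(torque):
    """Stand-in for a robot trial: more points the closer torque is to ideal."""
    return max(0.0, 10.0 - abs(torque - IDEAL_TORQUE) * 5.0)

best_torque, best_reward = None, -1.0
for trial in range(200):
    torque = random.uniform(0.0, 10.0)    # experiment with a new setting
    reward = run_across_room(torque)      # how many points did it earn?
    if reward > best_reward:              # remember the best setting so far
        best_torque, best_reward = torque, reward

print(f"Best torque found: {best_torque:.2f} (reward {best_reward:.1f})")
```

No human ever tells the robot the right torque; the number emerges from the trials themselves, which is the core difference from the spoon-fed, labeled-data approach described above.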
Abbeel said researchers have already written algorithms that incentivized robots to create their own language.
“The robots were put in an environment constructed in a way that required them to invent a way to talk to each other,” he said. “And they did. They invented their own language, because it was the most effective way to maximize reward in those environments.”
Survive at all costs
So what happens if you want to teach a robot to survive a hostile environment? What if it could earn rewards by taking actions to protect itself?
Sentient beings possess a powerful instinct to survive at all costs. The earliest humans, for example, adopted behaviors that gave them the best chance to stay alive, such as living and cooperating in groups.
“And if people didn’t like you in the group, maybe you didn’t get the good food,” Abbeel said. “Maybe you didn’t get the good water. Maybe you didn’t get the clothes, or whatever. Humans are born to pay attention to peer signals. And from that, some rewards are built into our brains. Like, you don’t want to be hungry. You don’t want to be thirsty.”
“You probably had to be part of a team,” he said. “Some kind of tribe or something where you are born with a signal reward system that picks up on what the people care about in my tribe.”
If a robot faced a situation in which we incentivized it to survive at all costs, the robot might conclude that it must develop a higher form of intelligence to accomplish this goal.
In other words, gain consciousness.