Everyone knew when time froze, and no one knew if it would start again.
Raindrops hung in the air. Cars had stopped in the middle of the road, puffs of smoke stuck to their tailpipes like cotton candy. Planes were fixed in the sky, the world’s largest crib mobile above the clouds. It was all very strange.
Stranger still was what wasn’t affected: people. Only people. Birds stopped mid-chirp, dogs and cats kept napping (and would nap forever), but humans were the one exception. They could move around, grab things, shake hands, dance, run, crawl.
The first reactions were panic and confusion. People reached for their smartphones, and then learned that smartphones don’t do very much if electricity doesn’t work, and electricity doesn’t work if time doesn’t work. Computers didn’t do much either. Neither did phone lines, or trains, or even horse-drawn carriages. Transportation and communication had regressed to ancient times. There was a moment of realization - and it stretched, further and further, carrying into eternity.
In the span of a few days, the other consequences became clear. It was now impossible to change the physical or chemical configuration of anything in the world. People no longer needed to sleep. They didn’t get hungry, or thirsty. They couldn’t hurt themselves, even if they tried. They simply were. Birth stopped, and death stopped. A few people tried to argue they weren’t technically immortal, and weren’t technically invulnerable, but it was close enough to immortality and invulnerability that those people gave up the fight for nomenclature.
There were a few attempts to use science to make sense of the situation. Why were only humans unaffected? How did the freeze distinguish between a carbon atom in a human, and a carbon atom in a plant, when they should have been identical? If people could move about, where did the energy come from? None of these attempts went anywhere. There were plenty of ideas, but they couldn’t be tested, making them close to worthless.
That left one big question: what do we do now?
* * *
The President of the United States had a problem. He needed to give a speech to the public, to say something, anything. But how do you do so when nothing works?
After some discussion, Congress came up with a solution. They visited running clubs around D.C., and asked if they’d like to volunteer to literally run around the world.
It took about a week to recruit people and get them to memorize the speech well enough to deliver it. It took a few more weeks for the runners to make it across the continental United States.
By the time the first runner made it to the West Coast, it had been almost a month since time had frozen, and no one cared very much about what the President had to say. They had long since decided they were on their own, and had resolved not to pay too much attention to the noise outside.
* * *
For years before the freeze, some had advocated for the need to achieve a post-scarcity society. The world wasn’t exactly the post-scarcity utopia that they had dreamed of, but at least everyone had what they needed to live, even if it was done by driving all demands to zero.
Without work to do, people had a lot of free time. If anything is unambiguously true, it is that people need to find hobbies, and that’s what people did. Some gave math another try. Others turned to philosophy and came back with several strange yet wonderful ideas. A few decided to devote their lives to Chess and Go, some of the few forms of entertainment that weren’t impacted by the freeze.
Travel got a lot more popular. It took a long time to get anywhere, but people had a lot of time to burn.
A family of four from Montana decided to go storm hunting. They planned a journey to Southeast Asia, where a great thunderstorm raged across the sky, flecks of lightning hanging in the air like stars.
A group of bridesmaids from South Africa decided to visit America before a wedding. In the middle of Kansas, they stopped by a tornado, and posed next to the funnel cloud, waving their arms around and laughing like chimes in the wind.
An elderly couple from São Paulo decided to climb Mount Everest. It wasn’t the most original thing to do, but it’s Mount Everest. How are you not supposed to climb Mount Everest?
The world was their oyster, and people realized there were pearls all around them, even in the little things. They just needed the time to appreciate them, and the chance to find them for themselves.
* * *
Years passed, then centuries, then millennia, all trapped in that moment of time. The world hadn’t changed, but the people in it had made the world a very different place. A lot of petty squabbles died off. People argued less and helped each other more. It’s funny how much people change, after they become immortal.
The one problem was that the world was starting to become boring. Yes, there were pearls all around them, but on a long enough time scale, you can see everything that you want to see. People were running out of things to do.
And then something new happened.
Long after people had stopped keeping track of the time, a man decided to spend a few months walking across the Atlantic. He had done this eighty times before, but it had been on his bucket list to do it again after a friend mentioned an island he’d missed all the previous times. Halfway through his journey, he spotted a glowing, pulsating wall of light - something that was changing, when nothing was supposed to change.
He made landfall in Morocco, and spread word to the first locals he could find. Independent expeditions verified his findings, and discovered that other walls of light had appeared across the ocean. A group from Australia started mapping the walls, and realized they were forming letters. With this news, they recruited a thousand people to form a human pyramid. The woman at the top of the pyramid looked down, and shouted out the message.
WE GAVE YOU GIFTS, AND YOU SQUANDERED THEM.
WE GAVE YOU CHOICES, AND YOU MADE ONES THAT BROUGHT YOU CLOSE TO RUIN.
IN FEAR, WE TOOK THEM AWAY.
BUT PERHAPS YOU WOULD LIKE THEM BACK.
PROVE YOU DESERVE THEM, AND WE WILL RESTART THE GEARS OF THE WORLD.
WE WILL GIVE YOU FIFTY YEARS TO DECIDE.
With the message delivered, the letters faded away, leaving just the frozen ocean waves.
* * *
It took a while for humanity to decide. It’s always hard to change things once people get used to them. Our adaptability is both a strength and a weakness.
There were upsides to living in a frozen world. But there were downsides too. People have so many ideas now, for things to build, things to try, and they can’t, because the world literally won’t allow them to. We were given the chance to take back control over our own destiny. How could we say no?
It was unclear how we were supposed to signal our decision. Eventually we discovered five analog clocks, scattered across the world. They were all identical in shape and size, all bathed in the same pulsating white light, and all stuck at precisely five seconds to midnight.
The first was found in a classroom in Copenhagen.
The second, in an abandoned laboratory on the outskirts of Berlin.
The third, on a beach on the Bikini Atoll, lying next to a pineapple of all things.
The fourth, in a house near the center of Hiroshima.
And the fifth, in an editorial publishing office based out of Chicago.
Each clock had a second hand, and unlike everything else, the second hand was free to move backward and forward, as long as it didn’t move past five seconds to midnight. The leading theory was that if we could push all the second hands forward at the same time, that would be the signal to get things moving again.
I’m standing in front of the Chicago clock right now.
For synchronization, we have five runners, one for each clock, who have learned the knack of running at precisely a given speed. On cue, they started running, timed so that each would arrive at their clock at the same moment. In parallel, we’re running some backup runners in case something goes wrong, and some checksum runners to transmit data that verifies we’re within the acceptable margin of error. The system’s all very interesting. I’d explain the details, but I wouldn’t want to bore people.
As for why I’m one of the people pushing a second hand? It’s nothing special. We chose randomly. I just got lucky.
Sometimes, I wonder if we’re making the right call. If I wanted, I could sabotage the whole operation. But I won’t. It’s humanity’s decision and I have to respect it.
Right on cue, a runner enters the room, moving forward at a steady pace.
I nod, and start pushing the second hand forward. It starts to groan, making a creaking sound that is far too loud for what should be an ordinary clock. I push, and push, and push - and then it starts moving.
Sometime in early July, I thought of the beginning and end of this story. I liked the idea, I knew what notes I wanted to hit, and I had ideas about how to write it. Best of all, it was an idea that worked best as a short story, which made it excellent writing practice.
I started a draft before ICML, and then forgot about it for several weeks, since I was too busy with conferences and travel.
By the time I revisited it, I had finished reading American Gods, and it almost convinced me to throw the story away. The storytelling in that book was so lyrical that it made mine feel like a tragedy. I then decided it would be okay if my story was worse than American Gods, because a lot of things are worse than American Gods. Besides, half the point of writing is to write things that look horrible to you later.
I needed to write this story, and so I did. Know that I did my best to steal the parts of that book's writing that I liked the most.
The bulk of this story was written while listening to Demetori's rendition of Eastern Dream on repeat. This has nothing to do with anything. I just like that song very much.
OpenAI recently announced that a team of five Dota 2 agents has successfully beaten an amateur team. It’s a pretty exciting result and I’m interested to see where it goes from here.
When OpenAI first revealed they were working on Dota 2, there was a lot of buzz, a lot of hype, and a lot of misunderstanding that compelled me to write about it. This time, I have fewer questions and less compulsion to set the record straight, so to speak. The blog post has enough details to satisfy me, and the reaction hasn’t been as crazy. (Then again, I haven’t been reading the pop science press, so who knows…)
I’m pretty busy this week, so instead of trying to organize my thoughts, I’m just going to throw them out there and see what happens. This post is going to be messy, and may not make sense. I typed this out over about an hour and didn’t think too hard about my word choice. Everything in it makes sense to me, but that doesn’t mean anything - everything you write makes sense to you.
(If you haven’t read the OpenAI announcement post, you should do so now, or else this will make even less sense.)
* * *
This result came a bit earlier than I thought it would, but not by a lot. I’m not sure exactly when I was expecting to hear that 5v5 was looking solvable, but when I heard the news, I realized I wasn’t that surprised.
The post clarifies that yes, the input is a large number of game state features coming from the Dota 2 API, and isn’t coming from vision. The agent’s ability to observe the game is well beyond any human capability. I’ve said this before and will say it again: this is totally okay and I have no problems with it.
On the communication front, I was expecting the problem to require at least some communication. Not at the level of the multi-agent communication papers, where people try to get agents to learn a language to communicate goals; I was thinking of something like every agent getting the actions each other agent made at each time step. That isn’t happening here. It’s just five LSTMs, each deciding its own actions. The only direct encouragement for teamwork is that each agent’s reward is defined by a “team spirit” parameter that decides how important the team’s reward is to the individual. The fact that a single float is good enough is pretty interesting…
…Well, until I thought about it a bit more. By my understanding, the input state of each agent is the properties of every unit in the team’s vision. This includes health, attack, orientation, level, cooldowns of all their skills, and more. And your teammates are always in your team’s vision. So, odds are you can reconstruct the actions from the change in state. If they changed location, they moved. If they just lost mana and one of their spells’ cooldowns just increased, they just used a skill.
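Here’s a rough sketch of that reconstruction, just to show how little it takes. The `UnitSnapshot` fields and the `infer_actions` function are hypothetical names I made up for illustration; they don’t match the real Dota 2 API, and a real version would have to deal with things like mana regeneration masking a spell’s cost.

```python
from dataclasses import dataclass, field


@dataclass
class UnitSnapshot:
    """Hypothetical per-unit features; NOT the real Dota 2 API schema."""
    x: float
    y: float
    mana: float
    cooldowns: dict = field(default_factory=dict)  # spell name -> seconds remaining


def infer_actions(before: UnitSnapshot, after: UnitSnapshot) -> list:
    """Guess a teammate's actions from two consecutive observations."""
    actions = []
    # Any change in position means the unit moved.
    if (before.x, before.y) != (after.x, after.y):
        actions.append("move")
    lost_mana = after.mana < before.mana
    for spell, cd_after in after.cooldowns.items():
        cd_before = before.cooldowns.get(spell, 0.0)
        # Cooldowns normally tick down; a jump upward plus a mana drop
        # suggests the spell was just cast.
        if lost_mana and cd_after > cd_before:
            actions.append(f"cast:{spell}")
    return actions


before = UnitSnapshot(0.0, 0.0, 100.0, {"stun": 0.0})
after = UnitSnapshot(1.0, 0.0, 80.0, {"stun": 10.0})
print(infer_actions(before, after))
```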
In this respect, it feels like the state definition is rich enough that emergent cooperative behavior isn’t that surprising. There’s no theoretical limit to the potential teamwork - what would team captains give to constantly know everything the API can tell you?
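As for the single “team spirit” float, one simple way such a parameter could work is a linear blend between each agent’s own reward and the team’s average reward. This is a sketch under that assumption, not OpenAI’s code, and `blend_rewards` is a hypothetical name.

```python
def blend_rewards(rewards, team_spirit):
    """Mix each agent's own reward with the team's mean reward.

    team_spirit = 0.0 -> purely selfish rewards.
    team_spirit = 1.0 -> every agent receives the team mean.
    The linear blend is an assumption about how such a parameter could work.
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be in [0, 1]")
    team_mean = sum(rewards) / len(rewards)
    return [(1.0 - team_spirit) * r + team_spirit * team_mean for r in rewards]


# One hero gets a kill (+1), the other four get nothing this step.
print(blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=0.5))
```

A nice property of the linear blend is that it only redistributes reward within the team: the total across all five agents stays the same for any value of `team_spirit`.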
Compute-wise, there’s a lot of stuff going on: 256 GPUs, each contributing to a large synchronous batch of over a million observations. That is one of the largest batch sizes I’ve seen, although from a memory standpoint it might be smaller than a large batch of images. A Dota 2 observation is 20,000 floats, or about 80 thousand bytes if they’re 32-bit floats. A 256 x 256 RGB image is approximately 200 thousand bytes.
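The arithmetic behind that comparison, as a quick sketch. It assumes 32-bit floats for the Dota 2 observation, 1 byte per color channel for the image, and 2^20 (about 1.05 million) observations per batch as a stand-in for “over a million”.

```python
FLOAT32_BYTES = 4

dota_obs_bytes = 20_000 * FLOAT32_BYTES   # 20,000 features, assumed float32
image_bytes = 256 * 256 * 3               # 256 x 256 RGB, assumed 1 byte per channel
batch_size = 2 ** 20                      # "over a million"; 2**20 is an assumption

print(f"Dota 2 observation: {dota_obs_bytes:,} bytes")   # 80,000 bytes
print(f"256x256 RGB image:  {image_bytes:,} bytes")      # 196,608 bytes
print(f"Dota 2 batch:       {batch_size * dota_obs_bytes / 1e9:.1f} GB")
print(f"Same-size image batch: {batch_size * image_bytes / 1e9:.1f} GB")
```

Per item, a Dota 2 observation is well under half the size of a small image, so a million-observation batch is smaller than a million-image batch would be.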
(I assume the reason it’s using synchronous training is because async training starts getting really weird when you scale up the number of GPUs. My understanding is that you can either hope the time delays aren’t too bad given the number of GPUs you have, or you can try doing something like HOGWILD, or you can say “screw it” and just do synchronous training.)
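To make the synchronous option concrete, here’s a toy version of one synchronous data-parallel step on a 1-D linear regression, with Python lists standing in for GPUs. The model and function names are made up for illustration. Every worker computes its gradient at the same parameter value, and the update waits for all of them. That is exactly what async training gives up: an async worker may apply a gradient computed from parameters that other workers have already changed.

```python
def grad(w, shard):
    """Gradient of mean squared error for the 1-D model y = w * x on one shard."""
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)


def sync_step(w, shards, lr):
    """One synchronous data-parallel SGD step.

    Every worker computes a gradient at the same w, the gradients are
    averaged (the "all-reduce"), and a single update is applied.
    """
    grads = [grad(w, shard) for shard in shards]  # in practice: one per GPU
    return w - lr * sum(grads) / len(grads)


data = [(x, 3.0 * x) for x in range(1, 9)]        # true weight is 3.0
shards = [data[i:i + 2] for i in range(0, 8, 2)]  # 4 "workers", 2 examples each

w = 0.0
for _ in range(200):
    w = sync_step(w, shards, lr=0.01)
print(w)
```

With equally sized shards, the averaged shard gradients match the full-batch gradient, so the synchronous step behaves like ordinary SGD on one giant batch. That predictability is a big part of the appeal at scale.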
Speaking of saying “screw it” and doing the thing that will clearly scale, it’s interesting that plain PPO is just good enough so far. I’m most surprised by the time horizon problem. The partial observability hurts, but empirically it was doable for the Dota 1v1 bot. The high-dimensional action and observation spaces didn’t feel like obstacles to me - they looked annoying but didn’t look impassable. But the long time horizon problem felt hard enough that I expected it to require something besides just PPO.
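For reference, “plain PPO” here means the clipped surrogate objective from the original PPO paper (Schulman et al., 2017). A minimal sketch of just that objective is below. The value-function loss, entropy bonus, and the LSTM policy OpenAI actually trained are all left out, and `epsilon=0.2` is a value used in the PPO paper, not a setting I know OpenAI used.

```python
def clip(value, low, high):
    """Clamp value into [low, high]."""
    return max(low, min(value, high))


def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """PPO's clipped surrogate objective (to be maximized), averaged over samples.

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for sample i
    Taking the min with the clipped term removes any incentive to push
    the ratio outside [1 - epsilon, 1 + epsilon].
    """
    terms = [
        min(r * a, clip(r, 1.0 - epsilon, 1.0 + epsilon) * a)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)


# A good action made 1.5x more likely only gets credit up to a ratio of 1.2.
print(ppo_clip_objective([1.5], [1.0]))
```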
This seems to have parallels to the Retro Contest results, where the winning entries were just tuned versions of PPO and Rainbow DQN. In the past, I’ve been skeptical of the “hardware hypothesis”, where the only thing stopping AI progress is faster computers. At the time, I said I thought the split in AI capabilities was about 50-50 between hardware and software. I’m starting to lean towards the hardware side, updating towards something like 60-40 for hardware vs software. There are an increasing number of results where baseline algorithms just work if you try them at the right scale, enough that I can’t ignore them.
One thing I like to joke about is that everyone who does reinforcement learning eventually decides that we need to solve hierarchical reinforcement learning and exploration. Like, everybody. And the problem is that they’re really hard. So from a practitioner perspective, you have two choices. One is to pursue a risky research project on a difficult subject that could pan out, but will likely be stuck on small problems. The other option is to just throw more GPUs at it.
It’s not that we should give up on hierarchical RL and the like. It’s more that adding more hardware never hurts and likely helps, and even if you don’t need the scale, everyone likes it when their models train faster. This makes it easier to justify investing time into infrastructure that enables scale. Models keep getting bigger, so even if it doesn’t pay off now, it’ll pay off eventually.
* * *
I’d like to end this post with a prediction.
The team’s stated goal is to beat a pro team at The International, August 20-25, with a limited set of heroes (presumably the same hardcoded team mentioned in the footnote of the post). I think OpenAI has a decent shot, about 50%.
To explain my thinking a bit more, everything about the progress and skill curves so far suggests to me that the learning algorithm isn’t hitting a plateau. For whatever reason, it seems like the Dota 2 skill level will continue to increase if you give it more training time. It may increase at a slower rate over time, but it doesn’t seem to stop.
Therefore, the question to me isn’t about whether it’s doable, it’s about whether it’s doable in the 2 months (60 days) they have left. Based on the plots, it looks like the current training time is around 7-19 days, and that leaves some breathing room for catching bugs and the like.
Funnily enough, my guess is that the main blocker isn’t going to be the learning time, it’s going to be the software engineering time needed to remove as many restrictions as possible. For the match at The International, I’d be very disappointed if wards and Roshan were still banned - it seems ridiculous to ask a pro team to play without either of those. So let’s assume the following:
- Both wards and Roshan need to be implemented before the match.
- The policy needs to be trained from scratch to learn how to ward and how to play around Roshan.
- After wards and Roshan get implemented, there will be a crazy bug of some sort that will hurt learning until it gets fixed, possibly requiring a full restart of the training job.
Assuming all of the above is true, model training for The International can’t proceed until all this software engineering gets done, and that doesn’t leave a lot of time to do many iterations.
(Of course, I could be wrong - if OpenAI can finetune their Dota 2 bots instead of training from scratch, all the math gets a lot nicer.)
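To make the scheduling argument concrete, here’s a back-of-the-envelope sketch. The 60 days and the 7 to 19 day run lengths come from above; the engineering-time values are hypothetical placeholders, and the sketch assumes every run starts from scratch (the finetuning case is not modeled).

```python
def full_runs_that_fit(days_left, engineering_days, run_days):
    """Number of back-to-back from-scratch training runs that fit in the time left.

    All inputs are hypothetical planning numbers, in days. Training is assumed
    to start only after all engineering work is done.
    """
    training_window = max(0, days_left - engineering_days)
    return training_window // run_days


for engineering_days in (0, 20, 40):
    for run_days in (7, 19):
        runs = full_runs_that_fit(60, engineering_days, run_days)
        print(f"engineering={engineering_days:2d}d run={run_days:2d}d -> {runs} full runs")
```

Even with zero engineering time, a 19-day run only fits three times in 60 days, and every week spent on wards and Roshan comes straight out of that budget.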
Whatever way the match goes, I expect it to be one-sided, one way or the other. There’s a narrow band of skill level that leads to an even match, and it’s much more likely that it falls outside of that band. Pretty excited to see who’s going to win and who’s going to get stomped!
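One way to see why an even match needs a narrow skill band is the Elo model’s expected score, 1 / (1 + 10^(-gap/400)). Elo is my choice of illustration here, not something from OpenAI’s post, and `elo_win_probability` is a hypothetical helper.

```python
def elo_win_probability(rating_gap):
    """Expected score for the higher-rated side under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


for gap in (0, 100, 200, 400, 800):
    print(f"gap {gap:4d}: {elo_win_probability(gap):.2f}")
```

Under this model, a 200-point gap already gives the stronger side roughly a 76% chance to win, and at 800 points the outcome is close to certain.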
In the span of just under a month, I attended two conferences, ICLR 2018 and ICRA 2018. The first is a deep learning conference, and the second is a robotics conference. They were pretty different, and I figured it would be neat to compare the two.
From the research side, the TL;DR of ICLR was that adversarial learning continues to be a big thing.
The most popular thing in that sphere would be generative adversarial networks. However, I’m casting a wide umbrella here, one that includes adversarial examples and environments with competing agents. Really, any minimax optimization problem of the form $\min_x \max_y f(x, y)$ counts as adversarial learning to me.
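Minimax problems are also trickier to optimize than plain minimization. Here’s a toy illustration that isn’t from any of the papers: take f(x, y) = x * y, where x minimizes and y maximizes, so the equilibrium is (0, 0). Taking simultaneous gradient steps for both players spirals away from that equilibrium, while alternating the updates keeps the iterates on a bounded orbit. The function name and step size are made up for illustration.

```python
import math


def gradient_descent_ascent(x, y, lr, steps, alternating=False):
    """Minimize over x and maximize over y for f(x, y) = x * y.

    df/dx = y and df/dy = x. The unique equilibrium is (0, 0).
    """
    for _ in range(steps):
        new_x = x - lr * y                  # descent step for the minimizing player
        grad_y = new_x if alternating else x
        y = y + lr * grad_y                 # ascent step for the maximizing player
        x = new_x
    return x, y


x, y = gradient_descent_ascent(1.0, 1.0, lr=0.1, steps=100)
print("simultaneous:", math.hypot(x, y))   # distance from (0, 0) grows every step

x, y = gradient_descent_ascent(1.0, 1.0, lr=0.1, steps=100, alternating=True)
print("alternating: ", math.hypot(x, y))   # stays bounded near the starting radius
```

For this particular f, each simultaneous step scales the distance from the origin by exactly sqrt(1 + lr²), so no positive step size makes it converge.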
I don’t know if it was actually popular, or if my memory has selective bias, because I have a soft spot for these approaches. They feel powerful. One way to view a GAN is that you are learning a generator by using a learned implicit cost instead of a human defined one. This lets you adapt to the capabilities of your generator and lets you define costs that could be cumbersome to explain by hand.
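For concreteness, this is the standard GAN objective from Goodfellow et al. (2014), with generator $G$, discriminator $D$, data distribution $p_{\text{data}}$, and noise distribution $p_z$:

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

For a fixed $D$, the generator is minimizing $\mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$, so the discriminator plays the role of the learned, implicit cost described above.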
Sure, this makes your problem more complicated. But if you have strong enough optimization and modeling ability, the implicitly learned cost gives you sharper images than other approaches. And one advantage of replacing parts of your system with learned components is that advances in optimization and modeling ability apply to more aspects of your problem. You are improving both your ability to learn cost functions and your ability to minimize those learned costs. Eventually, there’s a tipping point where it’s worth adding all this machinery.
From a more abstract viewpoint, this touches on the power of expressive, optimizable function families, like neural nets. Minimax optimization is not a new idea. It’s been around for ages. The new thing is that deep learning lets you model and learn complicated cost functions on high-dimensional data. To me, the interesting thing about GANs isn’t the image generation, it’s the proof-of-concept they show on complicated data like images. Nothing about the framework requires you to use image data.
There are other parts of the learning process that could be replaced with learned methods instead of human-defined ones, and deep learning may be how we do so. Does it make sense to do so? Well, maybe. The problem is that the more you do this, the harder it becomes to actually make everything learnable. There’s no point making it turtles all the way down if your turtles become unstable and collapse.
There was a recent Quanta article, where Judea Pearl expressed his disappointment that deep learning was just learning correlations and curve fitting, and that this doesn’t cover all of intelligence. I agree with this, but to play devil’s advocate, there’s a chance that if you throw enough super-big neural nets into a big enough vat of optimization soup, you would learn something that looks a lot like causal inference, or whatever else you want to count as intelligence. But now we’re rapidly approaching philosophy land, so I’ll stop here and move on.
From an attendee perspective, I liked having lots of poster sessions. This is the first time I’ve gone to ICLR. My previous ML conference was NIPS, and NIPS just feels ridiculously large. Checking every poster at NIPS doesn’t feel doable. Checking every poster at ICLR felt possible, although whether you’d actually want to do so is questionable.
I also appreciated that corporate recruiting didn’t feel as ridiculous as NIPS. At NIPS, companies were giving out fidget spinners and slinkies, which was unique, but the fact that companies needed to come up with unique swag to stand out felt…strange. At ICLR, the weirdest thing I got was a pair of socks, which was odd but not too outlandish.
Papers I noted to follow-up on later:
- Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play
- Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
- Policy Optimization by Genetic Distillation
- Measuring the Intrinsic Dimension of Objective Landscapes
- Eigenoption Discovery Through the Deep Successor Representation
- Self-Ensembling for Visual Domain Adaptation
- TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning
- Online Learning Rate Adaptation with Hypergradient Descent
- DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
- Learning to Multi-Task by Active Sampling
ICRA 2018 was my first robotics conference. I wasn’t sure what to expect. I started research as an ML person, and then sort of fell into robotics on the side, so my interests are closer to learning-for-control instead of making-new-robots. My ideal setup is one where I can treat real-world hardware as an abstraction. (Somewhere, a roboticist weeps.)
This plus my spotty understanding of control theory meant that I was unfamiliar with a lot of the topics at the conference. Still, there were plenty of learning papers, and I’m glad I went.
Of the research that I did understand, I was surprised there were so many reinforcement learning papers. It was mildly entertaining to see that almost none of them used purely model-free RL. One thing about ICRA is that your paper has a much, much better chance of getting accepted if it runs on a real-world robot. That forces you to care about data efficiency, which puts a super heavy bias against doing only model-free RL. When I walked around, I kept hearing “We combine model-free reinforcement learning with X”, where X was model-based RL, or learning from human demonstrations, or learning from motion planning, or really anything that could help with the exploration problem.
At a broader level, the conference had a sense of practicality about it. It was still a research conference, and plenty of it was still very speculative, but it also felt like people were okay with narrow, well-targeted solutions. I see this as another consequence of having to use real hardware. You can’t ignore inference time if you need to run your model in real time. You can’t ignore data efficiency if you need to collect it from a real robot. Real hardware does not care about your problems. Or, as RFC 1925 puts it:
(1) It Has To Work.
(2) No matter how hard you push and no matter what the priority, you can’t increase the speed of light.
This surprises a lot of ML people I talk to, but robotics hasn’t fully embraced ML the way that people at NIPS / ICLR / ICML have, in part because ML doesn’t always work. Machine learning is a solution, but it’s not guaranteed to make sense. The impression I got was that only a few people at ICRA actively wanted ML to fail. Everyone else is perfectly okay with using ML, once it proves itself. And in some domains, it has proved itself. Every perception paper I saw used CNNs in one way or another. But significantly fewer people were using deep learning for control, because that’s where things are more uncertain. It was good to hear comments from people who see deep learning as just a fad, even if I don’t agree.
Like ICLR, there were a lot of companies doing recruiting and hosting info booths. Unlike ICLR, these booths were a lot more fun to browse. Most companies brought one of their robots to demo, and robot demonstrations are always fun to watch. It’s certainly more interesting than listening to the standard recruiting spiels.
At last year’s NIPS, I noted that ML company booths were starting to remind me of Berkeley career fairs, in a bad way. Every tech company wants to hire Berkeley new grads, and in my last year, recruiting started to feel like an arms race on who can give out the best swag and best free food. It felt like the goal was to look like the coolest company possible, all without telling you what they’d actually hire you for. And the ML equivalent of this is to host increasingly elaborate parties at fancy bars. Robotics hasn’t gone as far yet. It’s growing, but not with as much hype.
I went to a few workshop talks where people talked about how they were using robotics in the real world, and they were all pretty interesting. Research conferences tend to focus on discussing research and networking, which makes it easy to forget that research can have clear, immediate economic value. There was a Robots in Agriculture talk about using computer vision to detect weeds and spray weed killer on just the weeds, which sounds like all upside to me. Uses less weed killer, kills fewer crops, slows down growth of herbicide resistance.
Rodney Brooks had a nice talk along similar lines, where he talked about the things needed to turn robotics into a consumer product, using the Roomba as an example. According to him, when designing the Roomba, they started with the price, then molded all the functionality towards that price. It turns out a couple hundred dollars gives you very little leeway for fancy sensors and hardware, which places tight limits on what you can do with on-device inference.
(His talk also had a rant criticizing HRI research, which seemed out of place, but it was certainly entertaining. For the curious, he complained about people using too much notation to hide simple ideas, large claims that weren’t justified by the sample sizes used in the papers, and researchers blaming humans for irrational behavior when they didn’t match the model’s predictions. I know very little about HRI, so I have no comment.)
Organization-wise, it was really well run. The conference center was right next door to a printing place, so at registration time, the organizers said that if you emailed a PDF by a specific date, they would handle all ordering logistics. All you had to do was pay for your poster online and pick it up at the conference. All presentations were given at presentation pods, each of which came with a whiteboard and a shelf where you could put a laptop to play video (which is really important for robotics work).
Papers I noted to follow-up on later:
- Applying Asynchronous Deep Classification Network and Gaming Reinforcement Learning-Based Motion Planner to a Mobile Robot
- OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World
- Synthetically Trained Neural Networks for Learning Human-Readable Plans from Real-World Demonstrations
- Semantic Robot Programming for Goal-Directed Manipulation in Cluttered Scenes
- Interactive Perception: Leveraging Action in Perception and Perception in Action