# Posts

## Three Years Later

Sorta Insightful turns three years old today! Whether you were here from the beginning, or just discovered this blog, thanks for reading.

I normally write a sappy, self-reflective post for my blogging anniversary. This year, I’m deciding to do a bunch of data analysis instead. It’s still self-reflective, just in a different vein.

## Words Written

Last year, I wrote 22,409 words. How about this year?

I wrote 24,449 words. Here’s the breakdown in chronological order.

I wrote 12 posts this year, following my trend of 1 post a month on average.

Eagle-eyed readers may notice that the reinforcement learning post was much, much longer than the rest, taking up almost 40% of the words I wrote for Sorta Insightful this year.

## View Counts

These view counts are aggregated from August 18, 2017 to today.

Okay, I knew the RL post would be the outlier. I didn’t think it would be the outlier by that much. Jeeeez.

## Time Spent Writing

For the past two years, I’ve been using Gleeo Time Tracker to track my time. I track a few things: how long I sleep, the length of my commute, how much time I spend reading books and web fiction, what video games I play (and for how long), and how long I spend writing.

Despite having two years’ worth of data, I’ve never bothered doing any analytics on it. This post felt like a good excuse to start.

Excluding the time spent on this post, I spent 131 hours, 21 minutes writing for my blog this year. At first, this felt like less than I’d expected, but it averages out to about 21-22 minutes a day, which feels right.
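The per-day average is quick to check. A minimal sketch, assuming the year is counted as 365 days:

```python
# 131 hours, 21 minutes of writing time, spread over an (assumed) 365-day year.
total_minutes = 131 * 60 + 21
minutes_per_day = total_minutes / 365
print(f"{total_minutes} minutes total, {minutes_per_day:.1f} minutes/day")
```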

## When I Write

Most days, I don’t do any writing. My motivation comes in bursts. I like starting and finishing posts within a few days, and it feels like I do the most writing on the weekends. Is that true?

Gleeo Time Tracker doesn’t have the tools for this built-in, but you can export your time tracking data as a CSV file. This makes it straightforward to do further analytics. I used Python for this, since that’s my go-to programming language.

Here’s how much time I’ve spent writing on a given day of the week.

| Day of the Week | Hours |
|---|---|
| Monday | 20.93 |
| Tuesday | 14.18 |
| Wednesday | 33.53 |
| Thursday | 24.98 |
| Friday | 12.38 |
| Saturday | 11.18 |
| Sunday | 14.15 |

Turns out I was super wrong! I actually do most of my writing on Wednesday. I guess blogging is my outlet for getting through the middle of the week?
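For anyone who wants to try this on their own export, here is a minimal sketch of the day-of-week aggregation using only the standard library. The column names (`Activity`, `Start`, `End`) and the timestamp format are hypothetical placeholders, not Gleeo’s documented export format, and each session is credited entirely to the day it started on:

```python
import csv
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def hours_by_weekday(csv_lines, activity="Writing"):
    """Sum tracked hours per weekday, crediting each session to its start day.

    csv_lines: any iterable of CSV text lines (an open file works).
    Assumes hypothetical columns 'Activity', 'Start', 'End' with
    'YYYY-MM-DD HH:MM' timestamps; a real export may differ.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(csv_lines):
        if row["Activity"] != activity:
            continue
        start = datetime.strptime(row["Start"], "%Y-%m-%d %H:%M")
        end = datetime.strptime(row["End"], "%Y-%m-%d %H:%M")
        totals[DAYS[start.weekday()]] += (end - start).total_seconds() / 3600
    return {day: round(totals[day], 2) for day in DAYS}
```

Crediting a whole session to its start day is a simplification that matters for a night owl: a session starting at 11 PM Tuesday counts as Tuesday, while one starting at 12:30 AM counts as Wednesday.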

Alright, so I was wrong about the day of the week. How about the time of day? I’ve often joked that my most productive writing hours are between 11 PM and 2 AM. Is that true?

| Time of Day | Hours |
|---|---|
| 00:00-00:59 | 22.67 |
| 01:00-01:59 | 29.92 |
| 02:00-02:59 | 17.88 |
| 03:00-03:59 | 5.83 |
| 04:00-04:59 | 2.20 |
| 05:00-05:59 | 1.67 |
| 06:00-06:59 | 1.27 |
| 07:00-07:59 | 1.50 |
| 08:00-08:59 | 1.00 |
| 09:00-09:59 | 1.00 |
| 10:00-10:59 | 1.00 |
| 11:00-11:59 | 1.72 |
| 12:00-12:59 | 1.00 |
| 13:00-13:59 | 0.38 |
| 14:00-14:59 | 1.77 |
| 15:00-15:59 | 1.60 |
| 16:00-16:59 | 4.30 |
| 17:00-17:59 | 4.48 |
| 18:00-18:59 | 5.28 |
| 19:00-19:59 | 3.40 |
| 20:00-20:59 | 3.78 |
| 21:00-21:59 | 5.38 |
| 22:00-22:59 | 4.22 |
| 23:00-23:59 | 8.10 |

Well, it’s very close to true! I was off by an hour: I’m most productive between midnight and 3 AM.

The more impressive (and scary) thing is that I’ve written during literally every hour of the day. Who’s even awake at 5 AM? And given that I have 1.67 hours of writing at that time, I must have done it at least twice.
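Bucketing by hour takes one extra step: a session that crosses an hour boundary has to be split across buckets. A minimal sketch, assuming sessions are available as `(start, end)` `datetime` pairs:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hours_by_hour_of_day(intervals):
    """Split each (start, end) session across the clock-hour buckets it overlaps."""
    totals = defaultdict(float)
    for start, end in intervals:
        t = start
        while t < end:
            # Stop at the top of the next clock hour, or at the session end if sooner.
            next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            chunk_end = min(next_hour, end)
            totals[t.hour] += (chunk_end - t).total_seconds() / 3600
            t = chunk_end
    return [round(totals[h], 2) for h in range(24)]
```

Splitting this way also backs up the 5 AM reasoning: a single session under 24 hours can add at most one hour to any one bucket, so 1.67 hours in that bucket needs at least two sessions.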

## Time Spent Per Post

My time tracker data doesn’t store the post I was writing at the time. However, thanks to the magic of Git, I can reconstruct what post I was writing on a given day.

This blog is a GitHub Pages blog, so every change to it is made through a Git commit, with a timestamp. First, I exported all commits I’ve ever made.

With a few exceptions, the Git history for this blog is structured as a tree. Every post starts by branching off of master. I work on the draft there, building my site locally to preview how it looks. When the post is finished, I go back to master, run `git merge --squash <branch name>`, and commit. This creates a single commit that’s the sum of all changes made in that branch.

This workflow means that writing-wise, the only meaningful commits are on the offshoot branches. These commits lie on exactly one branch, which corresponds to exactly one post.
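One way to pull out just the branch-only commits is a revision range: `git log master..<branch>` lists commits reachable from the branch but not from master, and because a squash merge doesn’t record the branch as a parent, those commits stay off master’s history. A minimal sketch, assuming the post branches still exist locally and the base branch is named `master`:

```python
import subprocess
from datetime import datetime, timezone


def parse_log(text):
    """Parse 'HASH UNIX_TIMESTAMP' lines into (hash, UTC datetime) pairs."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, ts = line.split()
        commits.append((sha, datetime.fromtimestamp(int(ts), tz=timezone.utc)))
    return commits


def branch_only_commits(branch, base="master", repo="."):
    """Commits reachable from `branch` but not from `base`.

    With squash merges, master only receives the combined commit, so the
    branch's own commits never become reachable from master.
    """
    out = subprocess.run(
        ["git", "log", "--format=%H %ct", f"{base}..{branch}"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

`%ct` is the committer timestamp; if the time tracker exports local times, both sides need to be converted to the same time zone before pairing.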

This gives me a list of commit times (from Git) and a list of (start, end) intervals (from my time tracker). If a commit lies within an interval, it’s paired with that interval. Not every interval contains a commit, since I didn’t commit my work during every writing session, but I can assign those intervals based on the closest commit. Summing all the intervals paired with the commits on a given branch gives me the time I’ve spent working on that post.
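The pairing step can be sketched in a few lines. The data layout (lists of `(start, end)` sessions and `(time, branch)` commits) and the tie-breaking rule are assumptions for illustration; the nearest-commit fallback measures distance to the closer edge of the session:

```python
from collections import defaultdict
from datetime import datetime


def distance_seconds(t, start, end):
    """Seconds between time t and the interval [start, end]; 0 if t is inside it."""
    if t < start:
        return (start - t).total_seconds()
    if t > end:
        return (t - end).total_seconds()
    return 0.0


def hours_per_branch(intervals, commits):
    """Credit each (start, end) writing session to one branch and total the hours.

    intervals: list of (start, end) datetimes from the time tracker.
    commits: non-empty list of (time, branch) pairs from Git.
    """
    totals = defaultdict(float)
    for start, end in intervals:
        # A commit inside the session has distance 0, so min() picks it;
        # otherwise the session falls back to the nearest commit in time.
        # Ties go to whichever commit comes first in the list.
        _, branch = min(commits, key=lambda c: distance_seconds(c[0], start, end))
        totals[branch] += (end - start).total_seconds() / 3600
    return dict(totals)


if __name__ == "__main__":
    sessions = [
        (datetime(2018, 1, 1, 20, 0), datetime(2018, 1, 1, 22, 0)),
        (datetime(2018, 1, 2, 20, 0), datetime(2018, 1, 2, 21, 0)),  # no commit inside
    ]
    commits = [
        (datetime(2018, 1, 1, 21, 0), "rl-hard"),
        (datetime(2018, 1, 2, 23, 0), "mlp-music"),
    ]
    print(hours_per_branch(sessions, commits))  # {'rl-hard': 2.0, 'mlp-music': 1.0}
```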

Here’s the time spent for each post, ordered from most to least. For context, I also include the number of words in that post.

| Post | Hours | Word Count |
|---|---|---|
| rl-hard | 53.33 | 9426 |
| (Draft) | 15.27 | N/A |
| iclr-icra | 10.68 | 1790 |
| blog-paper | 8.22 | 1319 |
| five-seconds | 6.43 | 1872 |
| mlp-music | 5.28 | 450 |
| magic-arena | 5.23 | 1280 |
| mlp-italy | 4.32 | 957 |
| mh-2018 | 3.68 | 2112 |
| 2year | 3.38 | 1256 |
| research-tax | 2.82 | 1276 |
| sim2real-grasping | 1.95 | 155 |
| dota2-five | 1.3 | 1464 |
| (Draft) | 1.02 | N/A |
| (Draft) | 0.72 | N/A |

A few people have asked me how long it took me to write my reinforcement learning post. Well, there’s your answer: 53 hours, 20 minutes. Based on commit timestamps, the first draft was started in August 2017, most of it was written between October 2017 and Christmas 2017, and editing based on early feedback was done between Christmas 2017 and Valentine’s Day 2018.

The time for that post actually lines up eerily well with the word count. The RL post was almost 40% of the words I wrote this year, and the post took about 40% of my writing time. This correlation immediately falls apart for the other posts.
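The two shares can be checked directly from the numbers in this post. This sketch uses the yearly total of 131 hours, 21 minutes as the denominator for time; the per-post table sums to a bit less, so a share computed from it would come out a little higher:

```python
# Figures quoted in this post: the RL post vs. the year's totals.
rl_words, total_words = 9426, 24449
rl_hours, total_hours = 53.33, 131 + 21 / 60  # 131 hours, 21 minutes

word_share = rl_words / total_words
time_share = rl_hours / total_hours
print(f"share of words: {word_share:.1%}, share of writing time: {time_share:.1%}")
```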

You may have noticed the crazy outlier of “1.95 hours to write 155 words”. It’s very misleading. Based on my commit messages, that post included updates to my About page and Research page, which aren’t reflected in the reported word count.

There are two other outliers. I spent 1.3 hours writing 1464 words for the OpenAI Five post. If you read that post, the lack of polish should be obvious. I spent 5.28 hours writing 450 words for the MLP Music Recs page, but most of the work there was spent searching up songs on YouTube and narrowing the list down to 1 song per artist.

As for the drafts: we’ll see if I finish any of those. I believe it’s perfectly healthy to have lots of incomplete projects. You aren’t obligated to finish everything you start. Still, it feels weird to have 15 hours of work on an unfinished draft, when most of my posts take less than 10.

## What’s Next?

Well, to be honest, I’m not really sure. Historically, if I say I’ll write a post in the upcoming year, I never get around to writing it. This year, I’m deciding not to commit to writing anything. Instead, I’ll write whatever I have motivation to write. This isn’t really a change, it’s simply me being more realistic about what’s going to happen.

Although, to be honest, I do still want to write that post about Gunnerkrigg Court. I’ve been talking about writing that post for over two years. One day, it’ll happen. It has to.

## Five Seconds to Midnight

Everyone knew when time froze, and no one knew if it would start again.

Raindrops hung in the air. Cars had stopped in the middle of the road, puffs of smoke stuck to their tailpipes like cotton candy. Planes were fixed in the sky, the world’s largest crib mobile above the clouds. It was all very strange.

Stranger still was what wasn’t affected: people. Only people. Birds stopped mid-chirp, dogs and cats kept napping (and would nap forever), but humans were the one exception. They could move around, grab things, shake hands, dance, run, crawl.

The first reactions were panic and confusion. People reached for their smartphones, and then learned that smartphones don’t do very much if electricity doesn’t work, and electricity doesn’t work if time doesn’t work. Computers didn’t do much either. Neither did phone lines, or trains, or even horse-drawn carriages. Transportation and communication had regressed to ancient times. There was a moment of realization - and it stretched, further and further, carrying into eternity.

In the span of a few days, the other consequences became clear. It was now impossible to change the physical or chemical configuration of anything in the world. People no longer needed to sleep. They didn’t get hungry, or thirsty. They couldn’t hurt themselves, even if they tried. They simply were. Birth stopped, and death stopped. A few people tried to argue they weren’t technically immortal, and weren’t technically invulnerable, but it was close enough to immortality and invulnerability that those people gave up the fight for nomenclature.

There were a few attempts to use science to make sense of the situation. Why were only humans unaffected? How did the freeze distinguish between a carbon atom in a human, and a carbon atom in a plant, when they should have been identical? If people could move about, where did the energy come from? None of these attempts went anywhere. There were plenty of ideas, but they couldn’t be tested, making them close to worthless.

That left one big question: what do we do now?

* * *

The President of the United States had a problem. He needed to give a speech to the public, to say something, anything. But how do you do so when nothing works?

After some discussion, Congress came up with a solution. They visited running clubs around D.C., and asked if they’d like to volunteer to literally run around the world.

It took about a week to recruit people and get them to memorize the speech well enough to deliver it. It took a few more weeks for the runners to make it across the continental United States.

By the time the first runner made it to the West Coast, it had been almost a month since time had frozen, and no one cared very much about what the President had to say. They had long since decided they were on their own, and had resolved not to pay too much attention to the noise outside.

* * *

For years before the freeze, some had advocated for the need to achieve a post-scarcity society. The world wasn’t exactly the post-scarcity utopia that they had dreamed of, but at least everyone had what they needed to live, even if it was done by driving all demands to zero.

Without work to do, people had a lot of free time. If anything is unambiguously true, it is that people need to find hobbies, and that’s what people did. Some gave math another try. Others went to philosophy, bringing several strange yet wonderful ideas. A few decided to devote their lives to Chess and Go, some of the few forms of entertainment that weren’t impacted by the freeze.

Travel got a lot more popular. It took a long time to get anywhere, but people had a lot of time to burn.

A family of four from Montana decided to go storm hunting. They planned a journey to Southeast Asia, where a great thunderstorm raged across the sky, flecks of lightning hanging in the air like stars.

A group of bridesmaids from South Africa decided to visit America before a wedding. In the middle of Kansas, they stopped by a tornado, and posed next to the funnel cloud, waving their arms around and laughing like chimes in the wind.

An elderly couple from Sao Paulo decided to climb Mount Everest. It wasn’t the most original thing to do, but it’s Mount Everest. How are you not supposed to climb Mount Everest?

The world was their oyster, and people realized there were pearls all around them, even in the little things. They just needed the time to appreciate them, and the chance to find them for themselves.

* * *

Years passed, then centuries, then millennia, all trapped in that moment of time. The world hadn’t changed, but the people in it had made the world a very different place. A lot of petty squabbles died off. People argued less and helped each other more. It’s funny how much people change, after they become immortal.

The one problem was that the world was starting to become boring. Yes, there were pearls all around them, but on a long enough time scale, you can see everything that you want to see. People were running out of things to do.

And then something new happened.

Long after people had stopped keeping track of the time, a man decided to spend a few months walking across the Atlantic. He had done this eighty times before, but it had been on his bucket list to do it again after a friend mentioned an island he’d missed all the previous times. Halfway through his journey, he spotted a glowing, pulsating wall of light - something that was changing, when nothing was supposed to change.

He made landfall in Morocco, and spread word to the first locals he could find. Independent expeditions verified his findings, and discovered that other walls of light had appeared across the ocean. A group from Australia started mapping the walls, and realized they were forming letters. With this news, they recruited a thousand people to form a human pyramid. The woman at the top of the pyramid looked down, and shouted out the message.

WE GAVE YOU GIFTS, AND YOU SQUANDERED THEM.

WE GAVE YOU CHOICES, AND YOU MADE ONES THAT BROUGHT YOU CLOSE TO RUIN.

IN FEAR, WE TOOK THEM AWAY.

BUT PERHAPS YOU WOULD LIKE THEM BACK.

PROVE YOU DESERVE THEM, AND WE WILL RESTART THE GEARS OF THE WORLD.

WE WILL GIVE YOU FIFTY YEARS TO DECIDE.

With the message delivered, the letters faded away, leaving just the frozen ocean waves.

* * *

It took a while for humanity to decide. It’s always hard to change things once people get used to them. Our adaptability is both a strength and a weakness.

There were upsides to living in a frozen world. But there were downsides too. People have so many ideas now, for things to build, things to try, and they can’t, because the world literally won’t allow them to. We were given the chance to take back control over our own destiny. How could we say no?

It was unclear how we were supposed to signal our decision. Eventually we discovered five analog clocks, scattered across the world. They were all identical in shape and size, all bathed in the same pulsating white light, and all stuck at precisely five seconds to midnight.

The first was found in a classroom in Copenhagen.

The second, in an abandoned laboratory on the outskirts of Berlin.

The third, on a beach on the Bikini Atoll, lying next to a pineapple of all things.

The fourth, in a house near the center of Hiroshima.

And the fifth, in an editorial publishing office based out of Chicago.

Each clock had a second hand, and unlike everything else, the second hand was free to move backward and forward, as long as it didn’t move past five seconds to midnight. The leading theory was that if we could push all the second hands forward at the same time, that would be the signal to get things moving again.

I’m standing in front of the Chicago clock right now.

For synchronization, we have five runners, one for each clock, who have learned the knack of running at precisely a given speed. On a cue, they started running, timed so that each would arrive at their clock at the same moment. In parallel, we’re running some backup runners in case something goes wrong, and some checksum runners to transmit data that verifies we’re within the margin of error. The system’s all very interesting. I’d explain the details, but I wouldn’t want to bore people.

As for why I’m one of the people pushing a second hand? It’s nothing special. We chose randomly. I just got lucky.

Sometimes, I wonder if we’re making the right call. If I wanted, I could sabotage the whole operation. But I won’t. It’s humanity’s decision and I have to respect it.

Right on cue, a runner enters the room, moving forward at a steady pace.

“It’s time.”

I nod, and start pushing the second hand forward. It starts to groan, making a creaking sound that is far too loud for what should be an ordinary clock. I push, and push, and push - and then it starts moving.

Tick.

Tock.

Tick.

Tock.

Tick.

## Quick Opinions on OpenAI Five

OpenAI recently announced that a team of five Dota 2 agents has successfully beaten an amateur team. It’s a pretty exciting result and I’m interested to see where it goes from here.

When OpenAI first revealed they were working on Dota 2, there was a lot of buzz, a lot of hype, and a lot of misunderstanding that compelled me to write about it. This time, I have fewer questions and less compulsion to set the record straight, so to speak. The blog post has enough details to satisfy me, and the reaction hasn’t been as crazy. (Then again, I haven’t been reading the pop science press, so who knows…)

I’m pretty busy this week, so instead of trying to organize my thoughts, I’m just going to throw them out there and see what happens. This post is going to be messy, and may not make sense. I typed this out over about an hour and didn’t think too hard about my word choice. Everything in it makes sense to me, but that doesn’t mean anything - everything you write makes sense to you.

(If you haven’t read the OpenAI announcement post, you should do so now, or else this will make even less sense.)

* * *

This result came a bit earlier than I thought it would, but not by a lot. I’m not sure exactly when I was expecting to hear that 5v5 was looking solvable, but when I heard the news, I realized I wasn’t that surprised.

The post clarifies that yes, the input is a large number of game state features coming from the Dota 2 API, and isn’t coming from vision. The agent’s ability to observe the game is well beyond any human capability. I said this before and will say it again: this is totally okay and I have no problems with it.

On the communication front, I was expecting the problem to require at least some communication. Not at the level of the multi-agent communication papers where people try to get agents to learn a language to communicate goals; I was thinking of something like every agent receiving the actions every other agent took at each time step. That isn’t happening here. It’s just five LSTMs, each deciding its own actions. The only direct encouragement for teamwork is that the reward of each agent is defined by a “team spirit” parameter that decides how important the team’s reward is to the individual. The fact that a single float is good enough is pretty interesting…
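To make the “single float” concrete: the OpenAI post describes team spirit as weighting an agent’s own reward against the team’s. A minimal sketch, assuming a linear blend of each agent’s reward with the team average (the exact formula is my assumption, not taken from OpenAI’s code):

```python
def team_spirit_rewards(individual, tau):
    """Blend each agent's own reward with the team's average reward.

    tau = 0.0 -> fully selfish, each agent keeps its own reward.
    tau = 1.0 -> fully shared, every agent receives the team average.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("team spirit must lie in [0, 1]")
    team_mean = sum(individual) / len(individual)
    return [(1 - tau) * r + tau * team_mean for r in individual]


# Hypothetical step: one hero earns a reward of 1.0, the other four earn nothing.
print(team_spirit_rewards([1.0, 0.0, 0.0, 0.0, 0.0], tau=0.5))
```

Under this form the team’s total reward is unchanged for any tau; only how it is spread across the five agents moves.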

…Well, until I thought about it a bit more. By my understanding, the input state of each agent is the properties of every unit in the team’s vision. This includes health, attack, orientation, level, cooldowns of all their skills, and more. And your teammates are always in your team’s vision. So, odds are you can reconstruct the actions from the change in state. If they changed location, they moved. If they just lost mana and one of their spells’ cooldowns just increased, they just used a skill.
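To make the reconstruction argument concrete, here is a toy sketch. The `UnitState` fields and the skill name are hypothetical stand-ins for a few of the features listed above, not the actual Dota 2 API:

```python
from dataclasses import dataclass, field


@dataclass
class UnitState:
    # Hypothetical subset of per-unit features; the real observation has far more.
    x: float
    y: float
    mana: float
    cooldowns: dict = field(default_factory=dict)  # skill name -> seconds remaining


def infer_actions(prev, curr):
    """Guess what a teammate did between two consecutive observations."""
    actions = []
    if (curr.x, curr.y) != (prev.x, prev.y):
        actions.append("move")
    if curr.mana < prev.mana:
        # Mana dropped: look for a skill whose cooldown just went up.
        for skill, remaining in curr.cooldowns.items():
            if remaining > prev.cooldowns.get(skill, 0.0):
                actions.append(f"cast:{skill}")
    return actions


before = UnitState(0.0, 0.0, mana=300.0, cooldowns={"skill_q": 0.0})
after = UnitState(1.5, 0.0, mana=200.0, cooldowns={"skill_q": 9.0})
print(infer_actions(before, after))  # ['move', 'cast:skill_q']
```

Real games are noisier (an enemy can also drain mana, for instance), but the point stands: the information is already in the state.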

In this respect, it feels like the state definition is rich enough that emergent cooperative behavior isn’t that surprising. There’s no theoretical limit to the potential teamwork - what would team captains give to be able to constantly understand everything the API can give you?

Compute-wise, there’s a lot of stuff going on: 256 GPUs, each contributing to a large synchronous batch of over a million observations. That is one of the largest batch sizes I’ve seen, although from a memory standpoint it might be smaller than a large batch of images. A Dota 2 observation is 20,000 floats. A 256 x 256 RGB image is approximately 200 thousand bytes.
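The memory comparison is simple arithmetic. A sketch, assuming 4-byte (float32) observation features and one byte per RGB channel; both precisions are my assumptions:

```python
BYTES_PER_FLOAT32 = 4

dota_obs_bytes = 20_000 * BYTES_PER_FLOAT32  # assumes float32 features
image_bytes = 256 * 256 * 3                  # assumes uint8 RGB pixels
batch_size = 1_000_000                       # "over a million observations"

print(f"one observation: {dota_obs_bytes:,} bytes, one image: {image_bytes:,} bytes")
print(f"per batch: {dota_obs_bytes * batch_size / 1e9:.0f} GB "
      f"vs {image_bytes * batch_size / 1e9:.0f} GB")
```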

(I assume the reason it’s using synchronous training is because async training starts getting really weird when you scale up the number of GPUs. My understanding is that you can either hope the time delays aren’t too bad given the number of GPUs you have, or you can try doing something like HOGWILD, or you can say “screw it” and just do synchronous training.)

Speaking of saying “screw it” and doing the thing that will clearly scale, it’s interesting that plain PPO is just good enough so far. I’m most surprised by the time horizon problem. The partial observability hurts, but empirically it was doable for the Dota 1v1 bot. The high-dimensional action and observation spaces didn’t feel like obstacles to me - they looked annoying but not impassable. But the long time horizon problem felt hard enough that I expected it to require something besides just PPO.
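For reference, the core of “plain PPO” is the clipped surrogate objective. A per-sample sketch of that textbook objective (not OpenAI Five’s actual training code):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled action.
    advantage: estimated advantage of that action.
    Clipping removes any incentive to push the ratio outside [1 - eps, 1 + eps].
    """
    clipped_ratio = min(max(ratio, 1 - eps), 1 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (positive advantage) whose probability already rose 50%:
# the gain is capped at the clipped ratio of 1 + eps.
print(ppo_clip_objective(1.5, 1.0))
```

Taking the minimum makes the objective pessimistic: clipping caps how much an update can gain, but never hides an update that makes things worse.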

This seems to have parallels to the Retro Contest results, where the winning entries were just tuned versions of PPO and Rainbow DQN. In the past, I’ve been skeptical of the “hardware hypothesis”, where the only thing stopping AI progress is faster computers. At the time, I said I thought the split in AI capabilities was about 50-50 between hardware and software. I’m starting to lean towards the hardware side, updating towards something like 60-40 for hardware vs software. There are an increasing number of results where baseline algorithms just work if you try them at the right scale, enough that I can’t ignore them.

One thing I like to joke about is that everyone who does reinforcement learning eventually decides that we need to solve hierarchical reinforcement learning and exploration. Like, everybody. And the problem is that they’re really hard. So from a practitioner perspective, you have two choices. One is to pursue a risky research project on a difficult subject that could pan out, but will likely be stuck on small problems. The other option is to just throw more GPUs at it.

It’s not that we should give up on hierarchical RL and the like. It’s more that adding more hardware never hurts and likely helps, and even if you don’t need the scale, everyone likes it when their models train faster. This makes it easier to justify investing time into infrastructure that enables scale. Models keep getting bigger, so even if it doesn’t pay off now, it’ll pay off eventually.

* * *

I’d like to end this post with a prediction.

The team’s stated goal is to beat a pro team at The International, August 20-25, with a limited set of heroes (presumably the same hardcoded team mentioned in the footnote of the post). I think OpenAI has a decent shot, about 50%.

To explain my thinking a bit more, everything about the progress and skill curves so far suggests to me that the learning algorithm isn’t hitting a plateau. For whatever reason, it seems like the Dota 2 skill level will continue to increase if you give it more training time. It may increase at a slower rate over time, but it doesn’t seem to stop.

Therefore, the question to me isn’t about whether it’s doable, it’s about whether it’s doable in the 2 months (60 days) they have left. Based on the plots, it looks like the current training time is around 7-19 days, and that leaves some breathing room for catching bugs and the like.

Funnily enough, my guess is that the main blocker isn’t going to be the learning time, it’s going to be the software engineering time needed to remove as many restrictions as possible. For the match at The International, I’d be very disappointed if wards and Roshan were still banned - it seems ridiculous to ask a pro team to play without either of those. So let’s assume the following:

• Both wards and Roshan need to be implemented before the match.
• The policy needs to be trained from scratch to learn how to ward and how to play around Roshan.
• After wards and Roshan get implemented, there will be a crazy bug of some sort that will hurt learning until it gets fixed, possibly requiring a full restart of the training job.

Assuming all of the above is true, model training for The International can’t proceed until all this software engineering gets done, and that doesn’t leave a lot of time to do many iterations.

(Of course, I could be wrong - if OpenAI can finetune their Dota 2 bots instead of training from scratch, all the math gets a lot nicer.)
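Here’s the back-of-the-envelope version of that math. The 7-19 day training range is my reading of the plots, and `engineering_days` is a hypothetical knob for the wards and Roshan work, not a number from OpenAI:

```python
def full_training_runs(days_left, run_days, engineering_days=0):
    """How many complete from-scratch training runs fit in the time remaining."""
    return max(0, days_left - engineering_days) // run_days


for run_days in (7, 19):
    for eng in (0, 30):
        print(f"{run_days}-day runs, {eng} days of engineering: "
              f"{full_training_runs(60, run_days, eng)} runs")
```

Finetuning would change this picture: each new feature would no longer cost a full from-scratch run.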

Whatever way the match goes, I expect it to be one-sided, one way or the other. There’s a narrow band of skill level that leads to an even match, and it’s much more likely that it falls outside of that band. Pretty excited to see who’s going to win and who’s going to get stomped!