In the span of just under a month, I attended two conferences, ICLR 2018 and ICRA 2018. The first is a deep learning conference, and the second is a robotics conference. They were pretty different, and I figured it would be neat to compare the two.
From the research side, the TL;DR of ICLR was that adversarial learning continues to be a big thing.
The most popular thing in that sphere would be generative adversarial networks. However, I’m casting a wide umbrella here, one that includes adversarial examples and environments with competing agents. Really, any minimax optimization problem of the form $\min_x \max_y f(x, y)$ counts as adversarial learning to me.
I don’t know if it was actually popular, or if my memory has selective bias, because I have a soft spot for these approaches. They feel powerful. One way to view a GAN is that you are learning a generator by using a learned implicit cost instead of a human defined one. This lets you adapt to the capabilities of your generator and lets you define costs that could be cumbersome to explain by hand.
Sure, this makes your problem more complicated. But if you have strong enough optimization and modeling ability, the implicitly learned cost gives you sharper images than other approaches. And one advantage of replacing parts of your system with learned components is that advances in optimization and modeling ability apply to more aspects of your problem. You are improving both your ability to learn cost functions and your ability to minimize those learned costs. Eventually, there’s a tipping point where it’s worth adding all this machinery.
From a more abstract viewpoint, this touches on the power of expressive, optimizable function families, like neural nets. Minimax optimization is not a new idea. It’s been around for ages. The new thing is that deep learning lets you model and learn complicated cost functions on high-dimensional data. To me, the interesting thing about GANs isn’t the image generation, it’s the proof-of-concept they show on complicated data like images. Nothing about the framework requires you to use image data.
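To make the minimax framing concrete, here is a toy sketch of alternating gradient descent-ascent on a simple saddle-point problem. This is my own illustration, not from any paper at the conference: the minimizing player descends in $x$ while the maximizing player ascends in $y$, which is the same game a GAN plays with generator and discriminator parameters in place of scalars.

```python
# Toy minimax: min over x, max over y of f(x, y) = x^2 - y^2.
# The saddle point is at (0, 0). A GAN runs the same inner loop,
# with x and y replaced by generator and discriminator weights.

def grad_descent_ascent(x, y, lr=0.1, steps=200):
    for _ in range(steps):
        dx = 2 * x    # df/dx: the minimizing player steps downhill
        dy = -2 * y   # df/dy: the maximizing player steps uphill
        x -= lr * dx
        y += lr * dy
    return x, y

x, y = grad_descent_ascent(x=3.0, y=-2.0)
print(x, y)  # both players shrink toward the saddle point at (0, 0)
```

This toy problem converges cleanly; the hard part of GAN training is that once $f$ is a learned network on high-dimensional data, the same dynamics can oscillate or collapse.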
There are other parts of the learning process that could be replaced with learned methods instead of human-defined ones, and deep learning may be how we do so. Does it make sense to do so? Well, maybe. The problem is that the more you do this, the harder it becomes to actually make everything learnable. No point making it turtles all the way down if your turtles become unstable and collapse.
There was a recent Quanta article, where Judea Pearl expressed his disappointment that deep learning was just learning correlations and curve fitting, and that this doesn’t cover all of intelligence. I agree with this, but to play devil’s advocate, there’s a chance that if you throw enough super-big neural nets into a big enough vat of optimization soup, you would learn something that looks a lot like causal inference, or whatever else you want to count as intelligence. But now we’re rapidly approaching philosophy land, so I’ll stop here and move on.
From an attendee perspective, I liked having lots of poster sessions. This is the first time I’ve gone to ICLR. My previous ML conference was NIPS, and NIPS just feels ridiculously large. Checking every poster at NIPS doesn’t feel doable. Checking every poster at ICLR felt possible, although whether you’d actually want to do so is questionable.
I also appreciated that corporate recruiting didn’t feel as ridiculous as NIPS. At NIPS, companies were giving out fidget spinners and slinkies, which was unique, but the fact that companies needed to come up with unique swag to stand out felt…strange. At ICLR, the weirdest thing I got was a pair of socks, which was odd but not too outlandish.
Papers I noted to follow-up on later:
- Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play
- Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
- Policy Optimization by Genetic Distillation
- Measuring the Intrinsic Dimension of Objective Landscapes
- Eigenoption Discovery Through the Deep Successor Representation
- Self-Ensembling for Visual Domain Adaptation
- TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning
- Online Learning Rate Adaptation with Hypergradient Descent
- DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
- Learning to Multi-Task by Active Sampling
ICRA 2018 was my first robotics conference. I wasn’t sure what to expect. I started research as an ML person, and then sort of fell into robotics on the side, so my interests are closer to learning-for-control instead of making-new-robots. My ideal setup is one where I can treat real-world hardware as an abstraction. (Somewhere, a roboticist weeps.)
This plus my spotty understanding of control theory meant that I was unfamiliar with a lot of the topics at the conference. Still, there were plenty of learning papers, and I’m glad I went.
Of the research that I did understand, I was surprised there were so many reinforcement learning papers. It was mildly entertaining to see that almost none of them used purely model-free RL. One thing about ICRA is that your paper has a much, much better chance of getting accepted if it runs on a real-world robot. That forces you to care about data efficiency, which puts a super heavy bias against doing only model-free RL. When I walked around, I kept hearing “We combine model-free reinforcement learning with X”, where X was model-based RL, or learning from human demonstrations, or learning from motion planning, or really anything that could help with the exploration problem.
At a broader level, the conference had a sense of practicality about it. It was still a research conference, and plenty of it was still very speculative, but it also felt like people were okay with narrow, well-targeted solutions. I see this as another consequence of having to use real hardware. You can’t ignore inference time if you need to run your model in real time. You can’t ignore data efficiency if you need to collect it from a real robot. Real hardware does not care about your problems.
It reminded me of the first two truths of RFC 1925, “The Twelve Networking Truths”:

> (1) It Has To Work.
>
> (2) No matter how hard you push and no matter what the priority, you can’t increase the speed of light.
This surprises a lot of ML people I talk to, but robotics hasn’t fully embraced ML the way that people at NIPS / ICLR / ICML have, in part because ML doesn’t always work. Machine learning is a solution, but it’s not guaranteed to make sense. The impression I got was that only a few people at ICRA actively wanted ML to fail. Everyone else is perfectly okay with using ML, once it proves itself. And in some domains, it has proved itself. Every perception paper I saw used CNNs in one way or another. But significantly fewer people were using deep learning for control, because that’s where things are more uncertain. It was good to hear comments from people who see deep learning as just a fad, even if I don’t agree.
Like ICLR, there were a lot of companies doing recruiting and hosting info booths. Unlike ICLR, these booths were a lot more fun to browse. Most companies brought one of their robots to demo, and robot demonstrations are always fun to watch. It’s certainly more interesting than listening to the standard recruiting spiels.
At last year’s NIPS, I noted that ML company booths were starting to remind me of Berkeley career fairs, in a bad way. Every tech company wants to hire Berkeley new grads, and in my last year, recruiting started to feel like an arms race on who can give out the best swag and best free food. It felt like the goal was to look like the coolest company possible, all without telling you what they’d actually hire you for. And the ML equivalent of this is to host increasingly elaborate parties at fancy bars. Robotics hasn’t gone as far yet. It’s growing, but not with as much hype.
I went to a few workshop talks where people talked about how they were using robotics in the real world, and they were all pretty interesting. Research conferences tend to focus on discussing research and networking, which makes it easy to forget that research can have clear, immediate economic value. There was a Robots in Agriculture talk about using computer vision to detect weeds and spray weed killer on just the weeds, which sounds like all upside to me. Uses less weed killer, kills fewer crops, slows down growth of herbicide resistance.
Rodney Brooks had a nice talk along similar lines, where he talked about the things needed to turn robotics into a consumer product, using the Roomba as an example. According to him, when designing the Roomba, they started with the price, then molded all the functionality towards that price. It turns out a couple hundred dollars gives you very little leeway for fancy sensors and hardware, which places tight limits on what you can do with on-device inference.
(His talk also had a rant criticizing HRI research, which seemed out of place, but it was certainly entertaining. For the curious, he complained about people using too much notation to hide simple ideas, large claims that weren’t justified by the sample sizes used in the papers, and researchers blaming humans for irrational behavior when they didn’t match the model’s predictions. I know very little about HRI, so I have no comment.)
Organization-wise, it was really well run. The conference center was right next door to a printing place, so at registration time, the organizers said that if you emailed a PDF by a specific date, they would handle all ordering logistics. All you had to do was pay for your poster online and pick it up at the conference. All presentations were given at presentation pods, each of which came with a whiteboard and a shelf where you could put a laptop to play video (which is really important for robotics work).
Papers I noted to follow-up on later:
- Applying Asynchronous Deep Classification Network and Gaming Reinforcement Learning-Based Motion Planner to a Mobile Robot
- OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World
- Synthetically Trained Neural Networks for Learning Human-Readable Plans from Real-World Demonstrations
- Semantic Robot Programming for Goal-Directed Manipulation in Cluttered Scenes
- Interactive Perception: Leveraging Action in Perception and Perception in Action
I’ve played Magic: the Gathering on and off for nearly 15 years. It’s a great card game, with tons of depth. Its only downside is that it can get pretty expensive. So when I heard Wizards of the Coast was working on a free-to-play version called MTG Arena, I signed up for the beta. I was lucky enough to get an invite, and the beta recently went out of NDA, so I figured I’d give some first impressions.
This is the first digital implementation of Magic I’ve ever played. So far, the implementation feels smooth. The animations add a bit without getting in the way, and rules-wise I’ve yet to hit any problems. I have run into issues with the auto-tap though. In one game, I was playing a UB control deck and cast one of my finishers with Cancel backup. I didn’t realize the auto-tap had tapped all my Islands until my opponent’s turn, and it cost me the game. But I chalk that up to unfamiliarity with the interface. It was a one-off mistake and I haven’t lost like that since.
At times, I’ve also found it annoying to clear the stack when a lot of triggers happen at once, but I’m willing to accept that as a consequence of Magic’s rules engine. It doesn’t “pop” as much as Hearthstone does, but the core gameplay is a lot more interesting to me, and that’s what’s bringing me back.
The experience right after the NDA drop was pretty magical. Everyone’s account got wiped, and you can’t spend real money yet. Not only was everyone on a level playing field, no one could pay-to-win their way to a strong deck. The end result was like a massive, worldwide Sealed league. For the uninitiated, Sealed is a Magic format where every player gets 6 booster packs and builds a deck out of those cards. A Sealed league is a Sealed tournament that runs over several weeks. Every 1-2 weeks, players add an extra booster pack to their card pool.
Arena is working in almost exactly the same way, thanks to the slow unlock rate. And therein lies the problem. Most people have terrible decks, because it’s currently very difficult to build good ones, even if you spend a lot of time playing the game.
Now, I was planning on writing a post complaining about the economy, but then I read a really good post that covered everything I wanted to cover, and I realized I had nothing to add. Instead, I’ll share a few things I realized along the way.
I have a lot more respect for Hearthstone’s core economy. I don’t like where Hearthstone’s gameplay has gone, but the core dusting and crafting mechanics are well designed. 30-card decks with at most 2 copies of a card make it easier to build a collection. The 3rd copy of every card can be disenchanted for free, and the 1st and 2nd copy can be disenchanted too if you don’t think that card will be useful in the future. In MTG Arena, I have to open 4 copies of a common before I can make progress towards the Vault, which is Arena’s equivalent of disenchanting.
The developers of MTG Arena said they decided against a disenchant system because it created feel-bad moments when people disenchanted cards they needed later. That’s true, but in its place they’ve created feel-bad moments when players open cards they don’t want, with little choice on how to turn them into cards they do want. I own several commons where I have 3 unplayed copies of the same card, and I can’t do anything with them.
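To put the difference in numbers, here’s a small back-of-the-envelope sketch based on the rules described above (the function names are my own, and this is a simplified model, not the games’ actual economy code): Hearthstone lets you dust any copy past the 2-per-deck limit, while Arena’s Vault only counts copies past the 4-per-deck limit.

```python
# Simplified model of duplicate handling in the two economies.

def hearthstone_dustable(copies):
    """Copies beyond the 2-per-deck limit, all convertible to dust."""
    return max(0, copies - 2)

def arena_vault_copies(copies):
    """Copies beyond the 4-per-deck limit, the only ones that count
    toward Vault progress."""
    return max(0, copies - 4)

for n in range(1, 7):
    print(n, hearthstone_dustable(n), arena_vault_copies(n))
```

Under this model, copies 3 and 4 of an Arena card you never want to play are simply dead, whereas in Hearthstone every extra copy converts to something.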
At a broader level, I’ve started appreciating the fragility of things. The best part of my MTG Arena experience was at the beginning, when everything was new, people were still figuring out the meta, and draft chaff decks were competitive. Nothing about that environment was going to last, but I didn’t expect it to. In many ways it reminds me of the early years of the brony fandom. A ton of ridiculous stuff happened, and no one knew where the fandom was going, but just being on the ride was exciting. The first BronyCon must have been insane, because I doubt there was a good understanding of what a brony convention should aspire to be.
The fandom has cooled down since then. Content creators settled in. Conventions have become more like institutions. Season 3 didn’t help, given that it was disappointing compared to Seasons 1 and 2. The fandom’s still going - Season 8 premiered last week - but it’s condensed into something that’s lost a lot of its initial magic.
The question is whether people should have expected the brony fandom to keep its magic forever. On reflection, ponies were never going to stay as culturally visible as they were in 2011 or 2012. I feel a heavy part of the fandom’s growth was its unexpectedness. Very few people expected a reboot of My Little Pony to actually be good, and it was that surprise that pulled people in. Now that people know it’s a cartoon that people like, there’s less pressure to see what all the fuss is about.
There’s nothing wrong with that. Cultural touchstones come and go. But if your definition of “fandom” is calibrated to the peak insanity of that fandom, then everything afterwards is going to be a disappointment. I saw a Reddit post asking if research in deep learning was slowing down. I don’t think it is, but I do feel there have been fewer fundamental architecture shifts. There were a few years where every state-of-the-art ImageNet model introduced a new idea, and if you were watching the field at that time, the field would have looked ridiculously open. It’s a lot less ridiculous now.
I’m not a big fandom jumper. I tend to get into a few fandoms, and then stick with them for a long, long time. And for a while, I looked down on people who did jump fandoms. It felt like they were chasing the high of the new thing, that they were in love with the collective enthusiasm of fandom, instead of the work the fandom was based on. I didn’t see them as “true fans”, because as soon as the next new thing came around, they’d leave. I don’t look down on this behavior anymore. If that’s what people are looking for, who am I to judge?
It’s just that if the community does get worse, I don’t think it’s productive to complain about “the good old days.” Analyzing it or trying to fix it is fine, but I suspect that many communities start because of forces outside of their control. People get pulled in, those outside forces go away, and then when things change, people blame the community for “ruining things”, or “destroying the fandom”, instead of blaming the disappearance of the outside forces that made the community grow in the first place.
The thing that gets people into a community doesn’t have to be the thing that gets people to stay. There’s even a TVTropes page for this. If the community starts getting worse, maybe the problem isn’t the community. Maybe the problem is that you were pulled in by something that the community was never really about. And if you can’t change that, then the easiest thing to do is to leave with the memories of the good times you had.
I’ve been blown away by the reception to my most recent “Deep RL Doesn’t Work Yet” blog post. In retrospect, it was a perfect storm - I spent a lot of time on the post, it was about a popular subject, most people agreed with the post’s overarching message, and yet very few people had written about it.
To any new readers I might have: prepare to be disappointed. There is a good chance I’ll never write anything as good or as popular again. It’s a weird feeling, but in the interest of avoiding the Tough Act To Follow problem, I’m planning to write shorter posts, and to release them more frequently.
To close the thread on the RL blog post, I’ll be exploring this question: What’s the difference between a blog post and a research paper?
In some sense, the deep RL blog post could have been a paper. Topic-wise, it lies somewhere between a survey paper and a policy paper, and in principle, I could have tossed it on arXiv if I wanted to.
However, I do think it needed to be a blog post.
One reason was that I knew I wanted lots of videos. It is so, so much easier to explain the behavior of these algorithms if you can actually show videos of those behaviors. Papers have videos too, but they’re often marked as supplemental material, whereas I wanted them to be front and center.
Another was that I deliberately wanted to be more colloquial. If it wasn’t clear from the Futurama meme, the post was never trying to be formal. It’s not that formal writing is a bad writing style. It’s more that there’s a time and place for it, and I found it easier to make the points I wanted to make if I let the writing be looser.
Both of those reasons played a role in my decision to write a blog post instead of a paper, but they’re also both fairly superficial. The most important reasons went a bit deeper.
This is going to sound pretentious, but the medium of writing affects expectations about that writing. Messaging apps encourage short sentences, whereas email encourages longer paragraphs, and that influences the kind of messages you can get across. In a similar vein, I feel that blog posts encourage stating opinions, whereas papers encourage stating truths. This might not make sense, so let me explain.
Both blog posts and papers argue their points by presenting evidence that supports their points and explaining away evidence that refutes them. That’s practically the definition of writing. However, I feel that papers in particular are held to a high standard. People expect papers to be both careful and comprehensive. Whether the average paper does this is up for debate, but it’s certainly what people aim for.
If papers are supposed to be kernels of truth about the world, then it’s only natural that people expect high-quality arguments, where every claim is backed up with evidence. But the flip side of this is that it’s harder for papers to speculate. Increasing the burden of proof restricts what you can say. For particularly nebulous topics, it can be hard to reach that burden of proof.
In contrast, blog posts, keynote talks, and so on are much more free to be opinionated. People still expect your argument to be solid, but the burden of proof for “acceptable blog post” feels lower. That makes them well-suited for writing about topics that are inherently up for debate. Topics like, say, the state of a field, and where it is, and where it’s going.
At the time of writing the deep RL post, I knew there was a chance it would be controversial. And I was fine with that, as long as the post made it clear why I arrived at the conclusions I did. (It also helped that I was the only author on the post. That way, if people hated the post, at least they’d only hate me.)
As for the target audience: I specifically wrote the post towards people who either worked on deep RL, or had a lot of interest in deep RL. If you were in neither category and liked the post anyways, consider yourself lucky. The advantage of narrow targeting is that I was free to jump directly to the points I wanted to make.
I’m starting to believe that research papers are a shockingly inefficient way to communicate new ideas. When you’re new to a field, research papers are a dense yet rewarding gold mine. The introduction talks about a problem you didn’t even know existed. The related work is a treasure trove of papers to read next. The methods and experiment sections take time to work through, but if you read them closely enough, you’ll understand not just the idea of the paper, but also all the ideas the paper builds upon.
And that’s how it starts. Then you read another paper, and another one, and soon a pattern emerges. The introduction covers a problem you’ve known about for months. The related work section cites papers you’ve already heard about, and seems to exist just to convince other researchers the authors have seen their work. The methods section is filled with preamble and boilerplate you’ve seen a billion times. I swear, every RL paper has a paragraph like this:
Let $s \in \mathcal{S}$ and $a \in \mathcal{A}$ be the states and actions of a Markov decision process (MDP). Policy $\pi(a \mid s)$ gives a distribution over actions, given state $s$. A trajectory is a sequence $(s_0, a_0, s_1, a_1, \ldots)$, and our objective is to learn a $\pi$ that maximizes reward.
Depending on the paper, it’ll either explain what Q-Learning is, or explain what policy gradient is. I was not in the mood to explain either in my RL post. So I decided to assume the reader already knew how they worked, and moved on.
Did you know there’s a paper for MDP notation? It exists just so that authors can use a single sentence for their notation, instead of writing paragraphs of it. I half-suspect this was created when someone was trying to figure out how to get their paper to fit within the page limit for one of the ML conferences.
The research paper format encourages the authors to be complete. That’s fine! I don’t think papers need to change. Papers are written for everyone, from the enthusiast to the new undergrad and the tenured professor. It’s just that very few people need everything that’s in the paper. These days, I usually read papers for their key ideas, and only read more closely if I’m trying to reproduce or extend their results. Once you strip out the problem definition, skip the careful qualification against prior work, and accept the intervening details on faith, the core idea is often just a few paragraphs.
Here is a paraphrased reviewer comment from a paper I worked on: “Your 2 minute video was better at explaining your work than the paper itself.” Isn’t that interesting? It’s technically wrong, since the video didn’t mention any of the implementation details. And at the same time, I found myself agreeing with the comment.
The RL blog post argued many things, but one thing it showed is that papers aren’t the only way to contribute to a field. Papers matter, but they’re not the only thing that matters. Yes, blog posts are one option, but there’s also open-sourcing code, advocating for best practices, creating tutorials for newcomers, and building better infrastructure.
I went to NIPS last year. Over the course of a week, 679 papers were presented. I read some of the posters. I only remember about a third of the ones I read. The thing I remember the most? Ali Rahimi’s Test of Time talk.
Food for thought.