Posts

Jan 20, 2017
MIT Mystery Hunt 2017

When ✈✈✈ Galactic Trendsetters ✈✈✈ did introductions this year, I realized with some surprise that this was my 5th time solving Mystery Hunt. Only my 2nd time flying in for Hunt though. Now that I’m a Real Adult with a Real Job, it’s harder to justify taking a few days off, but after five years I’ve hit a critical mass of MIT people I know and puzzle people I know.

If you want to avoid spoilers, you should stop reading now.

* * *

Let’s get some elephants out of the room first.

Was Hunt fun? Yes, absolutely. I loved the theme. The opening skit was great, the flavor was baked in all over the place, and overall it’s one of my favorite Hunts.

Was Hunt shorter than Setec Astronomy expected? Yes, absolutely.

Was having a short Hunt a bad thing? I’m not so sure. WHOOSH Galactic Trendsetters NIERRRRR got 4th this year, which is a lot better than before. From what I heard, the team got bigger, people got better at looking at metas early, and the metas were easier to solve than previous years. Combined, that meant we finished by midday Saturday. Turns out you can do things on MLK weekend besides solve puzzles. I played some board games and party games, had a nice dinner, and got to work on Facebook Hacker Cup Round 1 at a reasonable time, instead of 4 AM.

The aforementioned nice dinner, at Thelonious Monkfish.

A game from Jackbox Party Pack 3

Yes, I can see how a short Hunt could be disappointing. Some people on our team arrived Saturday morning, and by that point we only had two metapuzzles left. On the flip side, running out of puzzles is a competitive team problem. There were several teams who got to finish Hunt for the first time, and that’s really awesome.

Look, time estimation is hard. I should know, I contributed to the Sages hunt. I place no blame at all on Setec for a short Hunt. Setec wrote good puzzles. In fact, in some ways they were almost too good. To me, getting good at puzzles is less about noticing the trick, and more about learning how to solve from 60% of the data, 10% of which might be wrong. I can’t do it, but some people are eerily good at solving past errors.

Neo: What are you trying to tell me? That I can identify extractions?

Morpheus: No, Neo. I’m trying to tell you that when you’re ready, you won’t have to. [Because you’ll solve metas by pulling random letters out of the puzzle answers.]

Hunts get longer when you have to figure out incredibly obtuse extraction mechanisms, and we didn’t hit as many of those this year.

I’ll sum it up like this. The metas were clued through tons of flavortext, which made them easier to solve. That led to more backsolving constraints, which made it easier to solve puzzles. That sped up the unlocking process, and it all cascades from there. At some point we had three Quest rounds open with zero solves in each, just because we’d been solving metas and Character puzzles so quickly. The Chemist meta and Cleric meta were especially open to backsolving, and because character levels were the main unlock mechanism, we backsolved those very aggressively. (A bit too aggressively in fact. HQ got mad at us and stopped confirming our Cleric backsolves for an hour. We’re sorry!)

I did like how clear the character level unlock mechanism was. It made it easy to decide which puzzles to prioritize. We never would have done the scavenger hunt if people hadn’t insisted we unlock Wizard level 6 already. I didn’t work on it myself, but it led to a scroll of the Bee Movie script, which was a glorious sight.

I think it comes down to how much you were here for puzzles, and how much you were here for hanging out with people for puzzles. I came for a bit of both, and I got a bit of both, so I’m satisfied.

On to puzzle stories!

The first puzzle I was around all the way for was CHINA KILNS. The trick was really cute. We derped on assigning clues to answers properly, then we derped on solving the clue phrase, then we solved it.

By the time I looked at A Message From Our Sponsors, everything was IDed. Someone else got the aha. My main contribution was verifying the Poinsettia Bowl was indeed a thing.

When Pentoku came out, I loudly announced I wanted to work on a logic puzzle and people were free to join me. Then I left because about seven people started crowding around a blackboard, and I’m secretly not good at logic puzzles. Between the entire team, I’m pretty sure we solved each Pentoku twice, just because of communication issues between different rooms and the remote solvers.

I helped arrange the clues for A Lengthy Journey. This was shortly after we solved CHINA KILNS, so I was thinking we needed to make the answers form a cycle (each answer disambiguates a different clue, forming a cycle among clues.) We never realized that wasn’t what we were supposed to do. We ended Hunt with 9 puzzles unsolved, and this was one of them.

Fashionable Friends looked like fashion and so I skipped it. Someone pinged me about it later, saying it was a My Little Pony puzzle. I showed up and immediately identified the extraction mechanism, because of course cutie marks are going to be important in an MLP puzzle. I mean, duh! (It helps that last year’s MLP puzzle also used cutie mark extraction.)

I worked on Hamiltonian Path with a few other people. I let other people figure out the path constraints, and kept myself busy with identifying songs, because god damnit when else am I going to get to use my knowledge of Hamilton. I was very, very surprised that I could ID about 80% of the songs by memory, and only needed to look up lyrics for a few of them. Who knew 2-3 second snippets could be so distinctive? They were so distinctive that I didn’t even realize each clip had a number in it, which was really important for extraction. Oops. Anyways, after a very long time we got to GIVE US A VERSE, then got back a CD…and I realized I actually needed to make a team request for a laptop that had a CD player. Oh technology, you get obsoleted so quickly.

Mentally Stimulating Diversion was in fact a mentally stimulating diversion. It was a short one though, me and another person only spent about 40 minutes on it.

I did a dramatic reading of The Puzzle At The End of This Book around midnight on Friday. That was fun. We noticed all the encoded information during the dramatic reading, except for the encoding that gave SPECTER. I’m really disappointed we missed that mechanism, because it’s clearly the best one. (We did catch QALUPALIK, but it took us a while to believe that was a word.)

That was the last normal puzzle I looked at. After that I stared at metas for the rest of Hunt.

I don’t have much to say on endgame that wasn’t said at wrap-up. After solving all the metas, we scheduled a solo version of the Hungry Hungry Hippogriffs event (which didn’t look as fun as the full version), and a solo version of the Quizardry event.

The one cute thing that happened was that we asked to hear only the category on the final round of Quizardry, then tried to figure out the correct answer purely from knowing it needed to be transformable into another valid answer in that category. I suppose we unintentionally spoiled the endgame gimmick for ourselves, because endgame worked exactly the safe way - identify something in a category whose phrase satisfies a constraint we learned from before.

So long, and thanks for all the puzzles!

Comments
Nov 12, 2016
Introduction to the Hybrid Argument
I was reading through some proofs from imitation learning, and realized they were reminding me of hybrid arguments from cryptography. It’s always nice to realize connections between fields, so I figure it was worth making a quick guide to how hybrid arguments work.

***

Hybrid arguments are a proof method, like proof by induction. Like induction, they aren’t always enough to solve the problem. Also like induction, the details differ on each problem, and filling in those details is the hardest part of each method.

The hybrid argument requires the following.
- We want to compare two objects \(A\) and \(A'\).
- There is a sequence of objects \(A_0, A_1, \cdots, A_n\) such that \(A_0 = A\), \(A_n = A'\), and the \(A_i\) can be seen as an interpolation from \(A_0\) to \(A_n\). Intuitively, as \(i\) increases, \(A_i\) slowly drifts from \(A\) to \(A'\).
- The difference between two adjacent \(A_i\) in the interpolation is small.
For concreteness, let’s assume there’s a function \(f\) and we’re trying to bound \(f(A) - f(A')\). Rewrite this difference as a telescoping series.
\[f(A) - f(A') = f(A_0) - f(A_n) = \sum_{i=0}^{n-1} \left(f(A_i) - f(A_{i+1})\right)\]
Every term in the sum cancels, except for the starting \(f(A_0)\) and the ending \(-f(A_n)\).

(Man, I love telescoping series. There’s something elegant about how it all cancels out. Although in this case, we’re adding more terms instead of removing them.)

This reduces bounding \(f(A) - f(A')\) to bounding the sum of terms \(f(A_i) - f(A_{i+1})\). Since the difference between adjacent \(A_i\) is small, \(f(A) - f(A')\) is at most \(n\) times that small value. And that’s it! Really, there are only two tricks to the argument.
- Creating a sequence \(\{A_i\}\) with small enough differences.
- Applying the telescoping trick to use those differences.
It’s very important that there’s both a reasonable interpolation and the distance between interpolated objects is small. Without both these points, the argument has no power.

This is all very fuzzy, so let’s make things more concrete. This problem comes from the DAGGER paper. (Side note: if you’re doing imitation learning, DAGGER is a bit old, and AGGREVATE or Generative Adversarial Imitation Learning may be better.)

We have an environment in which agents can act for \(T\) timesteps. Let \(\pi_E\) be the expert policy, and \(\pi\) be our current policy. Let \(J(\pi)\) be the expected cost of policy \(\pi\). We want to prove that given the right assumptions, \(J(\pi)\) will be close to \(J(\pi_E)\) by the end of training.

This is done with hybrids. Define \(\pi_i\) as the policy which follows \(\pi\) for \(i\) timesteps, then follows \(\pi_E\) for the remaining \(T-i\) timesteps. Note \(\pi_0 = \pi_E\) and \(\pi_T = \pi\). The telescoping trick gives
\[J(\pi_E) - J(\pi) = J(\pi_0) - J(\pi_T) = \sum_{i=0}^{T-1} J(\pi_i) - J(\pi_{i+1})\]
The only difference between \(\pi_i\) and \(\pi_{i+1}\) is that in the first, the expert takes over after \(i\) steps, and in the second it takes over after \(i+1\) steps. The paper then argues that as long as the environment has no key decision where a single wrong move can lead to death, the ability of the expert to correct after \(i+1\) steps must be similar to its ability to correct after \(i\) steps.

This shows why hybrids are useful. They let us break down reasoning over \(n\) steps worth of differences to reasoning about \(n\) differences of 1 step each.

A similar flavor of argument shows up a ton in crypto. Very often, we’re trying to replace a true source of randomness with something that’s pseudorandom, and we need to argue that security is still preserved. For example, we have \(n\) PRNGs \(G_1,\cdots,G_n\), and \(n\) independently sampled seeds \(s_i\). Suppose we concatenated the \(n\) inputs and \(n\) outputs together to get the function
\[G'(s_1s_2\cdots s_n) = G_1(s_1)G_2(s_2)G_3(s_3)\cdots G_n(s_n)\]
We want to show \(G'\) is still a PRNG.

Here, the hybrids are functions \(H_i\), where \(H_i\) uses the first \(i\) PRNGs and uses true randomness for the remaining \(n-i\) blocks of bits. This makes \(H_0\) truly random and \(H_n = G'\). If the difference between \(H_0\) and \(H_n\) is small, \(G'\)’s output is close to truly random, which would show \(G'\) is a PRNG. This leaves arguing that switching from \(H_i\) to \(H_{i+1}\) (switching the \((i+1)\)th block of bits from true random to \(G_{i+1}\)) doesn’t change things enough to break security.

***

Like with many things, hybrid arguments are something that you have to actually do to really understand. And I don’t have a library of hybrid problems off the top of my head. That being said, I think it’s useful to know what they are and roughly how they work. Proof methods are only as useful as your ability to recognize when they might apply, and it’s hard to recognize something if you don’t know it exists.

Whenever you have two objects and a reasonable interpolation between them, it’s worth thinking about whether you can bound the difference between adjacent terms. And whenever you know how to bound the difference between two similar objects, it’s worth thinking about whether you can build an appropriate sequence that lets you chain those differences into a conclusion about objects further apart.
Comments
Nov 8, 2016
Asking The Right Questions: A Story of Failure
This is the story of how I didn’t get publishable results in time for a Nov 8th NIPS workshop deadline.

It’ll be light on details. One, I’m aiming for a broad audience. Two, I do need to be somewhat confidential about my work, because although my research is pretty open, I am working for a company. Most importantly, three: the details don’t matter for this story.

***

I first had my research idea in late September.

I was (and still am) generally interested in reinforcement learning, and more specifically ways to make RL more sample efficient. I had a few ideas that seemed like they had potential, so I decided to spend some time thinking it through. This happened right after a CFAR workshop, so I tried to make it as bulletproof as possible. What is the exact research question? How does my idea differ from state of the art, and why are those differences better. What is the fastest experiment that gives me useful data to decide what to do next. Assume something goes wrong - what’s the most likely failure point? Am I surprised if it fails, and if I’m not surprised, are there ways I can make it more surprising to fail that way?

Once I’m satisfied, I spent most of the next day writing it up into a research proposal. To my surprise, the feedback is positive.

This is both incredibly exciting and incredibly terrifying. It’s my idea, and multiple experienced researchers thinks it has promise. This sets off a ton of gut reactions.
- I own this idea, therefore the success of this idea is a measure of my research ability.
- I came up with this idea, but that doesn’t mean others can’t come up with the same idea. In fact, it is very likely another researcher has come up with the same idea. It’s a natural extension of existing work, and thoughts aren’t as novel as people think they are.
Believe whatever you want about these claims. They’re what I believe, on a structural level, and it leads to a simple conclusions.
- If I do not push on this idea right now, I’m going to get scooped, and I’ll continue to be a failure.
I should elaborate on the last point.

I haven’t been through a PhD program, or even a masters program. I did undergrad research for 2 years, then conned enough people into thinking I knew things about neural nets, and now I have an industry job that lets me do research. In all this time, I’ve never published a paper.

And now, I feel like the clock is ticking. Like I only got this position because my professor put a good word in, which works for now, but will stop working soon. If I want to keep doing research, I need to show, unequivocally, that I’m qualified. And that means publishing a paper, and getting it accepted into a top machine learning conference. Not all good research leads to papers, but publication is the best way to signal good research ability. If I can’t publish after years of undergrad research, how am I ever going to convince someone I can do deep learning research? There’s no shortage of people interested in deep learning. It’s up to me to grab the opportunities that have fallen into my lap and shove them as far as I can.

This mindset is really stressful, and I hate it, and I don’t want it to go away, because it feels like I wouldn’t be able to do anything without it.

Okay, now here’s the kicker. The next big conference deadline is ICLR, on November 4th. I have just over a month to go from idea to paper, if I want to hit that deadline, and if I miss it, the next big deadline is ICML in February.

Take all of that, and my first thought is holy shit, followed by holy shit.

Based on past history, the odds I get results in a month are low. I need to put in a ton of work just to have a chance.

But the odds are bigger than zero.

Alright. Let’s do this.

(A small part of me yells “Leeeeeeroy!” in reply, but I ignore it.)

***

October is a month of long hours. I took this job in part to avoid the horror stories I kept hearing about work-life balance in grad school, but around this time I realize that isn’t innate to grad school, it’s innate to research.

Like always, there are twists and turns, unexpected issues, the works. However, all things considered, progress is surprisingly smooth. Every week, I have more to show for my work.

Just one problem. The work isn’t coming together fast enough.

One week before the deadline, I make the call - I can’t hit ICLR. If I did a bunch of research pivots, and got really lucky, maaaaybe I could make it. But I wouldn’t be proud of that paper. I throw in the towel right before Halloween weekend, and use the commitment to feel better about spending my whole weekend visiting friends around the Bay.

Monday arrives faster than I expect, which is par for the course. Then, I hear a NIPS workshop got its deadline pushed from November 1st to November 8th. Standards for a workshop paper are lower. Four pages instead of eight. Preliminary results are more likely to get accepted.

If I start ramping up right now, do another week of late hours, and all of my experiments go well, I could do it. The odds are low. But just like last time, they’d be better than zero.

I declare bankruptcy on everything else. Self reflection time turns into coding time. Blogging turns into relaxation to prepare for another long day at work. Meta-level goals for improving my workflow get thrown out. I cancel my meetings, I throw out code quality, I ignore best practice. There are hundreds of lines of copy-pasted code that continue to haunt me, and the worst part is that breaking all those pesky software engineering rules was the right call, because I didn’t have time to do it right. There is only the deadline, there is only the potential paper, and in the face of that idol everything else fades away.

***

It’s Friday. It’s not looking good.

I share my results with my research mentor, and he thinks it isn’t worth writing a paper. I agree. This paper was always going to be a sell of tentative work instead of a presentation of compelling work, but there’s no way to sell these learning curves.

So I give up. I slot everything I threw out back in. Now that I have more time to think about how my week went, I decide to think it through one more time, just for kicks.

“Let’s assume I made a paper in three days time. What has to happen?

“That can’t happen. I’m rate-limited, my experiments take three days to run and I need to run at least two of them, sequentially. Six days can’t fit in three days. It’s actually impossible, unless I can magically make my experiments run several times faster.

“Hang on a second.”

Within ten minutes, I discover four quick changes that let me run experiments ten times faster.

Do you understand how idiotic I felt in that moment? The realization that I could have asked the same questions three weeks ago and discovered the same answers? And that if I had, my odds of hitting a paper would have skyrocketed?

If you’ve read Methods of Rationality, I had a real-life Final Exam moment. It felt like I was somebody who was pretending to consider all possibilities, but I was secretly still in a mental box. I only broke out of that box when staying in the box made success not just virtually impossible, but actually impossible, and it’s only after exiting that box that I’m able to wonder why I was inside it in the first place.

***

If you were expecting this story to have a happy ending, I’m sorry to disappoint. Like I said, this is a story about how I didn’t get results in time.

I implemented all the changes I come up with, checked the results over the weekend, and they’re still bad. Now it’s truly impossible. But because it worked so well last time, I think it through anyways. This time, I decide it actually is impossible, and that’s that. Here we are.

I have some regret, but given the odds I gave myself, I’m okay with not making the deadline. The work I did was all relevant to my research, and I know what parts I want to keep building on.

Still, I wonder where I would be if I didn’t declare bankruptcy so hard, and let myself continually question my research trajectory, instead of doubling down on it and paying the price when I failed.

Now that I have time to do meta-level planning again, I sit down to think.

“It’s okay to fail. It isn’t okay to fail in as stupid a way as I did. What can I do to make sure I never fail in this way again?”

The answer comes almost by reflex.

“Write what you learned, and share it with others. It’s the only way anything manages to stick in your brain. Make the idea unforgettable, because you’ve just realized the importance of Hamming questions at a visceral level, and shifts that strong don’t come every day, maybe not even every year. You need to capture the insight before it flies away. So write, write, write.”

I do just that. Not too much editing, because I want it done quickly. I know I won’t be able to convey all the feelings I want because the barrier between reading something and living it is huge. But I do it anyways, because I need to do it for me.

In total, I write 1750 words. (That includes the sentences you’re reading right now.) I already don’t like this post, but I think I’ve conveyed the sentiment to myself well enough that I’m okay releasing it as is.

There will certainly be other failures in my life. But if I get a say, they won’t be similar this one.
Comments

← Older posts Newer posts →

Posts

MIT Mystery Hunt 2017

Introduction to the Hybrid Argument

Asking The Right Questions: A Story of Failure