• Introduction to the Hybrid Argument

    I was reading through some proofs from imitation learning, and realized they were reminding me of hybrid arguments from cryptography. It’s always nice to realize connections between fields, so I figured it was worth making a quick guide to how hybrid arguments work.


    Hybrid arguments are a proof method, like proof by induction. Like induction, they aren’t always enough to solve the problem. Also like induction, the details differ on each problem, and filling in those details is the hardest part of each method.

    The hybrid argument requires the following.

    • We want to compare two objects $A$ and $B$.
    • There is a sequence of objects $x_0, x_1, \ldots, x_n$ such that $x_0 = A$, $x_n = B$, and the $x_i$ can be seen as an interpolation from $A$ to $B$. Intuitively, as $i$ increases, $x_i$ slowly drifts from $A$ to $B$.
    • The difference between two adjacent $x_i$ in the interpolation is small.

    For concreteness, let’s assume there’s a function $f$ and we’re trying to bound $f(B) - f(A)$. Rewrite this difference as a telescoping series.

    $$f(B) - f(A) = f(x_n) - f(x_0) = \sum_{i=0}^{n-1} \left( f(x_{i+1}) - f(x_i) \right)$$

    Every term in the sum cancels, except for the starting $-f(x_0)$ and the ending $f(x_n)$.

    (Man, I love telescoping series. There’s something elegant about how it all cancels out. Although in this case, we’re adding more terms instead of removing them.)

    This reduces bounding $f(B) - f(A)$ to bounding the sum of $n$ terms $f(x_{i+1}) - f(x_i)$. Since the difference between adjacent $x_i$ is small, $f(B) - f(A)$ is at most $n$ times that small value. And that’s it! Really, there are only two tricks to the argument.

    • Creating a sequence with small enough differences.
    • Applying the telescoping trick to use those differences.
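
    To see the two tricks working together, here’s a minimal Python sketch. Everything in it is an assumption made for illustration: the helper name `telescoping_gaps`, the choice of squaring as the function, and the evenly spaced sequence from 0 to 1 don’t come from any particular proof.

```python
def telescoping_gaps(f, xs):
    """Return the differences f(xs[i+1]) - f(xs[i]) between adjacent hybrids."""
    return [f(xs[i + 1]) - f(xs[i]) for i in range(len(xs) - 1)]


# Hypothetical setup: interpolate from A = 0.0 to B = 1.0 in n small steps.
n = 10
xs = [i / n for i in range(n + 1)]


def f(x):
    # Illustrative function; any f works for the telescoping identity.
    return x * x


gaps = telescoping_gaps(f, xs)
total = f(xs[-1]) - f(xs[0])           # the quantity we want to bound
max_gap = max(abs(g) for g in gaps)    # the "small" adjacent difference

# Trick 2: the adjacent differences sum to the end-to-end difference.
assert abs(total - sum(gaps)) < 1e-12
# Consequence: the end-to-end difference is at most n times the largest gap.
assert abs(total) <= n * max_gap + 1e-12
```

    In this toy case the end-to-end difference is 1, while the number of steps times the largest adjacent gap is about 1.9, so the bound holds but isn’t tight. That’s typical: the hybrid argument trades tightness for a much easier per-step analysis.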

    It’s very important that there’s a reasonable interpolation and that the distance between adjacent interpolated objects is small. Without both of these, the argument has no power.

    [Image: “You have no power here”]

    This is all very fuzzy, so let’s make things more concrete. This problem comes from the DAGGER paper. (Side note: if you’re doing imitation learning, DAGGER is a bit old, and AGGREVATE or Generative Adversarial Imitation Learning may be better.)

    We have an environment in which agents can act for $T$ timesteps. Let $\pi^*$ be the expert policy, and $\hat{\pi}$ be our current policy. Let $J(\pi)$ be the expected cost of policy $\pi$. We want to prove that given the right assumptions, $J(\hat{\pi})$ will be close to $J(\pi^*)$ by the end of training.

    This is done with hybrids. Define $\pi_{1:t}$ as the policy which follows $\hat{\pi}$ for $t$ timesteps, then follows $\pi^*$ for the remaining timesteps. Note $\pi_{1:0} = \pi^*$ and $\pi_{1:T} = \hat{\pi}$. The telescoping trick gives

    $$J(\hat{\pi}) - J(\pi^*) = \sum_{t=0}^{T-1} \left( J(\pi_{1:t+1}) - J(\pi_{1:t}) \right)$$

    The only difference between $\pi_{1:t+1}$ and $\pi_{1:t}$ is that in the first, the expert takes over after $t+1$ steps, and in the second it takes over after $t$ steps. The paper then argues that as long as the environment has no key decision where a single wrong move can lead to death, the ability of the expert to correct after $t+1$ steps must be similar to its ability to correct after $t$ steps.

    This shows why hybrids are useful. They let us break down reasoning over $T$ steps worth of differences to reasoning about $T$ differences of 1 step each.
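
    Here’s a toy sketch of those hybrid policies in Python. To be clear, none of this is from the DAGGER paper: the one-dimensional environment, the cost function, the deliberately flawed `learner`, and every name below are assumptions picked so the numbers are easy to check by hand.

```python
HORIZON = 5  # number of timesteps (hypothetical)


def expert(state, t):
    # Toy expert: always step back toward state 0.
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def learner(state, t):
    # Toy learner: makes a mistake (steps right) on even timesteps,
    # and copies the expert on odd timesteps.
    return 1 if t % 2 == 0 else expert(state, t)


def hybrid(switch):
    """Policy that follows the learner for `switch` steps, then the expert."""
    def policy(state, t):
        return learner(state, t) if t < switch else expert(state, t)
    return policy


def cost(policy, start=0):
    # Deterministic rollout; the cost at each step is the distance from 0.
    state, total = start, 0
    for t in range(HORIZON):
        state += policy(state, t)
        total += abs(state)
    return total


# Cost of each hybrid, from all-expert (switch=0) to all-learner (switch=HORIZON).
costs = [cost(hybrid(switch)) for switch in range(HORIZON + 1)]
gaps = [costs[t + 1] - costs[t] for t in range(HORIZON)]

assert costs[0] == cost(expert) and costs[-1] == cost(learner)
assert sum(gaps) == costs[-1] - costs[0]                        # telescoping
assert costs[-1] - costs[0] <= HORIZON * max(abs(g) for g in gaps)
```

    Each gap compares two policies that differ at a single timestep, and in this environment the expert can always undo one bad step, so every gap is at most 1. That’s the “no single fatal decision” assumption in miniature.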

    A similar flavor of argument shows up a ton in crypto. Very often, we’re trying to replace a true source of randomness with something that’s pseudorandom, and we need to argue that security is still preserved. For example, we have PRNGs $G_1, \ldots, G_n$, and independently sampled seeds $s_1, \ldots, s_n$. Suppose we concatenated the inputs and outputs together to get the function

    $$G(s_1 s_2 \cdots s_n) = G_1(s_1) G_2(s_2) \cdots G_n(s_n)$$

    We want to show $G$ is still a PRNG.

    Here, the hybrids are functions $H_0, H_1, \ldots, H_n$, where $H_i$ uses the first $i$ PRNGs and uses true randomness for the remaining blocks of bits. This makes $H_0$ truly random and $H_n = G$. If the difference between $H_0$ and $H_n$ is small, $G$’s output is close to truly random, which would show $G$ is a PRNG. This leaves arguing that switching from $H_i$ to $H_{i+1}$ (switching the $(i+1)$th block of bits from true random to $G_{i+1}$) doesn’t change things enough to break security.
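
    The construction of the hybrids can be sketched in a few lines of Python. This only shows how the hybrids are built, not the security proof. The generator `toy_prng` is a stand-in based on SHA-256 and is an assumption for illustration, not a proven PRNG; `os.urandom` plays the role of true randomness; the block and seed sizes are arbitrary.

```python
import hashlib
import os

BLOCK_BYTES = 32  # output bytes per generator (arbitrary choice)
SEED_BYTES = 16   # seed bytes per generator (arbitrary choice)


def toy_prng(index, seed):
    # Stand-in for the index-th generator: expands a 16-byte seed to 32 bytes.
    # Illustrative only; this is not claimed to be a secure PRNG.
    return hashlib.sha256(bytes([index]) + seed).digest()


def hybrid(i, seeds):
    """Hybrid i: the first i blocks come from generators, the rest are random."""
    blocks = []
    for j, seed in enumerate(seeds):
        if j < i:
            blocks.append(toy_prng(j, seed))
        else:
            blocks.append(os.urandom(BLOCK_BYTES))  # stand-in for true randomness
    return b"".join(blocks)


n = 4
seeds = [os.urandom(SEED_BYTES) for _ in range(n)]

all_random = hybrid(0, seeds)   # hybrid 0: no generator output at all
combined = hybrid(n, seeds)     # hybrid n: the concatenated generator

# Adjacent hybrids agree on every generator block they share; they differ
# only in whether one particular block is pseudorandom or random.
assert hybrid(2, seeds)[: 2 * BLOCK_BYTES] == combined[: 2 * BLOCK_BYTES]
```

    The proof then goes through the triangle inequality: any distinguisher’s advantage between the first and last hybrid is at most the sum of its advantages between adjacent hybrids, and each adjacent pair differs in a single block, which is exactly the situation the security of one generator covers.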


    Like with many things, hybrid arguments are something that you have to actually do to really understand. And I don’t have a library of hybrid problems off the top of my head. That being said, I think it’s useful to know what they are and roughly how they work. Proof methods are only as useful as your ability to recognize when they might apply, and it’s hard to recognize something if you don’t know it exists.

    Whenever you have two objects and a reasonable interpolation between them, it’s worth thinking about whether you can bound the difference between adjacent terms. And whenever you know how to bound the difference between two similar objects, it’s worth thinking about whether you can build an appropriate sequence that lets you chain those differences into a conclusion about objects further apart.

  • Asking The Right Questions: A Story of Failure

    This is the story of how I didn’t get publishable results in time for a Nov 8th NIPS workshop deadline.

    It’ll be light on details. One, I’m aiming for a broad audience. Two, I do need to be somewhat confidential about my work, because although my research is pretty open, I am working for a company. Most importantly, three: the details don’t matter for this story.


    I first had my research idea in late September.

    I was (and still am) generally interested in reinforcement learning, and more specifically ways to make RL more sample efficient. I had a few ideas that seemed like they had potential, so I decided to spend some time thinking it through. This happened right after a CFAR workshop, so I tried to make it as bulletproof as possible. What is the exact research question? How does my idea differ from the state of the art, and why are those differences better? What is the fastest experiment that gives me useful data to decide what to do next? Assume something goes wrong - what’s the most likely failure point? Am I surprised if it fails, and if I’m not surprised, are there ways I can make it more surprising to fail that way?

    Once I’m satisfied, I spend most of the next day writing it up into a research proposal. To my surprise, the feedback is positive.

    This is both incredibly exciting and incredibly terrifying. It’s my idea, and multiple experienced researchers think it has promise. This sets off a ton of gut reactions.

    • I own this idea, therefore the success of this idea is a measure of my research ability.
    • I came up with this idea, but that doesn’t mean others can’t come up with the same idea. In fact, it is very likely another researcher has come up with the same idea. It’s a natural extension of existing work, and thoughts aren’t as novel as people think they are.

    Believe whatever you want about these claims. They’re what I believe, on a structural level, and they lead to a simple conclusion.

    • If I do not push on this idea right now, I’m going to get scooped, and I’ll continue to be a failure.

    I should elaborate on the last point.

    I haven’t been through a PhD program, or even a masters program. I did undergrad research for 2 years, then conned enough people into thinking I knew things about neural nets, and now I have an industry job that lets me do research. In all this time, I’ve never published a paper.

    And now, I feel like the clock is ticking. Like I only got this position because my professor put a good word in, which works for now, but will stop working soon. If I want to keep doing research, I need to show, unequivocally, that I’m qualified. And that means publishing a paper, and getting it accepted into a top machine learning conference. Not all good research leads to papers, but publication is the best way to signal good research ability. If I can’t publish after years of undergrad research, how am I ever going to convince someone I can do deep learning research? There’s no shortage of people interested in deep learning. It’s up to me to grab the opportunities that have fallen into my lap and shove them as far as I can.

    This mindset is really stressful, and I hate it, and I don’t want it to go away, because it feels like I wouldn’t be able to do anything without it.

    Okay, now here’s the kicker. The next big conference deadline is ICLR, on November 4th. I have just over a month to go from idea to paper, if I want to hit that deadline, and if I miss it, the next big deadline is ICML in February.

    Take all of that, and my first thought is holy shit, followed by holy shit.

    Based on past history, the odds I get results in a month are low. I need to put in a ton of work just to have a chance.

    But the odds are bigger than zero.

    Alright. Let’s do this.

    (A small part of me yells “Leeeeeeroy!” in reply, but I ignore it.)


    October is a month of long hours. I took this job in part to avoid the horror stories I kept hearing about work-life balance in grad school, but around this time I realize that isn’t innate to grad school, it’s innate to research.

    Like always, there are twists and turns, unexpected issues, the works. However, all things considered, progress is surprisingly smooth. Every week, I have more to show for my work.

    Just one problem. The work isn’t coming together fast enough.

    One week before the deadline, I make the call - I can’t hit ICLR. If I did a bunch of research pivots, and got really lucky, maaaaybe I could make it. But I wouldn’t be proud of that paper. I throw in the towel right before Halloween weekend, and use the commitment to feel better about spending my whole weekend visiting friends around the Bay.

    Monday arrives faster than I expect, which is par for the course. Then, I hear a NIPS workshop got its deadline pushed from November 1st to November 8th. Standards for a workshop paper are lower. Four pages instead of eight. Preliminary results are more likely to get accepted.

    If I start ramping up right now, do another week of late hours, and all of my experiments go well, I could do it. The odds are low. But just like last time, they’d be better than zero.

    I declare bankruptcy on everything else. Self reflection time turns into coding time. Blogging turns into relaxation to prepare for another long day at work. Meta-level goals for improving my workflow get thrown out. I cancel my meetings, I throw out code quality, I ignore best practice. There are hundreds of lines of copy-pasted code that continue to haunt me, and the worst part is that breaking all those pesky software engineering rules was the right call, because I didn’t have time to do it right. There is only the deadline, there is only the potential paper, and in the face of that idol everything else fades away.


    It’s Friday. It’s not looking good.

    I share my results with my research mentor, and he thinks it isn’t worth writing a paper. I agree. This paper was always going to be a sell of tentative work instead of a presentation of compelling work, but there’s no way to sell these learning curves.

    So I give up. I slot everything I threw out back in. Now that I have more time to think about how my week went, I decide to think it through one more time, just for kicks.

    “Let’s assume I made a paper in three days time. What has to happen?

    “That can’t happen. I’m rate-limited, my experiments take three days to run and I need to run at least two of them, sequentially. Six days can’t fit in three days. It’s actually impossible, unless I can magically make my experiments run several times faster.

    “Hang on a second.”

    Within ten minutes, I discover four quick changes that let me run experiments ten times faster.

    Do you understand how idiotic I felt in that moment? The realization that I could have asked the same questions three weeks ago and discovered the same answers? And that if I had, my odds of hitting a paper would have skyrocketed?

    If you’ve read Methods of Rationality, I had a real-life Final Exam moment. It felt like I was somebody who was pretending to consider all possibilities, but I was secretly still in a mental box. I only broke out of that box when staying in the box made success not just virtually impossible, but actually impossible, and it’s only after exiting that box that I’m able to wonder why I was inside it in the first place.


    If you were expecting this story to have a happy ending, I’m sorry to disappoint. Like I said, this is a story about how I didn’t get results in time.

    I implement all the changes I came up with, check the results over the weekend, and they’re still bad. Now it’s truly impossible. But because it worked so well last time, I think it through anyways. This time, I decide it actually is impossible, and that’s that. Here we are.

    I have some regret, but given the odds I gave myself, I’m okay with not making the deadline. The work I did was all relevant to my research, and I know what parts I want to keep building on.

    Still, I wonder where I would be if I didn’t declare bankruptcy so hard, and let myself continually question my research trajectory, instead of doubling down on it and paying the price when I failed.

    Now that I have time to do meta-level planning again, I sit down to think.

    “It’s okay to fail. It isn’t okay to fail in as stupid a way as I did. What can I do to make sure I never fail in this way again?”

    The answer comes almost by reflex.

    “Write what you learned, and share it with others. It’s the only way anything manages to stick in your brain. Make the idea unforgettable, because you’ve just realized the importance of Hamming questions at a visceral level, and shifts that strong don’t come every day, maybe not even every year. You need to capture the insight before it flies away. So write, write, write.”

    I do just that. Not too much editing, because I want it done quickly. I know I won’t be able to convey all the feelings I want because the barrier between reading something and living it is huge. But I do it anyways, because I need to do it for me.

    In total, I write 1750 words. (That includes the sentences you’re reading right now.) I already don’t like this post, but I think I’ve conveyed the sentiment to myself well enough that I’m okay releasing it as is.

    There will certainly be other failures in my life. But if I get a say, they won’t be similar to this one.

  • My Understanding of Politics

    Let me start by saying I’m incredibly underqualified to discuss anything about politics.

    On the other hand, lots of people are underqualified, and that doesn’t stop them from getting into flame wars on the Internet. And what, my understanding of politics is going to get better if I avoid thinking about it? That’s literally the opposite of what you should be doing if you want to understand politics.

    So here’s how this post is going to go. I explain how I view politics and politicians. And that’s it. No follow-up. I wait for people to object to the stupider things I say, and then I don’t reply to their objections unless I want to.

    Is this going to make me a political genius? No. Am I looking to become a political genius? No. All I want is to be less of a political idiot. This seems like a good way to get some easy gains in that direction.

    Here we go.

    The Two Rules

    I have two golden rules about politics.

    1. Politicians are motivated by two desires: power for the sake of power, and power for the sake of making the world a better place.
    2. No one gets everything they want, which turns politics into a game of compromise and quid pro quo.

    Let’s break it down.

    The Power to Change the World

    Most politicians want power. I don’t think this is a controversial claim. Try saying the reverse. “Most politicians don’t want power.” That sounds resoundingly false to me.

    What do they want power for? Some people want power for the feeling of dominance over others. The euphoric feeling they get, when they tell people what to do, and it actually happens the way they asked it to get done.

    Power is my mistress. I have worked too hard at her conquest to allow anyone to take her away from me.


    I know some political scientists claim that having power is the only desire of politicians. That’s not entirely wrong, and it’s also not entirely true. If someone wanted only power and nothing else, they’d probably have an easier time in the private sector. (I’ll freely admit this is just my intuition talking.)

    That leads to the second part of the rule. When I look at a politician, I see somebody who believes world order is broken. When someone runs for office, their entire thesis is that they’d be a better leader than the current one. So of course they want power - the people in power are doing it wrong!

    To place it in narrative terms, it’s the difference between a hero who became a hero for fame, and a hero who became a hero because they believed no one else was going to rise to the occasion.

    It’s somewhat icky to admit out loud, but sometimes the first step to improving the world is a naked grab for power. If you have influence, it makes it much easier to convince people to act the way you think they ought to.

    And this should go without saying, but all politicians are genuinely trying to make the world a better place. Including the ones that disagree with you. Politicians aren’t trolls. They don’t disagree with you solely to piss you off. They disagree with you because their better world looks different from your better world.

    That being said, how much this altruistic desire matters is dependent on the politician. A town mayor is more likely to care about helping people. A senator is closer towards the end of power for power’s sake. Someone running for president almost certainly has power as their primary goal. Only a very hungry sort of person would look at the presidency, look at the incredible stress it causes, consider the increased odds of death by assassination, understand that a large part of the world is going to hate them with undying passion, and conclude that yeah, they want that.

    [Image: Obama before and after presidency]

    (As a corollary, I’m not bothered when people attack politicians for being opportunists. I expect this of them. Whether they do so discreetly or openly doesn’t influence my opinion by much.)

    Even the most cynical interpretation of politics doesn’t sound that bad to me. When I was an intern, the city mayor visited our office, to talk about the role of technology in government. During the Q&A session, an audience member asked how politicians could justify switching positions so often. The mayor said that of course politicians are going to switch positions to match what voters want. Otherwise, they wouldn’t be representing their constituents.

    I saw his reply as a positive spin on this: even politicians motivated only by power have to follow the will of the voters.

    If a majority of the country agrees that interracial marriage should be legal, and a politician says they don’t approve of it, they’re going to lose their next election, badly. And therefore every politician agrees that interracial marriage is okay, despite the fact that as recently as 2013, thirteen percent of Americans did not approve of marriage between blacks and whites. I’m guessing the fraction is lower among members of Congress, but there are over 500 people in Congress. I think it’s reasonable to assume at least one privately disagrees with interracial marriage, but goes along with it publicly.

    At some point, there’s no difference between pandering to racists and being a racist. And similarly, at some point there’s no difference between pandering to equal rights activists and believing in equal rights.

    Votes are the means by which the public keeps a politician’s desire for power aligned with the public’s interests. Unfortunately, what’s best for the nation isn’t always what the nation wants, but that’s the world we live in. Someone who doesn’t pander for votes is going to lose their election to someone who does. And remember, the whole reason people go into politics is because they believe they’ll do a better job than everyone else running.

    It does bother me that politicians may be saying things they don’t believe in, but considering the other upsides of democracy, I’m not too upset about the current state of affairs.

    “How About a Nice Game of Politics?”

    Take a bunch of people, all interested in advancing their own agenda. Then throw them into a room, and tell them they have to agree on something.

    That’s what Congress feels like to me. Hence, the second rule of politics: no one gets everything they want.

    The president has a lot of power, but the president isn’t a dictator. Similarly, a congressman or congresswoman is just one person in the House of Representatives. Governments are huge systems of incredible complexity. All things considered, it’s surprising they actually work.

    To advance their pet issues, politicians have to balance their personal convictions with the realities of their situation. People who stick to their guns and refuse to make concessions tend not to get anything done.

    You got more than you gave.

    And I wanted what I got.
    When you got skin in the game, you stay in the game.
    But you don’t get a win unless you play in the game.

    (Hamilton, “The Room Where It Happens”)

    That’s not saying stubbornness doesn’t have its place. Weaponized stubbornness can be a powerful tool. Consider the Obama administration. The broad narrative I heard from my decidedly Democratic bubble was that Republicans blocked Democratic legislation by being obstinate.

    If you ally yourself with the blue tribe, sure, that sucks. But think about the broader play. Republicans stall the legislative process, making voters disillusioned by Democratic leadership. That gives ammunition for Republicans to use in upcoming midterm elections, which already tend to swing against the president’s party.

    The Democrats know all of this, but the clock is still ticking on this power play. They now have two options. Make huge concessions that get their bills through (at the cost of the bills’ original spirits), or gamble that painting Republicans as do-nothings will be enough come election time.

    Which move was right? Or were there other moves? I don’t know. The point is that politics is messy. You don’t get into Congress without some level of shrewdness. Backroom deals are all a part of the game.

    You can try to rise above the game, proclaiming that governments should hold themselves to higher ideals. Then you lose to the people who compromised their principles and accepted money from lobbyists.

    Politicians have to learn how to navigate this mess, or else they’re not the leader they promised voters they were going to be.

    That means quid pro quo. It means advocating for something you don’t believe in, because you were promised support for the bill they care about. Accepting money from lobbyists even if you don’t want to, because you’re stuck in a political prisoner’s dilemma. Over time, politicians become exactly the kind of person they hated as teenagers.

    In short, politics is where ideals go to die.

    The great politicians are not the ones with the purest hearts. The great politicians are the ones that recognize the system, dive into it with disdain, and surface with the general thrust of the bill they wanted to pass.

    What Guides My Vote?

    Given this lens, what makes me decide I should vote for a specific politician?

    My main goal is to support politicians who publicly support causes that match my beliefs. They could be lying, but guessing what they believe behind their claims is a road towards madness. It’s easier for me to trust what they say. And if they don’t actually believe in their cause, who cares? They have to support it to get votes, and if they didn’t try to support it after getting elected, they’re not getting my vote again.

    My secondary goal is to support politicians that seek counsel and deeply consider the best solutions to issues. Politicians are trying to make the world a better place, but they’re not going to be experts in everything. Good politicians stay informed by surrounding themselves with experts and asking them questions. Great politicians actually change their position after talking to their experts.

    (This is why I’ve never been very bothered by attacks based around politicians flip-flopping. Why should I care that they voted against something four years ago, if they’re voting for it now? Is it really that unreasonable that they changed their position?)

    Finally, I support politicians who look like they’ll actually get things done. I’m much fonder of proposals that push the status quo by a bit instead of a lot. It’s hard enough to pass a bill. Proposing a radical bill that most of the legislature hates sounds pointless. All it does is proclaim you have strange beliefs. That’s interesting, but it doesn’t lead to actual results.


    Well, there you have it. My broad views on politics. I’d like to think that it’s not totally wrong, but we’ll see how it goes.

    If you had any objections, I’d appreciate it if you could comment. And if you agreed with what I said, I’d also appreciate comments, to get confirmation that I’m on the right track. I suspect readers of this post in particular will be more politically aware than me, and if I expect politicians to evolve their policy by asking the advice of experts, I should ask the same of myself.