I was reading through some proofs from imitation learning, and realized they were reminding me of hybrid arguments from cryptography. It’s always nice to realize connections between fields, so I figure it was worth making a quick guide to how hybrid arguments work.
Hybrid arguments are a proof method, like proof by induction. Like induction, they aren’t always enough to solve the problem. Also like induction, the details differ on each problem, and filling in those details is the hardest part of each method.
The hybrid argument requires the following.
- We want to compare two objects and .
- There is a sequence of objects such that , , and the can be seen as an interpolation from to . Intuitively, as increases, slowly drifts from to .
- The difference between two adjacent in the interpolation is small.
For concreteness, let’s assume there’s a function and we’re trying to bound . Rewrite this difference as a telescoping series.
Every term in the sum cancels, except for the starting and the ending .
(Man, I love telescoping series. There’s something elegant about how it all cancels out. Although in this case, we’re adding more terms instead of removing them.)
This reduces bounding to bounding the sum of terms . Since the difference between adjacent is small, is at most times that small value. And that’s it! Really, there are only two tricks to the argument.
- Creating a sequence with small enough differences.
- Applying the telescoping trick to use those differences.
It’s very important that there’s both a reasonable interpolation and the distance between interpolated objects is small. Without both these points, the argument has no power.
This is all very fuzzy, so let’s make things more concrete. This problem comes from the DAGGER paper. (Side note: if you’re doing imitation learning, DAGGER is a bit old, and AGGREVATE or Generative Adversarial Imitation Learning may be better.)
We have an environment in which agents can act for timesteps. Let be the expert policy, and be our current policy. Let be the expected cost of policy . We want to prove that given the right assumptions, will be close to by the end of training.
This is done with hybrids. Define as the policy which follows for timesteps, then follows for the remaining timesteps. Note and . The telescoping trick gives
The only difference between and is that in the first, the expert takes over after steps, and in the second it takes over after steps. The paper then argues that as long as the environment has no key decision where a single wrong move can lead to death, the ability of the expert to correct after steps must be similar to its ability to correct after steps.
This shows why hybrids are useful. They let us break down reasoning over steps worth of differences to reasoning about differences of 1 step each.
A similar flavor of argument shows up a ton in crypto. Very often, we’re trying to replace a true source of randomness with something that’s pseudorandom, and we need to argue that security is still preserved. For example, we have PRNGs , and independently sampled seeds . Suppose we concatenated the inputs and outputs together to get the function
We want to show is still a PRNG.
Here, the hybrids are functions , where uses the first PRNGs and uses true randomness for the remaining blocks of bits. This makes truly random and . If the difference between and is small, ’s output is close to truly random, which would show is a PRNG. This leaves arguing that switching from to (switching the th block of bits from true random to ) doesn’t change things enough to break security.
Like with many things, hybrid arguments are something that you have to actually do to really understand. And I don’t have a library of hybrid problems off the top of my head. That being said, I think it’s useful to know what they are and roughly how they work. Proof methods are only as useful as your ability to recognize when they might apply, and it’s hard to recognize something if you don’t know it exists.
Whenever you have two objects and a reasonable interpolation between them, it’s worth thinking about whether you can bound the difference between adjacent terms. And whenever you know how to bound the difference between two similar objects, it’s worth thinking about whether you can build an appropriate sequence that lets you chain those differences into a conclusion about objects further apart.