Last updated November 14, 2016.

This is not a standard research page.

Due to confidence issues, I didn’t apply to PhD programs senior year. My more complete reasoning is in this blog post.

My current research area is transfer for robotics and deep reinforcement learning. Broadly, I’m interested in ways to use pretrained policies to train new policies with fewer samples. My current theory is that this requires automatically learning a shared structure/factor common to several tasks, and my current research is focused on ways to learn and utilize those shared factors.

The research I did in undergrad didn’t lead to publishable results, but I’m still proud of it. Hopefully you’ll find it neat.

Project list:

# Exploring Boosted Neural Nets for Rubik’s Cube Solving

Alex Irpan

Spring 2016. Final project for CS 281B, Advanced Topics in Decision Making

Poster (click for full size):

NIPS submission (rejected, and I never expected acceptance): PDF

Code: GitHub

In this project, I trained neural nets mapping raw features to Rubik’s Cube moves. It started as a small side project but grew over time, eventually turning into an experiment to see whether AdaBoost-style reweighting during training could improve training time or accuracy. The short answer is that it did neither. For details, see the submission.
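The reweighting idea can be sketched roughly as follows. This is a hypothetical minimal version in NumPy, not the project code; `reweight` and `sample_minibatch` are illustrative names, and the update rule shown is the standard AdaBoost-style one.

```python
import numpy as np

def reweight(weights, correct, beta=0.5):
    """AdaBoost-style update: shrink the weights of correctly classified
    examples by beta, then renormalize back to a distribution, so hard
    examples gain sampling mass."""
    w = weights * np.where(correct, beta, 1.0)
    return w / w.sum()

def sample_minibatch(rng, weights, batch_size):
    """Draw minibatch indices proportionally to the boosted weights."""
    return rng.choice(len(weights), size=batch_size, p=weights)

rng = np.random.default_rng(0)
w = np.full(6, 1.0 / 6.0)                            # start uniform over 6 examples
correct = np.array([True, True, False, True, False, True])
w = reweight(w, correct)                             # misclassified examples gain mass
idx = sample_minibatch(rng, w, batch_size=4)         # later batches focus on them
```

In a training loop, the reweighting would run once per epoch, with `correct` computed from the network's current predictions.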

# Factored Q-Learning in Continuous State-Action Spaces

Alex Irpan, mentored by John Schulman

Fall 2015. Final project for CS 281A, Statistical Learning Theory

Poster (click for full size):

Informal Writeup: PDF

Code: BitBucket

The intention of this project was to add independence assumptions between different dimensions of the action space to decrease model size, with specific application to discretized continuous MDPs. Unfortunately I had difficulty getting Q-learning to work in small continuous MDPs, and there was no point moving forward until the small case worked. Experimental results from the poster are from a discrete MDP, the Atari game Asteroids.

Results were a wash, but I like some of the ideas here. I still want to test stochastic policies based on graphical models, but have too many other things on my plate.
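The independence assumption between action dimensions can be sketched as follows. This is a toy tabular version I wrote for illustration, not the project code: assume $Q(s, (a_1, a_2))$ decomposes as $Q_1(s, a_1) + Q_2(s, a_2)$, so a 2-D action space of size $|A_1| \cdot |A_2|$ needs only $|A_1| + |A_2|$ parameters per state.

```python
import numpy as np

n_states, n_a1, n_a2 = 5, 3, 4
Q1 = np.zeros((n_states, n_a1))   # factor for action dimension 1
Q2 = np.zeros((n_states, n_a2))   # factor for action dimension 2

def greedy_action(s):
    """The factored form lets each action dimension be maximized
    independently, avoiding a search over the joint action space."""
    return Q1[s].argmax(), Q2[s].argmax()

def td_update(s, a1, a2, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step on the factored value; both factors share
    the same TD error."""
    target = r + gamma * (Q1[s_next].max() + Q2[s_next].max())
    td = target - (Q1[s, a1] + Q2[s, a2])
    Q1[s, a1] += alpha * td
    Q2[s, a2] += alpha * td

td_update(s=0, a1=1, a2=2, r=1.0, s_next=3)
```

The same decomposition applies to a discretized continuous action space, with each discretized dimension getting its own factor.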

# An Overview of Sublinear Machine Learning Algorithms

Alex Irpan*, Ronald Kwan* (worked equally)

Spring 2015. Final project for CS 270, Combinatorial Algorithms and Data Structures

Report: PDF

Summarizes algorithms for solving SDPs and learning perceptrons/SVMs in sublinear time, including their performance proofs. Each shares the same approach: describe the optimization problem as a max-min game, then sample an estimate of the gradient, which can be done in sublinear time. Multiplicative weights and concentration inequalities glue the proofs together.
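The shared recipe can be illustrated with a toy example (names and setup are mine, not from the report): multiplicative weights on a max-min matrix game. The sublinear-time papers replace the exact loss vector with a cheap sampled, unbiased estimate; the full loss is used here so the update itself is easy to follow.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(size=(6, 6))           # row player's loss matrix, entries in [0, 1]
eta, T = 0.05, 2000
w = np.ones(A.shape[0])                # one weight per row strategy
played_losses = []

for _ in range(T):
    p = w / w.sum()
    j = int((p @ A).argmax())          # column player best-responds
    played_losses.append(float(p @ A[:, j]))
    w *= np.exp(-eta * A[:, j])        # multiplicative weights update

p = w / w.sum()                        # approximate max-min row strategy
avg_loss = np.mean(played_losses)
```

The concentration inequalities in the proofs control how much the sampled loss estimates can drift from the exact losses used above.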

# Integrating Monte Carlo Tree Search with Reinforcement Learning

Alex Irpan, mentored by John Schulman

Fall 2014 - Spring 2015

Code: BitBucket

My first research project. I experimented with integrating Monte Carlo Tree Search with Q-learning on the toy problem of Connect Four. The goal was to use MCTS for policy improvement: MCTS turns a weak rollout policy $\pi$ into a stronger one while producing reasonably accurate Q-values, Q-learning on those Q-values improves $\pi$, MCTS on the improved $\pi$ gives still more accurate Q-values, and so on.

The goals I had were better implemented in this NIPS paper and in AlphaGo, so I don’t see much use in revisiting this.
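The improvement loop can be written schematically as follows. All names here are hypothetical placeholders, not the project code: `mcts_search` stands for search with the current rollout policy, and `fit_q_values` stands for the Q-learning step.

```python
def train_loop(policy, env, n_iterations, mcts_search, fit_q_values):
    """Alternate MCTS policy improvement with Q-learning, as described
    above. `mcts_search` and `fit_q_values` are placeholder callables."""
    for _ in range(n_iterations):
        # 1. MCTS with `policy` as the rollout policy: better move
        #    choices plus reasonably accurate Q-value estimates.
        states, q_targets = mcts_search(env, rollout_policy=policy)
        # 2. Q-learning / regression on the MCTS Q-values yields an
        #    improved rollout policy for the next round of search.
        policy = fit_q_values(policy, states, q_targets)
    return policy
```

AlphaGo's training pipeline follows essentially this alternation, with the policy and value networks filling the two roles.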

# Presentation: Secure Function Evaluation with Garbled Circuits

Alex Irpan

Fall 2015. Final project for CS 276, Cryptography

Blog post: here

A presentation on Yao’s garbled circuits, the first solution to securely computing functions between two semi-honest parties. I have presentation notes, but they’re messy. The blog post is more refined and has more intuition.

# Presentation: Hiding Input Size in Two-Party Secure Computation

Alex Irpan

Spring 2016. Final project for CS 294, Secure Computation

Presentation Notes: PDF

Garbled circuits implicitly leak the input size of both parties. Can we do better? A paper by Lindell, Nissim, and Orlandi on hiding input size helps answer this question. I presented the construction for one case, assuming fully homomorphic encryption, along with impossibility proofs for the other cases.