• Mystery Hunt 2020, Part 1

    For Mystery Hunt 2020, I hunted with teammate, the same team I hunted with last year. We ended up smashing all my expectations. By puzzle count, we were 2nd. By metas, we were 2nd to solve all metas. By Hunt finish time, we were 3rd, due to getting stuck on the last puzzle (more on that later). This was the first year where teammate got a call from HQ warning us that we were in contention for winning, and for much of Saturday, we were the lead team, which is a rather weird prospect to consider. I didn’t believe it until someone came back from trivia for Weakest Carouselink and said we played against Palindrome, indicating we unlocked the puzzle at the same time they did.

    Based on questions Left Out asked us during Hunt interactions, I don’t think many Mystery Hunt veterans know where teammate came from. Very briefly, teammate is a mix of people from Puzzlehunt CMU, some Bay Area puzzlers from Berkeley, and friends branching out from there. Before teammate, these people hunted with ✈✈✈ Galactic Trendsetters ✈✈✈, but one year they decided to split into their own team, and teammate and Galactic have been sister teams ever since.

    The two teams have similar team culture and age demographics. For the former, both are meme-heavy and very willing to backsolve. For the latter, the majority of both teams are younger than 30. I’m actually not sure if anyone on teammate is over 30, now that I think about it. This showed itself most strongly when Left Out came to deliver the Baby Shower Balloon. They started a clue with “BTS…” and three of us immediately guessed K-Pop. Then we got a question about a VH1 television show that showed music videos (Pop-Up Video), and our reaction was “What’s VH1?”

    Before Hunt, members of Galactic and teammate leadership were seriously considering merging the two teams. The downside would be a less fun Hunt, since we’d be way over the recommended limit of 70-80 people. The upside was that we’d have much better odds of winning. We didn’t merge, but based on the solve graphs, we clearly didn’t need to merge to be a contender for the coin this year. We knew Galactic wanted to be more competitive this year, and weren’t surprised they got to the coin first.

    * * *

    I felt this year’s Hunt was really good, in inventiveness of its structure and polish in its puzzles. However, for me it still feels a bit sour due to how the Hunt ended for us.

    On Friday night, we felt we were doing well. I stayed up until 6 AM, then slept for 4 hours and came back. On Saturday night, we knew we were doing well. We rightly guessed we had unlocked every round, we were making good progress on all the metas, and it started to look like if I went to bed, I would wake up after we finished Hunt. So I pulled an all-nighter, and I know other people pulled one as well. At 9 AM, we solved our last meta, and frantically sent someone to get the last penny.

    Every time you solved a round in this hunt, you got to pick up a pressed penny. Each penny had 3 arrows, 3 images, and 3 sentences of text. In short, it was definitely puzzle data, but I was expecting a shell metameta, and didn’t bother looking at them too closely. Additionally, our home base was a fairly long walk from Hunt HQ, and Left Out told us it would be fine if we picked the pennies up in batches, to avoid having to send someone there and back for each meta solve.

    We picked up the final penny, didn’t unlock a shell, and realized it was a pure metapuzzle, at which point we started looking at the pennies, and got horribly stuck. The Workshop isn’t on the hunt website yet, but the solution is described at wrap-up. We got the penny layout within the 1st hour, and were stuck on extraction for the rest.

    About 3 hours in, we had concluded two things. First, if we had been in the lead, there was no way we had a three hour lead over 2nd place. Second, the coin hadn’t been found yet. Assuming a reasonable-length runaround, that implied that we were one of several teams stuck on the same pennies puzzle, and Hunt would come down to whichever team got the break-in first. And every other contender for the coin would have made the same inference, and was likely staring at the pennies just as intensely. That was great for motivation, but bad for anybody looking to sleep, because any moment could be the moment we got the a-ha.

    At 3 PM, Galactic found the coin. Our entire team had been looking at the Workshop pennies for 6 hours. We knew it was the only puzzle that mattered for Hunt completion, and we still couldn’t get it. Feels bad. As soon as the coin was found, Left Out called us, gave us some hints, and fastforwarded us through the endgame, which was both very fun and quite impressive.

    So, what happened? Word on the grapevine was that Workshop testsolved perfectly fine within Left Out, and they were surprised it took teams as long as it did to solve. In my opinion, part of what played into this was that we knew the pennies would be part of a puzzle, but we assumed we could look at it later, when we had complete information. By the time we actually looked at the pennies, we were very sleep-deprived due to rushing for the Cactus Canyon meta, so we weren’t as sharp. As time passed, we got increasingly burnt out on trying to figure out the same freaking pennies for hours.

    In many ways, this was similar to what happened to my team for Galactic Puzzle Hunt 2019. We put off learning Puflantu until right before looking at metas, under the logic that it’d be easier to learn it all at once after unlocking more artifacts. In practice, it turned the language learning into a big slog. What do you do, when there’s nothing to do but learn Puflantu, and Puflantu is hurting your brain?

    I’m trying to figure out why the clocks for April Fool’s Day Town from Mystery Hunt 2019 didn’t have this issue for us. It has a similar structure, where a small bit of information from every round gets pulled into one final puzzle. However, we solved that meta without too much pain in the final hours of Hunt, once we had enough prank answers. I think the crucial difference was that during the Hunt, we started looking at clocks early because there was no indication the clocks didn’t matter for the current town. We tracked them early, giving us all the data we needed to extract once Sunday came around. In contrast, by getting pennies at the end of a round, we got a signal that we didn’t need to look at them until the end. (And if they were part of a shell meta, looking at them early may have been wasted time, compared to solving other puzzles.)

    * * *

    I’ve been thinking about puzzlehunt design more. You may also have noticed that this post is labeled “Part 1”. These two facts are linked! I’m one of the organizers for My Little Pony: Puzzles Are Magic, a puzzlehunt that starts in about 10 days.

    Currently, I’m 50-50 between trying to get more sign-ups, and freaking out that people are going to hate it for not living up to expectations. I’ve been told this is a natural part of the creative process. I’m also still freaking out.

    Puzzles are Magic should be fine, but in the next few days, I’ll be spending more puzzle-time on making sure the hunt website can handle the load, and less time blogging about Mystery Hunt. When things are less busy, I’ll share specific puzzle stories.

    Last note: automated callbacks! I’m in favor of it. Somewhat selfishly, as teammate benefits from it a lot. I understand that some people like to hear a human voice now and then, and hunt organizers like how it lets them listen to teams celebrate a puzzle solve, but there’s no reason you can’t occasionally call teams for a check-in, or visit them in-person, while letting an automated system handle the majority of calls that are simply replies to puzzle guesses. It’s not a binary choice. Just saying.

    Oh, and congratulations to the new couple. It was very, very sweet.

  • The Berkeley TA Back Pay Settlement, Summarized

    I was last at Berkeley in Spring 2016, and it’s possible things have changed since then, but I’m aiming to represent the viewpoints as accurately as possible.

    On Tuesday, UAW Local 2865, the union representing TAs across the UC system, announced that UC Berkeley would pay $5 million in back pay to TAs. The story is getting picked up by a few places: The Chronicle of Higher Education, Inside Higher Ed, local Bay Area news outlets like The Mercury News and Santa Cruz Sentinel, and even some national outlets like Vice.

    In aggregate, these articles actually do a pretty good job of explaining the details, and a few of the different viewpoints, but as someone who TAed at Berkeley, in an 8 hour position, I’ve been feeling very conflicted.

    Why Does the University Owe Back Pay?

    At UC Berkeley, all TAs are paid an hourly wage. On top of this, all TAs who work at least 10 hours / week are entitled to childcare benefits and fee remission. The important part is the fee remission. If you TA for 10 hours a week or more, you are paid $7,500 for the semester on top of your hourly wage, which covers the in-state tuition for the semester.

    In the CS department, most TA positions used to be 10 hr / week or 20 hr / week positions. Starting around 2015-2016, many of these TA positions started turning into 8 hr / week appointments, making them ineligible for fee remission. Doing so let the department get more TA hours for the same amount of budget, since the pay that would have gone to fee remission gets turned into TA hours instead. They saw a welfare cliff, and decided to get as close as they could without going over.

    After this started spreading, the union filed a grievance against the University. Now, to be clear, nothing the University did was illegal. The contracts were clear that 8 hr / week positions were not eligible for fee remission, and the classes I helped teach made sure this was clear as well. However, the union argued that the University was effectively violating the spirit of the negotiated fee remission, by turning jobs that needed benefits into ones that didn’t. This is not a new practice. Companies have done this for a while, taking full-time jobs and turning them into jobs classified as contractors.

    The arbitrator ruled in favor of the union, and the University has agreed to cooperate with the decision.

    Why Did UC Berkeley Start Doing This?

    In the last few years, UC Berkeley has had a perpetual funding problem. This, plus exploding interest in CS courses, plus professor salaries rising due to competition from industry, has put tons of strain on the CS department’s budget.

    In 2016, in light of protests by graduate student instructors (GSIs), there was a town hall to discuss the CS department’s budget, attended by professors, members of the union, and much of the CS department’s teaching staff, myself included. Throughout the town hall, the professors made it clear they supported the GSI protests, and would have hired TAs at 10 hr appointments if they had the funding for it. The Berkeley CS department does get some funding, but nowhere near enough to meet demand. The department does heavy outreach for donor support, using this to shore up the budget, but they don’t think it’s sustainable to rely on donors to the degree they are. They’ve repeatedly asked the Berkeley administration to give them more funding, and have consistently seen it go to non-academic areas, like athletics or more administrative jobs.

    One obvious solution was to restrict enrollment, instead of using this 8 hr TA loophole. However, CS enrollment is already insane. Some lower division courses literally have thousands of students. At the town hall, professors teaching these courses said they were happy to have the class be as large as possible, as long as there was TA support for it. At some point, the department decided that they’d rather have bigger classes than 10 hr TA appointments. My understanding was that they wanted this to be a one-time deal, but like the donor support, this trick became a normalized part of the budget.

    Were These 8 Hour TA Appointments Bad?

    It heavily depends on who you ask. Eight hour TA jobs were almost exclusively held by undergraduates, and in fact undergrads make up the majority of the CS department’s TA staff. This tends to surprise people, and can be interpreted as vaguely exploitative. Let me explain reasons it wasn’t.

    Undergrads started getting hired as TAs because Berkeley didn’t have enough grad student TAs to meet course demand. However, some professors found that undergraduate TAs did a better job than graduate TAs. For lower division courses, graduate and undergraduate students know the material equally well, but undergraduates actually took the course they were TAing. Grad students who learned it at different institutions were less familiar with how Berkeley taught the course.

    Additionally, some undergrad TAs would TA the same course several years in a row. This happened less with grad students, since after they met their TA requirements, they would switch to focusing on research. The increased continuity from undergrads made it easier to preserve course teaching culture and knowledge, which genuinely improved the quality of some classes.

    Finally, the increase in undergrad TAs was good for graduate school applications, since it gave more undergrads a connection to a professor who could eventually write them a letter of recommendation.

    From my point of view, these undergrad TA positions were a net positive for everyone involved.

    None of this is directly related to the union’s grievance. Fee remissions will be paid back to both undergrad TAs and graduate TAs. It is, however, indirectly related. One side effect of hiring many 8 hour TAs is that you have to hire more undergrads. More students got to hold TA positions, talk to professors, get letters of recommendation, and so on.

    You could argue this is Goodharting in action, since each professor gets less time to evaluate each TA. Maybe all it did was rubber-stamp more letters that said “this student helped me teach a course”, without actually saying anything useful for graduate school admissions. But in this instance, I don’t think the incentives are entirely unaligned. Part of TAing is to help students practice teaching. I taught one section a week in my 8 hour TA appointment, while 20 hour TAs taught two. I’m sure I would have learned from the 2nd section per week, but the marginal benefit from 0th to 1st is much bigger than 1st to 2nd, and splitting the TA load across more people meant a lot more people got that 0th to 1st experience.

    My view is that the CS department set a bad precedent that wasn’t entirely bad. Eight hour TAs are not a good fit for the entire UC Berkeley campus, since it’s pretty clear that if it was universalized, no one would get fee remission or childcare benefits. However, for the unique situation the CS department was in, the outcome wasn’t terrible. As much as I’ve mentioned budget issues, the CS department has it pretty good, relative to other departments. Donors for CS are pretty rich, and I know several students who funded their education through tech company summer internships. Strong students could get $7,000 a month or more during the summer, often with a housing stipend, and that could cover in-state tuition, housing, and food until the next summer if you planned your budget right, even accounting for the insanity of Bay Area rent. Most departments do not have this luxury.

    The problem was that departments that didn’t have these luxuries would be, and were, tempted to adopt similar policies, in a bid to fix budget problems of their own. Charts from UAW’s page show that the statistics department was starting to shift to the CS model. I think it’s good for the university to have 8 hour TA appointments go away, but I think it’s bad for the CS department to lose them.

    What’s Going to Happen Next?

    It’s still uncertain, but here’s my understanding based on discussion in the Berkeley Facebook student groups.

    First of all, existing TAs working less than 10 hours per week will continue to work the same hours, and will receive fee remission as well. There is a contract negotiated by the union that prevents TAs from losing their jobs in the middle of the semester, so for now, this semester will play out as before.

    Starting next semester, TA hours are going to get more expensive. To minimize cost per hour, departments are incentivized to hire fewer TAs that work longer hours. Fee remission is a fixed cost paid once per student per semester, so you want that student to work as much as you can hire them for. In CS, this historically means 20 hours per week, but I’ve recently learned that Berkeley has 30 hour / week appointments in other departments, so it could go even higher. Fewer undergrad TAs will get to talk to professors, and fewer undergrads will sign up in the first place. If the only TA options were 20 hour appointments, I likely wouldn’t have taken any of them in my senior year, due to other time commitments.
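
    As a rough sketch of why a fixed fee remission pushes departments toward longer appointments, the cost per TA hour can be written as the hourly wage plus the remission spread over every hour worked in the semester. The $7,500 remission is the figure quoted earlier in this post; the hourly wage and the semester length below are made-up placeholders, not Berkeley's actual numbers.

```python
FEE_REMISSION = 7500.0  # per TA per semester, as quoted earlier in the post


def cost_per_ta_hour(hours_per_week, hourly_wage, weeks_in_semester,
                     fee_remission=FEE_REMISSION):
    """Effective cost of one TA hour when fee remission is a fixed
    per-semester cost on top of the hourly wage."""
    total_hours = hours_per_week * weeks_in_semester
    return hourly_wage + fee_remission / total_hours


# Hypothetical inputs for illustration only: $20/hr wage, 15-week semester.
HYPOTHETICAL_WAGE = 20.0
HYPOTHETICAL_WEEKS = 15

for hours in (10, 20, 30):
    cost = cost_per_ta_hour(hours, HYPOTHETICAL_WAGE, HYPOTHETICAL_WEEKS)
    print(f"{hours} hr/week appointment: ${cost:.2f} per TA hour")
```

    Whatever wage and semester length are plugged in, the remission term shrinks as weekly hours grow, which is the incentive described above.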

    The administration will either need to allocate more TA budget, or CS class sizes will need to shrink. Historically, I’ve lost a lot of faith in the UC system and expect it to raise the budget by a token amount that doesn’t cover the shortfall. CS class enrollment was already effectively at capacity with the 8 hr / week loophole, so it has to drop. The math I saw was that four 8 hour TAs cost the same as one 20 hour TA. If the budget doesn’t increase, a shift to 20 hour TAs means 62.5% of the teaching hours as last semester. This is pretty crazy and I have no idea how they’ll even figure out enrollment.
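
    The 62.5% figure follows directly from the "four 8 hour TAs cost the same as one 20 hour TA" equivalence. A minimal sketch of that arithmetic, using only the numbers in the paragraph above (the function and variable names are just for illustration):

```python
HOURS_PER_SMALL_TA = 8     # weekly hours of one 8 hr appointment
HOURS_PER_LARGE_TA = 20    # weekly hours of one 20 hr appointment
SMALL_TAS_PER_LARGE_TA = 4  # four 8 hr TAs cost the same as one 20 hr TA


def teaching_hours_kept(small_hours, large_hours, small_per_large):
    """Fraction of weekly teaching hours that survive when a fixed budget
    moves from small appointments to equally expensive large ones."""
    hours_before = small_per_large * small_hours  # 4 * 8 = 32 hours
    hours_after = large_hours                     # 20 hours for the same money
    return hours_after / hours_before


ratio = teaching_hours_kept(HOURS_PER_SMALL_TA, HOURS_PER_LARGE_TA,
                            SMALL_TAS_PER_LARGE_TA)
print(f"{ratio:.1%} of last semester's teaching hours")  # 62.5%
```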

    I’d like the union to negotiate higher pay per hour, in exchange for fee remissions, because one of the big lessons is that welfare cliffs can lead to bad consequences. If this happened, it would fix many of my issues with the current status quo, since professors could go back to offering many smaller TA appointments. However, it seems very unlikely the union will do this, and I’m not even a student anymore, so it’s not like I have much say in this.

    As with pretty much any story combining “UC Berkeley” with “budget”, it’s going to be a huge mess. Hopefully, this made it clearer why this decision was not a clear black-and-white victory for the workers, as much as some want to treat it that way.

  • What Size Should NeurIPS Be?

    Ostensibly, I’m on vacation. However, it’s raining, I have some inspiration, and I haven’t written a post in a while, so buckle up, here come some more machine learning opinions. I read some discussion about the size of NeurIPS, mostly around Andrey Kurenkov’s post at The Gradient, and wanted to weigh in.

    I’ve been to three NeurIPS: 2016, 2017, and 2019. So, no, I haven’t really been around that long. NeurIPS 2016 was my first academic conference ever, so I didn’t really know what to expect. By NeurIPS 2017, I’d been to a few and could confidently say that NeurIPS felt too big. By NeurIPS 2019, I was no longer sure NeurIPS was too big, even though it had over 60% more attendees than 2017.

    Before my first conference, I got some advice from senior researchers: if you aren’t skipping talks, you’re doing it wrong. I promptly ignored this advice and attended every talk I could, but now I get what they meant.

    Early on in your research career, it makes sense to go to talks. You know less about the field and you know fewer people. As you become more senior, it makes less sense to go to talks. It’s more likely you know a bit about the topic, and you know more people, so the value of talks goes down compared to the research conversations you could have instead. Conference organizers know this. Ever wonder why there are so many coffee breaks, and why they’re all much longer than they’d need to be if people were just getting coffee? Important, valuable meetings are happening during those coffee breaks.

    In the limit, people attend conferences to meet up with the people they only see at conferences. As someone from the Bay Area, the running joke is that we travel halfway across the world to talk to people who live an hour’s drive away. It’s not that we don’t want to talk to each other, it’s that the conference environment provides a much lower activation energy to scheduling meetups, and it’s easier to have serendipitous run-ins with old friends if we’re all in the same venue.

    In this model of a research conference, all the posters, accepted papers, talks, and so on are background noise. They exist as the default option for people who don’t have plans, or who want a break from socializing. That default option is critically important to keeping everything going, but they’re not the point of the conference. The point of the conference is for everyone in the research community to gather at the same place at the same time. If you’ve been to fan conventions, it’s a very similar dynamic.

    If you take this model as true, then NeurIPS’s unofficial status as the biggest ML conference is incredibly important. If you could only go to one conference each year, you’d go to NeurIPS, because everyone else is going to go to NeurIPS.

    And if NeurIPS is the place to be, shouldn’t NeurIPS be as big as necessary?

    * * *

    Well, maybe. NeurIPS attendance is growing, but the growth is coming from different places.

    Year over year, NeurIPS has been growing way faster than any of the PhD programs that could be feeding into it. I would guess it’s growing faster than the undergrads and master’s students as well. If the growth isn’t coming from universities, it has to be coming from industry and the broader data science community - a community that is much larger and of a different makeup than the traditional ML research crowd.

    I said NeurIPS is about networking, but the question is, networking between who? It started as networking between researchers, because the makeup of attendees started as researchers. It’s been shifting ever since deep learning hype took off. It is increasingly likely that if you talk to a random attendee, they’ll be an ML enthusiast or someone working in an ML-related role at a big company, rather than someone in a PhD program.

    And I should be really, really clear here: that’s not necessarily a bad thing! But people in a PhD program have different priorities from people working at a big company, and that’s causing a culture clash.

    The size debate is just a proxy for the real debate about what NeurIPS should be. We’re in the middle of an Eternal September moment.

    Eternal September is a term I really wish more people knew about, so here’s the short version. There used to be this thing called Usenet, with its own etiquette and social norms. Every September, new students from colleges and universities would get access to Usenet, and they’d stir a fuss, but the influx was small enough for existing Usenet culture to absorb them without much change. Then, AOL opened Usenet access to anyone who wanted it. Usenet culture couldn’t integrate the firehose of interest, and it became known as the Eternal September. The original Usenet culture disappeared, in favor of whatever culture made sense for the new users.

    The parallels to NeurIPS are uncanny. A simple find-replace exactly describes what’s happening now, from the people saying NeurIPS is turning into a spectacle, to the people complaining they can’t buy tickets to a conference they really want to attend.

    Despite their foreboding name, Eternal Septembers are not inherently bad. They are what they are. But generally, they’re good for people trying to join, and bad for people that are already there and like what they have.

    So the real question is, who is NeurIPS for? Is it for the established researchers to talk shop? The newer researchers trying to present their work and build a career? The data scientist looking for new applications of ML research? Right now, it’s for all of them, and the organizers are doing their best to balance everyone’s interests correctly, which is an incredibly difficult job I wouldn’t wish on anyone. The one thing that seems clear to me is that a pure, academic-only NeurIPS untethered from industry is never going to happen. Machine learning is currently too economically viable for industry to stop caring about it. You don’t stop Eternal September. Eternal September is something that happens to you. The best you can do is nudge the final outcome.

    It’s a crazy solution, and I don’t know if it even makes sense, but maybe NeurIPS needs to be split in two. Have one act as the submission venue, where people submit and present their research, with heavier restrictions on who’s allowed to attend, and have the other act as the open-to-everyone conference, with the two co-located to encourage some crossover. If NeurIPS’s growing pains are caused by it trying to be something for everyone, then maybe we need to split NeurIPS’s responsibilities. Except, I don’t actually know what that means.

    I do believe that it’s something people should be thinking more about. So, consider this as a call to action. September approaches, and thinkpieces or blog posts aren’t going to change what happens when it does.