• Puzzlehunting 201

    I’m good at puzzlehunts. I thought about equivocating this, but I’ve been on a winning MIT Mystery Hunt team twice and recently was on the 2nd place team for the Who Killed Ickey? treasure hunt. These were definitely team efforts, but I know I made significant contributions to both.

    Those hunts cover the spectrum between puzzlehunts where the bottleneck is break-ins (Mystery Hunt), and ones where it’s more about pure execution speed (Ickey). I’ve noticed there isn’t a lot of advanced puzzlehunting content on the Internet, and I’ve been doing hunts for well over a decade, so I decided to write up some of the habits I’ve learned along the way.

    If you are new to puzzlehunts, you are not the target audience of this post. I mean, feel free to keep reading! But I will be assuming a bunch of background knowledge and won’t be trying to explain what a puzzle is or what a meta is and so forth. If you’re new, I recommend Introduction to Puzzlehunts instead.

    If you’ve done many puzzlehunts, not everything I say here will work for you. Puzzlehunting systems are a little like productivity systems: everyone’s is different, and people can be very opinionated about them even if they all share common themes. Your best bet is to try things out, take the parts that work, and ignore the parts that don’t.

    You don’t have to get faster at puzzles. I like going fast because I’ve solved enough puzzles that I want to quickly get through grunt work to see the next a-ha. To a lesser extent, I find it fun to see how much information I can shortcut when solving a puzzle. That’s not a universal opinion. Most people will have more fun if they solve puzzles than if they don’t, but you don’t have to solve puzzles quickly to have fun.

    This will spoil a few puzzles. I’ve tried to keep the spoilers as light as needed, but it’s much easier to give real-life examples than to construct new puzzles just for the sake of this post. Then again, imagine if this post was written entirely to seed content for an upcoming puzzlehunt. Boy would that be interesting! Seriously though, it’s not. You’re welcome to look, but you won’t find anything.

    The One Minute Version

    Puzzle solving can be divided into generating ideas for how a puzzle works, and executing well on those ideas.

    To generate more ideas, solve more puzzles to see what’s shown up in past puzzles, and try similar things.

    To execute better, practice by solving more puzzles.

    Wait, This Seems Too Simple

    Nah, that’s pretty much it.

    The very common story is that if you want to get good at a skill, just do it a bunch. Like, go solve 100 puzzles. No, I don’t mean 10, or 20. Stop reading and go solve literally 100 puzzles. You will learn things. You will get better. It’ll just happen. It’s part of the magic of the human brain.

    This is a very specific anecdote, but I play a platforming game called Dustforce. There’s one very hard level in the game which has a part called the Tera section. It’s almost exactly in the middle of the level, and is easily the most difficult part to clear.

    I complained about this in the Discord and was told that yes, it’s awful, but was then directed to “100 Teras”, a custom map that’s just 100 copies of the Tera section with a checkpoint between each one. “If you’re really having trouble, go beat 100 Teras”. So I did, twice. And then I stopped complaining.

    If you go solve 100 puzzles, you may not need this post. I’m still going to explain the solving strategies I’ve learned, but puzzle solving is really an activity where you learn by doing.

    Okay, on to the real post.

    Generating Ideas

    Puzzles are often described as having “a-has” or break-ins, where it suddenly becomes clear what you should do. How do you come up with a-has? How do you break in?

    This is genuinely the hardest part to give advice about. If it was easy to break in, we wouldn’t have puzzles in the first place! So I’ve decided to interpret the question differently: how do you create conditions that cause good ideas to arise? And how do you decide if those ideas are actually good?

    Use spreadsheets

    Collaboratively editable spreadsheets (typically Google Sheets) are the lingua fraca of puzzlehunting. Please use them.

    Keep clean spreadsheets

    We’re going to immediately start with a point of contention, because what “clean” means is never the same person-to-person, and people can be very opinionated on sheet hygiene.

    xkcd comic about mess

    (From xkcd)

    The bare minimum that I think is unobjectionable is:

    • Make a new sheet for each puzzle
    • Put related information close to each other (answers next to clues)
    • Use a monospace font

    The first two are just common sense. As for monospace, a lot of puzzles will use word length as part of the structure. Human Pyramid from Teammate Hunt 2021 is a good example of this. Patterns in length (matching length, exactly one letter longer, etc.) are easier to notice if you put content in monospace, so using those fonts can help you break-in on those patterns faster.

    (As an aside, Human Pyramid is displayed in monospace font specifically to encourage solvers to consider length. A similar trick is used in Pestered from MIT Mystery Hunt 2018.)

    Personally, I like to put everything in monospace, but I know some solvers who prefer only putting clue answers in monospace and leaving clues in sans serif, since they value the visual boundary between clue and answer. Pick whichever works for you.

    Transcribe puzzle content faithfully

    Before you have broken into a puzzle, any part of the puzzle could be relevant information. The act of transcribing a puzzle from a website or PDF into a spreadsheet is slightly destructive. Text styling and spacing are often the first casualties. Usually this is fine, but every now and then the parts lost during transcription are important to breaking in.

    So, try to make your sheet look as much like the puzzle as you can. Certain puzzles can make this hard (shoutouts to every puzzle with a triangular or hex grid). In those scenarios, do your best, but add a note to “see original puzzle” in the spreadsheet. And remember to look at the original puzzle if you’ve been stuck for a while, to see if there’s something about the presentation you missed.

    Share your bad ideas

    Puzzle solving is a collaborative activity, and people won’t know what ideas you have if you don’t share them. It’s okay to caveat that your idea is bad, but share it anyway. Try to follow the “yes, and…” rule-of-thumb from improv. Sometimes you will have 90% of the right idea and be missing the last 10%. Other times you will have the half-baked 10% of the right idea, and someone else will have the 90%. To get to 100% of the idea, one person needs to communicate, and you want to have a culture where both the 10% person and 90% person can be the one who communicates first.

    Count things!

    Off the top of my head, I do this a ton, I feel I do this more than most solvers, and I haven’t regretted it.

    Puzzle solving often involves relating two parts of the puzzle together. These relations often follow whole number ratios. If there are N items in both parts, it suggests matching between those parts. If there’s N items in one and 2N in another, that could mean matching 1 item from the first part with 2 items from the second. If there’s 26 of something, it could mean A1-Z26.

    A very simple example is Not Quite a Polka from Puzzles are Magic. There’s two sections of clues, and 13 clues in each section, so the puzzle is likely about solving both and combining the two.

    Another example is Oxford Children’s Dictionary from MIT Mystery Hunt 2022. There are 19 clues in both the Standard Dictionary section and Oxford Children’s section, so we likely relate the two, and there are 38 words in the bottom grid to fit into 38 blanks in the clues, suggesting we fill each with a word.

    A third example is Cryptic Command from GPH 2018. There are many steps to the puzzle, but we can start by noticing that for each card, the number of cryptics on that card matches the number of circles in the top right of that card. So we should probably relate each cryptic to a circle. (If we know enough about the Magic: the Gathering reference, this already provides some hint for what the cryptics will resolve to.)

    Note that we can make all these guesses before doing any clue solving! During Mystery Hunt, it took us about 10 minutes to get our first clue in Oxford Children’s Dictionary, but we already correctly guessed all the mechanics of the puzzle before doing so.

    Counts can also let you draw negative conclusions. If you have 7 clues in one part and 10 clues in the other, you are not going to do a 1:1 matching, so you can immediately discard trying a 1:1 match and think of doing something else.

    I call this the numerology of the puzzle. At the start of a puzzle, quickly count the most salient parts of the puzzle, and keep any interesting correspondences in mind. When testsolving Light Show from Teammate Hunt 2021, we knew it corresponded to Tumbled Tower in some way, but weren’t sure on the rules for how it would work. So I decided to count every square in the Light Show grid. There were 133 of them, which was 7 * 19, matching the 19 heptominos in Tumbled Tower. We already suspected we’d use the heptominos in some way, but counting let us conclude that they should fit exactly with no overlaps or gaps.

    Search everything

    Honestly, a lot of puzzle solving is about taking random parts of the puzzle and throwing them into a search engine. Search the flavor text. Search just half of the flavor text. Add quotation marks to do exact phrase searches. Search all your answers together and see if something shows up. Randomly drop out words from all of the previous searches and try again. Experiment with more than one tool - anecdotally, I’ve found LLMs are amazing at pop culture ID, despite the hallucinations.

    There are a lot of variables you can tweak in your search engine queries. The heuristic I use is that if there aren’t obvious starting points, I will search the puzzle title, the flavortext, all proper nouns, and any phrases that don’t read like typical English. I’ll also mix the theme of the puzzlehunt into search queries if I think it could turn up something new.

    Ask if an idea’s overconstrained

    Puzzles do not arise spontaneously. They are created by people to have a solution. As you come up with ideas on how a puzzle works, each of those ideas applies a constraint to the puzzle. For example, if we see a rows garden puzzle (see below), it’s a safe bet that answers for the blooms (the colored hexagons) are 6 letters long.

    Rows garden grid

    (This might not be the case if the puzzle is doing something funny for extraction, but let’s ignore that.)

    We say an idea is overconstrained if it would be impossible or very difficult to construct a puzzle that worked that way. All answers being 6 letters is easy. All answers being 6 letter words for colors is harder, but maybe possible. If all answers had to be 6 letter words of colors that start with Y, that’s definitely overconstrained, because after YELLOW you really don’t have many options.

    The logic goes like this:

    • The puzzle is constructible, because I’m looking at it.
    • If this overconstrained theory X is how the puzzle works, it would not be possible to construct this puzzle.
    • Therefore, X can’t be how the puzzle works.

    This can speed up your solves by letting you skip checking ideas that are implausible.

    Now, sometimes this bites you badly, if you assume a construction is impossible when it isn’t. This happened to my team when solving Highway Patrol from the DaroCaro hunt. In this puzzle, you solve a Sudoku puzzle, and eventually get to the cluephrase MIDDLES. This suggests using the middle cells of each 3x3, and we failed to extract from this for a long time. It turned out the answer was to convert each middle number using A1-Z26. The reason our team got stuck here was that we all assumed that was impossible, as doing A1-Z26 on Sudoku digits forces your cluephrase to only use letters from A to I, and surely there’s no way you could do something with that, right? We kept trying to index things instead. Apply with caution - sometimes a constructor is just insane and figures out how to make a very constrained construction work.

    Executing Faster

    By this point, we have some idea of how the puzzle works, and are in the phase of answering clues and making deductions. Our goal is to get enough data to continue the puzzle. This section is about the mechanics of how to progress through clues quickly.

    Look ahead to how the next step will work

    In my opinion this is the major thing that separates experienced puzzlehunters from new ones. New puzzlers tend to solve puzzles in a waterfall style. When given a list of clues, they fully solve the clues, then start thinking about what to do with those answers. Whereas experienced puzzlers look ahead more, thinking about what to do next while solving the clues.

    I don’t think new puzzlers are wrong to solve this way. It’s hard to look ahead if you don’t have the experience to know how puzzles tend to work. But in general, knowing what you’re aiming for can make it easier to solve the step you’re currently on, and will let you jump ahead faster. My rough heuristic is to consider how the puzzle will work at the start of the puzzle, then again at about the 50% mark. (Although, I will adjust depending on how much fun I’m having. If I’m not having fun I will definitely try to extract earlier to save myself from having to do more work.)

    The reason it’s worth looking ahead is that knowing the next step often gives you extra constraints that can make clue solving easier, or even possible. It’s pretty common for a puzzle to have deliberately ambiguous clues, that only become unambiguous after you notice a common property shared by all answers. It’s a bit like backsolving a metapuzzle, but at a smaller scale.

    Correctly looking ahead can sometimes let you skip large chunks of a puzzle, if you figure out extraction early enough to get the answer to show up in nutrimatic or Qat. One way to solve puzzles faster is to just do less work. I know some very strong solvers who can solve quickly while 100%ing puzzles, but it’s undeniably true that you don’t need to 100% puzzles to finish them. Personally, I only 100% a puzzle if I’m having a ton of fun, otherwise I’ll move on after we get the answer.

    Look for pieces of confirmation

    In sudoku puzzles, there’s this idea called “the deadly pattern”.

    Deadly pattern


    Consider the red rectangle above. The corners of the rectangle are 59-95 or 95-59. Both are valid solutions, no matter what gets placed in the rest of the grid, as switching between the two does not change the set of digits in any of the affected rows, columns, or 3x3 boxes. Since Sudoku puzzles are constructed to have a unique solution, if we ever see the deadly pattern, we know we’ve made a mistake and should backtrack. You usually do not need to assume uniqueness to solve a logic puzzle, but assuming uniqueness can give you a guardrail.

    The same is true in puzzlehunting. Answers are often alphabetical as a confirmation step, and I will usually check alpha order very early (when ~20% of the clues have been solved), since knowing this early can speed up clue solving and let you catch mistakes as soon as they break alpha order.

    To draw a transportation analogy, traffic lights and rules don’t exist to slow you down, they exist to speed you up by letting you drive through intersections without having to negotiate right of way. Guardrails are there to make your solve more streamlined - search them out and use them!

    Do the easy parts first

    When I see a crossword puzzle, I like to solve the proper nouns first. With a search engine, these are usually both easy and unambiguous. Clue difficulty is not uniform, and sometimes constructors will deliberately create an easy clue to provide a foothold for the puzzle. It’s usually worth doing a quick once-over of a puzzle to see if anything stands out. If I get stuck on a clue, I will immediately jump to the next one and only revisit the trickier clue if needed, rather than forcing myself to do it.

    Prioritize important clues

    If you see an extraction like ?N?WE???????, many solvers I know will assume the first 6 letters are going to spell ANSWER and will stop extracting any of the first 6 letters unless they get stuck.

    This is a special case of trying to direct work towards the “high information” areas. Some general wheel-of-fortune skills are helpful here: if I see an extraction like ?????I?G, I’ll usually assume it’s going to end -ing and skip solving the clue for the blank between the I and G.

    When combined with word search tools like Qat or nutrimatic, this can be ridiculously strong. In general, word search tools have trouble with long runs of blanks, but are very good at filling short gaps. If I know ordering, I’ll often jump around to solving clues that reduce runs of blanks, rather than going in order. Or, I’ll declare that I’m working from the bottom when solving with a group, because most people start from the top and it’s better to distribute work throughout a cluephrase.

    Importantly, you can’t tell what to prioritize if you haven’t tried extracting yet, which is why I value looking ahead on extraction so highly.

    At the hunt-wide level, this also means trying to push solves in rounds where you think it’ll be important to get more solves, either to unlock the meta or get more data for a meta.

    Automate or find tools for common operations

    Solve logic puzzles with Noq. Use browser extensions that make it easy to take screenshots and search images by right click (there are a few of them, I use Search by Image). Wordplays is a solid crossword clue searcher, and npinsker’s Crossword Parser is a good crossword grid sheet creator (albeit one you need to supervise, the image detection isn’t perfect). There are too many tools to list all of them, but the one on is pretty good.

    One I would highlight is spreadsheet formulas. I spent a very long period of my puzzling career not using formulas. Eventually I decided to learn spreadsheet formulas and now I can’t go back.

    Spreadsheet formulas have several advantages:

    • They’re consistent. You can’t miscount or typo a letter. (You can typo a spreadsheet formula, but usually spreadsheet formula typos lead to errors rather than incorrect letters.)
    • They’re automatic. Once you have extraction driven by spreadsheet formula, you can stay in a flow of IDing and solving without detouring into extracting, reducing time lost to context switching.
    • Due to being automatic, you can see exactly how much partial progress you’ve made on extraction, which makes it easier to prioritize clues and check if an extraction looks promising.

    Also, being good at spreadsheet formulas is by far the most transferable skill to real life. The business world runs on Excel. The very basic actions you’ll do over and over in puzzlehunts:

    • =MID(A1, k, 1) takes the k-th letter of A1.
    • =REGEXREPLACE(A1, "[^A-Za-z]", "") removes all non A-Z characters from A1.
      • If you only want to ignore spaces to worry about, =REGEXREPLACE(A1, " ", "") or =SUBSTITUTE(A1, " ", "") may be easier to remember. You usually do not need the full power of regular expressions.
    • =LEN(A1) gives the length of the word in A1.
    • =CHAR(A1 + 64) will convert 1 to 26 into A to Z.
      • If you can’t remember 64, you can do =CHAR(A1 - 1 + CODE('A')) instead.
    • =CODE(A1) - 64 will convert A to Z into 1 to 26.
    • =UPPER(A1) will put the contents in uppercase.

    It’s also worth understanding relative references vs absolute references. Consider this example I just made up.

    Sheets example

    The =MID(B2, C2, 1) in cell D2 here is a relative reference. Although the cell displays =MID(B2, C2, 1), what is actually stored internally is =MID(2 cells left, 1 cell left, 1). Dragging a formula will copy-paste that relative offset to each cell, which is usually exactly what we want.

    In some cases, you will want to refer to a fixed position. This is called an absolute reference, and you can do so by prepending $ to the column, row, or both. $B2 is an absolute ref to column B and a relative ref to row 2, whereas B$2 is a relative ref to column B and an absolute ref to row 2.


    Indexing multiple columns of indices: =MID($A1, B1, 1) locks the word to always be from column A, while letting the word change per row.

    Indexing a single word many times: =MID($A$1, B1, 1) locks the indexed word to be a specific cell.

    As I’ve become more fluent in spreadsheets, I’ve started using them in more complex ways. For example, if the clues for a puzzle include enumeration, I’ll quickly add =LEN(REGEXREPLACE(A1, "[^A-Za-z]", "")) next to each answer column, to make it easier to verify our words are matching the given lengths. If you find you use a formula often, you can use named functions to save them and import them elsewhere. (Although I confess I always forget to set this up before a hunt, so I don’t use them very often.)

    The spreadsheet rabbit hole goes very deep. I recommend You Suck at Excel by Joel Spolsky as a classic spreadsheet introduction. (You Suck at Excel recently disappeared off YouTube - the link above is a bilibili mirror.) For more puzzlehunt-focused guides, see Yet Another Puzzlehunt Spreadsheet Tutorial by betaveros and Google Sheets Puzzle Tricks by Jonah Ostroff.

    Use code for more complicated extractions or searches

    (If you can’t program, ignore this.)

    Sometimes, you hit the limits of what public tools can do. In these scenarios, it can be pretty helpful to write code trying more complex extractions or searches. I recommend downloading a wordlist (Scrabble dictionary is a good target), and writing some basic functions for A1-Z26, morse, and so on. Any time you write one-off code, check if you think it’ll be useful for a future hunt, and save it somewhere. A good starting place is solvertools for generic operations and grilops for logic puzzles.

    The power of having basic encodings implemented in code is that it makes it possible to write brute force searches. Once, I was solving a metapuzzle from a now-offline hunt. Based on flavor, we were very confident it was going to be about Morse code, each of our feeder answers would convert to a dot or dash, and we’d read out a message. We couldn’t figure out how to do the conversion, but the round only had 7 puzzles, so I decided to write a script to brute force the Morse for all \(2^7 = 128\) possibilities. This worked. I did something similar for the Silph Puzzle Hunt metameta. We solved the first and last group, but couldn’t figure out the answer to the (4 7) group in the middle. Doing a word search in my phrase list, I found there were around 2500 reasonable phrases of enumeration (4 7) in it, so out of despair I decided to brute force a list of all possible extractions and scrolled through it until I noticed something that looked like the answer.

    One advantage of code is that if it exhaustively checks all possibilities, and it still fails to extract the puzzle, then you know you should try something different from the extraction you attempted, assuming you trust your code and the inputs you’re giving it. (Once again, this has burned me before, when I concluded an extraction method was wrong because the answer didn’t appear in my local wordlist. Proceed with caution!)

    When picking what to work on, check what’s underinvested

    Most people solve puzzlehunts in a team, and it’s usually not good for everyone on the team to pile onto one puzzle. Instead it’s better for people to spread out. When looking for a puzzle to work on, it can be helpful to check which ones have enough people to push it to the finish without you, and which ones need assistance.

    In general, puzzles with big lists of clues (e.g. crosswords) can absorb lots of people, and puzzles with many serial a-has (e.g. logic puzzles) see diminishing returns. It’s not always clear if a puzzle is more serial or parallel. One quick hack is to just ask the people working on the puzzle if they want help or think they have it under control.

    In especially big hunts (i.e. Mystery Hunt), you can also strategize at the round level, picking rounds based on how many people are working in that round and how many solves they’ve gotten so far.

    Getting Unstuck

    Everyone gets stuck on puzzles. The question is what to do about it.

    Check your work

    I have seen so, so many times where a puzzle was stuck because of a silly mistake. Check your work. Cannot overstate this enough. I know one friend who’s joked that their puzzle solving tips guide would just be “check your work” repeated 50 times.

    Do a different puzzle

    There’s no shame in abandoning a puzzle for now and coming back to it later. It’s very easy to tunnel vision too hard on a problem.

    However, there is a certain art to this. Puzzles are normally done in teams. If you abandon a puzzle, it’s good to make a clean copy of your sheet first. Keep your scratch work as is, but in the clean copy, organize your work and explain what you’ve found. Importantly, remove all the speculative half-baked ideas you had. Given that you’re stuck on the puzzle, it’s more likely your half-baked ideas are wrong, and you should avoid biasing future solvers into the same rabbit hole.

    Look for unused information

    Most puzzles will try to use all channels of information they can, and will not have extraneous info. That doesn’t mean every puzzle will use 100% of its info, but extra info is the first place to look. Try listing everything that has and hasn’t been used yet.

    Look for missing information

    To me, this is slightly distinct from unused information. Unused information is when you know there are aspects of the puzzle dataset you haven’t used yet. Missing information is when the information you need doesn’t even exist in your spreadsheet, and needs an a-ha to figure out.

    The longer a spreadsheet stays unextracted, the more likely it is that the spreadsheet is fundamentally missing the information needed to extract. I’m a big fan of qhex’s extraction basher, which will try a wide battery of indexing and ordering mechanisms. It doesn’t actually work that often in my experience, but when I see it fail, it does encourage me to consider if there’s a way to derive another column of data we could be extracting from.

    Assume a few errors

    To err is human. If you don’t like a few letters, you can always pretend those were solved wrong and switch them to wildcards in nutrimatic. I’ve also been pointed to, which I haven’t used before but supports “probably correct” letters.

    It’s important not to overdo this, but getting good at error correction can really take you quite far. This extends to other forms of errors as well. For example, sometimes I’ll assume our indices were derived incorrectly, and try Caesar shifting in case they’re all off by one.


    A game designer painstakingly carves a beautiful sculpture out of wood, first chiselling it out of a raw block, and then gradually rounding off any rough edges, making sure it works when it’s viewed from any angle.

    The speedrunner takes that sculpture and they look it over carefully, from top to bottom, from every angle, and deeply understand it. They appreciate all the work that went into the design, all the strengths or the weak points, and then, having understood it perfectly, they break it over their knee.

    Getting Over it Developer Reacts to 1 Minute 24 Second Speedrun

    For the uninitiated, cheesing a puzzle means to solve it through an unintended path. It comes from video game slang, and usually implies solving with less work than intended.

    Cheesing can be a little controversial…it’s a bit subversive, and as tools for cheesing have gotten stronger, puzzle design has had to adapt in ways that sometimes makes a puzzle worse. Hardening a puzzle against nutrimatic sometimes makes it less friendly to new solvers who don’t know how to exploit nutrimatic.

    I view cheesing the same as backsolving - it’s a ton of fun if it works, but it wouldn’t be nearly so fun if it worked all the time.

    Here are common cheese tactics:

    • If you don’t know how to order, you can try random anagramming.
    • If you have an ordered list of words, but don’t know the indices, you can construct a regular expression to take one letter from each word and see what possibilities show up in nutrimatic. For example, this works on the list of words in the spreadsheet video earlier up.

      Nutrimatic results for extracting from WOW, WHAT, EXCITING, WORDS, VERY, and WONDERFUL

      Another puzzle where I remember it working was Hackin’ the Beanstalk from MIT Mystery Hunt 2020.

    • If you have an ordered list of indices, know what words you’re indexing, but don’t know how they match up, you can also construct a regex. If the index is (3), and we know the word we’d index is one of CAT, DOG, or HORSE, then we know that letter can only be [tgr]. This was how our team solved the Flexibility meta from Shardhunt. We understood the last section was forming a path going back and forth between six 6-letter words, indexing the face seen at each step of the path, but we couldn’t figure out the interpretation. So we tried a cheese in hopes we could skip that step, and it worked. (Albeit with having to look at page 2 of nutrimatic results.)

      Shardhunt sheet

      In testsolving, I solved Mouth Mash from Teammate Hunt 2020 this way as well, and the puzzle was redesigned to make this cheese less effective.

    • If enumerations are given for a multi-word phrase, and the phrase is long enough, sometimes the enumeration itself is enough to constrain the phrase. You can use OneLook or nutrimatic to check for this. For example, (2 2 2 3 2 2 4 2 3 8) only has one notable match. This cheese is usually better done in OneLook, since the phrases in its dictionary are all only “real” phrases that could be puzzle answers.

    To draw an analogy to engineering solutions to high school math contest problems, cheesing is not a replacement to learning how to solve puzzles, but it can be very effective. If you cheese a puzzle, you should go read how it was supposed to work after solutions are released, or try to reverse engineer the extraction from the puzzle answer.

    You should also make sure other solvers are okay with you cheesing a puzzle before trying to do so. Cheesing is a way to skip ahead of the struggle, but sometimes people in the struggle are having lots of fun.

    Get in the constructor’s mindset

    If you get really stuck, it can help to ask why a puzzle was constructed the way it is. Why was this information provided to you? Why is this clue phrased the way it is?

    A very recent top-of-mind example is Goodreads from Brown Puzzlehunt 2024. There is a point in the puzzle where you extract a bunch of numeric library classifications (i.e. Language = 400 in Dewey Decimal). We were confused on whether to extract using “Language” or “400”. One clue comes with a note saying “in base 63”, so the eventual argument made was:

    Saying “base 63” is so random. This has to only exist because they couldn’t make extraction work with standard library classes. So we should leave the class as the number, since there’s no great way to interpret base 63 otherwise.

    In general, weird, strange, or obscure clues tend to indicate that there’s a very specific reason that clue had to be weird, strange, or obscure, and you should consider what that reason is. For example, I once wrote a puzzle where I needed to clue EOGANACHTA, which was a travesty, but it was the most reasonable option that fit the letter constraints I needed. A sharp-eyed solver that got that clue should be immediately suspicious that future extraction will be based on the letters used rather than its meaning.

    To give a non-puzzle example, in the board game Codenames, you are trying to clue words for your team and not the opposing team. If I get a clue like “TOMATO: 2”, and see RED and FRUIT on the board, I’ll mention RED and FRUIT fits, but I’ll also start wondering why we got “TOMATO” as a clue instead of a more typical red fruit like APPLE, CHERRY, or STRAWBERRY. Perhaps a word like PIE is on the board, and they were trying to avoid it.

    How much getting into the constructor’s mindset helps you will depend on how well you understand typical puzzle design. This is one reason people who write puzzles tend to get better at solving them. Like, maybe the reason ✈✈✈Galactic Trendsetters✈✈✈ is so strong is because parts of their team have been writing hunts every year for 7 years.


    Let’s say we’ve solved a meta, or understand the constraints it will place on our answers, and want to backsolve a puzzle. How do we do so?

    Verify it’s possible

    Very often, the way a meta uses its answer will lead to it not backsolving very easily. If it only uses 1 letter from the feeder answer, you basically have no hope and should just move on.

    Read through the progress on each unsolved puzzle

    In my experience, it is actually pretty rare that you can backsolve a puzzle entirely from the meta constraints. The more common strategy is that forward solving the puzzle gives some information about the answer, the meta provides other information, and only the combination of both is enough to constrain the backsolve. In the Department Store Puzzlehunt from Puzzlehunt CMU, we only backsolved Karkar’s Very Fun Crossword because we had forward solved enough to know we’d like the answer to be a compound word. In MIT Mystery Hunt 2019, teammate backsolved 2 puzzles from Thanksgiving Town/​President’s Day Town by noticing the themes of those puzzles matched incredibly well with some candidate answers.

    Before attempting a backsolve, read through each puzzle to refresh yourself on the theme of the puzzle, or see if there’s any hints about the length of that puzzle’s answer, like the number of clues or given blanks. If you don’t know the length of the answer, one trick in nutrimatic is that it supports & for “match both sides”, and A{a,b} will match exactly \(a\) to \(b\) letters (i.e. A{3,6} = 3 to 6 letters). So you can do searches like AAAxA*&A{7,10}, which means “an answer whose 4th letter is X and is 7 to 10 letters long”. This comes up in forward solving, but is more useful in backsolving.

    Bringing This Together

    To showcase these strategies together, here is a puzzle I remember speedrunning especially quickly: The Three Little Pigs from Hunt 20 2.1 Puzzle Hunt. This hunt was designed to be on the easier side, so this made it more susceptible to speedrunning. I’ve reproduced the puzzle content below.

    The Three Little Pigs

    It's all about 3

    This puzzle uses cryptic clues. If you are new to cryptic clues, a guide such as this may help.

    Nice hug in deity (4)
    Break small round pan (4)
    Shatter quick for pancakes? (9)
    Head of public relations takes primate document (5)
    Little matter from master weight (6)
    Among a hubbub blessing, spheres (7)
    Plain vehicle turns everything and one (7)
    Southern team leader's uncooked fodder (5)
    Cease the odd sets of pi (4)

    Bachelor's around Astley blocks (6)
    Cisgender in Social Security, or small tools (8)
    Diamond inside a meal (6)
    Flower mob coming back around failure (7)
    Particle misuses one turn (7)
    Soda bubble result (3)
    Sweet delayed after cold homecoming (9)
    Turn an official list of bread (4)
    Unpleasant drug lyrics within (4)

    Here’s how my team did that hunt, as reconstructed from Google Sheets history, and annotated with the strategies mentioned earlier.

    There are 9 clues in each half (counting).

    The puzzle is very strongly hinting 3, so my guess is that either we will form 3 groups of 3 from each column, or we’ll pair the columns and use 3 some other way (looking ahead).

    Clues in the right column are ordered alphabetically, clues in the left column are not (looking for confirmation). That suggests ordering by the left column. If only one column is ordered, that also suggests pairing between columns, because (constructor mindset) it wouldn’t make sense to change the ordering between columns if each column was used identically.

    Since it seems likely we’ll do pairing, let’s solve a few from both columns, since having progress on both will make it easier to find a pair (prioritize important clues).

    Nice hug in deity (4)
    Break small round pan (4) = SNAP
    Shatter quick for pancakes? (9)
    Head of public relations takes primate document (5)
    Little matter from master weight (6)
    Among a hubbub blessing, spheres (7)
    Plain vehicle turns everything and one (7)
    Southern team leader's uncooked fodder (5)
    Cease the odd sets of pi (4) = STOP

    Bachelor's around Astley blocks (6)
    Cisgender in Social Security, or small tools (8)
    Diamond inside a meal (6)
    Flower mob coming back around failure (7)
    Particle misuses one turn (7)
    Soda bubble result (3) = POP
    Sweet delayed after cold homecoming (9)
    Turn an official list of bread (4)
    Unpleasant drug lyrics within (4)

    Hey, SNAP and POP could form a group with CRACKLE. Perhaps this is how we use 3 - we get clues for two parts of a set of 3. STOP could be the start of STOP DROP ROLL, so let’s see if we can find DROP or ROLL in the right column, and otherwise focus on solving the left column since that provides ordering (prioritize important clues).

    Nice hug in deity (4)
    Break small round pan (4) = SNAP
    Shatter quick for pancakes? (9) = BREAKFAST
    Head of public relations takes primate document (5)
    Little matter from master weight (6)
    Among a hubbub blessing, spheres (7)
    Plain vehicle turns everything and one (7)
    Southern team leader's uncooked fodder (5)
    Cease the odd sets of pi (4) = STOP

    Bachelor's around Astley blocks (6) = BRICKS
    Cisgender in Social Security, or small tools (8)
    Diamond inside a meal (6)
    Flower mob coming back around failure (7)
    Particle misuses one turn (7)
    Soda bubble result (3) = POP
    Sweet delayed after cold homecoming (9)
    Turn an official list of bread (4) = ROLL?
    Unpleasant drug lyrics within (4)

    With BREAKFAST on the left, we should try to find LUNCH or DINNER, and with BRICKS on the right, we should try to find STRAW or STICKS. At this point, we can provisionally pair those target words to cryptics by just looking for a definition that matches (cheese). In the real solve, we did do the wordplay, but only after we knew what word we wanted it to resolve to.

    Nice hug in deity (4)
    Break small round pan (4) = SNAP
    Shatter quick for pancakes? (9) = BREAKFAST
    Head of public relations takes primate document (5)
    Little matter from master weight (6)
    Among a hubbub blessing, spheres (7)
    Plain vehicle turns everything and one (7)
    Southern team leader's uncooked fodder (5) = STRAW
    Cease the odd sets of pi (4) = STOP

    Bachelor's around Astley blocks (6) = BRICKS
    Cisgender in Social Security, or small tools (8)
    Diamond inside a meal (6) = DINNER?
    Flower mob coming back around failure (7)
    Particle misuses one turn (7)
    Soda bubble result (3) = POP
    Sweet delayed after cold homecoming (9)
    Turn an official list of bread (4) = ROLL?
    Unpleasant drug lyrics within (4)

    Let’s try extracting from the missing words for each group of 3 (looking ahead). So far, we have:


    Having 4/9 is enough to try nutrimatic. First instinct is to read first letters, but ending in SD seems bad. If I were making this puzzle (constructor mindset), I’d want to put the theme of 3 everywhere, so let’s try indexing the 3rd letter.

    This gives, and if we scroll down the list of nutrimatic results a bit, we see DANCE TRIO, which was the answer.

    During the hunt, some teams solved this puzzle in 5 minutes. My team was not that fast (13 minutes), but you can see how applying a few tricks let us focus directly on the solve path and reduced unnecessary effort. This solve was a very extreme case, and the tricks I’ve described are usually not this effective, but try them out sometime. I expect them to be of use.

    Thanks to Eugene C., Evan Chen, Nicholai Dimov, Jacqui Fashimpaur, Cameron Montag, Nishant Pappireddi, Olga Vinogradova, and Patrick Xia for giving feedback on earlier drafts of this post.

  • Solving Crew Battle Strategy With Math

    In Super Smash Bros tournaments, there’s occasionally an event called a crew battle. (They show up in other fighting games too, but I mostly watch Smash.) Two teams of players compete in a series of 1v1 matches. For the first match, each team picks a player simultaneously. They fight, and the loser is eliminated, while the winner stays in. The losing team then picks a player to go in. Elimination matches repeat until one team is out of players.

    These events are always pretty hype. People like team sports! They’re also often structured in a regional way (i.e. US vs Japan, West Coast vs East Coast), which can emphasize and play up regional rivalries. The strategy for deciding who to send in can be complicated, and although it’s usually done by intuition, I’ve always wondered what optimal strategy would be.

    The reason it’s complicated is that fighting game character matchups aren’t perfectly balanced. Some characters counter other characters. Sometimes players are unusually good at a specific matchup - a Fox v Fox matchup is theoretically 50-50 but some players have earned a reputation for being really good at Fox dittos. And how about generic player strength? Typical rule of thumb is to keep your strongest player (the anchor) for last, because the last player faces the most psychological pressure. But if we ignored those factors, is that actually correct? When do you send in your 2nd strongest player, or weakest player? What is the optimal crew battle strategy?


    Let’s formalize the problem a bit. For the sake of simplicity, I will ignore that Smash games have stocks, and assume each match is a total win or loss. I expect the conclusions to be similar anyways.

    Let’s call the teams \(A\) and \(B\). There are \(n\) players on each team, denoted as \(\{a_1, a_2, \cdots, a_n\}\) and \(\{b_1, b_2, \cdots, b_n\}\). Each player has a certain chance of beating each other player which can be described as an \(n \times n\) matrix of probabilities, where row \(i\) column \(j\) is the probability \(a_i\) beats \(b_j\). We’ll denote that as \(\Pr(a_i > b_j)\). This matrix doesn’t need to be symmetric, or have its rows or columns sum to 1.

    \[\begin{bmatrix} \Pr(a_1 > b_1) & \Pr(a_1 > b_2) & \cdots & \Pr(a_1 > b_n) \\ \Pr(a_2 > b_1) & \Pr(a_2 > b_2) & \cdots & \Pr(a_2 > b_n) \\ \vdots & \vdots & \ddots & \vdots \\ \Pr(a_n > b_1) & \Pr(a_n > b_2) & \cdots & \Pr(a_n > b_n) \end{bmatrix}\]

    We’ll call this the matchup matrix. Each team knows the matchup matrix, and has a goal of maximizing their team’s win probability. To make this concrete, we could take the example of a rock-paper-scissors crew battle. Each team has 3 players: a rock player, a paper player, and a scissors player. That would give the following matchup matrix.

    \[\begin{array}{c|ccc} & \textbf{rock} & \textbf{paper} & \textbf{scissors} \\ \hline \textbf{rock} & 0.5 & 0 & 1 \\ \textbf{paper} & 1 & 0.5 & 0 \\ \textbf{scissors} & 0 & 1 & 0.5 \end{array}\]

    Normally, in RPS you’d play again on a tie. Since there are no ties in crew battles, we’ll instead say that if both players match, we pick a winner randomly. Here’s an example game:

    A picks rock, B picks scissors
    A rock beats B scissors
    B sends in paper
    A rock loses to B paper
    A sends in scissors
    A scissors beats B paper
    B sends in rock
    A scissors loses to B rock
    A sends in paper
    A paper beats B rock
    B is out of players - A wins.

    This is a pretty silly crew battle because of the lack of drama, but we’ll come back to this example later.

    First off: given a generic matchup matrix, should we expect there to be an efficient algorithm that solves the crew battle?

    My suspicion is, probably not. Finding the optimal strategy is at some level similar to picking the optimal order of the \(N\) players on each team, which immediately sets off travelling salesman alarm bells in my head. Solving crew battles seems like a strictly harder version of solving regular matrix payoff games, where you only have 1 round of play, and a quick search I did indicates those are already suspected to be hard to solve generally. (See Daskalakis, Goldberg, Papadimitriou 2009 if curious.)

    Still, even though there isn’t an efficient algorithm, there definitely is an algorithm. Let’s define a function \(f\), where \(f(p, team, S_A, S_B)\) is the probability team \(A\) wins if

    • The player who just won is \(p\).
    • The team deciding to who to send in is \(team\) (the team that \(p\) is not on).
    • \(S_A\) is the set of players left on team A, ignoring player \(p\).
    • \(S_B\) is the set of players left on team B, ignoring player \(p\).

    Such an \(f\) can be defined recursively. Here are the base cases.

    • If \(team = A\) and \(S_A\) is empty, then \(f(p, team, S_A, S_B) = 0\), since A has lost due to having 0 players left.
    • If \(team = B\) and \(S_B\) is empty, then \(f(p, team, S_A, S_B) = 1\), since A has won due to B having 0 players left.

    (One clarification: the crew battle is not necessarily over if \(S_A\) or \(S_B\) is empty. When a team is on their last player, they will have 0 players left to send in, but their last player could still beat the entire other team if they play well enough.)

    Here are the recursive cases.

    \[\begin{align*} f(a_i, B, S_A, S_B) = \min_{j \in S_B} & \Pr(a_i > b_j) f(a_i, B, S_A, S_B - \{b_j\}) \\ & + (1-\Pr(a_i > b_j)) f(b_j, A, S_A, S_B - \{b_j\}) \end{align*}\]

    (It is Team B’s turn. They want to play the player \(b_j\) that minimizes the probability that team A wins. The current player will either stay as \(a_i\) or change to \(b_j\).)

    \[\begin{align*} f(b_j, A, S_A, S_B) = \max_{i \in S_A} & \Pr(a_i > b_j) f(a_i, B, S_A - \{a_i\}, S_B) \\ & + (1-\Pr(a_i > b_j)) f(b_j, A, S_A - \{a_i\}, S_B) \end{align*}\]

    (The reverse, where it’s team A’s turn and they want to maximize the probability A wins.)

    Computing this \(f\) can be done with dynamic programming, in \(O(n^22^{2n})\) time. That’s not going to work at big scales, but I only want to study teams of like, 3-5 players, so this is totally doable.

    One neat thing about crew battles is that after the first match, it turns into a turn-based perfect information game. Sure, the outcome of each match is random, but once it’s your turn, the opposing team is locked into their player. That means we don’t have to consider strategies that randomly pick among the remaining players - there will be exactly one reply that’s best. And that means the probability of winning the crew battle assuming optimal play is locked in as soon as the first match is known.

    This means we can reduce all the crew battle outcomes down to a single \(n \times n\) matrix, which I’ll call the crew battle matrix. Let \(C\) be that matrix. Each entry \(C_{ij}\) in the crew battle matrix is the probability that team A wins the crew battle, if in the very first match A sends in \(a_i\) and B send in \(b_j\).

    \[\begin{align*} C_{ij} = &\Pr(a_i > b_j) f(a_i, B, S_A - \{a_i\}, S_B - \{b_j\}) \\ &+ (1-\Pr(a_i, b_j)) f(b_j, A, S_A - \{a_i\}, S_B - \{b_j\}) \end{align*}\] \[C = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}\]

    This turns our original more complicated problem into exactly a 1 round matrix game like prisoner’s dilemma or stag hunt…assuming we have \(f\). But computing \(f\) by hand is really annoying, and likely does not have a closed form. There’s no way to get \(f\) computed for arbitrary crew battles unless we use code.

    So I Wrote Some Python Code to Compute \(f\) for Arbitrary Crew Battles

    Let’s consider the RPS crew battle again. Here’s the matchup matrix:

    \[matchup: \begin{array}{c|ccc} & \textbf{rock} & \textbf{paper} & \textbf{scissors} \\ \hline \textbf{rock} & 0.5 & 0 & 1 \\ \textbf{paper} & 1 & 0.5 & 0 \\ \textbf{scissors} & 0 & 1 & 0.5 \end{array}\]

    and here’s what my code outputs for the crew battle matrix, assuming optimal play.

    \[crew battle: \begin{array}{c|ccc} & \textbf{rock} & \textbf{paper} & \textbf{scissors} \\ \hline \textbf{rock} & 0.5 & 0 & 1 \\ \textbf{paper} & 1 & 0.5 & 0 \\ \textbf{scissors} & 0 & 1 & 0.5 \end{array}\]

    No, that’s not a typo. The two are identical! Let’s consider a crew battle of “noisy RPS”, where rock is only favored to beat scissors, rather than 100% to win, and so on.

    \[matchup: \begin{array}{c|ccc} & \textbf{rock} & \textbf{paper} & \textbf{scissors} \\ \hline \textbf{rock} & 0.5 & 0.3 & 0.7 \\ \textbf{paper} & 0.7 & 0.5 & 0.3 \\ \textbf{scissors} & 0.3 & 0.7 & 0.5 \end{array}\] \[crew battle: \begin{array}{c|ccc} & \textbf{rock} & \textbf{paper} & \textbf{scissors} \\ \hline \textbf{rock} & 0.500 & 0.399 & 0.601 \\ \textbf{paper} & 0.601 & 0.500 & 0.399 \\ \textbf{scissors} & 0.399 & 0.601 & 0.500 \\ \end{array}\]

    Intuitively, I feel it makes sense that this pulls towards the original RPS matrix - in some sense, the crew battle is like playing 3 rounds of RPS instead of 1, just with more complicated restrictions.

    Here’s the log of a sample game.

    Game start, A = rock B = scissors
    Win prob for A is 0.601
    A = rock beats B = scissors
    If B sends in paper: A wins 0.729
    If B sends in rock: A wins 0.776
    B sends in paper
    A = rock beats B = paper
    If B sends in rock: A wins 0.895
    B sends in rock
    A = rock beats B = rock
    B has no more players
    A wins

    Bad beat for team B here, getting entirely swept by rock.

    For good measure, here’s a random matchup matrix and the corresponding game matrix.

    \[matchup: \begin{array}{c|ccc} & \textbf{b}_1 & \textbf{b}_2 & \textbf{b}_3 \\ \hline \textbf{a}_1 & 0.733 & 0.666 & 0.751 \\ \textbf{a}_2 & 0.946 & 0.325 & 0.076 \\ \textbf{a}_3 & 0.886 & 0.903 & 0.089 \\ \end{array}\] \[crew battle: \begin{array}{c|ccc} & \textbf{b}_1 & \textbf{b}_2 & \textbf{b}_3 \\ \hline \textbf{a}_1 & 0.449 & 0.459 & 0.757 \\ \textbf{a}_2 & 0.748 & 0.631 & 0.722 \\ \textbf{a}_3 & 0.610 & 0.746 & 0.535 \\ \end{array}\]

    Alright. Let’s try something new. What if all player matchups are transitive? Suppose each player has a certain power level \(p\), and if players of power levels \(p\) and \(q\) fight, the power level \(p\) player wins \(p/(p+q)\) of the time. Below is a random matchup matrix following this rule. I’ve sorted the players such that \(a_1\) and \(b_1\) are weakest, while \(a_3\) and \(b_3\) are strongest. Since the matchup matrix is the win probability of A, the values go down when reading across a row (fighting better B players), and up when going down a column (fighting better A players)

    \[matchup: \begin{array}{c|ccc} & \textbf{b}_1 & \textbf{b}_2 & \textbf{b}_3 \\ \hline \textbf{a}_1 & 0.289 & 0.199 & 0.142 \\ \textbf{a}_2 & 0.585 & 0.462 & 0.365 \\ \textbf{a}_3 & 0.683 & 0.568 & 0.468 \\ \end{array}\] \[crew battle: \begin{array}{c|ccc} & \textbf{b}_1 & \textbf{b}_2 & \textbf{b}_3 \\ \hline \textbf{a}_1 & 0.385 & 0.385 & 0.385 \\ \textbf{a}_2 & 0.385 & 0.385 & 0.385 \\ \textbf{a}_3 & 0.385 & 0.385 & 0.385 \\ \end{array}\]

    …Huh. This suggests it doesn’t matter who each team sends out first. The probability of winning the crew battle is the same. Trying this with several random 3x3s reveals the same pattern.

    Maybe this is just because 3-player crew battles are weird? Let’s try this on a 5-player crew battle.

    \[matchup: \begin{array}{c|ccccc} & \textbf{b}_1 & \textbf{b}_2 & \textbf{b}_3 & \textbf{b}_4 & \textbf{b}_5 \\ \hline \textbf{a}_1 & 0.715 & 0.706 & 0.564 & 0.491 & 0.340 \\ \textbf{a}_2 & 0.732 & 0.723 & 0.584 & 0.511 & 0.358 \\ \textbf{a}_3 & 0.790 & 0.782 & 0.659 & 0.591 & 0.435 \\ \textbf{a}_4 & 0.801 & 0.794 & 0.675 & 0.608 & 0.453 \\ \textbf{a}_5 & 0.807 & 0.800 & 0.682 & 0.616 & 0.461 \\ \end{array}\] \[crew battle: \begin{array}{c|ccccc} & \textbf{b}_1 & \textbf{b}_2 & \textbf{b}_3 & \textbf{b}_4 & \textbf{b}_5 \\ \hline \textbf{a}_1 & 0.731 & 0.731 & 0.731 & 0.731 & 0.731 \\ \textbf{a}_2 & 0.731 & 0.731 & 0.731 & 0.731 & 0.731 \\ \textbf{a}_3 & 0.731 & 0.731 & 0.731 & 0.731 & 0.731 \\ \textbf{a}_4 & 0.731 & 0.731 & 0.731 & 0.731 & 0.731 \\ \textbf{a}_5 & 0.731 & 0.731 & 0.731 & 0.731 & 0.731 \\ \end{array}\]

    Same thing occurs! Let’s play a sample game. Even if the win probability is the same no matter who starts, surely the choice of who to send out in the future matches matter.

    Game start, A = 1 B = 2
    Win prob for A is 0.731
    A = 1 beats B = 2
    If B sends in 0: A wins 0.804
    If B sends in 1: A wins 0.804
    If B sends in 3: A wins 0.804
    If B sends in 4: A wins 0.804
    B sends in 0
    A = 1 beats B = 0
    If B sends in 1: A wins 0.836
    If B sends in 3: A wins 0.836
    If B sends in 4: A wins 0.836
    B sends in 1
    A = 1 loses B = 1
    If A sends in 0: A wins 0.759
    If A sends in 2: A wins 0.759
    If A sends in 3: A wins 0.759
    If A sends in 4: A wins 0.759
    A sends in 0
    A = 0 beats B = 1
    If B sends in 4: A wins 0.800
    If B sends in 3: A wins 0.800
    B sends in 4
    A = 0 beats B = 4
    If B sends in 3: A wins 0.969
    B sends in 3
    A = 0 beats B = 3
    B has no more players
    A wins

    The probability of A winning the crew battles shifts as matches are decided in favor of A or B, but the win probability does not change for any of the choices.

    I found this pretty surprising! Going into this I assumed there would be some rule of thumb tied to difference in skill level. I certainly didn’t expect it to not matter at all. This result does depend on how we’ve modeled skill, that player with skill \(p\) beats player with skill \(q\) with probability \(p/(p+q)\). However, this model is very common. It’s called the Bradley-Terry model and is the basis of Elo ratings. (And the basis for tuning AI chatbot responses based on human feedback.)

    Let’s write up this observation more formally.

    Conjecture: Suppose you have an \(N\) player crew battle, with players \(A = \{a_1, a_2, \cdots, a_N\}\) and \(B = \{b_1, b_2, \cdots, b_N\}\). Further suppose each player has a non-negative skill level \(p\), and every matchup follows \(\Pr(a_i > b_j) = p_{a_i} / (p_{a_i} + p_{b_j})\). Then the probability \(A\) wins the crew battle does not depend on strategy.

    I haven’t proved this myself, but it feels very provable. Consider it an exercise to prove it yourself.

    So…What Does This Mean?

    Assuming this conjecture is true, I believe it means you should entirely ignore player strength when picking players, and should only focus on factors that aren’t tied to skill, like character matchups. In Super Smash Bros. Melee, Fox is favored against Puff, so if you’re playing against Hungrybox (the best Puff player of all time), this suggests you should send in Generic Netplay Fox even though they’ll most likely get wrecked. And conversely, it suggests a crew should avoid sending Hungrybox in if Generic Netplay Fox is in the ring, even though Hungrybox would probably win.

    This is a pretty extreme conclusion, and I wouldn’t blindly follow it. Psychologically, it’s good for team morale if everyone can contribute, and sending people into matchups with a big mismatch in skill hurts that. But also, this suggests that crew battle strategy really isn’t that important in the first place! At the top level, it’s rare to see very lopsided character matchups. Pro players tend to pick strong characters that have good matchups against most of the field. Given that the lopsided matchups are the only ways to get edges via strategy, the math suggests that team captains actually don’t have much leverage over the outcome of the crew battle.

    If your crew loses, you don’t get to blame bad crew battle strategy. Your crew is just worse. Deal with it, take the L.

  • MIT Mystery Hunt 2024

    This has spoilers for MIT Mystery Hunt 2024. Spoilers are not labeled. I will add puzzle links once there’s a stable link to them.

    I hunted with teammate again this year, because there is nothing quite like writing a Mystery Hunt to forge friends through fire.


    This year, I got to Boston much earlier than usual. This was in-part because the company I work for does limited vacation, which I’m bad at using. I needed to spend some to avoid hitting the vacation cap, and what better time than Mystery Hunt?

    This made my Hunt much more relaxed than usual, since I got a few days to adjust to East Coast time, and was able to schedule visits to Level99 and Boxaroo before Hunt. We only went to Level99 because of Dan Katz’s post on the subject. Let it be known that we had fun, puzzle blog recommendations are good, and I would recommend it too. Now that I know what the challenge rooms are, the Puzzlvaria post reads so differently. (My group also took a hint on Pirates Brig, with the same “yes, really do what you think you should do” reaction. However we figured out a way to three-star the room without much athleticism.)

    For Boxaroo, we did a friendly competition with another group from teammate. We’d each do two escape rooms, fastest combined time wins. The Boxaroo organizers knew we were doing this, so they:

    • Invited mutual friends spoiled on the rooms to spectate and heckle our attempts.
    • Told us we were “3 minutes slower than the other group” instead of our actual time.

    When we compared notes afterwards, we lost 😔. It was very close though, with our total time only 30 seconds slower. I guess that’s like getting 2nd at Mystery Hunt by 10 minutes.

    We also got shown a backstage tour of the room, due to finishing early. Some non-spoilery notes are that the room had dynamic extra puzzles that trigger if the group is solving it quickly, and the room has a “wedding proposal mode”.

    Finally, we did Puzzled Pint, except, being silly people, we decided to make it more interesting by doing it “all brain”. No writing implements allowed, and each puzzle must be solved before looking at the next one. With 8 people, this seemed doable, but then the first puzzle was a nonogram, prompting a “OH NO IT’S SO OVER”. We didn’t solve the nonogram, but we did solve the puzzle, and eventually the entire set. Here’s the puzzle from “Animal Casino” if you want to attempt the same challenge.

    A puzzle from Puzzled Pint

    Big Picture Thoughts

    Hunt went long again this year, although this time it was more because of puzzle count than puzzle difficulty. If you were forced to pick how a Mystery Hunt runs long, I think most people would pick the “too many puzzles” side of Mystery Hunt 2024 over the “too difficult puzzles” side of Mystery Hunt 2023.

    Still, my preferred Hunt ending time is Sunday morning. On Saturday, TTBNL told our team captain that Hunt was projected to end Sunday evening, and while this was great for planning sleep, it did make me a bit worried we wouldn’t finish. After no “coin has been found” email came by Sunday 10 PM, I was extra concerned. I ended up pulling an all-nighter to try to push towards a finish, which we missed by two metas, Sedona and Nashville.

    Running a Hunt with 237 puzzles is just…an insane number of puzzles. It is not necessarily a problem. One minor complaint I have about Hunt discourse is when people say “puzzle count” when they really mean “length of hunt” or “difficulty of hunt”. It’s perfectly possible to write a Hunt with 237 puzzles that ends on Sunday if the difficulty is tuned correctly. A quick estimate: in 2022, Death & Mayhem was the first team to finish the Ministry, at Friday 18:47 EST, or 5.5 hours from puzzle release. The Investigation and the Ministry is approximately 40 puzzles. A very naive linear extrapolation of (237 puzzles / 40 puzzles * 5.5 hours) gives a 33 hour finish time of Sunday 1 AM. You could target that difficulty, overshoot a bit, and still end with a reasonable end time.

    The issue is more that you are really creating a harder problem for yourself than you need to. Puzzle creation time isn’t linear to difficulty, and you need a lot of hands on deck if you start with a high puzzle count. TTBNL was a big enough team that I could see it working out, and understand why the team thought it would work out. But in practice, the difficulty trended up higher than the structure allowed. The “fish” puzzles in Hole were a bit harder than I expected “fish” puzzles to be, and the killers in this Hunt were just as hard as killers in other Mystery Hunts I’ve done. I still had a ton of fun, the majority of puzzles I did were clean and had cool ideas, and the fraction of “meh” puzzles was no higher than previous years. There were just a lot of them.

    I really like that TTBNL did in-person interactions for each Overworld meta, to the point that I think we should have done so in 2023 and found a way to handle the logistics hell it would create. And when TTBNL decided to give out free answers on each meta interaction, doing so with a “you need to use it now” caveat was a great way to avoid the “teams stockpile free answers” problem we ran into during 2023. Between the events giving 2 free answers instead of 1, and the gifted free answers, I believe we used around 15 free answers by the end of Hunt. It still does not feel great to free answer your way to metas, but it feels a little better when it’s gradual rather than nuking a round at the end.


    Instead of going straight to the team social, I stopped by the Mathcamp reunion, which I failed to go to last year due to writing Hunt. I was very amused to see one group playing Snatch and another group playing Set, because these are exactly the two games supported by teammate’s Discord bot. I got back to the team social in time to play a custom Only Connect game. The Missing Vowels round included a “famous horses” section, and I got flamed for losing a race to identify PINKIE PIE. I knew the answer. My reaction time is bad. Gimme some slack.

    We very definitely did not want to win this year (it wasn’t even asked on the team survey), so we had a #losecomm this year to figure out the most reasonable way to do so while still having fun.

    The solution they arrived at was that no one was allowed to start or even look at a metapuzzle until all feeders in the round had been forwardsolved. This could be overruled at the discretion of losecomm. For example, if the last feeder was super stuck or grindy, we’d skip it for the sake of fun. We would also avoid using free answers. Otherwise, hints and wild guesses were all fair game.

    This policy is really more restrictive than it sounds, because the meta solvers on teammate are pretty overpowered and solving at 70% of the feeders is often like getting to skip 50% of the work. It also implicitly means no backsolving, because you’ll never be in a position to do so. That further reduces your puzzle width, assuming that the hunt structure awards unlocks to backsolves.

    Kickoff and Tech

    I enjoyed kickoff a lot. The flight safety health & safety video was excellent, and getting Mike Brown (author of “How I Killed Pluto and Why It Had It Coming”) was a nice touch. TTBNL has a number of Caltech people, so it makes sense they could do it, but it was still funny.

    As we walk towards our classrooms, I try to login to the hunt site from my phone, and manage to do so once but see a 500 error on a refresh. That’s not a great sign. Once we get to our rooms, the 500 errors persist, and…now it is time for a tangent.

    The Tech Rabbit Hole

    Early in the handoff between 2023 and 2024, TTBNL decided they wanted to use the 2023 hunt code. We cleaned it up, released it as the next iteration of tph-site, and gave advice during the year on debugging Docker errors, setting up registration, providing examples of interactive puzzles, and so on. As Hunt got closer, these messages got more frequent and shifted to email handling, webserver parameters, and server sizing. Most of our recommendations at this time were “use money to pay your way out of problems, running a server for a weekend is not that expensive if you just want CPUs”, and so they used the same specs we did: a 48-core machine with a ton of RAM and similar webserver settings.

    When tph-site is under load, it’s common for the server to stall for a bit, eventually respond, and recover from there. When the hunt site does not recover, everyone who’s done tech infra for teammate starts suspecting an issue with too many database connections. This has been a persistent problem with tph-site’s usage of websockets via Django Channels, where the codebase is super hungry on database connections when many websockets are open. We’ve never fully resolved this, but intuitively there’s no way a site with a few thousand concurrent users should need 600+ Postgres connections. It just…no, something smells wrong there on the math. This issue has burned us in the past, but we’ve always found a way around it with connection pooling and using bigger servers.

    We’re pretty invested in getting Hunt fixed so that we can do puzzles. A few teammate people drop into the #tech channel of the handoff server to help debug with TTBNL. There isn’t too much we can do, aside from passing along sample queries we used for diagnostics, asking questions about server configuration, and recommending TTBNL remove all non-essential websockets.

    In discussions with the tech team, we find that:

    • TTBNL ran a load test before Hunt. The server worked, with an initial spike of delay that recovered later, similar to tests we saw before.
    • The live server is behaving differently from that load test, becoming unresponsive.
    • The typical locations that should contain error logs contain nothing (???!!!)
    • CPU and RAM usage are high, but not near their limits.
    • Although the site is not responding, TTBNL is able to directly connect to the Postgres database and finds the database connection rate is high, but also not at the maximum set in the config files.

    This makes debugging the issue really hard, since there are no logs, there is no reproduction of the error outside prod, and the server isn’t hitting any obvious bottlenecks. And so the best routes we can recommend to TTBNL are to apply random changes to the database config while we all read webserving documentation to see if there’s something we missed.

    TTBNL will probably go into more detail in their AMA, but my rough understanding is that they figure out the request queue is the reason the server crashes and stops responding. As a temporary measure, they deploy a change that caps the request queue size, causing the server to throw more 500s immediately instead of trying to queue them. (This is later explained to teams as “fixing the server by making it fail faster”.) It still seems likely there’s a resource leak, and the root cause isn’t traced, but the site is now stable enough that it will keep working as long as it’s restarted occasionally, which is good enough to make Hunt go forward.

    xkcd comic about how restarting a server every 24 hours is easier than debugging the root cause

    (From xkcd)

    We got a very brief shout-out at wrap-up for helping fix the server. I don’t think we did much besides moral support.

    I will say that I’m not sure we would have done better in TTBNL’s position. Whatever happened is mysterious enough that we haven’t seen it before. I’ve been poking into tph-site post-hunt, and I still don’t understand why the load test pre-Hunt failed to capture the during-Hunt behavior. Maybe the 15% more Hunt participants this year pushed things over the edge? Maybe a new team’s hunt management software hammered the backend too hard? Maybe MIT Guest Wi-Fi does something weird? I kept seeing “Blocked Page” errors on MIT Guest Wi-Fi when I tried to use Google search on Firefox, unless I used Private Browsing, so they’re definitely doing something different than normal Wi-Fi.

    Whatever the cause, I suspect that this is a problem that is better cut at the source. By now, both GPH 2022 and Mystery Hunt have had server issues tied to websockets via Django Channels. For GPH 2023, Galactic entirely rewrote their backend for GalactiCardCaptors to avoid Channels because they didn’t understand why it broke for them and lost trust in its scaling. And for the Projection Device, although avoiding Channels wasn’t the intention, the backend for it was written in Go instead.

    I think the platonic ideal hunt server would remove Django Channels from the codebase and use an alternate solution for websockets. That is the ideal, but I’m not sure the migration work would be worth it. The Projection Device may not have used Channels, but the core hunt site of Silenda from that year did. So did Spoilr, the codebase for Mystery Hunt 2022, and tph-site from Mystery Hunt 2023. Real companies have made Django Channels work for them, and past hunts have used it without major issues. This might be a case of preferring the devil that’s already implemented over the one that’s not.

    Hunt! (Friday)

    With the site fixed, it’s time to get into the puzzles proper.

    The Throne Room

    Herc-U-Lease - Ah, the scavenger hunt! Technically not in this round but I’ll put it here since I didn’t do anything in this aside from getting nowhere on Annual International Fictionary Night.

    We did a few tasks, but on doing a cost-benefit analysis, we decided the effort needed to get enough drachma was too high for the reward. By the time the nerf came in Saturday, we had a lot of open puzzles to work on, and the cost post-nerf was still too high for us.

    Looking at past Mystery Hunts, the 2022 scav hunt maxed out at 100 points, with 10 points for the hardest tasks. The 2023 scav hunt maxed out at 90 points, with 30 points for the hardest tasks, although I’m guessing most teams did the 10-15 point tasks. That’s around 10 hard tasks for both hunts if you’re on a big team. The 2024 scav hunt maxed out at 60 drachma and gave 3 drachma for the hardest tasks, or 20 hard tasks, twice as many. Even post nerf to 45 drachma, it was still 1.5x longer. Given that the goal of scavenger hunts is to get teams to do goofy things for your and their entertainment, I’d recommend future teams trend easier and target 10 hard tasks as their maximum.

    I’ll still include our video for “throw something through 12 rings, each held by a different person”, because it never saw the light of day.

    Everyone knew that tangerine was going to hit someone.

    The Underworld Court

    This round was released via Google Docs to teams, using phone callbacks. I don’t really miss them but it was a fun throwback.

    Badges Badges Badges - The first puzzle I worked on while waiting for puzzle release. Honestly I’m surprised this is the first time someone made the nametags a puzzle, but they have only been a thing for 3 years. I quickly recognized mine was NATO, but by the time I figured out what “echo” is, someone else has already solved the full badge. Then I got sidetracked into tech debugging.

    Roguelikes with a K - Listen, I’m always down to try a roguelike. I quickly learn I’m bad at roguelikes relative to teammate, and busy myself with organizing the sheet instead. We submit (well, call-in) what we think is the final answer, but realize the next step before TTBNL calls us back. I still have objections to some of the interpretations, they felt a bit loose. We considered Wordle to have resource management, since you had a limited number of guesses, but figured out it needed to be “false” during our error correction tweaking. Overall, cool idea, just wish it was tighter.

    Dating Stars - The solvers of this puzzle had figured out the Chinese zodiac, but not anything else in the virgin vs chad memes. They were convinced the Western zodiac would be relevant, and in an increasingly desperate attempt, they asked a Homestuck consultant (me) to check if anything lined up with the trolls. I read over the puzzle and said “definitely not, also what in this puzzle would clue Homestuck???” They broke-in after I left.


    Judges of the Underworld (meta) - Two of us (including me) were convinced that the heights of the pillars on the round page would be important for ordering, and got confused why it was reading so poorly, right up until someone resorted the sheet. We definitely did not need all the feeders to solve this meta, but those are the rules.

    Rivers of the Dead

    Why the Romans Never Invented Logic Puzzles - This idea was both cursed and a lot of fun. Three of us were working on a grid, slowly making deductions and backtracking through mistakes, and then Lumia says “okay, I’ve solved the logic puzzle” and drops the completed grid into the spreadsheet. This has happened to me enough times that I’ve stopped questioning it.

    Initially, the large fractions of Js in the cluephrase makes me think we need to filter it with the Latin alphabet (i.e. there are no Js in Latin so ignore all Js). This doesn’t work, but I eventually decide that it really ought to be a do-it-again. We notice the self-confirming step and solve it from there. I get annoyed enough at the puzzle to start writing code to bash the finale, but the puzzle is finished before I get my code going.

    Two Outs, Two Strikes, and… - I didn’t work on this puzzle, but want to call it out as very funny.

    temporary name - We solve the answer matching pretty quickly. I volunteer to enter it into the site. It fails to do anything, and I announce we’re missing something. Around 10 minutes later, someone else resubmits the same answers to the website and it works, unlocking part 2. I guess I filled out the form wrong? Oops.

    Do You Like Wordle? - This was a very infamous puzzle to me, because it was our last feeder in all the Underworld rounds. We initially ignored the puzzle due to the warning notice in errata. That meant that when we got back to it, we had very little progress on it, and effectively the entire team was forced to play Wordle due to our losecomm policies of forwardsolving everything. We split into two rough teams. One team played a bunch of Wordle games and shared screenshots into Discord, while the second team studied the screenshots to try to determine when a game would solve to a blue square or green square.

    Wordle screenshot

    I asked some people to try games where all 26 letters were used. After finding this always led to blue games, they proposed the correct extraction idea - that a game was blue if the target letter appeared in any guess, and green if it appeared in no guesses.

    I was skeptical of this, because the letters we had were starting ..i[phcdn][gk]hq, and I wanted it to form a 5x6 grid in the end instead of a 30 letter cluephrase. But teammates were adamant that this was the start of “BRING HQ …”. Seeing it continue [vx]sm did not give me much confidence, but I had no better ideas and the theory was looking consistent, so I started grinding out letters while saying “there’s a chance we’re doing everything wrong” every few minutes. Eventually we got enough to read out the cluephrase.

    I have mixed feelings on this puzzle. The place it had in our unlock progress was always going to make me get annoyed with it, no matter the quality of the puzzle, but it did feel especially grindy. As the author of Quandle, a puzzle that asks you to solve 50 Wordles, I realize how hypocritical it is of me to say this. I think for me it came down to this puzzle outstaying its welcome. Generally, I found I needed to play 3-4 games per board to restrict the letter enough in our regex to move on. That works out to 90-120 Wordles for the entire puzzle. I think Quandle is solvable at around 40-45 Wordles in comparison, and those extra 50 Wordles made the difference. Additionally, the game sometimes messed with my ability to generate useful runs. I’d start a game thinking “this time, I will use S and T but not R”, eventually realize the winning word was ROAST, and go “goddamnit”, losing the sense of control I associate with video games and interactive puzzles. In short, idea cool, but execution a bit too much of a drag.

    Solving Wordle got us to the meta, and the runaround. I decided to go spectate the runaround with a bunch of other teammates, but when I realized this was going to look like someone reading a page aloud for many minutes, I bailed. People at our HQ told us TTBNL was going to unlock new rounds for us while the runaround ran, and I realized I came to Mystery Hunt to do puzzles, not watch people do puzzles.

    The Hole in the Ceiling of Hades

    I personally liked that puzzles unlocked in Hole throughout the Hunt, and know a bunch of people on teammate declared themselves “no Overworld, only fish” during the weekend. However, when the story page said “you are quite sure whatever’s up there isn’t necessary for getting out of Hades”, many of us interpreted it as “this round is not needed for Hunt completion at all”, and thought it was an optional round. This doesn’t make sense on reflection (why would a team write 50 optional puzzles), but, it’s what we thought.

    Discussion on whether to use an event reward on a fish puzzle

    The fact that we had over 40 solves in the round despite believing it was optional is a testament to the joy people got from doing easier puzzles in between Overworld puzzles.

    It also meant I did almost none of this round. Oops! At least I’ll have a lot of things to go back to.

    Streams of Numbers - I worked on this puzzle before we knew the round was supposed to be easy. That is my excuse for everything that went wrong.

    After the initial ID of the numbers, we got very stuck on extraction. OEIS didn’t turn up anything, and after some shitposts like “it’s a fluid dynamics puzzle”, I decided to try extrapolating the sequences. How? Well, I guess I could extend the polynomial defined by the points…

    I fit a polynomial to the first sequence, treating the values \(a_1, a_2, a_3, \cdots, a_n\) as points \((1, a_1), (2, a_2), (3, a_3), \cdots, (n, a_n)\), then evaluating the polynomial at \(n+1\). This gave back another integer. I tried it on another sequence and saw the same thing, so I excitedly shared this fact and we derived numbers for the rest. They were again all integers. In puzzle solving, you are often looking for the coincidence that isn’t a coincidence, the designed structure that suggests you’ve found something important. Getting integers for every sequence? Yeah, that had to be puzzle content.

    Or was it? After failing to extract from values like -12335, the two of us working on the puzzle started to suspect that any polynomial defined by integer y-values would extrapolate to further integer y-values. “It’s finite differences right?” I considered this, said “Yeah you’re right”, but we both agreed that we were bad at math and should ask people better at math for a second opinion.

    Upon asking the room, the responses were 50% “IDK sounds like you know the math better than we do” and 50% “no this has to be puzzle content”. I asked for a counterexample where integer points led to non-integer extrapolations, showing that typing random integers into Wolfram Alpha’s polynomial interpolation solver kept giving back new integers. This ended when someone new looked at the puzzle, and proclaimed “I don’t know the math, but I do know this puzzle is definitely not about polynomial interpolation”.

    We abandoned the puzzle and it got extracted by fresh eyes a while later.

    So, Is This Guaranteed by Math?

    Yes, and it’s exactly because of the method of finite differences. If you haven’t seen it before, it’s pretty cool, albeit mostly useful in high-school math competitions that are long behind me. Do we have time for math? Of course we have time for math, what a silly question.

    If you have values \(a,b,c,\cdots\) that you suspect are generated by a polynomial, then you can take consecutive differences \(b-a, c-b, \cdots\), take the consecutive differences of that, and repeat. Eventually you will end at a sequence of all constants.

    1 4 9 16 25
     3 5 7 9
      2 2 2

    If you extend the constants, and propagate the difference back up, you get the next value of the polynomial. I’ll mark the new values in parens.

    1 4 9 16 25 (25+11=36)
     3 5 7 9 (9+2=11)
      2 2 2 (2)

    Since this is all addition and subtraction, each step always ends at another integer, so the next value of \(f(x)\) must be an integer too.

    As for why this is the case, the short version is that if you have an \(n\)-degree polynomial \(f\), then the polynomial \(g\) defined by \(g(x) = f(x+1) - f(x)\) is at most an \((n-1)\)-degree polynomial (all the \(x^n\) terms cancel out). The first line is writing out the values of said \(g\). The second line is the values of the \((n-2)\)-degree polynomial \(g_2(x) = g(x+1) - g(x)\). Repeating this keeps reducing the degree, ending at a 0-degree (constant) polynomial. Extending the constant and propagating the sum back up is the same as backtracking through the series of \(g\) polynomials back to \(f\).

    In fact, you can derive the closed form of \(f(x)\) from any such difference table, but if you want to know how, you should really just read the Brilliant article about finite differences instead. It has rigorous proofs for all these steps.

    What were we talking about? Oh right, puzzles! Unfortunately that was the only puzzle I did in this round. I was asked to solve some stuck clues in 🤞📝🧩 but was just as stuck on them as other teammates. And I didn’t look at the meta for this round.

    Hunt! (Saturday)

    I got to HQ at 8 AM the next day. Losecomm announced that the “forward solve” everything policy was gone. We were now allowed to backsolve, abandon hard puzzles, look at metas early, use free answers, and generally solve as fast as possible, with losecomm transitioning to a wincomm posture until further notice.

    This felt early to me, but after Hunt ended, I got the full story: TTBNL did an HQ visit, and losecomm took the opportunity to ask TTBNL if they could tell us where we were in the leaderboard. We were told we were outside the top 10. Historically, teams outside the top 10 don’t finish Hunt. Our handicaps were too strong and we needed to speed up if we wanted to see everything.

    I still did not try-hard as much as I could have, since I was approaching Hunt from a “forward solve cool things” standpoint. This is the first hunt in a while where I tried zero backsolves. Well, I’ve heard most of the Overworld metas were hard to backsolve anyways.

    Minneapolis-Saint Paul, MN

    Triangles - Oh boy, this puzzle. We started the puzzle on the D&D side, except I had no appetite to search up D&D rules, so instead I looked into the wordplay clues. We knew we wanted groups of 5 assembling a D20 from the beginning, but making it work was pretty tricky given the (intentional) ambiguity. Still, we were able to break in on some easier categories like single letters and the NATO alphabet. Once we got about half the categories, we started assembling the D20, using it to aid the wordplay. By the end, we had the D20 assembled despite only figuring out 10 of 12 groupings. My favorite moment was when we knew “Web browser feature” went to the “ARCHITECTURE + LITERATURE + PHYSICS” category, but just could not get it. Out of exasperation, I looked at my Firefox window and spoke everything I could see out loud. “Forward, back, refresh, home, tab, toolbar, extension, bookmark, history - WAIT, HISTORY”.

    With the wordplay done, we figured out the ordering of D&D rules, mostly from me inspecting network traffic and getting suspicious why the request was including the count of rules seen so far. After much effort, we got all the D&D data to be consistent with the checksums, but became convinced that the numbering on the D20 would be driven by combining the wordplay half with the D&D half, rather than from just the wordplay half. Looking at the hints, I don’t think we ever would have gotten it, and we were pretty willing to move on after being stuck for many hours. (The assembled D20 was stomped flat and tossed into a trash can at the end of Hunt. No one wanted to bring it home.)

    In a more normal Hunt I would be upset, but we were just trying to have fun and the wordplay part of the solve was rewarding enough.

    Yellowstone, WY

    The 10,000 Commit Git Repository -

    *puzzle unlocks*

    Brian + Alex Gotsis from teammate tech team: “Do you want to work on this puzzle?”

    Me: “Sounds terrifying. I’m in the middle of this Triangles solve, but I’ll come by after we finish or get stuck.”

    (Entire Git puzzle is started and solved in the time it takes us to get 1/4th of the way through Triangles.)

    Hell, MI

    I missed this entire round. The bits and pieces I saw looked cool, I’ll have to take another look later. The majority of this round was solved between Saturday 1 AM - Saturday 5 AM, and then the meta was handed off to people who’d actually gotten sleep.

    Las Vegas, NV

    The Strat - An amusing early morning solve. Came in, solved some clues, wrote a small code snippet to help assist in building the word ladders, and broke-in on the central joke of the puzzle. We then got stuck on extraction for a long, long time. We managed to solve it eventually by shitposting enough memes about the subject to notice a few key words, saving the people who were studying real-life evolutionary trees.

    I feel like this puzzle would have worked if the enumerations were either removed, or changed to be total length ignoring spaces. They helped confirm things once we knew what we were doing, but were quite misleading beforehand. We spent a long time looking at the “breakpoints” implied by the enumerations.

    Luxor - did some IDing, but quickly left when I realized I was not interested in researching the subject matter.

    Mandalay Bay -


    In my heart of hearts, I am a gambler. But I understand my flaws and don’t want to get into gambling in cases where it could lead to me losing real money. So instead I clicked the Mandalay Bay slot machine a few hundred times to contribute data.

    We figured out the mechanic pretty quickly, although we did hit some contradictions in the emoji-to-letter mapping that we had to backtrack a bit to resolve. After assigning most of the letters, the distribution analysis people came back with the outlier emojis and we solved.

    Our main objection at the time was the lack of a “roll 10x” or roll 100x” button as seen in puzzles like Thrifty/Thrifty. I suppose it wasn’t a huge deal though, we didn’t need as many rolls to break-in as that puzzle.

    Planet Hollywood - I helped on the clue solving and ordering. For the extraction, right before Mystery Hunt we had run an internal puzzle event with an identical “connect the dots” mechanic as this one. That one clued spots in a specific shopping mall, and we initially thought we needed to find the locations of each restaurant within the Planet Hollywood resort. I bailed, but looking back I see the extraction was less painful than I thought.

    Everglades, FL

    Oh boy, we really overcomplicated the Hydra meta and almost full-solved the round before finishing it. This was when we started using our free answers to strategically direct solves towards specific metas.

    How to Quadruple Your Money in Hollywood - Originally, I thought this puzzle was going to be a joke about Hollywood perpetually remaking movies, recycling plots to sell the same idea multiple times. I still don’t understand why the 2nd last entry is formatted like letters rather than email - is this supposed to be because the movie for this clue predated widespread email?

    Isle of Misfit Puzzles - The minipuzzles I did (East Stony Mountaion, Kitchen Island, CrXXXXgrXm Island) were fun, although I did spend a bunch of time meticulously coloring cells in Sheets to match the Clue board and then saw it get unused in extraction. If it got used in numeral extraction, we skipped that for most minipuzzles. We were confident enough in the Hashi idea that most minis only had 2-3 possible numerals, and we brute-forced the numbers via the answer checker. I generally appreciated partial confirms but think this was one case that went too far. Final step was still cool though.

    The Champion - I keep doing puzzles thinking they’ll be Magic: the Gathering puzzles, and then get baited into doing something else. We started by IDing the combat tricks. I am still embarrassed that our last ID was Gods Willing, I literally play that card in competitive.

    From there, we got Yoked Ox first, and I did the right Scryfall query to break in on the right set of 16 cards to use. This then led to a surprisingly difficult step of pairing flavor text to puzzle content. In retrospect, identifying a few matching words across 16 paragraphs of text was always going to be a bit tricky.

    What then proceeded was a total struggle of trying to pair The Iliad to The Theriad. I tried to do this, and mostly failed, getting distracted by Sumantle instead. The two of us who’d done all the MTG identification were both going to an event, so we called for help to fix our data while heading to the Student Center

    “what are you doing?”
    “fact checking the iliad”

    Nero Says - I have done Mystery Hunt for over 10 years and this is the first time I’ve done an event. Exciting! Normally I skip events for puzzles, but this year I wanted to try something new.

    I got picked for the “detail-oriented” event, which ended up being a game of Simon Says in a time loop. Featuring many gotcha moments, it was very unlikely you’d clear it the first time, but upon losing the first time we got a worksheet that hinted at actions we should take to solve the event and break out of the time loop. Every now and then, you’d have things like “find someone with the same birth month as you, then tie at rock-paper-scissors”, and the intention was that you’d remember to pair with each other again the next loop.

    One item on the worksheet was “what secret phrase will you unlock if everyone fails the first instruction”, and, in very predictable fashion, every time we tried to achieve a few people trolled by not messing up the first instruction. On around the 5th try, TTBNL decided to declare that we’d done it, although I definitely spotted one troll trying to keep going.

    We wheel-of-fortuned the answer before the event finished, but decided to stick around to see more silly stuff.

    The Champion, Redux - Coming back to the sheet, we saw the ordering had changed a lot. We didn’t fully ID everything, but had enough right to use the bracket to error correct. I set up the VLOOKUP to do extraction (cute cluephrase), and we finished the puzzle. We were very thankful this puzzle got solved. This is one of those puzzles where I would have hated it if we got stuck, but didn’t because we didn’t.

    5050 Matchups - Every few puzzlehunts, I assume something is RPS-101 when it isn’t. This time it actually was RPS-101! The consequence is that I ended up solving, like, ten copies of 5050 Matchups, while taking breaks from…

    Sumantle - This puzzle was cool at the beginning, but I have no idea what model got used for this, the scoring was incredibly weird. Our guess was that it was a pretty lightweight model, because the semantic scoring seemed worse than state-of-the-art. For many words in the first layer of the bracket, we had a ton of guesses in the 2-20 range without landing on the exact target word. The people on the team with ML experience (including me) tried importing multiple off-the-shelf word embedders, to search for new guesses via code, but we couldn’t find one consistent with the website. Word2vec didn’t match, GloVe didn’t match, and overall it just got very frustrating to be close but have no real recourse besides guessing more random words. We started joking about a “Sumantle tax”, where you were obligated to throw some guesses into Sumantle every hour.

    I feel this puzzle would have been a lot better with a pity option if you guessed enough words near the target word. Having one would have reduced frustration and probably cut our solve time from 5 hours to like, 2 hours.

    Mississippi River

    The Hermit Crab - We needed a hint to understand what to do (trying to break-in from the last one was a mistake), but this was a cool way to use previous Mystery Hunt puzzles once we understood what we were doing. Was very funny when we figured out how to use Random Hall. I ended up solving Story 1, 2, 3, and 4, although I had to recruit logic puzzle help to check some theories on how story 4 worked. For story 10, I got the initial break-in, but needed help on extract. Personally not a fan of classifying SAMSUNG NOTE SEVENS as “fire”, but it is defendable. We also needed a hint on how to reuse the final shell - I think it was fair but would have taken us a while to figure out.

    99% of Mystery Hunt Teams Cannot Solve This - LOL that this puzzle unlocked while our math olympiad coach was eating dinner. That is all.

    Newport, RI

    Najaf to… - teammate has a few geography fans, who write puzzles like First You Visit MIT and play GeoGuessr Duels. Many times in this Hunt, the “Geovengers” assembled and disassembled as puzzles that looked like geography turned into not-geo puzzles. This was the puzzle that finally made the Geovengers stick together for a puzzle.

    How does it work? Oh I have no idea. I didn’t work on it.

    Augmented Raility - video game puzzle video game puzzle video game puzzle.

    teammate is the kind of team that takes this puzzle and identifies all the games and 80% of the maps in the games in 5 minutes without using search engines. Finding the exact positions took longer, but not too long, and we nutrimatic-ed the answer at 5/10 letters. I wonder if choosing Streets for GoldenEye 64 was a reference to Streets 1:12 (slightly NSFW due to language).

    Von Schweetz’s Big Question - We did the first step, which reminded me a lot of Anthropology from Puzzles are Magic, and then started the second step, which also reminded me of Anthropology from Puzzles are Magic - not mechanically, more that I suspected the 1st step was an artificial excuse to make indices more interesting for the 2nd step. Unfortunately we didn’t finish this one since the meta for the round was solved before we got very far.

    Nashville, TN

    Sorry Not Sorry - Incredible puzzle idea. Not sure I like that it’s such a sparse “diagramless” but we were cackling for most of the solve.

    Duet (meta) - We understood the round structure of Nashville, and that there was likely something after Duet, but that didn’t make doing so easy. There was pretty significant despair at the midway point of this puzzle, which we got to at like Monday 5 AM. The people working on this meta needed a hint to remember that this was a video puzzle with information not seen in the transcript.

    Oahu, HI

    I did not work on this round. My impression of this round was entirely colored by Fren Amis unlocking, and every cryptics person leaving their puzzle to join the Brazilian cashewfruit rabbit hole for 12 hours.

    “help i am trapped in a foreign language cryptic and it is eating me like a sad cold blob of paneer”

    New York City, NY

    New York City was the round where we started think we’d unlocked the last round of the Hunt. It was not, we still had 3 rounds ahead of us. This hunt made me appreciate the design of the Pen Station round page in Mystery Hunt 2022, where you could see there was room for 10 regions in Bookspace. I’m starting to believe that it’s okay for Mystery Hunts to more transparent on their hunt structure than they normally are. The people who like optimizing Hunt unlocks can go solve their optimization problem, and the people who don’t will appreciate knowing where they are in Hunt. This is something that Mystery Hunt 2024 was worse at. (As a side note, I was also sad that there was no activity log, but we skipped implementing it in the 2023 codebase for time, so I guess I can only blame myself.)

    A More 6 ∪ 28 ∪ 496 ∪ … - Featured my favorite hint response:

    The puzzle is not about chemistry. It is about the US government.

    Intelligence Collection - Codenames is always a good time. We found the assassins pretty early, and initially thought it would be a do-it-again using the clue words to make a new Codenames grid. This felt unlikely, since it threw away a lot of information, so that part of the sheet was labeled “copium meta”. I had to explain what “copium” meant to someone unfamiliar with the word. This was a definite lowlight in my journey of realizing where I fall on the degeneracy scale. At least we figured out the colors from the copium meta!

    Queen Marchesa to g4 - Two Magic: the Gathering puzzles in one puzzlehunt? Surely this is illegal. Well, I won’t complain. I was hopelessly lost on most of the chess steps and mostly played MTG rules consulting and identification of the high-level objective. I was quite proud of figuring out the correct interpretation of the Time Walk puzzle, but this definitely paled to the bigbrain solutions for the Rage Nimbus and Arctic Nishoba boards. This puzzle was HARD but I felt it justified itself.

    Olympic Park, WA

    Oil Paintings -

    An image that is definitely of Mr. Peanut, not the Boston Celtics leprechaun

    Yeah IDK I’m still pretty confident this is Mr. Peanut.

    Transylvanian Math - This is the kind of puzzle that on its surface could be really annoying to solve, but ended up being really fun because the source material was so good. I’m curious, did anyone else break-in by finding a 2003 forum post written by a fan of Britney Spears?

    ENNEAGRAM - I was recruited to help unstick this puzzle. As preparation, they made a clean version of the sheet, excluding exactly the part of the dataset that was needed to extract, hence I didn’t get anywhere. If you’re new, don’t worry, this happens all the time. Shoutouts to this flyer many of us saw when walking back to hotels.

    A flyer for an enneagram class

    Gaia (meta) - This was by far the most memorable meta solve of Hunt for me, so, strap in, this’ll take a while.

    On opening the puzzle, we dragged a few stars around, noted some things, and identified the Vulpecula constellation. I figured out the Gaia catalogues connection, and started digging into the database to see what we could find. We tried to recruit astronomy help, and learned that:

    1. This year teammate had someone who looks for exoplanets in their spare time.
    2. They had already left and were traveling home.

    So, we’re on our own.

    We start by transcribing position and motion data of the stars in Vulpecula, to figure out the mapping between puzzle coordinates and star coordinates. After dealing with the horrors of understanding the sexagesimal system, and how to convert between hours and degrees, we muddle our way towards understanding the puzzle coordinates are just milliarcseconds from Earth’s perspective.

    By now, we’ve realized that there are 10 unique letters that we want to map to digits, so that we can get new motion vectors for the stars. But, we don’t know how to do so. I propose that the “= X” for each answer is the sum of the letters in the answer, but this gets shot down because it feels too constrained (the value for ENNEAGRAM’s answer is very low for its length).

    I take a break to get some water, and, thinking about the meta, decide that it’s still worth trying. I don’t have any better ideas on how to use the answers, and I know how to write a Z3 solver to bash it with code. I’m confident I can prove the idea is 100% correct or 100% wrong in at most 30 minutes.

    By the time I come back, the other Gaia meta people have independently decided it’s worth trying the “sum of letters” idea. What follows is a teammate classic: can the programmer make their code work faster than the people working by hand? I win the race and find our 4/6 answers only gives two solutions.

    Excitedly we find that one works and one does not, so we’re clearly on the right track. We split up the work: one person generates the IDs (“I can write a VLOOKUP”), a 2nd generates the target positions (“I understand milliarcseconds”), and a 3rd drags the stars into place (“I have a high-quality mouse”). This gets us to Columba, we extract letters from the Greek…and get stuck.

    We have all the right letters, and we even realize that it’s possible the Gaia star contributes to the answer. However, we assume that if it does, its extraction works in the same way as the other stars. (Solving with this assumption gives that the 5th letter of the Gaia answer must equal the “alpha” letter, or 1st letter, of the Gaia answer. It’s weird logic, but can be uniquely defined.) Our lack of astronomy knowledge comes back to bite us, and no one on teammate thinks to look more closely at α Columbae until TTBNL gives us a hint at Monday 5:30 AM that the Gaia star contributes 5 letters to the meta answer.

    The puzzle was all fair, and I had a lot of fun up to the end, but I do wish the ending was a little more direct on using Gaia. We were stuck at “BE NEON?” for about 6 hours.

    Sedona, AZ

    A Discord message saying "Sedona round unlocked", with various "NotLikeThis" reactions

    I didn’t work in this round, it passed me by while I was working on the Gaia meta. I did look at the meta, which we seem close on, but this is one of the 2 metas we didn’t solve by end of Hunt.

    Still, one story that doesn’t fit anywhere else. Around Monday 12:20 AM, we realized it would be the last time we could get HQ interactions for the night, so we sent this:

    Hi Benevolent Gods and HQ,

    Us mortals of teammate would like to express our willingness to assist the gods with any tasks that they may need, perhaps in exchange for another “free answer”?

    TTBNL obliged, giving us the Hera interaction early.

    “So, how long have you been working on the Hera meta? It’s a tricky one.”

    “We just unlocked it.”

    “Oh. …The gods have decided to give you two free answers!”

    “Can you do three?”

    “Sure we can do three.”

    And that’s how we finessed three free answers right before getting kicked off campus.

    Heart made of shadow puppets

    Part of the Hera interaction, where we played charades via shadow puppets.


    There’s another round? Yep, there’s another round. This time we were pretty confident it was the last one, since we knew how many puzzles it would have and we were no longer unlocking things as we redeemed free answers elsewhere.

    Since we unlocked it so late, most of the work here was done out of hotel rooms. Across teammate, we designed some hotel rooms as “sleeping rooms”, and others as “working rooms”, shuffling people around depending on desire and ability to stay awake. I didn’t come to Mystery Hunt planning to pull an all-nighter, but with us being in reaching distance of an ending, I decided to stay up.

    Halloween TV Guide - I keep telling people that My Little Pony is a smaller part of my life now, but the first clue I solved was the MLP reference. I am never beating the brony allegations. I left this puzzle after the “octal-dec” break-in since I didn’t want to grind TV show identification, but I did successfully use Claude to identify some TV shows from short descriptions.

    Appease the Minotaur (meta) - It was nice to see a metapuzzle unlock from the start of the round. I broke in on beef grades, helped transcribe the maze borders into Sheets, then dozed off in the way you do when your body wants to sleep but you’re trying not to. We knew what we were doing the whole time, so we decided to drop it and work on other metas, waiting for free answers to get full data before trying extract. I bet that if we had really tried, we could have finished at 5/8 feeders, but solved at 8/8 instead.

    A Finale of Sorts

    Coming back to campus in the morning, solving had mostly slowed down to chipping away at metas and waiting for increasingly delayed hints. Once the “coin has been found” email came out, we got a call from TTBNL giving the runaround cutoff time. They said that based on our solve progress, we likely wouldn’t make it. This really killed the motivation to start work on the last part of Nashville we just unlocked, so many of us decided to start cleaning up HQ instead.

    Stats aren’t out yet, but I believe with our final push we got to around 7th. A bit disappointed we didn’t finish, but we had a good showing. Looking forward to next year!

    An image of Tame Meat, flying

    (Tame Meat enjoying a brief period of flight.)