Convening The Flag

I just got back from three months at the Recurse Center in NYC. One of the things I did there was co-organise / lead a group (with the excellent Dom) that worked through a series of Capture the Flag challenges. We used the first Stripe CTF, which has a definite Unix/C flavour and is reasonably easy for people to get set up on their own machines. I’m going to talk here about what we did, what worked well, and what wasn’t so good.

What We Did

We started by putting a kickoff event on the calendar. I figured out in advance how to do the first level of the CTF, and at the kickoff event I plugged my laptop into the projector, gave a little spiel about how I imagined this would all work, opened up the first level, and asked the room for their thoughts on what to do next. In my introductory remarks I described a process of looking for something that’s a little off, and learning about it or poking at it until you either find a way to make it do what you want, or the value of this investigation goes down enough that finding something else to investigate is more valuable. I also tried to assess the interest in doing these Unix/C challenges versus the interest in doing a series based more on web security.

My goals when working through the first level as a group were to try to make these challenges feel approachable, to hear from as many voices in the room as possible, and to surreptitiously prevent the group from going too far down any unhelpful paths. I don’t think I can take too much credit for getting to hear lots of voices, because I suspect RC people are generally mindful about not dominating conversations. I was pleased with how the conversation went, though, because when somebody answered a question I could ask the rest of the room a follow-up question and get the next answer from somebody else. I think there was only one point where I deliberately declined an answer from somebody and sought a response from elsewhere in the room.

After capturing the first flag together, there was a brief intermission while I figured out ports and firewalls and IP addresses to access the second level. We then did the second level together, but it was pretty scattered. A lot of folks had wandered off, and while I’d been tinkering, lots of side conversations had started up, many of which I think were more compelling than the second level. It also didn’t help that I hadn’t prepped the second level, so I was genuinely exploring possibilities along with everyone else in the room. That meant I didn’t have as much attention to devote to looking after the conversation.

We then went our separate ways. I tried to make myself available to answer questions and offer hints during the following two weeks, and then we reconvened to discuss levels 3 and 4. I think it’s safe to say that level 4 was the one that took the most time for us. It was also the one with the least code. This is not a coincidence! The obvious thing to do in response to the bugs in the other 5 levels involved very little extra code, whereas most of us responded to level 4 by trying to inject some shellcode. That made it an awful lot trickier to get right. I was very impressed by the variety of solutions for level 4. One person had a very elegant ROP-gadget + shellcode attack; another found a way to disable Address Space Layout Randomisation, and managed to arrange the details for a jump to system and pass it exactly the right command to capture that flag. Another participant had a preposterously dunderheaded approach based on running things enough times that the randomisation didn’t matter, and putting a binary on the path with a name taken from some gibberish that happened to be being passed to system consistently.
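For a rough feel of why "run it enough times" can work at all, here is a back-of-the-envelope sketch in Python. It assumes each run independently lands on the guessed address with probability 1 in 2^entropy_bits. The function names and the 8-bit figure are made up for illustration; they are not measurements from the actual level 4 machine.

```python
import math


def success_probability(entropy_bits: int, attempts: int) -> float:
    """Chance that at least one of `attempts` independent runs hits the guess.

    Assumption: every run has the same 1 / 2**entropy_bits chance of the
    randomised address matching the one we guessed.
    """
    p_single = 1.0 / (2 ** entropy_bits)
    return 1.0 - (1.0 - p_single) ** attempts


def attempts_needed(entropy_bits: int, target_probability: float) -> int:
    """Smallest number of runs giving at least `target_probability` of one hit."""
    p_single = 1.0 / (2 ** entropy_bits)
    # Solve 1 - (1 - p)^n >= target for n.
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - p_single))


if __name__ == "__main__":
    bits = 8  # hypothetical amount of randomness, purely illustrative
    for target in (0.5, 0.9, 0.99):
        print(bits, target, attempts_needed(bits, target))
```

The number of runs grows roughly in proportion to 2^entropy_bits, so this sort of brute force is only practical when the randomisation is fairly weak.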

Two weeks later, we met again to discuss levels 5 and 6. In theory I was still available to offer hints, but I hadn’t solved either of them before, so I could really only be a sounding board. One issue we had was that some of our participants took a while after our second meeting to figure out the details for their level 4 attack and didn’t have enough time to complete level 6. That meant we had fewer people with solutions to discuss, but the different tenor of the conversation was pretty fun. For instance, I initially attempted a very subtle attack, which completely failed, but I was able to reuse the infrastructure I’d built up to do a better attack very efficiently.

For the latter two meetings, I had a bunch of questions in mind beyond just “how did you do it?” to keep the conversation going. They were questions like:

what did you learn?
what was the most annoying thing about these levels?
what struck you as particularly cool?
are there any tools you wish you’d had to solve these?
who did you learn from? who did you teach?

I also had some questions about the thing as a whole for the end of the last session:

what would you change about this CTF?
what was your favourite level?
which attack are you proudest of?

What Worked Well

We had about 20 people for the first meeting, then roughly 8 for the second, and around 6 for the third. One thing I enjoyed was that people who hadn’t completed the challenges I picked for the group, or who had picked out other tasks entirely for themselves, came along and could tell us a bit about what they’d been up to. They were able to use their lack of knowledge of the Stripe CTF to force those of us who were doing those challenges to explain things properly.

For the most part, people worked in groups. People would often work together for a time, then peel off and do something else. I worked alone for all of these challenges, but I had some conversations with folks about what sort of things they’d been trying that helped me avoid getting too stuck. This was especially helpful with level 4, where it hadn’t occurred to me that I might not need shellcode.

I think I did an okay job of encouraging people to explain their attacks more fully. For instance, some people used ROP gadgets and some people had no idea what that means. It’s easy to forget that not everyone knows about all the things!

We liked this set of challenges, for the most part. When I asked everyone what they’d change about this CTF, the answer was basically just “more!”. Many of the levels had multiple solutions, and they covered a wide variety of security problems that have occurred in the wild.

What Wasn’t So Good

The Recurse Center is full of people in different parts of their programming journey. Many people there have only ever used Python or JavaScript, and many aren’t yet familiar with web programming. People might use Linux but only have beginner-level knowledge of how shells work. I would have liked to have a set of challenges that worked for everyone who turned up interested, but with such a broad set of people I doubt that it would be possible.

I asked a question in that first session wondering if we should abandon the Unix/C challenges in favour of some web security challenges. I did not handle this well, though. Only two people raised their hands, but there’s a good chance that 4 or 5 people wanted to raise their hands but didn’t. After all, that is how raising hands works. And because only two people raised their hands, I continued with my initial plan. To recap: I drew out two people to express an unpopular opinion, proceeded to disregard the opinion that they had bravely expressed, and made a few people feel awkward wondering if they should raise their hands or not. I could have asked the question in a way that stacked the deck in favour of the web security challenges (perhaps by asking if people didn’t feel like they knew enough for the Unix/C challenges, rather than asking if they felt they’d do better with web security), or just skipped the question entirely and explained why I thought this set of challenges would be good.

I wish I’d figured out more levels in advance so that I could give good hints for them. This burnt me particularly badly for level 3, where I worked out the hint I wanted to give out after I was done giving hints for that level.

I’d like to learn more strategies and tactics for keeping a group conversation ping-ponging like it did in the first meeting. It went really well, but in a room like that, it’d only take a few people playing by different rules to make the session take a very different path.

I also wish I’d had a better inclusiveness toolkit for people who had got stuck on level 3 or 4 but kept coming along. I wanted to get them into the conversation, so I asked if they’d finished level 3, and they said no, and I didn’t have anything to ask next. Since this was fairly late in the conversation that day, I could have asked if there were any techniques in the discussion that they’d used before, or that they’d like to use, or if these conversations were making it easier or harder to believe that they were writing secure code.

Would I do it again?

Yes!

Huge thanks to the folks at Stripe for putting the CTF challenges together and distributing them after their competition ended, to my co-conspirator Dom, and to everyone who came along to any of the sessions.

And finally, a huge congratulations to my fellow flag-capturers. It was a pleasure to hack a few of the things with you.

Thanks to Annie for commenting on a draft of this post.