What we expect PhD students to know and learn

I have now been advising (computer science) PhD students for about a year and a half. This is just long enough, combined with my own recent experience of being a PhD student and watching those around me, to have encountered some repeated patterns of stumbling blocks and growth and to start thinking about whether there are ways to support students in a more scalable way than through one-on-one advising.

One-on-one, face-to-face advising is widely considered (and in my opinion is) one of the most valuable parts of doing a PhD; in few other professions does one have an opportunity to receive this kind of personally-tailored mentorship through topics of intellectual curiosity. However, it has its pitfalls. One of the most common stumbling blocks for students is that the advice, guidance, structure, and content of education provided by one advisor differs dramatically from that of another advisor, even in the same field. Meanwhile, the PhD program run by the student's department, not to mention the outside world that will hire them when they graduate (or leave without the degree), seems to have an entirely different, disjoint set of expectations for what they should be doing during their PhD and the skills and knowledge they are expected to acquire.

For this reason, we may hear questions from students like:

What all should I know by the time that I graduate?

How much should I be reading to extend my knowledge of the field vs. working on my own results?

Is it ok that I don't know anything about [tangentially-related subfield]?

What skills do I need to have to be an effective researcher in your lab?

This sort of question is extremely difficult for academics to answer. Most of us live in a culture of "learn more, do more" whenever possible. However, this is irresponsible advice to give; we know that knowledge is fractal and students can spend endless time down rabbit holes that don't get them closer to graduating. On the other hand, it is just as irresponsible to allow a myopic trajectory in which a student only stays focused on their thesis topic and does not develop a broader understanding of their field, especially when it comes to developing skills that will make them attractive to employers.

I think it's in the magnanimous spirit of trying to answer questions like these that academic and techy bloggers author posts in the format of The Checklist. I think of Matt Might as the most prolific academic author of posts like this, with the trio "What every computer science major should know", "Reading for graduate students" (previously titled "Books and papers every graduate student should read") and "12 Resolutions for Programmers"; John Regehr also recently posted a list of tools programmers should know. When I voiced skepticism of this kind of list, a number of people responded with a lot of support and gratitude for them, and indeed Regehr's post was primarily in response to a student need, a lack of clarity around expectations.

(In some sense, a syllabus for a class is "just a checklist" like the above, and certainly the structured nature of the list, even if not every topic is interesting and relevant to every student, helps students develop a sense of what "constitutes" a subject area.)

So I got to wondering what it would look like to make checklists for a PhD in each of the disciplines my research covers (programming languages/formal methods and computational media/games). Of course, a research field is much broader in scope than a single class, and one person's thesis topic will only touch on a tiny subset of that field, so it is much harder to find a real "core" of what people are "expected" to know. For that reason, I think it's important to approach lists like these (and like the ones I linked above) not as checklists in the sense of it being important to check off every item, but rather as a map of a territory upon which you are welcome to plot your own itinerary.

I made a survey (which you can still complete if you like) asking for contributions, then went over the results and took stock of the commonalities and discrepancies in answers. Below I report some of the domain-specific knowledge and skills that people listed. (I am actually merging together the "before PhD" and "after PhD" sections here, which I comment on further below.)

Programming languages/formal methods/logic

How to do proofs
Logic
Grammars
Algorithms and data structures
Proofs as programs (AKA Curry-Howard)
Type systems/type theory
Lambda calculus
Combinator calculus
Turing machines
Language implementation techniques (e.g. compilers, interpreters, type checkers, runtimes, garbage collectors)
Programming paradigms (e.g. experience with functional programming, logic programming, object-oriented programming, etc.)
Domain theory
Hoare logic
Abstract interpretation
Parametricity
Model checking/temporal logic
Process calculi
"A way of thinking about structure and aesthetics" - one respondent identified this as the only common ground that PL researchers might have with one another.

Computational media/games

Programming; data structures and algorithms
Generative algorithms ("a few"; "nothing specific")
Computational creativity philosophy
Expressive range (Smith & Whitehead)
Micro-rhetorics (Treanor)
A game creation tool
Taxonomy & survey of PCG (Togelius)
Design patterns
Quantitative/qualitative research methods

In addition, people listed several items I would catalogue as general research skills and knowledge that are valuable to students in any field:

Familiarity with a body of work: major results and recent research
Community (presumably: who's in it now; who did what impactful work)
Sub-areas (presumably: what is the terminology people use for what they are working on; how do different areas relate)
Ability to manage people
Ability to write papers
Ability to procure funding
Know your own goals
Applying the right tools in different contexts

Respondents had surprisingly varied opinions about what was expected of a student before vs. after the PhD, which is why I just decided to merge the two lists. Some said that what students came in knowing wasn't really relevant, as long as they had a little bit of background and/or interest and/or "strong technical skills", but then felt that by the end of the degree, students should know an extensive list of things. Others felt that it was very important that students come in with a very specific skillset, but then when they finish the PhD, they may have diverged substantially from other researchers in their area, so it's hard to characterize what they would know. The divergence of these opinions strikes me as interesting, because clearly the faculty who require a smaller skillset have a different approach to training their students to be proficient in their field.

Is a common body of knowledge for a field important?

Folks identified two benefits to lists like the above:

1. They enable efficient/effective communication among experts. One respondent noted "it is required to have in-depth discussions"; Jon Purdy noted that "it enables us to communicate efficiently; we can rely on shared information that doesn’t need to be repeated when introducing new ideas."

2. They establish clear expectations for students: Antwane Mason notes, "research is fundamentally an uncertain process so having a common body of knowledge can help scope research and make problems concrete." He also makes the practical observation that "academia can lead to an overemphasis on results at the expense of skill development in the graduate program. Having a common body of knowledge can help ground the graduate process in developing practical research knowledge and skill."

On the other hand, folks identified some cons, which I group as follows:

1. Fields are too broad in scope: one respondent observed that "not everyone in a field must have the same degree of common knowledge."

2. It can limit open-mindedness to new ideas: "The larger the field of knowledge that is brought to my field, the better chance of more interesting solutions to open problems. Diversity reigns supreme."

3. It can create or trigger impostor syndrome: this was the meat of Derek Dreyer's response to my Facebook post about the survey, and also the one I relate to the most. As Derek puts it, "There is *so* much about PL and formal methods that I know hardly anything about, and I constantly feel like an impostor because I don't know it." ... "there are certain things that people have heard of and believe to be well-known and so they generally won't say publicly that they don't understand [...] Do most people know these things, and do most people need to know all these things? No and no."

A certain professor I know used to periodically deliver a well-known rant to the students about how we could not rely on our advisors for our education, and how we were apparently expected to know a wide range of subjects, most of which we would inevitably need to learn on our own. While this was good for impressing on students the value of independence, it also had a way of making some of us, at our more vulnerable times, feel that we would never know enough.

For students to be able to trust their own independence enough to take leaps of faith into productive self-directed learning, we need to facilitate their confidence by giving them maps (and compasses, lanterns, spelunking gear, etc. ... twisty little research passages). We also need to help them understand that covering the entire territory is not the goal --- the goal is to find a route so interesting that they want to go beyond the edges of the map.
