Starter Guide to Biostatistics Academic Consulting

Author

Kaitlyn Lee

This guide is broken down into two parts. The first part consists of things I wish I knew before my first consulting session. The second part is a list of things that my consulting partner (Sylvia Cheng) and I (Kaitlyn Lee) do that I think help make our consulting sessions successful. I still have a lot to learn, so this is by no means meant to be definitive or the right way to think through things - rather, it is just a collection of thoughts I’ve curated after reflecting on the semester. This was originally written after Fall 2024, the first time I took this course. In general, I still stand by this document, and I have added a few amendments in italics.

Things I Wish I Knew

  1. You know more than you think! I was super nervous going into my first consulting session because I felt like I knew nothing. However, the expertise you bring to the consult is not necessarily a super deep understanding of every possible method someone may be considering, but rather the mathematical and statistical sophistication to help make suggestions for a particular question. When I first started, I felt silly turning to Google during every consult to look things up – I thought, surely anyone can just use Google. However, I quickly learned that clients see our ability to find relevant papers/coding packages and make actionable suggestions on the spot as almost a superpower. Have confidence in yourself! You got this.

  2. At the same time, your client is the subject matter expert. Ask lots of questions about their data - it will help give you a better understanding of the problem at hand.

  3. As Sylvia Cheng put it, 90% of consults result in suggesting GEE (generalized estimating equations) (F25 amendment: definitely fewer GEE consults this semester. However, the following is still good advice! See number 6 for an update.). GEE is a regression method that handles dependent data. It can handle repeated measures data (e.g., patients with multiple measurements over time) and clustered data (e.g., analyzing individual test scores where you think there may be a dependence between people in the same classroom). Once they run a GEE, they can construct some type of plug-in estimator depending on what their estimand of interest is (clients generally find the concept of a plug-in estimator pretty intuitive).

    1. Another common approach when considering such data structures is GLMM (generalized linear mixed models), also known as random effects models. Alan Hubbard et al. wrote a lovely paper explaining why they believe that GEE is a better method (Hubbard AE, Ahern J, Fleischer NL, Van der Laan M, Lippman SA, Jewell N, Bruckner T, Satariano WA. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. 2010 Jul;21(4):467-74. doi: 10.1097/EDE.0b013e3181caeb90. PMID: 20220526.) I would recommend at least being familiar with this paper.

    2. Sometimes students may ask if there are any classes you recommend they take if they want to learn more about analyzing this type of data. PH C242C: Longitudinal Data Analysis is a great course historically taught by Alan. The intended audience for this class is non-Biostatistics PhD students and Biostat/Epi MPH students - however, Biostatistics MA/PhD students take this course and find it fulfilling as well!

  4. If your client is a graduate student, there is a good chance that they already have in mind a method that their advisor suggested. Sometimes, these methods are great, and the focus of the session is to help the client understand why the method is a good fit. Sometimes, these methods are less great. Ultimately, it is not your job to tell other graduate students to go against their advisor’s advice. Instead, I try to focus on (1) helping the student understand what the suggested method is and what the underlying assumptions are and (2) perhaps providing some other methods/food for thought for the client to keep in mind as they build their statistical toolbox. As Alejandro (roughly) put it, “The real product [of the consult] is not the project but the person coming out of the session.” As long as the client leaves the session feeling like they have a better understanding of the method that they intend to use and what some of the limitations are, that’s an A+ consulting session in my book.

  5. Setting boundaries is very important. This may look like boundaries around your time/investment and around your emotions. I have had more than one client ask me to meet outside of my set consulting hours. I decided to create a general policy that I would not meet with clients outside of my set hours, particularly during times in the semester when things were very busy. In general, people would respect this wish and book during my set times. It can be hard to say no, but I think it helped me be a better consultant - I never found myself too overwhelmed or unprepared for a consult. Also, as Andy Kim put it, consulting can be like therapy for some of your clients. Emotions can run high, particularly when people feel in over their heads, embarrassed because they don’t get things quickly enough (in their own minds), and generally overwhelmed. Ultimately, you don’t know what people are going through, and clients may feel nervous or frustrated coming into sessions, so in general I find it best to assume best intentions. Also, if the client leaves in a bad mood, it is not necessarily a reflection of you or your consulting abilities. Don’t let it get you down - onwards and upwards!

  6. In Fall 2024, I would say that 90% of my consults resulted in GEE. However, many fewer consults resulted in GEE in Fall 2025. Instead, I would say that most clients came in with code generated by prompts to ChatGPT, and the client would ask if the code looked right. Usually, the code would run without errors; however, very few clients could actually explain what they were trying to do or, perhaps more importantly, why (sometimes, they weren’t even sure what their research question was). There were typically two outcomes from these types of consults. The first is that the client is willing to take a step back from the code and use the consulting time to discuss what their research question is, what their data looks like, and what methods are actually suited given the answers to those two questions. From there, we can then find packages or code that implement the method of interest. The second is that the client really just wants you to give them a stamp of approval on their code, or really wants you to help them use the ChatGPT-produced code in another work stream that is better suited for their data/question. The former tends to be the more productive kind of consult because you can actually get at what they are trying to do, rather than what ChatGPT thinks that they are trying to do. Some of my best consults this semester looked a lot like the former scenario. The latter tends to be more difficult, particularly if the client really can’t be convinced that using code that they don’t understand isn’t a great idea. However, ultimately, similar to the advice in numbers 4 and 5, it’s not necessarily a reflection on you if the client leaves still determined to use whatever code they came with, even if it doesn’t really make sense. If you can help guide the client toward thinking about what they are actually trying to do and what the limits of their code are, even a little bit, that’s still a success.
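To make the GEE-plus-plug-in idea from number 3 a bit more concrete, here is a minimal sketch in plain Python. It is not how a client would run the analysis in practice (an established GEE package would be the better choice), and everything in it is an assumption made for illustration: the simulated classroom data, the helper names (`solve`, `design`, `fit_gee_independence`, `plug_in_effect`), and the choice of a linear model with an independence working correlation. In that special case the GEE point estimates coincide with ordinary least squares, and the dependence within classrooms is handled by the cluster-robust "sandwich" standard errors.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def design(x, z):
    """Hypothetical design row: intercept, exposure, covariate, interaction."""
    return [1.0, x, z, x * z]

def fit_gee_independence(clusters):
    """Linear GEE with independence working correlation.

    clusters: list of clusters, each a list of (x, z, y) tuples.
    Returns (beta, robust_se) using the cluster-robust sandwich variance.
    """
    p = 4
    bread = [[0.0] * p for _ in range(p)]   # X'X
    xty = [0.0] * p                         # X'y
    for cluster in clusters:
        for x, z, y in cluster:
            row = design(x, z)
            for i in range(p):
                xty[i] += row[i] * y
                for j in range(p):
                    bread[i][j] += row[i] * row[j]
    beta = solve(bread, xty)

    # "Meat": sum over clusters of s_c s_c', where s_c = X_c' (residuals of c).
    meat = [[0.0] * p for _ in range(p)]
    for cluster in clusters:
        s = [0.0] * p
        for x, z, y in cluster:
            row = design(x, z)
            resid = y - dot(beta, row)
            for i in range(p):
                s[i] += row[i] * resid
        for i in range(p):
            for j in range(p):
                meat[i][j] += s[i] * s[j]

    # bread is symmetric, so the columns of its inverse are also its rows.
    inv = [solve(bread, [1.0 if i == k else 0.0 for i in range(p)])
           for k in range(p)]
    robust_se = [math.sqrt(sum(inv[i][a] * meat[a][b] * inv[i][b]
                               for a in range(p) for b in range(p)))
                 for i in range(p)]
    return beta, robust_se

def plug_in_effect(beta, clusters):
    """Plug-in estimate: average predicted outcome with x=1 minus with x=0."""
    diffs = [dot(beta, design(1.0, z)) - dot(beta, design(0.0, z))
             for cluster in clusters for _, z, _ in cluster]
    return sum(diffs) / len(diffs)

# Simulated data (an assumption): 40 classrooms of 10 students each, with a
# shared classroom effect u that makes scores within a classroom dependent.
random.seed(0)
clusters = []
for c in range(40):
    u = random.gauss(0, 1)
    x = float(c % 2)                 # exposure assigned at the classroom level
    students = []
    for _ in range(10):
        z = random.gauss(0, 1)
        y = 1.0 + 2.0 * x + 0.5 * z + 1.0 * x * z + u + random.gauss(0, 1)
        students.append((x, z, y))
    clusters.append(students)

beta, robust_se = fit_gee_independence(clusters)
effect = plug_in_effect(beta, clusters)
print("coefficients:", [round(b, 3) for b in beta])
print("robust SEs:  ", [round(s, 3) for s in robust_se])
print("plug-in average effect of exposure:", round(effect, 3))
```

The last step is the part clients tend to find intuitive: fit the regression, predict everyone's outcome with the exposure switched on and then off, and average the difference. Because of the interaction term in this hypothetical model, the plug-in estimate is not just a single coefficient, which is one reason to frame the result in terms of the estimand rather than the regression table.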

Helpful Things I Think We Do

Sylvia and I have had quite a few repeat customers this semester. Sometimes I think it’s just because it’s the most convenient time for clients, but I also like to think that they enjoyed our session enough to want to come back. Below are a few things we do that I think clients like and that keep them coming back for more.

  1. As we have discussed above, consulting is like therapy for some clients. So much of the first 5-10 minutes (or more!) of the session, in addition to introducing the consultant to the problem, is helping the client get comfortable. You are the expert in biostatistics, but the client is the expert in their particular subject area. Remind them of this by asking lots of questions (this also helps you get a clearer picture of the particular context and data-generating process, or DGP). You can even admit that you find their field complex/confusing - this can help put them at ease because they likely feel that way about statistics! In general, I find that asking lots of questions and showing interest can help clients feel more confident, which ultimately leads to better conversations. It can also help make the rest of the session way more productive by getting everyone on the same page early on.

  2. We like to frequently check in with our clients. An hour is a long time, and if the client feels lost 15 minutes in, it’s likely that the rest of the session won’t be as productive. Ask if things are making sense, or if they have any questions along the way. Remind them that no question is too basic. I often found that clients wouldn’t express confusion until asked if they were confused.

  3. If you have another consultant in your session, don’t be afraid to lean on them! I really appreciated having Sylvia in the room with me to ask her a quick “does this idea make sense to you?” or “is there anything else you think would be helpful for this problem?” Some of the problems can be really complex and won’t necessarily have a nice, straightforward answer, so asking another expert in the room if what you are saying makes sense can help you feel more confident and help your client feel more confident in your recommendation!

  4. Right after the session, we spend 1-2 minutes writing a quick email with any links to papers we discussed or packages we mentioned. It doesn’t have to be super detailed, but letting your clients know that you will send these links along after the session helps them focus on what you are talking about (rather than spending time during your session scrambling to write down the papers you find). Below is an example of such an email.