Pair Programming Interviews for Data Science Roles

Pair programming interviews are widely used by software engineers to assess a candidate’s technical and communication skills. However, they are not very common in data science interview processes, where take-home assignments are the norm. I’m not a fan of take-home assignments, as they are very time consuming for candidates, putting busy people (like parents or those with current full-time employment) at a huge disadvantage. I mean, “a couple of hours” is never enough time to do a half-decent job of any analysis! Furthermore, as an interviewer I gain far more insight into a candidate’s capabilities by observing their thought process in real time, and a live session also shows me how they communicate and collaborate with others.

I believe pair programming interviews are uncommon in data science due, at least in part, to the technical challenge of running them. For software engineering pair programming interviews there are plenty of options, such as IDEs with collaboration features like VSCode or PyCharm, cloud environments like AWS Cloud9, and online platforms like CoderPad. However, these options are not well suited to data science interviews, for which Jupyter notebooks are the optimal environment. IDEs and cloud environments are designed for software engineering - whilst some support notebooks, I’ve always found them very buggy - and, with the exception of Google Colab, online platforms are expensive and aimed at more than just coding interviews, not to mention the lack of privacy.

The Jupyter project has recently released a real-time collaboration extension, which is free and doesn’t require a dedicated server. It’s not perfect, but it’s good enough for a pair programming interview. The main challenge is that the candidate must be able to reach the Jupyter server running on the interviewer’s machine. This can be overcome with Ngrok, which creates a secure tunnel to your local Jupyter server. Ngrok is free for personal use, but you’ll need to sign up for an account. The other issue with Jupyter’s collaboration feature is that the candidate can execute arbitrary code on the interviewer’s machine, which is a security risk. However, this can be mitigated by running the Jupyter server in a Docker container.
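As a rough sketch of the idea (the base image, package versions, and flags here are my assumptions, not the exact setup from the repository below), a collaborative Jupyter server in a throwaway container looks something like:

```shell
# Start JupyterLab with the jupyter-collaboration extension inside Docker.
# Everything the candidate runs stays in the container; nothing is mounted
# from the host, so a `rm -rf` only destroys the container's filesystem.
docker run --rm -p 8888:8888 python:3.11-slim bash -c "
  pip install jupyterlab jupyter-collaboration &&
  jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
"
```

Binding to `0.0.0.0` is what lets the tunnel reach the server from outside the container; the `--rm` flag means the whole environment is discarded after the interview.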

I have created a repository with all the code and instructions you need to get started, which you can find here, so I’m not going to go through this in detail. Once you have cloned the repository, the high level setup is as follows:

  1. Customise the data creation script and generate the data with make data, or just add your own data to the data folder.
  2. Build and run your Docker container with make build run.
  3. Start Ngrok with ngrok http 8888.
  4. Share the link and Jupyter token with the candidate.
  5. Enjoy your interview!
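The steps above boil down to a handful of commands (the make targets follow the repository’s setup as described; the exact URL is whatever Ngrok prints when it starts):

```shell
make data          # 1. generate the interview dataset
make build run     # 2. build the Docker image and start the Jupyter server
ngrok http 8888    # 3. open a public tunnel to the local Jupyter port
# 4. share the forwarding URL Ngrok prints, plus the Jupyter token,
#    with the candidate
```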

I hope this helps you run pair programming interviews for data science roles.