Does “Pair Programming” Work in Data Science?

Sean McClure
Sep 30, 2017

Pair programming comes to us from Agile, specifically Extreme Programming (XP), and its overall benefit is often a charged topic. The main advantages are fewer errors in the codebase and more efficient learning.

Fewer errors in the codebase are a byproduct of having more eyes on the code. More efficient learning, of course, depends on the quality of your “pair.” When done well, pairing lets less experienced developers (or those less familiar with the codebase) learn about the code as well as the thought processes that go into designing the application.

The downside is that everyone works differently, and this can make it difficult for two people to work together. For example, some people are more introverted and like to “stare off” while thinking of a good approach. Others jump into code immediately and pivot as they write. If you’re paired with the “wrong” person, it can be hard to move at a pace that works for both of you.

So, where does Data Science fit into all this? I would argue the benefits and downsides are the same as in mainstream development, but in the context of analysis. I’ve pair-programmed with Data Scientists who literally whip out paper and start doing “back-of-the-envelope” matrix algebra to think their way through a problem. I, on the other hand, would much rather code up many approaches rapidly, pivoting based on the outputs I observe. Two passionate data scientists, two very different approaches. In these instances it is literally slower to work with someone than on your own.

I have also paired with people whose style is closer to mine. This lets us establish a cadence and move through problems more rapidly. It also means we get to learn from each other, since each of us brings different experience.

The ideal pair is someone who is similar enough in style to maintain a steady pace, but different enough in experience to allow for mutual learning.

If you can strike this balance, then I think pair programming in Data Science is worth it. But be picky with your pair choices; it can really slow things down when the balance isn’t there.

There are also times when you may have to pair with practitioners from other disciplines, like data engineers or software developers. This can be as challenging as it is rewarding. While professionals in the same field tend to think at least somewhat alike, engineers and scientists can be very different.

Engineers are concerned with building scalable systems that won’t fail (outputs should never change), while data scientists are all about creative, messy exploration and adaptive, modifiable approaches (outputs are expected to change). This can cause a lot of friction between the two distinct paradigms of making software.

But applications need to grow holistically, and if you can find the right balance between the engineering and scientific mindsets, it’s well worth the challenges. The engineer will learn to be more flexible in how software behavior is produced, handing some control over to the scientist’s models. The scientist will learn coding best practices and gain an appreciation for building reliable software.

It’s all about balance.

Sean McClure

Founder Kedion, Ph.D. Computational Chem, builds AI software, studies complexity, host of NonTrivial podcast.