I’m Not Smart Enough to be in Data Science. Should I Change Careers?
I feel as though I am not smart enough to be in the field of data science. Should I change careers?
This is a question on Quora, and something I imagine many newcomers to the field ask themselves. As Daniel Addyson highlights in his repost of my article, “understanding the underlying mathematical/statistical concepts is important, but folks frequently seem to confuse data science with being an academic mathematician. Sean McClure defines the boundaries between understanding, using, and developing mathematically based products.”
I am reposting it here to encourage discussion and help educate others on what it means to be “smart” in today’s information economy.
— — — — —
Data science is not about mathematical rigor or deep, reductionist approaches to analysis. In fact, those who focus on such things are often poor data scientists. This is because data science is about building real-world products that leverage the power of machine learning to produce the predictive and explanatory outputs needed to drive innovation. The machine learning that works in the “real world” is not the algorithms that squeeze an extra 1% of predictive accuracy out of data, beat humans at games, or recognize when dogs are laughing. Real-world machine learning is about folding software development and domain experience into the machine learning workflow to convert imperfect enterprise data into a product that adapts to its environment. It’s about knowing how to gain your sophistication through iteration instead of upfront design and mathematical elegance.
Writing equations on the whiteboard or dipping endlessly into academic journals is almost always a symptom of poor “package awareness,” meaning you don’t have a solid grasp of the open source tooling needed to create machine learning products. Without this education in tooling, your whiteboarding efforts are almost guaranteed to reinvent the wheel. There is over half a century of machine learning research behind us, and a wealth of algorithms and validation techniques is available through high-level tools. Data science is not hurting for lack of algorithms. Great data scientists are not solving equations; they are using a conceptual understanding of math and the assumptions of their algorithms to solve highly complex, non-ideal (i.e. non-academic) problems, the ones that happen outside the idealized ivory towers.
You have to understand that society’s idea of “smart” comes from the industrial revolution, where hard-coded rules, mathematical thinking, and an adherence to logic were what helped the war effort. Until now, all machines have relied on strict rules-based logic to produce their behavior. This is not the world you are in anymore. The machines we are now building have their behavior emerge from models that understand their environment through data. In other words, humans are not the ones writing computer programs; the machines are. If you focus on rules and strict logic, you will be a poor substitute for what the machine can do on its own. This is the information age, and the skills we need look very different from those that dominated in the past.
As a data scientist you are expected to know how to train computers, not program them or endow them with so-called sophisticated math. Data scientists who have an abundance of “hard skills” but are short on soft skills fail to perform on the job. I’ve seen it many times. The sophistication of our products is achieved by hitting problems from many angles and rapidly iterating on the machine learning workflow in concert with the rest of the product. This is impossible if you are diving deep into math and ignoring the high-level tooling made available by an amazing community of machine learning practitioners. Using machine learning libraries is how you move away from the naive academic approach (upfront design) and towards the more realistic discovery that happens when multiple algorithms are attempted rapidly, where the data leads the problem solving, not someone’s “smart” guess at what technique should be used.
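As a rough illustration of what attempting multiple algorithms rapidly can look like, here is a minimal sketch using scikit-learn. The choice of library, the built-in iris dataset, and the three candidate models are all assumptions made for illustration; none of them come from the original answer, and real enterprise data would need far more preparation.

```python
# Minimal sketch: score several off-the-shelf models with the same
# cross-validation procedure and let the data rank them.
# Assumptions for illustration only: scikit-learn, the iris toy dataset,
# and this particular set of candidate models.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Each candidate is a ready-made estimator; no equations are derived by hand.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for every candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Print the candidates from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Swapping in another algorithm is one more entry in the dictionary, which is what makes this kind of iteration cheap compared with committing to a single technique up front.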
Data science has suffered from the media frenzy around the attention-grabbing accomplishments of larger companies and research departments. While these feats are exciting from an academic standpoint, they have little to do with real-world machine learning, where products are being built. The industry isn’t in need of an academic array of hard skills and outdated engineering best practices. We are in need of those who can take the softer, less naive approach to understanding human requirements: those who can rise above the technical weeds and learn to work with machines, balancing the computer’s ability to write programs with our ability to understand strategy and value.
The world is about to become very different, and the word “smart” is about to be redefined. Softer skills and a capacity to train machines as a behaviorist are the hot skills of today. If you love this field, then stay with it and know that your reliance on open source, prepackaged libraries is the smart way to go.