STEM PhDs as Data Scientists
Updated: May 14, 2019
Nanotech NYC is beginning a new series of blog posts exploring alternative career paths for those with backgrounds in STEM, including those with research or industry experience in nanotechnology. This post kicks off the series. It was written by Frederick Pearsall with help from Shruti Sharma and Jacob Trevino.
If you are anything like me, you are about to graduate from a STEM PhD program, you have heard a lot of hype about data science, you are extremely interested and excited to learn, and you also have little (or no) idea about what a data scientist does (or what one is). A simple Google search (http://bfy.tw/3ahU) leaves much to the imagination and is reminiscent of the first time a professor tried to define nanotechnology for me in class. What got me the most interested in data science is the analytical side and, of course, the programming. That is to say, I understand I may not be able to code like a computer science major, but I can at least think like a PhD and analyze accordingly, using some coding skills in my toolset. In a romantic way, I think of data science as the analysis of all forms of data, not only in the traditional research sense. As an experimentalist, I view data science as a liberating way to learn about the world around us, not so constrained by those pesky natural laws. I also know that the number of available positions for data scientists is growing, as are their salaries (the figure below does not even take into account the position of Senior Data Scientist, with an overall average salary of $136,663/yr):
Now, I could delve into some more statistics and metrics, but just what does a data scientist do? Where do they work? The inside of my head started to look like the 3D projection in the image below. I was confused, and I wanted some clarity. We reached out to two incredible STEM PhDs currently working as data scientists, in hopes of clearing things up. This post is meant to give you, the reader, insight into data science as a career option for those with a STEM background. Both interviewees hold positions in industry and gave some great advice on how to get hands-on experience and break into the field.
Nikita Butakov received his PhD in Electrical Engineering from UC Santa Barbara in 2018. He is a Data Scientist with Ericsson’s Global AI Accelerator in Silicon Valley. Ericsson is a multinational networking and telecommunications company, responsible for managing over a third of the mobile phone infrastructure market. Its AI accelerator identifies data-science and artificial-intelligence use cases within Ericsson and helps accelerate the deployment of production-level solutions.
Bernard Hishamunda received his PhD in Physics from Brandeis University in 2017. He is a Senior Strategy Consultant/Data Scientist at IBM. IBM works with technologies like AI, cloud, blockchain and Internet of Things (IoT) to help their clients transform their industries.
What does a normal day look like for you?
Bernard: It is split evenly: 50% consulting work and 50% writing code (data analytics).
Nikita: A typical day starts off with a brief meeting within our team to discuss our daily goals. The rest of my work day can involve any number of activities, including programming, reading textbooks or papers, interviewing candidates (we’re heavily recruiting right now!) or engineering machine learning models.
What was your career path from your PhD to where you are today, noting any key transition points?
Bernard: I was already familiar with coding and advanced analytics through my academic training. I took big data courses to understand what the field was about. I took online tutorials and worked on data science related projects. I joined business consulting groups for insights into solving business problems while also participating in consulting competitions.
Nikita: Throughout my PhD I worked on side-projects in software engineering and data science, but I reached a key transition point towards the end, when I made the decision to pursue a data science career, rather than continue in academia.
Was this a goal of yours while doing your PhD?
Bernard: My initial goal was to work in big pharma as a liaison between scientists and business leaders, or work for consulting companies (consulting because work changes frequently unlike in academia) on solving business challenges.
Nikita: My PhD goals included publishing papers in high-impact journals, presenting at high-impact conferences, and, perhaps most importantly, figuring out what kind of career I wanted to have post-PhD.
What was the biggest challenge in making the switch to your current role?
Bernard: Relying on other people to get your work done. Also, the lack of academic rigor (when it comes to results) in industry: you deliver solutions faster, but not as thoroughly as academia requires.
Nikita: Breaking into the Bay Area’s highly competitive data science job market was a challenge that took a lot of work to overcome.
What skill(s) do you rely on most on a daily basis?
Bernard: Data analytics, coding, communication, problem solving.
Nikita: The independent research skills and lessons I learned during my PhD.
What could a PhD student who is not in a data science program do during their PhD to better prepare them to have data science be a career option?
Bernard: Take intro courses on data science topics and be familiar with the basics (supervised and unsupervised machine learning). Learn the basic ML approaches (regression, decision trees, clustering) and have some general knowledge of advanced methods such as neural networks and deep learning.
Another great way to prepare is to work on several data science projects. Kaggle is a good place to start, OpenAI has a good repo of datasets you can work with, and there are more out there as well. Follow Analytics Vidhya and KDnuggets for tutorials and articles on ML and AI in general.
Nikita: Take coursework in the Statistics and Computer Science departments. Join a Data Science club at your university. Work on Kaggle competitions and other side-projects. If you don’t have much free time during your PhD, consider joining a PhD-oriented boot-camp program like Insight.
There are lots of options advertised out there for learning the data science fundamentals, like boot camps, online programs (both paid and free), part-time degree programs, etc. Which of these are worth pursuing, and are they valued by a hiring manager?
Bernard: Hiring managers value experience, not so much courses. Work on 3 or more data science projects. Make sure the projects are diverse, to show an array of skills in the domain. Online programs are a good asset, but use them to get the foundations, then practice on any of the datasets found online or come up with a project that interests you where you can apply data science techniques.
Nikita: Many of my colleagues have joined the Insight Data Science Fellowship after earning their PhDs and have great things to say. Insight is one of the most established data science boot camps and has a strong internal network. Since Insight makes money by acting as a hiring agency, it doesn’t charge its fellows any fees. Although I have heard good stories about non-PhDs’ experiences in paid boot camps, I would be wary of joining such a program if you already have a PhD. At this point, you should be able to learn everything you need to know on your own.
If you have any other general comments that you think would be good advice or useful information for someone thinking about moving into a data science career, you can write them here.
Bernard: Affiliation with DS groups and real-life experience, some of which can be acquired through meetup projects, DataKind projects, academic projects, or internships, will be valued more than having taken courses.
Nikita: Network! Send me a connection request on LinkedIn with a short note introducing yourself. Follow me on Quora, where I write about data science, nanotechnology, and graduate school.
In summary, it seems like the best way to become a data scientist is to learn the foundations of machine learning and its approaches, then put them into practice. To me, this is equivalent to research experience, and depending on just how much ML and data science modeling you have already done in your PhD, you may not need as many completed projects. However, if you plan to go into industry, it pays to have projects with a keen eye on product development, especially if you know which companies you’d like to work for. Also, check out the free data boot camps that both Bernard and Nikita mentioned. Two are listed at the end of this post.
My romanticized view of “all possibilities with data” is clearly not business oriented, a point that Bernard emphasizes. This is true no matter what industry you are in. I think that to be fulfilled in a data scientist role, one must enjoy performing quantitative analyses, explaining them, and forming actionable recommendations on just about any challenge involving big data. Oh, and you must also enjoy working with lots and lots and lots of data.
Serious Data Analytics with Palantir
About the author:
Dr. Frederick Pearsall is a recent graduate of the City University of New York, holding a PhD in Chemistry and Materials Science. He is currently a Solutions Architect at Nanotronics Imaging in Brooklyn, NY.