“Data Science”; it’s a term you’ve probably heard a lot, but what does it actually mean? Demand in this field is at an all-time high, so what about Data Science has so many employers interested, and how can you be a part of it? These are the types of questions that’ll be answered today as we explore one of the most popular and sought-after fields in
Chris Lynch, founder of companies such as ArrowPoint Communications and hack/reduce and former CEO of Vertica Systems, has been quoted as saying, “Big data is at the foundation of all the megatrends that are happening.”
So, why is that? Why are so many companies scrambling to hire data scientists?
Well, let’s start by diving into what it is and why it’s one of the most exciting and innovative industries within the tech world today.
WHAT IS DATA SCIENCE?
Data Science itself is a somewhat controversial term in that its exact definition has changed over time, with some industry professionals calling it simply a ‘buzzword’ with no real meaning.
Author and technology evangelist Robin Bloor is one such professional, stating that ‘Data Science is a poorly chosen term. Data is not a field of science since all sciences involve data. Also, as this is a Johnny-come-lately word, what qualifications are required? Statistics and computer science of course.’
So, what is it then? Dr. Vasant Dhar provides a fairly extensive definition in his article Data Science and Prediction, “The term ‘science’ implies knowledge gained through systematic study. In one definition, it is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions. Data science might therefore imply a focus involving data and, by extension, statistics, or the systematic study of the organization, properties, and analysis of data and its role in inference, including our confidence in the inference.”
Hence, Data Science can be defined as a field that applies scientific methods and processes to extract meaningful insights from data, both structured and unstructured. But that definition seems almost deliberately broad and non-specific, which leads us to the next question: how does one break into this field? Do you need university-level experience in Math, Computer Science, AND Statistics in order to work as a Data Scientist?
HOW TO GET YOUR FOOT IN THE DOOR
First, let’s look at some statistics regarding the field itself. While having a PhD in mathematics or statistics is not necessarily a requirement, half of those working in data science currently have a PhD, so the field can look daunting and intimidating. However, an interesting attribute of Data Science is that it is fairly universal and can be applied to a variety of areas, so while it may be incredibly difficult for someone without postgraduate education to get into a research-focussed position, it is much more feasible that they could get a job at a growing tech company.
But what about someone with no tertiary education? Is it possible to start a career in Data Science without a degree in computer science or statistics? DJ Patil, who was once the Chief Data Scientist of the United States Office of Science and Technology Policy, tweeted in 2016 that “Data science doesn’t care about what you majored in or if you even got a degree. It’s what you do with data that matters.” So the short answer is yes: it is possible, but it’s not easy.
1. START WITH MATH
There are a variety of skills that go into Data Science, from theoretical mathematics to Computer Science and Statistics. So, it’s important to not overwhelm yourself or focus on skills that you won’t really need.
So, what do you need?
Well, a foundational understanding of advanced math is a good place to start. There are a variety of books to help you in this regard, such as Gilbert Strang’s Introduction to Linear Algebra, which can be supplemented by his online course on MIT OpenCourseWare. Other notable examples include Machine Learning and Security by Clarence Chio and David Freeman, and The Nature of Statistical Learning Theory by Vladimir Vapnik.
If you’re already a computer scientist or even have some experience in statistics, this first step may be more of a refresher, but it is crucial that you have this fundamental understanding to allow you to apply a mathematical perspective to your work.
2. LEARN COMPUTER SCIENCE
Once you have the theoretical and mathematical foundation, the next step is to become proficient in computer science. Arguably, the computer science side of Data Science is more important to learn in depth than the math, so make sure you understand it thoroughly. There are many resources to learn from, such as Udemy, Coursera, Udacity, and Codecademy, just to name a few. These sites will help you learn the fundamentals of computer science and a variety of languages, including Python and R, the two most popular languages for Data Science. It would also be wise to supplement this learning with reading; notable examples include the Python Data Science Handbook: Essential Tools for Working with Data by Jake VanderPlas, and R for Data Science by Hadley Wickham and Garrett Grolemund. These resources will give you the knowledge and skills you need to compete with other applicants.
Hiring managers also prioritize candidates who can query and manipulate data effectively, so proficiency in a query language such as SQL is a must-have; it’s very likely that your technical interview will include query-specific questions, so you need to be prepared. Experience with a variety of Data Science libraries, such as TensorFlow, Keras, and pandas, is also very handy.
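As a rough illustration of the kind of query-specific question that might come up, here is a minimal sketch using Python’s built-in sqlite3 module. The orders table, its columns, and its rows are hypothetical and invented purely for this example; a real interview would use its own schema.

```python
import sqlite3

# Hypothetical in-memory table; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# Total spend per customer, keeping only customers whose total exceeds 10,
# sorted from the highest total to the lowest.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

Grouping, filtering on an aggregate with HAVING, and ordering the result are the sort of building blocks worth being comfortable with before a technical interview.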
Knowledge of version control and of how to deploy code is also highly sought-after. Many job postings now require experience with platforms like Docker and Kubernetes to demonstrate the ability to deploy and run applications such as machine learning models.
3. STUDY REAL-WORLD EXAMPLES
Another important area to be aware of is real-world case studies. These are practical examples of how Data Science has been used in businesses, and it is imperative that you understand how the theory behind the science can be applied in practice. Sites like DataFlair provide easy-to-use tools that allow you to view and understand these case studies, which can come in handy when it’s time to show your potential employer that you understand how to incorporate the foundations of Data Science in a production environment. Following Data Science journals and blogs can also help you keep up to date on new breakthroughs in the field; blogs like Data Science Central, SmartData Collective, and No Free Hunch can give you consistent insights into new developments.
4. BROADEN YOUR SKILLSET
Simply having a surface-level understanding is not enough. You will be competing with many other applicants who will have done the same courses and read the same books, so how do you stand out? Well, if you’re already experienced in some aspects of computer science, it can be very helpful to master key aspects of Data Science such as Machine Learning.
Machine Learning is an important part of Data Science that deals with algorithms that self-improve over time, and it is crucial that you can understand and use these algorithms to your advantage. Having a fundamental understanding of decision trees (a form of supervised machine learning), Linear and Logistic Regression, and Deep Learning through Neural Networks can seriously improve your Data Science ability.
In addition to technical skills, it’s also important to showcase your soft skills. Being able to communicate your ideas to a team, solve problems, and use your coding skills not only to complete a task but also to prevent errors and add real value that can increase sales and profits is incredibly attractive to hiring managers. Providing examples of times when you added real monetary value and used your soft skills to improve a situation or project outcome will quickly make you a top candidate.
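To make one of the algorithms mentioned above more concrete, here is a minimal sketch of Logistic Regression trained with batch gradient descent, written in plain Python. The toy data (hours studied versus pass/fail), the function names, and the learning rate and epoch count are all assumptions chosen for illustration; in practice you would more likely reach for an established library such as scikit-learn.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit w and b so that sigmoid(w * x + b) estimates P(y = 1 | x).

    Uses batch gradient descent on the average log loss.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training point under the current model.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(hours, passed)


def predict(x):
    """Classify as 1 when the estimated probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


predictions = [predict(h) for h in hours]
print("learned weight:", w, "learned bias:", b)
print("predictions:", predictions)
```

Being able to explain each line of a small example like this, and why the model "improves" as the loop runs, is the kind of understanding that goes beyond simply calling a library function.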
5. GAIN PRACTICAL EXPERIENCE
To further differentiate yourself from the competition, work on live projects that demonstrate your ability to apply your knowledge to real-world problems. Sites like GitHub and Kaggle allow you to find open-source projects to work on, as well as build an impressive portfolio that you can show to recruiters to grab their attention. Additionally, adding these projects to your resume or even creating your own personal website lets you demonstrate your practical experience in the field to potential employers.
Writing blog posts about your projects can also be a great way to demonstrate that you understand both how your projects work and the theory of Data Science behind them.
Many hiring managers check a candidate’s online presence before making an offer. If they research you and find an intelligent, thoughtfully written blog showcasing your abilities, it will speak on your behalf.
If you have prior experience working in tech (as a software engineer, ETL developer, Business Analyst, etc.), this is a great opportunity to highlight any projects you worked on within a team and in a production environment – demonstrating your ability to work well with others as well as your experience delivering projects.
6. SHOW OFF YOUR SKILLS THE RIGHT WAY
Once you have these skills and experiences under your belt, it simply becomes a matter of presenting them to the recruiter or hiring manager. Make sure to minimize conversation about your education (especially if you don’t have any) and focus on the practical applications of your knowledge.
It is imperative that the interviewer understands that you can work in a production environment — highlighting that you watched a few lectures on linear algebra won’t do that, but showing that you created and deployed a machine learning model using knowledge you gained from that lecture will absolutely get your foot in the door. But don’t limit this to just Data Science experience – provide detailed examples, for each skill on your resume, that demonstrate how you applied that skill in a production environment. Then tie the practical application to a tangible business impact.
Data Science may seem like a daunting field at first, but with enough drive and determination, it is possible to get your foot in the door.
In summary, make sure not to skimp on the fundamentals, even if that means reopening some math textbooks. Ensure that you have up-to-date knowledge of both the Data Science libraries currently in use and the trends in how Data Science is being used in businesses. Additionally, broadening your skillset to include useful machine learning algorithms and soft skills will differentiate you from your competition, giving you an advantage in the interview. Finally, gain some practical experience in the field by contributing to open-source projects and tinkering with some Data Science libraries, and make sure to highlight these experiences in the interview, as they are going to be what sets you apart from your competition and secures you a job.