All About Data Science
All About Data Science, Statisticians, and Machine Learning Engineers
Data science is not a newly coined term; it has evolved over the last 50 years and continues to evolve in the 21st century. There has been a continuous debate about how the skills, roles, and responsibilities of a data scientist differ from those of a statistician or a machine learning engineer working in a similar domain. Being a data scientist is not just about knowing a scripting language such as Python or R; rather, I would consider data science a broader characterization of skills.
Skills | Description |
---|---|
Programming Skills | Knowledge of a scripting language is mandatory for a data scientist. It is good to know a programming language such as Python or R and a database query language such as SQL. |
Statistics | A good understanding of statistics is a huge advantage. The ability to understand and carry out multiple tests before making any decision based on a model's output is very helpful and can act as a check before moving a model to the production server. |
Data Visualization | I consider data visualization the most important skill for a data scientist. While working as an intern at Bosch, I realized that it is equally important to convey your outcomes and findings to your managers. Currently, many companies use Power BI, Tableau, and Alteryx for data visualization and reporting. |
Machine Learning | An understanding of supervised and unsupervised learning, and of techniques such as K-means, SVM, dimensionality reduction, CNNs, and RNNs, helps in choosing a model for a project. |
Being a Data Scientist
The data science domain is known for the set of processes that one needs to follow: data transformation, exploratory data analysis, model selection, model evaluation, and finally visualization. The data scientist's role is critical to the business whenever we need to deliver a reasonable, data-driven answer to management. A data scientist is a kind of supporting link for every machine learning engineer and statistician. For example, a machine learning engineer can deploy a recommendation model to suggest a movie to a user. The data scientist will always be part of that recommendation model, as they are responsible for the exploratory data analysis and for providing the raw data to the machine learning engineer.
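The first four steps listed above can be sketched in a few lines of standard-library Python. This is only a minimal illustration under stated assumptions: the records, the column names `hours` and `rating`, and the two candidate models are all made up for the example, and the visualization step is left out.

```python
import statistics

# Hypothetical raw records (strings, one missing value), made up for illustration.
raw_rows = [
    {"hours": "1.0", "rating": "2.1"},
    {"hours": "2.0", "rating": "3.9"},
    {"hours": "", "rating": "5.0"},
    {"hours": "3.0", "rating": "6.2"},
    {"hours": "4.0", "rating": "7.8"},
    {"hours": "5.0", "rating": "10.1"},
]


def transform(rows):
    """Data transformation: drop incomplete rows and cast strings to floats."""
    return [
        (float(row["hours"]), float(row["rating"]))
        for row in rows
        if row["hours"] and row["rating"]
    ]


def explore(data):
    """Exploratory data analysis: basic summary statistics of the target."""
    ratings = [y for _, y in data]
    return {
        "rows": len(data),
        "mean_rating": statistics.mean(ratings),
        "stdev_rating": statistics.stdev(ratings),
    }


def fit_mean(train):
    """Candidate 1: always predict the mean rating seen in training."""
    mean_y = statistics.mean(y for _, y in train)
    return lambda x: mean_y


def fit_line(train):
    """Candidate 2: least-squares straight line through the training points."""
    mean_x = statistics.mean(x for x, _ in train)
    mean_y = statistics.mean(y for _, y in train)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
        (x - mean_x) ** 2 for x, _ in train
    )
    return lambda x: mean_y + slope * (x - mean_x)


def mean_abs_error(predict, data):
    """Model evaluation: average absolute prediction error."""
    return statistics.mean(abs(predict(x) - y) for x, y in data)


data = transform(raw_rows)
summary = explore(data)
train, holdout = data[:-1], data[-1:]  # keep the last row for evaluation

# Model selection: keep the candidate with the lowest hold-out error.
candidates = {"mean": fit_mean(train), "line": fit_line(train)}
errors = {name: mean_abs_error(model, holdout) for name, model in candidates.items()}
best = min(errors, key=errors.get)
print(summary["rows"], best, round(errors[best], 2))
```

In a real project each function would likely be replaced by a library such as pandas or scikit-learn; the point here is only the order of the steps.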
A prospective data scientist must be familiar with standard scripting languages such as Python and R. Currently, industry prefers people with some work experience in SQL and at least one scripting language, or potentially PhD students.
[Source]
When we compare a data scientist with a statistician, it is fair to say that the data scientist knows more about computer programming. Compared with a software engineer, however, a data scientist knows more about statistics than about coding.
Comparing and Contrasting Data Scientists and Statisticians
A data scientist and a statistician collect data for similar reasons, but their data collection procedures differ. In terms of size, the data collected by a data scientist is massive, and they spend a significant amount of time on data transformation and data wrangling. In contrast, a statistician relies more on smaller data sets.
A data scientist formulates problems based on the accuracy of a model, and if the accuracy is on the lower side, they tune the model's hyperparameters. Data scientists assess this by calculating metrics such as the R² score, the F1 score, and the confusion matrix. Statisticians, however, take a different approach to building and testing their models. A statistician generally starts with a simple model and then uses several tools to refine it.
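To make two of the metrics mentioned above concrete, here is a minimal sketch of a binary confusion matrix and the F1 score computed from it. The labels are invented for illustration, and the function names are my own rather than those of any particular library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} F1={f1:.2f}")
```

Unlike plain accuracy, the F1 score ignores true negatives, which is why it is often preferred when the positive class is rare.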
In general, a data scientist focuses on getting a model with good accuracy and then tunes its hyperparameters to improve that accuracy further, whereas a statistician works on finding the simple model that best fits the data.
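As a rough illustration of what tuning a hyperparameter looks like in practice, the sketch below tries several values of `k` for a tiny k-nearest-neighbours classifier and keeps the one with the best validation accuracy. The choice of k-NN, the data points, and the candidate values are all assumptions made for the example; the same loop applies to any model with a tunable setting.

```python
def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D feature, 0/1 labels)."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


# Hypothetical data; (2.9, 0) plays the role of a noisy, mislabeled point.
train = [
    (0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.3, 1),
    (2.9, 0), (3.0, 1), (3.5, 1), (4.0, 1),
]
validation = [(1.2, 0), (2.1, 0), (2.8, 1), (3.8, 1)]

# Grid search: evaluate each candidate k and keep the best one
# (ties go to the first, i.e. smallest, k).
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With `k = 1` the noisy point misleads the classifier, while larger values of `k` smooth it out, which is the kind of effect tuning is meant to find.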
Personally, I prefer to be called a data scientist, as it gives me an opportunity to combine several concepts from programming as well as statistics.