Data Science
What is data science?
Data science is a discipline that looks for patterns in complex datasets to build models that predict what may happen in the future, explain how systems behave, or both. Data science combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data, frequently for business purposes. To perform data science well, you need a combination of skills and expertise.
Skills and expertise involved in data science
Without math and statistics knowledge, data science models can be misused, and results can be misinterpreted.
Data science lifecycle
A data science lifecycle typically has four phases:
- Manage: A business problem is identified that could be addressed with insights from data or models. These problems are prioritized and scoped so the data science team can begin work. Models that need a refresh or retraining are part of the prioritization process. The team reviews prior work and potential data sources that can be leveraged in the project.
- Develop: Model and analysis development includes identifying and accessing data, preparing it for use, and creating models or analyses to solve the business problem. Data scientists collaborate to create the best model for the problem. It may take hundreds of iterations using different tools to find the best solution.
- Deploy: Validation and testing are necessary before deployment to ensure the model or analysis performs as expected. It is then placed into a system or process for use.
- Monitor: Continuous monitoring of models ensures they perform within expected parameters. If performance decays, they should be refreshed, retrained, or replaced quickly.
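As a rough illustration of the Deploy and Monitor phases, the sketch below compares a model's accuracy at validation time with its accuracy on a recent batch of data and flags it for retraining if performance has decayed. The function names (accuracy, needs_retraining), the 0.05 tolerance, and all of the numbers are assumptions made up for this example; real projects often use other metrics and thresholds.

```python
from statistics import mean


def accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    return mean(1.0 if p == a else 0.0 for p, a in zip(predictions, actuals))


def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model when accuracy drops more than `tolerance` below baseline.

    The tolerance is an illustrative assumption, not a recommended value.
    """
    return (baseline_accuracy - recent_accuracy) > tolerance


# Accuracy measured during validation, before deployment (made-up data).
baseline = accuracy(
    [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],  # model predictions
    [1, 0, 1, 1, 0, 1, 0, 1, 0, 0],  # observed outcomes
)

# Accuracy on a recent batch of production data (made-up data).
recent = accuracy(
    [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1, 0, 0],
)

print(f"baseline={baseline:.2f} recent={recent:.2f}")
print("retrain" if needs_retraining(baseline, recent) else "ok")
```

In practice a check like this would run on a schedule against fresh data, and a flagged model would go back into the Manage phase for prioritization.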
Additional Resources