Introduction
There is no formal definition of Data Science. However, it can be seen as the intersection of statistics, mathematics, computer science, and any knowledge domain that demands the analysis of large volumes of data, such as Banking, Finance, Economics, Marketing, Demographics, Genetics, and Astronomy. When the domain is a business discipline, we can call it Business Analytics, or simply Data Science for Business.
Data Science includes not only data management (data collection, data merging, and data preparation) but also data analysis tailored to understanding a problem in a specific domain, making predictions, and ultimately improving decision making in organizations. For data analysis we can use techniques based on a) statistical modeling and b) machine learning. In fact, machine learning is itself a combination of statistical, mathematical, and artificial intelligence methods.
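As an illustration of the data management step, the following minimal Python sketch merges two small data sources and then prepares the result by dropping incomplete records. The data, the field names (customer_id, segment, amount), and the helper functions merge_sources and prepare are hypothetical; in practice a library such as pandas is commonly used for the same steps.

```python
# Hypothetical internal source: customer attributes keyed by customer_id.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "corporate"},
    {"customer_id": 3, "segment": None},  # missing value to be cleaned later
]

# Hypothetical second source: individual transaction amounts.
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 500.0},
    {"customer_id": 3, "amount": 60.0},
    {"customer_id": 4, "amount": 30.0},  # no matching customer record
]


def merge_sources(customers, transactions):
    """Data merging: inner-join transactions to customers on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    merged = []
    for t in transactions:
        c = by_id.get(t["customer_id"])
        if c is not None:  # keep only rows present in both sources
            merged.append({**c, **t})
    return merged


def prepare(rows):
    """Data preparation: drop any row that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


clean = prepare(merge_sources(customers, transactions))
for row in clean:
    print(row)
```

Customer 4 is dropped by the merge (no matching record) and customer 3 by the preparation step (missing segment), leaving three clean rows.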
Focusing on the business domain, we can classify business analytics into the following four types:
- Descriptive Analytics
- Diagnostic Analytics
- Predictive Analytics
- Prescriptive Analytics
Descriptive analytics refers to the analysis of historical business or economic data to better understand the current business situation. Diagnostic analytics refers to the ability to find business insights, mainly business opportunities and threats, by analyzing present and past business data. Predictive analytics refers to the ability to create possible future scenarios based on historical data and on assumptions about organizational variables. Prescriptive analytics refers to the ability to identify and propose a specific strategic business plan based on the preceding descriptive, diagnostic, and predictive analyses.
For descriptive analytics, the main techniques are data management, statistical analysis, and data visualization. For predictive analytics, the main techniques are statistical modeling and machine learning.
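For descriptive analytics, a typical first step of statistical analysis is to summarize historical data. The minimal Python sketch below uses only the standard statistics module; the monthly sales figures are invented for illustration, and the chosen summary measures (mean, median, sample standard deviation, range, and average month-over-month growth) are an assumption rather than a prescribed set.

```python
import statistics

# Hypothetical historical monthly sales (thousands of USD), illustrative only.
monthly_sales = [210, 225, 198, 240, 260, 255, 270, 300, 285, 310, 295, 330]

# Basic descriptive statistics of the historical series.
summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

# Month-over-month growth rates describe the recent trend.
growth = [
    (curr - prev) / prev
    for prev, curr in zip(monthly_sales, monthly_sales[1:])
]

for name, value in summary.items():
    print(f"{name:>6}: {value:8.2f}")
print(f"average monthly growth: {statistics.mean(growth):.2%}")
```

With these figures the median is 265 and sales range from 198 to 330; a chart of the same series would be the data visualization counterpart of this summary.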
Machine learning is a very dynamic, fast-changing field. It refers to algorithms that learn automatically from facts (data), identify patterns, and then provide a solution to a problem or produce forecasts of variables.
Machine learning combines artificial intelligence techniques with statistical and mathematical methods. Its main purpose is to process data and build models that a) uncover patterns and b) predict variables. In practice, then, machine learning is a set of algorithms that receive data, select a model, train (calibrate) the model with the data, and then execute the model to produce insights and/or predictions.
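The receive, select, train, and execute cycle described above can be sketched in a few lines of Python. In this minimal sketch, assuming a simple linear regression as the selected model, training means estimating the intercept and slope by ordinary least squares. The LinearModel class and the advertising and sales numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LinearModel:
    """Selected model: y = intercept + slope * x (simple linear regression)."""
    intercept: float = 0.0
    slope: float = 0.0

    def fit(self, xs, ys):
        """Train (calibrate): estimate coefficients by ordinary least squares."""
        x_bar, y_bar = mean(xs), mean(ys)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx = sum((x - x_bar) ** 2 for x in xs)  # must be non-zero
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, x):
        """Execute: apply the calibrated model to a new input."""
        return self.intercept + self.slope * x


# 1) Receive data (hypothetical advertising spend vs. sales).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

# 2) Select a model, 3) train it, 4) execute it on a new value.
model = LinearModel().fit(ad_spend, sales)
print(f"sales = {model.intercept:.2f} + {model.slope:.2f} * ad_spend")
print(f"forecast for ad_spend = 6: {model.predict(6.0):.2f}")
```

With the data above, the fitted line is approximately sales = 1.09 + 1.97 * ad_spend, giving a forecast of about 12.91 for ad_spend = 6.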
The machine learning techniques we will focus on in this course will be mostly statistical techniques applied to Marketing and Finance.
Successful data-driven companies apply Data Science to stay competitive in the market. These organizations combine a variety of internal and external data sources in an analytics engine and translate the data into quantifiable, actionable insights that support effective decisions.
In my view, the building blocks of Machine Learning are the following:
- Mathematics:
  - Probability theory
  - Statistical inference
  - Linear (matrix) algebra
    - Vectors and matrices
    - A matrix as a linear transformation of space
    - Solving a system of linear equations using matrix algebra
  - Calculus
    - Differential calculus
    - Partial derivatives
    - Integral calculus for probability distributions
  - Optimization theory: methods to estimate model coefficients
- Statistics:
  - Modeling:
    - Linear regression models
    - Time-series regression models
    - Classification
    - Generalized linear models
    - Regularization methods
- Computer Science:
  - Data structures
  - Algorithms
  - Complexity theory
  - Software engineering
  - Parallel and distributed computing
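Two of the mathematical building blocks listed above, solving a system of linear equations with matrix algebra and using optimization (via partial derivatives) to estimate model coefficients, can be compared on the same small problem. The Python sketch below assumes a simple linear regression with a mean-squared-error loss. The normal equations are solved with Cramer's rule only because the system is 2x2, and plain batch gradient descent stands in for the many optimizers used in practice. The data, learning rate, and iteration count are illustrative choices.

```python
def normal_equations(xs, ys):
    """Solve (X^T X) b = X^T y for y = b0 + b1*x with Cramer's rule (2x2 system)."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


def gradient_descent(xs, ys, learning_rate=0.05, n_iters=5000):
    """Estimate b0, b1 by iteratively minimizing mean squared error."""
    n = len(xs)
    b0, b1 = 0.0, 0.0  # initial guess
    for _ in range(n_iters):
        # Residuals of the current model.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error**2).
        grad_b0 = (2 / n) * sum(errors)
        grad_b1 = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        # Take a step against the gradient.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated exactly by y = 1 + 2x

exact = normal_equations(xs, ys)
approx = gradient_descent(xs, ys)
print("normal equations:", exact)
print("gradient descent: (%.6f, %.6f)" % approx)
```

Both approaches recover b0 = 1 and b1 = 2 here because the data lie exactly on y = 1 + 2x. Iterative optimization matters more when no closed-form solution exists, as with logistic regression or lasso regularization.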