Statistical learning theory is a framework for machine learning that draws on the fields of statistics and functional analysis. It deals with the problem of finding a predictive function based on data, and it has been successfully applied in fields such as computer vision, bioinformatics, speech recognition and self-driving cars.
Anyone looking to work with machine learning tools such as TensorFlow or Apache Spark needs a good understanding of the statistical theory behind them. Statistical learning is a fundamental ingredient for anyone who wants to get into the field of data science.
Examples of Statistical Learning –
1) Predicting whether someone will have a heart attack on the basis of demographic, diet & clinical measurements
2) Spam detection system
3) Identifying handwritten numbers
In a series of articles, we will explore various models and techniques that aid data scientists. Read on for the first part below.
1 – LINEAR REGRESSION
Linear regression is the most basic and widely used type of regression, and it is commonly used for predictive analysis. It allows us to study the relationship between a dependent variable and one or more independent variables; the dependent variable is continuous, while the independent variable or variables can be continuous or discrete. The model assumes that the dependent and independent variables are linearly related to each other.
A simple linear regression line, with one independent variable, has the equation Y = a + bX
Here Y is the dependent variable and X is the independent variable.
b is the slope of the line, also called the regression coefficient, and a is the constant or intercept, that is, the value of Y when X is zero.
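To make the roles of a and b concrete, here is a minimal sketch of fitting Y = a + bX by ordinary least squares in plain Python. The function name fit_line and the data points are made up for illustration; in practice a statistics library would normally do this step.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b: how X and Y vary together, divided by how much X varies
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept a: the fitted value of Y when X is zero
    a = mean_y - b * mean_x
    return a, b


# Made-up illustrative data that lies exactly on Y = 1 + 2X
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X")  # prints "Y = 1.00 + 2.00X"
```

Because the sample points sit exactly on a line, the fit recovers the intercept 1 and slope 2; with real, noisy data the fitted values would only approximate the underlying relationship.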
The overall idea of the regression is to determine two things –
1) Is a set of predictor or independent variables doing a good job in predicting an outcome variable?
2) Which variables in particular are significant predictors of the dependent variable?
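Both questions can be approached numerically. The sketch below is one common way to do it for a single predictor, not the only one: R-squared summarises how well the line predicts the outcome, and the t-statistic of the slope indicates whether the predictor matters. The function name assess_fit and the data are assumptions made up for illustration.

```python
import math


def assess_fit(xs, ys):
    """Return (r_squared, t_stat) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Question 1: share of the variation in Y explained by the line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot

    # Question 2: slope divided by its standard error (n - 2 degrees of freedom)
    se_b = math.sqrt(ss_res / (n - 2) / sxx)
    t_stat = b / se_b
    return r_squared, t_stat


# Made-up, slightly noisy data for illustration
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

r_squared, t_stat = assess_fit(xs, ys)
print(f"R-squared = {r_squared:.3f}, t-statistic for slope = {t_stat:.1f}")
```

An R-squared close to 1 suggests the predictor explains most of the variation in the outcome. The t-statistic is compared against a t distribution with n - 2 degrees of freedom; the larger its absolute value, the stronger the evidence that the slope is not zero.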
Linear regression has wide business usage across different sectors and domains –
1) It can be used to understand consumer behaviour, the factors that influence a business's profitability, and the impact of independent variables like price on dependent variables like sales.
a) The Walt Disney Company and SAS Institute used linear regression to increase sales. In Disney’s initial findings, they learned that a decrease in price increased sales more than an increase in television advertising did. In addition, television advertising increased sales more than online advertising.
2) These days companies no longer communicate with consumers through a single channel; instead they run parallel campaigns on TV, in newspapers and on social media. When running these campaigns, marketers are often challenged to divide the budget effectively among the different channels. With the business analysis tools available today, companies can create models that show the effect each communication medium has on sales. This modeling is referred to as Marketing Mix Modeling (MMM). In its simplest form, MMM is a linear regression.
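As a rough sketch of MMM in its simplest form, the code below regresses sales on spend in three channels by solving the least-squares normal equations in plain Python. Everything here is hypothetical: the helper names solve and fit_mmm, the spend figures, and the "true" coefficients used to generate the sales numbers. A real MMM would use actual campaign data and usually a statistics library.

```python
def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_mmm(spend_rows, sales):
    """Least-squares fit of sales = a + b1*channel1 + b2*channel2 + ..."""
    X = [[1.0] + list(row) for row in spend_rows]  # leading 1 is the intercept
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * s for r, s in zip(X, sales)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical weekly spend per channel: (TV, newspaper, social media)
spend = [(10, 5, 2), (12, 4, 3), (8, 6, 5), (15, 3, 4), (11, 7, 1), (9, 5, 6)]
# Made-up sales generated from 20 + 3*TV + 1*newspaper + 2*social
sales = [20 + 3 * tv + 1 * news + 2 * social for tv, news, social in spend]

intercept, b_tv, b_news, b_social = fit_mmm(spend, sales)
print(f"TV: {b_tv:.2f}, newspaper: {b_news:.2f}, social: {b_social:.2f}")
```

Each fitted coefficient estimates how much sales change per extra unit of spend in that channel while the other channels are held fixed, which is the information a marketer needs when dividing a budget.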
3) In another example, eHarmony uses historical data to build a linear regression that matches compatible users based on personality traits. Approximately ninety eHarmony users marry each day, equating to roughly 33,000 eHarmony marriages a year, as a result of utilizing linear regression.
Though linear regression is widely used, it has a major limitation: it works only when the dependent variable is continuous.
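A small sketch shows why this matters. Here a straight line is fitted to a yes/no outcome coded as 0 and 1, in the spirit of the heart attack example from the start of the article. The ages, outcomes and the helper fit_line are all made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Made-up data: age, and whether an event occurred (1) or not (0)
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
had_event = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

a, b = fit_line(ages, had_event)


def predict(age):
    """Fitted value of the straight line at the given age."""
    return a + b * age


# The line keeps going below 0 and above 1, which makes no sense
# for an outcome that can only be 0 or 1.
for age in (20, 80):
    print(f"age {age}: predicted value {predict(age):.2f}")
```

The fitted line returns a value below 0 for a 20-year-old and above 1 for an 80-year-old, so its output cannot be read as a probability or a class. This is the kind of case where a straight-line model breaks down.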
Stay tuned as we explore other models and techniques and take you deeper into the fascinating world of statistics.