10  Chapter 10 - Hypothesis tests and regression coefficients

Frequently, the beta coefficients of a regression model are called constants. In reality, however, beta coefficients are anything but constant: they vary from sample to sample. For example, when we update the sample with one or a few new observations, the estimates of the beta coefficients will change.

According to the Central Limit Theorem, the probability distribution of a linear combination of random variables will tend to be normal with a specific standard deviation, even when the random variables themselves are not normal. Since b_{0} and b_{1} can be expressed as linear combinations of the values of X and Y, we can assume that the probability distribution of these coefficients will be close to a normal distribution.

Then, we can do hypothesis testing to examine how much each coefficient differs from zero (the Null Hypothesis). In this case, the variable of study becomes the specific beta coefficient, such as b_0 or b_1. We then need to estimate the standard deviation of each beta coefficient (its standard error) in order to compute the corresponding t-Statistic for each hypothesis test.

Next I will explain how we can estimate the variability - the variance and standard deviation - of the beta coefficients of a simple regression model.

10.1 Variance and standard error of the regression coefficients

The regression coefficients b_{0} and b_{1} for the simple regression model are also called betas or constants of the regression. These coefficients are estimations of supposed population coefficients \beta_{0} and \beta_{1}. In the context of a time-series variable, the population is supposed to be all possible values of the variable from all past periods to all future periods. We can only get samples from this population when we collect historical values of a time-series variable. As time goes on, we have a different realization of the random variable, or a different sample of the variable with more observations.

Even though regression coefficients are also called the “constants” in a regression model, they usually vary depending on the number of observations or time periods. For example, in the case of the market model, we can estimate the regression coefficients using a specific sample with a limited number of time-periods. Once time passes, we can re-run the same regression model, but now with more updated periods, and we can get new values of regression coefficients.

As we can see, regression coefficients usually change over time. We can estimate the expected variability of these regression coefficients using probability theory. The expected variability can be measured with the expected variance of the coefficient and the expected standard deviation of the coefficient. The standard deviation of a regression coefficient is usually called the standard error.

The expected variance of the regression coefficient b_{1} for the case of simple regression model is:

Var(b_{1})=E[(b_{1}-\beta_{1})^{2}]

Var(b_{1})=\sigma_{\varepsilon}^{2}\left[\frac{1}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}\right]=\frac{MSE}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}

The standard error of b_{1} is its expected standard deviation:

SE(b_{1})=\sqrt{\sigma_{\varepsilon}^{2}\left[\frac{1}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}\right]}

SE(b_{1})=\frac{\sigma_{\varepsilon}}{\sqrt{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}}

The expected variance of the regression coefficient b_{0} for the case of simple regression model is:

Var(b_{0})=E[(b_{0}-\beta_{0})^{2}]

Var(b_{0})=\frac{\sigma_{\varepsilon}^{2}\sum_{i=1}^{N}X_{i}^{2}}{N*\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}

The standard error of b_{0} is its expected standard deviation:

SE(b_{0})=\sigma_{\varepsilon}\sqrt{\frac{\sum_{i=1}^{N}X_{i}^{2}}{N*\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}}

We can see that the standard error of the regression coefficients is proportional to the standard deviation of the regression error (\sigma_{\varepsilon}). If the regression errors are big, then the standard errors of the coefficients will also be big. The bigger the standard error of a regression coefficient, the more the coefficient will vary across different samples. In other words, the bigger the standard error of the coefficients, the less predictable the value of the regression coefficient will be.
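A quick way to convince ourselves that the formula for SE(b_{1}) works is a simulation: fix one sample of X, simulate many samples of Y with known population coefficients, and compare the empirical standard deviation of the b_{1} estimates with the formula. This is only a sketch; the population values (\beta_{0}=2, \beta_{1}=0.5, \sigma_{\varepsilon}=1.5) and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed design: one sample of X, many simulated samples of Y
N = 50
X = rng.uniform(0, 10, N)
beta0, beta1, sigma_eps = 2.0, 0.5, 1.5

# Theoretical standard error of b1 from the formula above
Sxx = np.sum((X - X.mean()) ** 2)
se_b1_theory = sigma_eps / np.sqrt(Sxx)

# Simulate many samples of Y and re-estimate b1 each time
b1_estimates = []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0, sigma_eps, N)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b1_estimates.append(b1)

# Empirical standard deviation of b1 across the simulated samples
se_b1_empirical = np.std(b1_estimates)
print(se_b1_theory, se_b1_empirical)  # the two values should be very close
```

With 20,000 simulated samples, the empirical standard deviation of b_{1} matches the analytical standard error within a fraction of a percent.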

But why do these formulas for the variance and standard deviation of the beta coefficients work? If you are curious and like mathematics, I show the mathematical derivation of these formulas in the next section.

10.1.1 Derivation of the variance and standard error of beta coefficients

10.1.1.1 Variance and standard error of b_1

Let’s start working with the variance of b_{1} coefficient for the simple regression model.

In the last chapter we derived the formula for the optimal value of b_1 according to the OLS method for a simple regression model:

b_{1}=\frac{Cov(X,Y)}{Var(X)}=\frac{\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)\left(Y_{i}-\bar{Y}\right)}{\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)^{2}}

Playing with the numerator:

\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)*\left(Y_{i}-\bar{Y}\right)=\sum_{i=1}^{N}(X_{i}Y_{i}-X_{i}\bar{Y}-\bar{X}Y_{i}+\bar{X}\bar{Y})

Factorizing terms:

\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)*\left(Y_{i}-\bar{Y}\right)=\sum_{i=1}^{N}[(X_{i}-\bar{X})Y_{i}-(X_{i}-\bar{X})\bar{Y}]

Applying the sum operator to both terms:

\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)*\left(Y_{i}-\bar{Y}\right)=\sum_{i=1}^{N}(X_{i}-\bar{X})Y_{i}-\bar{Y}\sum_{i=1}^{N}(X_{i}-\bar{X})

We can see that \sum_{i=1}^{N}(X_{i}-\bar{X})=0:

\sum_{i=1}^{N}(X_{i}-\bar{X})=\sum_{i=1}^{N}X_{i}-\sum_{i=1}^{N}\bar{X}

Both sums are equal to N\bar{X}, so \sum_{i=1}^{N}(X_{i}-\bar{X})=0

Then:

\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)*\left(Y_{i}-\bar{Y}\right)=\sum_{i=1}^{N}(X_{i}-\bar{X})Y_{i}

Then I can simplify b_1 as:

b_{1}=\frac{\sum_{i=1}^{N}(X_{i}-\bar{X})Y_{i}}{\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)^{2}}

Now let’s do a trick. Let’s define a constant k_{i} as the ratio of the difference (X_{i}-\bar{X}) to the sum of squared differences (X_{i}-\bar{X})^{2}:

k_{i}=\frac{(X_{i}-\bar{X})}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}

Then I can express b_1 as a linear combination of the Y_i values weighted by the corresponding k_i:

b_{1}=\sum_{i=1}^{N}k_{i}Y_{i}

Then b_{1} can be defined as a sum of products, and I can estimate the expected value and variance of b_{1}. We expect that E[b_{1}]=\beta_{1}, so that b_{1} is unbiased. Let’s see if this is true:

E[b_{1}]=E[\sum_{i=1}^{N}k_{i}Y_{i}]=\sum_{i=1}^{N}k_{i}E[Y_{i}]

Since E[Y_{i}]=\beta_{0}+\beta_{1}X_{i}, then:

E[b_{1}]=\sum_{i=1}^{N}k_{i}(\beta_{0}+\beta_{1}X_{i})=\beta_{0}\sum_{i=1}^{N}k_{i}+\sum_{i=1}^{N}k_{i}\beta_{1}X_{i}

We can see that \sum_{i=1}^{N}k_{i}=0:

\sum_{i=1}^{N}k_{i}=\sum_{i=1}^{N}\frac{(X_{i}-\bar{X})}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}

\sum_{i=1}^{N}k_{i}=\frac{1}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}\left[\sum_{i=1}^{N}X_{i}-\sum_{i=1}^{N}\bar{X}\right]

Since both \sum X_{i} and \sum\bar{X} are equal to N\bar{X}, the term in brackets is equal to zero:

\sum_{i=1}^{N}k_{i}=\frac{1}{\sum_{i=1}^{N}(X_{i}-\bar{X})^2}[0]=0

I can simplify E[b_1] as:

E[b_{1}]=\sum_{i=1}^{N}k_{i}\beta_{1}X_{i}=\beta_{1}\sum_{i=1}^{N}k_{i}X_{i}

Now we can see that \sum_{i=1}^{N}k_{i}X_{i}=1:

\sum_{i=1}^{N}k_{i}X_{i}=\frac{\sum_{i=1}^{N}(X_{i}-\bar{X})X_{i}}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}=\frac{\sum_{i=1}^{N}(X_{i}^{2}-\bar{X}X_{i})}{\sum_{i=1}^{N}(X_{i}^{2}-2X_{i}\bar{X}+\bar{X}^{2})}=\frac{\sum_{i=1}^{N}X_{i}^{2}-\bar{X}\sum_{i=1}^{N}X_{i}}{\sum_{i=1}^{N}X_{i}^{2}-2\bar{X}\sum_{i=1}^{N}X_{i}+N\bar{X}^{2}}

\sum_{i=1}^{N}k_{i}X_{i}=\frac{\sum_{i=1}^{N}X_{i}^{2}-\bar{X}N\bar{X}}{\sum_{i=1}^{N}X_{i}^{2}-2\bar{X}N\bar{X}+N\bar{X}^{2}}=\frac{\sum_{i=1}^{N}X_{i}^{2}-N\bar{X}^{2}}{\sum_{i=1}^{N}X_{i}^{2}-N\bar{X}^{2}}=1
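We can verify these two properties of the k_{i} weights numerically. The sample of X below is arbitrary; the identities hold for any sample:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(100, 15, 30)  # any sample works; these values are arbitrary

# k_i weights as defined above
k = (X - X.mean()) / np.sum((X - X.mean()) ** 2)

print(np.sum(k))      # ~0 (up to floating-point error)
print(np.sum(k * X))  # ~1
```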

Then:

E[b_{1}]=\beta_{1}

Then we conclude that b_1 is an unbiased estimator of the population coefficient \beta_1.

Now, working with the variance of b_{1}:

Var(b_{1})=Var(\sum_{i=1}^{N}k_{i}Y_{i})

Since each k_i is a constant and the Y_i values are assumed to be independent:

Var(b_{1})=\sum_{i=1}^{N}k_{i}^{2}Var(Y_{i})=Var(Y_{i})\sum_{i=1}^{N}k_{i}^{2}

We have Var(Y_{i})=Var(\varepsilon_{i}) since the only random part of Y_{i} is the error term; this variance is estimated with the Mean Squared Error (MSE). Also, we can see that \sum_{i=1}^{N}k_{i}^{2}=\frac{1}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}:

\sum_{i=1}^{N}k_{i}^{2}=\sum_{i=1}^{N}\left[\frac{(X_{i}-\bar{X})}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}\right]^{2}=\sum_{i=1}^{N}\left[\frac{(X_{i}-\bar{X})^{2}}{\left[\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}\right]^{2}}\right]

We can consider the denominator as a constant with respect to the first sum:

\sum_{i=1}^{N}k_{i}^{2}=\frac{1}{\left[\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}\right]^{2}}\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}=\frac{1}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}

Then, the variance of b_{1} is:

Var(b_{1})=\sigma_{\varepsilon}^{2}\left[\frac{1}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}\right]=\frac{MSE}{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}

We can calculate the standard error of b_{1} just by taking the square root of Var(b_{1}):

SE(b_{1})=\frac{\sigma_{\varepsilon}}{\sqrt{\sum_{i=1}^{N}(X_{i}-\bar{X})^{2}}}

With the standard error of b_{1} we can then estimate the 95% confidence interval for b_{1} in order to test whether b_{1} is statistically different from zero. In other words, we can test whether the independent variable X is significantly related to Y.

10.1.1.2 Variance and standard error of b_0

Pending…

10.2 Confidence interval for the regression coefficients

As we can see, the values of the regression coefficients b_{0} and b_{1} can change for different random samples. An interesting question is what the probability distribution of these regression coefficients is.

According to the Central Limit Theorem, the probability distribution of a linear combination of random variables will tend to be normal with a specific standard deviation. Since b_{0} and b_{1} can be expressed as linear combinations of the values of X and Y, we can assume that the probability distribution of these coefficients will be close to a normal distribution.

In reality, most of the time we cannot get access to the population variance of a random variable. Statisticians have found that it is possible to use the sample variance in place of the population variance, but then it is better to model the close-to-normal distribution as a t-Student probability distribution. When the sample size is less than 30, the probability distribution will be close to normal but with “fat” tails. Fat tails means that extreme values of the random variable of study are more likely. The t-Student distribution does a better job modelling this “fat”-tail effect for small samples.

Actually, as the sample size increases above 30, the t-Student probability distribution approximates to the normal distribution. Then, we can model close-to-normal probability distributions using the t-Student distribution. This is the reason why all computerized statistical software applications use the t-Student distribution to report the variance and standard errors of any regression coefficient.

In the previous section we estimated the expected standard error (standard deviation) of both b_{0} and b_{1}. Since we know that the coefficients b_{0} and b_{1} follow a t-Student probability distribution, then we can estimate a 95% confidence interval for these coefficients.

When the sample size is much bigger than 30, we can say that 95% of the time (for different samples), the coefficient b_{1} will lie between b_{1}-1.96*SE(b_{1}) and b_{1}+1.96*SE(b_{1}). This is true since b_{1} follows a close-to-normal distribution. If X is normally distributed, then about 95% of the X values will lie between \overline{X}-1.96*\sigma_X and \overline{X}+1.96*\sigma_X. In other words, if we subtract about 2 standard deviations from the mean of b_{1} and add about 2 standard deviations to the mean of b_{1}, we estimate the 95% confidence interval of b_{1}. The distance from the mean to the minimum and maximum values of the 95% confidence interval is also called the critical value for the 95% confidence level. We can do the same process to estimate the 95% confidence interval for the b_{0} coefficient using its expected value and its standard error.

Being more accurate, we can model the probability distributions of the regression coefficients as t-Student distributions. Then, for the calculation of the 95% confidence interval, we need to replace the critical value 1.96 with a specific value according to the degrees of freedom, which is equal to (N-2) for the case of the simple regression model. Then, instead of subtracting and adding 1.96 standard errors of the coefficient, we need to add and subtract a critical value (which is close to 2) that depends on the t-Student distribution function and the sample size (degrees of freedom).

We can call this critical value with N-2 degrees of freedom t_{(\alpha/2,n-2)}, where \alpha=1-0.95=0.05 and N is the number of observations. We can say that \alpha is the probability of the regression coefficient having a value outside the 95% confidence interval.

Unfortunately, we cannot easily calculate this critical t-value since there is no analytical formula. However, we can use the t-Student table (which is usually published in any Statistics book) or we can use any statistical software to calculate the critical t-value for the 95% confidence interval.
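For example, in Python we can use scipy (one of many statistical packages that can do this) to compute the critical t-value for the 95% confidence level for different sample sizes:

```python
from scipy import stats

# Critical t-value for a 95% confidence interval (alpha = 0.05, two-sided)
alpha = 0.05
for N in (10, 30, 1000):
    df = N - 2  # simple regression: N - 2 degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    print(N, round(t_crit, 3))
```

Note how the critical value shrinks toward the normal-distribution value of 1.96 as the sample size grows.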

For the case of b_{1}, the 95% probability statement for the possible values of the coefficient can be expressed as:

Prob\left\{ b_{1}-t_{(\alpha/2,n-2)}*SE(b_{1})<=\beta_{1}<=b_{1}+t_{(\alpha/2,n-2)}*SE(b_{1})\right\} =0.95

Then, the 95% confidence interval is:

\left[b_{1}-t_{(\alpha/2,n-2)}*SE(b_{1})...b_{1}+t_{(\alpha/2,n-2)}*SE(b_{1})\right]

95% of the time, the population coefficient will lie between b_{1}-t_{(\alpha/2,n-2)}*SE(b_{1}) and b_{1}+t_{(\alpha/2,n-2)}*SE(b_{1}). We can construct the analogous 95% confidence interval for b_{0} using its own standard error.
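A minimal sketch of this calculation in Python, using simulated data with arbitrary coefficients (a true slope of 0.8 and error standard deviation of 2 are just illustrative choices), could look like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical sample; the true coefficients here are chosen arbitrarily
N = 40
X = rng.uniform(0, 10, N)
Y = 1.0 + 0.8 * X + rng.normal(0, 2.0, N)

# OLS estimates
Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()

# Standard error of b1 using MSE = SSE / (N - 2)
residuals = Y - (b0 + b1 * X)
mse = np.sum(residuals ** 2) / (N - 2)
se_b1 = np.sqrt(mse / Sxx)

# 95% confidence interval with the critical t-value
t_crit = stats.t.ppf(0.975, N - 2)
ci_low, ci_high = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(f"b1 = {b1:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

If zero falls outside this interval, we have evidence at the 95% confidence level that X is linearly related to Y.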

Once we have an estimation of the 95% confidence interval of the regression coefficients, we can design several hypothesis tests to examine whether the regression coefficients are equal to or different from zero, or from any other number. In the next section we review how to do hypothesis tests with the regression coefficients and what the purpose of these tests is.

10.3 Hypothesis testing with regression coefficients

We have reviewed how to estimate the regression coefficients b_{0}, b_{1}, and their respective standard errors and 95% confidence intervals. To further understand regression models we need to interpret not only the regression coefficients, but also their standard errors and 95% confidence intervals.

As we mentioned before, the regression coefficient b_{0} is the expected value of the dependent variable when the independent variable is zero. In other words, b_{0} is the point where the regression line touches the Y axis. The regression coefficient b_{1} represents the slope of the regression line, which is the sensitivity of movements of the dependent variable Y with respect to one unit of change in the independent variable X. Although we can interpret these coefficients accordingly, it is very important to examine whether these coefficients are significantly different from zero or from another number. We can use hypothesis testing with the regression coefficients to test whether a coefficient is equal to or different from a specific value.

For the regression coefficient b_{1}, the most common test is to examine whether b_{1} is equal to or different from zero. Why is this test important? If the b_{1} coefficient is not significantly different from zero, this means that the dependent variable Y is not linearly related to the independent variable X, since the slope is basically zero (a horizontal regression line). If we can provide evidence that 95% of the time b_{1} is greater than zero, then we can say that the dependent variable Y is significantly related to X in a positive linear relationship. In the case of the market model, another important test for b_{1} is to examine whether b_{1} is equal to or different from one. Why is this test important? If b_{1} is significantly greater than one, this means that the stock return is significantly riskier than the market, since b_{1} measures the market risk of the stock.

For the regression coefficient b_{0}, the most common test is to examine whether b_{0} is equal to or different from zero. Why is this test important? If b_{0} is not significantly different from zero, this means that the expected value of the dependent variable Y when X=0 is also zero, so the expected value of Y for a specific value of X will depend only on the effect of the regression coefficient b_{1}, in case b_{1} is significantly different from zero. In the case of the market model, if b_{0} is significantly greater than zero, this means that the stock is systematically offering extra returns over the market; when the market return is zero, the expected return of the stock will be equal to b_{0}.

The next question is how can we perform a hypothesis test for these regression coefficients. To run a hypothesis test we follow the basic rules of hypothesis testing, but in this case, the variable of interest will be the regression coefficient b_{0} or b_{1}, not the mean of a random variable.

When a regression model is run in most software and programs, one hypothesis test is performed for each regression coefficient. As a default in most software, the Null Hypothesis is always that the regression coefficient is equal to zero. What does it mean for a regression coefficient to be equal to zero?

If the intercept b_0 is equal to zero, this means that the regression line crosses the origin (0,0). In other words, the expected value of Y is zero when the value of X is zero.

For the regression coefficient of each independent variable X, if b_i=0 this means that there is no linear relationship between X_i and Y. The Null Hypothesis for these beta coefficients means that there is NO linear relationship between the corresponding X_i and the dependent variable Y.

For b_0, the hypothesis test is the following:

H0: b_0 = 0

Ha: b_0\neq0

In this hypothesis, the variable of analysis is beta0 (b_0).

Following the hypothesis test method, we calculate the corresponding t-value of this hypothesis as follows:

t=\frac{(b_{0}-0)}{SE(b_{0})}

SE(b_0) is the standard error of b_0, which is its estimated standard deviation.

Remember that the t-Statistic is the standardized distance from b_0 (the value estimated from the regression) to zero. In other words, the t-Statistic tells us how many standard deviations of b_0 the estimated value of b_0 is away from zero, which is the hypothetical true value.

Remember that the null hypothesis (H0) is the hypothesis of the skeptical person who believes that b_0 is equal to zero. Then, we start assuming that H0 is true. If we show that there is very little probability (its p-value) that b_0=0, then we will have statistical evidence to reject H0 and support our Ha.

Then, if \mid t\mid>2, we will have statistical evidence at least at the 95% confidence level to reject the null hypothesis. The critical value of 2 for t is an approximation; depending on the number of observations of the regression, this critical value usually moves between around 1.96 and 2.1.

From a t-Statistic and the number of observations, we can estimate the exact p-value. This value cannot be calculated with a simple formula since there is no closed-form solution for the t cumulative distribution function. However, remember that the t-Student probability distribution becomes very similar to the normal probability distribution when the number of observations is equal to or greater than 30. Then, if the t-Statistic is about 2, recalling the shape of the probability density function of a normal distribution, the area under the curve (which is the probability) beyond t=2 and below t=-2 will be around 0.05 (5%). This is the 2-sided p-value of the test.

Remember that the p-value is the probability of making a mistake if we reject the null hypothesis. Then, the smaller the p-value, the better. The rule of thumb is that if the p-value<0.05, then we have statistical evidence at least at the 95% confidence level to reject the null.

Fortunately, the standard error, the t-Statistic, and the 2-sided p-value for this hypothesis test are automatically calculated and shown when we run a regression model.
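To see where those reported numbers come from, we can reproduce the whole calculation by hand for a simulated data set. The true coefficients and sample size below are arbitrary illustrative choices, and the standard errors come from the formulas of section 10.1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data set; coefficient values chosen only for illustration
N = 60
X = rng.normal(5, 2, N)
Y = 3.0 + 1.5 * X + rng.normal(0, 1.0, N)

# OLS coefficients
Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()

# Standard errors from the formulas of section 10.1
mse = np.sum((Y - b0 - b1 * X) ** 2) / (N - 2)
se_b1 = np.sqrt(mse / Sxx)
se_b0 = np.sqrt(mse * np.sum(X ** 2) / (N * Sxx))

# t-Statistics and 2-sided p-values for H0: coefficient = 0
df = N - 2
t_b0, t_b1 = b0 / se_b0, b1 / se_b1
p_b0 = 2 * (1 - stats.t.cdf(abs(t_b0), df))
p_b1 = 2 * (1 - stats.t.cdf(abs(t_b1), df))
print(f"b0 = {b0:.3f}  SE = {se_b0:.3f}  t = {t_b0:.2f}  p = {p_b0:.4f}")
print(f"b1 = {b1:.3f}  SE = {se_b1:.3f}  t = {t_b1:.2f}  p = {p_b1:.4f}")
```

These values match the coefficient table that statistical software prints for the same data.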

Then, in conclusion, if the p-value estimated for b_0 is less than 0.05 and b_0>0, we can say that there is statistical evidence at the 95% confidence level that b_0 is greater than zero.

Let’s review the hypothesis test for b_{1} for the case of simple linear regression (only 1 X):

Null Hypothesis (H0): b_{1}=0 ; There is NO linear relationship between X and Y

Alternative Hypothesis (Ha): b_{1}\neq0 ; There is a linear relationship between X and Y

The logic of a hypothesis test is the following. We usually state our hypothesis as the alternative hypothesis, while the null hypothesis is the one we want to “disprove” or find evidence against. In this case, we want to show that Y is linearly related to X. In other words, we want to show that the slope of the regression line is different from zero. Then, we start assuming that the Null Hypothesis is the true hypothesis, and the purpose of the test is to provide enough statistical evidence to say that this hypothetical “true” value of the coefficient is actually not true.

The first step is to calculate the b_{1} coefficient using one random sample, following the analytical formula we showed in a previous section. We then calculate its corresponding standard error SE(b_{1}) and calculate a t-Statistic, which is the standardized distance from the hypothetical value of b_{1} (zero) to the actual estimated value of b_{1}:

tStatistic = \frac{b_1-0}{SE(b_1)}

SE(b_1) is the standard error of b_1, which is its estimated standard deviation.

The t-Statistic tells us how many standard deviations of b_1 the estimated value of b_1 is away from zero (the hypothetical value).

The next step is to examine whether this standardized distance is far enough to say that we have statistical evidence to reject the null hypothesis and provide strong evidence for our hypothesis, the alternative one.

We do this by comparing the t-Statistic with the corresponding t-critical value according to the confidence level we need. It is very common to use the 95% confidence level, so we calculate the corresponding t-critical value according to the number of observations of the sample (as explained above).

Considering (1-\alpha) as the confidence level, then the t-critical value with N-2 degrees of freedom is t_{(\alpha/2,n-2)}, which can be calculated using any statistical software or the t-Student distribution table.

The easy rule of thumb for the final decision is:

If |tStatistic|>t_{(\alpha/2,n-2)}, then we have statistical evidence at the (1-\alpha) confidence level to reject the Null Hypothesis. In other words, we have statistical evidence to consider that there is a linear relationship between X and Y, represented in the sign and magnitude of the estimated beta coefficient b_1.

The default value of \alpha is 5% (0.05) so the default confidence level is 95%. But we can adjust these values according to the context.

If |tStatistic|<=t_{(\alpha/2,n-2)}, then we do not have statistical evidence at the (1-\alpha) confidence level to reject the Null Hypothesis.

I take the absolute value of the tStatistic since it can be negative if the beta coefficient is less than zero.

In reality, we have to apply our judgment to finally decide whether we have statistical evidence to support the linear relationship. We need to have knowledge and experience in the field of analysis, not only follow this simple rule of thumb!
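As an illustration only, the decision rule above can be written as a small helper function. The name reject_null is my own choice for this sketch, not a standard API:

```python
from scipy import stats

def reject_null(t_statistic, N, alpha=0.05):
    """Rule-of-thumb decision for H0: coefficient = 0 in a simple regression.

    Rejects H0 when |t| exceeds the critical t-value with N - 2 degrees
    of freedom. This only automates the rule of thumb; judgment and
    knowledge of the field of analysis still matter.
    """
    t_crit = stats.t.ppf(1 - alpha / 2, N - 2)
    return bool(abs(t_statistic) > t_crit)

print(reject_null(3.1, N=50))  # |t| clearly above the critical value (~2)
print(reject_null(1.2, N=50))  # |t| below the critical value
```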

10.4 Interpretation of beta coefficients of the simple linear regression

In a simple regression model we have the independent variable (X or IV) and the dependent variable (Y or DV). We assume that we are interested in learning about the DV, and how it changes with changes in the other variable, the IV.

Recalling the regression equation as the expected value of Y_i:

E[Y_i] = \beta_0+\beta_1X_i

The beta coefficients \beta_0 and \beta_1 are the population coefficients. When we estimate the beta coefficients with the OLS method using a sample from the population, we change the notation from Greek betas to lower-case b’s:

E[Y_i]=b_0+b_1X_i

We can also express the values of Y_i by adding the error term:

Y_i=b_0+b_1X_i+\varepsilon_i

where \varepsilon is a random error that behaves close to a normally distributed variable with mean 0 and a specific standard deviation.

In the simple regression model, we can provide a general interpretation of the beta coefficients as follows:

  • b_1 is a measure of linear relationship between Y and X; if b_1>0, then, on average the linear relationship will be positive; if b_1<0, on average the linear relationship will be negative.

  • b_1 is also a measure of the sensitivity of Y to a change of +1 unit in X. Then, b_1 is how much, on average, Y moves if X moves +1 unit. This is the reason why b_1 represents the slope of the regression line.

  • b_0 is the expected value of Y when X=0. If b_0=0, then the regression line will pass through the origin (X=0, Y=0). b_0 is called the intercept since it is the point on the Y axis where the regression line crosses. b_0 defines how high or low the regression line sits in the XY space.
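A small simulation can make this interpretation concrete. Here the true intercept and slope are set to 2 and 3 (arbitrary choices), so the estimated b_0 and b_1 should recover approximately those values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate data where the true intercept is 2 and the true slope is 3
X = rng.uniform(0, 5, 200)
Y = 2.0 + 3.0 * X + rng.normal(0, 0.5, 200)

# OLS estimates of the two coefficients
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

# b0: expected Y when X = 0; b1: expected change in Y per +1 unit of X
print(f"intercept b0 = {b0:.2f} (expected Y when X = 0)")
print(f"slope b1 = {b1:.2f} (expected change in Y per +1 unit of X)")
```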

Besides this general interpretation of the beta coefficients, it is important to add a more specific interpretation according to the specific model. For example, if the Y variable is the stock return of a company, and the X variable is the market return, b_0 and b_1 have very important interpretations related to excess return of the stock over the market, and market risk of the stock.

In the next chapter I illustrate the general and specific interpretations of the coefficients with an example.