Posts

Showing posts from October, 2025

The Least Squares Assumptions for Multiple Regression

Assumptions of Multiple Regression

These are the conditions under which the OLS estimator is valid and has the statistical properties we rely on (like unbiasedness and consistency).

Assumption 1: Zero Conditional Mean

$E(u \mid X_1 = x_1, \ldots, X_k = x_k) = 0$

Meaning: On average, the omitted factors $u$ are unrelated to the included regressors $X_1, \ldots, X_k$. Put differently: once you control for the regressors, there's no leftover systematic relationship between $u$ and the regressors.

Why it matters: If this fails, your regression suffers from omitted variable bias.

Example: If PctEL (percent English learners) belongs in the model but you leave it out, and it's correlated with STR, then the STR coefficient is biased.

Solution: Include the omitted variable (if you can measure it).

Assumption 2: i.i.d. Sampling

$(X_{1i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$, are i.i.d.

Meaning: Each observation comes from the same population and ...
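The PctEL/STR example can be illustrated with a small simulation. This is only a sketch on made-up data: the coefficients (-1.0 on STR, -0.6 on PctEL), the noise levels, and the helper name `ols` are assumptions chosen for illustration, not values from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data; every number here is an illustrative assumption.
pct_el = rng.uniform(0, 50, size=n)                       # percent English learners
str_ = 18 + 0.08 * pct_el + rng.normal(0, 1.5, size=n)    # student-teacher ratio, correlated with PctEL
u = rng.normal(0, 5, size=n)                              # other factors, independent of both regressors
score = 700 - 1.0 * str_ - 0.6 * pct_el + u               # "true" model used to generate test scores


def ols(y, *regressors):
    """Hypothetical helper: OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


short_fit = ols(score, str_)          # PctEL omitted: it ends up in the error term
long_fit = ols(score, str_, pct_el)   # PctEL included

print("STR coefficient, PctEL omitted: ", short_fit[1])
print("STR coefficient, PctEL included:", long_fit[1])
```

Under these assumptions, the regression that omits PctEL attributes part of the PctEL effect to STR, so its STR coefficient lands well away from -1.0, while the regression that includes PctEL recovers a value close to -1.0.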

What Are the Error Term (u) and Omitted Variable Bias in Regression?

Omitted Variable Bias?

What is "error u"? Imagine you want to predict students' grades (Y) using hours studied (X). But grades are not decided only by study hours. Some students are naturally smarter. Some had better teachers. Some were sick on exam day. All of these extra things (not included in your formula) are captured in the error term (u). So u = "all the other factors we didn't include in the equation."

Why are there always omitted variables? In real life, it's impossible to include every single factor that affects Y. Example: For grades, you can't measure "motivation", "sleep quality", or "stress" perfectly. So, some variables will always be omitted.

When does this become a concern? If the omitted factors are unrelated to X (study hours), it's not a big deal. OLS (our method) still works fine. BUT if the omitted factors are related ...
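For the single-regressor case (grades on hours studied), the standard large-sample result below makes the "unrelated to X" condition concrete. The symbols $\rho_{Xu}$ (correlation between X and u), $\sigma_u$, and $\sigma_X$ (their standard deviations) are introduced here for illustration and do not appear in the post itself.

$$
\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}
$$

If $\rho_{Xu} = 0$, the second term vanishes and the OLS slope converges to the true $\beta_1$. If $\rho_{Xu} \neq 0$, the slope is off by that term no matter how large the sample is.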