Regression: Using Functions to Model Data

Given a collection of numerical data \(\bigl\{(x_i, y_i)\bigr\}\) we can use least-squares regression to determine a function of whatever form (template) we choose that “best fits” the data, in the sense of minimizing the sum of squared differences \(\sum_i \bigl(y_i - f(x_i)\bigr)^2.\) We call such a function \(f\) a model for the data, and write \(y_i \sim f(x_i).\) The numbers that appear in the formula for the model are called parameters. The “goodness of fit” of the model to the data is often quantified as a value \(R^2\) between \(0\) and \(1,\) a higher value meaning a better fit. Beyond the \(R^2\) value, whether or not a specific type of function serves as a good model for some data is rather subjective. But certain functions have features that ought to be considered before they are used as models.
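To make \(R^2\) concrete, here is a minimal Python sketch of its usual definition, \(R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}},\) which compares the squared residuals of a model with the spread of the data about its mean. The function name `r_squared` and the sample data are illustrative assumptions. For a least-squares fit that includes a constant term the value lies between \(0\) and \(1\); for an arbitrary hand-picked model it can even come out negative.

```python
def r_squared(xs, ys, model):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (hypothetical helper)."""
    y_mean = sum(ys) / len(ys)
    # Residual sum of squares: squared gaps between the data and the model's predictions
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: squared gaps between the data and their mean
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data scattered around the line y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
print(r_squared(xs, ys, lambda x: 2 * x + 1))
```

The hand-chosen line \(2x + 1\) tracks these points closely, so the score comes out just below \(1.\)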

A linear model \(y_i \sim ax_i + b,\) having a constant slope, should be used for data where \(y_i\) is suspected to be changing at a constant rate with respect to \(x_i.\) This rate is the value of the parameter \(a.\)
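For the linear model the least-squares parameters have a closed form: \(a = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \big/ \sum_i (x_i - \bar{x})^2\) and \(b = \bar{y} - a\bar{x},\) where bars denote means. A minimal Python sketch follows; the name `fit_linear` and the distance-per-hour data are hypothetical.

```python
def fit_linear(xs, ys):
    """Least-squares line y ~ a*x + b; returns (a, b). Hypothetical helper."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    # The least-squares line passes through the point of means (x_mean, y_mean)
    b = y_mean - a * x_mean
    return a, b

# Hypothetical data: distance (km) logged each hour at a roughly steady speed
hours = [0, 1, 2, 3, 4]
km = [0.0, 61.0, 119.0, 181.0, 239.0]
a, b = fit_linear(hours, km)
print(f"rate a = {a:.2f} km/h, intercept b = {b:.2f} km")
```

The fitted slope \(a = 59.8\) reads directly as the estimated constant rate, roughly 60 km per hour.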

A quadratic model \(y_i \sim ax_i^2 + bx_i + c,\) whose rate of change increases or decreases at a constant rate, should be used for data where \(y_i\) is suspected to be accelerating or decelerating at a constant rate with respect to \(x_i.\) Since the rate of change of \(f(x) = ax^2 + bx + c\) is \(f'(x) = 2ax + b,\) the acceleration is the constant \(2a,\) and \(b = f'(0)\) is the initial rate at \(x = 0.\)
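A least-squares parabola can be found by setting the partial derivatives of the squared error with respect to \(a,\) \(b,\) and \(c\) to zero, which gives a \(3 \times 3\) linear system (the normal equations) in the power sums \(\sum_i x_i^k\) and \(\sum_i x_i^k y_i.\) The sketch below builds and solves that system in plain Python; the helper names `solve3` and `fit_quadratic` and the sample data are hypothetical, and in practice a library routine such as NumPy's `polyfit` would usually be preferred.

```python
def solve3(m, v):
    """Solve the 3x3 system m * sol = v by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from the rows below the pivot
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - s) / aug[r][r]
    return sol

def fit_quadratic(xs, ys):
    """Least-squares parabola y ~ a*x^2 + b*x + c via the normal equations."""
    S = [sum(x ** k for x in xs) for k in range(5)]                  # S[k] = sum of x^k
    T = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # T[k] = sum of x^k * y
    m = [[S[4], S[3], S[2]],
         [S[3], S[2], S[1]],
         [S[2], S[1], S[0]]]
    a, b, c = solve3(m, [T[2], T[1], T[0]])
    return a, b, c

# Hypothetical noise-free data generated from y = 0.5x^2 + 2x + 1
xs = [0, 1, 2, 3, 4, 5]
ys = [0.5 * x * x + 2 * x + 1 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}; acceleration 2a = {2 * a:.3f}, initial rate b = {b:.3f}")
```

Because the data contain no noise, the fit recovers \(a = 0.5,\) \(b = 2,\) \(c = 1\) up to rounding, so the acceleration \(2a\) is \(1\) and the initial rate is \(2.\)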

An exponential model \(y_i \sim A\mathrm{e}^{k x_i}\) should be used for data where \(y_i\) is suspected to be changing at a rate proportional to its own value. The parameter \(k\) is the constant of proportionality, with \(k \gt 0\) corresponding to a positive/increasing relationship (growth) and \(k \lt 0\) corresponding to a negative/decreasing relationship (decay), while \(A\) is the value of the model at \(x = 0.\)
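One common way to fit \(y \sim A\mathrm{e}^{kx}\) is to take logarithms, since \(\ln y = \ln A + kx\) is linear in \(x,\) and then run an ordinary straight-line fit. The sketch below does this in plain Python. This log-linearization is a swapped-in technique: it minimizes squared error in \(\ln y\) rather than in \(y,\) so it generally differs slightly from a true nonlinear least-squares fit, and it requires every \(y_i \gt 0.\) The constant \(A\) lets the curve take any value at \(x = 0\) (with \(A = 1\) giving plain \(\mathrm{e}^{kx}\)); the name `fit_exponential` and the data are hypothetical.

```python
import math

def fit_exponential(xs, ys):
    """Fit y ~ A * e^(k*x) by a least-squares line through (x, ln y). Hypothetical helper."""
    if any(y <= 0 for y in ys):
        raise ValueError("log-linearized fit needs strictly positive y values")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    l_mean = sum(logs) / n
    # Straight-line fit ln(y) ~ k*x + ln(A)
    k = (sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, logs))
         / sum((x - x_mean) ** 2 for x in xs))
    A = math.exp(l_mean - k * x_mean)
    return A, k

# Hypothetical decay data: a quantity starting at 80 that halves every 2 time units
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [80 * 0.5 ** (x / 2) for x in xs]
A, k = fit_exponential(xs, ys)
print(f"A = {A:.3f}, k = {k:.4f}")
```

With noise-free halving data the fit recovers \(A = 80\) and \(k = -\ln 2 / 2 \approx -0.3466;\) the negative \(k\) signals decay.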