Skip to content

Regression analysis functions

Mats Alm edited this page Jul 27, 2024 · 7 revisions

On this page we have documented the EPPlus implementations of some of the more complex statistical functions. Note that the below functions in some edge cases might return different results depending on which spreadsheet application (Microsoft Excel, LibreOffice, etc) you are using.

LINEST

Added in version 7.2.2

Returns statistics for a straight line that best fits the inputted data. The function can also calculate the statistics for multiple linear regression when there are several independent variables in the input data (more than one column/row in knownXs). The inputs for LINEST are knownYs (required), knownXs (optional), const (optional) and stats (optional). knownYs is the y-coordinates and knownXs is the x-coordinates. If knownXs is omitted it is assumed to be 1, 2, ..., {length of knownYs}. Const is a boolean flag, and if false forces the intercept to go through the origin. Stats determines if descriptive statistics is included (if stats = true).

The statistics calculated with LINEST are the following:

  • Coefficient(s) for all independent variables. Variables deemed collinear are represented by a zero.
  • R-squared (coefficient of determination). This value tells us how well the independent variables explain the dependent variable. This is calculated as the regression sum of squares divided by the total sum of squares.
  • Standard error(s) for the y-estimate.
  • F-statistic. Do keep in mind that LINEST calculates the observed F-value and not an F-test.
  • Degrees of Freedom (of ssresid, see below).
  • The regression sum of squares (ssreg).
  • The residual sum of squares (ssresid).

Multicollinearity

In some cases, an independent variable can be fully explained by another independent variable. This is called collinearity (or multicollinearity) and poses an issue since the parameter estimation can have infinitely many solutions and the moment matrix X'X can not be inverted due to it being singular. In this situation, LINEST removes collinear columns by transforming the moment matrix to echelon format with complete pivoting. The process of removing collinear columns affects the degrees of freedom of the regression model and some of the statistics.

EPPLUS uses a gaussian elimination process to identify collinear columns and which ones to remove. The idea is to reduce the matrix (X'X) to echelon format, which is done with complete pivoting. After the row and column swaps, if a value on the diagonal contains zero (EPPLUS LINEST version uses a threshold of 2e-13), that corresponding column is deemed redundant. In some rare cases the result might differ from the LINEST implementation in Microsoft Excel.

High multicollinearity between variables is fairly uncommon, and there is no standard practice for removing collinear variables. When pivoting equal columns, EPPLUS always chooses to remove the second column.

LOGEST

Added in version 7.2.2

LOGEST is based on the same calculations as LINEST, including how it handles collinearity for multiple regression analysis. However, the function returns statistics to a curve that best fits the data, rather than a line. The calculations for the LOGEST coefficients are calculated as EXP(LINEST(LN(knownYs), knownXs)). It contains the same argument and outputs as LINEST.

TREND

Added in version 7.2.2

This function takes four arguments, knownYs, knownXs, newXs and const. TREND calculates the y-values on the regression line that is fitted to knownYs and knownXs. If newXs is given, TREND returns the y-values corresponding to those x-values. TREND uses LINEST in order to construct the regression line and find all necessary coefficients.

The dimensions of newXs have to correspond to knownXs in terms of variables, but can be shorter in terms of amount of observations. For example, if knownXs is 4 columns wide and 20 rows long, newXs must be 4 columns wide and have any number of rows.

If const is set to FALSE, the regression line is forced to the origin. If knownXs or newXs is omitted, they are assumed to be an array 1, 2, ..., {length of knownYs}. For more information on how the regression model works, see below.

See also

EPPlus wiki

Versions

Worksheet & Ranges

Styling

Import/Export data

Formulas and filters

Charts & Drawing objects

Tables & Pivot Tables

VBA & Protection

Clone this wiki locally