Ordinary least squares (OLS) regression is a way to estimate the relationship between two or more variables in a linear equation. To understand OLS, it is helpful to recall the equation for a line: y = mx + b, where y is the value you calculate (the "dependent" variable), x is the value you know (the "independent" variable), m is the slope of the line (the amount by which the dependent variable changes when the independent variable increases by one unit), and b is the value of y when x = 0 (the "intercept"). OLS can be used to estimate a linear relationship between variables such as the number of electronic health record (EHR) applications in use and full-time-equivalent (FTE) employees per adjusted occupied bed (FTE/AOB). To illustrate, let's consider the following data sample:
To visualize how this technique can be used, let's next consider a plot of some observations of these two variables for a group of hospitals:
The OLS technique computes an estimate of the line shown in red in the above exhibit. The estimate is the line that minimizes the sum of the squared vertical distances between the line and each data point in the analysis (illustrated by the arrows labeled "a" and "b"). Squaring the distances keeps points above and below the line from canceling each other out and gives more weight to points that lie far from the line. When more than two variables or more than a few data points are used, the calculation can be difficult to complete manually with a calculator; however, several PC applications are available that can simplify it.
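With a single independent variable, the line that minimizes the sum of squared distances has a closed-form solution. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of averages. The following minimal Python sketch computes it without any statistics library. The function names and data values are hypothetical and are not taken from the study.

```python
def ols_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares (one independent variable).

    Returns (m, b) that minimize the sum of squared vertical distances
    between each observed y and the fitted line.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    m = sxy / sxx
    # Intercept: the OLS line always passes through (x_bar, y_bar).
    b = y_bar - m * x_bar
    return m, b


def sum_squared_residuals(xs, ys, m, b):
    """Sum of squared vertical distances from each point to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (not from the study): EHR applications in use vs. FTE/AOB.
apps = [2, 4, 6, 8, 10]
fte_aob = [6.1, 5.9, 6.0, 5.7, 5.8]
m, b = ols_line(apps, fte_aob)
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")
```

Moving either m or b away from the returned values increases sum_squared_residuals. This offers a quick check that the result really is the least squares line.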
The estimates shown in the article "Do EHR Investments Lead to Lower Staffing Levels?" in the February 2012 edition of hfm were calculated using a specialized statistical program. However, similar analyses could be completed using the data analysis functions of Microsoft Excel. Excel limits a regression analysis to a maximum of 16 independent variables, while most statistical programs can handle far more.
To complete such an analysis using a statistical program like the one used in the study described in the February article, the first step is to compile all data in a table using a spreadsheet tool such as Microsoft Excel. Using the study data as a basis for discussion, the resulting table might appear as follows:
Each column in the table represents a variable in the analysis, and the first row gives the name of the variable. This example shows data used to estimate the impact of the total number of EHR applications in use on FTE/AOB staffing. The other variables allow the analysis to take into account other influences on staffing: ownership, membership in a multihospital system, teaching status, location in California (which has minimum nurse staffing regulations), number of beds set up and in operation, case mix index, and each hospital's weighted average patient safety index score (to factor quality into the analysis).
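The table sets up a multiple regression, which extends y = mx + b by giving each independent variable its own slope. Let y_i be hospital i's FTE/AOB and x_i1 its number of EHR applications. Let x_i2 through x_ik be the other variables. Categories such as ownership may need more than one column each, so k is left general. OLS then chooses the constant b and the slopes m_1 through m_k that minimize the sum of squared residuals across the n hospitals:

```latex
\widehat{y}_i = b + m_1 x_{i1} + m_2 x_{i2} + \cdots + m_k x_{ik}
\qquad
\min_{b,\,m_1,\ldots,m_k} \sum_{i=1}^{n} \left( y_i - b - \sum_{j=1}^{k} m_j x_{ij} \right)^2
```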
Once the spreadsheet is prepared, it can be imported into the statistical program. The next step is simply to select the dependent variable and the independent variables and run the OLS regression.
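The import, select, and run workflow can be sketched in plain Python, assuming the spreadsheet has been saved as a CSV file. This is a minimal sketch and not the statistical program used in the study. It solves the OLS normal equations (X'X)b = X'y directly. The column names (fte_aob, ehr_apps, teaching) and values are hypothetical. The sample rows were constructed to follow FTE/AOB = 5 + 0.1 × applications - 0.5 × teaching exactly, so the fit should recover those numbers.

```python
import csv
import io


def solve(a, v):
    """Solve the square system a * x = v by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | v], so row operations apply to both sides at once.
    aug = [list(row) + [v[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for constant or duplicated columns")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols_fit(rows, dependent, independents):
    """Fit dependent = b + m1*x1 + ... by OLS; returns {"constant": b, name: m, ...}."""
    # Design matrix: a column of 1s for the constant, then each chosen variable.
    X = [[1.0] + [float(row[name]) for name in independents] for row in rows]
    y = [float(row[dependent]) for row in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return dict(zip(["constant"] + list(independents), beta))


# Hypothetical spreadsheet export (first row = variable names).
CSV_TEXT = """fte_aob,ehr_apps,teaching
5.2,2,0
4.9,4,1
5.6,6,0
5.3,8,1
6.0,10,0
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
coefs = ols_fit(rows, dependent="fte_aob", independents=["ehr_apps", "teaching"])
for name, value in coefs.items():
    print(f"{name:>10}: {value:8.4f}")
```

With a real file, io.StringIO(CSV_TEXT) would be replaced by open("hospitals.csv", newline=""), where the file name is hypothetical. Dedicated statistical packages generally use more numerically stable methods than forming X'X explicitly. That matters more with many correlated variables than in a small example like this one.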
Three OLS regressions were performed for the article: one to analyze the relationship between the number of EHR applications in use and FTE/AOB staffing levels, one to analyze the relationship between the use of each individual application and FTE/AOB staffing levels, and one to analyze the relationship between the length of time (up to three years) an application had been in use and FTE/AOB staffing levels.
The first of these regressions, which took into account the variables listed above, produced the following output from the statistical program:
The two columns of interest here are Coefficient and P>t. The Coefficient column indicates the change in FTE/AOB associated with a one-unit change in each item in the analysis, with all other variables held constant. For example, adding one more application would be expected to result in a negligible reduction of 0.001 FTE/AOB within the range analyzed (one to 13 applications). Similarly, the changes in FTE/AOB associated with the other variables are indicated by the values in the Coefficient column. Each of these values plays the role of the "m" term in the linear equation mentioned earlier, y = mx + b. With several independent variables, the equation simply gains one m term per variable: y = m1x1 + m2x2 + ... + b. The bottom row is the constant term in the analysis, which is the predicted staffing level when all of the other factors in the analysis equal zero. That constant is the same as the "b" term in the linear equation.
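The coefficient applies to each additional application, so its total effect across the analyzed range is easy to work out. Using the rounded coefficient reported above and holding all other variables constant, moving from one to 13 applications changes predicted FTE/AOB by only:

```latex
\Delta\,\widehat{\text{FTE/AOB}} = m_{\text{EHR}} \times \Delta x = (-0.001) \times (13 - 1) = -0.012
```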
The P>t (or p-value) column indicates the statistical significance of each variable in explaining changes in the dependent variable, FTE/AOB. The lower the p-value, the stronger the evidence that the variable's true coefficient is not zero. Values below 0.05 are generally considered significant. The results in the table above show that the impact of adding more EHR applications is negligible and not statistically significant, because its p-value is far above the 0.05 benchmark. The general interpretation of these results is that using one more EHR application, all other things being equal, is not associated with a measurable change in FTE/AOB levels.
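Statistical packages typically compute the p-value from the t statistic, which is the coefficient divided by its standard error, using the t distribution. The sketch below substitutes the standard normal distribution from Python's statistics module. That approximation holds only when there are many observations. The coefficient and standard-error pairs are hypothetical, since standard errors are not reproduced in this text.

```python
from statistics import NormalDist


def approx_two_sided_p(coefficient, std_error):
    """Approximate two-sided p-value for the null hypothesis that the coefficient is zero.

    Uses the standard normal distribution in place of the t distribution, which is
    reasonable only when the regression has many residual degrees of freedom.
    """
    t_stat = coefficient / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def is_significant(p_value, alpha=0.05):
    """Apply the conventional 0.05 benchmark described in the text."""
    return p_value < alpha


# Hypothetical coefficient/standard-error pairs, not values from the study.
for coef, se in [(0.5, 1.0), (0.5, 0.1)]:
    p = approx_two_sided_p(coef, se)
    print(f"coef={coef}, se={se}, p={p:.4f}, significant={is_significant(p)}")
```

The two pairs share the same coefficient, yet only the second one is significant. Significance depends on the size of a coefficient relative to its uncertainty, not on its size alone.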
The second regression evaluated the relationship between the use of individual applications and FTE/AOB staffing. It included the same variables as the first regression plus a variable for each type of EHR application, and it produced the following results:
This model shows that, for the most part, individual applications are not significantly associated with changes in FTE/AOB staffing. The exceptions are clinical decision support (CDS) systems, computerized provider order entry (CPOE), quality management (QM) systems, and physician documentation systems. Of these four, CDS and QM systems were associated with reductions of 0.237 and 0.152 FTE/AOB, respectively. CPOE and physician documentation systems were associated with increases of 0.140 and 0.115 FTE/AOB, respectively.
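Because the model is additive, the coefficients for several applications can be combined. The following is a minimal sketch under two conditions. First, it assumes each application entered the regression as a yes/no (0/1) variable, since the text does not state how use was coded. Second, it uses only the four significant coefficients reported above. Under those assumptions, a hospital using both CDS and CPOE would be predicted to differ by -0.237 + 0.140 = -0.097 FTE/AOB from an otherwise identical hospital using neither. The function name is hypothetical, and the result describes an association, not a causal effect.

```python
# Coefficients from the second regression for the four applications with significant
# associations (FTE/AOB change when the application is in use, all else equal).
# Assumption: each application was coded 0/1 in the regression.
SIGNIFICANT_COEFFICIENTS = {
    "CDS": -0.237,
    "QM": -0.152,
    "CPOE": 0.140,
    "physician documentation": 0.115,
}


def predicted_difference(applications_in_use):
    """Predicted FTE/AOB difference versus an otherwise identical hospital using none of them.

    Each application is an additive term, as in y = m1x1 + m2x2 + ... + b.
    """
    return sum(SIGNIFICANT_COEFFICIENTS[name] for name in applications_in_use)


print(f"{predicted_difference(['CDS', 'CPOE']):+.3f}")
```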
Finally, the results of the analysis of longevity with systems, which accounted for the same set of variables, are shown in the following exhibit:
This analysis indicates that additional time with an application generally does not have a significant relationship with FTE/AOB staffing. Only CDS systems, CPOE, QM systems, physician documentation systems, and radiology information systems showed significant effects. Longer use of CDS, QM, and radiology information systems was associated with lower FTE/AOB levels. Longer use of CPOE and physician documentation systems was associated with higher FTE/AOB staffing levels.
OLS regression is useful for finding associations between variables, but it has limitations. It can show whether strong associations among variables exist, and it can be used to predict one variable from others with some accuracy. However, it does not explain why the relationships exist. OLS also assumes that the relationship between variables is linear and can be expressed in a form similar to y = mx + b. Not all variables analyzed in a healthcare organization vary in a linear fashion.
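A small constructed example shows how the linearity assumption can mislead. In the hypothetical data below, y is exactly x squared, so y is perfectly predictable from x. Yet the OLS slope is zero, meaning the linear model reports no relationship at all. The residuals also show a telltale pattern that signals a nonlinear relationship: they are positive at both ends and negative in the middle.

```python
def ols_line(xs, ys):
    """Closed-form OLS slope and intercept for one independent variable."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


# Hypothetical data: y is completely determined by x, but the relationship is U-shaped.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

m, b = ols_line(xs, ys)
# Residual = observed y minus the value predicted by the fitted line.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(f"slope = {m}, intercept = {b}")
print("residuals:", residuals)
```

A common way to catch this kind of problem is to plot the residuals against each independent variable.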
In short, OLS regression cannot explain what is really happening in operational data or identify the causes of issues in the data. Its value for healthcare managers is that it can tell them with a high degree of certainty where they should focus their attention for further analysis.
For more information, see Jeffery Helton, Jim Langabeer, Jami DelliFraine, and Chiehwen Hsu's "Do EHR Investments Lead to Lower Staffing Levels?," hfm, February 2012.
Publication Date: Wednesday, February 01, 2012