By: Rees Morrison

This article explains how you can predict the amount a company’s law department spends on outside counsel based on the revenue of the company. Such a prediction can be useful for law firms, for example, when estimating how much of a given company’s expenditures the firm receives (the so-called share of wallet) or when proposing additional work (share of future wallet). Law departments, for their part, might use such a prediction to bootstrap a comparative analysis of their own spending against competitors who will not divulge proprietary information.

To explain how we make these predictions, we draw on data from the benchmark surveys conducted during the past five years by General Counsel Metrics, LLC. Specifically, the data set covers the outside legal expenditures and revenue for 145 submissions by 105 different law departments in the energy or utilities industries (the power industry, when grouped together). Collectively, the companies reported 2,916 lawyers and \$670 billion in revenue; 89 of them are U.S. companies, 33 are Canadian and seven are British.

To predict spending on law firms (and the small portion on other service providers), we use a powerful statistical tool called linear regression. Regression predicts the value of a response variable, such as external legal spend, based on one or more predictor variables, here only the one variable of revenue. The adjective “linear” tells us that we are assuming that external spend changes in some constant proportion, called a coefficient by statisticians, to changes in the amount of revenue.

Using the data set described above, software generates the linear regression relationship shown in the boxed equation. That equation lets us predict what a particular power-industry law department would spend on law firms once we fill in the revenue of that department’s company. The error term, by the way, represents what are called unsystematic errors, or noise. These errors average to zero by operation of the regression calculations: they fall on both sides of the best-fit line (we explain this term later), and overall they cancel each other out.

External legal spend = \$2,199,000 + (0.001963 × revenue) + error term

So, for a hypothetical billion-dollar company, start with the constant (\$2,199,000), then add the product of its revenue and the coefficient (0.001963 × \$1 billion = \$1,963,000): the legal department is predicted to spend \$4,162,000 on outside counsel and vendors (typically, law firms make up 90 percent of the external spend). Put another way, each additional \$1 billion of revenue is associated, on average, with \$1,963,000 of additional external spend.
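Plugged into code, the boxed equation becomes a one-line function. Here is a minimal sketch in Python; the function and constant names are illustrative, and the intercept of \$2,199,000 follows the worked example above (\$2,199,000 + \$1,963,000 = \$4,162,000):

```python
# Illustrative sketch of the fitted equation; names are not from any library.
INTERCEPT = 2_199_000        # constant term, in dollars
COEFFICIENT = 0.001963       # added external spend per dollar of revenue

def predict_external_spend(revenue):
    """Predict external legal spend (dollars) from company revenue (dollars)."""
    return INTERCEPT + COEFFICIENT * revenue

# A hypothetical billion-dollar company:
print(round(predict_external_spend(1_000_000_000)))  # 4162000
```

The error term drops out of the prediction because, as noted, the noise averages to zero.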

Enough math. Take a look at the graphic plot with the straight blue linear regression line. Each law department appears as a point positioned by its company’s revenue on the horizontal axis and its external spend on the vertical axis, with revenue running from lowest on the left to highest on the right.

The line is the best-fit line. The software creates it based on residuals. A residual is the difference between the actual value from the survey (external spend) and the value predicted by the linear regression model from revenue. The software positions the line to minimize the sum of the squared residuals, a method called ordinary least squares (OLS). What you see is the line that, in the aggregate, leaves the smallest possible squared residuals: the best-fit line for this data.
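The fitting step itself is compact enough to sketch in plain Python. The names and toy data below are illustrative; a real analysis would use a statistics library:

```python
# A plain-Python sketch of the OLS fit (illustrative names and toy data).
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form OLS for one predictor: slope = cov(x, y) / var(x)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data standing in for revenue (x) and external spend (y)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

Note that the residuals from an OLS fit with an intercept sum to zero, which is the "errors cancel each other out" property described earlier.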

Confidence intervals reflect the uncertainty of the regression line, and they appear as the lighter gray band above and below it. You can be 95 percent confident that the vertical range contains the true average external spend for companies with that revenue. If the predictor is indeed associated with the outcome variable, the more data a plot has in an area, the narrower the confidence interval there. Prediction bands are analogous to confidence intervals, but they describe the uncertainty about an individual future observation. Prediction bands are always wider than confidence intervals because they must account not only for uncertainty in the line itself but also for the scatter of individual companies around it.
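The width difference can be seen directly in the standard-error formulas for a one-predictor fit. In this rough sketch (illustrative names, not from any library), the prediction standard error carries an extra term for individual scatter, so it always exceeds the confidence standard error:

```python
import math

# Sketch with illustrative names: the prediction standard error carries an
# extra "+ 1" term for the scatter of individual observations, so the
# prediction band is always wider than the confidence band.
def band_standard_errors(x, y, x0):
    """Standard errors at x0 for the mean (confidence band) and for an
    individual observation (prediction band), from a one-predictor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                 # residual standard error
    leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
    se_confidence = s * math.sqrt(leverage)      # uncertainty of the line
    se_prediction = s * math.sqrt(1 + leverage)  # plus individual scatter
    return se_confidence, se_prediction
```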

The second plot has a curvy line. This plot uses a variation of linear regression called loess (locally weighted scatterplot smoothing). Each smoothed portion of the line comes from a least squares regression over a small range of values of the predictor variable on the x-axis (but instead of ordinary least squares, it employs the impressively named weighted quadratic least squares). In plainer words, the software figures out the regression fit line for local regions of the revenue data to produce a more nuanced form than the straight-line regression imposes.
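The idea can be sketched compactly. This simplified version uses degree-1 (straight-line) local fits with tricube weights, where the article’s plot uses quadratic local fits; the names and the span value are illustrative:

```python
import math

# Simplified loess sketch: degree-1 local fits with tricube weights.
def tricube(u):
    """Tricube weight: largest at distance 0, zero at distance 1 or beyond."""
    return (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0

def loess_point(x, y, x0, span=0.5):
    """Return the smoothed value at x0 from a weighted local line."""
    n = len(x)
    k = max(2, math.ceil(span * n))                        # local window size
    h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12   # window radius
    w = [tricube(abs(xi - x0) / h) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (x0 - mx)
```

Evaluating `loess_point` across a grid of revenue values traces out the curvy line: each point on the curve is its own little weighted regression.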

When software calculates the best-fit line, it also provides some additional insights. For example, adjusted R-squared tells us what percentage of the variation in external spending the model explains using revenue. In our data, it is 41 percent, which means factors other than revenue also influence external legal spending. One such factor would be the number of lawyers in the department. Including that figure would turn our analysis into a multiple linear regression and would likely improve the adjusted R-squared value.
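Both statistics come straight from the residuals. A sketch with illustrative names (the adjusted version discounts R-squared for the number of predictors used):

```python
# Illustrative helper (not a library function): R-squared and its adjusted
# variant, which penalizes for the number of predictors in the model.
def r_squared(y, y_hat, n_predictors=1):
    """Return (r2, adjusted_r2) comparing actual y with fitted y_hat."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    r2 = 1 - ss_resid / ss_total
    adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj
```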

To rely on linear regression, the data must conform sufficiently to four requirements, or we need to take other actions. (1) The distribution of the residuals must be normal: when the residuals are plotted on a graph by count, the shape should be reasonably close to the often-seen bell curve (relatively few values far out on either tail and most of them clustered toward the middle of the distribution). (2) The variance of the response variable at any given combination of values of the predictor variables (or, equivalently, of the residuals) should be constant across all values of the predictors. This means that at higher or lower ranges of revenue, the magnitude of the residuals doesn’t change too much. The remaining two requirements are beyond the scope of this article. Finally, when using regression, it is prudent to pay particular attention to data points that exhibit abnormal characteristics, which are often called outliers.
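The constant-variance requirement in item (2) can be eyeballed with a crude check: compare the spread of residuals in the lower and upper halves of the revenue range. The helper name and the idea of a simple half-split are illustrative only, not a formal statistical test:

```python
# Crude eyeball check (an arbitrary illustration, not a formal test) for
# requirement (2): compare the spread of residuals across the lower and
# upper halves of the predictor range. A ratio far above 1 hints at
# non-constant variance.
def variance_ratio(x, residuals):
    """Ratio of the larger to the smaller residual variance across halves."""
    pairs = sorted(zip(x, residuals))
    half = len(pairs) // 2

    def spread(chunk):
        r = [res for _, res in chunk]
        m = sum(r) / len(r)
        return sum((ri - m) ** 2 for ri in r) / len(r)

    low, high = spread(pairs[:half]), spread(pairs[half:])
    return max(low, high) / min(low, high)
```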

One final note: how does correlation differ from regression? Correlation estimates the strength of the linear association between paired variables; it does not predict. It answers a question such as: how closely is revenue associated with the number of lawyers? In our data set, the correlation is relatively strong, at 0.66, and it is positive: more revenue typically means more lawyers. Regression, by contrast, gives us the coefficients that best predict the number of lawyers from revenue.
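The correlation calculation itself is a short formula over the paired values. A sketch with an illustrative name:

```python
import math

# Illustrative helper: Pearson correlation between paired variables.
def pearson(x, y):
    """Correlation in [-1, 1]; sign shows direction, magnitude shows strength."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cov / math.sqrt(sum((xi - mx) ** 2 for xi in x)
                           * sum((yi - my) ** 2 for yi in y))
```

A value of 1 means a perfect positive linear association, -1 a perfect negative one, and 0 no linear association at all; our 0.66 sits comfortably on the strong-positive side.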

Now, that wasn’t too bad, was it? You have learned about regression, a tool that enables an analyst to take a set of data, model it and extract predictions based on new data. Moreover, you can visualize the regression results, interpret the reliability of the range of predictions they produce and quantify their explanatory power. That’s quite a gift to bestow on legal managers!