
*Correlation* is the statistical tool to quantify such possible connections – the degree to which two sets of numbers (called *variables*) are associated with each other – and it can also clarify the relationship between them.

Insightful correlations are plentiful for legal managers who collect the data. Consider law firms working with a law department: a general counsel might suspect a connection between a firm’s overall effective billing rate and its number of lawyers. Or perhaps a lawyer’s gut feeling is that the more years an in-house counsel has practiced law, in general, the less closely he or she manages outside counsel spend. A billing partner might suspect that the longer an invoice goes unpaid, the lower the realization rate. After collecting some data, someone can run a correlation and test those suppositions.

To explain correlation with real data, let’s hypothesize that the more equity partners a law firm has, the more non-equity partners it is likely to have. We can find out by using data from the NLJ 500 list (ALM). To have a more homogenous set, we focus on the 96 law firms that reported between 90 and 125 lawyers, where it turns out that only a third of them (28 firms) reported non-equity partners.

Before calculating the correlation that will answer our hypothesis, sound statistical practice says to plot the data. Our eyes easily spot oddities in the data and even patterns. We particularly want to avoid wasting time on correlations where the variables are not linearly related to each other (for example, when one varies with the square or cube of the other). Correlation is an appropriate numerical measure only for linear relationships, and it is sensitive to *outliers* (values at the extremes).

In the plot, the horizontal (x) axis shows how many equity partners each of the 28 firms reported, while the vertical (y) axis shows the firm’s number of non-equity partners. For example, the highest number of non-equity partners was 49, at a firm with 18 equity partners. (We could have added a best-fit *regression* line, but that’s for another column.)

At a glance you can see that this scatter plot suggests no outliers and displays a *linear relationship* between the two variables. So we can proceed and calculate the *correlation coefficient*, which measures the strength and direction of a linear relationship between two quantitative variables. For the 28 law firms, the correlation between the number of equity and non-equity partners is -0.41, a moderately strong *negative* correlation. Our hypothesis was wrong: actually, as law firms of this size have more equity partners, they tend to have fewer non-equity partners.

Correlation coefficients always range between -1 and 1. Values near -1 indicate a strong negative linear relationship, values near 0 indicate a weak linear relationship, and values near 1 indicate a strong positive linear relationship; our value of -0.41 falls in moderately negative territory. A negative correlation means that when one variable is above its average, the other tends to be below its average.
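To make this concrete, here is a minimal sketch of computing a correlation coefficient in Python. The partner counts below are made-up illustrative numbers, not the NLJ 500 data.

```python
import numpy as np

# Hypothetical counts for a handful of firms -- not the actual NLJ 500 data.
equity = np.array([18, 25, 30, 35, 40, 45, 50])
non_equity = np.array([49, 30, 28, 20, 15, 12, 8])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two variables.
r = np.corrcoef(equity, non_equity)[0, 1]
print(round(r, 2))  # a negative value: more equity partners, fewer non-equity
```

Because these toy numbers decline almost in lockstep, the coefficient comes out strongly negative, illustrating the same direction of relationship as the real 28-firm sample.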

Let’s extend our understanding of correlation with some additional observations.

- Correlation disregards the __units of measurement__ of the variables. Inches can correlate with dollars, or speed with age. The correlation coefficient can be computed and interpreted for any two numeric variables.
- Because all correlations range between -1 and 1, you can compare correlations between __different pairs of variables__. So, we might find that there is a stronger correlation between the number of equity partners and the number of associates.
- We can calculate correlations between __ratios__, such as between R&D spending as a percentage of revenue and patent lawyers as a percentage of all the lawyers in a law department.
- If certain assumptions are met, we can interpret the *confidence interval* of the correlation coefficient (the range in which the coefficient would likely fall if we redid the calculation on many samples from the data) and test the *null hypothesis* that there really is no correlation between the two variables (and that any correlation observed is a consequence of random sampling).
- If both variables are normally distributed (think bell curve), the proper method is the *Pearson coefficient* [the covariance (see below) of the two variables divided by the product of their standard deviations]. Another method, *Spearman’s rank correlation*, does not assume that the underlying data is distributed in a statistically normal way.
- Usually there is an intuitive explanation for a strong positive or negative correlation. With our scenario, firms that believe in full ownership and equity participation of their partners are probably less likely to create a tier of non-equity partners.
- Correlation differs from regression. Correlation is not predictive. That is, correlation can only tell us how closely one set of numbers is associated with another set. By contrast, regression enables us to predict one variable based on another.
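Two of the points above — Pearson versus Spearman, and testing the null hypothesis — can be sketched with `scipy.stats`. The data here is randomly generated for illustration; the small p-value is what lets us reject the null hypothesis of zero correlation.

```python
import numpy as np
from scipy import stats

# Simulated data: y is linearly related to x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

# Pearson assumes roughly normal, linearly related data.
r, p = stats.pearsonr(x, y)

# Spearman is rank-based and does not assume normality.
rho, p_s = stats.spearmanr(x, y)

# A small p-value means the observed correlation is unlikely
# to be a consequence of random sampling alone.
print(round(r, 2), round(rho, 2), p < 0.05)
```

With 100 points and a genuine underlying relationship, both coefficients come out clearly positive and the p-values are tiny; with only a handful of points, the same coefficients could easily fail the significance test.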

Simply knowing the direction and strength of the relationship between two metrics gives legal managers useful insights. But even if no significant correlation exists between two metrics, that finding too can help managers. For example, what if it turns out that there is no statistically meaningful correlation between the average number of lawyers in a law department per office location and total legal spending? That realization might suggest that the number of in-house lawyers at various locations doesn’t much matter. We should point out, however, that a lack of correlation does not necessarily imply a lack of relationship; the variables could have a non-linear (e.g., curved, step or bimodal) relationship. Incidentally, the -0.41 correlation in our 28-firm data set turns out not to be statistically significant, meaning that it could be a product of chance.

Always remember that correlation does not prove causation. Just because the degree of regulation of an industry correlates with higher legal spend per billion of revenue does not mean that regulation is the **cause** of those legal expenses (nor do bulging legal bills cause regulations!). Other factors may be at work, such as political influence, case-law development, maturity of the industry, or specific legislation. Sometimes a third factor accounts for the close relationship between one metric and another, such as concentration in an industry. Statisticians refer to those behind-the-scenes factors as *lurking variables*.

One way to remove the effect of a possible lurking variable from the correlation of two other variables is to calculate a partial correlation. A *partial correlation* is the correlation that remains between two variables after removing the correlation that is due to their mutual association with one or more other variables. By looking at a graphic of the partial correlation coefficient matrix among the variables (like a heat map), it may be visually clear that the partial correlation between some of them is quite high.
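One common way to compute a partial correlation is to regress each variable on the lurking variable and then correlate the residuals. The sketch below uses simulated data in which `x` and `y` appear strongly correlated only because both are driven by a lurking variable `z`.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear
    effect of a third (lurking) variable z, via residuals."""
    # Least-squares fit of x on z and y on z, then take residuals.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

# Simulated data: x and y both track the lurking variable z.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
x = z + 0.1 * rng.normal(size=200)
y = z + 0.1 * rng.normal(size=200)

raw = np.corrcoef(x, y)[0, 1]  # very high: both variables track z
adj = partial_corr(x, y, z)    # near zero once z's effect is removed
```

The raw correlation is close to 1, yet the partial correlation collapses toward zero — exactly the signature of a lurking variable doing the real work.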

Let’s add one more math concept related to correlation. *Covariance* tells how much two sets of variables vary together. It’s similar to *variance*, but where variance tells about a single variable’s distribution, covariance tells you how two variables vary together. The larger the values of the variables, the larger the covariance. For our data set of 28 law firms, the covariance of equity partners and non-equity partners is -91.8. If we divide that covariance by the product of both variables’ standard deviations we arrive at the correlation coefficient given above.
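The covariance-to-correlation relationship described above is a one-line calculation. The numbers below are illustrative, not the law-firm data.

```python
import numpy as np

# Illustrative paired data (not the 28-firm data set).
x = np.array([10.0, 12.0, 15.0, 20.0, 24.0])
y = np.array([40.0, 38.0, 30.0, 22.0, 15.0])

cov = np.cov(x, y)[0, 1]  # sample covariance of x and y

# Dividing the covariance by the product of the two standard
# deviations yields the correlation coefficient.
r = cov / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Sanity check: this matches NumPy's own correlation calculation.
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

Note that `ddof=1` keeps the standard deviations on the same sample (n-1) footing as `np.cov`, so the units cancel exactly and the result lands in the unit-free -1 to 1 range.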

We emerge from this brief dive into mathematical methods. The key takeaway from this column is that correlation tools stand ready to help managers of lawyers to more crisply understand data variables that might vary together. Managers need not rely only on their own experiential sense of two sets of numbers changing in relation to each other. Correlation can put precision to intuitive insights and can clarify the direction of the variation and its credibility.


Day after day, managers confront operational problems, think about them and make choices about what to do or not to do. In other words, they decide something. But they don’t always take into account the data available to them when they make those decisions.

For law departments and law firms, analyses and visualizations of numbers can help managers make better decisions. If they collect metrics and weave them into their pondering, the outcome is often sounder and easier to explain. Let me demonstrate what I mean by starting with a plausible management decision and then outlining five situations where the contribution of data could improve the quality of that decision.

Assume that two or three managers in a legal group have met to figure out whether to hire another paralegal. Further, assume that our managers disagree on the appropriateness of adding paralegals. Their decision will improve if the managers have relevant data, analyze it effectively and apply it to their problem. How can numbers help them objectively think through the problem and find potential solutions?

**Sidestep Cognitive Fallacies**

Data that has been accurately and comprehensively collected can counteract many of the cognitive biases that afflict managers. Nobel laureate Daniel Kahneman has studied and illuminated many cognitive fallacies. In his book “Thinking, Fast and Slow,” Kahneman explains many of the quirks in our reasoning that deviate significantly from classical notions of rationality. Unfortunately, we are usually unaware of the gremlins in our minds that attack what we believe to be our clear-headed, balanced evaluations. Consider four cognitive fallacies and how data might help correct them.

*Framing:* The managers’ thinking might be skewed by “framing.” For example, one of them may start arguing for more paralegals by asserting that “we should have six paralegals, at least one for every two lawyers.” That preliminary high demand will likely frame how other managers consider the decision. It warps the mind frames of the other two because it creates a baseline, a benchmark, a seemingly pivotal number to negotiate around. Yet it might be baseless. An antidote to framing could be benchmark data on paralegals per lawyer in firms or law departments.

*Salience:* The bias of “salience” also warps rationality because it elevates a notable example, which may not be representative or appropriate at all, but which undeservedly shifts impressions of what is typical. A slew of counterexamples might be forgotten when a manager proclaims, “I just read that GE has one paralegal for every lawyer!” We shouldn’t necessarily focus on the latest or most famous example just because it comes to mind easily. To blunt its potential impact, someone could gather articles that report average lawyer/paralegal ratios based on surveys of many companies.

*Confirmation bias:* Data also can ameliorate “confirmation bias.” An example of this phenomenon is a manager who has consistently had good paralegals and ignores evidence of incompetent ones. It blinds people to examples that are contrary to their opinions or expectations. Perhaps the mixed evaluations that lawyers in the managers’ group make of paralegals would be data that challenges the one-sided, rosy view.

*Risk aversion:* A fourth mental frailty, “risk aversion,” tilts our reasoning to fear losses more than favor gains. Some studies suggest that gains must be roughly twice the size of losses in our internal calculations for people to press forward. The downsides of making a call one way or the other (“We might hire a dud and never be able to get rid of them!”) loom larger than the countervailing upsides. On the other hand, a manager who argues for more paralegals might be risk averse because the group never wants to invite a crisis due to lack of resources.

Good data can combat each of these mental distortions as well as others that psychologists have catalogued. Facts and figures foil figments and fictions.

**Uncover and Query Empirical Assumptions**

When people make decisions, they often neglect to articulate the factual assumptions on which they base their conclusions. Worse, they may not even realize that they have been motivated by unstated (and usually untested) beliefs about how common something is or how much there is of something measurable. For example, one manager accepts on faith that lawyers will delegate work to paralegals, while another trusts without verifying that it will be easy to find, hire and retain capable paralegals.

If underlying assumptions such as these are not surfaced, and if there is no data to undermine or support their correctness, managers are apt to make weaker decisions. Apposite data can be used to bore in and counterbalance statements and assumptions that might otherwise be accepted unquestioningly. In our scenario, that might include data collected through interviewing lawyers about how often they delegate tasks to paralegals – and how much more they might delegate. It could also include data from other legal groups that hire contract lawyers, temps or other staff instead of hiring full-time paralegals.

**Disrupt Entrenched Values**

As data becomes available for decision-makers, they should incorporate it and change how they view the probability of being correct. Around 250 years ago, Rev. Thomas Bayes first described what is now called Bayesian probability analysis. With a Bayesian view, the arrival of new data changes what are called “priors” and helps make predicted outcomes more accurate. Perhaps one of the managers believes at her core that lawyers are superior to paralegals, which is her prior.

New findings alter how Bayesians view the world; they should cause thoughtful people to reconsider their animating beliefs. Those values might be modified, for example, by the arrival of client-satisfaction survey praise for paralegals relative to lawyers.
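A tiny numeric sketch shows how new data revises a Bayesian prior. All of the numbers here are hypothetical: a manager starts with a skeptical belief about the share of clients satisfied with paralegal work, then a satisfaction survey arrives.

```python
# Hypothetical beta-binomial update (all numbers are made up).
# Skeptical prior belief about the fraction of satisfied clients:
# a Beta(2, 8) distribution, whose mean is 2 / (2 + 8) = 0.20.
prior_a, prior_b = 2, 8

# New data arrives: a client-satisfaction survey.
satisfied, unsatisfied = 18, 2

# Conjugate update: simply add the observed counts to the prior.
post_a = prior_a + satisfied
post_b = prior_b + unsatisfied
posterior_mean = post_a / (post_a + post_b)

print(round(posterior_mean, 2))  # 0.67 -- the data pulled the prior upward
```

The manager who began at 20% satisfaction now believes roughly 67% after seeing the survey; with more data, the prior matters less and less.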

**Delay Premature Conclusions**

When making a decision, it is crucial not to seize upon the first plausible solution to the problem. Rather, we should keep exploring alternative possibilities. For example, it might be possible to modulate the flow of work coming in from clients, which would change the equation for adding new paralegals. Or you can gather new data by drilling down to discriminate more finely among the data you already have (are certified paralegals more desirable than uncertified ones?).

Data helps generate new possibilities to address a problem or to encourage managers to think about the problem longer. Regarding paralegals, it could be useful to find out how the current paralegals allocate their time among various tasks. Or exit interviews may have created a database of germane metrics.

**Resist Peer Pressure**

In the context of a group decision, solid data can serve as a talisman against the blandishments of a rhetorician, the rigidity of a zealot or the power of a senior executive. In the face of credible and on-point figures, it is harder to smooth-talk a contrary view, stick to unreasonable guns or ram a decision down a committee’s throat. “Yes, you’ve repeatedly said we don’t have cube space, but we’ve hotelled a dozen people for two years!” A manager who disagrees with the other two managers might be more inclined to buck them if some metrics back up the point.

This is not to claim that good data always means good decisions. It is to claim that operational data can help steer managers to reach a good decision more frequently than may now be appreciated. To be sure, the toughest decisions tend not to have decisive metrics. But even the gnarliest decisions – those that entangle personalities, traditions and long-range projections, or involve visceral disagreements over fundamental values – can benefit from whatever dollops of data are available.

**Rees Morrison** *is a partner at Altman Weil, Inc. He focuses on law department consulting and data analytics for lawyers. He recently published a book on LeanPub, “Data Graphs for Legal Management.” He can be reached at rwmorrison@altmanweil.com.*