The 20-60-20 Rule

In this paper we discuss an empirical phenomenon known as the 20-60-20 rule. It states that if we split a population into three groups, according to some arbitrary benchmark criterion, then this particular ratio implies a state of balance. From a practical point of view, this feature often leads to efficient management or control. We provide a mathematical illustration justifying the occurrence of this rule in many real-world situations. We show that for any population which can be described by a multivariate normal vector, this fixed ratio leads to a global equilibrium state when measures of dispersion and linear dependence are considered.


Introduction
The Pareto principle, also known as the 80-20 rule, is one of the most commonly recognized empirical fuzzy principles (cf. [11] and references therein for a historical background). On an intuitive level, this empirical rule states that about 80% of the effects come from about 20% of the causes [5]. It is a special type of power law, usually obtained when the Pareto distribution with parameter α ≈ 1.161 is considered. The 80-20 phenomenon can be seen in many real-world situations. Let us mention just a few of the many situations in which this particular ratio can be observed: 80% of profits come from 20% of customers; 80% of land is owned by 20% of people; 80% of complaints come from 20% of customers; 20% of all books in a library account for 80% of the library circulation; 80% of decisions are made in 20% of the time. See e.g. [5] for more examples.
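The link between the quoted tail index and the 80-20 split can be verified with a one-line computation: for a Pareto distribution with index α, the share of the total mass held by the top fraction p of the population is p^(1−1/α). A minimal sketch (standard Pareto-share formula; the variable names are ours):

```python
import math

alpha = 1.161  # Pareto tail index quoted for the 80-20 rule
p = 0.2        # top fraction of the population

# Share of the total "mass" (wealth, effects, ...) held by the top
# fraction p under a Pareto(alpha) distribution.
share = p ** (1.0 - 1.0 / alpha)
# share comes out at roughly 0.8, i.e. the 80-20 split
```

Note that α = log 4 / log 5 ≈ 1.16096 gives a share of exactly 0.8, which is where the quoted value α ≈ 1.161 comes from.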
As the Pareto principle is strongly related to the power-law distribution, it is not directly connected to the Gaussian distribution, which is often used in real-world models (e.g. in financial mathematics). In this paper we present another principle, whose origins can be traced to the normal distribution.
The 20-60-20 principle is commonly used in management theory. Many of its applications (e.g. to leadership or time management) are connected to people's behaviour or choices. It is usually present when the underlying population tends to follow a multivariate normal distribution (e.g. when measurements of people are considered). Intuitively speaking, in many problems we can split the whole population into three groups: 1. Positive group. This is the group which contributes positively to the considered problem. It might consist of efficient workers, productive members, effective days, or people who are the fastest, tallest, etc. In our framework, this corresponds to the top 20% of the whole population.
2. Neutral group. The impact of this group is uncertain; its performance is average. The middle 60% of the population belongs here.
3. Negative group. This group has a negative impact. It might relate to inefficiency or unproductivity. The bottom 20% of the population is considered here.
This corresponds to a simple fuzzy logic system with three states; put another way, we cluster the population based on a fuzzy notion of effectiveness. Of course, one can find countless interpretations where those groups appear naturally. Let us present selected examples.
• In financial markets, one can consider the overall condition of the market, with measurement done daily according to the market's growth. The upper quantile is often connected with a market boom (bull market), the lower quantile is associated with a crisis period (bear market), while the middle part corresponds to normal times. The ratio of 20, 20 and 60 is often used to split time into those three regimes.
• When one wants to measure the performance of employees in a sales department, usually three groups emerge. The first consists of top performers, who could operate without any manager, making big profits. The middle group requires supervision and management, but its members do contribute to the company, making average profits. The last group consists of people who are heading toward resignation or termination, producing no good income even when supervised.
• Approximately 20% of the employees of any corporation commit internal fraud or theft. Another 20% do not commit fraud or theft, regardless of regulations and punishments. The middle 60% are hesitating; those are the ones a manager needs to control.
• In any sports club one can usually divide the members into three groups. One group consists of strong performers (e.g. the best athletes), who produce good results and are crucial for the club, winning contests, scoring goals, etc. The middle group consists of average members (e.g. important when teamwork is needed), while the lower group consists of members who perform poorly.
• If you make any statement in a group, then on average 20% of the people will immediately be on board with whatever you are saying, 20% will be opposed to it, and 60% will be hesitating, i.e. they could be influenced one way or the other depending on future interactions.
• Many celebrities, musicians and politicians want to know what the structure of their fan base or potential electorate looks like. It appears that, very often, among people who get to know you there are three groups: 20% of the people will like you from the start and become your fans immediately; on the other hand, 20% will not like your appearance, work or agenda. The key middle 60% group will be hesitating.
• Within each corporation we can consider so-called change capability. On average, 20% of the people are ready, willing and able to change, while 20% of the people would not change, whatever the cost.
The middle 60% will wait to see how the situation turns out.
The 20/60/20 ratio seems to create a fixed pattern, valid in many real-world situations. While it is very popular among practitioners (cf. [12,4,6,9,1] and references therein), to the authors' knowledge no rigorous mathematical interpretation of this phenomenon has been provided. That is the main topic of this paper. We discuss the occurrences of the 20-60-20 rule in probability theory and statistics. In more detail, we show that if a (multivariate) random vector is normally distributed and we condition on the (quantile function of the) first coordinate, then a ratio close to 20/60/20 implies a global equilibrium state. In particular, we prove that this particular partition implies the equality of the covariance matrices of all conditional vectors. Consequently, the dependence structure within the positive group, the neutral group and the negative group is the same, implying some sort of global balance.
The material is organized as follows. The introduction is followed by short preliminaries, where we establish the basic notation used throughout this paper. Next, in Section 3, we introduce a mathematical model for the 20-60-20 rule and give a definition of the equilibrium state using conditional covariance matrices. We also present here the main result of the paper, Theorem 1. Section 4 is devoted to the study of different equilibrium states, obtained using correlation matrices, Kendall τ matrices and Spearman ρ matrices. In particular, we present some theoretical results for Spearman ρ matrices, together with a numerical example illustrating the 20-60-20 rule for sample data. In Section 5 we briefly discuss what happens if we drop the normality assumption; the general elliptic case is considered there. Concluding remarks are summarized in Section 6.

Preliminaries
Let (Ω, Σ, P) be a probability space and let n ∈ N. Let us fix an n-dimensional continuous random vector X = (X_1, . . . , X_n). We will use H to denote the corresponding joint distribution function and F_1, . . . , F_n to denote the marginal distribution functions. Given a Borel set B in R^n such that P[{ω ∈ Ω : (X_1(ω), . . . , X_n(ω)) ∈ B}] > 0, we can define the conditional distribution H_B for all (x_1, . . . , x_n) ∈ R^n by

H_B(x_1, . . . , x_n) := P[X_1 ≤ x_1, . . . , X_n ≤ x_n | (X_1, . . . , X_n) ∈ B].    (1)

Putting it another way, we truncate the random vector X to the Borel set B. If necessary, we assume the existence of regular conditional probabilities. In this paper we will assume that B is a non-degenerate rectangle, i.e. B ∈ R, where

R := {[a_1, b_1] × . . . × [a_n, b_n] : a_i, b_i ∈ R̄ and a_i < b_i for i = 1, . . . , n}.
As we will mainly be interested in quantile-based conditioning on the first coordinate, for p, q ∈ [0, 1] such that p < q we shall use the notation

H_[p,q] := H_B(p,q),    (2)

where the conditioning set is given by

B(p, q) := [F_1^{-1}(p), F_1^{-1}(q)] × R^{n-1}.

We shall also refer to H_[p,q] as the truncated distribution, while B(p, q) will be called the truncation interval (see [8]). Moreover, we will denote by µ = (µ_1, . . . , µ_n) and Σ = {σ²_ij}_{i,j=1,...,n} the mean vector and covariance matrix of X. Similarly to formula (1), given B, we will use µ_B and Σ_B to denote the conditional mean vector and the conditional covariance matrix, respectively. The notation µ_[p,q] and Σ_[p,q], corresponding to (2), will also be used.
Finally, we shall write X ∼ N (µ, Σ), if X is normally distributed with mean vector µ and covariance matrix Σ. We will also use Φ and φ to denote the distribution and density function of a standard univariate normal distribution, respectively.
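The truncation machinery above is easy to mirror empirically: keep the observations whose first coordinate falls between the empirical p- and q-quantiles and take sample moments. A minimal sketch (NumPy; the helper name `conditional_moments` is ours, not from the paper):

```python
import numpy as np

def conditional_moments(sample, p, q):
    """Estimate mu_[p,q] and Sigma_[p,q]: mean vector and covariance
    matrix of the sample truncated to the p-q quantile band of X1."""
    x1 = sample[:, 0]
    lo, hi = np.quantile(x1, [p, q])
    sub = sample[(x1 >= lo) & (x1 <= hi)]
    return sub.mean(axis=0), np.cov(sub, rowvar=False)

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)

# Central 60% band of the first coordinate: mean near 0 by symmetry,
# conditional variance of X1 well below the unconditional value 1.
m, S = conditional_moments(X, 0.2, 0.8)
```

This is only an estimator of the truncated moments; all theoretical statements below concern the exact conditional distributions.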

20-60-20 Rule
We want to split our population into three separate groups: the best outcomes, the average ones and the worst ones. The partition will be based on the quantile function of a selected benchmark (decision) criterion, described by a random variable. Without loss of generality, we will assume that the separation is made according to the first coordinate of X. Please note that for X ∼ N(µ, Σ) it might be a linear combination of all the other coordinates. In the Gaussian framework, due to the symmetry of the multivariate normal distribution, the natural choice of conditioning sets is

B(0, q),  B(q, 1 − q),  B(1 − q, 1),

for some q ∈ (0, 0.5). For a given X we want to find the value of q such that the corresponding conditional random vectors, with conditional distributions H_[0,q], H_[q,1−q] and H_[1−q,1], will admit some sort of balance. Of course, the value of q might depend on the distribution of X.
We have decided to define the equilibrium as the state for which the conditional covariance matrices Σ_[0,q], Σ_[q,1−q] and Σ_[1−q,1] coincide. To measure the distance between matrices, we will use the Frobenius matrix norm, defined as

‖A‖_F := (Σ_{i,j} a²_ij)^{1/2}.

Definition 1. Let us assume that X is symmetric.¹ We will say that q̂ implies a quasi-equilibrium state of X if

q̂ = argmin_{q ∈ (0,0.5)} ‖Σ_[0,q] − Σ_[q,1−q]‖_F.    (3)

Similarly, we will say that q̂ implies an equilibrium state of X if

Σ_[0,q̂] = Σ_[q̂,1−q̂] = Σ_[1−q̂,1].    (4)

The intuition behind Definition 1 is rather straightforward. For a given X, the existence of an equilibrium state implies the existence of a partition (based on a given benchmark) such that: 1. The conditional variances of X_1 coincide. Thus, looking only at the benchmark, the average distance from the conditional mean observation (in all three subgroups) will be the same. (Note that the conditional variances of X_i also coincide, for i = 2, 3, . . . , n.) 2. The conditional correlation matrices coincide. Thus, the linear dependence within each subgroup will be the same.
Similarly, the quasi-equilibrium state relates to the situation when the covariance matrices are close to each other. We think that the linear measure of dependence is a good choice of equilibrium measure, as it describes the types of dependencies which are easy to observe.² Of course, the conditional distribution of X_1, as well as the dependence structure (often described by the copula function [7]) in each group, will not be the same in general, even for the multivariate normal. See Figures 1 and 2 for an illustrative example. We are now ready to present the main result of this paper.
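Definition 1 can be probed numerically: scan q over a grid and locate where the Frobenius distance between the lower-tail and central conditional covariance estimates is smallest. A Monte Carlo sketch (helper names are ours; the grid and sample sizes are arbitrary choices):

```python
import numpy as np

def cond_cov(sample, p, q):
    """Covariance matrix of the sample restricted to the p-q quantile
    band of the first coordinate."""
    x1 = sample[:, 0]
    lo, hi = np.quantile(x1, [p, q])
    return np.cov(sample[(x1 >= lo) & (x1 <= hi)], rowvar=False)

def frob_gap(sample, q):
    """Frobenius distance between the lower-tail and central conditional
    covariance matrices for the split q / (1 - 2q) / q."""
    return np.linalg.norm(cond_cov(sample, 0.0, q)
                          - cond_cov(sample, q, 1.0 - q))

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
X = rng.multivariate_normal(np.zeros(2), Sigma, size=500_000)

# Grid search for the quasi-equilibrium level; it should land near 0.198.
grid = np.linspace(0.05, 0.45, 41)
q_hat = grid[np.argmin([frob_gap(X, q) for q in grid])]
```

The grid search is a crude stand-in for the argmin in the quasi-equilibrium definition; its resolution and the Monte Carlo noise bound how precisely q̂ is recovered.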
Theorem 1. Let X ∼ N(µ, Σ). Then there exists a unique q ∈ (0, 0.5) which implies an equilibrium state of X. Moreover, the value of q is independent of µ and Σ.
The proof of Theorem 1 is a direct consequence of Lemma 1 and Lemma 2, which we will now present and prove.
Lemma 1. Let X ∼ N(µ, Σ) and let 0 ≤ p < q ≤ 1. Then

Σ_[p,q] = Σ + (D²[X_1 | X ∈ B(p, q)] − σ²_11) ββᵀ,

where β = (β_1, . . . , β_n)ᵀ with β_i = σ²_1i/σ²_11, and D² denotes the variance.

Proof. Being in the Gaussian world, we can describe each random variable X_i as a combination of the random variable X_1 and a random variable Y_i independent of X_1. Indeed, for i = 1, . . . , n we put

X_i = β_i X_1 + Y_i,  where β_i = σ²_1i/σ²_11.    (5)

Obviously β_1 = 1 and Y_1 = 0. Since for i = 2, . . . , n the newly defined variable Y_i is uncorrelated with X_1 and jointly Gaussian with it, they are independent. Next, we calculate the conditional covariance matrix. Using (5), we get for i, j = 1, . . . , n

Cov[X_i, X_j | X ∈ B(p, q)] = β_i β_j D²[X_1 | X ∈ B(p, q)] + β_i Cov[X_1, Y_j | X ∈ B(p, q)] + β_j Cov[Y_i, X_1 | X ∈ B(p, q)] + Cov[Y_i, Y_j | X ∈ B(p, q)].

Since Y_i and Y_j do not depend on X_1, the cross terms vanish and we get

Cov[X_i, X_j | X ∈ B(p, q)] = β_i β_j D²[X_1 | X ∈ B(p, q)] + Cov[Y_i, Y_j].

Therefore, using the unconditional identity σ²_ij = β_i β_j σ²_11 + Cov[Y_i, Y_j], we obtain

Cov[X_i, X_j | X ∈ B(p, q)] = σ²_ij + β_i β_j (D²[X_1 | X ∈ B(p, q)] − σ²_11).

Since β_i β_j is the (i, j)-th entry of the n × n matrix ββᵀ, we finish the proof of the lemma.

Corollary 1. Let X ∼ N(µ, Σ). Let q ∈ (0, 0.5) be such that

D²[X_1 | X ∈ B(0, q)] = D²[X_1 | X ∈ B(q, 1 − q)] = D²[X_1 | X ∈ B(1 − q, 1)].

Then Σ_[0,q] = Σ_[q,1−q] = Σ_[1−q,1].

Lemma 2. Let X ∼ N(µ, Σ). Then there exists a unique q ∈ (0, 0.5) such that the three conditional variances in Corollary 1 coincide. Moreover, q = Φ(x), where x < 0 is the unique negative solution of the following equation:

2xφ(x)/(1 − 2Φ(x)) + xφ(x)/Φ(x) + φ²(x)/Φ²(x) = 0.    (6)

The approximate value of q is 0.198089616....
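The conditional-covariance identity established above, Σ_B = Σ + (D²[X_1 | X ∈ B] − σ²_11) ββᵀ with β_i = σ²_1i/σ²_11, can be checked by simulation. A sketch under the Gaussian assumption (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.8, -0.4],
                  [0.8, 1.5,  0.3],
                  [-0.4, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=1_000_000)

# Truncate to a quantile band of the first coordinate.
lo, hi = np.quantile(X[:, 0], [0.2, 0.8])
sub = X[(X[:, 0] >= lo) & (X[:, 0] <= hi)]
Sigma_B = np.cov(sub, rowvar=False)

# Lemma 1 prediction: Sigma_B = Sigma + (v_B - v) * beta beta^T, where
# v = Var(X1), v_B = Var(X1 | B) and beta_i = Cov(X1, Xi) / Var(X1).
beta = Sigma[0] / Sigma[0, 0]
v_B = np.var(sub[:, 0])
predicted = Sigma + (v_B - Sigma[0, 0]) * np.outer(beta, beta)

err = np.abs(Sigma_B - predicted).max()
```

The maximal entrywise error is of the order of the Monte Carlo sampling noise, supporting the identity.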
Proof. Without any loss of generality we may assume that X_1 has the standard normal distribution N(0, 1). Indeed, for X_1^st = (X_1 − µ_1)/σ_11 and a, b ∈ [0, 1] such that a < b, we get

D²[X_1 | X ∈ B(a, b)] = σ²_11 D²[X_1^st | Φ^{-1}(a) < X_1^st < Φ^{-1}(b)].

To proceed, we need to compute the first two moments of the truncated normal distribution of X_1. For transparency, we will show full proofs (compare [8, Section 13.10.1]).
Let us calculate the conditional expectations. For x < 0 we have

E[X_1 | X_1 < x] = −φ(x)/Φ(x),   E[X_1 | x < X_1 < −x] = 0,

the latter by symmetry. To get the corresponding second moments we integrate by parts. Therefore,

E[X²_1 | X_1 < x] = 1 − xφ(x)/Φ(x),   E[X²_1 | x < X_1 < −x] = 1 + 2xφ(x)/(1 − 2Φ(x)).
Since the conditional expected value behaves like a weighted arithmetic mean, the second moment E[X²_1 | x < X_1 < −x] = 1 + 2xφ(x)/(1 − 2Φ(x)) is strictly decreasing with respect to x. Consequently, the central conditional variance D²[X_1 | x < X_1 < −x] is strictly decreasing. Next, we will show that the tail conditional variance D²[X_1 | X_1 < x] is strictly increasing. Indeed, differentiating

D²[X_1 | X_1 < x] = 1 − xφ(x)/Φ(x) − φ²(x)/Φ²(x)

with respect to x yields a positive expression; the last inequality follows from the fact that φ(x)/Φ(x) > −x for x < 0. Next, note that (compare [7, Lemma 8])

lim_{x→−∞} D²[X_1 | X_1 < x] = 0,   lim_{x→0⁻} D²[X_1 | X_1 < x] = 1 − 2/π,

while the central conditional variance decreases from 1 to 0 as x increases from −∞ to 0. Hence there exists a unique x < 0 such that

D²[X_1 | X_1 < x] = D²[X_1 | x < X_1 < −x].

Compare Figure 3 for a visualization. Moreover, rearranging this equality shows that x is a (negative) solution of equation (6). Using basic numerical tools we checked that (6) is satisfied for x ≈ −0.8484646848, for which q = Φ(x) ≈ 0.198089615.
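The defining equality of the tail and central conditional variances of a standard normal can also be solved numerically without reproducing the algebra. A sketch using SciPy root bracketing (the two variance formulas are the standard truncated-normal moments used above; helper names are ours):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def tail_var(x):
    """Var(Z | Z < x) for Z ~ N(0, 1)."""
    lam = norm.pdf(x) / norm.cdf(x)
    return 1.0 - x * lam - lam**2

def central_var(x):
    """Var(Z | x < Z < -x) for Z ~ N(0, 1), x < 0 (the mean is 0)."""
    return 1.0 + 2.0 * x * norm.pdf(x) / (1.0 - 2.0 * norm.cdf(x))

# The difference changes sign on [-3, -0.1], so brentq finds the root.
x_star = brentq(lambda x: tail_var(x) - central_var(x), -3.0, -0.1)
q_star = norm.cdf(x_star)
# x_star ~ -0.84846, q_star ~ 0.19809
```

The recovered pair (x*, q*) matches the values quoted in the proof up to numerical tolerance.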

Remark 1.
The equilibrium level q calculated in Lemma 2 depends neither on µ nor on Σ. Therefore, if we consider correlation matrices instead of covariance matrices in (3) and (4), then the optimal value of q from Theorem 1 will also imply the corresponding equilibrium state (4) for correlation matrices.³ Theorem 1 suggests that if we split our population into three groups, the worst 20%, the average 60% and the best 20%, then the equilibrium will be achieved. Hence the name 20-60-20 rule is (mathematically) justified.
Remark 2. The value ‖Σ_[0,q] − Σ_[q,1−q]‖_F, for q ≈ 0.198, might be used to test how far X is from a multivariate normal distribution. This test is particularly important as it shows the impact of the tails on the central part of the distribution; for empirical data the dependence (correlation) structure in the tails usually increases significantly, revealing non-normality. See [2,7], where a similar approach is applied to measure the contagion effect among different financial markets.
Remark 3. We can also consider more than three states in the fuzzy logic system (e.g. with 5 states we might label them critical, bad, normal, good and outstanding performance, based on the selected benchmark). The ratios which imply an equilibrium state (similar to the one from Definition 1) for 5 and 7 different states are close to

Taking a different equilibrium norm
Instead of measuring the distance between conditional covariance matrices, and thus the linear dependence structure between random variables, we can measure the dependence using different measures. Among the most popular are the so-called measures of concordance, of which Kendall τ and Spearman ρ are the usual representatives in the two-dimensional case (see [10, Section 5] for more details). Instead of measuring linear dependence, they focus on monotone dependence, being invariant to any strictly monotone transform of a random variable. Thus, instead of the covariance matrices Σ_[0,q] and Σ_[q,1−q] in (3) and (4), we can consider the corresponding matrices of conditional Kendall τ's and conditional Spearman ρ's, denoted by Σ^τ_[0,q], Σ^τ_[q,1−q] and Σ^ρ_[0,q], Σ^ρ_[q,1−q], respectively. For comparison, we also consider conditional correlation matrices, for which we use the notation Σ^r_[0,q] and Σ^r_[q,1−q]. Next, we consider the numbers

q̂_τ = argmin_{q ∈ (0,0.5)} ‖Σ^τ_[0,q] − Σ^τ_[q,1−q]‖_F,    (8)
q̂_ρ = argmin_{q ∈ (0,0.5)} ‖Σ^ρ_[0,q] − Σ^ρ_[q,1−q]‖_F,    (9)

which define the corresponding modifications of the quasi-equilibrium state defined in (3) and the equilibrium state defined in (4).⁴ For X ∼ N(µ, Σ), the values q̂_τ and q̂_ρ also seem to be very close to 0.2, for almost any value of µ and Σ. To illustrate this property, we have picked 1000 random covariance matrices {Σ_i}_{i=1,...,1000} for n = 4 and computed the values of the functions

f^i_r(q) = ‖Σ^r_[0,q] − Σ^r_[q,1−q]‖_F,    (10)
f^i_τ(q) = ‖Σ^τ_[0,q] − Σ^τ_[q,1−q]‖_F,    (11)
f^i_ρ(q) = ‖Σ^ρ_[0,q] − Σ^ρ_[q,1−q]‖_F.    (12)

To do so, for each i ∈ {1, 2, . . . , 1000} we have taken a 1,000,000-element Monte Carlo sample from X ∼ N(0, Σ_i) and computed the values of (10), (11) and (12) using MC estimates of the corresponding conditional matrices.
The graphs of f^i_r, f^i_τ and f^i_ρ for i = 1, 2, . . . , 50 are presented in Figure 4. In Figure 5 we also present the smoothed histogram of the points at which the minimum is attained in (10), (11) and (12), for i = 1, 2, . . . , 1000.
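A small-scale version of this experiment (n = 2, a single correlation level, helper names ours) can be sketched as follows; at q = 0.2 the tail and central conditional rank correlations should nearly coincide:

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def cond_rank(sample, p, q, stat):
    """Rank-correlation coefficient on the p-q quantile band of X1."""
    x1 = sample[:, 0]
    lo, hi = np.quantile(x1, [p, q])
    sub = sample[(x1 >= lo) & (x1 <= hi)]
    return stat(sub[:, 0], sub[:, 1])[0]

rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(2), Sigma, size=50_000)

q = 0.2  # the conjectured near-equilibrium split
# Gap between tail and central conditional Kendall tau / Spearman rho
# (for n = 2 the matrix Frobenius gap is sqrt(2) times this difference).
d_tau = abs(cond_rank(X, 0.0, q, kendalltau)
            - cond_rank(X, q, 1.0 - q, kendalltau))
d_rho = abs(cond_rank(X, 0.0, q, spearmanr)
            - cond_rank(X, q, 1.0 - q, spearmanr))
```

Both gaps come out small (of the order of the sampling noise), consistent with q̂_τ and q̂_ρ lying close to 0.2 in the Gaussian case.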
Unfortunately, in general the values q̂_τ and q̂_ρ defined in (8) and (9) are not constant and independent of Σ. In particular, if the dependence inside X is very strong, e.g. the vector (X_1, X_2, . . . , X_n) is almost comonotone, then the values of q̂_τ and q̂_ρ might increase substantially.
(Figure caption.) For each i = 1, 2, . . . , 1000 a 1,000,000-element sample from N(0, Σ_i) was simulated and the corresponding estimates of the conditional matrices were used for the computations.
To illustrate this property, let us present some theoretical results involving the conditional Spearman ρ and Kendall τ. For simplicity, until the end of this subsection we will assume that n = 2.
Lemma 3. For all 0 ≤ p < q ≤ 1 and r ∈ (−1, 1),

ρ_[p,q](−r) = −ρ_[p,q](r)   and   τ_[p,q](−r) = −τ_[p,q](r).

Proof. Before we begin the proof, let us recall some basic facts from copula theory (cf. [10] and references therein). We will use C^r to denote the Gaussian copula with parameter r ∈ (−1, 1), which coincides with the correlation coefficient. Noting that a copula can be seen as a distribution function (with uniform margins), let us assume that (U, V) is a random vector with distribution C^r. We will denote by C^r_[p,q] the copula of the conditional distribution of (U, V) under the condition U ∈ [p, q], where 0 ≤ p < q ≤ 1. Due to Sklar's Theorem, we get the following description of C^r_[p,q]:

C^r_[p,q](u, v) = (C^r(p + (q − p)u, g_r(v)) − C^r(p, g_r(v))) / (q − p),    (13)

where g_r denotes the inverse of the conditional second margin v ↦ (C^r(q, v) − C^r(p, v))/(q − p). Next, it is easy to notice that the distribution function of (U, 1 − V) is equal to C^{−r}. Hence the Gaussian copulas commute with flipping, i.e.

u − C^r(u, 1 − v) = C^{−r}(u, v).
On the other hand, the flipping transforms the conditional distribution of (U, V) given U ∈ [p, q] into the conditional distribution of (U, 1 − V) given U ∈ [p, q]. Thus, basing on [10, Theorem 5.1.9], we conclude that ρ and τ change sign under the flipping, which proves the lemma. We recall that the Spearman ρ and Kendall τ of the conditional copula C^r_[p,q] are given by the formulas

ρ_[p,q](r) = 12 ∫_{[0,1]²} C^r_[p,q](u, v) du dv − 3,
τ_[p,q](r) = 4 ∫_{[0,1]²} C^r_[p,q](u, v) dC^r_[p,q](u, v) − 1.

To describe their behaviour for small r we will need their Taylor expansions with respect to r.

Proposition 1. For fixed p, q ∈ (0, 1) (p < q) and r ∈ (−1, 1) such that r is close to 0, we get

ρ_[p,q](r) = A_ρ(x_1, x_2) r + O(r²),   τ_[p,q](r) = A_τ(x_1, x_2) r + O(r²),

where the coefficients A_ρ and A_τ depend on p and q only through x_1 = Φ^{-1}(p) and x_2 = Φ^{-1}(q).
Proof. We will use notation similar to the one introduced in Lemma 3. The proof is based on two facts. First, for r = 0 both C^r and C^r_[p,q] are equal to the product copula Π(u, v) := uv, i.e.

C^0 = C^0_[p,q] = Π.
Second, the derivative of the distribution function of a bivariate Gaussian distribution with standardised margins with respect to the parameter r is equal to its density, which implies

∂C^r(u, v)/∂r |_{r=0} = φ(Φ^{-1}(u)) φ(Φ^{-1}(v)).

We calculate the Taylor expansion of ρ_[p,q](r) at r = 0.
The derivative of C^r_[p,q] will be calculated in two steps. First we differentiate formula (13) with respect to r. Next, setting r = 0 and using the two facts above, we obtain an explicit expression for ∂C^r_[p,q](u, v)/∂r at r = 0, in which p and q enter only through x_1 = Φ^{-1}(p) and x_2 = Φ^{-1}(q). Finally, integrating this expression over the unit square yields the linear coefficient of ρ_[p,q](r). The proof of the Kendall τ case follows from the symmetry C^0_[p,q] = Π: differentiating the defining integral of τ_[p,q](r) and setting r = 0 reduces it to the same computation.
Theorem 2. For r close to 0, we get

q̂_τ ≈ q̂_ρ ≈ q*,

where q* ≈ 0.2132413 is a solution of the following equation. Proof. If r = 0, then for any q ∈ (0, 0.5) we get that (18) is equal to 0, so for clarity we might set A_κ(0) = q*. Using Lemma 3, without loss of generality we may assume that r > 0. Due to Proposition 1, for small r we get the stated approximation for ρ, and a similar formula for τ.
Remark 4. When we consider the equilibrium state for conditional Spearman ρ matrices (or Kendall τ), we only need to know the dependence structure of X, given by its copula. Thus, we can set arbitrary marginal distributions for X_1, . . . , X_n without changing the equilibrium. This allows us to consider a much more general class of multivariate distributions for which the 20-60-20 rule will hold.

Abandoning the Gaussian world
When we drop the assumption that X ∼ N(µ, Σ), the existence of an equilibrium is no longer guaranteed. A natural question is whether an equivalent of the 20/60/20 rule holds for any elliptical distribution. In this section we discuss this matter briefly. We say that X has an elliptical distribution if its characteristic function can be written as

φ_X(t) = exp(i tᵀµ) Ψ(tᵀΣt),

where µ is a vector (which coincides with the mean vector, if it exists), Σ is a scale matrix (which is proportional to the covariance matrix, if it exists) and Ψ is the so-called characteristic generator of the elliptical distribution (cf. [3] and references therein for a general survey of elliptical distributions). For simplicity, we will use the so-called stochastic representation of an elliptical distribution. It is well known (see [3]) that if X has a density, then it is elliptical if and only if it can be represented as

X = µ + R (√Σ)ᵀ U,

where √Σ is any square matrix such that (√Σ)ᵀ√Σ = Σ (e.g. obtained using the Cholesky decomposition), U is an n-dimensional random vector uniformly distributed on the unit n-sphere, and R is a nonnegative random variable, corresponding to the radial density, independent of U. Moreover, we will assume that the first two moments of R exist, which ensures the existence of the mean vector and covariance matrix of X. Now we can ask whether, for given U and R, the equilibrium state of X always exists and whether it is invariant with respect to µ and Σ. Unfortunately, it is easy to show that the equilibrium state (with covariance matrices) is not always achieved, and the quasi-equilibrium state might strongly depend on Σ, even when we consider only the class of multivariate Student-t distributions (i.e. we can consider appropriate radial distributions and covariance matrices in Algorithm 1).
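The stochastic representation can be mirrored numerically: normalizing standard Gaussian draws gives U uniform on the sphere, and the well-known special case R² ∼ χ²(n) recovers the multivariate normal N(0, Σ). A sketch (helper layout ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(5)
n, size = 3, 400_000
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.linalg.cholesky(Sigma)  # A @ A.T = Sigma

# U uniform on the unit n-sphere: normalize standard Gaussian draws.
G = rng.standard_normal((size, n))
U = G / np.linalg.norm(G, axis=1, keepdims=True)

# Radial part: with R^2 ~ chi-squared(n) the product R * U is standard
# normal, so X below is (approximately) N(0, Sigma).
R = np.sqrt(rng.chisquare(n, size))
X = (R[:, None] * U) @ A.T

emp = np.cov(X, rowvar=False)
err = np.abs(emp - Sigma).max()
```

Other choices of the radial variable R (with two finite moments) produce other elliptical laws with covariance proportional to Σ.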
On the other hand, if we substitute correlation matrices for covariance matrices in (3) and (4), then we are able to prove results similar to Theorem 1 for a much more general class of elliptical distributions.
To illustrate this property, we have conducted a simple computational experiment using the multivariate Student-t distribution, as it is commonly used by practitioners. Assuming n = 4, for each ν ∈ {2, 3, . . . , 20} we have picked 100 random matrices Σ^i_ν, and for each i = 1, 2, . . . , 100 we simulated a 1,000,000-element Monte Carlo sample, assuming X ∼ t_ν(0, Σ^i_ν). Next, we calculated the values q^i_ν ∈ (0, 0.5) for which the (quasi-)equilibrium state is attained (i.e. for estimates of conditional correlation matrices; see Algorithm 1). In Figure 8 we present the graph of the 0.1, 0.5 and 0.9 quantiles of the sample {q^i_ν}_{i=1,...,100} for ν = 2, 3, . . . , 20. The value of q for which the (quasi-)equilibrium state is achieved clearly depends on the degrees of freedom, increasing to the value 0.198, which coincides with the equilibrium level for the multivariate normal distribution (note that the Student-t distribution converges to the normal distribution as ν → ∞).
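A hedged, scaled-down sketch of this experiment: the multivariate Student-t sample is generated through the equivalent normal variance-mixture representation X = Z/√(W/ν) rather than by sampling R and U directly, and the equilibrium level is estimated by a grid search over q (helper names ours; this is a sketch of, not the paper's, Algorithm 1):

```python
import numpy as np

def mv_t(nu, Sigma, size, rng):
    """Multivariate Student-t via the normal variance-mixture
    representation X = Z / sqrt(W / nu), Z ~ N(0, Sigma), W ~ chi2(nu)."""
    Z = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=size)
    W = rng.chisquare(nu, size=size)
    return Z / np.sqrt(W / nu)[:, None]

def cond_corr(sample, p, q):
    """Correlation matrix on the p-q quantile band of the first coordinate."""
    x1 = sample[:, 0]
    lo, hi = np.quantile(x1, [p, q])
    return np.corrcoef(sample[(x1 >= lo) & (x1 <= hi)], rowvar=False)

def corr_gap(sample, qv):
    """Total Frobenius gap between the three conditional correlation
    matrices for the split qv / (1 - 2 qv) / qv."""
    low = cond_corr(sample, 0.0, qv)
    mid = cond_corr(sample, qv, 1.0 - qv)
    top = cond_corr(sample, 1.0 - qv, 1.0)
    return (np.linalg.norm(low - mid) + np.linalg.norm(mid - top)
            + np.linalg.norm(low - top))

rng = np.random.default_rng(4)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
X = mv_t(4, Sigma, 500_000, rng)

grid = np.linspace(0.05, 0.45, 41)
q_hat = grid[np.argmin([corr_gap(X, qv) for qv in grid])]
```

For small ν the estimated level sits below the Gaussian value 0.198, consistent with the trend reported for Figure 8.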

Concluding remarks
We have shown that if we consider a multivariate normal distribution, then quantile-based conditioning on the first coordinate always implies an equilibrium state, for the q ≈ 0.1980 quantile. This provides a mathematical and statistical background for a fuzzy phenomenon commonly known as the 20-60-20 rule. We have also shown that, instead of covariance matrices, one can consider different equilibrium norms (e.g. using Spearman ρ matrices), which imply similar results for multivariate normal vectors.