Evolutionary, Mean-Field and Pressure-Resistance Game Modelling of Networks Security

The recently developed mean-field game models of corruption and bot-net defence in cyber-security, the evolutionary game approach to inspection and corruption, and the pressure-resistance game element, can be combined under an extended model of interaction of a large number of indistinguishable small players against a major player, with a focus on the study of security and crime prevention. In this paper we introduce such a general framework for complex interaction in network structures of many players, incorporating individual decision making inside the environment (the mean-field game component), binary interaction (the evolutionary game component), and the interference of a principal player (the pressure-resistance game component). To perform concrete calculations with this overall complicated model we work in three basic asymptotic regimes: fast execution of personal decisions, small rates of binary interactions, and small payoff discounting in time. By this approach we construct a class of solutions having the so-called turnpike property.


Introduction
Social security and crime prevention are dominant concerns of modern societies. In the traditional terrain of counter-terrorism, corruption and tax evasion, the authorities in charge struggle to deal with large populations of increasingly well-informed violating individuals (this term will be used interchangeably with the terms agents or small players). Conversely, in the recently emerging field of cyber-security, large groups of individuals aim to defend their private computers against a lurking cyber-criminal (bot-net defence). Similar reasoning applies to the citizens of a large city defending against a biological weapon (bio-terrorism). The rapid advance in means of interaction, communication and exchange of information has established the individuals' social network as a decisive parameter of their strategies in the above and similar settings. Here we consider agents who are organized in specific social, phenotypic (or even geographical), and behavioural network structures. The central focus of this paper is to investigate the evolution of the complex process in which a (very) large number of interacting individuals, susceptible to engaging in or being affected by criminal behaviour, decide their strategies subject to the pressure of a benevolent, or respectively malicious, major player (this term will be used interchangeably with the term principal), to their individual optimization criterion, and to the influence of their (social) environment.
In the real-life scenarios we aim to capture with our approach, it is natural to distinguish two main dimensions of structure. The first dimension refers to the individuals' objective distribution among different levels of social, bureaucratic, or phenotypic hierarchy, or to their geographical distribution; in general, to any finite partition according to their independent characteristics. One can think, for example, of tax payers of different bands, employees of different grades, or infected computers/individuals of different degrees. The second dimension refers to the agents' distribution among different types of strategy or behaviour, subject (mainly) to the agents' individual control; say, for example, the level of tax evasion in the field of inspection games, the extent of bribery acceptance in the field of corruption games, or the level of defence against terrorist activity or a malware in the fields of counter-terrorism and cyber-security respectively.
Note that our game theoretic approach is developed under the basic idea of a very large number of non-cooperative, interacting agents playing against (i.e. under the pressure of) a single major player. In principle, our model belongs to the wide class of non-linear Markov games, see e.g., Kolokoltsov (2010), combining under an extended scheme the pressure-resistance, the evolutionary, and the mean-field game approach.
The pressure-resistance terminology was introduced in Kolokoltsov (2014), where ideas captured from evolutionary game theory were extended to include the pressure of a principal player on a large group of interacting agents. Here, the pressure-resistance game component refers to the principal's interference generating transitions solely on the first dimension of structure (e.g. a benevolent director being able to promote or downgrade interacting bureaucrats, or computers and individuals getting infected or recovering subject to a cyber-criminal's and a bio-terrorist's activity respectively). This approach of major and minor players has also been considered for the analysis of mean-field type models, see, e.g., Huang (2010); Bensoussan, Chau and Yam (2016); Carmona and Zhu (2016).
The evolutionary game component refers to the agents' pairwise interactions, with a particular focus on the effect of established social norms, potentially generating transitions on both dimensions of structure. For a general survey of the literature on population dynamics applications in game theory, that is, on evolutionary game theory, see, e.g., Smith (1988); Weibull (1997); Gintis (2000); Samuelson (2002); Hofbauer and Sigmund (2003); Taylor et al. (2004); Szabó and Fath (2007), and references therein. See also Friedman (1991, 1998) for specific applications in economics. The mean-field game component refers to the agents' individual optimization controlled by their strategic position on the second dimension of structure, taking into account the entire population's behaviour. This element of 'globally' rational optimization introduces an additional level of complexity compared to Katsikas, Kolokoltsov and Yang (2016); Kolokoltsov (2014), where optimization strictly upon imitation of successful strategies, on the basis of binary comparison of payoffs, was considered (a purely evolutionary approach). Mean-field games (MFG) were introduced by Lasry and Lions (2007), by analogy with the mean-field theory in statistical mechanics, and were also developed independently by Huang, Malhamé and Caines (2006) as large population stochastic dynamic games. In principle, they represent a natural extension of earlier work in the economics literature under the assumption of an infinite number of players, see, e.g., Aumann (1964); Dubey, Mas-Colell and Shubik (1980) for static games, and Jovanovic and Rosenthal (1988); Bergin and Bernhardt (1992) for dynamic games. The literature on mean-field games is growing rapidly, see, e.g., Tembine et al. (2009); Cardaliaguet (2010); Caines (2013); Bensoussan, Frehse and Yam (2013); Carmona and Delarue (2013); Gomes and Saude (2014); Bauso, Tembine and Basar (2016), and references therein for a general survey.
Here we shall work in three asymptotic regimes: fast execution of the agents' personal decisions, weak binary interactions, and small discounting in time. The need to introduce this ternary asymptotic approach is revealed by the analysis of a similar setting conducted by Kolokoltsov and Bensoussan (2016), where the distribution of infection in a computer network with a malicious software controlled by a cyber-criminal was described by a stationary MFG model with four states. Whilst the three-state model describing the distribution of corruption in a population of bureaucrats under the pressure of a benevolent principal, studied by Kolokoltsov and Malafeyev (2015), is solved explicitly without any asymptotic simplifications, the introduction of a fourth state in Kolokoltsov and Bensoussan (2016) already increases the complexity significantly, such that the assumption of large λ (fast decision execution), though not as strong as the one we make here, is critical to obtain decent solutions.
Similarly, for the more complicated n × m-state model we introduce here, the need to consider the three asymptotic regimes mentioned above becomes obvious. In principle, even without working in these regimes one can sometimes obtain explicit but extremely lengthy formulas, not revealing any clearer insights. But our asymptotic approach also has a clear interpretation from a practical point of view. Indicatively, an infinitely large transition rate λ models the natural process of immediate execution of personal decisions once they have been taken, while a vanishingly small discounting coefficient δ_dis implies a long planning horizon. Both of the models studied in Kolokoltsov and Bensoussan (2016); Kolokoltsov and Malafeyev (2015), as well as the extended approach presented in this paper, belong to the category of finite state space mean-field games that were initially considered by Gomes, Mohr and Souza (2010, 2013). See also Gomes, Velho and Wolfram (2014) specifically for socio-economic applications.
In addition to the above applications to corruption and cyber-security, here we introduce the bio-terrorism interpretation, that is, the defence of a population against a biological weapon. The implementation of game theoretic methods in the analysis of terrorism has developed vastly since the 1980s, with game theory allowing the investigation of different strategic interactions (e.g. terrorists vs government, terrorists vs terrorists, terrorists for sponsors, terrorists for supporters), see, e.g., Sandler (2005); Sandler and Arce (2007); Sandler and Siqueira (2009) and references therein. The additional pairing captured here is civilians vs a bio-terrorist.
We organize the paper as follows. In Section 2 we introduce our model, specifying explicitly the time-dependent and stationary MFG consistency problems. In Section 3 we solve the stationary problem in our proposed asymptotic regimes, and we show that the identified solution is a stable fixed point of the corresponding evolutionary dynamics. In Section 4 we obtain our main result; we construct the class of time-dependent solutions that stay in a neighbourhood of the identified stationary solution. In the terminology of mathematical economics, this stationary solution represents a turnpike (see, e.g., Kolokoltsov and Yang (2012); Zaslavski (2006)) for the class of time-dependent solutions.
In Section 5 we summarize our approach and our results. We shall stick here to the first instance.
We distinguish the following three structures.
Firstly, the decision structure (B, E_D, λ): a non-oriented graph with vertex set B and edge set E_D, where an edge joins the vertices i and j whenever an agent is able to switch between the states (h, i) and (h, j). Every such transition in B requires a certain random λ-exponential time; for simplicity, a single parameter λ is chosen for all possible transitions. As mentioned, we shall mostly look at the asymptotic regime λ → ∞. We take the agents to be homogeneous and indistinguishable, in the sense that their strategies and payoffs may depend only on their states, and not on any other individual characteristics. Hence, a decision of an agent in a state (h, b) at any time is given by the decision matrix u = (u_{hb→hb̃}), expressing the intention to switch from strategy b to b̃, for all b̃ ∈ B such that b̃ ≠ b. In our model we consider agents without mixed strategies, that is, for any state (h, b) the decision vector (u_{hb→hb̃}) is either identically zero, when the agent does not wish to change strategy, or there exists one strategy b_1 ≠ b such that u_{hb→hb_1} = 1, with all the other coordinates of (u_{hb→hb̃}) equal to zero, when the agent wishes to change from strategy b to b_1.
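As a minimal illustration, the pure-strategy decision vectors just described can be encoded as one-hot arrays; the function name and encoding below are ours, not the paper's:

```python
import numpy as np

def decision_vector(b, b1, m):
    """Pure-strategy decision vector for an agent currently playing strategy b
    out of m strategies: either all zeros (keep strategy b), or a one-hot
    vector with u[b1] = 1 for a single intended switch b -> b1, b1 != b."""
    u = np.zeros(m)
    if b1 is not None:
        assert b1 != b, "an agent cannot 'switch' to its current strategy"
        u[b1] = 1.0
    return u

# An agent in state (h, b=0) with m = 3 strategies who intends to move to b1 = 2:
u = decision_vector(0, 2, 3)        # -> array([0., 0., 1.])
stay = decision_vector(0, None, 3)  # -> array([0., 0., 0.])
```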
Secondly, the pressure structure (H, E_P, q_{jb→ib}): an oriented graph, where an edge joins the vertices j and i whenever the major player has the power to upgrade or downgrade the small players from hierarchy level j to level i. In this case, the coefficients q_{jb→ib} represent the rates of such transitions in H, that is, every such transition requires a certain q_{jb→ib}-exponential waiting time. In general, these rates may depend on some control of the principal (one can think of some parameter describing his/her efforts, for example his/her budget). We shall not exploit this version here.
Finally, we consider the evolution structure that characterizes the change in the distribution of states due to the agents' pairwise interaction (e.g. through exchange of opinions, fights with competitors, the effect of established social norms, etc.). This can be described by the set of rates q^s_{s_1→s_2}/N, by which an agent in state s can stimulate the transition of another agent from state s_1 to state s_2. For instance, an honest agent (or even a corrupted one) may help the principal to discover, and therefore punish, the illegal behaviour of a corrupted agent. Note that transitions due to binary interaction can in principle occur on both dimensions of structure. Here we shall ignore the behavioural element of the evolution structure; that is, we shall assume that the transition rates q^s_{s_1→s_2}/N may be non-vanishing only for two states s_1, s_2 that differ strictly in their h-component. Moreover, since we shall work in the asymptotic regime of small binary interactions, it is helpful to introduce directly a small parameter δ_int discounting the power of these interactions. We shall therefore denote the corresponding transition rates by δ_int · q^s_{h_1 b→h_2 b}/N.

Remark.
The evolutionary transitions in B represent an alternative to the individual transitions described by the decision structure (B, E D , λ), and can be considered negligible in the limit λ → ∞ that we shall look at here. Taking into account a behavioural evolution structure is more appropriate in the absence of a decision structure, which was the case developed by Kolokoltsov and Malafeyev (2015).
To introduce a more detailed description of our game theoretic framework, note that the states of the corresponding N-player game are the N-tuples of the form ((h_1, b_1), . . . , (h_N, b_N)), where each pair (h_i, b_i) describes the i-th player's position on the hierarchy and the behaviour axis respectively. Assuming that each player adopts a decision matrix u, the system evolves according to the continuous-time Markov chain introduced above, with the corresponding transition rates as specified. If we further specify the rewards for staying in each state per unit of time, the transition fees (or costs) for transiting from one state to another, as well as the terminal payoffs corresponding to each state at some finite terminal time, then we shall be working in the setting of a stochastic dynamic game with a finite number of players.
As usual in a mean-field game approach, we are interested in estimating the approximate symmetric Nash equilibria. Assuming indistinguishable agents, the system's state space can be reduced to the set Z^{nm}_+ of vectors n = (n_ij), i ∈ H, j ∈ B, where n_ij denotes the number of agents in state (i, j), and N = Σ_{ij} n_ij denotes the (constant) total number of agents.
Therefore, the initially introduced Markov chain reduces to the Markov chain on Z^{nm}_+, described by the time-dependent generator: where the unchanged values in the arguments of the function F on the right-hand side are omitted. Equivalently, in the normalized version the system's state space can be reduced to the subset of the probability simplex Σ^N_{n×m} ⊆ R^{n×m}, with vectors of the form x = (x_ij) = n/N, i ∈ H, j ∈ B, where each coordinate represents the occupation density (alternatively, the occupation probability) of each state (i, j).
For the Markov chain on Σ^N_{n×m}, generator (1) can be rewritten in the equivalent form: where {e_ij} is the standard orthonormal basis in R^{n×m}. Assuming, additionally, that f is a continuously differentiable function on Σ^N_{n×m}, and taking its Taylor expansion, in the limit of infinitely many small players, N → ∞, we find that the above generator converges to: or equivalently to the form: This is a first-order partial differential operator, which generates a deterministic Markov process whose dynamics are governed by the characteristic equations of L_t: These calculations make the following result plausible: Proposition 1. If u = (u_{ij→ik}) is the decision matrix, which may depend on time, then in the limit N → ∞ the evolution of x is given by system (5).
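The law-of-large-numbers convergence behind Proposition 1 can be checked numerically on a toy example. The sketch below assumes a simple two-state chain with illustrative rates a, b of our choosing (not the paper's generator): the empirical fraction of N independent agents tracks the limiting kinetic ODE as N grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy example: each agent jumps 0 -> 1 with rate a and 1 -> 0 with rate b,
# independently. As N -> infinity the empirical fraction x of agents in
# state 1 follows the kinetic ODE  dx/dt = a (1 - x) - b x.
a, b, T, dt = 1.0, 0.5, 5.0, 0.01

def simulate(N):
    state = np.zeros(N, dtype=int)
    for _ in range(int(T / dt)):
        r = rng.random(N)
        flip01 = (state == 0) & (r < a * dt)   # masks from the pre-update state
        flip10 = (state == 1) & (r < b * dt)
        state[flip01] = 1
        state[flip10] = 0
    return state.mean()

x = 0.0
for _ in range(int(T / dt)):   # Euler scheme for the limiting ODE
    x += dt * (a * (1 - x) - b * x)

print(abs(simulate(10_000) - x))   # small: the chain tracks the ODE
```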

Remark. For a rigorous explanation (not just the formal description that we provide
here) of the Markov chain's convergence to the deterministic process given by (5), see, e.g., Kolokoltsov (2012).
The above general structure is rather complicated. To deal effectively with this complexity, one can distinguish two natural simplifying frameworks: (i) the set of edges is ordered and only the transitions between neighbours are allowed; (ii) the corresponding graph is complete, so that all transitions are allowed and have comparable rates. We shall choose the second alternative for B, and the first alternative for H, thinking of it as a hierarchy of agents. Moreover, we shall assume that the binary interaction occurs only within a common level in H, ignoring the binary interaction between agents in different levels of the hierarchy structure. Therefore, for the transition rates q_{ij→i+1,j} of the pressure structure, increasing in i ∈ H, we introduce the shorter notation q^+_ij, and for the transition rates q_{ij→i-1,j}, decreasing in i ∈ H, we introduce the notation q^-_ij. Accordingly, for the transition rates q^{ik}_{ij→i+1,j} of the hierarchical evolution structure, increasing in i ∈ H, we introduce the shorter notation q^{+k}_ij, and for the transition rates q^{ik}_{ij→i-1,j}, decreasing in i ∈ H, we shall use the notation q^{-k}_ij.
Figure 1: The simplified version of our network: only the transitions between neighbours are allowed in H, all transitions are allowed in B, binary interaction occurs only within a common level in H.
Applying the above simplifications, the kinetic equations (5) reduce to the following system: Note that equations (6) hold only for the internal states (i, j), i ∈ H, j ∈ B, such that i ≠ 1, |H|. On the contrary, for the boundary states (i, j) the terms involving downgrading to i − 1 and upgrading to i + 1 respectively are omitted. In particular, we consider: Additionally, to simplify further the final explicit calculations, for all i ∈ H, j ∈ B, we shall consider the constraint: which can be interpreted as a detailed balance condition; it actually asserts that the number of downgrades is compensated on average by the number of upgrades.
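To fix ideas, here is a sketch of the pressure component of the kinetic equations with the boundary convention just described. The interaction and decision terms are omitted and all rate values are illustrative; the function is our reading of the text, not the paper's exact system (6):

```python
import numpy as np

def pressure_drift(x, q_plus, q_minus):
    """Pressure component of the kinetic dynamics: within each strategy
    column j, mass flows up the hierarchy with rates q_plus[i, j] and down
    with rates q_minus[i, j]; at the boundary levels i = 0 and i = n - 1 the
    down- and up-terms are simply omitted. x, q_plus, q_minus are n x m arrays."""
    n, m = x.shape
    dx = np.zeros_like(x)
    for i in range(n):
        for j in range(m):
            if i > 0:      # inflow from below, outflow down
                dx[i, j] += q_plus[i - 1, j] * x[i - 1, j]
                dx[i, j] -= q_minus[i, j] * x[i, j]
            if i < n - 1:  # inflow from above, outflow up
                dx[i, j] += q_minus[i + 1, j] * x[i + 1, j]
                dx[i, j] -= q_plus[i, j] * x[i, j]
    return dx

# With equal up/down rates the uniform density is a fixed point,
# and total mass is always conserved (columns of dx sum to zero).
x = np.full((3, 2), 1 / 6)
dx = pressure_drift(x, np.ones((3, 2)), np.ones((3, 2)))
print(dx)   # all (close to) zero
```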
Remark. An alternative simple (and analogously manageable) model allows the principal either to move an agent one step upward in the hierarchy with rates q^+_ij and q^{+k}_ij respectively, or to send an agent directly down to the lowest state with rates q^d_ij and q^{dk}_ij respectively.
In this case the system describing the evolution of occupation densities becomes, for i ≠ 1: with an obvious modification for i = 1.
To identify the agents' optimal decision vector, we need first to define certain game characteristics such as the state rewards and the transition costs. In particular, we assign the reward w_ij per unit of time to an agent for staying in state (i, j), the fee/cost f^B_{kj} for an agent's elective transition from state (h, k) to state (h, j) (which we assume independent of h for brevity), and the fine/cost f^H_j for an agent's enforced transition from state (j, b) to state (j − 1, b) (which we assume independent of b for brevity). Let, additionally, g_ij = g_ij(t) be the payoff corresponding to the state (i, j) in the process starting at time t and terminating at time T. Then, for an infinitesimally small time step τ, and assuming that g(t) is differentiable in time, an agent at state (i, j) decides his/her strategy aiming to optimize the expression: Taking the Taylor expansion of the term g_ij(t + τ), and omitting terms of order O(τ²), the above optimization equation takes the form: In the limit of an infinitesimally small time step, τ → 0, equation (11) yields the evolutionary Hamilton-Jacobi-Bellman (HJB) equation satisfied by the individual optimal payoffs g_ij. A rigorous derivation of the HJB equation can be found in every standard textbook on dynamic programming and optimal control, see, e.g., Kamien and Schwartz (1991); in particular, for stochastic dynamic programming, see, e.g., Ross (2014).
The above yields the following result: Proposition 2. Given the Markovian interaction introduced above, consisting of the decision, the pressure-resistance and the evolution structure, if g_ij = g_ij(t) denotes the payoff to an agent at state (i, j) in the process starting at time t and terminating at time T, subject to a given evolution of the occupation density vector x given by (6), then these individual optimal payoffs g_ij(t) satisfy the following evolutionary HJB equation: As above, note that equations (12) hold only for the internal states (i, j), i ∈ H, j ∈ B, such that i ≠ 1, |H|. For the boundary states (i, j) the terms involving transitions to i − 1 or from i + 1 respectively are omitted. Indicatively, for i = 1 it is: We consider here the optimization problem of estimating the discounted optimal payoff (alternatively, one can look for the average payoff in a long time horizon). Hence, assuming the discounting coefficient δ_dis for future payoffs, the evolutionary HJB equation for the discounted optimal payoff e^{-δ_dis·t} · g_ij(t) of an agent at state (i, j), with any finite planning horizon T, can be written as: The basic mean-field game consistency problem states that, on some interval [0, T], every agent will benefit from applying the same common control, that is, from adopting the same decision vector. In other words, the MFG consistency condition requires one to consider the kinetic equations (6) (the forward system), where the collective control is taken into account, and the evolutionary HJB equations (13) (the backward system), where individual controls are used, as a coupled forward-backward system of equations on a given time horizon [0, T], complemented by an initial condition x_0 for the occupation density vector x and a terminal condition g_T for the optimal payoff g, such that x, g and the common control u solve the aforesaid system.
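The forward-backward structure of the consistency problem can be sketched as a best-response iteration on a toy example. Everything numerical below (two strategies, no hierarchy or binary interaction, crowding-type rewards, all constants) is an illustrative assumption of ours, not the paper's model; the point is the alternation of a forward sweep for x under a common control and a backward HJB sweep producing a new control, until the two are consistent.

```python
import numpy as np

# Toy MFG: agents switch strategy at rate lam when the other strategy's
# payoff-to-go exceeds their own by more than the switching fee f.
lam, f, delta, T, dt = 5.0, 0.05, 2.0, 0.1, 0.01
lam, f, delta, T, dt = 5.0, 0.05, 0.1, 2.0, 0.01
steps = int(T / dt)

def reward(x):
    # Hypothetical crowding rewards: a strategy is less attractive when crowded.
    return np.array([1.0 - x[0], 0.8 - x[1]])

x0 = np.array([0.9, 0.1])            # initial occupation densities
u = np.zeros((steps, 2), dtype=int)  # u[t, j] = 1 means: switch away from j

for _ in range(20):                  # best-response (fixed-point) loop
    # Forward sweep for x, given the common control u.
    xs = [x0.copy()]
    for t in range(steps):
        x = xs[-1].copy()
        flow = lam * dt * np.array([u[t, 0] * x[0], u[t, 1] * x[1]])
        x += np.array([flow[1] - flow[0], flow[0] - flow[1]])
        xs.append(x)
    # Backward HJB sweep for g, given x; terminal payoff g(T) = 0.
    g = np.zeros(2)
    u_new = np.zeros_like(u)
    for t in reversed(range(steps)):
        w = reward(xs[t])
        u_new[t] = [g[1] - f > g[0], g[0] - f > g[1]]
        gain = lam * np.array([u_new[t, 0] * (g[1] - f - g[0]),
                               u_new[t, 1] * (g[0] - f - g[1])])
        g = g + dt * (w + gain - delta * g)   # backward Euler step
    if (u_new == u).all():
        break                        # control consistent with its own flow
    u = u_new

print(xs[-1])   # final occupation densities under the iterated control
```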
Our aim here is first to identify the solution of the stationary consistency problem, and then to investigate the general time-dependent problem, extending (if possible) our findings for the stationary problem. As mentioned, we shall work in three asymptotic regimes; fast execution of the agents' personal decisions, weak binary interactions, and small payoff discounting in time.

Stationary Problem
The stationary MFG consistency problem consists of the stationary HJB equation for the discounted optimal payoff e −δ dis ·t · g ij of an agent at state (i, j), with a finite time horizon: where the evolution given by (6) is replaced with the corresponding fixed point condition: By analogy with the time-dependent problem, for the stationary MFG consistency problem one needs to consider (14),(15) as a coupled stationary system. In the asymptotic limit of fast execution of individual decisions, λ → ∞, the terms in (14), (15) containing the transition rates λ should obviously vanish (otherwise they would 'explode' to infinity).
For a practical interpretation of this observation, one can think that if the execution of personal decisions is significantly fast, then in a stationary state no agent should be interested in switching strategy. In this case (14), (15) turn respectively into the form: and, supplemented by the consistency condition: for all i ∈ H, j, k ∈ B such that x_ij ≠ 0. In fact, the consistency condition (18) ensures that all terms in (14) and (15) including elements of the decision matrix indeed vanish in (16) and (17) respectively, for all the occupied states.

Introducing further the auxiliary notation w̃_ij, together with the matrices A_j, with transpose matrix A^T_j, and E_j(x), with transpose matrix E^T_j(x), given respectively by (21) and (22), equations (16) and (17) take the matrix forms (19) and (20) respectively. We shall look further at the asymptotic regime with small rates δ_int · q^{±k}_ij. Therefore, starting with (20), we look for stationary solutions of the form: Substituting (23) into (20), and equating terms of the same order in δ^0_int, δ^1_int, we obtain respectively the following equations: where the notation E^0_j corresponds to the matrix E_j containing only the elements of order O(δ^0_int) (we use respectively the notation E^{0T}_j for the transpose matrix).

Assumption 1. Let the detailed balance condition (8) hold with all q^+_ij (or q^-_ij) being strictly positive. We shall use the shorter notation, for i ∈ H with i ≠ n, and j ∈ B:

In the linear approximation of vanishing δ_int, we end up with an uncoupled system.
Since different elements of B are also uncoupled, (24), (25) can be solved separately for any j ∈ B. Looking at the zero order of small evolution transition rates, by (24) we have the following result: Proposition 3. Let Assumption 1 hold. Then the rank of A_j is exactly n − 1, while the kernel of A_j is generated by the following vector: where we have introduced the notation x^0_j = Σ_i x^0_ij. Specifically, under the detailed balance condition A_j is symmetric, and its kernel, generated by (27), is proportional to the uniform distribution. Proof. Notice that system (24) is degenerate, as expected, since we are looking for non-negative solutions satisfying Σ_j (x^0_{1j} + · · · + x^0_{nj}) = 1. Thus, one of the n equations of (24) can be discarded, say for example the last one. Rewriting the system of the remaining n − 1 equations by using the first equation, and then adding sequentially to each of the next n − 2 equations its previous one, one eventually obtains the following system: This has an obvious solution, unique up to a multiplier, given by (27).
Alternatively, starting the exclusion from the last equation of (28), the solution to (24) is: Given now the detailed balance condition (8), and the non-degeneracy established by Assumption 1, one observes from (27), or (29), that for every strategy j ∈ B we have: We have shown that, in the main order of small evolution rates δ_int · q^{±k}_ij, x^{0*}_ij = x^{0*}_j/n is a fixed point of the evolution (6), along with the common control u^{com} = (u_{ij→ik} = 0), for all i ∈ H and all j, k ∈ B, which is consistent with condition (18), and expresses the instantaneous execution of personal decisions. This will also be a stable solution of the stationary system if x^{0*}_ij = x^{0*}_j/n is a stable fixed point of (6), for u^{com} = (u_{ij→ik} = 0), i ∈ H, j, k ∈ B.
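Proposition 3 is easy to verify numerically. The sketch below assumes a concrete tridiagonal form of A_j, namely the generator of the birth-death pressure dynamics on one strategy column with detailed balance q^-_{i+1,j} = q^+_ij (our reading of condition (8)); under Assumption 1 it is symmetric, of rank n − 1, and its kernel is spanned by the uniform vector.

```python
import numpy as np

def hierarchy_generator(q_plus):
    """Generator A_j of the pressure dynamics on one strategy column, under
    the assumed detailed balance q^-_{i+1} = q^+_i. q_plus has length n - 1."""
    n = len(q_plus) + 1
    A = np.zeros((n, n))
    for i, q in enumerate(q_plus):
        A[i, i] -= q;          A[i + 1, i] += q      # upgrade i -> i + 1
        A[i + 1, i + 1] -= q;  A[i, i + 1] += q      # downgrade i + 1 -> i
    return A

A = hierarchy_generator([0.7, 1.3, 0.4])             # n = 4 levels
assert np.allclose(A, A.T)                           # symmetric, as in Prop. 3
print(np.linalg.matrix_rank(A))                      # n - 1 = 3
print(np.allclose(A @ np.full(4, 0.25), 0))          # uniform vector in Ker(A_j)
```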

Assumption 2. For technical (computational) purposes only, let the hierarchy and the strategy set be of the same size, i.e. |H| = |B|, so that n = m.
Using the above variables, we transform system (6) into the non-degenerate linear homogeneous system ẏ = Λ · y, where Λ is the block matrix: Each matrix ∆ has the same non-zero entries −q^-_{nn} on its bottom row, while the rest of its elements are equal to zero. Note, as well, that the A_j matrices are of dimension n × n, and each zero matrix to the right of an A_j matrix is of dimension n × (n · (n − j) − 1). The (n − 1) × (n − 1) matrix D is given by: Applying sequentially (starting with C_1 ≡ A_1, setting in the next step C_1 ≡ A_2, etc.) the following block matrix formula: where C_1, C_2, and C_3 are n × n, m × n, and m × m matrices respectively, the determinant of Λ is given by: We further apply sequentially, n − 1 times, the elementary row operation of row addition on every n × n matrix A_j, starting with row n and adding in each step row i to row i − 1.
Eventually, we transform A j into a lower triangular matrix of the form: with a single zero eigenvalue, and n − 1 negative eigenvalues −q − ij , for i = 2, . . . , n. Note that since A j are symmetric matrices (due to the detailed balance condition), the algebraic multiplicity of each of their eigenvalues is equal to the geometric multiplicity.
Regarding the (n − 1) × (n − 1) matrix D, and bearing in mind the detailed balance condition, we apply once the elementary row operation of adding row n − 1 to row n − 2, and then we apply sequentially, n − 2 times, the elementary column operation of adding column i to column i + 1, starting with column 1, to eventually transform D into the following lower triangular form: with n − 1 negative eigenvalues −q^+_{in}, for i = 1, . . . , n − 1. In total, we find that matrix Λ has one zero eigenvalue of algebraic multiplicity n − 1, and n · (n − 1) negative eigenvalues. Now it is trivial to transform Λ into a block diagonal matrix, subtracting sequentially from each column i, i ∈ {1, . . . , n² − n}, each column j, j ∈ {n² − n + 2, . . . , n² − 1}.
For a block diagonal matrix, both the algebraic and the geometric multiplicity of an eigenvalue is given by adding the multiplicities from each block. Then, for the block matrix Λ the algebraic multiplicity of the zero eigenvalue is equal to its geometric multiplicity.
We, thus, have the following result: Lemma 1. Let Assumptions 1, 2 hold, and consider the linear system ẏ = Λ · y as defined above. The stationary solution of this system, that is, the vector x^{0*}_ij = x^{0*}_j/n given by Proposition 3, is stable (but not asymptotically stable), since Λ has n · (n − 1) negative eigenvalues and a single zero eigenvalue whose algebraic multiplicity equals its geometric multiplicity.
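A numerical check of the spectral picture in Lemma 1, restricted to a single diagonal block A_j (the concrete tridiagonal form of A_j and the rate values are our assumptions, consistent with the detailed balance condition): the block has one zero eigenvalue, reflecting conservation of the column's total mass, and n − 1 strictly negative ones, so the flow drives any column density to the uniform one.

```python
import numpy as np

n = 5
q = np.array([0.6, 1.1, 0.9, 0.3])   # q^+_i = q^-_{i+1} > 0 (Assumption 1)
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i] -= q[i];          A[i + 1, i] += q[i]       # upgrades
    A[i + 1, i + 1] -= q[i];  A[i, i + 1] += q[i]       # downgrades

ev, V = np.linalg.eigh(A)            # A is symmetric under detailed balance
print(np.sort(ev))                   # one 0 eigenvalue, the other n - 1 < 0

# The flow x' = A x sends any initial column density to the uniform one:
x0 = np.array([1.0, 0, 0, 0, 0])     # all mass at the bottom level
xT = V @ (np.exp(ev * 200.0) * (V.T @ x0))   # x(T) = exp(A T) x0, T = 200
print(np.round(xT, 6))               # ~ [0.2, 0.2, 0.2, 0.2, 0.2]
```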
The third asymptotic regime we shall look at is that of small discounting δ_dis. Obviously, no payoff discounting terms appear in the stationary kinetic equations (17). Moving to the stationary HJB equation (16), or (19), we initially look for solutions of the form: Substituting (37) into (19), and equating terms of zero order in δ_int and δ_dis, we get the equation: In general, equation (38) has no (non-degenerate) solution since, by Proposition 3, the kernel of the symmetric matrix A^T_j = A_j is one-dimensional, implying that the image of the transpose matrix A^T_j is (n − 1)-dimensional (by the rank-nullity theorem). More precisely, equation (38) has no solution if: Thus, to remain in the non-degenerate regime, we need to introduce additionally the following assumption: Assumption 3. For every strategy j ∈ B the following is true: As a result, we next look for solutions of (19) in the form of the expansion: Recall that we are looking at the asymptotic regime with small δ_int (weak binary interaction) and small δ_dis (small discounting). One needs to make clear assumptions on the relation between the small parameters δ_int and δ_dis for a full perturbation analysis.
In principle, three basic regimes can be naturally identified: ID 1: interaction is relatively very small, i.e. δ_dis = δ and δ_int = δ²; ID 2: interaction and discounting are comparable, i.e. δ_int = δ_dis = δ; ID 3: discounting is relatively very small, i.e. δ_int = δ and δ_dis = δ².
We initially concentrate on the ID 1 regime. Substituting (40) into (19), and equating terms of order δ^{-1}, δ^0, δ^1, we find respectively the following equations: The first equation in (41) tells us that g^0_ij belongs to the kernel of A_j (since A_j = A^T_j), that is, for arbitrary constants a_j ∈ R, we get: The second equation in (41) tells us that w̃_ij − g^0_ij belongs to the image of A_j, which coincides with the orthogonal complement of Ker(A_j), given the identity: Besides, from Proposition 3 we find that the orthogonal complement of Ker(A_j) is: In this case, the fact that w̃_ij − g^0_ij ∈ Im(A_j) further implies that: Looking at the third equation in (41), and noting that E^{0T}_j g^0_ij = 0 for a uniform g^0_{·j}, we conclude that g^1_ij ∈ Im(A_j) as well, that is, g^1_ij ∈ Ker^⊥(A_j). Thus, to identify g^1_ij we need to invert A_j on the reduced (n − 1)-dimensional space Ker^⊥(A_j).
Lemma 2. Let Assumption 1 hold, and let y ∈ Ker^⊥(A_j). Then all solutions z to the matrix equation A_j · z = y are given by the formula: for all i ≠ 1, with arbitrary z_{1j}. There exists a unique solution z_{·j} ∈ Ker^⊥(A_j), specified by: Notice that formulae (45) and (46) yield z_ij = g^1_ij when y_ij = w̃_ij − g^0_ij. In particular, for g^1_ij we find the explicit expression: where 1(·) is the indicator function.
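The inversion of A_j on Ker^⊥(A_j) performed in Lemma 2 is exactly what the Moore-Penrose pseudo-inverse computes for a symmetric matrix: for y in the image, pinv(A_j) y is the unique solution lying in Ker^⊥(A_j). A sketch with an assumed detailed-balance A_j as in Proposition 3:

```python
import numpy as np

n = 4
q = [0.7, 1.3, 0.4]                   # assumed rates q^+_i = q^-_{i+1} > 0
A = np.zeros((n, n))
for i, qi in enumerate(q):
    A[i, i] -= qi;          A[i + 1, i] += qi
    A[i + 1, i + 1] -= qi;  A[i, i + 1] += qi

# Ker(A) is spanned by the uniform vector, so Ker^perp(A) = {y : sum(y) = 0}.
y = np.array([1.0, -2.0, 0.5, 0.5])   # sums to 0, i.e. y in Im(A)
z = np.linalg.pinv(A) @ y             # the minimal-norm solution
print(np.allclose(A @ z, y))          # True: z solves A z = y
print(abs(z.sum()) < 1e-10)           # True: z itself lies in Ker^perp(A)
```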
Regarding the consistency condition (18), in the main order in small δ it can be written in the equivalent form (48), for all i ∈ H and k, j ∈ B. Given that w̄_{·,·} does not depend on δ, this leads to the interesting result that in the equilibrium of the asymptotic regime of small δ, only those strategic levels j ∈ B are occupied (that is, x^0_j ≠ 0) where the sum Σ_i w̄_ij attains its maximum. For simplicity, let us further consider the following assumption:

Assumption 4. There exists a unique behavioural level b ∈ B such that Σ_i w̄_ib > Σ_i w̄_ij for all j ∈ B with j ≠ b.
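In computational terms, the selection of the occupied level in the small-δ equilibrium reduces to maximizing the column sums of the reward array, with Assumption 4 demanding that the maximizer be unique. A minimal sketch, with a purely hypothetical reward table:

```python
import numpy as np

# Hypothetical rewards w_bar[i, j], i in H (hierarchy), j in B (behaviour).
# In the small-delta equilibrium only the levels j maximizing the column
# sum sum_i w_bar[i, j] are occupied; Assumption 4 requires uniqueness.
w_bar = np.array([[1.0, 2.0, 0.5],
                  [0.3, 1.5, 0.2],
                  [0.7, 1.0, 0.9]])

col_sums = w_bar.sum(axis=0)                       # sum_i w_bar[i, j] per level j
b = int(np.argmax(col_sums))                       # the occupied level
unique = int(np.sum(col_sums == col_sums.max())) == 1   # Assumption 4 check
print(b, unique)
```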
Note that Assumption 4 implies that in any equilibrium x*, with δ sufficiently small, all terms with j ≠ b become irrelevant for the analysis.
We thus have the following result:

Proposition 4. Let Assumptions 1, 2, 3 and 4 hold, and consider the ID_1 regime. Then the solution to the stationary problem described by (16), (17) and (18), in the main order in small δ, is given by (50), where x^{0*}_ij is a stable fixed point of (6).
Remark. Continuing to the next order of the perturbation analysis (and subsequently to higher orders), one can obtain explicit approximate solutions with arbitrary precision.
Next we consider the ID_2 regime. In this case, we look at the solutions to (17) in the next order with respect to small δ. In view of (50), we write (25) in the form (51), where Σ_i x^1_ib = 0 and the usual convention for the boundary terms i = 1, n applies. Note that the right-hand side of equation (51) belongs to Ker^⊥(A_j), implying (52). Moreover, given that x^1_{·b} ∈ Ker^⊥(A_j), we can identify x^1_ib by applying Lemma 2, through formulae (45) and (46). Regarding the solution to (19) in ID_2, substituting (40) into (19) and equating terms of order δ^{−1}, δ^0, δ^1, we get respectively the equations (53), where the notation E^1_j corresponds to the matrix E_j containing only elements of order O(δ_int) (and E^{1T}_j denotes its transpose).
The first two equations in (53) are identical with the corresponding equations in (41) (recall that E^{0T}_j g^0_{·j} = 0 for a uniform g^0_{·j}), and provide the same results, expressed through (42) and (44). Looking at the third equation in (53), and noting the form of E^{1T}_j, we conclude that g^1_ij can be uniquely identified through formula (45) of Lemma 2, with z_ij = g^1_ij and y_ij = w̄_ij − g^0_ij, under condition (54).

Last, we consider the ID_3 regime. Substituting (40) into (19), but equating now terms of order δ^{−2}, δ^{−1}, δ^0, we get the equations (55) (in analogy to (41), (53)). Again, the first and third equations in (55) lead to the same results as the first and second equations in (41), namely to (42) and (44) respectively, while the second equation in (55) always holds for a uniform g^0_ij.
We thus have the following result:

Proposition 5. The solution to the stationary consistency problem in the main order in small δ in the ID_2 and ID_3 regimes coincides with the one identified in Proposition 4 for ID_1.

Time-dependent Problem
The solution to a non-linear Markov game of mean-field type like the one we consider here (on a finite time horizon), defines an epsilon-Nash equilibrium of the corresponding game with a finite number of players, see, e.g., Basna, Hilbert and Kolokoltsov (2014).
Having identified the solution to the stationary MFG consistency problem, we need next to look at the time-dependent consistency problem in order to validate our results for initial/terminal conditions other than those given by the solution of the stationary problem.
We further need to investigate the stability of the fixed point x^{0*}_ij (see Lemma 1) without necessarily assuming that from the very beginning all players apply the same stationary control u_com = (u_{ij→iκ} = 0).
For the full time-dependent problem, the HJB equation for the discounted optimal payoff e^{−δ_dis·t}·g_ij(t) of an individual at state (i, j), with any planning horizon T, is given by (13), where the occupation density vector x = (x_ij) is also time-varying. For definiteness, we shall focus on the ID_1 regime (the same method applies to the ID_2 and ID_3 regimes). Our aim is to show that, fixing the control u_{iα→iβ} = 0 in (13), ∀i ∈ H, α, β ∈ B, the solution to the resulting system (56) will be consistent, that is, the control u_{iα→iβ} = 0 will indeed give a maximum in (13) at all times. Fixing the control u_{iα→iβ} = 0, ∀i ∈ H, α, β ∈ B, is actually equivalent to assuming (57). Our aim here is to show that, starting with a terminal condition belonging to the cone defined by (57), we stay inside the cone for all t ≤ T. Therefore, it is sufficient to show that on the boundary of this cone the inverted tangent vector of (56) is never directed outside the cone. The necessary condition for this to hold at any boundary point g_jβ − f^B_αβ = g_jα is (58), with the notation introduced in (59). We substitute into (59) g_ij and x_ij, taken from (40) and (23) respectively. Assuming then that f^H_j is independent of δ, and equating terms of similar order, in the main order (58) is equivalent to (60) (recall that we are in the ID_1 regime). Note that in the main order O(δ^{−1}) in small δ (assuming that f^B_αβ is also independent of δ), for the specified boundary point of the cone we get (61), while for all the other i ∈ H, such that i ≠ j, we get (62). Combining (61) and (62) we obviously get (63), and rewriting (60) in the equivalent form (64), we check that condition (64) is satisfied when q_iα ≤ q_iβ, ∀i ∈ H (the first term is smaller than or equal to the third, and the second term is smaller than or equal to the fourth).
But also in the case when q_iβ < q_iα, ∀i ∈ H, rewriting (63) in an equivalent form, we check that (60) is satisfied (again, the first term is smaller than or equal to the third, and the second term is smaller than or equal to the fourth).
Thus we obtain the following main result:

Theorem 1. Let Assumptions 1, 2, 3 and 4 hold, and assume additionally that the corresponding condition holds ∀α, β ∈ B. Then, for sufficiently small discounting δ_dis = δ and relatively smaller binary interaction coefficient δ_int = δ^2, in the main order in small δ, for any T > t, any initial occupation probability distribution x(t), and any admissible terminal payoffs, there exists a unique solution to the time-dependent discounted MFG consistency problem such that the control u is stationary and given by u_{iα→iβ} = 0, ∀i ∈ H, ∀α, β ∈ B; x(s) stays near the fixed point of Proposition 4 as s → T; and g_ij(s) stays near the stationary solution of Proposition 4 for almost all times, when T − t is large. This is the turnpike property mentioned in the Introduction.

Discussion
In this paper we formulate the interaction of a large number of small players under the pressure of a major player (principal) on n × m arrays, having in mind the paradigm of individuals defending against a bio-terrorist, or, alternatively, the similar context of corrupted tax inspectors facing a benevolent authority. The dual structure of these n × m arrays naturally describes, on the one hand, the distribution of individuals among m levels of 'behaviour' (e.g. levels of defence) and, on the other, their distribution according to a phenotypic characteristic among n levels of 'hierarchy' (e.g. levels of infection).
Transitions on the first network structure are mainly subject to the individuals' control, while transitions on the second are mainly subject to the principal's pressure. Transitions on both structures may also result from the individuals' binary interactions. Our model is an instance of a finite-state non-linear Markov game combining mean-field, evolutionary, and pressure-resistance types of interaction. For our analysis, we consider the discounted mean-field game consistency problem. Following the general framework of mean-field games, we analyse the forward-backward system of coupled equations: the kinetic equations governing the evolution of the individuals' distribution among the n × m states (forward equation), and the Hamilton-Jacobi-Bellman equation giving the individuals' optimal payoff (backward equation). We solve the stationary problem and provide a link between the stationary solution and the time-dependent problem. For tractability, we work in the asymptotic regimes of fast execution of personal decisions, weak binary interactions, and small payoff discounting in time. Considering a stationary control that is consistent with the assumption of fast execution of personal decisions, in the main order of small payoff discounting (or in the main order of weak binary interactions), we find that individuals will be uniformly distributed among the 'behaviours' of the unique 'hierarchy' level where the sum of rewards is maximised, and we obtain the optimal payoff as a function of these rewards. We show that the time-dependent problem has a unique solution, which stays very close to the stationary one. Our simplifications, while necessary for concrete calculations, represent a first step towards a more comprehensive treatment of the game we have introduced and explicitly formulated.