ON MEAN FIELD SYSTEMS WITH MULTI-CLASSES

Abstract. This work focuses on stochastic systems of weakly interacting particles containing different populations represented by multi-classes. The dynamics of each particle depends not only on the empirical measure of the whole population but also on those of the different populations. The limits of such systems as the number of particles tends to infinity are investigated. We establish the existence, uniqueness, and basic properties of solutions to the limiting McKean-Vlasov equations of these systems and then obtain the rate of convergence of the sequences of empirical measures associated with the systems to their limits in terms of the p-th Monge-Wasserstein distance.

1. Introduction. Originating from statistical physics, mean-field models are concerned with stochastic systems containing a large number of particles having weak interactions. To overcome the complexity of interactions due to the large scale of the system, all interactions with each particle are replaced by a single average interaction, normally represented by the empirical measure associated with the system.
Studying the limits of mean-field models as the sizes of the systems tend to infinity has been a long-standing problem and presents many technical difficulties. There has been a vast amount of work dealing with the limits of the empirical measures of mean field systems, such as propagation of chaos, laws of large numbers, fluctuations, phase transitions, and large deviations (see [6,11,13,23,27,28,29] among others and the references therein). The past decade has witnessed renewed interest in mean-field models in game theory since the seminal works [19,22]. The mean field interaction has been used to model the weak interaction between players in large population games, and the limiting results are used to construct computable decentralized strategies.
Along with this renewed interest in the classical models, studies of some other types of mean-field models have also been carried out; see, for example, models with a major particle which has an important impact on all other particles [17,24,26], models with space noises [12,21], models with a common noise [8,20], models with jumps [1], regime-switching models [25,31], and models with two time scales [15]. Another type of mean-field model being investigated recently is the class of models with multi-classes. In these models, the particles come from finitely many different populations, types, or classes, which appear in the social sciences [10], statistical mechanics [9], neuroscience [2], as well as finance [5]. In particular, in [10], the phase transition is studied for a multi-class mean field statistical mechanics model. Two different classes of particles are introduced to depict two interacting groups of spins. The model is then interpreted as a prototype of resident-immigrant cultural interaction. In [9], a two-population generalization of the classical mean field Ising model is considered. In [2], a multi-class mean field model is used to describe the weak interaction of a large network of neurons of P different populations. In [5], the authors use a multi-class mean field model to reformulate a portfolio optimization problem with a very large total number of stocks as a sector-wise allocation problem in which each sector can be interpreted as a mean field class. In mean-field games, systems with multi-classes are frequently used to describe the heterogeneity of the population of agents (see [3,17,18,19,24]).
To the best of our knowledge, despite the many works on these models, each of them focuses on a specific system which depends linearly on either the mean field terms from the populations or the mean field term from the whole population. Models in a general setting, therefore, have not been considered. In this paper, we study a multi-class mean field model in which the dynamic equation of each particle depends on mean field terms from both the subclasses and the whole population. Unlike the classical cases, our proposed problem deals with several empirical measures from different populations. As a result, we obtain the limiting equations as a system of McKean-Vlasov equations instead of a single one. We establish the existence, uniqueness, and some basic properties of the solutions to the limiting McKean-Vlasov equations of the system and then obtain the rate of convergence of the sequence of empirical measures to their limits in terms of the p-th Monge-Wasserstein distance. The paper is organized as follows. In Section 2, we begin with an introduction to the model of multi-class weakly interacting diffusions, the McKean-Vlasov equations, and the system of associated limiting equations. The main results are also presented in this section. For the sake of exposition, their proofs are aggregated in Section 3 together with some auxiliary results. Finally, proofs of these auxiliary results are placed in the Appendix.
2. Formulation and main results.
2.1. N -particle multi-class weakly interacting system. Let d, K be positive integers and K = {1, . . . , K}. For each vector x ∈ R d , let δ x denote the Dirac measure centered at x, i.e., δ x (A) = 1 if x ∈ A and δ x (A) = 0 if x ∉ A, for any Borel subset A of R d . We consider the following mean field system with K different classes

dx (N ) i (t) = b θ i (t, x (N ) i (t), µ (N ) (t), (µ (N ) 1 (t), . . . , µ (N ) K (t))) dt + σ θ i (t, x (N ) i (t), µ (N ) (t), (µ (N ) 1 (t), . . . , µ (N ) K (t))) dB i (t), i = 1, . . . , N, (1)

where B 1 (·), B 2 (·), . . . are independent d-dimensional standard Brownian motions defined on a complete probability space (Ω, F, P), and the parameter θ i ∈ K, i = 1, . . . , N , indicates that the particle x (N ) i belongs to the jth population if θ i = j. Here

µ (N ) (t) = (1/N ) Σ N i=1 δ x (N ) i (t)

is the empirical distribution of particles in the whole population while, for each j, µ (N ) j (t) is the empirical distribution of particles in population j. It is easy to verify that

µ (N ) (t) = Σ K j=1 ν (N ) j µ (N ) j (t), (2)

where ν (N ) j = N j /N and N j is the number of particles in population j. Throughout the paper, we assume that for each j ∈ K the initial conditions x 0 i,j , i = 1, 2, . . ., are independent and identically distributed random vectors defined on (Ω, F, P), the collection of random vectors {x 0 i,j , B i (·) : i = 1, 2, . . . ; j ∈ K} is independent, and the parameters θ i , i ≥ 1, satisfy the equations

lim N →∞ ν (N ) j = ν j , j ∈ K. (3)

Now we introduce a distance between two probability measures. For each metric space (E, d E ), let P(E) denote the set of all probability measures defined on the Borel σ-field B(E) of E. For an E-valued random variable X, we use the notation L (X) to denote its probability distribution; that is, L (X) ∈ P(E). In what follows, for convenience, we denote P = P(R d ). For x ∈ R d and A ∈ R d×m with A = (a ij ), let |x| = √(x x) denote the usual Euclidean norm of x, and |A| = max i,j |a ij |. For p ≥ 1 and measures µ and η in P, the Monge-Wasserstein distance W p is defined by

W p (µ, η) = inf γ∈Γ(µ,η) ( ∫ R d ×R d |x − y| p γ(dx, dy) ) 1/p ,

where Γ(µ, η) denotes the set of all probability measures on R d × R d with marginals µ and η.
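As a side illustration of the distance just defined (not part of the paper's development), in dimension d = 1 the Monge-Wasserstein distance between two empirical measures with the same number of atoms admits a closed form: the optimal coupling matches order statistics. The sketch below uses this fact; the function name wasserstein_p_1d is ours.

```python
import numpy as np

def wasserstein_p_1d(xs, ys, p=1.0):
    """W_p between two equal-size empirical measures on R.

    In one dimension the optimal coupling pairs order statistics,
    so W_p^p is the average of |x_(i) - y_(i)|^p over sorted samples.
    """
    xs, ys = np.sort(np.asarray(xs)), np.sort(np.asarray(ys))
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))

# Three atoms each; every atom is transported by 0.5, so W_1 = 0.5.
print(wasserstein_p_1d([0.0, 1.0, 2.0], [0.5, 1.5, 2.5], p=1.0))  # -> 0.5
```

In higher dimensions no such sorting argument is available, and computing W p requires solving an optimal transport problem.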
In addition, denote the p-th moment with respect to a measure µ by

M p (µ) = ∫ R d |x| p µ(dx). (4)

According to the Kantorovich-Rubinstein theorem, for p = 1, the W 1 distance possesses the dual representation

W 1 (µ, η) = sup { ⟨µ, f ⟩ − ⟨η, f ⟩ },

where ⟨µ, f ⟩ = ∫ R d f (x)µ(dx) and the supremum is taken over all functions f : R d → R with Lipschitz constant at most 1. To proceed, we make the following assumptions on the functions b j (·, ·, ·, ·).

Assumption A. For some p ≥ 1 there exists a constant C > 0 such that for j ∈ K, t > 0, x ∈ R d , y ∈ R d , µ ∈ P, η ∈ P, µ = (µ 1 , . . . , µ K ) ∈ P K , and η = (η 1 , . . . , η K ) ∈ P K , the following inequalities hold true.
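As a toy illustration of a system of the above type, the sketch below simulates, via Euler-Maruyama, K = 2 classes of one-dimensional particles whose hypothetical drift pulls each particle toward the empirical mean of the whole population and toward the empirical mean of its own class. This linear choice satisfies the usual growth and Lipschitz-type conditions; the paper's coefficients b j are far more general, and all names here are ours.

```python
import numpy as np

def simulate_multiclass(N=200, K=2, T=1.0, steps=100, sigma=0.3, seed=0):
    """Euler-Maruyama for a toy multi-class mean field system in d = 1.

    Hypothetical drift for particle i of class theta_i:
        b = (mean of whole population - x) + (mean of own class - x).
    """
    rng = np.random.default_rng(seed)
    theta = rng.integers(0, K, size=N)               # class labels theta_i
    x = rng.normal(loc=theta.astype(float), size=N)  # initial states x_i(0)
    dt = T / steps
    for _ in range(steps):
        global_mean = x.mean()                                   # from mu^(N)(t)
        class_mean = np.array([x[theta == j].mean() for j in range(K)])
        drift = (global_mean - x) + (class_mean[theta] - x)      # b_{theta_i}
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=N)
    return x, theta

x, theta = simulate_multiclass()
print(x.shape, theta.shape)
```

The empirical measures µ (N ) (t) and µ (N ) j (t) enter the dynamics only through the two means, which is what makes the interaction "weak": each summand contributes O(1/N ).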
We have the following proposition. For convenience, its proof is given in Appendix A.1.
Proposition 1. Assume that Assumption (A) holds. If max i,j E |x 0 ij | p < ∞, then the following assertions hold.
(i) The equation (1) has a unique solution.
(ii) There exists a constant C that depends only on p such that the corresponding moment estimate holds.

Since µ (N ) j (t) is the empirical distribution of particles in population K j , in view of (2), we would expect that, as N → ∞, the limit (µ(t), µ(t)) of (µ (N ) (t), µ (N ) (t)) satisfies the following limiting equations

dy j (t) = b j (t, y j (t), µ(t), (µ 1 (t), . . . , µ K (t))) dt + σ j (t, y j (t), µ(t), (µ 1 (t), . . . , µ K (t))) dB j (t), µ j (t) = L (y j (t)), j ∈ K, (7)

where B 1 (t), B 2 (t), . . . , B K (t) are d-dimensional standard Brownian motions, the initial value y 0 j has the same distribution as that of x 0 1,j for each j ∈ K, and the collection {y 0 j , B j (·) : j ∈ K} is independent and defined on (Ω, F, P).
Let C t denote the space of continuous functions from [0, t] into (R d ) K , equipped with the usual supremum norm, and let P t = P(C t ) be the set of all probability measures on C t . Let us also introduce the p-th Wasserstein distance W p,t on P t . The McKean-Vlasov equation (7) can be rewritten in the following equivalent form, where Y 0 = (y 0 1 , . . . , y 0 K ), Y (t) = (y 1 (t), . . . , y K (t)), and

DUNG TIEN NGUYEN, SON LUU NGUYEN AND NGUYEN HUU DU
Define Ψ and Φ to be the mappings which associate to each M ∈ P T the unique solution of the following equation and its law, respectively; i.e., Ψ(M ) = Y is the solution of equation (9) and Φ(M ) = L (Ψ(M )). The coefficients of equation (9) satisfy the linear growth and Lipschitz conditions for any M (t) ∈ P((R d ) K ). Therefore, equation (9) has a unique solution and Ψ and Φ are well defined. Observe that if Y is a solution of equation (8), then its law is a fixed point of Φ; conversely, if µ is such a fixed point of Φ, then equation (9) defines a solution of equation (8). We have the following lemma, which asserts that Φ is a contraction mapping on P T . Its proof is aggregated in Appendix A.2.
where = max {p, 2} and C is a constant depending only on T .
Fix a measure M 0 ∈ P T and consider the recursive formula M k+1 = Φ(M k ), k ≥ 0. It follows from inequality (12) that {M k } k≥0 is a Cauchy sequence in the complete metric space (P T , W p,T ). Therefore, this sequence converges to some limit in P T , and this limit is a fixed point of Φ. This property, being consistent over all time horizons T , gives a fixed point µ on C([0, ∞), (R d ) K ), confirming the existence of a solution of equation (7). The uniqueness of this solution also follows from Theorem 2.2. Thus, we have just established the following theorem.
Theorem 2.3. Assume that Assumption (A) holds. Then there exists a unique solution to the limiting equation (7).
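The Picard scheme M k+1 = Φ(M k ) behind this existence proof can be visualized on a toy one-dimensional linear McKean-Vlasov equation dy = (a E[y(t)] − y) dt + σ dB(t), where iterating Φ reduces to iterating an ODE for the mean flow m(t) = E[y(t)]; the fixed point is m(t) = m 0 e^{(a−1)t}. The sketch below (our construction, not the paper's system) performs this iteration with explicit Euler steps.

```python
import numpy as np

def picard_means(a=0.5, m0=1.0, T=1.0, steps=200, iters=25):
    """Picard iteration M_{k+1} = Phi(M_k) for the toy linear
    McKean-Vlasov equation dy = (a*E[y(t)] - y) dt + sigma*dB(t).

    Freezing the mean flow m_k(.), the next iterate solves the ODE
    m'(t) = a*m_k(t) - m(t), m(0) = m0, discretized by explicit Euler.
    The fixed point is m(t) = m0 * exp((a - 1)*t).
    """
    dt = T / steps
    m = np.full(steps + 1, m0)          # initial guess M_0: constant flow
    for _ in range(iters):
        m_next = np.empty_like(m)
        m_next[0] = m0
        for n in range(steps):
            m_next[n + 1] = m_next[n] + dt * (a * m[n] - m_next[n])
        m = m_next
    return m

m = picard_means()
print(m[-1], np.exp(-0.5))  # iterate vs exact fixed-point mean at T = 1
```

The contraction established in the lemma above is what guarantees, in the general setting, that such iterates converge regardless of the initial guess M 0 .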
The next proposition gives upper bounds for the moments of the solution (y 1 , y 2 , . . . , y K ) of equation (7); its proof is placed in Appendix A.3 to keep the presentation transparent. In these bounds, T = t + s and the constant C depends only on p.
The following theorem asserts the continuity of µ(t) in t with respect to the Monge-Wasserstein distance and provides a bound for W p (µ j (0), µ j (t)) for j ∈ K and t > 0. To achieve this, one needs an estimate for empirical measures of independent and identically distributed random vectors, which requires the boundedness of higher-order moments of the initial values. More precisely, we shall assume that max j∈K E |y j (0)| q < ∞ for some q > p (see Lemma 3.1). The proof of the theorem below is postponed to Section 3.2.
Theorem 2.4. Let p ≥ 1. Assume that max j∈K E |y j (0)| q < ∞ for some q > p and that Assumption (A) holds. Then there exists a constant C such that for all s > 0 and t > 0, and for each j ∈ K, where T = t + s and the constant C depends only on p.

Limiting system and approximation in Monge-Wasserstein distance.
Let y j (t), 1 ≤ j ≤ K, be the solution of equation (7). In this section we consider the system of infinitely many particles associated with (1), described by the following equations for all i = 1, 2, . . ., where the initial values x 0 i,θ i and the Brownian motions B i (t) are as in equation (1). It can be shown that under Assumption (A), for each i ≥ 1, equation (15) has a unique solution. Note that since (B i (t), B j (t) : i = 1, 2, . . . ; j = 1, 2, . . . , K) are independent and identically distributed, (x 0 i,j , y 0 j : i = 1, 2, . . . ; j = 1, 2, . . . , K) are independent, and (x 0 i,j , y 0 j : i = 1, 2, . . .) are identically distributed for each j = 1, 2, . . . , K, we conclude that for all i ∈ K j , x i (t) and y j (t) have the same distribution µ j (t), i.e., L (x i (t)) = L (y j (t)) = µ j (t). Next, we will estimate the rates of convergence of x (N ) i (t) to x i (t) and of µ (N ) j (t) to µ j (t), respectively, in the L p and W p senses. To do so, similar to the simpler case of approximating a distribution by empirical measures of independent and identically distributed random vectors (see Lemma 3.1), we need stronger conditions on the moments of the initial values. More precisely, we assume that max i,j E |x 0 ij | q < ∞ for some q > max {p, 2}. Define the rate functions g(p, q, N ) and h(p, q, N ) (see (17) and (18)), together with g 1 (p, q, N ) and g 2 (p, q, N ). It is useful to notice that g 1 (p, q, N ) ≤ Cg 2 (p, q, N ) for 0 ≤ p < 2, and thus g 1 (p, q, N ) ≤ g(p, q, N ) for all p, q, N . The function h(p, q, N ) comes from an approximation of a distribution by empirical measures (see [14, Theorem 1] or Lemma 3.1). For simplicity, h(p, q, N ) is only defined for q = 2p if p ≥ d/2 and q = d/(d − p) if p < d/2, since complicated terms involving logarithm functions would appear in the two remaining cases, as pointed out in [14]. Now we are in a position to state the following theorem, which estimates the error in the L p sense of the approximation of the finite systems by the limiting one. Its proof is provided in Section 3.3.
Here g(p, q, N ) is defined as in equation (17).
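The L p error estimated above can be probed numerically by a synchronous coupling, i.e., by driving the finite system and independent copies of the limit with the same Brownian increments. The toy single-class model below is our construction, with a hypothetical drift b(x, µ) = ⟨µ, id⟩ − x for which the limiting mean is constant; it returns the largest terminal coupling error, which should be of order N^{−1/2}.

```python
import numpy as np

def coupling_error(N=500, T=1.0, steps=100, sigma=0.2, seed=1):
    """Synchronous coupling of the N-particle system with its limit.

    Toy one-class dynamics in d = 1 with hypothetical drift
    b(x, mu) = mean(mu) - x: the finite system uses the empirical
    mean, the limit system uses the true mean (constant 0 here for
    standard normal initial data), and both are driven by the SAME
    Brownian increments.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, 1.0, size=N)    # common initial conditions
    xN, x = x0.copy(), x0.copy()
    dt = T / steps
    for _ in range(steps):
        dB = np.sqrt(dt) * rng.normal(size=N)          # shared increments
        xN = xN + (xN.mean() - xN) * dt + sigma * dB   # finite system
        x = x + (0.0 - x) * dt + sigma * dB            # limit system
    return float(np.abs(xN - x).max())

err = coupling_error()
print(err)  # small: the only discrepancy is the empirical-mean fluctuation
```

Because the noise cancels in the difference, the coupling error is driven purely by the fluctuation of the empirical mean around the true mean, mirroring the mechanism behind the g(p, q, N ) rates.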
The approximation in Monge-Wasserstein distance of the empirical measures of the finite system by the limiting measures is given in the following theorem, whose proof is aggregated in Section 3.4.

Theorem 2.6. Assume that Assumption (A) holds and max i,j E |x 0 ij | q < ∞ for some q > max {p, 2}. Then, for each j ∈ K and t ∈ [0, T ],

E W p p (µ (N ) j (t), µ j (t)) ≤ Cg(p, q, N ) and E W p p (µ (N ) (t), µ(t)) ≤ Cg(p, q, N ).
3.1. Auxiliary results. We state the following lemma without proof; one may refer to [14, Theorem 1] for details. For q > 0 and a probability measure η ∈ P, define M q (η) = ∫ R d |x| q η(dx).
Lemma 3.1. Let η be a probability measure in P such that M q (η) < ∞ for some q > p > 0. Let η N be the empirical measure of a sequence of N independent, η-distributed, R d -valued random vectors X 1 , X 2 , . . . , X N . Then there exists a constant C independent of N such that

E W p p (η N , η) ≤ C M q p/q (η) h(p, q, N ),

where h(p, q, N ) is defined as in (18). Besides, a further bound follows from (3) and (18), where the constant C is independent of N .
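The decay described by Lemma 3.1 can be observed empirically. For the uniform law on [0, 1], the distance W 1 (η N , η) equals the L 1 distance between the empirical and true distribution functions, which the sketch below approximates on a fine grid (our illustration only; the grid and sample sizes are arbitrary).

```python
import numpy as np

def w1_empirical_uniform(N, seed=0):
    """Approximate W_1 between the empirical measure of N uniform(0,1)
    samples and the uniform law itself, using the one-dimensional
    identity W_1 = integral of |F_N(x) - F(x)| dx with F(x) = x."""
    rng = np.random.default_rng(seed)
    xs = np.sort(rng.uniform(size=N))
    grid = np.linspace(0.0, 1.0, 20001)
    F_N = np.searchsorted(xs, grid, side="right") / N   # empirical cdf
    return float(np.mean(np.abs(F_N - grid)))           # Riemann average

for N in (10, 100, 10000):
    print(N, w1_empirical_uniform(N))  # decays roughly like N**-0.5
```

Note that the distance cannot vanish for finite N : any N-atom measure is at least 1/(4N ) away from the uniform law in W 1 , which is the discreteness effect that the moment condition M q (η) < ∞ and the function h(p, q, N ) quantify in general dimension.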
It is useful to mention the following property of the Monge-Wasserstein distance. Its proof is aggregated in Appendix A.4.

Lemma 3.2. For each 1 ≤ j ≤ K, let η j and ϱ j be probability measures and λ j and π j be nonnegative numbers such that Σ K j=1 λ j = 1 and Σ K j=1 π j = 1. Put η = Σ K j=1 λ j η j and ϱ = Σ K j=1 π j ϱ j . Then for all p ≥ 1 the stated estimate holds, where C is a constant depending only on p and M p (ϱ j ) is defined as in (4).

Remark 2.
It can be derived from Lemma 3.2 and its proof that: (i) if λ j = π j for j ∈ K, that is, ϱ = Σ K j=1 λ j ϱ j , then for all p ≥ 1,

W p p (η, ϱ) ≤ Σ K j=1 λ j W p p (η j , ϱ j ).

In order to prove Proposition 1 and Proposition 2, we need the following lemma, which is a generalization of Gronwall's inequality. See Appendix A.5 for its proof.

Lemma 3.3. Let I be an index set. For each i ∈ I, let (Z i (t)) t≥0 be a nonnegative stochastic process and V i (t) = sup 0≤s≤t Z i (s). Assume that, for some constants p > 0, the corresponding integral inequality holds for all t ∈ [0, T ]. Then there exists a constant C that depends on K 1 , K 2 , p and q such that the stated bound holds. In particular, if q = 2 and p ≥ 1, then C < 8.
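The classical inequality that Lemma 3.3 generalizes — if z(t) ≤ δ + K ∫ 0 t z(s) ds, then z(t) ≤ δ e^{Kt} — can be checked numerically in its extremal (equality) case. The sketch below is our illustration only and uses a simple left-endpoint quadrature.

```python
import numpy as np

def gronwall_check(delta=0.1, k=2.0, T=1.0, steps=1000):
    """Discrete check of the classical Gronwall inequality: if
    z(t) = delta + k * int_0^t z(s) ds (the extremal equality case),
    then z(t) <= delta * exp(k*t)."""
    dt = T / steps
    z = np.empty(steps + 1)
    z[0] = delta
    integral = 0.0
    for n in range(steps):
        integral += z[n] * dt            # left-endpoint quadrature
        z[n + 1] = delta + k * integral
    return float(z[-1]), float(delta * np.exp(k * T))

zT, bound = gronwall_check()
print(zT, bound)  # zT sits just below the exponential bound
```

The discrete recursion gives z n = δ(1 + k dt)^n ≤ δ e^{k n dt}, so the computed trajectory always stays below the exponential bound, approaching it as the step size shrinks.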

3.3. Proof of Theorem 2.5. Applying Cauchy's inequality, Hölder's inequality, and the Burkholder-Davis-Gundy inequality to the equation satisfied by the difference of the two systems, we obtain inequality (30). We will estimate the two terms on the right-hand side of inequality (30). First, by Assumption (A2) we obtain inequality (31). Let (r) be defined as in (21). In order to estimate W p (µ (N ) (r), µ(r)), we will estimate W p (µ (N ) (r), µ (N ) (r)) and W p (µ (N ) (r), µ(r)). It is easy to see that for each j ∈ K, {x i (r) : i ∈ K j } is an independent sequence of µ j (r)-distributed random vectors, so Remark 1 applies for each j ∈ K. In addition, equation (19) implies ν (N ) j N − ν j ≤ h ν (N ) for all j = 1, . . . , K. Thus, according to equation (3) and the fact that max j∈K M p (µ j (r)) < ∞, we can derive inequality (32) from Lemma 3.2. Besides, Lemma 3.2 (see (60)) also implies inequality (33).

Therefore, a combination of inequalities (32) and (33) gives (34). A similar argument also yields (35). It follows from inequalities (31), (34) and (35) that (36) holds. Next, in view of Assumption (A2) we get (37). To proceed, we will estimate the last two terms on the rightmost side; we start with the last term, and the second term can then be handled in a similar way. If p ≥ 2, then Hölder's inequality and (35) imply the desired bound. If 0 ≤ p < 2, then Hölder's inequality, Jensen's inequality, and (35) yield the analogous bound. Note that in the last inequality we have used the fact that, for all i ∈ K j , the law of x i (r) remains the same for each fixed j ∈ K. By using (34) instead of (35), we can obtain a similar estimate for the second term on the rightmost side of (37). Therefore, for all p ≥ 1, inequality (38) holds. As a result, inequalities (30), (36) and (38) lead to the conclusion of the theorem.

3.4. Proof of Theorem 2.6. In view of Theorem 2.5 and the fact that g 1 (p, q, N ) ≤ g(p, q, N ), inequality (34) yields the first bound. This proves the first part of the theorem. To complete the proof, recall that µ (N ) and µ are mixtures of the measures µ (N ) j and µ j , j ∈ K. By the triangle inequality, the remaining distance is decomposed into the terms A 3 and A 4 . We can estimate A 3 by using Lemma 3.2 and Theorem 2.5, and, according to Lemma 3.1, we obtain an estimate for A 4 . Consequently, we can derive from inequalities (40) to (42) the second claim of the theorem.
A.1. Proof of Proposition 1. (i) For X = (x 1 , x 2 , . . . , x N ) and Y = (y 1 , y 2 , . . . , y N ), and for N ≥ 1, denote the aggregated coefficients b (N ) and σ (N ) accordingly. Then the system (1) can be rewritten in the form (43). By virtue of Remark 2(ii), there exists a constant C = C(N ) such that the coefficients are controlled. Therefore, Assumption (A) implies that b (N ) (·, ·) and σ (N ) (·, ·) satisfy the linear growth and Lipschitz continuity conditions. As a consequence, the stochastic differential equation (43) has a unique solution. This yields the existence and uniqueness of the solution of (1).

(ii) Now we verify the second assertion. For all t > 0 and each i = 1, . . . , N , define the auxiliary quantities as in the statement. By Hölder's inequality, we obtain a bound in which C depends only on p. Recall that ϕ(x) = |x|. The definitions of µ (N ) (t) and µ(t) then allow us to write the corresponding representation for the finite system. On the other hand, equation (7) gives the analogous representation for the limiting system. An application of Hölder's inequality and the Burkholder-Davis-Gundy inequality, in view of (45) and inequality (46), implies a bound valid for all i = 1, . . . , N , which, together with (44), leads to the desired estimate. Since this inequality holds for any i = 1, 2, . . . , N , taking the maximum over that index we arrive at the conclusion. This completes the proof.
A.5. Proof of Lemma 3.3. Let Z(t), t ≥ 0, be a nonnegative stochastic process and ρ > 0. Denote V (t) = sup 0≤s≤t Z(s). We claim that for any C 1 > 0 there exists a constant C 2 > 0 such that the corresponding bound holds for all t ∈ [0, T ]. In fact, we can choose C 2 explicitly: if ρ ≥ 1, then Hölder's inequality implies the claim with C 2 = T p/q−1 when p ≥ q and C 2 = (p/q) 2K 2 (1 − p/q) (q−p)/p when p < q. As a result, inequalities (22) and (63) show that the desired estimate holds. Here, K 2 C 2 = K 2 T p/q−1 when p ≥ q and K 2 C 2 = (p/q) 2(1 − p/q) (q−p)/p K p/q 2 when p < q. Thus, for any 0 ≤ t ≤ T , sup i∈I E[V i (t)] p ≤ 2(K 1 + K 2 C 2 ) times the corresponding Gronwall factor, where the constant C depends only on p and q. In particular, if q = 2 and p ≥ 1, then C < 8, and therefore, sup i∈I E[V i (t)] p ≤ 2δ exp(2K 1 T + 8K 2 T max{p/2,1}).
This completes the proof.