Does assortative mating lead to a polymorphic population? A toy model justification

We consider a model of phenotypic evolution in populations with assortative mating of individuals. The model is given by a nonlinear operator acting on the space of probability measures and describes the relation between parental and offspring trait distributions. We study long-time behavior of trait distribution and show that it converges to a combination of Dirac measures. This result means that assortative mating can lead to a polymorphic population and sympatric speciation.


1.
Introduction. Quantitative investigation of the inheritance of phenotypic traits has a long tradition, going back to the seminal paper [9]. Extensive information and references about classical models of inheritance and trait evolution can be found in [3]. Speciation is one of the most important topics of evolutionary biology. It is clear that one of the reasons of speciation is geographical isolation (allopatric, peripatric and parapatric speciation) but evolutionary biologists consider also more controversial sympatric speciation -the process through which new species evolve from a single ancestral species while inhabiting the same geographic region [13]. Such speciation is a product of reproductive isolation and the reasons of it can be disruptive selection (a natural selection which favors both extremes of the population) or assortative mating. Assortative mating appears when individuals with similar traits mate more often than they would if they were choosing a partner randomly. In this paper we present a simple model of the evolution of phenotypic traits with assortative mating and show mathematical results which can be used as an argument for sympatric speciation. The traits are assumed to remain unchanged during lifetime, and their examples are skin color, shape of leafs and shell pattern. The trait of an individual is described by a real number. We consider a hermaphroditic population, i.e., a population where every individual has both male and female reproductive system. An individual chooses a partner according to a preference function [8,10,14,18,22,23]. We allow also for self-fertilization or vegetative reproduction in the case when an individual has not found a proper partner.
Self-fertilization or vegetative reproduction occurs in most flowering plants, numerous protozoans, and many invertebrates. As a result of mating or self-fertilization, a new individual is born. In the main model we assume that the offspring trait is a random variable with values between the parental traits. This assumption seems to be artificial but it allows us to give a strict mathematical proof of the convergence of trait distribution to a combination of Dirac measures. In a real situation the range of the offspring trait can be wider which can lead to a multimodal limit distribution.
We consider here a discrete time population model, i.e., a model with a fixed interval between generations. Such models are appropriate when species have no overlap between successive generations. The model is described by an operator defined on the space of probability measures and this operator describes the relation between the distributions of traits in consecutive generations. Continuous time models of phenotypic evolution were considered in [4,8,21,24].
Although the biologists agree that assortative mating constitutes a key component of the speciation process (see [15], [19], references therein), it is rather difficult to give a precise proof of this conjecture. Some numerical results presented in the paper [8] suggest that in the case of assortative mating phenotypic profiles converges to multimodal limit distributions, which means that the population becomes polymorphic. The main goal of our paper is to show that the presented model has a similar property, and the speciation is achieved without mutation or external selective presure. Precisely, we prove that the distribution of traits converges to a combination of Dirac measures. A similar behavior was observed only in a few biological models, e.g., in a Rubinow maturity-structured type equation [16], in an asexual phenotype-structured models [12,17], and in a nonlinear age-structured model of semelparous species [20].
The scheme of the paper is the following. In Section 2 we introduce our model. Section 3 contains a theorem on the convergence to a combination of Dirac measures. The next two sections are devoted to the answer to the following question. Does there exist an initial measure µ supported on some interval such that the sequence of (P n µ) converges to a combination of at least two Dirac measures? In Section 4 we present an example which precisely answer this question. The numerical simulations suggesting that such behavior is rather typical are presented in Section 5. Also in this section we present some simulations when the offspring trait is distributed around parental traits and we show that the limit distribution can be multimodal. In the last section we comment our results and discuss problems for future investigation.

2.
Model. We consider a hermaphroditic population with sexual and asexual reproduction. We start with an individual-based model. We assume that every individual is described by a phenotypic trait x ∈ F , where F is a closed and bounded interval with non-empty interior. Consider a population which consists of n individuals with traits x 1 , . . . , x n . The process of sexual reproduction depends on a preference function ψ : F × F → [0, 1], which is any measurable function. We suppose that if individuals with traits x and y encounter, then they mate with probability ψ(x, y) assuming that the first one plays the role of male and the second female. We also assume that different individuals encounter randomly, thus, individuals of traits x i , x j , i = j, mate with rate and is a self-fertilization or vegetative reproduction rate. After mating/self-fertilization an offspring is born with probability 1, it survives till the next reproduction. We also assume that there is no overlap between successive generations. The above assumptions seem to be strong because in each generation, each adult individual is replaced by an offspring and the size of the population remains constant. We neglect here questions concerning control of the population size and intraspecific competition to make the model as simple as possible to be mathematically tractable.
We discuss more advanced models including these questions in the last section. Consider the case of sexual reproduction and let x and y be parental traits. We assume that the distribution of trait of the offspring is the random variable where ξ x,y is a random variable with values in the interval [−1, 1] such that where c is a constant independent of x and y. It means that the trait of the offspring is distributed between traits of parents, the expected offspring's trait is the parental mean trait and the offspring can have a different trait than parents. By K(x, y, dz) we denote the distribution of the random variable ζ x,y . In the case of self-fertilization the offspring has the same trait as the parent, which means that there is no mutation. By increasing the number of individuals to infinity, we obtain a nonlinear operator which describes the evolution of phenotypic distribution. Denote by M the set of all probability Borel measures on F . Let the measure µ be the distribution of traits in one generation. We denote by P µ the distribution of traits in the next generation. The operator P : M → M is given by the formula P µ(dz) = F F ψ(x, y)K(x, y, dz) µ(dx) µ(dy) In (5) the term F F ψ(x, y)K(x, y, dz) µ(dx) µ(dy) denotes the distribution of newborn individuals in the process of sexual reproduction while the last term is related to vegetative reproduction. The operator P can be also written in the following way 3. Convergence to a combination of Dirac measures. Denote by M δ,d the set of probability Borel measures on F which are convex combinations of Dirac measures δ x supported on sets of isolated points with distance at least d, i.e.

RYSZARD RUDNICKI AND RADOS LAW WIECZOREK
The proof is based on the method of moments (see e.g. Section 30 of [2]). The idea of the proof is similar to that used in [11,Theorem 4.2].
Proof. Without lost of generality we can assume that F ⊆ [0, ∞). Let µ ∈ M and h : F → R be a continuous function. Then where In particular, if h(x) = x, then h s (x, y) = 0 and, consequently, the measures µ and P µ have the same first moment.
and from (6) it follows that If h is a convex function, then Indeed, since ζ x,y has values between x and y we have and therefore because Eζ x,y = (x + y)/2. From (7) and (10) it follows that h s (x, y) ≤ 0 for all x, y ∈ F , and consequently, (6) gives For a positive integer k, we denote by m k (µ) the k-th moment of the measure We now take a µ ∈ M and consider the sequence µ n = P n µ, n ≥ 0. Since F is a compact set, this sequence is tight and, therefore, for every subsequence (µ n k ) there exists a further subsequence (µ n k(j) ) convergent weakly to some measure ν ∈ M. We check that ν ∈ M δ,d . Indeed, from (9) it follows that where a n = F F (x − y) 2 ψ(x, y) µ n (dx) µ n (dy).
From the above it follows that the series ∞ n=0 a n is convergent and, therefore, lim n→∞ a n = 0. Since the measure ν is the limit of some subsequence of (µ n ) we deduce that We claim that (12) and the assumption ψ(x, y) > 0 if |x − y| < d imply that ν ∈ M δ,d . Suppose, contrary to our claim, that ν / ∈ M δ,d . Then there exist ε > 0 and points (12) is positive, a contradiction. It remains to check that ν is unique. Since the sequence m k (µ n ), n ≥ 0, is decreasing and bounded from below by zero, each moment is convergent to some nonnegative number. Now, the uniqueness of the measure ν can be checked by using Theorem 30.1 of [2] which gives conditions under which a distribution is uniquely determined by its moments, but to make the paper self-contained we use another simpler argument. If two measures ν 1 and ν 2 from the set M δ,d have the same moments, then all moments the signed measureν = ν 1 − ν 2 are zero. Sincē ν = n i=1 c i δ xi , where c i ∈ R and x 1 , ..., x n are different real numbers, then the equations for moments j = 0, 1, . . . , n − 1 are following The matrix X = [x j i ] is a Vandermonde matrix and it has a nonzero determinant. Thus the system (13) has the unique solution c 1 = · · · = c n = 0. It means that ν 1 = ν 2 and ν is uniquely determined by its moments. Remark 1. An essential role in the proof of Theorem 3.1 is played by the assumption that the offspring trait has values between the parental traits -see assumption (3). Without this assumption we expect that the sequence (P n µ) converges weakly to some measure ν but ν does not belong to M δ,d . Some numerical simulations suggest that the limit distributions is multimodal. Now we study a special case of operator P which is related to assortative mating.
is a non-increasing function such that ϕ(r) > 0 for r ∈ [0, d) and ϕ(r) = 0 for r ≥ d. From Theorem 3.1 it follows immediately that if the length of the interval F is less than d, then the sequence of (P n µ) converges weakly to δ a , where a = F x µ(dx). Assume that the interval F is longer than d and the initial measure µ is of the form where supp µ is the topological support of µ and dist(A, B) denotes the distance between sets A and B. Then the sequence of (P n µ) converges weakly to ν = c 1 δ a1 + · · · + c k δ a k , where c i = µ i (F ) and a i = F x µ i (dx).

Example.
Having in mind results of the previous section one can ask the following question: Does there exist an initial measure µ supported on some interval such that the sequence of (P n µ) converges to a combination of at least two Dirac measures? Though numerical simulations suggest that such behavior is rather typical (see Section 5) it is not easy to give a precise mathematical proof.
In order to construct such an example we consider the operator P with ψ(x, y) = ϕ(|x − y|) and K(x, y, dz) = δ x+y 2 (dz), i.e., the offspring's trait is the parental mean trait's. We restrict the operator P to absolutely continuous measures with respect to the Lebesgue measure. If µ(dx) = f (x) dx then the measure P µ has also a density given by where we formally assume that f (x) = 0 for x / ∈ F . We consider further the case when F = [−1, 1] and ϕ(r) = 1 [0,1) (r). Then from (14) it follows that Let α 0 , η, andx be positive constants such that η ≥ 10, α 0 e η ≤ 1 16 , andx < 1 be such that 1 −x ≤ 1 4 e −η . Define a functional I : Let a density f 0 be an even function such that f 0 (x) ≤ α 0 e η|x| for |x| ≤x and I(f 0 ) ≤ 1 32 .
Using inductive arguments we show that P n f 0 (x) ≤ α n e η|x| for |x| ≤x, where the sequence (α n ) converges to zero as n → ∞ and, in consequence, the sequence (µ n ) of measures with densities P n f 0 converges weakly to a combination of two Dirac measures. Since P n f 0 are also even functions, it is sufficient to check (16) for x ∈ [0,x]. From (15) it follows that for any even density f . Now let f be an even density such that f (x) ≤ αe η|x| for |x| ≤x , I(f ) ≤ 1 16 , and α > 0 satisfies the inequality αe η ≤ 1 16 . We estimate the function P f from above in the interval [0,x]. We have where Sf (x) = 0 for x ∈ [0,x − 1/2] and Sf (x) = 2 1/2 x Since Now we show that We have Thus Since f (s) ≤ αe ηs for s ∈ [0,x] we have From inequality 1−x ≤ 1 4 e −η it follows that the expression in the square brackets is less than 1, which implies that (20) holds. Now let α n = 3 4 n α 0 . Since I(f 0 ) ≤ 1 32 and ∞ n=0 (1 + α n ) < 2 from (19) and (20) it follows that P n f 0 (x) ≤ α n e η|x| for (dz) and the preference function ϕ(r) = 1, for r < 1 and 0 otherwise. The initial function is similar to f 0 in Example.
x ∈ [−x,x] and for all n. Therefore, the sequence (µ n ) of measures with densities P n f 0 converges weakly to a combination of two Dirac measures.
Remark 2. The proof of convergence to a combination of two Dirac measures is quite technical. The assumptions on the initial function f 0 are rather restrictive in order to have precise and reasonably simple proof, but it is clear that other similar functions, which have two peaks more distant then 1 and less distant then two, will have the same behavior. To illustrate the shape of f 0 , in Figure 4, we show a simulation of evolution with the initial density given by f 0 (x) = Ce 10|x| , where C is a normalization constant. 5. Numerical simulations. In this section we present some interesting results of numerical simulations. The aim of these simulations is mainly to illustrate the fact, that for many initial distributions the sequence of iterations converges to a multimodal measure.

5.1.
Model with the offspring trait between the parental traits. We assume, as in Section 4, that ψ(x, y) = ϕ(|x − y|) and K(x, y, dz) = δ x+y 2 (dz) and we start from a measure that is absolutely continuous with respect to the Lebesgue measure with some density f 0 . Then the operator on densities has the form (14). In Figure 2, we consider three different preference functions: ϕ 1 (r) = 1, for r < 1 and 0 otherwise, which means that the individuals mate always if they are within the range of 1, and self-fertilization or vegetative reproduction occurs only for those which are more distant then 1; the second example is given by the so called beta distribution function ϕ 2 (r) = (r −1) 2 (r +1) 2 for r < 1, which resembles a truncated Gaussian function; the last one decreases faster: ϕ 3 (r) = (1 − r) 3 for r < 1. In all three cases the initial population's trait is uniformly distributed on the interval [−1.5, 1.5]. In the first case the limit measure is a single Dirac measure in the middle, whereas for the next two functions the limit consists of two separated Dirac measures.
In Figure 3, we can see four sequences of graphs showing evolution with the same preference function ϕ 2 (r) = (r − 1) 2 (r + 1) 2 for initial trait uniformly distributed on different intervals.
The question arises: how the limit measure depends on the spread of trait in the initial population. As a partial answer, in Figure 4 we present a kind of "bifurcation graph" in which we show the dependence of positions of Dirac measures in limit distributions on the width of the support of initial measure. (dz) and different preference functions ψ(x, y) = ϕ i (|x − y|) with ϕ 1 (r) = 1, ϕ 2 (r) = (r − 1) 2 (r + 1) 2 , and ϕ 3 (r) = (1 − r) 3 , respectively, for r < 1 and 0 otherwise. In all three cases the initial population's trait is uniformly distributed on the interval [1.5, 1.5].
In all the above simulations, the offspring trait is exactly the average of parental traits, which case is just the simplest one to simulate. Apparently, we could use any other distribution of offspring trait that satisfies the assumptions of Theorem 3.1. Actually, we have run some simulations with the offspring trait distributed uniformly between parental traits and the results were very similar, although the convergence to singular measures is a little slower and one need a bit wider initial distributions to obtain the same number of separate subpopulations.

5.2.
Model with a wider distribution of the offspring trait. Although in Section 2 we assumed that the trait of offspring is between the parental traits, the simulations show the interesting asymptotic behavior without this assumption. We consider the case when the trait of the offspring after sexual mating is distributed around the average of parents' traits with some fixed symmetric distribution, that do not depend on the parental traits. Let this distribution have a density κ, so K(x, y, dz) = κ z − x+y 2 dz. In this case, as before, if µ has a density f then the density of P µ is (dz) and ϕ(r) = (r − 1) 2 (r + 1) 2 , for r < 1 and 0 otherwise. The initial trait is uniformly distributed on the intervals of length 2.5, 3, 4.3, and 6 in subsequent rows.  The evolution of the trait distribution for this operator is shown in Figure 5. In all three cases, the preference function is the same, ϕ(r) = (r − 1) 2 (r + 1) 2 for r < 1, and the initial trait is uniformly distributed on an interval of length 4. The variation of offspring's trait from the parent's average has the beta distribution κ(r) = C a (r − a) 2 (r + a) 2 , for |r| ≤ a, where C a is a normalizing constant. We can see that if the spread of offspring trait is much narrower than the maximal mating distance, then the limit distribution is multimodal.
Finally, let us assume that the trait of all offspring, even those originating form self-fertilization, is distributed with some probability density. This may by caused by mutations. Such an assumption implies the following form of the operator on densities: Figure 6 suggests that in this case the width of offspring's trait distribution should be very small to obtain a multimodal limit distribution.
6. Discussion. In the paper we considered a very simple phenotype structured population model with assortative mating. The model is given by a nonlinear operator acting on the space of probability measures and describes the relation between parental and offspring trait distributions. We prove that if the offspring trait has values between the parental traits than the limit distribution of trait converges to a combination of Dirac measures and such a combination can be non-trivial (i.e. it is a combination of at least two Dirac measures). This result means that assortative mating can lead to a polymorphic population and sympatric speciation. We also present some numerical simulations which show how a uniformly distributed population evolves to become distinct species. We also provide some simulations in the case when the range of the offspring trait is wider the interval between parental traits. Then the limit distribution can be multimodal. Our model is rather simple and does not contain many important components. It will be not difficult to make it more realistic by adding birth and death rates which depend on the traits of mating individuals and the total size of the population m n = µ n (F ) or by considering a model with a competition kernel (see e.g. [5,6,8,12]). In the case when birth and death rates depend only on the total size, the properties of the model remain the same if we replace µ n by the trait profile µ n /m n . In the case when birth and death rates depend also on the trait of an individual, the trait distribution will concentrate around the local maximum of the reproduction rate. Assortative mating will sustain this phenomenon. It means that if we consider a model with both disruptive or competition selection and with assortative mating it will be difficult to decide which factor is responsible for sympatric speciation. If we consider a model only with assortative mating, then all traits are equally favorable and one can expect that the limit distribution will be uniform or will concentrate at the Dirac measures located in the mean of the initial distribution, but we show the picture is more complex. Moreover, one of the main biological arguments that assortative mating can have influence on the speciation process is that rare phenotypes experience less competition than common phenotypes [7]. On the other hand such mating can reduce the number of individuals with rare traits because they can have a problem finding partners, and therefore it seems sensible to consider a simplified model neglecting important ecological aspects as competition for resources and partners.
The proof of the convergence to a combination of Dirac measures is essentially based on the assumption that the offspring are distributed between the phenotypic values of the parents. It is clear that this assumption is an oversimplification and it should be replaced by, for example, that the offspring distribution is close to normal with a mean coinciding with the parents' mean. This problem is discussed in details in [1]. Simulation presented in Section 5.2 confirm that we expect in such a case multimodality of phenotype distributions. But in this case a serious mathematical problem appears: how to define the class of multimodal distributions, which additionally could explain sympatric speciation. It is why we only make a first step in this direction considering a model with such a strong assumption concerning offspring traits, but we hope to find a mathematical method of studying a model with different assumptions. It would be also interesting to investigate a continuous time model described in the last section of [21]. This model is more realistic but also more difficult to study. Till now we have only proved a version of Theorem 3.1 for this model but we are still unable to construct an example similar to that considered in Section 4.