DOMAIN CONTROL OF NONLINEAR NETWORKED SYSTEMS AND APPLICATIONS TO COMPLEX DISEASE NETWORKS

. The control of complex nonlinear dynamical networks is an ongoing challenge in diverse contexts ranging from biology to social sciences. To explore a practical framework for controlling nonlinear dynamical networks based on meaningful physical and experimental considerations, we propose a new concept of the domain control for nonlinear dynamical networks, i.e., the control of a nonlinear network in transition from the domain of attraction of an undesired state (attractor) to the domain of attraction of a desired state. We theoretically prove the existence of a domain control. In particular, we oﬀer an approach for identifying the driver nodes that need to be controlled and design a general form of control functions for realizing domain controllability. In addition, we demonstrate the eﬀectiveness of our theory and approaches in three realistic disease-related networks: the epithelial-mesenchymal transition (EMT) core network, the T helper (Th) diﬀerentiation cellular network and the cancer network. Moreover, we reveal certain genes that are critical to phenotype transitions of these systems. Therefore, the approach described here not only oﬀers a practical control scheme for nonlinear dynamical networks but also helps the development of new strategies for the prevention and treatment of complex diseases.


(Communicated by Qing Nie)
Abstract. The control of complex nonlinear dynamical networks is an ongoing challenge in diverse contexts ranging from biology to social sciences. To explore a practical framework for controlling nonlinear dynamical networks based on meaningful physical and experimental considerations, we propose a new concept of the domain control for nonlinear dynamical networks, i.e., the control of a nonlinear network in transition from the domain of attraction of an undesired state (attractor) to the domain of attraction of a desired state. We theoretically prove the existence of a domain control. In particular, we offer an approach for identifying the driver nodes that need to be controlled and design a general form of control functions for realizing domain controllability. In addition, we demonstrate the effectiveness of our theory and approaches in three realistic disease-related networks: the epithelial-mesenchymal transition (EMT) core network, the T helper (Th) differentiation cellular network and the cancer network. Moreover, we reveal certain genes that are critical to phenotype transitions of these systems. Therefore, the approach described here not only offers a practical control scheme for nonlinear dynamical networks but also helps the development of new strategies for the prevention and treatment of complex diseases.
1. Introduction. The control of complex networks has attracted enormous attention in recent years because of its wide applications in a variety of natural, social, economic, and man-made systems [37,43,12,47]. Despite significant advances in exploring the controllability of complex networks [37,66,20,63,46,55,65,36,11,56], certain limitations remain. For example, most of the methods that have been developed are for the control of networks hypothetically governed by linear dynamics.
A previous study showed that according to the control theory for linear systems, it is necessary to directly control approximately 80% of the nodes to achieve complete controllability in gene regulatory networks. However, such controls are usually infeasible and unnecessary because biological and medical experimental lines of evidence show that one or few driver nodes should be enough, particularly when designing drug targets in the biological and medical fields [37,42,63]. Therefore, the minimum number of driver nodes in a linear dynamical system is often overestimated.
In the real world, dynamical processes on complex networks are generally nonlinear. Therefore, the control of complex networked systems using nonlinear dynamics must be considered [66,8,36,11,29,58,67]. Although the controllability of nonlinear problems can be formulated using Lie brackets, such a control strategy is difficult to implement if the system is large and complex [48]. Although some progresses have been made in the development of specific control methods, such as chaos control [10], pinning control [50] and optimal control [53], these methods are limited to low-dimensional systems or certain particular situations. A breakthrough in the newly developed control approach is a realistic control framework, which can be used to direct a network to a desirable state by identifying compensatory perturbations [12]. A numerical and group representational framework has also been presented to show the effect of symmetries on the controllability of nonlinear networks [62]. More recently, a highly scalable algorithm has been presented to rationally identify parameter interventions that can control the response to noise in complex nonlinear networks by modulating the heights of the barriers separating different stable steady states [61]. In addition, the concept of the attractor network, in which the nodes are the distinct attractors of the nonlinear system, has been recently introduced to investigate the transitions between different attractors by applying parameter perturbations [57]. However, a general theoretical and computational framework for the actual control of complex networks using nonlinear dynamics has not been developed yet. A general mathematical framework for controlling complex nonlinear networks is difficult to define due to the complexity of the network structure as well as the extremely diverse nonlinear dynamical behaviors of systems [31,52]. Therefore, the problem of controlling complex nonlinear dynamical networks remains an outstanding challenge.
Indeed, many realistic networks, particularly networks representing pathogenesis of complex diseases, may not require the detailed control of all state variables during the transition from any initial state to any desired final state, which prompts us to explore the development of a practical framework for controlling nonlinear dynamical networks based on meaningful physical and experimental considerations. A common feature of nonlinear dynamical systems is the emergence of a number of distinct attractors [30,52,25,26]. The performance and functions of the system are often determined by a particular attractor. Because multiple types of stable steady states (attractors) can occur, as has been observed in complex disease networks, such as cancer, tumor, and diabetes II, there are certain states that may correspond to healthy (desirable) states under normal conditions and a few other states that correspond to disease states or undesirable states [1,5,22,13,35,34]. Potential curative interventions are those that can steer the system from a disease or pre-disease state (i.e., domain of attraction of an undesirable state) to the domain of attraction of the normal state [12,1,3].
Inspired by these biological problems, we propose a new concept for domain control of a nonlinear dynamical network, i.e., a method that steers a nonlinear network from one domain of attraction to another domain of attraction. Our goal is to develop a new mathematical theory and computational approach to realize domain controllability. We then use several realistic networks to test our theory and approaches. Notably, we reveal newly meaningful strategies for controlling the fate of living cells. These results are an important step toward the realistic control of complex networked systems with nonlinear dynamics.
2. Problem formulation. The dynamics of complex networks can often be represented by a set of nonlinear ordinary differential equations (ODEs). Without loss of generality, we consider an n-node network whose n-dimensional dynamical state y(t) is described asẏ (t) = F (y(t)), (1) where F (y(t)) ∈ R n is a nonlinear function, which is continuously differentiable with respect to y. The corresponding nonlinear control system can be specified aṡ where x(t) ∈ R n is the state vector, u(t) ∈ Ω ⊆ R m is the control signal, which is continuous on the interval [t 0 , t f ], Ω is a compact convex set and f (x(t), u(t)) ∈ R n is continuously differentiable with respect to x and t. It should be noted that system (2) is exactly system (1) when u = 0. Throughout this paper we make the following assumptions.
Assumption 1. The nonlinear system (1) exhibits at least two distinct stable steady states (attractors) In addition, D 1 and D 2 are the domains of attraction for attractors x 1 * and x 2 * , respectively. (2) is Lipschitz continuous with respect to x and u; thus, there exist two constants L 1 , L 2 > 0 such that (3) Assumption 3. Nonlinear system (1) is dissipative, i.e., there exist large positive constant C and time T such that its solution x(t) satisfies x(t) ≤ C for any initial state x 0 ∈ R n and t ≥ T. In other words, any solution of nonlinear system (1) eventually is bounded, which is actually true for biological systems in practice.
A schematic illustration of domain control is presented in Fig. 1. Our goal is to develop a new theoretical and computational framework to quantify the domain controllability of nonlinear dynamical networks and demonstrate the application of domain controllability analysis to identify potential drug targets during disease occurrence and progression. To address these problems, we consider the following control system ẋ(t) = F (x(t)) + Bu(t), where x(t) ∈ R n is the state vector, x 0 is in the attraction domain D 1 and B ∈ R n×m (m ≤ n) is the control input matrix in which each column is a n-by-1 identity matrix. If the j-th column of matrix B is the i-th column of n-by-n identity matrix, it means that the j-th input control signal is directly acted on variable i.

3.
Theoretical results. Without loss of generality, we assume that the first m variables are selected and controlled, i.e., the sub-matrix consisting of the first m rows of B is an identity matrix. For convenience to prove the following Theorem 3.1, we use the following notations. where Theorem 3.1. Suppose that Assumptions 1, 2 and 3 hold. Then the nonlinear system (5) is domain controllable; that is, there exists a control function u(t) ∈ Ω ⊆ R m (m ≤ n) such that the corresponding solution x(t) of system (5) satisfies lim t→+∞ Proof. The proof includes two main steps. The first step is to determine the index set of control variables (nodes) that include m variables and another is to design the control u(t) and prove the system (5) can be controlled. First, we describe how to determine the index set of control variables (nodes) that include m variables.
Let K be an index set (possibly empty set) of variables. The purpose of K is to screen out the variables whose partial derivatives are not constantly negative. Specially, if there exists a time t and a state x(t) ∈ R n such that the following inequality ∂F r (x(t)) ∂x r ≥ 0 holds, then r ∈ K. Let Q be an index set of variables containing q elements. We relabel the indexes of variables of the system (5) such that Q = {1, · · · , q}, where q is the smallest positive integer satisfying where C = Q K = {1, · · · , m} is the index set of control variables (nodes), N C = {m + 1, · · · , n} is the index set of variables that do not need to be controlled and N (p) is the index set of variables which are in the p-th equations of the system (5) but exclude the index p.
It is obvious that m ≤ n. Second, we can design the control signal as follows where λ < 0. Assumption 3 implies uniform boundedness of x(t) for all t ≥ t 0 . This implies uniform boundedness of u(t). Thus u(t) ∈ Ω ⊆ R m .
Recall that the first m rows of B constitutes an identity matrix. Substituting (8) into the nonlinear control system (5), we havė LetF and Recall that Assumption 3 holds and that F (x(t)) is continuous. Combining (9), (11) and the definition of K, we have Combining (13) and (14) gives Similar to Fiedler et al. [18], we use the mathematical induction method to prove First, the statement given in (16) obviously holds when i = 1. Second, we assume that (16) holds for all i < k. At last, we will show that (16) also holds when i = k. Let T ∈ R n . According to the generalized mean value theorem, we havė (17) can be rewritten aṡ where Assumption 3 implies uniform boundedness of x(t), and x(t)−θϕ(t) for all t ≥ T. SinceF (x) is continuously differentiable with respect to x, then ∂F k ∂xj (x(t) − θϕ(t)) is uniform bounded for a given j. This implies uniform boundedness of β j (t) based on the Property 6 of Integrals in [51]. Recall that (15) holds, according to the Property 5 of Integrals in [51], then α k (t) < 0 for all t ≥ t 0 . Thus, for all t ≥ T , there exist constant α 0 and β 0 such that Using the variation of constants method, the solution of (18) can be written as

DOMAIN CONTROL OF NONLINEAR NETWORKED SYSTEMS 2175
Combining (19) and (20) yields Obviously, lim t→+∞ |ϕ k (t 0 )| e α0(t−t0) = 0. The integral terms on the right hand of (21) can be rewritten as According to the induction hypothesis, we have lim Thus, e α0s |ϕ j (t − s)| is bounded for a given s. Using the Lebesgue's dominated convergence theorem [7], we have Thus lim Remark 1. There are two fundamental questions in the control of nonlinear dynamical networks, that is, how to identify driver nodes and design the control signals to achieve the control goal for a dynamical system. In the proof of Theorem 3.1, we established a strategy for identifying the driver nodes that are theoretically required to ensure the system domain controllable. Algorithm 1 presents detailed steps for identifying the theoretically required nodes (section 4.1.1). In addition, we also designed a general form of control functions (8) that depend on the dynamics and the desired state of the system for realizing domain controllability.
Remark 2. According to our theory and approaches, it is easy to find that the number of driver nodes is much less than the number of nodes for biochemical networks, i.e., m n. On the one hand, biochemical processes take place usually involving the degradation of molecules (e.g., genes and proteins). Thus, degradation terms (e.g., −d i · x i ) generally appear in the mathematical models of biochemical networks, suggesting that the partial derivative of F i with respect to x i is usually negative; that is, K is an empty set based on (6). On the other hand, signaling networks are characterized by their upstream and downstream signaling pathways and cascade to shape a uniquely defined response to the initial stimulus [27,44]. Thus, based on the method of determining the set Q, we can control the dynamics of the system as long as we control these extracellular signaling molecules in signaling networks. Altogether, the number m of the control set C = Q K may be much smaller than the number n of nodes in the node set of biochemical networks.
Remark 3. In biological models, the set K has obvious biological meanings. If there are not any self-feedbacks in all variables, then the set K is empty. Otherwise, the variables that have self-feedbacks will be selected into the set K when (6) holds. For a real biological system, very few variables have self-feedbacks. In Appendix A, we give two simple but famous biological examples to explain how to obtain the set K and show the biological significance of the selected set K.
Remark 4. The set Q can be determined according to (7). The main idea of determining the set Q is that a set of variables are selected such that the dynamical equations of all of the remaining variables can be expressed as explicit functions of these selected variables by successive evaluation of their dynamical equations. According to (7), when p = m + 1, N (m + 1) ⊆ C means the (m + 1)-th equation only contains the variables in the control set C. In other words, the (m + 1)-th equation can be expressed as an explicit function of the controlled variables. When p = m+2, N (m+2) ⊆ C {m+1} means the (m+2)-th equation only contains the variables in the control set C and the variable x m+1 , suggesting that the (m + 2)th equation can be expressed as an explicit function of the controlled variables and the variable x m+1 . Therefore, the (m + 2)-th equation can also be expressed as an explicit function of the controlled variables. By this analogy, when p = n, N (n) ⊆ C {m + 1, · · · , n − 1} means that the (n)-th equation only contains the variables in the control set C and the variables {x m+1 , x m+2 , · · · , x n−1 }. Therefore, the dynamical equations of all of the remaining variables can be expressed as explicit functions of these controlled variables. To clearly show the relationships between the controlled variables and the remaining variables, we give a schematic diagram in Fig. A3. In addition, to further illustrate the definition of the set Q, two simple examples are given in Appendix A.
Remark 5. Because Theorem 3.1 provides a sufficient condition for the domain controllability of a system, the number of the calculated driver nodes based on this Theorem may be overestimated. Therefore, we numerically investigate whether a less number of driver nodes can also achieve domain controllability (section 4.1.2). To distinguish the differences of these two approaches, we denote them as theoretically and practically required driver nodes, respectively. 4. Algorithms for solving the domain control problem.

4.1.
Identifying driver nodes for realizing domain controllability.

4.1.1.
Theoretically required driver nodes. In the proof of the Theorem 3.1, we established a strategy for identifying the driver nodes that theoretically need to be controlled. The set of driver nodes equals the union of the sets K and Q, which can be determined using (6) and (7), respectively. Here, we gave an intuitive explanation and a simple algorithm of how to select the driver nodes. Firstly, we ignored the variables that are not influenced by any other variables. The dynamics of these variables converges to the stable steady states and does not contribute to the diversity of attractors of the system. We inductively repeated this procedure until there was no such variables. Secondly, the main idea of determining the set Q is that a set of variables was selected such that the dynamical equations of all of the remaining variables can be expressed as explicit functions of these selected variables by successive evaluation of their dynamical equations. Therefore, if we appropriately control the dynamics of these selected variables, we can drive the system to the domain of attraction of the desired attractor. Thirdly, the minimal set of such selected variables is exactly the set Q in our proof. In this study, we employed an exhaustive algorithm to search the set Q. More specifically, we presented the following algorithm to identify the theoretically required driver nodes.

Algorithm 1 Identifying the theoretically required driver nodes
Step 1. Determine the set K according to (6).
Step 2. Ignore the variables that do not be influenced by any other variables as described above.
Step 3. Select a subset of variables, denoted by Q.
Step 4. If the dynamical equations of all of the remaining variables can be only expressed as explicit functions of the variables in Q, then stop the iteration and return C = Q K as the set of driver nodes. Otherwise, go to the next step.
Step 5. Repeat Steps 3 and 4 using the exhaustive search of the candidate set by looking first at smaller sets.

4.1.2.
Practically required driver nodes. The Algorithm 1 for identifying the theoretically required driver nodes is based on the proposed method for finding the set C = Q K of driver nodes in the proof of Theorem 3.1. The purpose is to demonstrate the correctness of our theory. However, the purpose for finding practically required driver nodes is to investigate whether a less number of driver nodes can also achieve domain controllability because the targets are very few in biological and medical networks. Therefore, the computational method for practically required driver nodes is very simple. That is, we select one node from the set of theoretically required driver nodes and use the designed control function (8), then perform the numerical simulation to determine whether the trajectory of the control system (5) can dynamically evolve to the desirable attractor using the designed control error (24). If the system cannot be controlled by controlling one node, then we select two nodes from the set of theoretically required driver nodes and repeat the simulations, and so on. In addition, for low-dimensional problems, we also test each variable (node) in the dynamical system to observe its domain controllability.

4.2.
Determining the attractors of a nonlinear dynamical system. In this study, we combine the Runge-Kutta method and trust-region method to determine the attractors (steady states) of an n-dimensional nonlinear dynamical system such as that given in (1). It is impossible to enumerate all of the possible attractors of a high-dimensional nonlinear dynamical system, but we can confirm the minimum number of attractors by performing many rounds of sufficient sampling. The algorithm for determining the steady states of the nonlinear dynamical system (1) was presented in Algorithm 2.

4.3.
Finding the domains of attraction. In this study, we estimate the domains of attraction using the Latin hypercube sampling (LHS) method. As biological systems are often bounded, to measure the domains of attraction more accurately, we find the domains of attraction in the positively invariant set of a dynamical system. A positively invariant set ∆ is a subset of the state space of a dynamical system with the property that, if any solution to the dynamical system with initial conditions x(0) ∈ ∆ satisfies x(t) ∈ ∆ when t ≥ 0. The procedure is presented in Algorithm 3.

Numerical illustrations and biological implications.
In this section, we numerically illustrate the effectiveness of our proposed theoretical and computational frameworks as well as their biological implications. Network controllability Algorithm 2 Determining the attractors of the nonlinear dynamical system Step 1. Let P and A be the phase space and the set of attractors of the dynamical model (1), respectively. Given the total number of iterations N , termination tolerance ε, set iteration i = 1 and A = Ø. Step 2. Randomly generate an initial state x i 0 ∈ P. Integrate system in (1) over a long time using the classical fourth-order Runge-Kutta method and let Step 3. Start at the point x i (t f ) and again try to solve the nonlinear equations F (x) = 0 using the trust-region method, which return the solution of the nonlinear equation, again denoted by x i .
Step 5. If i < N , update i = i + 1 and return to Step 2.

Algorithm 3 Finding the domains of attraction
Step 1. Let D be the domain of attraction of an attractor (denoted by x * ) of the dynamical model (1). Given the total number of iterations N , termination tolerance ε, set iteration i = 1 and D = Ø. Step 2. Randomly generate a sample point x i 0 ∈ ∆. Integrate system in (1) over a long time by taking x i 0 as the initial state and let x i (t f ) be the solution x i evaluated at the terminal time point t f .
Step 3. Start at the point x i (t f ) and again try to solve the nonlinear equations F (x) = 0 using the trust-region method, which return the solution of the nonlinear equation, again denoted by x i .
Step 5. If i < N , update i = i + 1 and return to Step 2.
analysis has recently emerged as powerful tools for the identification of potential drug targets of complex diseases [64,54]. The identification of critical genes during disease progression is providing new insights into the transitions of cellular phenotypes and possible therapeutic interventions. To demonstrate effectiveness of using domain controllability analysis as a general tool to discover novel disease genes and potential drug targets, we apply our proposed approach to three realistic diseaserelated networks: the epithelial-mesenchymal transition (EMT) core network [41], the T helper (Th) differentiation cellular network [39] and the cancer network [33]. We use the root-mean-square error (RMSE) to measure the quality of the domain control process. The control error between the control trajectory and the desired attractor at time t is defined as where x i (t) and x * 2i is the i-th component of the state vector of the control system (5) and the desired attractor, respectively. n is the dimension of the system (1).

5.1.
Domain control of the EMT core network. In this paper, we first used a simple EMT core network, as shown in Fig. 2A, to illustrate the effectiveness of our theory and the power of domain controllability analysis in identifying key genes that control the initiation and progression of EMT.
Based on the law of mass action and Michaelis-Menten kinetics, we built an ODE model of this network. The associated dynamics followṡ where x i (i = 1, 2, 3, 4, 5, 6) represents the activity level of CDH1, SNAI1, ZEB1, ZEB2, miR-200 and miR-203, respectively. K ij denotes the regulatory strength from node j to i and d i is the degradation rate of gene i. The parameters used to perform numerical simulations are in the following: As a preparation, we first employed Algorithm 2 to identify the attractors of this model. As shown in Table 1, we confirmed the existence of two attractors in this network, which correspond to the epithelial and mesenchymal states. This result is consistent with previous findings [41]. Second, to find the domains of attraction of this model, we give the positively invariant set of this model, as shown in Table  2. Applying Algorithm 3 to this model, we uniformly took 10,000 samples (N = 10, 000) from the positively invariant set and obtained the domains of attraction. Because it is impossible to show domains of attraction in high dimensional space, we projected the domains onto two-dimensional planes. The domains of attraction of this model are depicted in Fig. C1(see Appendix C).  Table 2. The positively invariant sets of the mathematical models of the three disease-related networks.

Networks
Positively invariant sets 3,4,9,11,12,13,15,16, j = 8, 10, 23 We take this network as an example to describe how to identify the driver nodes using our established strategy (section 4.1.1). First, based on the dynamical equations of this network (25), it is clear that there does not exist an index of a variable satisfying (6), i.e., K is an empty set. Second, we rewrote the dynamical equations (25) as the following form: where G i (x j ) and D i (x i ) indicate the terms related to other variables (j = i) and itself (x i ) in the i-th equation, respectively. By successive evaluation of the dynamical equations, we obtained the following equivalent system: As shown in (27), the dynamical equations of the remaining variables (i.e., x 1 , x 2 , x 3 , x 4 ) are explicit functions of selected variables (i.e., x 5 , x 6 ). Therefore, the dynamics of the remaining variables is determined uniquely by that of the selected variables, suggesting that the control of x 5 and x 6 can indeed steer the system (25) to evolve to the desired attractor. In addition, according to (26), it can easily be shown that Finally, we relabel the variables in accordance with the order {x 5 , x 6 , x 2 , x 3 , x 4 , x 1 }, then (7) holds.
The above analysis indicates that we can drive epithelial cells to transition into mesenchymal cells and back by controlling two genes, i.e., miR-200 (x 5 ) and miR-203 (x 6 ). Based on the designed control function (8), we can obtain the control function for the theoretically required driver nodes (i.e., miR-200 and miR-203), which is presented in Table 3. Based on this control function, we simulated the dynamics while controlling these two genes. The node state trajectories x(t), the control function u(t) and the control error e(t) are depicted in Fig. C2 (see Appendix C), which indicates that the control errors remained very small (e(t) < 10 −3 ) with the time evolution, suggesting that our approach accurately controls the network dynamics. Furthermore, we randomly selected 10,000 points from the domain of attraction of the undesirable state and found that 100% control eventually caused the network to evolve to the desired target state using our method.
To show how the parameter λ in our designed control function influences the control process, we perform numerical simulations by changing the value of λ. Fig.  C3 (see Appendix C) shows that a higher absolute value of λ indicates a less time the system takes to evolve to the desirable attractor. However, there is no significant influence when the absolute value of λ is larger than a certain value. When we perform numerical simulations for the three disease-related networks, the parameter λ is set to be −0.2.
To identify the practically required driver nodes (section 4.1.2), we tested each of the six nodes and found that controlling any one of the four genes (SNAI1, ZEB1, ZEB2 and miR-203) can drive the system to switch from the epithelial state to the mesenchymal state, and controlling any one of the two genes (SNAI1 and miR-203) can drive the system to transition back. Fig. 2B summarizes the practically required driver nodes (genes) for inducing transitions between the epithelial and mesenchymal states. Given an initial point in the domain of attraction of the epithelial state, we simulated the dynamics while controlling a single node. Based on the designed control function (8), we can also obtain the control function for   Table 4. An example of the practically required driver nodes and designed control functions for realizing domain control of the three disease-related networks. E and M represent the epithelial state and mesenchymal state, respectively. C, A and N represent the cancer, apoptosis and normal states, respectively. x * 2,i is the i-th component of the state vector of the desired attractor.

Transitions
Drivers Control functions (u(t)) AKT − a(s 11 x 11 n +s 12 x 12 n +s 13 x 13 n ) 3S n +s 11 x 11 n +s 12 x 12 n +s 13 x 13 n +  the practically required driver nodes (e.g., miR-203), which is presented in Table  4. Based on this designed control function, Fig. 3 shows the results obtained while controlling node miR-203 and the corresponding time evolution of the genes and the control function. The control error results show that we were able to accurately steer the system from the epithelial state to the mesenchymal state. We observed that we could drive the network to the desired state by controlling any one of the three genes SNAI1, ZEB1 and miR-203, which is in line with the previous findings that these genes are important regulators in the EMT [41,32], suggesting a potential application of our approach to uncover key genes in the EMT program.

5.2.
Domain control of the Th differentiation cellular network. Next, we applied our proposed theoretical and computational frameworks to a core regulatory network controlling the differentiation of Th cells, which consists of 23 nodes (Fig. 4). Starting with the discrete model presented in a previous study [17], we constructed an ODE model (see (B1) in Appendix B). Applying Algorithm 2 to this model, numerical simulations show that there are three attractors, which correspond to the Th0, Th1 and Th2 phenotypes ( Table 5). The patterns of activity of these attractors are qualitatively consistent with those identified in a previous study [17]. In addition, we obtained the corresponding domains of attraction by using Algorithm 3, which are shown in Fig. C4 (see Appendix C). Applying our theory to the Th differentiation cellular network leads to the findings that controlling two genes (GATA-3 and STAT1) can force the system to perform transitions among the three phenotypes in the Th differentiation process. The designed control function for these two genes was shown in Table 3. Furthermore, a single driver node (T-bet, IFN-γ, IFN-γR, JAK1 or STAT1) was observed when attempting to identify the practically required driver nodes of this network for the transition from Th0 to Th1, and another driver node (GATA-3, IL-4, IL-4R or STAT6) was identified for the transitions from Th0 to Th2 and Th1 to Th2. Fig.  4B summarizes the practically required driver nodes needed to achieve domain control of the transitions from Th0 state to the Th1 and Th2 states. Using the designed control function (Table 4), Fig. 5 shows that the control of one of the two driver Table 5. The attractors of the Th differentiation cellular network. The three observed attractors (A, B and C) correspond to the Th0, Th1 and Th2 phenotypes [17], respectively.  Th0 to Th1 and Th0 to Th2, respectively. The driver node is T-bet when Th0 is steered to the Th1 phenotype (Th0→Th1) and GATA-3 when Th0 is driven to the Th2 phenotype (Th0→ Th2).
nodes, T-bet and GATA-3, can indeed steer a cell's differentiation from Th0 to either Th1 or Th2. The simulation results for the transition from Th1 to Th2 when controlling GATA-3 are shown in Fig. C5 (see Appendix C). These findings are consistent with the experimental results that stimulating the system with either Tbet or IFN-γ results in differentiations from Th0 to Th1, and with either GATA-3 Figure 6. Diagram of the cancer network. This network is redrawn from the reference [33], including 32 nodes (genes) and 111 edges (66 activation interactions and 45 repression interactions). The network mainly includes four types of marker genes: apoptosis marker genes (green rectangles), cancer marker genes (red rectangles), tumor repressor genes (light blue rectangles) and other genes (blue rectangles).The red arrows represent activation and the green short bars represent repression.
or IL-4 leads to differentiations from Th0 to Th2 [39,23,24]. These results suggest that the driver genes revealed by domain controllability analysis may play a vital role in the transitions among these three phenotypes during the Th differentiation process.

5.3.
Domain control of the cancer network. Finally, we applied our proposed theoretical and computational frameworks to a cancer gene regulatory network, as shown in Fig. 6. The corresponding ODEs describing the dynamics of the underlying system were constructed in a previous study [33]. This system describes the expression level of 32 genes (nodes) and 111 regulations (edges) ((B2) in Appendix B). Applying algorithm 2 to the constructed mathematical model, we observed three attractors characterizing three important biological states: normal, cancer and apoptosis (Table 6). In addition, we obtained the corresponding domains of attraction by using Algorithm 3, which are depicted in Fig. C6 (see Appendix C).  By applying the strategy for identifying the theoretically required driver nodes (Algorithm 1) to the cancer network, we revealed that control of the nine genes (e.g., P53, RB, AKT, EGFR, HIF1, CDK2, CDK1, BCL2 and NFκB) is sufficient to induce transitions among the cancer, normal and apoptosis states. The designed control function for these genes was shown in Table 3.
Furthermore, we tried to identify the practically required driver nodes for inducing transition from one domain of attraction to another in this network (section 4.1.2). Strikingly, a single driver node, such as RB (a tumor suppressor gene) and CDK2 (an oncogene), was detected by the application of our designed control function ((8) and Table 4) to this network for induction of the transition from the cancer to normal states, and another driver node, such as AKT (an oncogenes), PTEN (a tumor suppressor gene) and NFκB, was observed to cause the transitions from the cancer to apoptosis states. Given an initial point in the domain of attraction of the cancer state, we simulated the dynamics when one of the two genes AKT and RB was controlled. Fig. 7 shows the time evolution of the key genes and the control functions. The control errors show that we can accurately steer the system to switch from the cancer state to the apoptosis state and the normal state by controlling one of these genes, respectively.
Moreover, our study also revealed that only a single node, namely MDM2, RB, P53, CDK2 or CDK4, is needed to drive the normal state to the cancer state and that another driver node, namely AKT, PTEN or NFκB, is sufficient to switch the system from the normal state to the apoptosis state (see Fig. C7 in Appendix C). We took the driver nodes RB and NFκB as examples to show the designed control functions for achieving the transitions from the normal state to the cancer and apoptosis states, respectively (4). In addition, controlling any one of the six genes, including NFκB, AKT, PTEN, VEGF, HGF and HIF1, can cause the system to switch from the apoptosis state to the normal state and controlling AKT can induce an apoptosis-to-cancer transition (see Fig. C8 in Appendix C). Taken together, our results revealed that AKT, RB, PTEN, NFκB, CDK2 and VEGF play critical roles in the transitions among the cancer, normal and apoptosis states and may be possible targets for preventing cancer occurrence or transforming the early cancer state back to the normal state, which is consistent with findings that the repression The black dashed lines with star markers indicate the cancer state, i.e., the undesired state with no control. The blue solid lines with squares and red solid lines with filled circles represent transitions from cancer (C) to apoptosis (A) (denoted by C→A) and cancer (C) to normal (N) (denoted by C→N), respectively. The driver node is AKT when cancer is steered to the apoptosis and RB when cancer is driven to the normal state.
of key oncogene AKT will inhibit the formation of cancer state [21,59] and that PTEN and NFκB play an integral role in preventing the onset and progression of numerous cancers [49,9], implying that control of these genes can attenuate cancer by inducing a cancer-to-normal transition or cell apoptosis. Fig. 8 summarizes the practically required driver nodes needed to achieve domain control of the transitions among the cancer, normal and apoptosis states found using our approach.

Biological implications.
5.4.1. New systems biology approaches to reveal the underlying mechanisms. Network based systems biology approaches have recently emerged as powerful tools for the study of complex diseases [6] because these promising approaches allow us to take a more systems-level view of a biological process and thereby identify multiple factors (such as genes, proteins and their regulations) responsible for maintaining different cellular phenotypes and driving different cell fate decisions. In this study, using a novel network control framework, we identified the key proteins and their Figure 8. Domain control strategies for phenotype transitions of the cancer network. The practically required driver nodes (genes) for realizing domain controllability of this network were presented. For example, the normal and apoptosis states can be induced from the cancer state by any one of the two genes (RB and CDK2) and any one of the three genes (AKT, PTEN and NFκB), respectively.
interactions in biological networks that determine the transitions between normal and disease states. For example, in the EMT network that is established during developmental and disease processes, we identified that any one of the four genes (SNAI1, ZEB1, ZEB2 and miR-203) can induce the system to switch between the epithelial and mesenchymal states (Figs. 2B and 3). Interestingly, results from biological experiments have demonstrated that SNAI1 and ZEB1 can promotes tumorigenicity and miR-203 functions as a tumor suppressor in basal cell carcinoma [60,15,28], suggesting that these genes are crucial for the EMT program and dysregulation of them may result in cancerization. In the Th differentiation cellular network, we found that T-bet is needed for the transition from Th0 to Th1 and that GATA-3 is needed for the transition from Th0 to Th2 (Figs. 4B and 5), which agrees with previous reports [23,24]. Importantly, we determined that one of three genes, namely IFN-γR, JAK1 and STAT1, is required for the transition from Th0 to Th1 and that one of the two genes IL-4R and STAT6 is required for the transitions from Th0 to Th2 and Th1 to Th2, respectively, which are findings that have not previously been reported. In the cancer network, our results uncovered that activation of RB(a tumor repressor gene) activity and AKT(an oncogenes) activity can drive the system from the normal state to the cancer state and the apoptosis state (Fig. C7), respectively, providing possible mechanisms for cancer formation. Taken together, these results predict that disease-associated genes and proteins. Therefore, our control framework provides a novel approach for uncovering the underlying mechanisms through which proteins and their interactions lead to diseases.

5.4.2.
Network-based drug targets for complex diseases. Complex diseases are often regulated by the underlying biological networks, which can exhibit complex nonlinear dynamical behaviors. With the increase of network structure and dynamics being uncovered, the design of network-based drugs may be feasible and thereby provide an attractive strategy to potentially treat complex diseases such as cancer [2,14,45]. In our analyses, we looked at diseases as an undesirable network state and reframed the goal of drug therapy toward shifting the network back to a normal state. In the cancer network, we demonstrated that NFκB is sufficient to switch the system from the apoptosis state to the normal state and that AKT is sufficient to induce the transition of the system from the apoptosis state to the cancer state (Fig. C8). More importantly, our results provide certain predictions regarding which genes in the cancer network are critical to the transitions from the cancer to the normal and apoptosis states. We revealed that a single driver node, such as RB or CDK2 (an oncogene), can induce the transition from the cancer state to the normal state and that another driver node, such as AKT, PTEN (a tumor suppressor gene) or NFκB (a critical link between inflammation and cancer [9]) can induce cancer cell death (Figs. 7 and 8). These results may provide two rational drug designs of anti-cancer strategies, namely returning the early cancer state back to the normal state or inducing the death of cancer cells, for the prevention of cancer initiation and progression by perturbation of the activities of these genes through experimental techniques. Taken together, the key nodes and their dynamics identified in this study can be used to guide the design of anti-cancer strategies that involve targeting key genes or their regulations.
In summary, the approach described here will not only provide a new systems biology approach for revealing novel pathogenic mechanisms of complex diseases through which genes or proteins lead to disease but also aid the development of innovative intervention strategies for the prevention and treatment of complex diseases that involve the manipulation of the biochemical state of genes or the dynamics of signal transduction. 6. Discussions and conclusions. In this study, we present a new theoretical approach to describe the controllability of networks using a novel concept of domain control and demonstrate how to drive a complex networked system transition from one domain of attraction to another. Such problems are of practical relevance and importance, particularly in the identification of potential disease genes and the development of therapeutic strategies for complex diseases. Based on our theory, we have proposed an algorithm of identifying the driver nodes for realizing domain controllability. More importantly, we have designed a general form of control functions, which can be used to uncover the practically required driver nodes for inducing transition from one domain of attraction to another. The successful application of domain controllability analysis to three real disease-related networks demonstrates the effectiveness of our approach, which provides a powerful tool for finding the critical nodes (genes) that contribute to phenotype transitions of systems. Recent life sciences are aiming at the control of biological systems for medical purposes. Such problems may be solved if we can identify the key driver genes and control the activity of these genes. Overall, these results indicate that our proposed domain controllability framework not only offers a practical control scheme but also identifies potential disease genes and drug targets, providing innovative therapeutic approaches, as well as prognostic or diagnostic markers for complex diseases.
To demonstrate the robustness of the control effectiveness, we performed the perturbation analysis for the EMT core network presented in section 5.1. The parameter values in (25) are perturbed within 10% variations for 100 times. Figure  C9 shows the evolution of the control function u(t) and the control error e(t) of each perturbation across 100 disturbed parameter sets. As shown in Fig. C9, there are almost no changes of u(t) and e(t) with the perturbations of the parameters. In other words, the control effectiveness is robust to perturbations of the parameter values.
Despite these exciting results regarding the control of the nonlinear dynamics of complex networks, domain control frameworks can still meet some difficulties. For example, estimating the domains of attraction is an important and challenging problem in higher-dimensional nonlinear systems [69]. Although many techniques have been proposed to address this problem, these are mainly based on the elements of Lyapunov stability theory [4,38]. Unfortunately, Lyapunov functions are difficult to determine using computational and analytical methods [30]. In this study, we employed a sampling method to estimate the domains of attraction, which was proven to efficiently address the domain controllability problem, as demonstrated through its applications to several complex systems.
To identify the driver nodes for achieving domain controllability, we employed the method presented in the proof of Theorem 3.1 to obtain theoretically required driver nodes, including two index sets K and Q. The set Q is similar to a minimal feedback vertex set of a directed graph in graph theory, which has been studied in previous literatures [18,40]. However, one of the remarkable aspects is that we designed a general form of control functions for realizing domain controllability. It is of great importance because it tells us the proper control input that we should impose on the driver nodes in practice. Therefore, we also performed simulations to obtain practically required driver nodes. Undoubtedly, these methods may take too much time to determine the set Q for the high-dimension problems. Because this study focused on the theory of domain controllability and its validation using the numerical simulations, designing effective algorithms for high-dimension problems would be our further work. In addition, nonlinear systems often display different types of attractors, such as stable steady states (fixed points), periodic attractors and strange attractors. In this paper, we mainly studied the transition between different stable steady states which have been observed in complex disease systems. However, the proposed approaches should also be extended to include other types of attractors which need for further examination.
In summary, we developed a practical and efficient framework for the domain controllability of nonlinear dynamical networks based on theoretical analysis and successfully applied to three disease-related complex networks. The principles outlined here can be used to guide pharmacological design and optimize drug combination strategies targeting complex diseases. Furthermore, the phenomenon of multistability has observed in many important biological processes such as cell cycle and cell fate decision. Therefore, our approaches are of wide applications in the biological and medical fields, as explained in section 5.4. In addition, the approaches presented here can also be extended to the control of related problems in other fields such as engineering and economy only if the dynamical system can exhibit multistable phenomenon. Our methodological framework may lead to new insights into how the behavior of complex systems can be altered in a desired manner.
Appendix A. Three biological examples for illustrating the definitions of the sets K and Q. In this section, we first give two simple but famous biological examples to explain how to obtain the set K and show the biological significance of the selected set K.
The first example is a system without any self-feedbacks. The well-known bistable gene-regulatory network, i.e., the genetic toggle switch in Escherichia coli, has been constructed and verified experimentally [19]. The regulatory network is shown in Fig. A1. Figure A1. The gene-regulatory network of the genetic toggle switch.
The dynamical model of this system is presented as follows.
where x 1 , x 2 represent the concentrations of repressors 1 and 2, respectively. α 1 and α 2 indicate the effective rate of synthesis of repressors 1 and 2, respectively. β and γ are the cooperativity of repression of promoters 2 and 1, respectively.
From the system (A1), it is easy to obtain that ∂F1 ∂x1 = −1 < 0, ∂F2 ∂x2 = −1 < 0. According to the definition of the set K, the set K for this system is empty.
The second example is a gene regulatory circuit with a self-feedback, which includes two genes. The regulatory network is shown in Fig. A2. Figure A2. A two-gene regulatory circuit with a self-feedback.
The dynamical model of this system is presented as follows (Please see the details of the model in the previous study [68]).
where x 1 , x 2 represent the expression levels of genes 1 and 2, respectively. α 1 and α 2 are the maximum expression rate. β 1 and β 2 indicate the inverse of the fold-change of enhanced transcription rates when the promoter is occupied by the activator. K 1 and K 2 are Michaelis constants and n is the Hill coefficient. λ 1 and λ 2 are the degradation rates. According to the system (A2), we have ∂F2 Because λ i are degradation rates, thus λ i > 0(i = 1, 2). Based on the biological significance of the parameters, we have β i = ρ i K n i (ρ i ∈ [10 −3 , 10 −2 ], i = 1, 2) [68]. Thus, it is easy to see that ∂F1 ∂x1 is not constantly negative. Thus K = {1}.
The main idea of determining the set Q is that a set of variables are selected such that the dynamical equations of all of the remaining variables can be expressed as Accordingly, C = Q K = {1}. Therefore, (7) holds. Actually, the set Q is not unique. Since N (1) = {2}, i.e., the first equation is also an explicit function of x 2 , Similar to the above method, the set Q can also be set as {2}.
Example 2. The famous Goodwin model [16]. The end-product of a metabolic pathway inhibits the expression of a gene coding for an enzyme that catalyzes a reaction step in the pathway, which is shown in Fig. A4. Its dynamics is governed by three ordinary differential equations.
In the above equations, S represents the threshold (inflection point) of the explicitly sigmoid functions, i.e. the strength of the regulatory interaction, and n is the Hill coefficient which determines the steepness of the sigmoid function. In addition, k is self-degradation constant, b is repression constant and a is activation constant.   . Domain control of the EMT core network transition from the epithelial state to the mesenchymal state by the theoretically required driver nodes. The node state trajectories x(t) (A-F ), control function u(t) (G) and control error e(t) (H ) are depicted when the theoretically required driver nodes (i.e., miR-200 and miR-203) are controlled. The symbols x 1 , x 2 , x 3 , x 4 , x 5 and x 6 indicate the activities of CDH1, SNAI1, ZEB1, ZEB2, miR-200 and miR-203, respectively. u 5 and u 6 indicate the control functions for miR-200 and miR-203, respectively. The black dashed and blue solid lines represent the initial epithelial state (i.e., the undesired state with no control) and desired mesenchymal state, respectively. Figure C3. Effect of the parameter λ in the designed control function on the domain control of the EMT core network. The node state trajectories x(t) (A-F ), control function u(t) (G) and control error e(t) (H ) are depicted when the practically required driver node, namely miR-203, is controlled. The symbols x 1 , x 2 , x 3 , x 4 , x 5 and x 6 indicate the activities of CDH1, SNAI1, ZEB1, ZEB2, miR-200 and miR-203, respectively. The black dashed and colored solid lines represent the initial epithelial state (i.e., the undesired state with no control) and desired mesenchymal state, respectively. Here, we take the EMT core network as an example to show how the parameter λ in our designed control function influences the control process. As shown in (H), to a certain extent, a higher absolute value of indicates a less time the system takes to evolve to the desirable attractor. However, there is no significant influence when the absolute value of λ is larger than 1. Figure C4. Distributions of the domains of attraction of the Th differentiation cellular network model. We project the domains into a two-dimensional plane. The light red and light yellow regions represent the domains of attraction of the Th1 attractor and Th2 attractor, respectively. Figure C5. Domain control of the T helper differentiation cellular network transition from Th1 to Th2. The representative node state trajectories x(t) (A-K ), control function u(t) (L) and control error e(t) (M ) are depicted. The symbols x 1 , x 4 , x 5 , x 6 , x 7 , x 12 , x 13 , x 17 , x 19 , x 21 and x 22 indicate the activities of GATA-3, IFN-γ, IFN-γR, IL-10, IL-10R, IL-4, IL-4R, SOCS1, STAT3, STAT6 and T-bet, respectively. The dashed black line indicates the Th1 phenotype, i.e. the initial state with no control. The solid blue line represents the Th2 phenotype. The driver node is GATA-3 when Th1 is driven to the Th2 phenotype. Figure C6. Distributions of the domains of attraction of the cancer network. We project the domains into a two-dimensional plane. The light red and light yellow regions represent the domains of attraction of the cancer attractor and normal attractor, respectively. Figure C7. Domain control of the cancer network transitions from the normal state to the apoptosis state and cancer state. The representative node state trajectories x(t) (A-L), control function u(t) (M ) and control error e(t) (N ) are depicted when AKT (u N →A ) and RB (u N →C ) are controlled, respectively. The symbols x 4 , x 5 , x 6 , x 10 , x 12 , x 13 , x 14 , x 16 , x 18 , x 21 , x 22 and x 25 indicate the activities of PTEN, CDH1, RB, AKT, VEGF, HGF, HIF1, MDM2, CDK4, Caspase, BAX and NFκB, respectively. The black dashed lines with star markers indicate the normal state, i.e., the initial state with no control. The red solid lines with filled circles and blue solid lines with squares represent transitions from normal (N) to apoptosis (A) (denoted by N→A) and normal (A) to cancer (C) (denoted by N→C), respectively. The driver node is AKT when normal is steered to the apoptosis and RB when normal is driven to the cancer state. Figure C8. Domain control of the cancer network transitions from the apoptosis state to the normal state and cancer state. The black dashed lines with star markers indicate the apoptosis state, i.e., the undesired state with no control. The red solid lines with filled circles and blue solid lines with squares represent transitions from the apoptosis state (A) to normal state (N) (denoted by A→N) and apoptosis state (A) to cancer (C) (denoted by A→C), respectively. The driver node is NFκB when the apoptosis state is steered to the normal state and AKT when apoptosis state is driven to the cancer state. Figure C9. Robustness analysis of the control effectiveness against parameter perturbation. The evolution of the control function u(t) and control error e(t) are depicted when miR-203 is controlled. Each curve represents one simulation for one perturbation of the parameter values. In our numerical simulations, we randomly generate 100 parameter sets in which every parameter of the system is perturbed within a range of ±10%. The initial state is the epithelial state, which is set to be (1, 0, 0, 0, 1, 1) T in all the numerical experiments. We zoom into the curves in a small window.