Word combinatorics for stochastic differential equations: splitting integrators

We present an analysis based on word combinatorics of splitting integrators for Ito or Stratonovich systems of stochastic differential equations. In particular we present a technique to write down systematically the expansion of the local error; this makes it possible to easily formulate the conditions that guarantee that a given integrator achieves a prescribed strong or weak order. This approach bypasses the need to use the Baker-Campbell-Hausdorff (BCH) formula and shows the existence of an order barrier of two for the attainable weak order. The paper also provides a succinct introduction to the combinatorics of words.


Introduction
This paper shows how word combinatorics is a useful tool in the analysis of splitting integrators for Ito or Stratonovich systems of stochastic differential equations. In particular we present a technique to write down systematically the expansion of the local error; this makes it possible to easily formulate the conditions that guarantee that a given integrator achieves a prescribed strong or weak order. This approach bypasses the need to use the Baker-Campbell-Hausdorff (BCH) formula and shows the existence of an order barrier of two for the attainable weak order. In the case of Stratonovich systems the technique has already appeared in [1]; the corresponding Ito results appear here for the first time. In addition, while the succinct presentation in [1] focuses on the "recipe" to write down the order conditions, the present paper includes background on the combinatorics of words. In this way we also provide what we hope is a reader-friendly introduction to that area, which has applications outside numerical mathematics in many mathematical tasks, including averaging of periodically or quasiperiodically forced systems of differential equations, reduction of continuous or discrete dynamical systems to normal form, rough path theory, etc. (references are given below).
The importance of splitting integrators [6,28] has increased continuously in the recent past due to their flexibility to adapt to the structure of the problem being solved, be it in the context of multiphysics systems or in the domain of geometric integration (i.e. integration performed under the requirement that the numerical solution possess some of the geometric properties of the true solution) [42,21,4]. As is the case with any other one-step integrator, the analysis of a splitting algorithm starts with the study of the local error [9,22], i.e. the error under the assumption that the computation at time level t_{n+1} starts from information at time t_n that is free of errors. Unfortunately, even in the case where the system being integrated consists of (deterministic) ordinary differential equations, the investigation of the local error may be a daunting task if undertaken in a naive way. Formal series and combinatorial algebra have proved very useful tools here, as we discuss presently; see [44] for a recent survey.
For Runge-Kutta methods, whose history goes back to 1895, the structure of the local error was only understood after Butcher's work in the 1960's [8]; this work made it possible to construct formulas that improve enormously on those known until then. In Butcher's theory, the true and numerical solutions are expanded in series; each term of the series is the product of a power of the step size, a numerical coefficient (elementary weight) and a vector-valued function (elementary differential). There is a term in the series associated with each rooted tree. The elementary differentials change with the system being integrated but are common to all Runge-Kutta formulas and to the true solution. The weights change with the integrator but are independent of the system being integrated. B-series [23], formal series indexed by rooted trees, were introduced by Hairer and Wanner as a means to systematize Butcher's approach and to extend it to more general classes of algorithms. B-series are combinations of elementary differentials. A key result in the theory of B-series is the rule to compose two B-series to obtain a third. B-series possess many applications in numerical analysis, especially in relation to geometric integration (starting with [11]) and modified equations [10]. (Loosely speaking, the modified equation of a numerical integration is the differential equation exactly satisfied by the numerical solution.) Recently B-series have also been used outside numerical mathematics, e.g. to perform high-order averaging of periodic or quasiperiodic systems [12,13].
For splitting integrations of deterministic systems, the best-known method to investigate the local error [42] uses the BCH formula [43,21]. This may be considered an indirect approach, in that it does not compare the numerical and true solutions but rather the modified system of the integrator and the true system being solved. The large combinatorial complexity of the BCH formula is certainly a limitation of this technique. An alternative methodology, patterned after Butcher's treatment of the Runge-Kutta case, was introduced in [32] (a summary may be seen in [21, Section III.3]). A third possibility is the use of word series expansions [31,14,15,33,34,35,36,37]. Word series are patterned after B-series; rather than combining elementary differentials they combine word basis functions. They are indexed by words on an alphabet rather than by rooted trees. Their scope is narrower than that of B-series; all problems that may be treated by word series are amenable to analysis via B-series, but the converse is not true. On the other hand, word series, when applicable, are more compact and simpler to use than B-series; in particular the composition rule for word series is much simpler than the corresponding rule for B-series. Word series may be used outside numerical mathematics in tasks such as high-order averaging [14,15,34,36,37], reduction of dynamical systems to normal form [33], etc. They are very well suited to investigate the local error of splitting algorithms [35] (see also the closely related technique in [5, Section 2.4]).
Turning now our attention to splitting algorithms for stochastic differential equations, the most popular technique is again based on the BCH formula, see e.g. [26,27]. In [1] we suggested a word-series approach in the case where the equations are interpreted in the sense of Stratonovich. This approach bypasses the use of the BCH formula and is not difficult to implement in practice. Here we extend the material in [1] in several directions that we now discuss briefly.
This paper contains nine sections. Section 2 recalls the Taylor expansion of the solution of Stratonovich and Ito equations and introduces much of the notation to be used throughout the paper. In Section 3 we present splitting integrators and their local errors. We also discuss briefly the pullback operator associated with a mapping; this is a key notion in what follows, as the local error is investigated here by expanding pullback operators rather than mappings. Section 4 describes the main tool: formal series indexed by words. We employ two kinds of such series: series of differential operators and series of mappings. The central results, i.e. the structure of the strong and weak local error and the strong and weak order conditions, are given in Section 5. In the Stratonovich case the order conditions have already been presented in [1]; the Ito case is new, as is the detailed discussion of the necessity of the order conditions (Lemma 8). Section 6 deals with the shuffle and quasishuffle products; these play a key role in the combinatorics of words. In our context they are necessary to identify sets of independent order conditions, a point not discussed in [1], and to prove the composition rule for word series (Proposition 20). The discussion of the order conditions finishes in Section 7 with the help of the infinitesimal generator. There we show an order barrier of 2 for the weak order attainable by splitting integrators in both the Stratonovich and Ito cases. Sections 8 and 9 present some complements; they respectively discuss how the relation between the Ito and Stratonovich interpretations may be understood in terms of word combinatorics and the links between the material in this paper and the theory of Hopf algebras.
We close the introduction with some important points.
• The word "formal" is often used in some disciplines, such as theoretical physics, as somehow synonymous with imprecise or lacking in rigour. In this paper formal series are well-defined objects that, after truncation, yield meaningful approximations; they are manipulated rigorously because all the necessary computations involve finite sums.
• Our interest is in the combinatorial aspects of the theory. Therefore we shall not concern ourselves with the derivation of error bounds or other analytic considerations. The interested reader is referred to the appendix of [1] (see also [14]).
• In order not to clutter the exposition, all functions that appear are assumed to be smooth in the whole of the Euclidean space. In some places only a finite number of the terms of some series make sense if the given vector fields have limited smoothness. In those circumstances one has to replace the series by a finite sum.

Stochastic Taylor expansions
We are concerned with Stratonovich, or Ito, systems of differential equations (see e.g. [30])
$$ dx = f(x)\,dt + \sum_{i=1}^{n} g_i(x)\circ dB_i(t), \qquad (1)$$
$$ dx = f(x)\,dt + \sum_{i=1}^{n} g_i(x)\, dB_i(t), \qquad (2)$$
where f, g_i, i = 1, ..., n, are smooth vector fields in R^d and B_i, i = 1, ..., n, are independent scalar Wiener processes. When applying splitting integrators, f is often written as a sum \sum_{j=1}^{m} f_j; it is then convenient to work hereafter with the formats
$$ dx = \sum_{\ell\in A_{det}} f_\ell(x)\,dt + \sum_{\ell\in A_{sto}} f_\ell(x)\circ dB_\ell(t), \qquad (3)$$
$$ dx = \sum_{\ell\in A_{det}} f_\ell(x)\,dt + \sum_{\ell\in A_{sto}} f_\ell(x)\, dB_\ell(t). \qquad (4)$$
The finite set of indices A_det is called the deterministic alphabet; its elements are called deterministic letters. The finite set A_sto is the stochastic alphabet and its elements are the stochastic letters. The set A = A_det ∪ A_sto is called the alphabet and is assumed to be nonempty. On the other hand, we do include the cases where A_det or A_sto is empty; if A_sto = ∅ then (3)-(4) is a system of ordinary differential equations. We use lower case a, b, ... for deterministic letters and upper case A, B, ... for stochastic letters. The symbols k, ℓ, m, ... are used to refer to elements of A, i.e. to letters, when it is not necessary to specify whether they are deterministic or stochastic.
In this section we recall the expressions of the Taylor expansions of the solutions of (3) or (4) presented in e.g. [25, Chapter 5]. Our treatment is somewhat different, because we deal with the format (3)-(4) rather than with the standard (1)-(2). Specifically, as distinct from [25], we work here with deterministic alphabets A_det that may have several letters and, in the Ito case, introduce a letter Ā for each A ∈ A_sto. In the presentation of the Taylor expansion we shall encounter words, together with their differential operators and iterated integrals; these are essential later in the paper.

The Stratonovich-Taylor expansion
With each letter ℓ ∈ A we associate a first-order differential operator D_ℓ. By definition, D_ℓ is the Lie operator that maps each smooth function χ : R^d → R into the function D_ℓχ that at the point x ∈ R^d takes the value
$$ (D_\ell\chi)(x) = \chi'(x)\, f_\ell(x) = \sum_{k=1}^{d} f_\ell^{k}(x)\,\frac{\partial \chi}{\partial x^{k}}(x) \qquad (5)$$
(superscripts denote components of vectors). In (5), the symbol χ′ denotes the first (Fréchet) derivative of χ; its value at x ∈ R^d is a linear map defined on R^d and χ′(x) f_ℓ(x) is the image by this linear map of the vector f_ℓ(x) ∈ R^d. Smooth functions χ : R^d → R will often be referred to as observables. Since the Stratonovich calculus follows the rules of ordinary calculus, if x(t) is a solution of (3) and t_0 ≥ 0, h ≥ 0,
$$ \chi(x(t_0+h)) = \chi(x(t_0)) + \sum_{\ell_1\in A}\int_{t_0}^{t_0+h} D_{\ell_1}\chi(x(s_1))\circ dB_{\ell_1}(s_1), \qquad (6)$$
where for deterministic ℓ_1 the notation ∘dB_{ℓ_1}(s_1) means ds_1. In (6), as h ↓ 0, the term χ(x(t_0)) provides the Taylor approximation of order 0 to χ(x(t_0+h)) and the integral gives the corresponding remainder. To obtain additional terms of the Taylor expansion of χ(x(t_0+h)), we first write formula (6) with D_{ℓ_1}χ(x(s_1)) in lieu of χ(x(t_0+h)), and then substitute in (6) to get
$$ \chi(x(t_0+h)) = \chi(x(t_0)) + \sum_{\ell_1\in A} J_{\ell_1}(t_0+h;t_0)\, D_{\ell_1}\chi(x(t_0)) + \sum_{\ell_1,\ell_2\in A}\int_{t_0}^{t_0+h}\int_{t_0}^{s_1} D_{\ell_2}D_{\ell_1}\chi(x(s_2))\circ dB_{\ell_2}(s_2)\circ dB_{\ell_1}(s_1).$$
By iterating this procedure, we find the series
$$ \sum_{n\ge 0}\ \sum_{\ell_n,\dots,\ell_1\in A} J_{\ell_n\dots\ell_1}(t_0+h;t_0)\, D_{\ell_n}\cdots D_{\ell_1}\chi(x(t_0)), \qquad (7)$$
where J_{ℓ_n…ℓ_1}(t_0+h; t_0) denotes the iterated stochastic integral
$$ J_{\ell_n\dots\ell_1}(t_0+h;t_0) = \int_{t_0}^{t_0+h}\circ dB_{\ell_1}(s_1)\int_{t_0}^{s_1}\circ dB_{\ell_2}(s_2)\cdots\int_{t_0}^{s_{n-1}}\circ dB_{\ell_n}(s_n). \qquad (8)$$
Iterated integrals obey the following recursion, n ≥ 2,
$$ J_{\ell_n\dots\ell_1}(t_0+h;t_0) = \int_{t_0}^{t_0+h} J_{\ell_n\dots\ell_2}(s_1;t_0)\circ dB_{\ell_1}(s_1). \qquad (9)$$
Remark 1 In the right-hand side of (7) the iterated integrals are constructed from the Brownian processes B_A, A ∈ A_sto, in (3) and do not change if the fields f_ℓ, ℓ ∈ A, (or even their dimension d) change. On the other hand the operators D_ℓ are constructed from the vector fields and do not change with the Brownian processes.
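The iterated Stratonovich integrals can be explored numerically. The following sketch (our own illustration, not taken from the paper) approximates J_A and J_AA on a sampled Brownian path; trapezoidal sums are consistent with the Stratonovich interpretation, and for the word AA the sum telescopes to B(T)^2/2 exactly.

```python
import random

random.seed(0)

# Sample a scalar Brownian path on [0, T] on a uniform grid.
n, T = 1000, 1.0
h = T / n
dB = [random.gauss(0.0, h ** 0.5) for _ in range(n)]
B = [0.0]
for inc in dB:
    B.append(B[-1] + inc)

# J_A(T; 0) is just the Brownian increment over [0, T].
J_A = B[-1] - B[0]

# Stratonovich integrals are limits of trapezoidal (midpoint-type) sums.
# For the word AA the sum telescopes: J_AA = B(T)^2 / 2 exactly.
J_AA = sum(0.5 * (B[k] + B[k + 1]) * dB[k] for k in range(n))

print(J_A, J_AA)
```

The telescoping identity holds for every path, illustrating that Stratonovich calculus obeys the chain rule of ordinary calculus.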
In the deterministic case, iterated integrals were introduced and investigated extensively by Kuo Tsai Chen [16] in the context of his work on topology.
The notation may be simplified by introducing the set W consisting of all words ℓ_n ℓ_{n−1} … ℓ_1 constructed with the letters of the alphabet A; W includes an empty word ∅ with n = 0 letters. Elements ℓ ∈ A are seen as words with a single letter and accordingly A becomes a subset of W. With each word w = ℓ_n … ℓ_1 with n ≥ 1 letters, we associate the n-th order (linear) differential operator D_w = D_{ℓ_n} ··· D_{ℓ_1}. For the empty word, we define D_∅ to be the identity operator Id with Id χ = χ for each observable and set J_∅ = 1. (Then (9) also holds for n = 1.) With this notation the series in (7) simply reads
$$ \sum_{w\in W} J_w(t_0+h;t_0)\, D_w\chi(x(t_0)). \qquad (10)$$
We note that for a deterministic letter a ∈ A_det, J_a(t_0+h; t_0) = h, while in the stochastic case J_A(t_0+h; t_0) = B_A(t_0+h) − B_A(t_0) is a Gaussian random variable with standard deviation h^{1/2}. For this reason, we attach to each deterministic letter a ∈ A_det the weight ∥a∥ = 1 and to each stochastic letter A ∈ A_sto the weight ∥A∥ = 1/2. We then define the weight ∥w∥ of each word by adding the weights of its letters. The weight of the empty word is 0. The following proposition, whose proof may be seen in [1], lists some properties of the iterated integrals. It shows in particular that, as h ↓ 0, J_w(t_0+h; t_0) may be conceived as having size O(h^{∥w∥}).

Proposition 2
The iterated Stratonovich integrals J_w(t_0+h; t_0) have the following properties:
• The joint distribution of any finite subfamily of the family of random variables {h^{−∥w∥} J_w(t_0+h; t_0)}_{w∈W} is independent of t_0 ≥ 0 and h > 0.
• For each w ∈ W and any finite p ≥ 1, the (t_0-independent) L^p norm of J_w(t_0+h; t_0) is a constant multiple of h^{∥w∥}.
• E J_w(t_0+h; t_0) = 0 whenever ∥w∥ is not an integer.
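The weights that appear throughout the proposition are easy to compute mechanically. In the little helper below (our own convention, mirroring the notation of the text) lower-case characters stand for deterministic letters of weight 1 and upper-case characters for stochastic letters of weight 1/2.

```python
# Weight of a word over the alphabet: deterministic (lower-case) letters
# contribute 1, stochastic (upper-case) letters contribute 1/2; the empty
# word has weight 0.
def weight(word):
    return sum(1.0 if ch.islower() else 0.5 for ch in word)

print(weight(""), weight("aA"), weight("AAb"))
```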
In view of the proposition we rewrite (10) as
$$ \sum_{\nu\in\mathbb{N}/2}\ \sum_{w\in W,\ \|w\|=\nu} J_w(t_0+h;t_0)\, D_w\chi(x(t_0)), \qquad (11)$$
where N/2 = {0, 1/2, 1, 3/2, ...}. (For each ν, the inner sum only contains a finite number of terms.) In this way, by discarding the terms with ν > ν_0 in (11), one obtains the Taylor approximation of order ν_0 for χ(x(t)). Of course the series in (11) in general does not converge to χ(x(t_0+h)); it is a formal series, whose truncations provide the required Taylor approximations. So far it has been assumed that χ is scalar-valued. For a vector-valued χ, the Taylor expansion is also given by (11), with the differential operators D_w defined to act componentwise. The particular choice where χ : R^d → R^d is taken to be the identity function x ↦ x yields the expansion of the solution x(t_0+h) itself,
$$ x(t_0+h) \sim \sum_{\nu\in\mathbb{N}/2}\ \sum_{w\in W,\ \|w\|=\nu} J_w(t_0+h;t_0)\, f_w(x(t_0)), \qquad (12)$$
where f_w(x(t_0)) denotes the result of applying D_w to the identity function and then evaluating at x(t_0). Note that the functions f_w may be constructed from the fields f_ℓ in (3) with the help of the recursion
$$ f_{\ell_n\ell_{n-1}\dots\ell_1}(x) = f'_{\ell_{n-1}\dots\ell_1}(x)\, f_{\ell_n}(x), \qquad n\ge 2, \qquad (13)$$
where f′_{ℓ_{n−1}…ℓ_1}(x) stands for the value at x of the Jacobian matrix of f_{ℓ_{n−1}…ℓ_1}.
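The recursion (13) is straightforward to implement. The sketch below (an illustration with toy vector fields of our own choosing, approximating the Jacobian-vector product by forward differences) computes word basis functions for words written as strings, with the last letter acting first, as in the text.

```python
# Toy fields on R^2 for an alphabet {a, A}; these are our own examples.
fields = {
    "a": lambda x: [-x[0], x[0] * x[1]],
    "A": lambda x: [x[1], 1.0],
}

def jac_vec(g, x, v, eps=1e-7):
    # Directional derivative g'(x) v by forward differences.
    xe = [xi + eps * vi for xi, vi in zip(x, v)]
    return [(ge - gx) / eps for ge, gx in zip(g(xe), g(x))]

def f_word(word, x):
    # Word basis function f_w(x) via the recursion
    # f_{l_n l_{n-1} ... l_1}(x) = f'_{l_{n-1}...l_1}(x) f_{l_n}(x).
    if len(word) == 1:
        return fields[word](x)
    head, tail = word[0], word[1:]
    return jac_vec(lambda y: f_word(tail, y), x, fields[head](x))

# f_{aA}(x) = f_A'(x) f_a(x) = (x^1 x^2, 0) for the toy fields above.
print(f_word("aA", [1.0, 2.0]))
```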

The Ito-Taylor expansion
The Taylor expansion of the solution of an Ito stochastic differential system was first derived by Platen and Wagner [39]. For (4), formula (6) has to be replaced by
$$ \chi(x(t_0+h)) = \chi(x(t_0)) + \sum_{\ell_1\in A}\int_{t_0}^{t_0+h} D_{\ell_1}\chi(x(s_1))\, dB_{\ell_1}(s_1) + \sum_{A\in A_{sto}}\int_{t_0}^{t_0+h} D_{\bar A}\chi(x(s_1))\, ds_1; \qquad (14)$$
the last term in the right-hand side is the Ito correction, where, for each stochastic letter A, D_{Ā} represents the second-order, linear differential operator
$$ (D_{\bar A}\chi)(x) = \frac{1}{2}\,\chi''(x)\big(f_A(x), f_A(x)\big) = \frac{1}{2}\sum_{j,k=1}^{d} f_A^{j}(x)\, f_A^{k}(x)\,\frac{\partial^2\chi}{\partial x^{j}\partial x^{k}}(x). \qquad (15)$$
The symbol χ″ represents the second (Fréchet) derivative of χ; its value χ″(x) at a point x ∈ R^d is a bilinear map defined on R^d × R^d. In order to write (14) more compactly, we introduce the extended alphabet Ā = Ā_det ∪ Ā_sto. The extended set Ā_sto of stochastic letters coincides with the old A_sto, i.e. with the set of indices in the second sum in (4); the extended set Ā_det comprises the indices a in the first sum in (4) and, in addition, a letter Ā for each A ∈ Ā_sto = A_sto. With these notations, (14) becomes
$$ \chi(x(t_0+h)) = \chi(x(t_0)) + \sum_{\ell_1\in\bar A}\int_{t_0}^{t_0+h} D_{\ell_1}\chi(x(s_1))\, dB_{\ell_1}(s_1)$$
(dB_{ℓ_1}(s_1) = ds_1 for ℓ_1 ∈ Ā_det); this is the Ito counterpart of the right-most expression in (6). By iterating as in the Stratonovich case, we obtain the series
$$ \sum_{n\ge 0}\ \sum_{\ell_n,\dots,\ell_1\in\bar A} I_{\ell_n\dots\ell_1}(t_0+h;t_0)\, D_{\ell_n}\cdots D_{\ell_1}\chi(x(t_0)), \qquad (16)$$
where I_{ℓ_n…ℓ_1}(t_0+h; t_0) denotes the Ito iterated stochastic integral
$$ I_{\ell_n\dots\ell_1}(t_0+h;t_0) = \int_{t_0}^{t_0+h} dB_{\ell_1}(s_1)\int_{t_0}^{s_1} dB_{\ell_2}(s_2)\cdots\int_{t_0}^{s_{n-1}} dB_{\ell_n}(s_n).$$
These iterated integrals satisfy the obvious analogue of the recursion (9). Again the iterated integrals do not change if the vector fields are changed and the operators D_ℓ do not change if the Brownian processes are changed. We now consider the set W of words constructed with the letters of the extended alphabet Ā, and write D_w = D_{ℓ_n} ··· D_{ℓ_1} for w = ℓ_n … ℓ_1 ∈ W, n > 0 (recall that D_ℓ is a second-order operator if ℓ is of the form Ā, A ∈ A_sto), D_∅ = Id, I_∅ = 1. Then (16) has the compact expression
$$ \sum_{w\in W} I_w(t_0+h;t_0)\, D_w\chi(x(t_0)). \qquad (17)$$
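Ito iterated integrals are limits of left-point sums, and the Ito correction is visible already at the discrete level. In the sketch below (our own illustration), the left-point sum for the word AA telescopes to (B(T)^2 − Σ ΔB_k^2)/2, where the subtracted term approximates the quadratic variation T rather than vanishing.

```python
import random

random.seed(1)

# Sample a scalar Brownian path on [0, T].
n, T = 2000, 1.0
h = T / n
dB = [random.gauss(0.0, h ** 0.5) for _ in range(n)]
B = [0.0]
for inc in dB:
    B.append(B[-1] + inc)

# Ito integrals use left-point sums.  For the word AA the discrete sum
# telescopes to (B(T)^2 - sum of squared increments) / 2; the second term
# approximates the quadratic variation T, which produces the Ito
# correction relative to the Stratonovich value B(T)^2 / 2.
I_AA = sum(B[k] * dB[k] for k in range(n))
quad_var = sum(d * d for d in dB)

print(I_AA, 0.5 * (B[-1] ** 2 - quad_var), quad_var)
```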
If letters in Ā_det are again declared to have weight 1 and letters in Ā_sto to have weight 1/2, we have the following result, whose proof is similar to that of Proposition 2:

Proposition 3
The iterated Ito integrals I_w(t_0+h; t_0), w ∈ W, possess the properties of the Stratonovich iterated integrals listed in Proposition 2.
The series (17) is rewritten as
$$ \sum_{\nu\in\mathbb{N}/2}\ \sum_{w\in W,\ \|w\|=\nu} I_w(t_0+h;t_0)\, D_w\chi(x(t_0)), \qquad (18)$$
and for the solution itself we have the Taylor series
$$ x(t_0+h) \sim \sum_{\nu\in\mathbb{N}/2}\ \sum_{w\in W,\ \|w\|=\nu} I_w(t_0+h;t_0)\, f_w(x(t_0)), \qquad (19)$$
where f_w(x(t_0)), w = ℓ_n … ℓ_1, denotes the result of successively applying D_{ℓ_1}, …, D_{ℓ_n} to the identity function and then evaluating at x(t_0). The f_w satisfy (13) if ℓ_n ∈ A_det ∪ A_sto and
$$ f_{\bar A\,\ell_{n-1}\dots\ell_1}(x) = \frac{1}{2}\, f''_{\ell_{n-1}\dots\ell_1}(x)\big(f_A(x), f_A(x)\big)$$
for ℓ_n = Ā with A ∈ A_sto. Since the second derivatives of the identity function vanish, we have the following result.
Proposition 4 If the last (i.e. right-most) letter of w ∈ W is of the form Ā with A ∈ A_sto, then f_w vanishes identically.
Therefore, after suppressing the f_w that vanish identically, the Taylor expansion may be written
$$ x(t_0+h) \sim \sum_{\nu\in\mathbb{N}/2}\ \sum_{w\in W_0,\ \|w\|=\nu} I_w(t_0+h;t_0)\, f_w(x(t_0)), \qquad (20)$$
where W_0 is the subset of W consisting of words whose last letter is not one of the Ā.

Analyzing splitting integrators: preliminaries

Splitting integrators
In order to avoid notational complications, let us momentarily consider only the simple instance of (3) given by
$$ dx = f_a(x)\,dt + f_A(x)\circ dB_A(t). \qquad (21)$$
Splitting integrators may be applied to integrate this system if one may solve in closed form the split systems
$$ dx = f_a(x)\,dt \qquad (22)$$
and
$$ dx = f_A(x)\circ dB_A(t). \qquad (23)$$
In the simplest splitting integrator, the Lie-Trotter algorithm, the numerical solution is advanced from its value x_n at a time level t_n to the value x_{n+1} at the next time level t_{n+1} by first integrating (22) in the interval [t_n, t_{n+1}] with initial condition x_n to get a value x̄_n and then using x̄_n as initial condition to integrate (23) in the interval [t_n, t_{n+1}] to obtain x_{n+1}. In this way, the simultaneous contributions of f_a and f_A in (21) are replaced by successive contributions. The procedure is best described by introducing, for t, s, the solution operators ϕ^{(a)}_{t,s} and ϕ^{(A)}_{t,s} of (22) and (23); by definition, ϕ^{(a)}_{t,s} (respectively ϕ^{(A)}_{t,s}) maps x ∈ R^d into the value at time t of the solution of (22) (respectively (23)) with initial value x at time s. Note that, for the autonomous deterministic system (22), ϕ^{(a)}_{t,s} depends on t and s only through the combination (elapsed time) t − s, but that is not the case for ϕ^{(A)}_{t,s}. In addition ϕ^{(a)}_{t,s} makes sense for t < s, but ϕ^{(A)}_{t,s} does not, because stochastic differential equations cannot be evolved backward in time. With this notation in place, one step of the Lie-Trotter algorithm described above is given by
$$ x_{n+1} = \phi^{(A)}_{t_{n+1},t_n}\big(\phi^{(a)}_{t_{n+1},t_n}(x_n)\big). \qquad (24)$$
Of course one may also consider the alternative algorithm given by ϕ^{(a)}_{t_{n+1},t_n} ∘ ϕ^{(A)}_{t_{n+1},t_n}. More involved splitting algorithms are obtained by composing four or more solution maps of the split systems.
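As a concrete sketch (with a toy linear system of our own choosing, not taken from the paper), one Lie-Trotter step for dx = −x dt + σ dB_A composes the exact deterministic flow x ↦ x e^{−h} with the exact stochastic flow x ↦ x + σ ΔB, in the order prescribed by (24):

```python
import math
import random

random.seed(2)

sigma = 0.5  # noise intensity in the toy system dx = -x dt + sigma dB

def det_flow(x, h):
    # Exact flow of the split system dx = -x dt.
    return x * math.exp(-h)

def sto_flow(x, dB):
    # Exact flow of the split system dx = sigma dB.
    return x + sigma * dB

def lie_trotter_step(x, h):
    # First the deterministic flow, then the stochastic one.
    dB = random.gauss(0.0, math.sqrt(h))
    return sto_flow(det_flow(x, h), dB)

# One trajectory on [0, 1] with 100 steps.
x, h = 1.0, 0.01
for _ in range(100):
    x = lie_trotter_step(x, h)
print(x)
```

Reversing the two flows gives the alternative Lie-Trotter scheme mentioned above.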
Leaving now the particular instance (21), for a problem of the general form (3) the splitting-integrator mapping x_{n+1} = ψ_{t_{n+1},t_n}(x_n) is a composition of solution operators ϕ^{(i)}_{t_n+d_i h, t_n+c_i h}, i = 1, …, m. Here c_i and d_i are constants and the superindex i refers to a system of differential equations obtained by taking into account a subset S_i, i = 1, …, m, of the fields f_ℓ in (3); it has to be supposed that these systems are solvable in closed form. For our purposes here, there is complete freedom when choosing the different S_i; it is possible to have S_i = S_j for i ≠ j (as in Strang's method, where S_1 = S_3) or to let a given vector field f_ℓ appear in S_i and S_j with S_i ≠ S_j. It is important to note that it is necessary to assume throughout that c_i ≤ d_i except in the case where S_i is a deterministic system; stochastic differential equations cannot be evolved backward in time.
The Ito case can be dealt with in the same way; the only difference is that the solution operators of the systems S i have to be based on the Ito interpretation.

The local error
An essential part of the analysis of any one-step integrator x_{n+1} = ψ_{t_{n+1},t_n}(x_n) is the study of the corresponding local error (or truncation error). By definition, if ϕ_{t,s} denotes the solution operator of the system (3) or (4) being integrated, the local error is the difference
$$ \psi_{t_{n+1},t_n}(x_n) - \phi_{t_{n+1},t_n}(x_n). \qquad (26)$$
In what follows we just consider the case ψ_{t_1,t_0}(x_0) − ϕ_{t_1,t_0}(x_0); the case with general n differs from this only in notation. Furthermore we write t_1 = t_0 + h, where h > 0 is the step-length. Our aim is to understand the behaviour of (26) as h ↓ 0 and this is achieved by Taylor expansion. In the particular case of the Lie-Trotter integrator (24) for the simple system (21), we have therefore to Taylor expand
$$ \phi^{(A)}_{t_1,t_0}\big(\phi^{(a)}_{t_1,t_0}(x_0)\big) \qquad (27)$$
and compare the result with the expansion of ϕ_{t_1,t_0}(x_0) found in the preceding section. Note that if we write x̄_0 = ϕ^{(a)}_{t_1,t_0}(x_0), the expansions
$$ \phi^{(a)}_{t_1,t_0}(x_0) \sim \sum_i c_i F_i(x_0) \qquad (28)$$
at x_0 and
$$ \phi^{(A)}_{t_1,t_0}(\bar x_0) \sim \sum_j d_j G_j(\bar x_0) \qquad (29)$$
at x̄_0 are both known; they are particular instances of (12) corresponding to alphabets with the single letter a or A respectively. The expansion for (27) may then be obtained by substituting (28) into (29), Taylor expanding each G_j(Σ_i c_i F_i(x_0)) around x_0 and gathering terms of equal weight. For more complicated splitting integrators there are m mappings being composed and implementing the naive substitution we have described may be a daunting task. We are thus led to the following: Problem P: Find efficiently the expansion of a composition of mappings ϕ^{(m)} ∘ ··· ∘ ϕ^{(1)}, when ϕ^{(i)}, i = 1, …, m, have known expansions of the form (12) (or (20) for the Ito case).
The solution to this problem presented in the next section is based on expanding pullback operators (see e.g. [37]) rather than mappings.

Pullbacks
Associated with any mapping ϕ : R^d → R^d there is a pullback operator Φ. By definition, Φ maps each observable χ into the observable Φχ whose value at x ∈ R^d is (Φχ)(x) = χ(y) with y = ϕ(x) (ϕ pushes the point x forward to y, while Φ takes values of the observable "back" from y to x). The pullback operator corresponding to a composition ϕ^{(2)} ∘ ϕ^{(1)} is the product Φ^{(1)}Φ^{(2)} of the corresponding pullback operators; note the reversal of the order. A map and its pullback operator contain the same information: when the operator Φ is known, one may retrieve the underlying map ϕ by applying Φ to the identity x ↦ x in R^d. Recovering ϕ from Φ is similar to what was done, for formal series rather than for maps, to obtain (12) from (11) (or (20) from (18) in the Ito case). Taking this point further, from (11) we may consider that the series
$$ \sum_{w\in W} J_w(t_0+h;t_0)\, D_w \qquad (30)$$
provides the Taylor expansion of the pullback operator of the solution operator ϕ_{t_0+h,t_0} of (3). For the Ito case (30) is replaced by
$$ \sum_{w\in W} I_w(t_0+h;t_0)\, D_w. \qquad (31)$$
In this way the problem posed above may be reformulated as: Problem P′: Find efficiently the expansion of a product of pullback operators Φ^{(1)}Φ^{(2)} ··· Φ^{(m)} when the factors have known expansions of the form (30) (or (31) for the Ito case). The idea of using pullback (differential) operators to analyze local errors is old. Merson [29] used it in 1957 to study Runge-Kutta formulas; however the subsequent treatment in Butcher [8] did away with differential operators and worked only with elementary differentials (mappings). In the stochastic case it is convenient to work both with differential operators and mappings, as will become clear below.
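The order reversal under composition is easy to check on an example. In the toy sketch below (entirely our own illustration), the pullback of ϕ^{(2)} ∘ ϕ^{(1)} agrees with Φ^{(1)} applied after Φ^{(2)}:

```python
# Pullback of a map phi: (Phi chi)(x) = chi(phi(x)).
def pullback(phi):
    return lambda chi: (lambda x: chi(phi(x)))

phi1 = lambda x: x + 1.0       # first map applied
phi2 = lambda x: x * x         # second map applied
chi = lambda x: 3.0 * x        # an observable

# Pullback of the composition phi2 o phi1 ...
lhs = pullback(lambda x: phi2(phi1(x)))(chi)
# ... equals Phi1 applied to (Phi2 chi): the order is reversed.
rhs = pullback(phi1)(pullback(phi2)(chi))

print(lhs(2.0), rhs(2.0))
```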
In addition, we need to consider the larger noncommutative algebra R⟨A⟩ of formal series. These are formal expressions Σ_{w∈W} c_w w where it is no longer assumed that only finitely many coefficients c_w are nonzero. If S ∈ R⟨A⟩, we denote the corresponding coefficients by S_w, i.e. S = Σ_{w∈W} S_w w. Formal series are combined linearly in the obvious way and are multiplied as in (32), where we note that the right-hand side is well defined, even if infinitely many c_v and d_w do not vanish, because the number of ways in which a given u ∈ W may be written as a concatenation u = vw is finite. More precisely, if we denote by R^W the set of all sequences of coefficients {c_w}_{w∈W} indexed by words, then the product in (32) is the series Σ_{u∈W} e_u u ∈ R⟨A⟩ with coefficients {e_u}_{u∈W} such that e_∅ = c_∅ d_∅ and, for each nonempty word u = ℓ_1 … ℓ_n,
$$ e_u = \sum_{k=0}^{n} c_{\ell_1\dots\ell_k}\, d_{\ell_{k+1}\dots\ell_n}. \qquad (33)$$
The right-hand side of this formula contains all the ways of writing u = ℓ_1 … ℓ_n as a concatenation of two (possibly empty) words. Thus (33) defines a (noncommutative, associative) product in the set R^W of sequences of coefficients, the so-called convolution product, in such a way that the product of series in R⟨A⟩ corresponds to the convolution product of the sequences of coefficients {S_w}_{w∈W} ∈ R^W. A general well-known reference for the combinatorics of words is [41].
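The convolution product can be sketched in a few lines; in the snippet below (our own illustration) words are strings, coefficient sequences are dictionaries, and the series are truncated at a maximum word length:

```python
# Convolution product of coefficient sequences indexed by words (strings):
# e_u = sum over all splittings u = v w of c_v * d_w, truncated by length.
def convolve(c, d, max_len=4):
    words = {v + w for v in c for w in d if len(v) + len(w) <= max_len}
    return {u: sum(c.get(u[:k], 0.0) * d.get(u[k:], 0.0)
                   for k in range(len(u) + 1))
            for u in words}

c = {"": 1.0, "a": 2.0}
d = {"": 1.0, "a": 3.0, "A": 1.0}
e = convolve(c, d)
print(sorted(e.items()))
```

The empty word acts as the identity: convolving with {"": 1.0} leaves a sequence unchanged.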

Series of differential operators
Given the vector fields f_ℓ in (3), the concatenation product of words obviously corresponds to the composition of the associated differential operators: D_u D_v = D_{uv} for u, v ∈ W. With the series S ∈ R⟨A⟩ we associate the formal series of differential operators D_S = Σ_{w∈W} S_w D_w. It follows that, for S, S′ ∈ R⟨A⟩, the product (composition) D_S D_{S′} is the series D_{SS′} associated with SS′, whose coefficients, as we know, are given by the convolution product of the coefficients of S and S′. Series of differential operators are a common tool in control theory, see e.g. [19].
With the terminology we have introduced, for each fixed t_0 and h and for each event in the underlying probability space, the expansion in (30) coincides with D_S when the coefficients are S_w = J_w(t_0+h; t_0), w ∈ W. Since we have just described how to multiply series D_S, we have solved the problem P′ posed in the previous section.
We illustrate the technique by means of a simple example. We integrate the system with the help of the split systems. We use the Lie-Trotter formula ϕ^{(2)} ∘ ϕ^{(1)}. According to (30) (when the alphabet is chosen to be {a, A}), the expansion of Φ^{(1)} is where O(2) denotes the terms in the series with weight ≥ 2, and J_A, J_a, … stand for J_A(t_0+h; t_0), J_a(t_0+h; t_0), … Multiplying out, we obtain the expansion for the product Φ^{(1)}Φ^{(2)}: For the solution of the system being integrated, (30) (when the alphabet is {a, b, A}) yields and subtracting we find that the pullback operator associated with the local error has the expansion:

Word series
Given the vector fields f_ℓ in (3), with each series S ∈ R⟨A⟩ we associate the corresponding word series W_S(x_0); this is obtained by applying D_S to the identity map:
$$ W_S(x_0) = \sum_{w\in W} S_w\, f_w(x_0).$$
The functions f_w : R^d → R^d, w ∈ W, which we already encountered in (12), are called word basis functions. Recall that they may be found recursively via (13) from the f_ℓ that appear in the system (3). Word series, introduced and studied in [31,13,14,1,35,36,37], may be seen as equivalent to series of differential operators; the theory of word series is patterned after the theory of B-series [23] familiar to many numerical analysts. With the terminology above, for fixed t_0 and h and each event in the underlying probability space, the expansion (12) is simply the word series with coefficients given by the iterated integrals J_w. In what follows we shall denote by J the series J = Σ_w J_w(t_0+h; t_0) w ∈ R⟨A⟩, so that D_J and W_J are the corresponding series of operators and word series. Formal series of words whose coefficients are iterated integrals are often called Chen series; they play a role in several mathematical developments, including the theory of rough paths (see e.g. [2]).
In the example we are discussing, from (34) we obtain that the local error has the expansion with

Series for Ito problems
The preceding material is easily adapted to the Ito system (4). The required changes are few. One considers formal series S ∈ R⟨Ā⟩ (words are now based on the extended alphabet) and to each S = Σ_{w∈W} S_w w associates a series of differential operators D_S = Σ_{w∈W} S_w D_w. The expansion (31) of the pullback of the solution operator is D_S when the coefficients of the series are chosen to be the Ito iterated integrals. We write this series as D_I and set I = Σ_{w∈W} I_w(t_0+h; t_0) w ∈ R⟨Ā⟩ for the corresponding Chen series.
Here is an example. For the Ito system corresponding to the alphabet {a, b, A}, split as (1) {a, A}, (2) {b}, the expansion of Φ^{(1)} and, multiplying out, the expansion of Φ^{(1)}Φ^{(2)} is found to be For the solution of the system being integrated we have and, for the pullback of the truncation error, while for the truncation error itself we have the word series expansion: with

The expansion of the local error. Error equations
In this section we present the Taylor expansion of the local error along with the conditions that have to be imposed to achieve a target strong or weak order of consistency.

Expanding the local error
By applying the technique in the previous section, the Taylor expansion of the mapping ψ_{t_0+h,t_0} that describes a splitting integrator for the Stratonovich system (3) is found as a word series
$$ \psi_{t_0+h,t_0}(x_0) \sim \sum_{w\in W} \bar J_w(t_0+h;t_0)\, f_w(x_0).$$
Here J̄_w(t_0+h; t_0) is either zero or a product of Stratonovich iterated integrals corresponding to words whose concatenation is w (see (35) for an example). Thus, in each product, the iterated integrals being multiplied correspond to words whose weights add up to ∥w∥. In particular J̄_∅(t_0+h; t_0) = 1. For the corresponding pullback we have the expansion Σ_{w∈W} J̄_w(t_0+h; t_0) D_w (cf. (34)). Similarly, in the Ito case, ψ_{t_0+h,t_0} has a word series expansion (see (37)) and for the associated pullback the expansion is Σ_{w∈W} Ī_w(t_0+h; t_0) D_w (cf. (36)). The proof of the following technical result may be found in [1] for the Stratonovich case; the Ito case is proved similarly.

Proposition 5
The coefficients J̄_w(t_0+h; t_0), w ∈ W, possess the properties of the exact coefficients J_w(t_0+h; t_0) listed in Proposition 2. The coefficients Ī_w(t_0+h; t_0), w ∈ W, possess the properties of the exact coefficients I_w(t_0+h; t_0) listed in Proposition 3.
By subtracting the expansions of the integrator and the true solution, we immediately obtain the next result. Theorem 6 For a splitting integrator for the Stratonovich system (3), the local error ψ_{t_0+h,t_0}(x_0) − ϕ_{t_0+h,t_0}(x_0) has a word series expansion
$$ \psi_{t_0+h,t_0}(x_0) - \phi_{t_0+h,t_0}(x_0) \sim \sum_{\nu\in\mathbb{N}/2}\ \sum_{w\in W,\ \|w\|=\nu} \delta_w(t_0;h)\, f_w(x_0) \qquad (38)$$
with coefficients δ_w(t_0; h) = J̄_w(t_0+h; t_0) − J_w(t_0+h; t_0). For each nonempty w ∈ W and any L^p norm, 1 ≤ p < ∞, uniformly in t_0, ∥δ_w(t_0; h)∥_p = O(h^{∥w∥}). In addition, for each observable χ, conditional on x_0, the error in expectation possesses the expansion
$$ E\,\chi(\psi_{t_0+h,t_0}(x_0)) - E\,\chi(\phi_{t_0+h,t_0}(x_0)) \sim \sum_{\nu\in\mathbb{N}}\ \sum_{w\in W,\ \|w\|=\nu} E\,\delta_w(t_0;h)\, D_w\chi(x_0). \qquad (39)$$
The bound for ∥δ_w(t_0; h)∥_p follows from the second item in Proposition 2 and the corresponding result for J̄_w(t_0+h; t_0) in Proposition 5. Note that the half-integer values of ν drop from (39) in view of the last item in Proposition 2 and the corresponding result for J̄_w(t_0+h; t_0). Of course, to obtain good strong approximations, integrators with small error coefficients δ_w(t_0; h) are to be preferred, all other things being equal, to integrators with large error coefficients. A similar comment applies to weak approximations. The reference [1] presents a comparison between two splitting integrators of the Langevin dynamics introduced by Leimkuhler and Matthews [26,27]. While both schemes are closely related, it is found in [26,27] that, in practice, one clearly outperforms the other; this is explained in [1] by analysing the corresponding error coefficients.
In the Ito case we have the following result: Theorem 7 For a splitting integrator for the Ito system of differential equations (4), the coefficients η_w(t_0; h) = Ī_w(t_0+h; t_0) − I_w(t_0+h; t_0) satisfy, for each nonempty w ∈ W and any L^p norm, 1 ≤ p < ∞, uniformly in t_0, ∥η_w(t_0; h)∥_p = O(h^{∥w∥}). The local error ψ_{t_0+h,t_0}(x_0) − ϕ_{t_0+h,t_0}(x_0) has the word series expansion
$$ \psi_{t_0+h,t_0}(x_0) - \phi_{t_0+h,t_0}(x_0) \sim \sum_{\nu\in\mathbb{N}/2}\ \sum_{w\in W_0,\ \|w\|=\nu} \eta_w(t_0;h)\, f_w(x_0). \qquad (40)$$
In addition, for each observable χ, conditional on x_0, the error in expectation possesses the expansion
$$ E\,\chi(\psi_{t_0+h,t_0}(x_0)) - E\,\chi(\phi_{t_0+h,t_0}(x_0)) \sim \sum_{\nu\in\mathbb{N}}\ \sum_{w\in W,\ \|w\|=\nu} E\,\eta_w(t_0;h)\, D_w\chi(x_0). \qquad (41)$$

Stratonovich order conditions
If µ ∈ N/2, µ > 0, we shall say that the integrator has strong order ≥ µ if the series (38) only comprises terms of weight ≥ µ + 1/2, i.e. of size O(h µ+1/2 ). From Theorem 6 it is clear that, for µ ∈ N/2, µ > 0, the strong order conditions are sufficient to guarantee strong order ≥ µ. Under suitable assumptions on (3), it may be proved that, when the order conditions hold, the local error actually possesses an O(h µ+1/2 ) bound in the L p norms, p < ∞. Here our interest lies in the combinatorial aspects of the theory and we will not be concerned with the derivation of such bounds; the interested reader is referred to [1]. Are the strong order conditions (42) necessary as well as sufficient to achieve strong order ≥ µ? This question may be discussed in two different scenarios:

• Specific system. In this case we are only interested in (3) for a fixed, specific choice of the dimension d and of the vector fields f ℓ in R d .
• General system. Here A and the coefficients of the integrator are fixed and one demands that the series (38) only comprises terms of weight ≥ µ + 1/2 for each choice of d and each choice of vector fields f ℓ , ℓ ∈ A, in (3).
While the general system scenario is not without mathematical interest, in practice it is the specific system case that matters. This point, which would be true for any numerical integrator, is especially relevant for splitting algorithms: one of the main advantages of the splitting idea is the ease with which it may be tailored to the specific problem at hand. In the specific system scenario it is possible that for some words w the word basis functions f w vanish at each x 0 . If that is the case, it is not necessary to impose the order conditions δ w = 0 associated with such words. This is illustrated in [1] in the case of the Langevin dynamics, whose structure implies that many f w vanish.
In the general system scenario the conditions (42) are necessary for strong order ≥ µ, in view of the second item of the lemma below, which shows that the word basis functions are independent.
Lemma 8 Fix the alphabet A and choose w ∈ W, w ≠ ∅. There exist a value of the dimension d, vector fields f ℓ , ℓ ∈ A, in R d , and a scalar observable χ, all depending on A and w, such that: • D w χ(0) = 1 and D u χ(0) = 0 for each nonempty u ∈ W, u ≠ w.
• The first component f 1 u (0) of the vector f u (0) ∈ R d vanishes for each nonempty u ∈ W, u ≠ w, while f 1 w (0) = 1.
Proof: The first item follows from the second by choosing χ to be the first coordinate mapping x → x 1 . For the second item, the idea of the proof is best understood by means of an example. Suppose that w = ℓℓmℓm with ℓ ≠ m. Then set d = 5, (recall that superscripts denote components) and f k (x) = 0 for any remaining letters k. Thus Because second and higher derivatives of the fields vanish, the recurrence (13) shows that for any word u = k n . . . k 1 .
The Jacobian matrix f ′ k1 must have a nonzero element in its first row and this implies that k 1 = m. Then, by definition of f m , the first row is [0, 1, 0, . . . , 0], so that the second component of the vector it is applied to must be nonzero. This implies that k 2 = ℓ. By repeating this argument, we conclude that u = w and f 1 w (0) = 1. For a general word w, things are as follows. The dimension d is taken equal to the number of letters in w. The field is split in such a way that its (d − j + 1)-th component, j = 1, . . . , d, is assigned to the field f k if k is the letter that occupies the j-th position in w. (In this way there are as many nonzero vector fields f k as distinct letters in w.)

For σ ∈ N, σ > 0, the weak order conditions (43), which demand that the expectations E J w (t 0 + h; t 0 ) of the exact and of the integrator coefficients coincide for each w ∈ W of weight 1, 2, . . . , σ, are sufficient to ensure that the series in (39) only comprises terms of weight ≥ σ + 1, or, as we shall say, that the integrator has weak order ≥ σ. In the general system scenario the weak order conditions are also necessary, in view of the first item in the preceding lemma.

Ito order conditions
For the Ito case the strong and weak order conditions are (44) and (45), respectively. They guarantee that the series in (40) (respectively (41)) only consists of terms of weight ≥ µ (respectively ≥ σ), or, as we shall say, that the integrator has strong order ≥ µ (respectively weak order ≥ σ).
It is possible to show (but the very long proof will not be reproduced here) that, in the general system scenario and if µ = 1/2, 1, 3/2, the conditions (44) are necessary to achieve strong order ≥ µ. Similarly it may be proved that the conditions (45) are necessary to have weak order ≥ σ for general systems if σ = 1, 2, 3. These particular values of µ and σ are sufficient to establish the order barrier in Theorem 27 below. We believe the strong and weak order conditions are necessary for arbitrary µ or σ, but a proof is not yet available.

Extensions
In (3) or (4) it is assumed that all vector fields f a and f A are equally important. In several applications this may not be the case. For instance, consider the system or its Ito counterpart, where the split systems {a, A}, {b} may be solved in closed form.
Thus we are dealing with a small perturbation of an integrable system and it makes sense, when expanding the local error, to track not only powers of h but also powers of ε, as is done e.g. in [5] in the deterministic scenario. That task is easily accomplished with the tools presented so far. Details will not be given.

The shuffle and quasishuffle products
The conditions in (42) are not independent; for instance, the order condition corresponding to the two-letter word ℓℓ is fulfilled whenever the order condition for ℓ is fulfilled. (Note that the dependence between order conditions and the necessity of the order conditions discussed above are completely different issues.) Similarly, there are dependencies within each of the sets of conditions (43), (44) and (45). The study of this issue requires the help of the shuffle and quasishuffle products. More generally, these products play a key role in many developments involving elements of R A or R Ā [41]. In the deterministic case, the shuffle relations between iterated integrals were first noted by Ree [40]. The stochastic scenario was addressed by Gaines [20]. On the other hand, there is much literature relating the shuffle and quasishuffle products to stochastic integration, see e.g. [18]. We begin with the Stratonovich/shuffle case. The more complicated Ito/quasishuffle case is presented later.

The shuffle product
To motivate the introduction of the shuffle product, we begin by noting that if ϕ : R d → R d is any mapping and Φ the associated pullback operator, then, for any pair of scalar-valued observables χ 1 , χ 2 , where · denotes the standard (pointwise) product of observables, i.e. (χ 1 · χ 2 )(x) = χ 1 (x)χ 2 (x) for x ∈ R d . In other words, Φ is multiplicative. The series of differential operators that expand the pullback operators associated with ϕ t0+h;t0 and ψ t0+h;t0 are similarly multiplicative. Now, it is easily checked that if S ∈ R A , then, in general, D S (χ 1 · χ 2 ) ≠ (D S χ 1 ) · (D S χ 2 ) (for instance, if S = ℓ ∈ A, then D S (χ 1 ·χ 2 ) = (D S χ 1 )·χ 2 +χ 1 ·(D S χ 2 )). Therefore the coefficients J w of the exact solution and their counterparts for the integrator must have some special property that tells them apart from "general" coefficients; as we shall see, that property explains the dependence between the order conditions. In order to identify when a series D S is multiplicative, we first investigate the action of a differential operator D w , w ∈ W, on a product χ 1 · χ 2 . For instance, for k, ℓ, m ∈ A, a trivial computation leads to The right-hand side contains eight pairs of words (kℓm, ∅), (kℓ, m), . . . What do these pairs have in common? They are precisely the pairs that, when shuffled, give rise to the word kℓm in the left-hand side. By definition, the shuffle product u ¡ v of two words with m and n letters is the sum of the (m + n)!/(m!n!) words that may be formed by interleaving the letters of u with those of v while keeping the letters in the same order as they appear in u and v. For instance kℓ ¡ m = kℓm + kmℓ + mkℓ, ℓ ¡ ℓ = ℓℓ + ℓℓ = 2ℓℓ, etc. More formally, the shuffle product of words may be defined recursively by the relations [41, Section 1.4] The last equality corresponds to the fact that the words arising from shuffling uℓ and vm necessarily end with either the last letter of uℓ or the last letter of vm.
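The recursive definition just given is easy to implement. A minimal Python sketch follows (the representation of words as strings and the function name are our own, not part of any standard package):

```python
def shuffle(u, v):
    """Shuffle product of two words (strings of one-character letters).
    Returns a dict mapping each word of u ¡ v to its integer coefficient,
    using the recursion: a word of the shuffle ends either with the last
    letter of u or with the last letter of v."""
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    for w, c in shuffle(u[:-1], v).items():   # words ending with u's last letter
        out[w + u[-1]] = out.get(w + u[-1], 0) + c
    for w, c in shuffle(u, v[:-1]).items():   # words ending with v's last letter
        out[w + v[-1]] = out.get(w + v[-1], 0) + c
    return out

# the examples from the text
assert shuffle("kl", "m") == {"klm": 1, "kml": 1, "mkl": 1}
assert shuffle("l", "l") == {"ll": 2}
```

Note that, counted with multiplicity, the shuffle of words of m and n letters always produces (m + n)!/(m!n!) words; e.g. `sum(shuffle("ab", "cd").values())` equals 6.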
Note that for words u, v ∈ W, the shuffle u ¡ v is in general not a word but an element of the space R A of linear combinations of words. By linearity, the shuffle product may be trivially extended to a commutative, associative product in R A ; for instance

At this stage we introduce some additional notation that will be used frequently below. If S ∈ R A is a series and p = Σ w p w w ∈ R A , we set (S, p) = Σ w p w S w ; the sum is well defined because only a finite number of the coefficients p w are ≠ 0. In the case where p coincides with a word w, (S, w) is just the coefficient S w ; for general p, (S, p) is a linear combination of coefficients S w . Obviously (·, ·) is a real-valued bilinear map. With this notation, we may present the following result (that generalizes the formula for D kℓm (χ 1 · χ 2 ) above).

Proposition 9 For any S ∈ R A and any pair of observables
Proof: It is sufficient to prove the case where S coincides with a word. The proof is by induction on the length (number of letters) of the word (not to be confused with its weight). When S is the empty word the result is trivial. We assume that the result is true for the word w and prove it for the longer word wℓ. Since D ℓ is a first-order differential operator we may write so that, by the induction hypothesis, The proof concludes by observing that, when (wℓ, u ′ ¡ v ′ ) is ≠ 0, i.e. when wℓ is one of the words arising when shuffling u ′ and v ′ , the last letter in wℓ must be either the last letter of u ′ or the last letter of v ′ , so that either u ′ = uℓ or v ′ = vℓ.
Since, clearly, we may write (48) This leads trivially to the next result:

Proposition 10 Consider a series S ∈ R A , S ≠ 0. The series of operators D S is multiplicative if S ∅ = 1 and, for each pair of words u, v ∈ W, the so-called shuffle relation (S, u ¡ v) = S u S v holds.
Thus the shuffle relations are equations that link the different coefficients S w , w ∈ W. For instance, from the shuffle ℓ ¡ ℓ = 2ℓℓ, ℓ ∈ A, we have the shuffle relation S ℓ 2 = 2S ℓℓ and, from the shuffle k ¡ ℓ = kℓ + ℓk, S k S ℓ = S kℓ + S ℓk .
Proposition 10 in tandem with the following result gives a new proof of the multiplicativity of D J that we pointed out above.

Proposition 11
The Stratonovich iterated integrals J w (t 0 + h; t 0 ) satisfy the shuffle relations.
Proof: For the shuffling of two letters ℓ, m ∈ A, the integration by parts formula is a statement of the shuffle relation J ℓ J m = J mℓ + J ℓm (recall that if ℓ or m are not stochastic, then B ℓ (t) = t or B m (t) = t respectively). General shuffles are dealt with by induction based on the recursive definition of the shuffle product in (46) and the recursion (9) for the iterated integrals.
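As an illustration, the shuffle relations for the exact coefficients can be verified mechanically in the purely deterministic subcase, where J w (t 0 + h; t 0 ) = h n /n! for a word of n (deterministic) letters. The following Python sketch (helper names are ours) checks (J, u ¡ v) = J u J v for a few pairs of words:

```python
from fractions import Fraction
from math import factorial

def shuffle(u, v):
    # same recursion as before: the last letter comes from u or from v
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    for w, c in shuffle(u[:-1], v).items():
        out[w + u[-1]] = out.get(w + u[-1], 0) + c
    for w, c in shuffle(u, v[:-1]).items():
        out[w + v[-1]] = out.get(w + v[-1], 0) + c
    return out

def J(w, h):
    # purely deterministic letters: J_w(t0 + h; t0) = h^n / n!
    return Fraction(h) ** len(w) / factorial(len(w))

def pairing(p, h):
    # (J, p), with p given as a dict word -> integer coefficient
    return sum(c * J(w, h) for w, c in p.items())

h = Fraction(1, 3)
for u, v in [("a", "b"), ("ab", "a"), ("ab", "cd")]:
    assert pairing(shuffle(u, v), h) == J(u, h) * J(v, h)  # shuffle relation
```

For these coefficients the relation reduces to the binomial identity (h m /m!)(h n /n!) = C(m + n, m) h m+n /(m + n)!.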
To present a similar result for the integrator we need a lemma.

Lemma 12 If S, T ∈ R A both satisfy the shuffle relations, then so does their product ST.

Proof: Recall that the coefficients of ST are given by the convolution product as in (33), which is based on deconcatenation. The result is a consequence of the following observation: the deconcatenations of the words in a shuffle u ¡ v may be found by shuffling the deconcatenations of u and v. An example of this observation follows.
By using the observation, (ST, u ¡ v) may be written as a sum of products or, since S and T satisfy the shuffle relations,

Proposition 13 For a splitting integrator for the Stratonovich system (3), the coefficients J w (t 0 + h; t 0 ) of the integrator satisfy the shuffle relations.
Proof: The proof is a trivial consequence of the lemma, because the operator series of the integrator is a composition of solution operators D J i associated with the split systems and therefore, by the preceding proposition, a composition of operators whose coefficients satisfy the shuffle relations.
After the last two propositions, it is easy to see that the Stratonovich strong order conditions are not independent. For instance, from the shuffle relation J ℓ (t 0 +h; t 0 ) 2 = 2J ℓℓ (t 0 +h; t 0 ), satisfied by the exact and the integrator coefficients alike, we conclude that the strong order condition corresponding to the word ℓℓ is fulfilled if the strong order condition corresponding to the word ℓ holds. Analogously, if k ≠ ℓ the order condition for kℓ is implied by those for ℓk, k and ℓ, etc. It is possible to obtain independent order conditions by keeping only the conditions corresponding to the so-called Lyndon words [20] that we describe next. We order the alphabet A and then order words lexicographically; a Lyndon word is a word that is strictly smaller than all the words obtained by rotating its letters. If the alphabet is A = {a, A} and a < A, then aA is a Lyndon word because it precedes the rotated Aa. Similarly aaA is a Lyndon word, while aAa and Aaa are not. For this simple alphabet, the Lyndon words with three or fewer letters are a, A, aA, aaA, aAA; their order conditions are independent and imply, via the shuffle relations, the order conditions for aa, AA, aaa, aAa, Aaa, AaA, AAa and AAA.
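Lyndon words are easily enumerated by testing rotations directly, as in the Python sketch below (helper names are ours; the explicit ORDER table encodes the convention a < A of the text, which differs from the ASCII ordering of the two characters):

```python
from itertools import product

ORDER = {"a": 0, "A": 1}  # the text's ordering a < A

def key(w):
    return [ORDER[c] for c in w]

def is_lyndon(w):
    # a nonempty word is Lyndon if it is strictly smaller than
    # every proper rotation of itself
    return all(key(w) < key(w[i:] + w[:i]) for i in range(1, len(w)))

def lyndon_words(alphabet, max_len):
    return ["".join(t)
            for n in range(1, max_len + 1)
            for t in product(alphabet, repeat=n)
            if is_lyndon("".join(t))]

# the list given in the text for the alphabet {a, A}
assert lyndon_words(["a", "A"], 3) == ["a", "A", "aA", "aaA", "aAA"]
```

(For long lists, Duval's algorithm generates Lyndon words without testing every word, but the brute-force check above suffices for short alphabets.)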
For reasons of brevity, the independence of the Stratonovich weak order conditions will not be discussed in this paper.

The quasishuffle product
As we noted above, the shuffle property of the Stratonovich iterated integrals stems from the formula of integration by parts in (49). For the Ito calculus, formula (49) has to be replaced by where the last term represents the quadratic covariation (see e.g. [2, Chapter 5]). If ℓ = m ∈ A sto , then the quadratic covariation in (50) is h; for all other combinations of letters the quadratic covariation vanishes.
The quasishuffle product ⊲⊳ to be defined presently is such that, for any two letters ℓ, m ∈ Ā, the computation of ℓ ⊲⊳ m mimics the integration by parts relation (50). In combinatorial algebra, the definition of a quasishuffle product depends on the choice of a so-called bracket [·, ·]; different brackets lead to different quasishuffle products as defined by Hoffman [24]. Throughout this paper we only work with one fixed choice of bracket defined as follows. For letters ℓ, m ∈ Ā, [ℓ, m] takes the value Ā ∈ R Ā if ℓ = m = A ∈ A sto ; [ℓ, m] = 0 ∈ R Ā in all other cases. Then the quasishuffle product of words u ⊲⊳ v ∈ R Ā is defined recursively by and uℓ ⊲⊳ vm = (uℓ ⊲⊳ v)m + (u ⊲⊳ vm)ℓ + (u ⊲⊳ v)[ℓ, m], u, v ∈ W, ℓ, m ∈ Ā.
In the particular case u = v = ∅, the last relation yields ℓ ⊲⊳ m = ℓm + mℓ + [ℓ, m], a transcription of (50). The next four results are counterparts of Propositions 9-13. The bilinear form (·, ·) in (47), which we defined on R A × R A , is now extended to R Ā × R Ā .
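The recursion for ⊲⊳ can be implemented along the same lines as the shuffle product; in the Python sketch below (our own representation: the bar letter Ā is stored as the single character "Ā"), the third branch of the recursion inserts the bracket term:

```python
STO = {"A": "Ā"}  # stochastic letters paired with their deterministic bar letters

def bracket(l, m):
    # [l, m] = Ā when l = m = A ∈ A_sto, and 0 otherwise
    return {STO[l]: 1} if (l == m and l in STO) else {}

def qshuffle(u, v):
    """Quasishuffle u ⊲⊳ v as a dict word -> coefficient, following
    uℓ ⊲⊳ vm = (uℓ ⊲⊳ v)m + (u ⊲⊳ vm)ℓ + (u ⊲⊳ v)[ℓ, m]."""
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    for w, c in qshuffle(u[:-1], v).items():       # ends with u's last letter
        out[w + u[-1]] = out.get(w + u[-1], 0) + c
    for w, c in qshuffle(u, v[:-1]).items():       # ends with v's last letter
        out[w + v[-1]] = out.get(w + v[-1], 0) + c
    for w, c in qshuffle(u[:-1], v[:-1]).items():  # bracket term
        for b, cb in bracket(u[-1], v[-1]).items():
            out[w + b] = out.get(w + b, 0) + c * cb
    return out

assert qshuffle("A", "A") == {"AA": 2, "Ā": 1}   # ℓ ⊲⊳ m = ℓm + mℓ + [ℓ, m]
assert qshuffle("a", "A") == {"aA": 1, "Aa": 1}  # zero bracket: plain shuffle
```

When the bracket vanishes identically, the third branch contributes nothing and qshuffle reduces to the shuffle product.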

Proposition 15 For any S ∈ R Ā and any pair of observables
Proof: One may use the same technique as in Proposition 9. Here the proof is lengthier because it has to contemplate the possibility ℓ = Ā, A ∈ A sto , in which case D ℓ is a second order operator.

This yields immediately:
Proposition 16 Consider a series S ∈ R Ā , S ≠ 0. Then the series of operators D S is multiplicative if S ∅ = 1 and, for each pair of words u, v ∈ W, the quasishuffle relation (S, u ⊲⊳ v) = S u S v holds.
The proofs of the following propositions are similar to those of Propositions 11 and 13 respectively.

Proposition 17
The Ito iterated integrals I w (t 0 + h; t 0 ) satisfy the quasishuffle relations.
Proposition 18 For a splitting integrator for the Ito system (4), the coefficients I w (t 0 + h; t 0 ) of the integrator satisfy the quasishuffle relations.
The last two propositions show immediately that the Ito strong order conditions are not independent. The dependence between the Ito weak order conditions will be discussed after Proposition 25.

Concatenating Chen series
The shuffle (quasishuffle) relations constrain the values of Stratonovich (Ito) iterated integrals corresponding to different words but based on a common interval (t 0 , t 0 + h). Iterated integrals corresponding to adjacent intervals are also interrelated, as we now discuss.
Solution operators of Stratonovich or Ito systems satisfy From here we get the following relations between series of operators D J(t1;t0) D J(t2;t1) = D J(t2;t0) , D I(t1;t0) D I(t2;t1) = D I(t2;t0) , t 2 ≥ t 1 ≥ t 0 ; the corresponding relations between elements of R A or R Ā (Chen series) are J(t 1 ; t 0 )J(t 2 ; t 1 ) = J(t 2 ; t 0 ), I(t 1 ; t 0 )I(t 2 ; t 1 ) = I(t 2 ; t 0 ). (51) The equalities in (51) are, in view of (33), a family of relations between iterated integrals first noted by Chen [16] in the case where there are no stochastic letters. For instance, for words with two letters: etc. These relations may alternatively be proved by manipulating the integrals, without going through the series of differential operators as above.
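For purely deterministic letters, where every iterated integral over an interval of length h equals h n /n!, the Chen relations (51) reduce, via the deconcatenation-based product (33), to the binomial identity Σ k (h 1 k /k!)(h 2 n−k /(n − k)!) = (h 1 + h 2 ) n /n!. A quick Python check (helper names are ours):

```python
from fractions import Fraction
from math import factorial

def J(w, h):
    # deterministic letters only: the iterated integral over an interval
    # of length h is h^n / n!, n the number of letters of w
    return Fraction(h) ** len(w) / factorial(len(w))

def chen_rhs(w, h1, h2):
    # coefficient of w in the product J(t1; t0) J(t2; t1): by the
    # deconcatenation-based product (33), the sum over all splittings
    # w = uv of J_u(h1) J_v(h2)
    return sum(J(w[:k], h1) * J(w[k:], h2) for k in range(len(w) + 1))

h1, h2 = Fraction(2, 5), Fraction(1, 7)
for w in ["a", "ab", "abc", "aab"]:
    # Chen: J(t1; t0) J(t2; t1) = J(t2; t0), word by word
    assert chen_rhs(w, h1, h2) == J(w, h1 + h2)
```

Exact rational arithmetic makes the check an identity rather than an approximation.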

Composing word series
We conclude our study of the shuffle and quasishuffle products by showing that, in some circumstances, the composition W T (W S (x)) of two word series is another word series.
Let us begin with the Stratonovich case. If χ is an observable and w ∈ W, then D w χ is a sum of terms, each of which is a derivative χ (s) (x) acting on combinations of derivatives of the functions f k , k ∈ A. A simple example is: Here, the word ℓm may have weight 1, 3/2 or 2 depending on whether ℓ and m are stochastic or deterministic; the thing to observe is that in each term of the right-hand side of the last equality the f k , k ∈ A, that appear have a combined weight that matches the weight of ℓm. If S ∈ R A and D S is the corresponding series of differential operators, we may arrange D S χ by grouping the terms where the combined weight of the f k that appear is successively 0, 1/2, 1, 3/2, etc. On the other hand, if W S (x) is the associated word series and S ∅ = 1, so that W S (x) − x only comprises terms of weight ≥ 1/2, we may Taylor expand as follows Here the right-hand side may be arranged, as we did in the case of D S χ, by grouping the terms where the combined weight of the f k that appear is successively 0, 1/2, 1, 3/2, etc. This arrangement may be carried out because [W S (x) − x] r only contributes terms of combined weight ≥ r/2 and therefore for each weight there is only a finite number of terms to be grouped. It turns out that, if S is multiplicative, the expansions of D S χ(x) and χ(W S (x)) coincide.

Proposition 19
Suppose that S ∈ R A has S ∅ = 1 and satisfies the shuffle relations. Then for any observable χ, the expansion of χ(W S (x)) coincides with D S χ(x).
Proof: If χ is one of the coordinate mappings x → x i , then the result is true because, by definition, the i-th component of the word-basis function f w is obtained by applying D w to the i-th coordinate mapping. If χ is a product of coordinate mappings, the result holds because D S acts multiplicatively. By linearity, the result is true if χ is a polynomial. Then the result holds for smooth χ because it holds for the Taylor polynomials of any degree of χ around any base point x.
As a direct consequence we may state: Proposition 20 Suppose that S ∈ R A has S ∅ = 1 and satisfies the shuffle relations. Then for any T ∈ R A , W T (W S (x)) coincides with the word series W ST (x).
Proof: It is enough to note that, for each word basis function f w (x) = D w id(x), according to the preceding proposition, f w (W S (x)) has the expansion

The Ito case is completely parallel; the only change is that S ∈ R Ā must be required to satisfy the quasishuffle relations rather than the shuffle relations.
In fact the computations leading to (34) or (36) are instances of the composition just described.

Infinitesimal generators
It is well known that the infinitesimal generators of (3) or (4) play an important role in the study of these systems, see e.g. [38, Section 2.5]. In this section those generators are described in the language of words. The material has important implications for the weak order conditions. We begin with Ito systems.

The Ito generator
For system (4), we consider the linear combination of deterministic letters and define the exponential exp(hG) ∈ R Ā , h ∈ R, as the series where the powers are based on concatenation, e.g. (note that the right-hand side is simply the sum of all the words consisting of two deterministic letters from Ā). The operator D G is the infinitesimal generator of (4), a linear combination of first and second order differential operators.

Proposition 21
The expectations of the Ito iterated integrals are given by EI w (t 0 + h; t 0 ) = 0 if w ∈ W has at least one stochastic letter and EI w (t 0 + h; t 0 ) = h n /n! if w ∈ W consists of n deterministic letters. The following relation holds: For any observable χ and h > 0, where x(t) solves (4) with x(t 0 ) = x 0 and the expectation is conditional on x 0 .
Proof: For the first claim we recall that the expectation of Ito integrals vanishes. In addition, it is trivially computed that, when all the letters in a word are deterministic, I w (t 0 + h; t 0 ) = h n /n!, where n represents the number of letters. By expanding exp(hG) as a series, one sees that the second claim is just a reformulation of the first. An alternative proof of this second claim is as follows. As noted before (Proposition 3), the distribution of the random variable I(t 0 + h; t 0 ) is independent of t 0 and therefore we may write EI(t 0 + h; t 0 ) = EI(h). The functions exp(hG) and EI(h) coincide at h = 0, where they take the common value ∅. By taking expectations in (51), we find the semigroup relation EI(h 1 + h 2 ) = EI(h 1 ) EI(h 2 ) for h 1 , h 2 ≥ 0. Differentiating with respect to h 1 and then setting h 1 = 0, h 2 = h yields the linear, constant coefficient differential equation (d/dh)EI(h) = [(d/dh)EI(0)] EI(h). On the other hand, a straightforward computation leads to (d/dh) exp(hG) = G exp(hG), and the proof of the second statement concludes by noting that (d/dh)EI(0) = G, since EI w (h) = o(h) as h ↓ 0 if w has length > 1 and all its letters are deterministic.
For the last claim, just take expectations in (17).

Remark 22
The preceding proposition and the quasishuffle relations among the I w (Proposition 17) make it possible to compute all the moments of the Ito iterated integrals, as first suggested by Gaines [20]. The easiest example is given by the relation A ⊲⊳ A = 2AA + Ā that leads to I A 2 = 2I AA + I Ā ; according to the proposition, the expectation of the right-hand side equals 0 + h and therefore EI A 2 = h.
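The moment computation in this remark can be mechanized: expand u ⊲⊳ v, then apply Proposition 21 term by term. A Python sketch (our own helper names; the stochastic letter A and its bar Ā are hard-coded):

```python
from fractions import Fraction
from math import factorial

STO = {"A": "Ā"}  # stochastic letter A and its deterministic bar letter

def bracket(l, m):
    return {STO[l]: 1} if (l == m and l in STO) else {}

def qshuffle(u, v):
    # quasishuffle recursion as in the previous section
    if not u:
        return {v: 1}
    if not v:
        return {u: 1}
    out = {}
    for w, c in qshuffle(u[:-1], v).items():
        out[w + u[-1]] = out.get(w + u[-1], 0) + c
    for w, c in qshuffle(u, v[:-1]).items():
        out[w + v[-1]] = out.get(w + v[-1], 0) + c
    for w, c in qshuffle(u[:-1], v[:-1]).items():
        for b, cb in bracket(u[-1], v[-1]).items():
            out[w + b] = out.get(w + b, 0) + c * cb
    return out

def EI(w, h):
    # Proposition 21: E I_w = 0 if w contains a stochastic letter,
    # and h^n/n! if w consists of n deterministic letters
    if any(l in STO for l in w):
        return Fraction(0)
    return Fraction(h) ** len(w) / factorial(len(w))

def second_moment(u, v, h):
    # E[I_u I_v] = (E I, u ⊲⊳ v), by the quasishuffle relations
    return sum(c * EI(w, h) for w, c in qshuffle(u, v).items())

h = Fraction(1, 2)
assert second_moment("A", "A", h) == h             # E I_A^2 = h, as in the remark
assert second_moment("AA", "AA", h) == h ** 2 / 2  # E I_AA^2 = h^2/2
```

The second assertion agrees with the direct computation E[((B(h) 2 − h)/2) 2 ] = h 2 /2 for I AA = (B(h) 2 − h)/2.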

Weak order conditions in the Ito case
We now turn to the series of expectations associated with a splitting integrator specified by the pullback series In general, the equality will not hold because the I (i) (t 0 + d i h; t 0 + c i h) are not independent. However, as will shortly become clear, (52) will typically be satisfied. We first present some examples that will help to understand the situation.
Assume that the alphabet A consists of two letters a and A. Choose a partition of the interval [0, 1] in such a way that the stochastic f A acts on nonoverlapping intervals, while the deterministic f a may act on any set of intervals. In this case (52) holds because the Brownian motion B A acts on nonoverlapping intervals. This example may be easily extended to the case where there are additional deterministic fields f b , f c , . . . ; in the split systems some of them could be grouped with f a and some of them grouped with f A .
As a second example, assume that A = {A, B} and use Strang's splitting with f A acting first. Here I (1) and I (3) are independent because their intervals do not overlap, while the pairs I (1) , I (2) and I (2) , I (3) are independent because they use independent Brownian motions. Again this example may be easily generalized by adding additional deterministic and/or stochastic letters.
We have the following general result: Lemma 24 Assume that A sto ≠ ∅, so that (4) does not degenerate into a deterministic differential equation. If a splitting integrator for (4) has strong order > 0 (i.e. ≥ 1/2), then (52) holds.
Proof: As noted at the end of Section 5, the Ito strong order conditions with µ = 1/2 must be satisfied. Now, for each A ∈ A sto , the strong order condition corresponding to A shows that

Schemes that satisfy (52) have the special properties that we study next. To begin with, Lemma 12 clearly implies: Proposition 25 For splitting integrators for (4) that satisfy (52), the expectations E I w (t 0 + h; t 0 ) of the integrator coefficients satisfy the shuffle relations.
In turn this result and Proposition 23 show that the weak order conditions are not independent when (52) holds. For instance the weak order condition for ℓℓ is implied by the weak order condition for ℓ ∈ W , since, as noted repeatedly, ℓ ¡ ℓ = 2ℓℓ.
In the next proposition we need the deterministic system (53) obtained by replacing the differentials dB A in the Ito system (4) by dt. It is clear that each splitting algorithm for (4) defines a splitting algorithm for (53) and vice versa.
Proposition 26 For splitting integrators for (4) that satisfy (52) and in the general vector field scenario, the following properties are equivalent: • The weak order conditions (45) hold for a positive integer σ.
• When applied to the deterministic system (53), the integrator has local error O(h σ+1 ).
Proof: From (52) and Proposition 21, where the G (i) are the generators of the partial systems and therefore sums of deterministic letters. Condition (45) requires that, in the series in the last display, the terms corresponding to words with ≤ σ letters coincide with those of EI(t 0 + h; t 0 ) = exp(hG).
To study the order for (53) we may also use words, viewing a deterministic system as the particular case of an Ito system in which there are no stochastic letters. If we denote by Ā the (deterministic) letter associated with the field f A , we then have and and order σ requires that the terms involving words with σ or fewer letters in the series in the last two displays coincide.
The following counterexample shows that, in the last two propositions, hypothesis (52) cannot be dispensed with. For the alphabet A = {a, A}, we consider the integrator While this is admittedly a contrived example, using the interval [t 0 , t 0 + h/2] to finish the step (rather than the more natural [t 0 + h/2, t 0 + h]) may have some appeal: on the one hand, the distribution of the iterated integrals in [t 0 , t 0 + h/2] is the same as that in [t 0 + h/2, t 0 + h] and, on the other hand, working twice with [t 0 , t 0 + h/2] may make it possible to reuse Brownian increments. For this integrator the hypothesis (52) does not hold. A simple computation, similar to that preceding (36), yields (the iterated integrals in the right-hand side are over [t 0 , t 0 + h/2]). We note, in relation with Lemma 24, that here the order condition for A is obviously not satisfied. Taking expectations in the last display, Since 0 2 ≠ 2 × h/2, the shuffle relation corresponding to A ¡ A = 2AA does not hold for the expectations. On the other hand, from Proposition 21, so that the weak order conditions for σ = 1 are not satisfied. In the deterministic case the algorithm coincides with Strang's splitting, with local errors O(h 3 ) (i.e. σ = 2). Thus the weak order does not coincide with the deterministic order.

It turns out that, in the general system scenario and under (52), there is an order barrier, as the following result shows.

Theorem 27 In the general system scenario, for splitting integrators that satisfy (52), the weak order cannot be better than σ = 2.
Proof: By contradiction: assume the integrator has weak order ≥ 3. As noted at the end of Section 5, the Ito weak order conditions with σ = 3 then hold. From Proposition 26, the algorithm is of order ≥ 3 for deterministic problems, which is known to be incompatible with the condition c ij < d ij [3].

Remark 28
In the deterministic case this order barrier may be overcome by using complex coefficients; a full discussion of the relevant literature may be found in [4, Section 6.3.3]. To the best of our knowledge, complex coefficients have not yet been tested in the stochastic scenario.

The Stratonovich generator
We briefly outline how the preceding material has to be modified in the Stratonovich case. The expression for the generator is and, in analogy with Proposition 21, we have a formula that may be proved by showing, as in the Ito case, that the left- and right-hand sides satisfy the same initial value problem. As a consequence, one obtains the following formula for the expectation of observables: Taking the coefficient of the word w ∈ W in (54) gives the value of the expectations of the iterated integrals. Clearly EJ w (t 0 + h; t 0 ) = 0 if w is not a concatenation of deterministic letters a ∈ A det and pairs AA, A ∈ A sto (examples include AAA or ABAB if A ≠ B). When w is such a concatenation, it is easily shown that where π(w) is the number of pairs that enter in the concatenation (for instance, for AAaBBAA, π = 3, and for AAAA, π = 2). Once the expectations EJ w (t 0 +h; t 0 ) are known, the shuffle relations in Proposition 10 may be used to compute higher moments of the iterated integrals, similarly to what was explained in Remark 22. As distinct from the EI w , w ∈ W, studied in Proposition 23, the EJ w , w ∈ W, do not satisfy the shuffle relations (except, of course, in the degenerate case where A sto = ∅).
For integrators that satisfy the obvious analogue of (52), Proposition 26 also holds in the Stratonovich case and therefore the order barrier in Theorem 27 also applies to the Stratonovich interpretation.

Relating the Stratonovich and Ito interpretations
In this paper, the Stratonovich and Ito theories have been developed in parallel. It is well known that it is actually possible to map one into the other, and we now show how to do so by means of words.

Relating the Stratonovich and Ito iterated integrals
Along with the extended alphabet Ā that we used to carry out the Ito-Taylor expansion, let us now consider a new alphabet A ⋆ that consists of all the deterministic letters a ∈ A det , all the stochastic letters A ∈ A sto and, in addition, a deterministic letter A ⋆ associated with each A ∈ A sto . After setting •dB ℓ (s) = ds for all deterministic letters, we may define, via (9), Stratonovich iterated integrals J w for each w ∈ W ⋆ , where W ⋆ denotes the set of words for the alphabet A ⋆ . Note that this set of iterated integrals is different from that used to write the Stratonovich-Taylor expansion in (10)-(12) because W ⋆ is a larger set than W. With the J w , w ∈ W ⋆ , we construct the Chen series The results in this section require the use of two mappings θ and ρ that we introduce now. We define θ : R A ⋆ → R Ā as follows. For letters, we set θ(a) = a for a ∈ A det , θ(A) = A for A ∈ A sto and θ(A ⋆ ) = Ā − (1/2)AA for A ∈ A sto . For words, we set θ(∅) = ∅ and θ(ℓ 1 . . . ℓ n ) = θ(ℓ 1 ) · · · θ(ℓ n ). We note that, for each w ∈ W ⋆ , θ(w) is a linear combination of words of the same weight as w. Finally, we set θ(Σ w S w w) = Σ w S w θ(w). Clearly θ is linear and, in addition, is an algebra morphism, i.e. it maps the concatenation S 1 S 2 into the concatenation θ(S 1 )θ(S 2 ).
We next define a bilinear mapping R A ⋆ × R Ā → R as in (47) and define ρ : R Ā → R A ⋆ by demanding (θ(S), p) = (S, ρ(p)) for each S ∈ R A ⋆ and each p ∈ R Ā ; thus ρ is the linear map obtained from θ by transposition with respect to (·, ·). As an example of the computation of ρ, let us find ρ(AA). The maps θ and ρ have been defined so that they encapsulate the relation between Ito and Stratonovich integrals, as shown in the next result, where the first formula expresses each Ito iterated integral as a linear combination of Stratonovich iterated integrals (cf. formula (8) in [20]).
As a consequence, θ maps the (Stratonovich) Chen series J ⋆ into the (Ito) Chen series I = Σ w∈W I w (t 0 + h; t 0 ) w.
Proof: The equality in (55) clearly holds if w is empty or consists of a single stochastic letter. Suppose that it holds for all words with weight ≤ N , N ≥ 1/2, and consider a word of weight N + 1/2, which we write in the form wkℓ. Assume first that k = ℓ = A for some A ∈ A sto . By the recurrence relation between iterated integrals, we have I wk (s) • dB ℓ (s).
As a simple instance of (55) we note that, from the relation ρ(AA) = AA − (1/2)A ⋆ found above, we get I AA = J AA − (1/2)J A ⋆ , i.e. the well-known relation I AA (t 0 + h; t 0 ) = J AA (t 0 + h; t 0 ) − h/2.
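The morphism θ lends itself to a direct implementation, with series stored as dictionaries mapping words (tuples of letter tokens; we write "A*" for A ⋆ and "Ā" for Ā) to rational coefficients. A Python sketch, under our own naming conventions:

```python
from fractions import Fraction

# θ on letters: θ(a) = a, θ(A) = A, θ(A*) = Ā - (1/2) AA
THETA = {
    "a":  {("a",): Fraction(1)},
    "A":  {("A",): Fraction(1)},
    "A*": {("Ā",): Fraction(1), ("A", "A"): Fraction(-1, 2)},
}

def concat(p, q):
    # concatenation product of two series (dicts word -> coefficient)
    out = {}
    for w1, c1 in p.items():
        for w2, c2 in q.items():
            out[w1 + w2] = out.get(w1 + w2, 0) + c1 * c2
    return out

def theta_word(w):
    # θ extended to a word letter by letter, using that θ is an
    # algebra morphism for concatenation
    out = {(): Fraction(1)}  # the empty word
    for letter in w:
        out = concat(out, THETA[letter])
    return out

assert theta_word(("A*",)) == {("Ā",): 1, ("A", "A"): Fraction(-1, 2)}
# morphism property: θ(uv) = θ(u) θ(v)
u, v = ("a", "A*"), ("A",)
assert theta_word(u + v) == concat(theta_word(u), theta_word(v))
```

The transposed map ρ could be implemented analogously from the pairing (·, ·); we only sketch θ here.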

The equivalence Ito-Stratonovich
Proposition 29 links the Chen series J ⋆ and I. We investigate next the link between the corresponding series of differential operators. Recall that, associated with each alphabet, there is a shuffle Hopf algebra H sh (A), obtained by endowing the vector space of words with the shuffle product and the deconcatenation coproduct. The dual vector space of H sh (A) may be identified with the vector space of formal series R A via the bilinear form (47). In other words, the linear form on H sh (A) that, as w ranges in W, associates with w the real number S w is identified with the series Σ w S w w. With this identification, the concatenation product of series S ∈ R A , or equivalently the product (33) for the coefficients, coincides with the convolution product in the dual of the Hopf algebra. Series S with S ∅ = 1 that satisfy the shuffle relations are then the linear forms on H sh (A) that are multiplication morphisms (i.e. preserve multiplication). The set of those linear forms is well known to be a group for the convolution product; this group is called the shuffle group and denoted G sh (A). Therefore Lemma 12 is just the statement that the convolution product of two elements in G sh (A) lies in G sh (A).
The quasishuffle Hopf algebra H qsh (Ā) is constructed similarly. One endows the vector space R Ā with the quasishuffle product and the deconcatenation coproduct. The series S ∈ R Ā with S ∅ = 1 that satisfy the quasishuffle relations may then be viewed as forming the quasishuffle group G qsh of linear forms on H qsh (Ā) that are multiplication morphisms.
Theorem 2.5 in [24] shows that the mapping ρ is an isomorphism of H qsh (Ā) onto H sh (A ⋆ ). In particular, it maps the quasishuffle product into the shuffle product: This observation and the material in Section 8 make it clear that the quasishuffle/Ito results in Propositions 15-18 may be derived from the corresponding shuffle/Stratonovich results by transforming ¡ into ⊲⊳ with the help of the inverse isomorphism ρ −1 .