Unconditional uniqueness for the derivative nonlinear Schr\"odinger equation on the real line

We prove the unconditional uniqueness of solutions to the derivative nonlinear Schr\"odinger equation (DNLS) at almost end-point regularity. To this end, we employ the normal form method: we transform (a gauge-equivalent) DNLS into a new equation (the so-called normal form equation) for which nonlinear estimates can be easily established in $H^s(\mathbb{R})$, $s>\frac12$, without appealing to an auxiliary function space. We also prove that low-regularity solutions of DNLS satisfy the normal form equation; this is done by means of estimates in the $H^{s-1}(\mathbb{R})$-norm.


1. Introduction
We consider the initial-value problem for the derivative nonlinear Schrödinger equation (DNLS) on the real line:
$$\begin{cases} i\partial_t u + \partial_x^2 u = i\partial_x\big(|u|^2 u\big), & (t,x)\in\mathbb{R}\times\mathbb{R},\\ u|_{t=0} = u_0 \in H^s(\mathbb{R}), \end{cases} \qquad (1.1)$$
where $u$ is a complex-valued unknown. This PDE arises as a model equation in plasma physics, see e.g. [35,29]. Moreover, since it is completely integrable [21], it has a rich structure (e.g. infinitely many conservation laws). From the analytical point of view, it poses interesting technical challenges due to the presence of the derivative in the cubic nonlinear term in the context of Schrödinger dispersion. The initial-value problem (1.1) has been intensely studied both for smooth, high-regularity (say, $s \geq 1$) initial data [27,18,26,33], as well as for low-regularity initial data [36,6,7,28,15,34]. For the discussion of this section, it is relevant to recall the result of [36]: by using the Fourier restriction norm method (i.e. using $X^{s,b}$ spaces) and a gauge transformation (see e.g. [18]), Takaoka showed that DNLS is locally well-posed in $H^s(\mathbb{R})$, for $s \geq \frac12$. However, the uniqueness of solutions holds conditionally: for any $u_0 \in H^s(\mathbb{R})$, there exist $T > 0$ and a unique solution $u \in C([-T,T]; H^s(\mathbb{R})) \cap X_T$ to (1.1), where $X_T$ is some auxiliary function space. In other words, for given initial data, the solution is guaranteed to be unique only in the subspace $C([-T,T]; H^s(\mathbb{R})) \cap X_T$.
1.1. Main result. In this paper, we study the uniqueness of low-regularity solutions to DNLS. In particular, our aim is to establish the unconditional uniqueness of solutions to (1.1) in $H^s(\mathbb{R})$ for $s < 1$.
Generally speaking, provided that we can make sense of the nonlinearity (as a distribution) without assuming that the solution belongs to some auxiliary function space $X_T$, we establish unconditional well-posedness for a given PDE by removing the auxiliary function space from the uniqueness statement of its well-posedness theory.
Our main result is the following.

Theorem 1.1. Let $s > \frac12$. Then, DNLS is unconditionally (locally) well-posed in $H^s(\mathbb{R})$.
Unconditional well-posedness is a notion of well-posedness that does not depend on how the solutions were constructed. It was Kato [20] who first studied the issue of whether or not one can remove an auxiliary function space from the well-posedness statement for the nonlinear Schrödinger equation and thus strengthen its uniqueness property. Since then, the uniqueness of solutions for various other nonlinear dispersive PDEs was investigated; see e.g. [5,12,14,22,25,42].
The proof of Theorem 1.1 is based on the normal form approach to unconditional well-posedness of Kwon, Oh, and Yoon [25], where an infinite iteration scheme of normal form reductions was developed in an abstract form for nonlinear dispersive PDEs on the real line. This approach builds upon previous works [14,24] where the normal form method was applied to PDEs with periodic boundary conditions. In addition, we also rely on the abstract variation of the normal form method due to Kishimoto [22].
It is worthwhile mentioning here that the method of normal form reductions has other uses besides proving unconditional uniqueness. For example, it has been used by Oh and Wang [32] to exhibit energy estimates in negative Sobolev spaces for the periodic fourth-order NLS with cubic nonlinearity. Also, by combining the normal form reduction idea with $X^{s,b}$-analysis, Erdogan and Tzirakis [8] proved a nonlinear smoothing property for the periodic Korteweg-de Vries equation; more recently, a nonlinear smoothing property was proved for DNLS on the real line by Erdogan, Gurel, and Tzirakis [8].
In the following we describe the normal form approach for DNLS on the real line.
1.2. The normal form method for DNLS. As in the work of Takaoka [36], we have to use a gauge transformation¹ (i.e. a nonlinear change of variable $u \mapsto w$) in order to remove the nonlinearity $2i|u|^2\partial_x u$ from (1.1); see also Remark 3.3. This transformation changes the cubic nonlinearity favorably, but introduces a (pure-power) quintic term. Therefore, we begin with the following gauged DNLS (see Section 2), where $I$ (with $0 \in I$) denotes a time interval on which a solution $u$ to (1.1) exists. By setting $v(t) = e^{-it\partial_x^2} w(t)$ (the interaction representation of $w$), one can rewrite the gauged DNLS as follows.

¹More recently, Pornnopparath [34] showed the local well-posedness of (1.1) in $H^s(\mathbb{R})$, $s \geq \frac12$, without using a gauge transformation. In fact, the same result is shown to hold for a more general nonlinearity than in (1.1), namely a generic polynomial in $(u, \bar{u}, \partial_x u, \partial_x \bar{u})$ where all monomials have degree $\geq 3$ and at most one derivative.
Here, in (1.3), $\mathcal{F}$ denotes the Fourier transform in the spatial variable, and the modulation function is given by $\Phi(\bar\xi) = \xi^2 - \xi_1^2 + \xi_2^2 - \xi_3^2$. Thanks to the algebra property of $H^s(\mathbb{R})$, $s > \frac12$, we may focus our attention on the cubic nonlinearity $\mathcal{T}(v)$. Indeed, the quintic term $\mathcal{Q}(v)$ can be estimated easily: $\|\mathcal{Q}(v)\|_{H^s(\mathbb{R})} \lesssim \|v\|_{H^s(\mathbb{R})}^5$. Such an estimate clearly does not hold for $\mathcal{T}(v)$ due to the presence of the spatial derivative ("the derivative loss issue"). Hence, we proceed to iteratively substitute this nonlinearity with (infinitely many) terms which are easily controlled in the $H^s(\mathbb{R})$-norm. Let us take the spatial Fourier transform of (the Duhamel formulation of) (1.3) and formally integrate by parts in the temporal variable to obtain the identity (1.4) for $\widehat{v}(t,\xi)$. We first note that we aim to overcome the derivative loss issue of $\mathcal{T}(v)$ by exploiting the denominator $\Phi(\bar\xi)$ after such an integration by parts step, at least in an integration region where the modulation function $\Phi(\bar\xi)$ is large (i.e. the "away from resonant" contribution to $\mathcal{T}(v)$). On the other hand, when the modulation function $\Phi(\bar\xi)$ is in a neighborhood of $0$ (i.e. the "almost resonant" contribution to $\mathcal{T}(v)$), the denominator would actually work against us, making it impossible to handle the terms appearing in (1.4) directly in the $H^s(\mathbb{R})$-norm. In our analysis we distinguish two cases, namely (i) the almost resonant case, $|\Phi(\bar\xi)| \leq N$, and (ii) the away from resonant case, $|\Phi(\bar\xi)| > N$, for some suitably large threshold $N = N(\|v_0\|_{H^s})$. In case (i), thanks to the restriction on the modulation, we can directly estimate the contribution of $\mathcal{T}(v)$ from (1.3) in $H^s(\mathbb{R})$, $s > \frac12$ (see Corollary 3.5). In the integration region (ii), we proceed to perform the integration by parts as in (1.4).
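The case distinction above is governed by the size of the modulation function. Assuming the convention $\Phi(\bar\xi) = \xi^2 - \xi_1^2 + \xi_2^2 - \xi_3^2$ (the same modulation function as for the cubic NLS in [25]), one can check that it factors on the convolution hyperplane $\xi = \xi_1 - \xi_2 + \xi_3$ as $2(\xi_2 - \xi_1)(\xi_2 - \xi_3)$; a quick exact sanity check, not part of the argument:

```python
import random

def Phi(xi1, xi2, xi3):
    """Modulation function evaluated on the convolution hyperplane
    xi = xi1 - xi2 + xi3 (convention assumed, as for cubic NLS)."""
    xi = xi1 - xi2 + xi3
    return xi**2 - xi1**2 + xi2**2 - xi3**2

# Verify the factorization Phi = 2(xi2 - xi1)(xi2 - xi3) on random
# integer frequencies (exact integer arithmetic, no rounding).
for _ in range(1000):
    a, b, c = (random.randint(-10**6, 10**6) for _ in range(3))
    assert Phi(a, b, c) == 2 * (b - a) * (b - c)
print("factorization verified")
```

This factorization is what makes the almost resonant region a neighborhood of the two hyperplanes $\xi_2 = \xi_1$ and $\xi_2 = \xi_3$.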
In view of (1.3), the second integral in (1.4) can be written as a sum of quintic and septic terms. Indeed, assuming that the temporal derivative falls on the first factor, the second integral in (1.4) can essentially be written as in (1.5), where $\Phi(\bar\xi_1) := \Phi(\xi_1, \xi_{11}, \xi_{12}, \xi_{13})$. Although we have an $H^s(\mathbb{R})$-estimate for the last term in (1.5), the contribution due to $\mathcal{T}(v)$ (i.e. the quintic term in (1.5)) suffers from the same derivative loss issue as $\mathcal{T}(v)$ itself. The idea now is to repeat the previous two-step iteration. First, we split the domain of the second integral in (1.5) again into (i) the almost resonant case, $|\Phi(\bar\xi) + \Phi(\bar\xi_1)| \leq N_1$, where we can establish an $H^s(\mathbb{R})$-estimate, and (ii) the away from resonant case, $|\Phi(\bar\xi) + \Phi(\bar\xi_1)| > N_1$. We then integrate by parts only in (ii) and exploit the gain of the denominator $\Phi(\bar\xi) + \Phi(\bar\xi_1)$ (the price paid being additional nonlinearities of higher degrees). It turns out that it is helpful to choose the threshold $N_1 \sim |\Phi(\bar\xi)|$, and we point out that at this stage we also have $|\Phi(\bar\xi)| > N$. Regarding the two left-out terms, namely when the time derivative falls on the $k$th factor ($k = 2, 3$), we mention here that the factor $e^{i(\Phi(\bar\xi)+\Phi(\bar\xi_1))t}\,\xi_1\xi_{12}$ above changes to $e^{i(\Phi(\bar\xi)+\Phi(\bar\xi_k))t}\,\xi_1\xi_{k2}$, and that we use the same strategy as described above.
After $J$ iterations we derive the equation (1.6), in which the nonlinearity $\mathcal{T}(v)$ is passed on to the next iteration. In comparing (1.3) with (1.6), notice that we have replaced the nonlinearity $\mathcal{T}(v)$ by several terms whose origin (at iteration $j$) we briefly explain here: the term $\mathcal{T}^{(j)}_0(v)$ collects the boundary terms that appear when integrating by parts; $\mathcal{T}^{(j)}_Q(v)$ stands for the terms obtained by replacing $\partial_t v$ by $\mathcal{Q}(v)$; $\mathcal{T}^{(j)}_{\mathcal{T},1}(v)$ stands for the terms obtained by replacing $\partial_t v$ by $\mathcal{T}(v)$, followed by restricting the appropriate modulation function to the almost resonant case; and finally, $\mathcal{T}^{(J+1)}_{\mathcal{T}}(v)$ is "the remainder term" which is passed to the $(J+1)$th iteration. Since $\partial_t$ may fall on any of the factors of $v$, it becomes apparent that one has to manage the bookkeeping of terms (whose number grows factorially in $J$). We accomplish this by using the notion of ordered trees as in the work of the second author together with Kwon and Oh [25]. See also the paper by Christ [4] in which a precursor notion was used.
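To give a sense of the bookkeeping: a term of degree $2j+1$ has $2j+1$ factors on which $\partial_t$ can fall, so each ordered tree of the $j$th generation branches into $2j+1$ trees of the next generation. Assuming a single tree at the first generation (as above), this recursion yields the double factorial $(2J-1)!! = (2J)!/(2^J J!)$ trees at generation $J$; the counting below is our illustration of this growth, not a formula taken from the paper:

```python
from math import factorial

def num_ordered_trees(J):
    """|T(1)| = 1; passing from generation j to j+1, any of the
    2j+1 terminal nodes may be given three children, so
    |T(j+1)| = (2j+1) |T(j)|."""
    count = 1
    for j in range(1, J):
        count *= 2 * j + 1
    return count

# Closed form: the double factorial (2J-1)!! = (2J)! / (2^J J!),
# which grows faster than any geometric sequence, i.e. factorially.
for J in range(1, 12):
    assert num_ordered_trees(J) == factorial(2 * J) // (2**J * factorial(J))
print(num_ordered_trees(5))  # 945
```

This factorial growth is exactly why the summability of the constants over all trees of a fixed generation has to be tracked carefully in the estimates of Sections 3 and 4.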
The key point to be made at this stage is that we manage to show that, for fixed $N$, the remainder term vanishes as $J \nearrow \infty$; this is (1.7). While we do not have control of the remainder term in the $H^s(\mathbb{R})$-norm, it vanishes in the limit in a topology weaker than the strong $H^s(\mathbb{R})$-topology (see Subsection 4.1). This fact, together with $H^{s-1}(\mathbb{R})$-estimates similar to (1.7) (see Section 5), allows us to prove that any solution $v \in C(I; H^s(\mathbb{R}))$ to (1.3) necessarily satisfies (in $H^s(\mathbb{R})$) the normal form equation (1.8) for all $t \in I$. The analysis of equation (1.8) is simple: we apply a fixed point argument directly in the $C(I; H^s(\mathbb{R}))$-norm, without relying on extra harmonic-analytic tools. Indeed, we can write all the nonlinear terms in (1.8) as iterated applications of a single trilinear form (3.1). Once we have the $H^s(\mathbb{R})$-estimate for this simple trilinear form (Lemma 3.1), we obtain control of all the terms in (1.8) (see Section 3). This is a very efficient method to deal with the infinite series of nonlinearities, and it was applied before in [22,23,25]. Showing (1.7) also relies on this idea; however, for this purpose one needs two "building blocks", namely the $H^{s-1}(\mathbb{R})$-estimates of $\partial_t v$ for a solution $v \in C(I; H^s(\mathbb{R}))$ to (1.3), and of a second trilinear form (in an "away from resonant" integration region). See Corollary 4.2 and Lemma 4.4. For an exposition of this idea we refer the reader to the report by Kishimoto [22] (in particular, see the meta-theorem [22, Theorem 1]).
In summary, the method applied in this work is, in a sense, the opposite of the Fourier restriction norm method (as applied by Takaoka [36] to DNLS): we first derive a complicated Duhamel formula, namely the normal form equation (1.8), after which the analytical part is simple. In contrast, the Fourier restriction norm method performs a more involved analysis, via the $X^{s,b}$-norms, on the simple Duhamel formula of (1.2). As a point of similarity, notice that the interaction representation of $w(t)$ also plays a role in the Fourier restriction norm method. In the "denominator games" specific to the Fourier restriction norm method, one essentially overcomes the derivative loss issue with a denominator $|\Phi(\bar\xi)|^b$ with $b \approx \frac12$. In the method employed here, due to the integration by parts (see e.g. (1.4)), we benefit from a full power of $|\Phi(\bar\xi)|$.
Finally, we emphasize that the prerequisite for the scheme of infinite iterations of normal form reductions to work is showing that the remainder term vanishes in the limit. In some sense, this represents the heavier analytical part of this method, namely identifying a norm weaker than the $C(I; H^s(\mathbb{R}))$-norm in which one can obtain (1.7).

1.3. Comments and remarks.
For DNLS on the real line, Yin Yin Su Win [38] established its unconditional well-posedness in the energy space, i.e. for $s = 1$. Indeed, by modifying the $X^{s,b}$-multilinear estimates in [36], the author of [38] showed that any solution to DNLS belongs to the auxiliary space $X$, and thus it must be unique. This strategy does not work for $s < 1$ because the key trilinear estimate is known to fail in $X^{s,b}$ with $s < \frac12$, for any $b \in \mathbb{R}$ (see [36, Proposition 3.3]). For DNLS on the torus, Kishimoto [23] proved its unconditional well-posedness in $H^s(\mathbb{T})$, for $s > \frac12$. In addition to [25], our implementation of the infinite iteration of normal form reductions to prove Theorem 1.1 follows ideas presented in [22,23], specifically in making use of the trilinear forms $\mathcal{T}_\Phi$ and $\mathcal{T}^w_{|\Phi|>M}$ in Sections 3 and 4. In contrast, in [25] (handling the cubic NLS and mKdV equations on the real line in Sobolev spaces) and in [10] (handling the cubic NLS in almost critical spaces), the approach is to prove "strong and weak localized modulation estimates" (SLME and WLME) and then use more intricate thresholds to separate the almost resonant and away from resonant integration regions at each iteration. Although we can still prove a useful SLME for DNLS in order to establish the $H^s(\mathbb{R})$-estimates for all nonlinearities in a normal form equation derived from DNLS, there seems to be no useful corresponding WLME.
Finally, we include here a corollary to Theorem 1.1 regarding the global well-posedness of DNLS. We recall that Colliander, Keel, Staffilani, Takaoka, and Tao [6,7] introduced the I-method and showed that DNLS is in fact globally well-posed, provided that $s > \frac12$ and $\|u_0\|_{L^2}^2 < 2\pi$. Miao, Wu, and Xu [28] reached the end-point regularity $s = \frac12$, under the same condition on the $L^2$-norm of the initial data. The $L^2$-norm threshold on the initial data was improved to $\|u_0\|_{L^2}^2 < 4\pi$ by Guo and Wu [15], who showed global well-posedness of DNLS in $H^s(\mathbb{R})$, $s \geq \frac12$. Taking into account Theorem 1.1 and the result of [15], we obtain the following.

Corollary 1.2. Let $s > \frac12$ and $\|u_0\|_{L^2}^2 < 4\pi$. Then, DNLS is unconditionally globally well-posed in $H^s(\mathbb{R})$.
Although we do not pursue the question of global well-posedness of DNLS in this paper, we would like to point out that, above the mass threshold $4\pi$, the question of whether all solutions to (1.1) extend globally in time is not settled for low-regularity initial data. We mention here two recent papers that are relevant to this question. First, for $H^1(\mathbb{R})$-initial data, by using variational analysis of soliton solutions, Fukaya, Hayashi, and Inui [11] gave a sufficient condition for the global well-posedness of (1.1) covering the result of Wu [41]. Second, by using the inverse scattering method, Jenkins, Liu, Perry, and Sulem [26] (see also references therein) proved that all solutions with initial data in a suitable weighted Sobolev space exist for all times.

1.4. Organization of the paper. In Section 2, we perform normal form reductions and transform the (gauged) DNLS equation into an equation which is more complicated algebraically, but simpler analytically. The proofs of the crucial estimates are given in Sections 3 and 4. In Section 5, we rigorously justify the various operations from Section 2 for rough solutions to DNLS. Finally, in Section 6 we put the pieces together and give the proof of Theorem 1.1.
1.5. Notation. We use $A \lesssim B$ to denote the estimate $A \leq CB$ for some constant $C$, which may vary from line to line and depend on various parameters. We use $A \sim B$ to denote the statement that $A \lesssim B \lesssim A$. We also use $A \ll B$ if $A \leq \epsilon B$, where $\epsilon$ is a small absolute constant. For an integrable function $f(x)$ with $x \in \mathbb{R}$, we normalize the Fourier transform in the usual way. We denote by $S(t) = e^{it\partial_x^2}$ the linear propagator for the linear Schrödinger equation $\partial_t u = i\partial_x^2 u$. We include in Appendix A the notion of ordered trees and related terminology, as introduced in [25], in order to make our paper self-contained.
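Under the convention $\widehat{f}(\xi) = \int_{\mathbb{R}} e^{-i\xi x} f(x)\,dx$ (one common normalization; the paper's exact convention is fixed in the omitted display), the propagator $S(t) = e^{it\partial_x^2}$ acts on the Fourier side as multiplication by the unimodular factor $e^{-it\xi^2}$, and is therefore an isometry on every $H^s(\mathbb{R})$. A discrete sanity check of this fact (an illustration only, using NumPy):

```python
import numpy as np

N, L = 1024, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # discrete frequencies

u0 = (1 + 0.3j) * np.exp(-x**2)               # smooth, decaying data
t = 0.7

# S(t) = e^{it d_x^2}: multiplication by e^{-it xi^2} on the Fourier side
ut = np.fft.ifft(np.exp(-1j * t * xi**2) * np.fft.fft(u0))

def hs_norm(u, s):
    # discrete proxy for the H^s norm: weight |u-hat| by <xi>^s
    return np.linalg.norm((1 + xi**2)**(s / 2) * np.fft.fft(u))

for s in (0.0, 0.6, 1.0):
    assert abs(hs_norm(ut, s) - hs_norm(u0, s)) < 1e-8 * hs_norm(u0, s)
print("S(t) preserves every H^s norm (unimodular Fourier multiplier)")
```

This is why passing to the interaction representation $v(t) = S(-t)w(t)$ costs nothing at the level of Sobolev norms.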

2. The normal form equation
In this section, we formally derive a normal form equation for a so-called gauged DNLS equation. First, we use a gauge transformation to remove the nonlinear term $2i|u|^2\partial_x u$ from the right-hand side of (1.1), at the expense of introducing a (pure-power) quintic nonlinear term; see (2.2) below. Then, we apply an infinite iteration of normal form reductions to transform the gauged DNLS into a new equation involving an infinite series of nonlinearities of arbitrarily high degrees. To this end, we employ the machinery developed in [25].
We use the gauge transformation (2.1). Notice that this is an autonomous transformation, i.e. it does not depend explicitly on the time variable. Thus, equation (1.1) is transformed into the gauged DNLS (2.2). This nonlinear transformation (2.1) goes back to the works of Hayashi [16] and Hayashi and Ozawa [17]; see also [27]. It is well known by now (see [36]) that the cubic nonlinearity with the derivative falling on the complex-conjugate factor can be handled using the Fourier restriction norm method, whereas the cubic term $|u|^2\partial_x u$ fails to have a useful estimate. It turns out that this is also the case when employing the normal form approach, namely we have to remove the bad nonlinearity before renormalizing the equation; see also Remark 3.3. We can transfer a well-posedness result on the gauged DNLS equation back to the original DNLS equation with the following lemma. Next, we denote $S(t) := e^{it\partial_x^2}$ and we use the change of variable $v(t) = S(-t)w(t)$ (the interaction representation variable). Then, equation (2.2) becomes (2.3), where we denote the quintic and the cubic nonlinear terms by $\mathcal{Q}(v)$ and $\mathcal{T}(v)$, respectively; see (2.4) and (2.5). In what follows we exploit the oscillatory nature of the Fourier transform of $\mathcal{T}(v)$. With a slight abuse of notation³, let us introduce the trilinear operator $\mathcal{T}$ defined by (2.6), where the phase is given by $\Phi(\bar\xi) = \xi^2 - \xi_1^2 + \xi_2^2 - \xi_3^2$ (2.7). Notice that on the convolution hyperplane $\xi = \xi_1 - \xi_2 + \xi_3$, we have $\Phi(\bar\xi) = 2(\xi_2 - \xi_1)(\xi_2 - \xi_3)$. Since it is determined by the linear part of the equation, the function $\Phi(\bar\xi)$ is the same as the modulation function for the cubic NLS equation in [25], but the trilinear operator is different due to the presence of the derivative in the cubic nonlinearity.
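Returning to the gauge transformation (2.1): for concreteness, the Hayashi-Ozawa gauge has the form $w(x) = e^{-i\int_{-\infty}^{x}|u(y)|^2\,dy}\,u(x)$ (written here with one common sign convention; (2.1) may differ by a sign or a constant). Since the exponential prefactor is unimodular, the gauge preserves $|u|$ pointwise, hence all $L^p$-norms; a quick numerical check of this elementary observation:

```python
import numpy as np

def gauge(u, dx):
    """Hayashi-Ozawa-type gauge transform (sign convention assumed):
    w(x) = exp(-i * int_{-inf}^{x} |u(y)|^2 dy) * u(x)."""
    phase = np.cumsum(np.abs(u)**2) * dx   # left-to-right primitive of |u|^2
    return np.exp(-1j * phase) * u

N, L = 2048, 60.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = L / N
u = (1 + 0.5j) * np.exp(-x**2) * np.cos(x)

w = gauge(u, dx)
assert np.allclose(np.abs(w), np.abs(u))   # |w| = |u| pointwise
assert abs(np.linalg.norm(w) - np.linalg.norm(u)) < 1e-9
print("gauge prefactor is unimodular: L^2 norm preserved")
```

Of course, the gauge is not bounded on $H^s(\mathbb{R})$ uniformly in the data (the phase depends nonlinearly on $u$), which is why transferring well-posedness back to (1.1) requires the lemma mentioned above.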
Since $H^s(\mathbb{R})$ is a Banach algebra for $s > \frac12$, the quintic term can be estimated easily: $\|\mathcal{Q}(v)\|_{H^s(\mathbb{R})} \lesssim \|v\|_{H^s(\mathbb{R})}^5$ (2.8). Due to the derivative loss in the cubic term, $\mathcal{T}$ does not have a similar estimate in $H^s(\mathbb{R})$, even though $s > \frac12$. Therefore, we proceed to renormalize this nonlinearity by means of normal form reductions (NFR).
Remark 2.2. Throughout this paper, when the complex conjugate sign on $\widehat{v}(\xi)$ does not play any significant role in the analysis, we drop it. Also, we often drop the complex number $i$ and write $1$ in place of $\pm1$ and $\pm i$.
2.1. The first step of NFR. The idea is to exploit the oscillatory factor of the convolution integral in (2.6), and so we apply integration by parts on a domain of integration where $|\Phi(\bar\xi)| > N$, for some threshold $N > 1$ to be chosen later. We first decompose $\mathcal{T}(v) = \mathcal{T}_1(v) + \mathcal{T}_2(v)$, where $\mathcal{T}_2(v)$ is defined as $\mathcal{T}(v)$ (see (2.6) above), but with the integration further restricted to the domain $\{|\Phi(\bar\xi)| > N\}$. Thanks to the modulation restriction, the term $\mathcal{T}_1(v)$ enjoys a sufficiently good $H^s(\mathbb{R})$-estimate; see Lemma 3.1 below. For the remainder term $\mathcal{T}_2(v)$, we apply differentiation by parts⁴ in order to renormalize it. To ease the writing, we drop the complex conjugates, the Fourier transform notation, and the complex constants of modulus one in front of the nonlinearities. Let us start employing the ordered tree notation from Appendix A. At this stage, we can express everything in terms of $\mathcal{T}_1$, the sole ternary tree of the first generation. By using the product rule and supposing that $v$ is a smooth solution of (2.3), we obtain the corresponding decomposition. On the right-hand side, $\mathcal{T}_Q(v)$ is the sum of three septic terms, corresponding to replacing $\partial_t v$ by $\mathcal{Q}(v)$, and $\mathcal{T}_{\mathcal{T}}(v)$ is the sum of three quintic terms. Thus, if $v$ is a smooth solution of (2.3), then it is also a solution of (2.14), where we set $\mathcal{T}^{(1)}_{\mathcal{T},1}(v) := \mathcal{T}_1(v)$ for the sake of consistency with subsequent NFR steps. It turns out that we can establish sufficiently good estimates for all of the nonlinear terms of (2.14), except for those in $\mathcal{T}^{(2)}_{\mathcal{T}}(v)$. Therefore, we proceed to renormalize them.

³Note that when all the entries of the trilinear operator are the same, we write $\mathcal{T}(v)$ for $\mathcal{T}(v, v, v)$.

⁴Here, "differentiation by parts" means the usual integration by parts (with respect to the time variable) in the Duhamel formulation of (2.3), without writing explicitly the time integration.
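Schematically, and suppressing conjugates, constants, and the convolution integral (as above), the differentiation by parts on the region $|\Phi(\bar\xi)| > N$ rests on the elementary identity

```latex
e^{it\Phi(\bar\xi)}\,\widehat{v}(\xi_1)\,\widehat{v}(\xi_2)\,\widehat{v}(\xi_3)
  = \partial_t\!\left[\frac{e^{it\Phi(\bar\xi)}}{i\Phi(\bar\xi)}\,
      \widehat{v}(\xi_1)\,\widehat{v}(\xi_2)\,\widehat{v}(\xi_3)\right]
  - \frac{e^{it\Phi(\bar\xi)}}{i\Phi(\bar\xi)}\,
      \partial_t\!\big[\widehat{v}(\xi_1)\,\widehat{v}(\xi_2)\,\widehat{v}(\xi_3)\big].
```

The boundary term produces the $\mathcal{T}_0$-type contributions, while in the second term the factor $|\Phi(\bar\xi)|^{-1} < N^{-1}$ is the gain that compensates for the derivative loss once $\partial_t \widehat{v}$ is replaced using the equation (2.3).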
2.2. The second step of NFR. For the sake of clarity, let us write $\mathcal{T}^{(2)}_{\mathcal{T}}(v)$, defined in (2.13), first without appealing to the terminology of Appendix A, and then in the compact form facilitated by the ordered tree notation, as in (2.15), where $\mu_1$ is the same as in the first step of NFR, i.e. $\mu_1 = \Phi(\bar\xi)$. Each term of the sum in (2.15) is split into two parts, corresponding to further restricting the domain of integration to the almost resonant region and its complement, respectively, where $\beta_1 \geq 2$ is to be chosen later. By Lemma 3.9 below, we have $H^s(\mathbb{R})$-estimates for the terms in the almost resonant part; for the remaining part, we apply differentiation by parts for all of its three terms. Thus, working with the ordered tree notation, we have⁵ the corresponding identity. By using the product rule and the assumption that $v$ is a smooth solution of (2.3), we obtain the resulting decomposition. The last term $\mathcal{T}^{(3)}_{\mathcal{T}}(v)$ is passed to the next step in the iterative procedure. As we believe the iterative procedure has become clear, let us present the general step of normal form reductions.

⁵Given an ordered tree $\mathcal{T}_2$ with $\mathcal{T}_1$ denoting its first generation tree, for $A_1 \subseteq \Xi(\mathcal{T}_1)$, $A_2 \subseteq \Xi(\mathcal{T}_2)$, we define, by a slight abuse of notation, $A_1 \cap A_2 := \{\bar\xi \in A_2 : \bar\xi|_{\mathcal{T}_1} \in A_1\}$. Inductively, this definition is generalized to higher generation ordered trees as follows: if $\mathcal{T}_{J+1}$ is an ordered tree with chronicle $\{\mathcal{T}_j\}_{j=1}^{J+1}$ and $A_j \subseteq \Xi(\mathcal{T}_j)$, $j = 1, 2, \ldots, J+1$, then the intersection $A_1 \cap \cdots \cap A_{J+1}$ is defined analogously.
2.3. The $J$th step of NFR. We now write down the terms that appear in the $J$th step of normal form reductions. We decompose $\mathcal{T}^{(J)}_{\mathcal{T}}(v)$ according to the restriction of the domain of integration to the almost resonant region and its complement, respectively, where $\beta_{J-1} \geq 2$ is to be chosen later (see Remark 3.8). After differentiation by parts and by using the equation (2.3), we are led to (2.19), where the terms on the right-hand side are given by the formulae in (2.20), with the sets $F_1 := C_0$ and $F_J$ defined therein. We record the formula (2.21) for the term $\mathcal{T}^{(J+1)}_{\mathcal{T}}(v)$ that appears in the next step of NFR, where $F_J$ is defined above and $\beta_J \geq 2$ is to be determined later.
The multilinear terms $\mathcal{T}^{(j)}_{\mathcal{T},1}$, $\mathcal{T}^{(j)}_Q$, and $\mathcal{T}^{(j)}_0$ appear as a result of $(j-1)$-many iterations of normal form reductions.

3. The estimates in the strong norm
We consider the trilinear operator $\mathcal{T}_\Phi$ defined by (3.1), where $\Phi(\bar\xi)$ is given by (2.7). We can prove the $H^s(\mathbb{R})$-estimates for all higher order terms that appear in (2.23) once we establish the following lemma.

Proof. By duality, the desired estimate follows once we prove the bound (3.2) for any $v_1, \ldots, v_4 \in L^2(\mathbb{R})$ with $v_j \geq 0$ ($1 \leq j \leq 4$), where the multiplier $m(\bar\xi)$ is given by (3.3). Without loss of generality, let us first assume that $|\xi_2 - \xi_1| \leq 1$. Since $\xi_1 \sim \xi_2$ and $\xi_3 \sim \xi$, we have $m(\bar\xi) \lesssim 1$. Denote $\zeta := \xi_2 - \xi_1 = \xi_3 - \xi$; thus, by using Hölder's inequality, we get the desired bound. For all of the remaining cases we assume that $|\xi_2 - \xi_1| > 1$ and $|\xi_2 - \xi_3| > 1$. Also, we note that the largest two frequencies necessarily have comparable sizes, and that the multiplier $m$ is symmetric in $\xi_1, \xi_3$.
We use the following known fact: for any $a, b \geq 0$ such that $a + b > 1$,
$$\int_{\mathbb{R}} \langle\zeta\rangle^{-a} \langle\zeta - \eta\rangle^{-b}\, d\zeta \lesssim 1, \qquad (3.4)$$
with implicit constant independent of $\eta \in \mathbb{R}$. Indeed, this follows immediately from Young's convolution inequality (if $a$ or $b$ is zero, then (3.4) is trivially true). By the Cauchy-Schwarz inequality (see, for example, [37, Lemma 3.7]), for (3.2) it is enough to show (3.5), namely a uniform bound for the integral of $m^2(\bar\xi)$ in $d\xi_k\, d\xi_\ell$, for some mutually distinct $1 \leq j, k, \ell \leq 4$ (with the convention that $\xi_4 = \xi$). Indeed, by the Cauchy-Schwarz inequality with respect to $d\xi_k\, d\xi_\ell$ (with the index $r$ such that $\{j, k, \ell, r\} = \{1, 2, 3, 4\}$), followed by the Cauchy-Schwarz inequality with respect to $d\xi_j$, the bound (3.2) follows from (3.5) by possibly changing the order of integration on the right-hand side (and taking into account the linear dependence $\xi_4 = \xi_1 - \xi_2 + \xi_3$). Next, we discuss several cases based on the frequency size of the derivative factor $\partial_x v_2$.
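Fact (3.4), in our rendering $\sup_{\eta}\int_{\mathbb{R}}\langle\zeta\rangle^{-a}\langle\zeta-\eta\rangle^{-b}\,d\zeta \lesssim_{a,b} 1$ for $a, b \geq 0$ with $a + b > 1$, can be probed numerically (an illustration only, with a truncated domain standing in for $\mathbb{R}$):

```python
import numpy as np

def bracket(z):
    return np.sqrt(1 + z**2)   # Japanese bracket <z>

def integral(a, b, eta, Z=2000.0, n=400001):
    # Riemann-sum approximation of int <zeta>^{-a} <zeta - eta>^{-b} dzeta
    zeta, dz = np.linspace(-Z, Z, n, retstep=True)
    return np.sum(bracket(zeta)**(-a) * bracket(zeta - eta)**(-b)) * dz

a, b = 0.7, 0.6            # a, b >= 0 with a + b = 1.3 > 1
vals = [integral(a, b, eta) for eta in (0.0, 1.0, 10.0, 100.0, 1000.0)]
assert max(vals) < 20      # bounded uniformly in eta
assert vals[-1] < vals[0]  # the bound does not degrade for large eta
print([round(v, 2) for v in vals])
```

The uniformity in $\eta$ is the whole point: in the proof, $\eta$ plays the role of a frozen frequency over which no smallness is available.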
Remark 3.2. We note that whenever $m(\bar\xi) \lesssim 1$ (e.g. when $\min(|\xi_2 - \xi_1|, |\xi_2 - \xi_3|) \leq 1$ or when $|\xi_1| \sim |\xi_2| \sim |\xi_3|$), our operator $\mathcal{T}_\Phi$ acts as the operator $\mathcal{N}^0_{\leq M}$ from [25] (with displacement parameter $\alpha = 0$ and localization size $M \sim 1$), and thus we can appeal to the arguments used therein. For the sake of completeness, we have also included the argument for Case 1 in the proof of Lemma 3.1 above.

Remark 3.3. Notice that, in the above proof, the case $|\xi_2| \sim |\xi| \gg |\xi_1|, |\xi_3|$ in Case 2 informs us why the derivative falling on the conjugate factor in the cubic nonlinearity $v^2 \partial_x \bar{v}$ can be handled: in the worst-case scenario of the low $\times$ high $\times$ low $\to$ high frequency interaction, we can use the $\frac12$-power of the modulation to cancel the factor $\xi_2$ in the numerator. This motivates the need to use the gauge transformation (2.1) to eliminate the nonlinearity $2|v|^2 \partial_x v$ from the right-hand side of (1.1).

Remark 3.4. At the end-point regularity $s = \frac12$, with minor changes in the proof, we can also obtain an estimate as in Lemma 3.1, but for a variant of $\mathcal{T}_\Phi$ depending on a parameter $\varepsilon > 0$ which can be taken arbitrarily small. However, in this case $C = C(\varepsilon) \nearrow \infty$ as $\varepsilon \searrow 0$. This remark also applies to Corollaries 3.5 and 3.7 and Lemmata 3.10 and 3.11, but not to Lemma 3.9.
In the proofs of the following lemmata, we freely use the Fourier lattice property of $H^s(\mathbb{R})$, i.e. that $|\widehat{f}| \leq |\widehat{g}|$ pointwise implies $\|f\|_{H^s(\mathbb{R})} \leq \|g\|_{H^s(\mathbb{R})}$, and thus we drop the modulus notation on factors such as $\widehat{v}(\xi)$ (which henceforth we assume to be non-negative).
For $\mathcal{T}^{(1)}_{\mathcal{T},1}(v) = \mathcal{T}_1(v)$ given by (2.9), the estimate follows from Lemma 3.1.
For estimating the remaining nonlinear terms of (2.23), it is convenient to introduce the mapping $S(T; \cdot)$ associated to an ordered tree $T$, say of generation $J$, which essentially applies the operator $\mathcal{T}_\Phi$ iteratively, taking into account the structure of $T$. We define these mappings by the following bottom-up algorithmic procedure.

Definition 3.6. Let $J \geq 1$ and $T \in T(J)$. We define the $(2J+1)$-linear map $S(T; \cdot)$ on space-time functions $v_j \in C(I; H^s(\mathbb{R}))$ ($1 \leq j \leq 2J+1 = |T^\infty|$) by the following rules.
(ii) For $j = J, J-1, \ldots, 1$, replace the $j$th root node $r^{(j)}$ by the trilinear operator $\mathcal{T}_\Phi$ whose arguments are given by the functions associated with its three children.
For such mappings, we have the following corollary which is a consequence of Lemma 3.1.
Corollary 3.7. Let $s > \frac12$, $J \geq 1$, and $T \in T(J)$. Then $S(T; \cdot)$ satisfies the corresponding $(2J+1)$-linear $H^s(\mathbb{R})$-estimate with constant $C^J$, where $C$ is the constant given by Lemma 3.1.
Proof. It follows immediately by successively applying Lemma 3.1. Namely, we start with the root node $r^{(1)}$ of $T$ and move top-down on $T$. Since $T$ is a tree of generation $J$, it has $J$-many root nodes, and thus we pick up the constant $C^J$.
We are now ready to prove the estimates for all nonlinear terms of (2.23), which we treat in decreasing order of difficulty. We begin with $\mathcal{T}^{(J+1)}_{\mathcal{T}}(v)$ given by (2.21). Now fix $T \in T(J+1)$. Recalling the restrictions on the frequency support, we have $|\mu_1| > N$, $|\widetilde{\mu}_j| > \beta_{j-1}|\widetilde{\mu}_{j-1}|$ for $j = 2, \ldots, J$, and $|\widetilde{\mu}_{J+1}| \leq \beta_J |\widetilde{\mu}_J|$.
Next, we consider the nonlinear terms coming as boundary terms when applying integration by parts with respect to the temporal variable in Section 2. Now fix $T \in T(J)$. Recalling the restrictions on the frequency support, we have $|\mu_1| > N$ and $|\widetilde{\mu}_j| > \beta_{j-1}|\widetilde{\mu}_{j-1}|$ for $j = 2, \ldots, J$. As in the proof of Lemma 3.9, we have $|\mu_j| \sim |\widetilde{\mu}_j| > b_{j-1} N$ for $j = 2, \ldots, J$. Therefore, by Corollary 3.7 and (3.14), we get the desired bound. For the difference estimate (3.12), an observation analogous to that in the proof of Lemma 3.9 applies, and thus we obtain the corresponding bound. In the proof of the following lemma, we skip the argument for the difference estimate altogether, as the same ideas apply as for the difference estimate of Lemma 3.10.
Proof. The proof is similar to the proof of Lemma 3.10. We write the term via $S(T; \cdot)$, with the convention that if $b$ is the $j$th terminal node of $T$, we assign to it the corresponding factor. Therefore, by Corollary 3.7, (2.8), and (3.17), we obtain the desired bound. For the difference estimate (3.16), an observation analogous to that in the proof of Lemma 3.9 (see also the proof of Lemma 3.10) applies, and we take into account Remark 3.8.

4. The estimates in a weak norm
Here, we prove the estimates necessary to rigorously justify the normal form equation (2.23) for rough $H^s(\mathbb{R})$-solutions of (2.3); this is done explicitly in Section 5. For this purpose, we have to be able to estimate $\partial_t v$ for a solution $v \in C(I; H^s(\mathbb{R}))$ to (2.3).
It is clear that, due to the derivative in the cubic nonlinearity, an estimate for $\partial_t v$ in the $H^s(\mathbb{R})$-norm fails. However, if we weaken the norm on the left-hand side, then we may obtain an estimate satisfactory for our aims in Section 5. Hence, with the following lemma, we identify a family of Sobolev norms weaker than the $H^s(\mathbb{R})$-norm which can serve as a weak topology used to justify the normal form equation (2.23).
Proof. By duality, the desired estimate follows once we show a uniform bound for the associated multiplier $m_4$. We study the boundedness of this multiplier, distinguishing which two of the four frequencies are the largest. On the convolution hyperplane, the largest two frequencies must be comparable. Also, by the symmetry of $m_4$ with respect to $\xi_1, \xi_3$, we may assume without loss of generality that $|\xi_1| \geq |\xi_3|$.
As a consequence of Lemma 4.1 and (2.8), we have the following. Next, for $M \geq 1$, we consider the trilinear operator $\mathcal{T}^w_{|\Phi|>M}$, defined as $\mathcal{T}_\Phi$ but with an additional $\frac12$-power of $\Phi(\bar\xi)$ in the denominator of the multiplier and with the integration restricted to the domain $\{|\Phi(\bar\xi)| > M\}$, where $\Phi(\bar\xi)$ is given by (2.7).
Let us first prove the lemma for j = 1.
Since $m_1$ is not symmetric in $\xi_1, \xi_3$, we treat the following two subcases.
This finishes the proof for $j = 1$. Notice that the case $j = 3$ is symmetric to the case $j = 1$. It remains to discuss the case $j = 2$. In this case, by the symmetry of $m_2$ with respect to $\xi_1, \xi_3$, we may assume without loss of generality that $|\xi_1| \geq |\xi_3|$. If $\xi_2 \lesssim \xi_1$, then it is easy to check that $m_2(\bar\xi) \lesssim m_1(\bar\xi)$, and thus (4.5) for $j = 2$ follows from (4.5) for $j = 1$. Now, let us assume that $j = 2$ and that $\xi_2 \gg \xi_1$. In fact, in this case we have $\xi \sim \xi_2 \gg \xi_1 \geq \xi_3$, which implies $\Phi(\bar\xi) \sim \xi_2^2$ and the desired bound for any $M \geq 1$.
Proof. It is an immediate consequence of Lemma 3.1, taking into account that the multiplier of the operator $\mathcal{T}^w_{|\Phi|>M}$ has an additional $\frac12$-power of $\Phi(\bar\xi)$ in the denominator, as compared to the multiplier of $\mathcal{T}_\Phi$, and that in the domain of integration we have $|\Phi(\bar\xi)| > M$.
(ii) For $j = J, J-1, \ldots, 1$, replace the $j$th root node $r^{(j)}$ by the trilinear operator $\mathcal{T}^w_{|\Phi| > b_j N/2}$ whose arguments are given by the functions associated with its three children.
We have the following immediate consequence of Lemmata 4.3 and 4.4.
We prove the desired estimate by moving top-down on $T$ with chronicle $\{T_j\}_{j=1}^J$. Starting with $j = 1$: if $a_j$ is a child of $r^{(1)}$, then we just apply Lemma 4.3. Otherwise, $T_1$ has one child (and only one) that belongs to $P(r^{(1)}, r^{(k)})$, which is $r^{(k_2)} \in \pi_{k_2}(T)$, $1 < k_2 \leq k$. So we use Lemma 4.3, placing the subtree with root node $r^{(k_2)}$ in the $H^{s-1}(\mathbb{R})$-norm and the other two subtrees (possibly, one of them can be just one node) in the $H^s(\mathbb{R})$-norm. In a similar manner, we continue to move down the path $r^{(k_2)}, \ldots, r^{(k_{\ell-1})}, r^{(k)}$, and each time we apply Lemma 4.3 analogously. For any subtree of $T$ whose root node does not belong to $\{r^{(1)}, r^{(k_2)}, \ldots, r^{(k_{\ell-1})}, r^{(k)}\}$, we use Lemma 4.4 in chronological order. Notice that (modulo the constant $C$) the coefficient provided by the latter lemma is smaller than the one provided by the former. In the worst-case scenario (i.e. the tree is "linear", so that $k = J$ and $P(r^{(1)}, r^{(k)}) = \{r^{(1)}, r^{(2)}, \ldots, r^{(J)}\}$), we only apply Lemma 4.3, and we pick up the coefficient with $b_j$ given by (3.7).

4.1. Convergence to zero of the remainder term. Here, we argue that, for fixed $N > 1$, the remainder term $\mathcal{T}^{(J+1)}_{\mathcal{T}}(v)$ given by (2.21) vanishes as $J \to \infty$ in a weak norm.

Proof. Consider the formula (2.19) for the term at hand. On the other hand, the same formula (2.19) can also be obtained by replacing one $v$ in $\mathcal{T}^{(J)}_0$ with $\mathcal{T}(v)$. More precisely, we can write it via $S(T; \cdot)$, where $a_k$ denotes the $k$th terminal node of $T$, and, for simplicity, we put $\mathcal{T}(v)$ in the $k$th slot of $S(T; v, \ldots, v)$.

Justification of the normal form reductions for rough solutions
In each step of the infinite iteration in Section 2, we performed normal form reductions (NFR), which relied on two formal operations that obviously hold if $v$ is assumed to be a smooth solution to (2.3). Namely, (i) we applied the product rule when distributing the time derivative over products of several factors of $v$ (see e.g. (5.3) below), and (ii) we interchanged the time derivative with integrals in spatial frequencies (see e.g. (5.4) below). In this section, we justify these operations for a rough solution $v$ to (2.3).
Let $s > \frac12$, $\theta = \theta(s) = \min\{2s - 1, \frac12\}$, and let $I$ be an interval containing $t = 0$. Suppose that $v \in C(I; H^s(\mathbb{R}))$ is a solution to (2.3), namely it satisfies (in the sense of distributions) the Duhamel formula with $Q$, $T$ as in (2.4), (2.5), respectively. By Lemma 4.2, we have $v \in C^1_t(I; H^{s-1}_x(\mathbb{R}))$. With $p, q \in (1, \infty)$ such that $\frac{2}{p} + \frac{1}{q} = 1$ and $\frac12 - \frac1q \le s - 1$, by Hölder's inequality and the Sobolev embedding, we also have that . Note that the condition $\frac12 - \frac1p \le s$ is automatically satisfied. Therefore, we have . Note that all of the above estimates hold uniformly in $t \in I$. For the quintic term in (5.1), we immediately have $H^s_x(\mathbb{R})$. Moreover, by the Riemann–Lebesgue lemma, it follows that By taking the Fourier transform of (5.1) and using Fubini's theorem, we get and by taking the time derivative for fixed $\xi \in \mathbb{R}$, we have for each $(t, \xi) \in I \times \mathbb{R}$. It follows that $\widehat{v} \in C^1_t(I; C_\xi(\mathbb{R}))$. Here, we carefully justify that $v$ is also a solution to (2.14), namely that the Duhamel formula is satisfied in the sense of distributions. Due to (5.2), it is immediate that the application of the product rule is justified for all $t \in I$ and all $\xi_1, \xi_2, \xi_3 \in \mathbb{R}$.
By Lemma 4.3, we deduce that $F \in C(I; H^{s-1}(\mathbb{R}))$ with Similarly, we have that $G \in C(I; H^{s-1}(\mathbb{R}))$ since, by Lemma 4.1 and Lemma 4.3, we have where in the last step we applied Lemma 4.2. Now fix $t \in I$ and let $\varphi \in \mathcal{S}(\mathbb{R})$. By the Plancherel formula, we have By appealing to the Fourier lattice property of the Sobolev spaces $H^{s-1}$, $H^{1-s}$, to the Riemann–Lebesgue lemma, and by using (5.6), we have and thus the dominated convergence theorem implies:

6. For the continuity in time of $F$, one uses the multilinear version of the estimate provided by Lemma 4.3.

Proof of Theorem 1.1
First, we briefly go over the fixed point argument for (2.23) with prescribed initial data $v(0) = v_0 \in H^s(\mathbb{R})$, $s > \frac12$. Integrating the limit equation (2.23) in time, we obtain the following Duhamel formulation: Let us denote the right-hand side of (6.1) by $\Gamma(v)$, and for simplicity we write $C_TH^s$ instead of $C([-T, T]; H^s(\mathbb{R}))$.
Having the estimates of Section 3 at hand, one can show that $\Gamma$ is a contraction on the ball provided that $T > 0$ and $N > 1$ are appropriately chosen. Indeed, we set $R := 2\|v_0\|_{H^s}$, and thus by Lemmata 3.1, 3.10, 3.11, and 3.9, we get for some $c = c(s) > 0$, when $N \ge 4R^4$ so that $(1 - N^{-\frac12}R^2)^{-1} \le 2$. First, we choose $T_1 = T_1(R) > 0$ such that $(1 + c)T_1R^4 \le \frac16$; then we choose $N = N(R) \ge 1 + 4R^4$ such that $2c(1 + 2T_1R^2)N^{-\frac12}R^2 \le \frac16$; and finally we choose $T = \min\{T_1, \frac16(cN^{\frac12}R^2)^{-1}\}$. By possibly choosing a smaller $T$ and a bigger $N$, and by using the difference estimates of Lemmata 3.10, 3.11, 3.1, and 3.9, the contraction property of $\Gamma$ follows analogously. Therefore, by the contraction mapping principle, for given $v_0 \in H^s(\mathbb{R})$, there exists a unique $v \in C_TH^s$ satisfying (6.1). Moreover, $\|v\|_{C_TH^s} \lesssim \|v_0\|_{H^s}$. Now let us consider two solutions $u_1, u_2 \in C_TH^s$ of DNLS. By Lemma 2.1, $w_1, w_2 \in C_TH^s$, and thus any two solutions $u_1, u_2 \in C_TH^s$ starting from the same initial data must coincide on the time interval $[-T, T]$. By appealing to the time translation symmetry of DNLS, we conclude that any initial data $u_0 \in H^s(\mathbb{R})$ determines a unique solution to DNLS which is continuous in time with values in $H^s(\mathbb{R})$.
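As a sanity check on the bookkeeping, here is how the three smallness conditions combine; the displayed schematic form of the estimate is an illustrative assumption (the actual estimate, with its precise constants, is the one provided by Lemmata 3.1, 3.10, 3.11, and 3.9):

```latex
\|\Gamma(v)\|_{C_T H^s}
  \le \|v_0\|_{H^s}
      + \Big[ \underbrace{(1+c)\,T R^4}_{\le 1/6}
      + \underbrace{2c(1+2T_1 R^2)\, N^{-1/2} R^2}_{\le 1/6}
      + \underbrace{c\, N^{1/2} T R^2}_{\le 1/6} \Big] R
  \le \frac{R}{2} + \frac{R}{2} = R,
```

since $\|v_0\|_{H^s} = R/2$ and each bracketed term is controlled by the respective choices of $T_1$, $N$, and $T$ above, so that $\Gamma$ maps the ball of radius $R$ into itself.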
Appendix A. Notation: indexing by ordered trees

We include here the notation and terminology used in [25, Section 3.1] regarding the cubic NLS equation on the real line.
Definition A.1. Given a partially ordered set $T$ with partial order $\le$, we say that $b \in T$ with $b \le a$ and $b \ne a$ is a child of $a \in T$ if $b \le c \le a$ implies either $c = a$ or $c = b$. If the latter condition holds, we also say that $a$ is the parent of $b$.
As in [4, 31], our trees are a particular subclass of ternary trees.
Definition A.2. A ternary tree $T$ is a finite partially ordered set satisfying the following properties:
(i) Let $a_1, a_2, a_3, a_4 \in T$. If $a_4 \le a_2 \le a_1$ and $a_4 \le a_3 \le a_1$, then we have $a_2 \le a_3$ or $a_3 \le a_2$.
(ii) A node $a \in T$ is called terminal if it has no child. A non-terminal node $a \in T$ is a node with exactly three children, denoted by $a_1$, $a_2$, and $a_3$. 8
(iii) There exists a maximal element $r \in T$ (called the root node) such that $a \le r$ for all $a \in T$. We assume that the root node is non-terminal.
(iv) $T$ consists of the disjoint union of $T^0$ and $T^\infty$, where $T^0$ and $T^\infty$ denote the collections of parental (non-terminal) nodes and terminal nodes, respectively.
Note that the number $|T|$ of nodes in a tree $T$ is $3j + 1$ for some $j \in \mathbb{N}$, where $|T^0| = j$ and $|T^\infty| = 2j + 1$; indeed, each conversion of a terminal node into a parental node adds exactly three nodes. Next, we recall the notion of ordered trees introduced in [14]. Roughly speaking, an ordered tree "remembers how it grew".

Definition A.3. We say that a sequence $\{T_j\}_{j=1}^J$ is a chronicle of $J$ generations if
(i) $T_j$ has $j$ parental nodes for each $j = 1, \ldots, J$;
(ii) $T_{j+1}$ is obtained by changing one of the terminal nodes in $T_j$, denoted by $p^{(j)}$, into a non-terminal node (with three children), $j = 1, \ldots, J - 1$.
Given a chronicle $\{T_j\}_{j=1}^J$ of $J$ generations, we refer to $T_J$ as an ordered tree of the $J$th generation. We use $T(J)$ to denote the collection of the ordered trees of the $J$th generation.
Note that the cardinality of $T(J)$ is given by $|T(J)| = 1 \cdot 3 \cdot 5 \cdots (2J - 1) =: c_J$. (A.1)

Remark A.4. Given two ordered trees $T_J$ and $\widetilde{T}_J$ of the $J$th generation, it may happen that $T_J = \widetilde{T}_J$ as trees (namely, as graphs), while $T_J \ne \widetilde{T}_J$ as ordered trees in the sense of Definition A.3. Henceforth, when we refer to an ordered tree $T_J$ of the $J$th generation, it is understood that there is an underlying chronicle $\{T_j\}_{j=1}^J$.
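For completeness, here is the counting behind (A.1): the tree $T_1$ is unique, and for each $j = 1, \ldots, J - 1$, the tree $T_{j+1}$ arises from $T_j$ by selecting one of its $2j + 1$ terminal nodes to become parental, so that

```latex
|T(J)| = 1 \cdot \prod_{j=1}^{J-1} (2j+1)
       = 1 \cdot 3 \cdot 5 \cdots (2J-1)
       = \frac{(2J)!}{2^J \, J!},
```

where the last equality is the standard double-factorial identity $(2J-1)!! = (2J)!/(2^J J!)$.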
Definition A.5. (i) Given an ordered tree $T_J \in T(J)$ with a chronicle $\{T_j\}_{j=1}^J$, we define a "projection" $\pi_j$, $j = 1, \ldots, J$, from $T_J$ to subtrees in $T_J$ of one generation by setting $\pi_1(T_J) = T_1$ and $\pi_j(T_J)$ to be the tree formed by the three terminal nodes in $T_j \setminus T_{j-1}$ and their parent, $j = 2, \ldots, J$. Intuitively speaking, $\pi_j(T_J)$ is the tree added in transforming $T_{j-1}$ into $T_j$. We use $r^{(j)}$ to denote the root node of $\pi_j(T_J)$ and refer to it as the $j$th root node. By definition, we have $r^{(j)} = p^{(j-1)}$. (A.2) Note that $p^{(j-1)}$ is not necessarily a node in $\pi_{j-1}(T_J)$.

8. Note that the order of children plays an important role in our discussion. We refer to $a_j$ as the $j$th child of a non-terminal node $a \in T$. In terms of the planar graphical representation of a tree, we set the $j$th node from the left as the $j$th child $a_j$ of $a \in T$.
(iii) We define the essential terminal nodes $\pi_j^\infty(T_J)$ of the $j$th generation by setting $\pi_j^\infty(T_J) := \pi_j(T_J)^\infty \cap T_J^\infty = (T_j \setminus T_{j-1}) \cap T_J^\infty$. By definition, $\pi_j^\infty(T_J)$ may be empty. Note that $\{\pi_j^\infty(T_J)\}_{j=1}^J$ forms a partition of $T_J^\infty$.
We record the following simple observation.
Remark A.6. Let $T \in T(J)$ be an ordered tree. Then, for each fixed $j = 2, \ldots, J$, there exists a path 9 $a_1, a_2, \ldots, a_K$ starting at the root node $r = r^{(1)}$ and ending at the $j$th root node $r^{(j)}$ such that $a_k \ne r^{(l)}$ for any $k = 1, \ldots, K$ and $l \ge j + 1$. Namely, we can move from $r^{(1)}$ to $r^{(j)}$ without hitting a root node of a higher generation. More concretely, given $r^{(j)}$, we know that it appears as a terminal node of $\pi_{j_1}(T)$ for exactly one $j_1 \in \{1, 2, \ldots, j - 1\}$. Similarly, $r^{(j_1)}$ appears as a terminal node of $\pi_{j_2}(T)$ for exactly one $j_2 \in \{1, 2, \ldots, j_1 - 1\}$. We can iterate this process, which must terminate in a finite number of steps with $j_k = 1$. This generates the shortest path $r^{(j_k)}, r^{(j_{k-1})}, \ldots, r^{(j_1)}, r^{(j)}$ from $r^{(1)}$ to $r^{(j)}$, and we denote it by $P(r^{(1)}, r^{(j)})$. Similarly, given $a \in T \setminus \{r^{(1)}\}$, one can easily construct the shortest path from $r^{(1)}$ to $a$, since $a$ is a terminal node of $\pi_k(T)$ for some $k$. We denote this shortest path by $P(r^{(1)}, a)$.
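A small illustrative example (the chronicle choices below are hypothetical): take $J = 3$ with a chronicle in which $p^{(1)}$ is a child of $r^{(1)}$ and $p^{(2)}$ is a child of $r^{(2)}$. Since $r^{(j)} = p^{(j-1)}$, the iteration of Remark A.6 applied to $r^{(3)}$ gives $j_1 = 2$ (as $r^{(3)}$ is a terminal node of $\pi_2(T)$) and then $j_2 = 1$, producing

```latex
P(r^{(1)}, r^{(3)}) = r^{(1)},\, r^{(2)},\, r^{(3)},
```

which indeed avoids every root node of generation higher than $3$ (vacuously here, since $J = 3$).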
Given an ordered tree, we need to consider all possible frequency assignments to nodes that are "consistent".
Definition A.7. Given an ordered tree $T \in T(J)$, we define an index function $\boldsymbol{\xi} : T \to \mathbb{R}$ satisfying the consistency relation (A.3) for every $a \in T^0$, where $a_1$, $a_2$, and $a_3$ denote the children of $a$. Here, we identified $\boldsymbol{\xi} : T \to \mathbb{R}$ with $\{\xi_a\}_{a \in T} \in \mathbb{R}^T$. We use $\Xi(T) \subset \mathbb{R}^T$ to denote the collection of such index functions $\boldsymbol{\xi}$. Also, the collection of index functions $\boldsymbol{\xi} \in \Xi(T)$ with a fixed frequency $\xi \in \mathbb{R}$ at the root node of $T$ is denoted by $\Xi_\xi(T)$.

Remark A.8. If we associate a function $v_a = v_a(\xi_a)$ to each node $a \in T$, then the relation (A.3) implies that $v_a = v_{a_1} \ast v_{a_2} \ast v_{a_3}$.
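To sketch the mechanism behind Remark A.8, assume for illustration only that the consistency relation (A.3) is the frequency constraint $\xi_a = \xi_{a_1} + \xi_{a_2} + \xi_{a_3}$ at each parental node $a$ (this is an assumption of this sketch, not a restatement of (A.3)). At a single parental node, the constraint produces exactly a triple convolution:

```latex
v_a(\xi_a) = (v_{a_1} \ast v_{a_2} \ast v_{a_3})(\xi_a)
           = \int_{\xi_a = \xi_{a_1} + \xi_{a_2} + \xi_{a_3}}
             v_{a_1}(\xi_{a_1})\, v_{a_2}(\xi_{a_2})\, v_{a_3}(\xi_{a_3})
             \, d\xi_{a_1}\, d\xi_{a_2},
```

and iterating over all parental nodes of $T$ expresses the function at the root node as an integral over the index set $\Xi_\xi(T)$.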

9. A path is a sequence of nodes $a_1, a_2, \ldots, a_K$ such that $a_k$ and $a_{k+1}$ are adjacent.