EXACT ASYMPTOTIC ORDERS OF VARIOUS RANDOMIZED WIDTHS ON BESOV CLASSES

Abstract. We study the efficiency of approximating functions from the Besov space B^Ω_{pθ}(T^d) in the norm of L_q(T^d) by various random methods. We determine the exact asymptotic orders of the Kolmogorov, linear, and Gel'fand widths of the unit ball of B^Ω_{pθ}(T^d) in L_q(T^d). Our results show that the convergence rates of the randomized linear and Gel'fand methods are faster than their deterministic counterparts in some cases. The maximal improvement can reach roughly a factor of n^{−1/2}.

1. Introduction. A generic problem in applied mathematics and data science is to find a system of functions that is well adapted to approximating the functions from a given class. It is natural to ask how to choose an appropriate form of approximant and how to determine an optimal method. This problem has been studied intensively in approximation theory and led to the invention of various notions of widths. In 1936, A. N. Kolmogorov gave the first definition of a width, now known as the Kolmogorov n-width. Subsequently, V. M. Tikhomirov's work stimulated active research on widths; see [19,31,32]. The main task in the study of widths is to construct best approximations and determine the asymptotic orders of the corresponding errors for various classes of functions. Since the concepts of widths characterize the efficiency of different algorithms from the viewpoint of optimal approximation, they are closely related to many optimization problems arising in the complexity of numerical problems, machine learning, and signal processing; see [6,11,12,33] and the references therein. For results on the widths of some important classes of functions, such as the Sobolev classes, the Besov classes, and classes of analytic functions, see [1,8,12,13,16,17,18,19].
On the other hand, randomized (Monte Carlo) methods have been widely used in many areas of applied and computational mathematics; see [26,27] and [11].
With the development of the theory of widths, there has been increasing interest in investigating the optimality of various randomized methods. This led to the establishment of the theory of information-based complexity, cf. [33]. This theory provides notions and tools for understanding efficiency issues in both the deterministic and randomized settings, which makes a comparison between the two settings possible. Against this background, Mathé [20,21,22] and Heinrich [15] studied various widths of the classical Sobolev classes in the randomized setting and determined their exact orders. Comparing their results with the deterministic ones, one can see that for certain function approximation problems, randomized methods achieve better error performance. Furthermore, in [10] and [5], the authors proved that randomized methods can bring a speed-up for some approximation problems on Sobolev classes with bounded mixed derivatives.
In this paper, we consider random approximation for Besov classes. Our motivation rests on two considerations. First, Besov classes play important roles in approximation theory, machine learning, and statistics. It is well known that in nonlinear approximation by wavelets or piecewise polynomials, certain approximation classes of functions can be characterized as classes of functions with Besov smoothness, which are larger than the Sobolev classes; cf. [7,9]. So in the study of regression learning or density estimation by nonlinear methods, one may assume that the regression function or probability density function has Besov smoothness in order to obtain the desired convergence rates; cf. [4,2,18]. Furthermore, in the study of classification based on adaptive partitioning, a Besov smoothness assumption can be used to weaken the margin condition while obtaining the same approximation rate; cf. [3]. Second, the deterministic approximation results for the Besov classes are well established. In particular, the exact asymptotic orders of the Kolmogorov, linear, and Gel'fand widths of these classes have been obtained; see [13,28,29,34] and the references therein.
Motivated by the above works, we study various widths for Besov classes in the randomized setting. We determine the exact asymptotic orders of the randomized Kolmogorov, linear, and Gel'fand widths of the Besov classes B^Ω_{pθ}(T^d) in the space L_q(T^d). Comparing our results with their deterministic counterparts (see [34]), we see that the randomized linear and Gel'fand methods perform better than the deterministic ones for some values of the parameters p, q. When q = ∞, the randomized methods achieve the maximal improvement.
We organize this paper as follows. In Section 2, we recall the notions of the widths in the framework of information-based complexity theory. In Section 3, we state the main results. In Section 4, we establish two discretization inequalities for the randomized widths of B^Ω_{pθ}(T^d), which estimate these widths from above or below by means of widths of the unit ball of finite-dimensional Euclidean spaces. In Section 5, we prove the main results.
2. Notions of the widths. We formulate the notions of the deterministic and randomized widths in the context of information-based complexity theory [33]. Let N, Z, and R denote the sets of natural numbers, integers, and real numbers, respectively. Let S be a continuous linear mapping from a closed bounded subset X_0 of a Banach space X to a Banach space Y. We want to approximate S by mappings of the form u = ϕ ∘ N, where N maps X_0 to R^k and ϕ maps R^k to Y. The mapping N is called the information operator, and the mapping ϕ is called the algorithm. We consider the following three classes of methods, introduced by Mathé in [22].
For a fixed k ∈ N, a rule u : X_0 → Y of the form u = ϕ ∘ N is said to be
(i) a Kolmogorov method, if the information operator N is an arbitrary mapping from X_0 to R^k and ϕ extends to a linear mapping from R^k to Y;
(ii) a linear method, if the information operator N is the restriction of a continuous linear mapping from X to R^k and ϕ extends to a linear mapping from R^k to Y;
(iii) a Gel'fand method, if the information operator N is the restriction of a continuous linear mapping from X to R^k and ϕ is an arbitrary mapping from R^k to Y.
To give the reader a better understanding of the above methods, we present some simple examples. Let H be a separable Hilbert space and {e_k}_{k=1}^∞ an orthonormal basis of H. Any f ∈ H has the expansion f = Σ_{k=1}^∞ ⟨f, e_k⟩ e_k. Let S be the identity mapping from H to H. If we take the linear information operator N(f) = (⟨f, e_1⟩, . . . , ⟨f, e_k⟩) together with the linear reconstruction ϕ(a_1, . . . , a_k) = Σ_{j=1}^k a_j e_j, we obtain a linear method (the k-th partial sum of the expansion). If instead N is allowed to be an arbitrary mapping from H to R^k while ϕ remains linear, we obtain a Kolmogorov method, whose output always lies in the fixed k-dimensional subspace span{e_1, . . . , e_k}. Finally, if we take the same linear information operator N but allow ϕ to be an arbitrary (possibly nonlinear) mapping from R^k to H, then we get a Gel'fand method.
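As an illustration (this finite-dimensional sketch is our own addition, not from the paper: the dimension, the coordinate basis playing the role of {e_k}, and the soft-thresholding rule are arbitrary choices), the three kinds of methods for the identity map on R^m can be written out as follows:

```python
import numpy as np

def linear_method(f, k):
    """Linear method: the information N(f) = (f_1, ..., f_k) is linear and
    the algorithm phi reconstructs linearly (the k-th partial sum)."""
    g = np.zeros_like(f)
    g[:k] = f[:k]
    return g

def kolmogorov_method(f, k):
    """Kolmogorov method: N may be arbitrary but phi is linear, so the output
    lies in the fixed subspace span{e_1, ..., e_k}; in l_2 the best choice of
    coefficients happens to be the first k entries of f again."""
    return linear_method(f, k)

def gelfand_method(f, k, tau=0.1):
    """Gel'fand method: N is the same linear map, but phi may be nonlinear;
    here phi soft-thresholds the observed coefficients."""
    c = f[:k]
    g = np.zeros_like(f)
    g[:k] = np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)
    return g
```

Note that in a Hilbert space the linear and Kolmogorov methods above coincide; in general the Kolmogorov class is larger because the information map may adapt to f.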
Denote the sets of all Kolmogorov methods, linear methods, and Gel'fand methods with cardinality k by D_k(X_0, Y), A_k(X_0, Y), and C_k(X_0, Y), respectively, and put D(X_0, Y) := ∪_{k∈N} D_k(X_0, Y), A(X_0, Y) := ∪_{k∈N} A_k(X_0, Y), and C(X_0, Y) := ∪_{k∈N} C_k(X_0, Y), which give rise to the respective classes of Kolmogorov, linear, and Gel'fand methods. Throughout this paper, M(X_0, Y) denotes any one of these three classes.
If we fix a solution operator S : X 0 → Y and a class of methods M(X 0 , Y ), the couple (S, M) is called a numerical problem.
The worst case error of a method u ∈ M(X_0, Y) for S is measured by
e(S, u) := sup_{x∈X_0} ‖S(x) − u(x)‖_Y.
For n ∈ N, the n-th minimal error for the problem (S, M) is defined by
e_n(S, M, X, Y) := inf{e(S, u) : u ∈ M_k(X_0, Y), k ≤ n}.
Taking for M each of the specific classes of methods in the deterministic setting, we obtain the Kolmogorov, linear, and Gel'fand widths, respectively:
d_n(S, X, Y) := e_n(S, D, X, Y), a_n(S, X, Y) := e_n(S, A, X, Y), and c_n(S, X, Y) := e_n(S, C, X, Y).
It is well known that Kolmogorov and Gel'fand widths are closely related to the worst case reconstruction error of compressed sensing methods over classes of vectors; cf. [3,4].
Next we pass to the randomized setting. We assume that both X_0 and Y are equipped with their respective Borel σ-algebras B(X_0) and B(Y), i.e., the σ-algebras generated by the open sets. Now we recall the definition of a randomized method. A randomized method is a triple P_M := ([W, F, P], u, k) such that:
(1) [W, F, P] is a probability space, i.e., W is a nonempty set, F is a σ-algebra on W, and P is a probability measure on (W, F);
(2) u = (u_w)_{w∈W} is a family of mappings u_w : X_0 → Y such that for every x ∈ X_0 the map w ↦ u_w(x) is measurable from (W, F) to (Y, B(Y));
(3) the cardinality function k : W → N is measurable, and u_w ∈ M_{k(w)}(X_0, Y) for each w ∈ W.
The error of a randomized method P_M := ([W, F, P], u, k) is defined as
e(S, P_M) := sup_{x∈X_0} ∫_W ‖S(x) − u_w(x)‖_Y dP(w),
and the cardinality of P_M is defined as card(P_M) := ∫_W k(w) dP(w).

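A randomized method in the sense above can be made concrete by a small sketch (our own illustration, not from the paper: the coordinate-sampling scheme and the parameters m, k are arbitrary choices). Here the random element w picks k coordinates of a vector uniformly at random, the information operator reads them, and the algorithm puts them back linearly; the error is the expectation over w. For uniform k-subsets, the expected squared error on a unit vector of ℓ_2^m equals (1 − k/m).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_coordinate_method(x, k, rng):
    """One realization u_w of a randomized method: the information operator
    reads k coordinates at uniformly random positions (chosen by w), and the
    algorithm phi reconstructs linearly, zeroing the unobserved entries."""
    m = len(x)
    idx = rng.choice(m, size=k, replace=False)
    y = np.zeros_like(x)
    y[idx] = x[idx]
    return y

m, k = 100, 25
x = rng.standard_normal(m)
x /= np.linalg.norm(x)   # a point of the unit ball of l_2^m

# Monte Carlo estimate of the expected squared error E_w ||x - u_w(x)||^2;
# each coordinate is dropped with probability 1 - k/m, so the expectation
# equals (1 - k/m) * ||x||^2 = 0.75 here.
err2 = np.mean([np.linalg.norm(x - random_coordinate_method(x, k, rng))**2
                for _ in range(20000)])
```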
The n-th minimal randomized error is defined as
e_n^{mc}(S, M, X, Y) := inf{e(S, P_M) : P_M is a randomized method over M with card(P_M) ≤ n}.
Taking for M each of the specific classes of methods in the randomized setting, we obtain the randomized Kolmogorov, linear, and Gel'fand widths, respectively:
d_n^{mc}(S, X, Y) := e_n^{mc}(S, D, X, Y), a_n^{mc}(S, X, Y) := e_n^{mc}(S, A, X, Y), and c_n^{mc}(S, X, Y) := e_n^{mc}(S, C, X, Y).
Throughout the paper, we will use the following notation for brevity. For two nonnegative sequences {a_n}_{n∈N} and {b_n}_{n∈N}, the relation a_n ≲ b_n (a_n ≳ b_n) means that there is a positive constant c independent of n such that a_n ≤ c b_n (a_n ≥ c b_n) for all n. The weak asymptotic relation a_n ≍ b_n means that a_n ≲ b_n and a_n ≳ b_n.
3. Main results. We first recall from [34] the definition of the Besov class determined by a given modulus of smoothness. Let T^d denote the d-dimensional torus. For f ∈ L_p(T^d), the k-th modulus of smoothness Ω_k(f, t)_p of f is defined by
Ω_k(f, t)_p := sup_{|h|≤t} ‖Δ_h^k f‖_p,
where Δ_h^k f denotes the k-th difference of f with step h ∈ R^d.
Definition 3.1. Let Ω denote a continuous nonnegative function on R_+ = {t : t ≥ 0}. We say that Ω(t) ∈ Φ*_k if it satisfies:
(1) Ω(0) = 0 and Ω(t) > 0 for any t > 0;
(2) Ω(t) is almost increasing, i.e., for any two points t, τ with 0 ≤ t ≤ τ we have Ω(t) ≤ CΩ(τ), where C ≥ 1 is a constant independent of t and τ;
(3) for any n ∈ N, Ω(nt) ≤ Cn^k Ω(t), where k ≥ 1 is a fixed positive integer and C > 0 is a constant independent of n and t;
(4) there exists α > 0 such that Ω(t)/t^α is almost increasing;
(5) there exists β, 0 < β < k, such that Ω(t)/t^β is almost decreasing, i.e., there exists C > 0 such that for any two points t, τ with 0 < t ≤ τ we have Ω(τ)/τ^β ≤ C Ω(t)/t^β.
Now we define the Besov classes B^Ω_{pθ}(T^d) determined by a given modulus of smoothness Ω.
Definition 3.2. Let k ∈ N, Ω(t) ∈ Φ*_k, 1 ≤ θ ≤ ∞, and 1 ≤ p ≤ ∞. We say that f ∈ B^Ω_{pθ}(T^d) if f satisfies the following conditions:
(i) f ∈ L_p(T^d);
(ii) (∫_0^1 (Ω_k(f, t)_p / Ω(t))^θ dt/t)^{1/θ} < ∞ if 1 ≤ θ < ∞, and sup_{t>0} Ω_k(f, t)_p / Ω(t) < ∞ if θ = ∞.
It is well known that the space B^Ω_{pθ}(T^d) does not depend on the choice of k. When Ω(t) = t^r, r > 0, the space B^Ω_{pθ}(T^d) is the usual Besov space B^r_{pθ}(T^d). Let I denote the identity embedding operator from the unit ball of B^Ω_{pθ}(T^d) to L_q(T^d). Let a_+ := max{0, a}. For convenience, we introduce a function Φ : N → R defined in terms of the function Ω(t) appearing in the definition of B^Ω_{pθ}(T^d). The results about the deterministic widths of B^Ω_{pθ}(T^d) in L_q(T^d) can then be stated as follows.
Our main results are the following theorems.
Theorem 3.6. Suppose that k ∈ N, Ω(t) ∈ Φ*_k, 1 ≤ θ ≤ ∞, Ω(t)/t^α is almost increasing, and α > d. Then
c_n^{mc}(I, B^Ω_{pθ}, L_q) ≍ Ω(n^{−1/d}) for 1 < p, q < ∞ or 1 < q ≤ p = ∞.
Comparing Theorems 3.5 and 3.6 with Theorem 3.3, one can see that the convergence rates of the randomized linear and Gel'fand methods are better than the deterministic ones in some cases. Quantitatively, the maximal gain can reach roughly a factor of n^{−1/2}.

4. Discretization theorems. In this section, we establish two discretization theorems which reduce the estimates of the various randomized widths of the class B^Ω_{pθ} to estimates for finite-dimensional Euclidean spaces. We first recall some notation. Let ℓ_p^m denote R^m equipped with the norm ‖x‖_p := (Σ_{i=1}^m |x_i|^p)^{1/p} for 1 ≤ p < ∞ and ‖x‖_∞ := max_{1≤i≤m} |x_i|. Let I^m_{p,q} denote the identity embedding mapping from the unit ball of ℓ_p^m to ℓ_q^m. Let s_n^{mc} denote any of the quantities d_n^{mc}, a_n^{mc}, or c_n^{mc}.

Theorem 4.1. Let 1 ≤ p < q ≤ ∞, let Ω(t)/t^α be almost increasing, and let α > d(1/p − 1/q). Then s_n^{mc}(I, B^Ω_{pθ}, L_q) is bounded above by a sum over k of the corresponding finite-dimensional widths s_{n_k}^{mc}, where the n_k are nonnegative integers with Σ_{k=0}^∞ n_k ≤ n.

It is easy to check that the quantities d_n^{mc}, a_n^{mc}, and c_n^{mc} are pseudo-s-scales. To prove Theorem 4.1, we recall the representation theorem for B^Ω_{pθ}(T^d), which is essentially due to S. M. Nikolskii [23]. For m ∈ N, let
V_m(t) = 1 + 2 Σ_{k=1}^{m} cos kt + 2 Σ_{k=m+1}^{2m} ((2m − k)/m) cos kt
be the one-dimensional de la Vallée Poussin kernel, and define the d-dimensional de la Vallée Poussin kernel by V_m(x) := Π_{j=1}^d V_m(x_j). For f ∈ L_p(T^d), define its de la Vallée Poussin sum by V_m f = f * V_m, where * denotes convolution. The differences of successive de la Vallée Poussin sums are defined by Φ_0(f) := V_1 f and Φ_k(f) := V_{2^k} f − V_{2^{k−1}} f for k ≥ 1. Now the representation theorem for B^Ω_{pθ}(T^d) can be stated as follows.
By using the Nikolskii inequality (see [30] for instance), one can derive the following theorem from Theorem 4.4; see [34].
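The one-dimensional de la Vallée Poussin kernel defined above is easy to evaluate numerically. The following sketch (our own illustration, not part of the paper) checks two of its standard properties: its mean over the period is 1, and convolution with V_m reproduces trigonometric polynomials of degree at most m.

```python
import numpy as np

def vallee_poussin(t, m):
    """Evaluate V_m(t) = 1 + 2*sum_{k=1}^{m} cos(kt)
                           + 2*sum_{k=m+1}^{2m} ((2m-k)/m) cos(kt)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k1 = np.arange(1, m + 1)            # untapered frequencies
    k2 = np.arange(m + 1, 2 * m + 1)    # linearly tapered frequencies
    main = 2.0 * np.cos(np.outer(t, k1)).sum(axis=1)
    taper = 2.0 * (np.cos(np.outer(t, k2)) * ((2 * m - k2) / m)).sum(axis=1)
    return 1.0 + main + taper

# Uniform grid: grid means of trigonometric polynomials of degree < 4096
# coincide with their exact averages over the period.
t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
```

In particular, V_m(0) = 3m, the grid mean of V_m equals its constant Fourier coefficient 1, and (1/2π)∫ cos(3y) V_5(−y) dy = 1, reflecting that V_m f = f for every f of degree at most m.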
Proof. By Theorem 4.4, a function f ∈ B^Ω_{pθ} can be represented as f = Σ_{k=0}^∞ Φ_k(f), with convergence in L_p. According to property (ii) of a pseudo-s-scale, it suffices to prove the corresponding estimate for each k ∈ N.
To prove this inequality, we need to factor the operator Φ_k, using a technique from [34]. We first define some operators. Let T_n denote the space of trigonometric polynomials of degree at most n in each variable. Define the mapping H_k from T_{2^{k+1}} to the corresponding space of samples at equidistant points, with coordinates indexed by j ∈ {1, . . . , d}. Let Φ̃_k denote the operator from B^Ω_{pθ} to L_p satisfying Φ̃_k(f) = Φ_k(f) for any f ∈ B^Ω_{pθ}. Now we factor the operator Φ_k : B^Ω_{pθ} → T_{2^{k+1}} ∩ L_q through these mappings. By the Marcinkiewicz theorem, the norm of every f ∈ T_{2^{k+1}} is equivalent to the discrete norm of its samples, and by the properties of the de la Vallée Poussin kernel and the Riesz–Thorin theorem (see [24]) the factors are uniformly bounded.

To prove Theorem 4.2, we need two more lemmas. We first recall some notation. For a vector s = (s_1, . . . , s_d) with nonnegative integer coordinates, we define the set
ρ(s) := {k ∈ Z^d : ⌊2^{s_j − 1}⌋ ≤ |k_j| < 2^{s_j}, j = 1, . . . , d},
where ⌊y⌋ denotes the largest integer not exceeding y. For a function f ∈ L_1(T^d), let δ_s(f, x) := Σ_{k∈ρ(s)} f̂(k) e^{i(k,x)} denote the "blocks" of the Fourier series of f.
Lemma 4.7 ([30]). Let G be a finite set of vectors, and define the operator S_G by S_G(f) := Σ_{k∈G} f̂(k) e^{i(k,·)}.

Now we are in a position to prove Theorem 4.2.
Proof. We define the space of trigonometric polynomials F_m by
F_m := span{e^{i(k,·)} : k ∈ ρ(m·1)}, (4.5)
where 1 = (1, . . . , 1). Let L_m denote the mapping from F_m to ℓ_p^{2^{md}} given by the isomorphism of Lemma 4.6. We decompose the identity embedding mapping I^{2^{md}}_{p,q} from ℓ_p^{2^{md}} to ℓ_q^{2^{md}} through F_m, where P denotes the orthogonal projection from L_q onto F_m ∩ L_q. So we obtain the desired decomposition, and using Lemma 4.6 again, the required estimate follows.

5. Proofs of main results. We use the discretization technique to prove the main results. We first collect some results about the randomized widths of the unit ball of ℓ_p^m in the space ℓ_q^m from [22] and [15].

Proposition 1 ([22]). For 1 ≤ p ≤ 2 and 1 ≤ n ≤ m, we have an estimate for a_n^{mc}(I^m_{p,q}, ℓ_p^m, ℓ_q^m).

Proposition 2 ([22]). For any n ∈ N, there exist a constant c > 0 and r ∈ N for which d_n^{mc}(I^{rn}_{p,q}, ℓ_p^{rn}, ℓ_q^{rn}) ≥ c φ(n, p, q), where the function φ(n, p, q) is specified in [22].

Then we recall an auxiliary lemma.

Proof. The upper bounds follow from the well-known relation d_n^{mc}(I, B^Ω_{pθ}, L_q) ≤ d_n(I, B^Ω_{pθ}, L_q) and the conclusions of Theorem 3.3. We now turn to the lower bounds. First, we prove the results for 1 < p, q < ∞, treating the two cases q ≤ p and q > p separately.
(i) For 1 < q ≤ p < ∞, it suffices to prove the lower bounds for 1 < q ≤ 2 ≤ p < ∞. According to Theorem 4.2, we obtain the corresponding discretization lower bound.
(ii) For 1 < p < q < ∞, we divide the lower estimates into three cases. For 1 < p < 2 ≤ q < ∞, we only need to consider the case 1 < p < 2 and q = 2. The inequality (5.1) and Proposition 2 again imply the desired lower bound.