On a new smoothing technique for non-smooth, non-convex optimization

In many global optimization techniques, local search methods are used for several purposes, such as generating a new initial point or finding a local solution rapidly. Most of these local search methods rely on the smoothness of the problem. In this study, we propose a new smoothing approach for smoothing out non-smooth and non-Lipschitz functions, which play an important role in global optimization problems. We illustrate our smoothing approach on well-known test problems from the literature. The numerical results show the efficiency of our method.

1. Introduction. We consider the following unconstrained optimization problem

(P)   min_{x ∈ R^n} f(x),

where the objective function f : R^n → R is continuous and differentiable almost everywhere. The problem (P) has attracted considerable attention in many branches of engineering, finance, and other sciences [31,33]. However, it is very hard to solve because of the non-smooth and non-convex structure of the objective function [18,47].
When the problem is non-smooth (perhaps non-Lipschitz), well-developed techniques for smooth optimization can be insufficient. To make such techniques applicable, sub-differential and smoothing approaches have been proposed. The sub-differential approach generalizes differentiability: the differentiability requirements are weakened and the classical techniques built on them are extended accordingly [32,7,8,9]. These extended techniques work effectively for Lipschitz continuous problems, but most of them remain unavailable for non-Lipschitz problems.
Smoothing studies deal with making the objective function continuously differentiable before the optimization procedure. Smoothing focuses on finding an appropriate modification which makes the objective function smooth, and on designing approximations to the objective function by smooth functions. Among the first studies on the smoothing approach are those proposed in 1975 by Bertsekas [2], in 1980 by Zang [51] and in 1982 by Xavier [44,46]. The smoothing approach has been used for many non-smooth but Lipschitz continuous problems such as min-max problems [1,50], penalty expressions of constrained optimization problems [20], clustering problems [45] and nonlinear complementarity problems [4]. Moreover, it has been used for both Lipschitz and non-Lipschitz regularization problems in [5,6] and for other problems [25,26]. Therefore, smoothing techniques are very important in optimization. The smoothing function is described by the following definition:

Definition 1.1. [5] Let f : R^n → R be a continuous function. The function f̃ : R^n × R_+ → R is called a smoothing function of f(x) if f̃(·, β) is continuously differentiable in R^n for any fixed β, and for any x ∈ R^n,

lim_{z → x, β ↓ 0} f̃(z, β) = f(x).

If the function f in problem (P) is non-convex, it may have many local minimizers. When f has multiple local minimizers, determining the global one, and escaping from the current local minimizer in order to reach a lower one, are important challenges. Although there are many valuable methods and algorithms dedicated to global optimization problems, no single method efficiently solves all the different types of problems. For example, global optimization methods based on gradient information are not useful for non-smooth problems, and methods based on Lipschitz continuity are not useful for non-Lipschitz problems.
We assume that the following assumptions hold for the problem (P):

Assumption 1. The function f is coercive, i.e., f(x) → +∞ as ‖x‖ → ∞.

As a result of Assumption 1, there exists a closed and bounded box Ω containing all the minimizers of f.

Assumption 2. The function f has a finite number of local minimizers.

Throughout the paper, we use x*_k to denote the k-th local minimizer of f, whereas by x* we mean the global minimizer.
Most global optimization methods concentrate on finding a minimizer lower than the current one. These methods can be classified into three groups: heuristic, stochastic and deterministic. Each of these approaches can outperform the others in different respects. Heuristic approaches are generally based on the simulation of physical, chemical or biological processes; prominent examples are Simulated Annealing, Genetic Algorithms, Particle Swarm Optimization and Fuzzy Optimization [15,12,37,30]. Stochastic approaches are quite simple, very efficient on black-box problems and robust as the dimension of the problem grows [53]. Although these methods have many important advantages, their convergence is quite slow and they can give different results when run again [36,16]. Deterministic approaches always find the same solution when run again under the same conditions, and they are also comparatively easy to implement [38,35]. Important deterministic methods include Branch and Bound [54,27], Cutting Plane [39], methods based on space-filling curves [17] and others [13,28,29].
One of the important deterministic approaches is the auxiliary function approach, which includes the Tunneling Method [19,3,48], the Filled Function Method [10,11,52,49,42,22,21], the Global Descent Method [43,23] and the Cut-Peak Function Method [40]. These methods are built on finding a minimizer lower than the current one by making a suitable modification to the objective function. The modified function is generally called an auxiliary function (filled function, tunneling function, etc.) and contains parameters which control the modifications. In most auxiliary function methods, setting these parameters constitutes an important phase of the algorithm. Since suitable parameter values may change from problem to problem, there is always a risk of undesirable behavior in computations.
In this study, we propose a new smoothing approach for a wide class of non-smooth functions. In this technique, the smoothing approach is first applied to a non-smooth (or non-Lipschitz) function so that well-known classical optimization methods become applicable.
In the following section, we present the theoretical foundation of the approach. In Section 3, we give the convergence results of the smoothing approach. In Section 4, the application of the smoothing approach is presented and numerical results on test problems are reported. In the last section, we give some concluding remarks.

2. A New Smoothing Technique. The non-smoothness of the problem (P) mostly originates from the presence of "max_p", "min_p" or "|·|^p" terms in the formulation of f(x). If p = 1, the problem (P) is non-smooth but Lipschitz continuous; if 0 < p < 1, it is non-Lipschitz. In order to obtain a smooth objective function, to which local minimization algorithms can easily be applied, we replace the non-smooth functions containing the above terms with their smoothed versions. For this purpose, for

q_p(t) = max{t, 0}^p,   0 < p ≤ 1,

the smoothing function q̃_p(t, β) is defined in (1), where µ_p(t, β) is continuously differentiable with respect to t and is expressed by the formula in (2), with β > 0. If second-order differentiability of q̃_p(t, β) is needed, the function in (3) is used instead of the one in (2). Therefore, not only Steepest Descent-type but also Newton-type local search methods can be employed in the optimization of the problem (P).
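The exact formulas are given in (1)–(3); as a minimal sketch of the idea for the Lipschitz case p = 1 (using a classical quadratic smoothing, an illustrative assumption here rather than the paper's exact µ_1), one can write:

```python
def q1(t):
    """Non-smooth plus function q_1(t) = max(t, 0)."""
    return max(t, 0.0)

def q1_smooth(t, beta):
    """C^1 smoothing of max(t, 0) on the beta-neighborhood of 0.

    Outside [-beta, beta] it coincides with q1; inside, the parabola
    (t + beta)^2 / (4 beta) matches both branches in value and first
    derivative at t = -beta and t = beta, so the result is C^1.
    """
    if t <= -beta:
        return 0.0
    if t >= beta:
        return t
    return (t + beta) ** 2 / (4.0 * beta)
```

Replacing the parabola by a higher-order polynomial with matching second derivatives at ±β would give a C^2 version, which is what makes Newton-type searches possible, in the spirit of (3).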
Proof. Since the functions q̃_p(t, β) and q_p(t) take the same values outside the β-neighborhood of 0, it suffices to prove that inequality (4) holds for −β ≤ t ≤ β. Let us first consider the case −β ≤ t ≤ 0.
Thus, the maximum difference is attained at the point t = 0, and the inequality in (4) follows.
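This behavior is easy to check numerically. For a classical quadratic C^1 smoothing of max{t, 0} (an illustrative choice, not necessarily the paper's q̃_1, whose bound in (4) for general p may differ), the worst-case gap over the β-neighborhood sits exactly at t = 0 and equals β/4:

```python
def q1_smooth(t, beta):
    # Quadratic C^1 smoothing of max(t, 0): illustrative choice.
    if t <= -beta:
        return 0.0
    if t >= beta:
        return t
    return (t + beta) ** 2 / (4.0 * beta)

beta = 0.5
# Sample the beta-neighborhood of 0 on a uniform grid that contains t = 0.
ts = [-beta + i * (2.0 * beta) / 2000 for i in range(2001)]
gap_at = [(abs(q1_smooth(t, beta) - max(t, 0.0)), t) for t in ts]
worst_gap, worst_t = max(gap_at)
# worst_t == 0.0 and worst_gap == beta / 4 for this smoothing
```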
The smoothing function is controlled by the parameter β. The effects of changing the parameter on q̃_p(t, β) are illustrated in Fig. 1 (a) and (b).
Proof. The result follows from Lemma 2.1.

Figure 1. (a) The green solid curve is the graph of q_1(t), the blue dashed curve is the graph of q̃_1(t, 0.5), and the red dotted curve is the graph of q̃_1(t, 1). (b) The green solid curve is the graph of q_{1/2}(t), the blue dashed curve is the graph of q̃_{1/2}(t, 0.5), and the red dotted curve is the graph of q̃_{1/2}(t, 1).

Remark 1. Let g be a smooth function from R^n to R. If we replace t by g(x) in the function q̃_p(t, β) defined in (1), we obtain a smoothing function for max{g(x), 0}. Moreover, it is sufficient to consider a smoothing function just for max{g(x), 0}, as the following remark shows.

Remark 2. Since min{g(x), 0} = −max{−g(x), 0} and |g(x)| = max{g(x), 0} + max{−g(x), 0}, there is no need to present separate smoothing results for the operators min and |·|.
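The identities in Remark 2 are straightforward to verify numerically; the sketch below checks that min{·, 0} and |·| reduce to the plus function, so a single smoothing of max{·, 0} suffices:

```python
def plus(t):
    """The plus function max(t, 0)."""
    return max(t, 0.0)

# min{g, 0} = -max{-g, 0}  and  |g| = max{g, 0} + max{-g, 0}
for g in (-2.5, -0.3, 0.0, 0.7, 3.1):
    assert min(g, 0.0) == -plus(-g)
    assert abs(g) == plus(g) + plus(-g)
```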
3. The Degree of Approximation of the Smoothing Function. Now, let us define the function θ from R^n to R as θ(x) = Σ_{i=1}^m q_{p_i}(g_i(x)) and the smoothed function θ̃(x, β) = Σ_{i=1}^m q̃_{p_i}(g_i(x), β), where g_i : R^n → R are smooth functions and 0 < p_i ≤ 1 for i = 1, 2, . . . , m. If q_{p_i}(t) = max{0, t}^{p_i}, the function θ can be considered as a penalty term in constrained optimization [34]. If q_{p_i}(t) = |t|^{p_i}, the function θ can be used as a regularization term in solving inverse problems [6].
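For p_i = 1, θ is the familiar ℓ1 penalty of the constraints g_i(x) ≤ 0. A minimal sketch of the smoothed penalty θ̃ follows, with an illustrative quadratic C^1 smoothing and made-up constraints; both are assumptions for illustration rather than the paper's exact choices:

```python
def plus_smooth(t, beta):
    """Quadratic C^1 smoothing of max(t, 0) (illustrative choice)."""
    if t <= -beta:
        return 0.0
    if t >= beta:
        return t
    return (t + beta) ** 2 / (4.0 * beta)

def theta_smooth(x, constraints, beta):
    """Smoothed penalty: sum of smoothed plus functions of g_i(x), p_i = 1."""
    return sum(plus_smooth(g(x), beta) for g in constraints)

# Hypothetical feasible set 0 <= x[0] <= 1, written as g_i(x) <= 0.
cons = [lambda x: x[0] - 1.0, lambda x: -x[0]]
```

Well inside the feasible region the smoothed penalty vanishes, while at an infeasible point it approximates the total constraint violation.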
Here, we present error estimates for the differences between the optimal solutions and values of the smoothed function and of the original function, by means of the following results.
Proof. From Lemma 2.1, inequality (5) is obtained.
Proof. From Lemma 2.1 and Theorem 3.1, inequality (6) is obtained and the proof is completed.
Let ϕ : R_+ → R_+ be a twice differentiable and increasing (or decreasing) function satisfying (7), where h(t) is a linear function on R. Well-known examples of ϕ are t, 1/(1 + αt), αt/(1 + αt) and arctan(αt), which are commonly used as potential functions in solving inverse problems [24] and considered in global optimization of mixed-integer non-linear problems [47].
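The four potential functions listed above are easy to tabulate. The small sketch below (with an arbitrary α = 2, an assumption for illustration since the text leaves α unspecified) shows their monotone behavior on t ≥ 0:

```python
import math

alpha = 2.0  # illustrative value; the text leaves alpha unspecified

phi = {
    "identity":   lambda t: t,                              # increasing
    "reciprocal": lambda t: 1.0 / (1.0 + alpha * t),        # decreasing
    "rational":   lambda t: alpha * t / (1.0 + alpha * t),  # increasing, < 1
    "arctan":     lambda t: math.atan(alpha * t),           # increasing, < pi/2
}

increasing = {name: f(2.0) > f(1.0) for name, f in phi.items()}
# increasing == {"identity": True, "reciprocal": False,
#                "rational": True, "arctan": True}
```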
Theorem 3.4. Let the function ϕ be defined as in (7). Suppose that x* is a local minimizer of ϕ(θ(x)) and x̃ is a local minimizer of ϕ(θ̃(x, β)). Then the estimate below holds, where K > 0 is a constant.
Proof. The estimate holds for any β > 0, and the proof is completed.
Proof. The proof is similar to that of Corollary 2.
Proof. The proof follows from Theorem 3.4 and Corollary 3.
Proof. Since the function ϕ is increasing, and by the properties of the function ϕ, we have the stated bound for some K > 0. The proof is obtained similarly when ϕ is a decreasing function.

4. Applications of the Smoothing Technique in Global Optimization. Our main contribution with the smoothing approach is to smooth out the non-smooth problem (P) and make it suitable for global optimization techniques designed for smooth problems. According to the smoothing approach above, the smoothing function of f(x) is denoted by f̃(x) (see Definition 1.1) and the problem (P) is restated as

(P̃)   min_{x ∈ R^n} f̃(x),

where the objective function f̃ : R^n → R is continuously differentiable. Since the problem (P) is transformed into a smooth global optimization problem, a global optimization method based on smooth local searches can thus be used to solve it.
The smoothing approach can be applied in many different branches of optimization, such as exact penalty functions, inverse problems, etc. It can also be used for the global optimization of non-smooth problems by combining it with a gradient-based global optimization algorithm. In this section, we use our smoothing approach together with the auxiliary-function-based global optimization algorithm AFA proposed in [41]. The algorithm is programmed in MATLAB and has been executed on an Intel Core i5-3337U 1.8 GHz with 7.8 GB RAM. In the following example, we show the usage of the new smoothing approach to smooth out non-convex, non-smooth and non-Lipschitz functions.
Problem. Let us first define the n-dimensional unconstrained global optimization problem in (10), where m is the number of local minimizers, n is the dimension of the problem, b_j = (b_{j,1}, b_{j,2}, . . . , b_{j,n}) ∈ R^n, a_j is a real number and

‖x‖_{p_j} = (Σ_{i=1}^n |x_i|^{p_j})^{1/p_j},   p_j > 0,

for j = 1, 2, . . . , m.
It can be observed that the number of local minimizers, the dimension of the problem and the locations of the local minimizers of the objective function f(x) can all be chosen freely. Moreover, the largest value of a_j for j = 1, 2, . . . , m determines the location of the global minimizer of f(x). Depending on the parameters p_j, the minimization problem in (10) may be smooth, non-smooth or non-Lipschitz.
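The exact test family is the one given in (10); as a hypothetical stand-in with the same ingredients (centers b_j, depths a_j, exponents p_j and the ‖·‖_{p_j} term), one can experiment with an inverse-distance form. The formula below is an assumption for illustration, not the paper's (10):

```python
def pnorm(x, b, p):
    """(sum_i |x_i - b_i|^p)^(1/p); non-Lipschitz near x = b when 0 < p < 1."""
    return sum(abs(xi - bi) ** p for xi, bi in zip(x, b)) ** (1.0 / p)

def f_test(x, centers, depths, powers):
    # Hypothetical multi-well objective: each center b_j contributes a well
    # of depth a_j; the largest a_j marks the global minimizer.
    return -sum(a / (1.0 + pnorm(x, b, p))
                for a, b, p in zip(depths, centers, powers))

centers = [(0.0, 0.0), (2.0, 1.0)]
depths = [1.0, 2.0]   # deepest well at (2, 1) -> global minimizer there
powers = [0.5, 1.0]   # p = 0.5 makes the first well non-Lipschitz at its center
```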
Example 1 (2-dimensional non-Lipschitz problem). Based on the Problem above, we design a two-dimensional non-smooth and non-Lipschitz problem as in (11), where the values of b_j = (b_{j,1}, b_{j,2}) ∈ R^2, a_j ∈ R and p_j > 0 for j = 1, 2, . . . , 5 are presented in Table 1. At the point x = (0, 0), the directional derivatives of f do not exist. Clearly, f is continuous, but it is non-smooth and non-Lipschitz. Therefore, it is not possible to find the local and global minimizers of this function by using classical gradient-based local searches. The graph of the function is shown in Fig. 2 (a).

Table 1. The values of a_j, b_j and p_j in Example 1

As studied in this paper, the objective function f is smoothed out by applying the smoothing approach. We construct the smoothing function f̃ of f, where Q̃(x, β) = q̃_1(x_1, β) + q̃_2(x_2, β) and R̃(x, β) = r̃_1(x_1, β) + r̃_2(x_2, β), and the function q̃_i is defined for i = 1, 2 with β > 0. We consider the following problem instead of (11): the new problem (12) is the global minimization of the continuously differentiable function f̃. The graph of f̃ is shown in Fig. 2 (b) with β = 0.1.

At this stage, any gradient-based global optimization algorithm can be used to solve problem (12). For example, we apply AFA to problem (12) and present the results of the minimization process in Table 2. Table 2 reports the value of the smoothing parameter β, the iteration number k, the k-th local minimizer x̃_k of f̃(x, β), the corresponding smoothed value f̃(x̃_k, β), the corresponding original value f(x̃_k), the k-th local minimizer x*_k of f(x), and the corresponding original value f(x*_k). It can be observed from Table 2 that the local minimizers of the original and smoothing functions are located at points with very small differences.
As the smoothing parameter β approaches 0, the differences between the locations and values of the local minimizers of the original and smoothing functions become smaller.
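This β → 0 behavior is easy to reproduce on a toy objective f(x) = |x − 0.3| (our own example, not one of the paper's), using a classical quadratic C^1 smoothing of max{t, 0} as an illustrative choice: the smoothed minimum value approaches the true minimum value as β shrinks.

```python
def plus_smooth(t, beta):
    # Quadratic C^1 smoothing of max(t, 0): illustrative choice.
    if t <= -beta:
        return 0.0
    if t >= beta:
        return t
    return (t + beta) ** 2 / (4.0 * beta)

def f(x):
    return abs(x - 0.3)                      # non-smooth, minimizer at 0.3

def f_smooth(x, beta):
    # |t| = max(t, 0) + max(-t, 0), smoothed term by term
    return plus_smooth(x - 0.3, beta) + plus_smooth(0.3 - x, beta)

gaps = [f_smooth(0.3, b) - f(0.3) for b in (0.4, 0.2, 0.1, 0.05)]
# for this smoothing the gap at the minimizer equals beta / 2, so it
# vanishes as beta -> 0
```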
Example 2. Let us define the constrained optimization problem in (13), where the values of b_j, a_j ∈ R and p_j > 0 for j = 1, 2, . . . , 6 are presented in Table 3. We consider the penalty approach in [34] to solve the problem given in (13), defining the surrogate unconstrained problem (14), where λ > 0 and G_i(x) = max{g_i(x), 0}. It can be seen that the function F_λ(x) in (14) is non-smooth and non-Lipschitz (see Fig. 3 (a)).

Table 3. The values of a_j, b_j and p_j in Example 2

By employing the smoothing approach, a smooth function that mimics the original one is obtained. We construct the smoothing function F̃_λ of F_λ for j = 5, 6 with β > 0, and the smoothing penalty term is defined accordingly. Therefore, we consider the following problem instead of (14): the new problem (15) is the global minimization of the continuously differentiable function F̃_λ. The graph of F̃_λ is shown in Fig. 3 (b). Any gradient-based global optimization algorithm can be used to solve problem (15). As in Example 1, we again consider AFA and present the results of the minimization process in Table 4. Table 4 reports the value of the smoothing parameter β, the iteration number k, the k-th local minimizer x̃_k of the smoothing penalty function F̃_λ(x, β), the corresponding values F̃_λ(x̃_k, β), F_λ(x̃_k) and f(x̃_k), the k-th local minimizer x*_k of the original function f(x), and the corresponding value f(x*_k). The numerical results in Table 4 show that our smoothing approach is viable for constrained global optimization, too.
It can again be observed that, as the smoothing parameter β approaches 0, the differences between the locations and values of the local minimizers of the original and smoothing functions get smaller.

We have given two different numerical examples in order to illustrate how the presented smoothing approach works with global optimization algorithms. We now consider some well-known non-smooth test problems and apply AFA to find their global minimizers. Furthermore, we compare our results with those obtained by the Global Descent Algorithm (GDA) presented in [14] in terms of total iteration numbers, function evaluations and total computation time. The list of test problems is shown in Table 5, and the results of AFA and their comparison with GDA on the considered non-smooth test problems are presented in Table 6. We consider the smoothing function of each objective function instead of the original one. The table is constituted by applying AFA to the smoothed version of each test problem from the starting point x_0 and reporting the global minimum value of the original objective function as f*, the number of function evaluations (including local searches) as f.eval, and the CPU time as Time(sec). We find the global minimizer of the smoothing function and compute the corresponding global minimum value by substituting it into the original function. The results for GDA on the test problems are taken from [14]. It can be observed that AFA presents satisfactory results with fewer function evaluations and less computation time than GDA.

5. Conclusion. In this study, we have introduced a new smoothing technique for non-smooth functions. The proposed technique can be used in solving penalty, minimax, inverse and similar problems, as well as in the global optimization of continuous, multi-modal, non-smooth, non-convex and non-Lipschitz problems, and of many optimization problems containing any of the operators "min", "min_p", "max", "max_p", "|·|" and "|·|^p" for 0 < p < 1. Our smoothing technique is easily controlled by a single parameter, which makes it possible to obtain a sensitive approximation (or representation) of the original non-smooth function.
We have visualized the process of our smoothing approach on two different examples. According to the numerical results in Table 6, the smoothing approach notably decreases the computational cost of solving the test problems.
For future studies, combining the smoothing approach with the DIRECT method proposed in [13] could be interesting. Finally, the application of this approach to real-world problems is also promising.