Optimal stopping problems with restricted stopping times

This paper provides a general framework for optimal stopping problems over families of partially available (or restricted) stopping times. It subsumes the classical continuous-time, discrete-time, and semi-Markov settings as special cases. We model the problem by a restricted pool of stopping times meeting certain natural conditions and present its solution by means of the Snell envelope technique, which extends the classical results. We further extend this type of problem to stochastic processes indexed by a partially ordered set.

The classical theory of optimal stopping is generally limited to either the continuous-time setting, where stopping times can take any value in an interval, or the discrete-time setting, where stopping times take values in a set of discrete points that are either predetermined or random in a semi-Markovian setting. Such settings, however, are inadequate for describing certain practical problems in which the time setting is neither purely continuous nor totally discrete over the entire time horizon, and the random time points are not necessarily semi-Markovian. A practical example can be found in Dupuis and Wang (2002) [9], who investigated a particular optimal stopping problem that maximizes an expected discounted investment return. In their setting, stopping is allowed only at the jump times of an exogenous uncontrolled Poisson process. Similar ideas are employed to model liquidity in finance and economics; see, for example, Oksendal and Sulem (2002) [26], Rogers and Zane (2002) [29], and Pham and Tankov (2009) [28], among others, in which the available stopping times form increasing sequences.
As a simple example involving restricted stopping times beyond the existing continuous- and discrete-time settings, consider an agent who manages a set of, say, $m$ economic projects, each of which brings her a random income when it is stopped. According to her fixed time schedule, the whole time horizon is split into $mk$ deterministic intervals $[i, i+1]$, $i = 0, 1, \ldots, mk-1$, for some integer $k$. Owing to her working capacity and other limitations such as geographical distance and physical presence, she can take care of only one project in each time interval, so that the $i$-th project ($i = 1, 2, \ldots, m$) is taken care of only over the intervals assigned to it. Therefore, the stopping times of project $i$ are restricted to those intervals.
Applications of restricted stopping times can also be found in manufacturing scheduling, which aims to optimize certain performance measures in processing a number of jobs on a machine that is subject to random breakdowns; see, e.g., Cai et al. (2005, 2009) [6, 7] and the references therein. If the machine can switch between jobs before they are completed, then the stopping times of processing jobs are restricted to the random time intervals during which the machine is operational. Restricted stopping times in manufacturing are discussed further in the fourth example in Section 2.
Given the wide range of applications involving stopping problems with partially available sets of stopping times, there is a clear need for a general framework to formulate and solve such problems, which has been lacking thus far. Moreover, a general theory of optimal stopping appears to be required in order to develop the optimal allocation theory for multi-armed bandit processes with restricted machine availability.
In this paper, we introduce a framework for optimal stopping problems over restricted sets of stopping times, and present their solutions in terms of the Snell envelope characterization. More specifically, we consider a stochastic gain process that collects a random gain when it is stopped. The feasible stopping times satisfy certain mild technical conditions (see the next section for details), and the objective is to find an optimal stopping time at which the expected gain reaches its maximum value. This framework covers optimal stopping problems beyond the classical continuous-time, discrete-time, and semi-Markov models, all of which are included as special cases. We refer to this type of problem as the optimal stopping problem with restricted stopping times, or simply the restricted optimal stopping problem.
The structure of this article is as follows. After formulating the problem in Section 2, the solution is presented in Section 3, which extends the results in the classical unrestricted version. Section 4 further extends the restricted optimal stopping problem to partially ordered time sets. The final section provides some remarks.
Our objective is to find an optimal stopping time $\tau^*$ in a sub-collection $\mathcal{M}$ of $\mathcal{F}$-stopping times (that is, stopping times with respect to the filtration $\mathcal{F}$ taking values in $[0, \infty]$) so as to attain the supremum of the expected gain from $G_\tau$ over $\tau \in \mathcal{M}$; this is problem (1). The sub-collection $\mathcal{M}$ is referred to as a restricted class of stopping times, and each $\tau \in \mathcal{M}$ is called an allowable stopping time.
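Throughout this paper we assume the existence of $E[G_\tau \mid \mathcal{F}_0]$, for which an obvious sufficient condition is $E[\sup_t |G_t|] < \infty$. Moreover, the following condition is taken for granted to ensure the existence of an optimal stopping time:
$$E\Big[\operatorname*{ess\,sup}_{t \in \bar{\mathbb{R}}_+} |G_t| \,\Big|\, \mathcal{F}_0\Big] < \infty. \tag{2}$$
To formulate the restricted class $\mathcal{M}$ of stopping times, we make the following assumptions.
Assumption 1. (i) $0 \in \mathcal{M}$ and $\infty \in \mathcal{M}$. (ii) The collection $\mathcal{M}$ is closed under countably many maximizations and minimizations in the sense that $\{\nu_n\}_{n=1}^\infty \subset \mathcal{M}$ implies $\bigvee_n \nu_n \in \mathcal{M}$ and $\bigwedge_n \nu_n \in \mathcal{M}$, where $\vee$ and $\wedge$ represent the least upper and greatest lower bounds, respectively. (iii) If $\nu_1, \nu_2 \in \mathcal{M}$ and $A \in \mathcal{F}_{\nu_1 \wedge \nu_2}$, then $\nu_1 I_A + \nu_2 I_{A^c} \in \mathcal{M}$.
These assumptions are explained below.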
(1). Part (i) of Assumption 1 allows a stopping time to be "idle" (never start) or "never stop". If the stopping times 0 and ∞ are not included, the optimal stopping time might not exist at all. Obvious examples include continuous or discrete-time processes with monotone paths.
(2). For any two allowable stopping times $u$ and $v$, a decision maker can decide to stop at either the smaller or the larger time. Hence a practically natural requirement for $\mathcal{M}$ is that $u, v \in \mathcal{M}$ implies $u \vee v \in \mathcal{M}$ and $u \wedge v \in \mathcal{M}$. Note that for any sequence of $\mathcal{F}$-stopping times $\{\nu_n\}_{n=1}^\infty \subset \mathcal{M}$, the random variables $\bigvee_n \nu_n$ and $\bigwedge_n \nu_n$ are still $\mathcal{F}$-stopping times, where the latter is guaranteed by the right continuity of the filtration. The stability under countable $\bigvee_n$ and $\bigwedge_n$ in Part (ii) states that $\mathcal{M}$ is closed under certain limit operations, which is necessary for the existence of optimal stopping times. Clearly, Parts (i) and (ii) imply that $\mathcal{M}$ is closed under any finitely many $\vee$ and $\wedge$ operations as well as their combinations and, furthermore, imply the stability of $\mathcal{M}$ under countably many operations formed by $\bigvee$ and $\bigwedge$.
(3). Part (iii) is also quite natural: at the stopping time $\nu_1 \wedge \nu_2$, the decision maker should have the freedom of stopping the process at $\nu_1$ or $\nu_2$ by checking an event in the information σ-algebra $\mathcal{F}_{\nu_1 \wedge \nu_2}$. This condition can be rephrased as a seemingly weaker but actually equivalent one: (iii') If a stopping time $\nu_1$ is allowable and $A \in \mathcal{F}_{\nu_1}$, then the restriction of $\nu_1$ on $A$ (i.e., $\nu_1 I_A + \infty I_{A^c}$) is also allowable.
Although the time horizon is taken to be $\bar{\mathbb{R}}_+ = [0, +\infty]$ in this paper, it is not difficult to replace it with a finite time horizon, say $[0, T]$ for some constant $T < \infty$, by restricting the stopping times to $[0, T]$; hence discussing an arbitrary interval $[0, T]$ instead of $[0, +\infty]$ brings no additional generality. In addition, if the time horizon is not right closed, the problem may have no optimal solution. An obvious example is a deterministic increasing function $X_t$ of $t$, for which no stopping time can be optimal if the time horizon is $[0, +\infty)$.
As is usual in the literature, we assume without loss of generality that $G_t \ge 0$ for all $t$. If a process $(G_t)$ takes real values, we can set $H = \operatorname{ess\,inf}_t G_t$, which is integrable due to assumption (2), and introduce the right-continuous and quasi-left-continuous version of the martingale $M_t = E[H \mid \mathcal{F}_t]$ ($t \ge 0$), thanks to the quasi-left-continuity of the filtration. Then replace the initial gain process $(G_t)$ with the adapted right-continuous process defined by $\tilde{G}_t = G_t - M_t$, which is nonnegative because $H \le G_t$.
Listed below are a few examples of restricted optimal stopping problems from a variety of disciplines. They include the classical continuous- and discrete-time problems as simple or trivial special cases and show that restricted optimal stopping problems arise in many real situations.
1. The two extreme cases are $\mathcal{M} = \{0, \infty\}$ and $\mathcal{M}$ equal to the collection of all $\mathcal{F}$-stopping times. In the first case, one can only do nothing and take the immediate reward $G_0$, or wait forever and take the final reward $G_\infty$. The second case is commonly known as the optimal stopping problem in continuous time. If the stopping times can only take integer values, then the restricted optimal stopping problem reduces to the optimal stopping problem in discrete time.
2. Let $\mathcal{G}$ be a sub-filtration of $\mathcal{F}$ that satisfies the usual conditions, i.e., $\mathcal{F}$ is finer than $\mathcal{G}$. We can define a restricted domain of stopping times $\mathcal{M}_{\mathcal{G}} = \{\tau : \tau \text{ is a } \mathcal{G}\text{-stopping time}\}$. Compared with the collection of all $\mathcal{F}$-stopping times, $\mathcal{M}_{\mathcal{G}}$ is a restricted class. For instance, let $\mathcal{G}$ represent the information filtration of a particular customer who has only limited authority to operate a machine: he observes only the behavior of the machine that is relevant to the tasks he has assigned to it in order to earn certain rewards, and can make his decisions only according to what he observes. Let $\mathcal{F}$ be the information of someone with full authority to observe the whole performance of the machine. Then, generally, $\mathcal{G} \subset \mathcal{F}$ in the sense that $\mathcal{G}_t \subset \mathcal{F}_t$ for all $t \in \bar{\mathbb{R}}_+$, so that the $\mathcal{G}$-stopping times constitute a restricted subset of the collection of $\mathcal{F}$-stopping times. Another example is the model discussed in, e.g., Dupuis and Wang (2002). Let $(G_t)$ be an $\mathcal{F}$-adapted reward process and $(N_t)$ an $\mathcal{F}$-adapted exogenous uncontrollable Poisson process with intensity $\lambda$. Denote by $T_k$, $k = 0, 1, 2, \ldots$, the times at which $(N_t)$ makes a jump, where $T_0 = 0$. The decision maker can only decide to continue or stop at those $T_k$. If $\mathcal{G}$ is the natural filtration of $(N_t)$, then the allowable stopping times are those in $\mathcal{M} = \{\tau : \tau \text{ is a } \mathcal{G}\text{-stopping time such that } N_\tau - N_{\tau-} = 1\}$, which clearly satisfies Assumption 1.
3. A further example is the semi-Markov decision process, in which the payoff processes are semi-Markovian and stopping can only occur at transition times. More specifically, let $(G_t)$ be a semi-Markov process and $\mathcal{F}$ its natural filtration. Denote by $T_0, T_1, T_2, \ldots$ the time instants at which $(G_t)$ makes a transition, with $T_0 = 0$. The decision maker can continue or stop at those $T_k$. Clearly, the class of allowable stopping times in this case consists of the $\mathcal{F}$-stopping times that stop only at transition times.
4. Consider a manufacturing scheduling problem of processing a set of jobs on a single machine. The machine is subject to breakdowns due to various causes such as damage to one of its parts, running out of engine oil, or disruption of the power supply. While the machine is processing a job, the processing can be stopped at an arbitrary time point so that the machine can switch to another unfinished job. After a breakdown, however, no job is being processed, so stopping job processing is not possible until the machine is fixed. In such a case, the stopping times are restricted to the time intervals in which the machine is operational.
The class $\mathcal{M}$ in each of these examples meets the conditions in Assumption 1.
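The Poisson-restricted model above can be made concrete with a small simulation. The following is a minimal sketch, not the construction of Dupuis and Wang: the helper names `poisson_jump_times` and `next_allowed`, the finite horizon, and the rule that a desired stopping time is pushed forward to the next jump time are all illustrative assumptions.

```python
import bisect
import math
import random
from typing import List


def poisson_jump_times(rate: float, horizon: float, rng: random.Random) -> List[float]:
    """Simulate T_0 = 0 < T_1 < T_2 < ... up to `horizon` (hypothetical helper).

    Inter-arrival times of a Poisson process with intensity `rate`
    are independent exponential variables with mean 1 / rate.
    """
    times = [0.0]  # T_0 = 0 by convention, as in the text
    t = 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            break
        times.append(t)
    return times


def next_allowed(times: List[float], s: float) -> float:
    """Earliest allowed time at or after s; infinity if there is none.

    Returning infinity mirrors the convention that "never stop" is allowable.
    """
    i = bisect.bisect_left(times, s)
    return times[i] if i < len(times) else math.inf


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for reproducibility
    jumps = poisson_jump_times(rate=2.0, horizon=5.0, rng=rng)
    print("allowed times:", [round(t, 3) for t in jumps])
    print("wish to stop at 1.0, actually stop at:", next_allowed(jumps, 1.0))
```

The point of the sketch is only that the decision maker's choice set along each path is the random grid of jump times, together with 0 and ∞, rather than the whole half-line.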

3. Solution.
We now derive the solution to the optimal stopping problem formulated in Section 2. The main ideas are similar to those in Section I.2.1 of Peskir and Shiryaev (2006) [27]. The difference here is that the restriction on the collection of allowable stopping times must be taken into account.
For any stopping time $\nu$ (not necessarily in $\mathcal{M}$), write $\mathcal{M}_\nu = \{\tau \in \mathcal{M} : \tau \ge \nu \text{ a.s.}\}$ and let $\mathcal{F}_\nu$ be the pre-event σ-field at the stopping time $\nu$. Introduce the random variable
$$Z_\nu = \operatorname*{ess\,sup}_{\tau \in \mathcal{M}_\nu} E[G_\tau \mid \mathcal{F}_\nu]. \tag{3}$$
Note that $Z_\infty = G_\infty$. Problem (1) will be solved if we solve problem (3) for arbitrary but fixed $\nu$; that is, if we find $\tau^* \in \mathcal{M}_\nu$ such that $Z_\nu = E[G_{\tau^*} \mid \mathcal{F}_\nu]$. We need to examine certain properties of $Z_\nu$ before discussing the solution to the optimality problem in (3). In particular, we need to check the following two points: (i) for any deterministic time point $t$, $Z_\nu = Z_t$ with probability 1 on $\{\nu = t\}$, so that $Z_t$ is well defined; and (ii) the stochastic process $(Z_t)$ is a supermartingale. To show this, we begin with an expression for $\operatorname{ess\,sup}_{\rho \in \mathcal{M}_\tau} E[G_\rho \mid \mathcal{F}_\nu]$ as the limit of a nondecreasing sequence of random variables.
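Because $A \in \mathcal{F}_\nu \subseteq \mathcal{F}_{\rho_1 \wedge \rho_2}$, it follows from Assumption 1 that $\rho_3 \in \mathcal{M}_\tau$ and $E[G_{\rho_3} \mid \mathcal{F}_\nu] = E[G_{\rho_1} \mid \mathcal{F}_\nu] \vee E[G_{\rho_2} \mid \mathcal{F}_\nu]$. This proves that the family $\{E[G_\rho \mid \mathcal{F}_\nu] : \rho \in \mathcal{M}_\tau\}$ is closed under pairwise maximization. Next, for any sequence of stopping times $\{\rho_n\} \subset \mathcal{M}_\tau$ such that $\sup_n E[G_{\rho_n} \mid \mathcal{F}_\nu]$ attains the essential supremum of this family, recursively define a new sequence of stopping times $\{\tilde{\rho}_n\}_{n=1}^\infty \subset \mathcal{M}_\tau$ by $\tilde{\rho}_1 = \rho_1$ and $\tilde{\rho}_n$ from $(\tilde{\rho}_{n-1}, \rho_n)$ in the same way as we defined $\rho_3$ from $(\rho_1, \rho_2)$ in equation (4). It is then easy to check that $E[G_{\tilde{\rho}_n} \mid \mathcal{F}_\nu]$ converges nondecreasingly to $Z_\nu$. The proof is thus complete.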
The following proposition states the desired properties of $(Z_t)$.

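Proposition 1. (i) For any couple of stopping times $\nu$ and $\sigma$ we have
$$Z_\nu = Z_\sigma \quad \text{a.s. on } \{\nu = \sigma\}. \tag{6}$$
(ii) Given any couple of stopping times $\nu$ and $\tau$ such that $\tau \ge \nu$,
$$E[Z_\tau \mid \mathcal{F}_\nu] = \operatorname*{ess\,sup}_{\rho \in \mathcal{M}_\tau} E[G_\rho \mid \mathcal{F}_\nu] \le Z_\nu, \tag{7}$$
and consequently $E[Z_\tau \mid \mathcal{F}_\nu] \le Z_\nu$.
Proof. To prove (6), one shows that, on the event $\{\nu = \sigma\}$, every conditional expected gain attainable from $\mathcal{M}_\nu$ is also attainable from $\mathcal{M}_\sigma$. The same argument can be applied to build the opposite inclusion, and hence the two families coincide on $\{\nu = \sigma\}$. Consequently, their essential suprema agree there. This proves the result in (6).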
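For the first equality in (7), use Lemma 3.1 to choose a sequence $\{\rho_n\}_{n=1}^\infty$ in $\mathcal{M}_\tau$ such that $\{E[G_{\rho_n} \mid \mathcal{F}_\tau]\}_{n=1}^\infty$ converges nondecreasingly to $Z_\tau$. Then the monotone convergence theorem for conditional expectation shows that $E[Z_\tau \mid \mathcal{F}_\nu] = \lim_{n \to \infty} E[G_{\rho_n} \mid \mathcal{F}_\nu] \le \operatorname{ess\,sup}_{\rho \in \mathcal{M}_\tau} E[G_\rho \mid \mathcal{F}_\nu]$. On the other hand, since $E[G_\rho \mid \mathcal{F}_\tau] \le Z_\tau$ for every $\rho \in \mathcal{M}_\tau$, taking conditional expectations given $\mathcal{F}_\nu$ yields the reverse inequality, which establishes the first equality in (7). The last inequality in (7) is simply due to the inclusion relation $\mathcal{M}_\tau \subset \mathcal{M}_\nu$.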
In the definition of $Z_\nu$ in (3), the stopping time $\nu$ is not necessarily in $\mathcal{M}$. For deterministic times, (6) ensures that $Z_t$ is well defined, and (7) shows that the process $(Z_t)$ is a supermartingale.
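For intuition, the process $(Z_t)$ can be computed explicitly in a toy discrete-time model. The sketch below is an illustration under strong simplifying assumptions that are not part of the paper: a finite horizon, a recombining binomial tree with up-probability `p`, and a deterministic set `allowed` of time indices at which stopping is permitted. The function name `restricted_snell` and the sample gains are hypothetical. At an allowed time the value is the larger of the immediate gain and the continuation value; at a time that is not allowed, only the continuation value is available.

```python
from typing import List, Set


def restricted_snell(gain: List[List[float]], allowed: Set[int], p: float = 0.5) -> List[List[float]]:
    """Backward induction for the value process on a recombining binomial tree.

    gain[t][j] is the gain at time t after j up-moves (so gain[t] has t + 1 entries).
    Stopping is permitted only at the time indices in `allowed`.
    """
    n = len(gain) - 1
    if n not in allowed:
        # The terminal time plays the role of "never stop earlier" and must be allowable.
        raise ValueError("the terminal time must belong to `allowed`")
    z = [row[:] for row in gain]  # z[n] = gain[n] at the terminal time
    for t in range(n - 1, -1, -1):
        for j in range(t + 1):
            cont = p * z[t + 1][j + 1] + (1.0 - p) * z[t + 1][j]
            # Stopping competes with continuing only at allowed times.
            z[t][j] = max(gain[t][j], cont) if t in allowed else cont
    return z


if __name__ == "__main__":
    gain = [[1.0], [0.0, 3.0], [2.0, 0.0, 2.0]]
    print(restricted_snell(gain, {0, 1, 2})[0][0])  # all times allowed
    print(restricted_snell(gain, {0, 2})[0][0])     # time 1 not allowed
```

With these sample gains the time-0 value is 2.0 when every time is allowed and 1.0 when time 1 is removed, so shrinking the class of allowable stopping times can only lower the value. Note also that at a time that is not allowed the computed value need not dominate the gain, although the supermartingale property still holds by construction.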
The following theorem characterizes the optimal stopping times if they exist.
Theorem 3.2. Let $\tau^* \in \mathcal{M}_\nu$ be an allowable stopping time. The following three statements are equivalent:
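Proposition 2. For any $\nu \in \mathcal{M}$ and any sequence $\{\nu_n\}_{n=1}^\infty \subset \mathcal{M}_\nu$ such that $\nu_n \to \nu$, we have
$$\lim_{n \to \infty} E[Z_{\nu_n} I_A] = E[Z_\nu I_A] \quad \text{for every } A \in \mathcal{F}_\nu. \tag{12}$$
Proof. First, note that for any general sequence $\{\nu_n\} \subset \mathcal{M}_\nu$, due to $A \in \mathcal{F}_\nu \subseteq \mathcal{F}_{\nu_n}$ for every $n \ge 1$ and the supermartingale property of $Z$, it is easy to see that
$$\limsup_{n \to \infty} E[Z_{\nu_n} I_A] \le E[Z_\nu I_A]. \tag{13}$$
Next, associate every stopping time $\rho \in \mathcal{M}_\nu$ with a sequence of stopping times $\sigma_n = \rho \vee \nu_n \in \mathcal{M}_{\nu_n}$, $n \ge 1$. Then $\sigma_n \to \rho$, and hence the right continuity of $(G_t)$ and Fatou's lemma lead to $E[G_\rho I_A] \le \liminf_{n} E[G_{\sigma_n} I_A]$. That is, for every $A \in \mathcal{F}_\nu$ and $\rho \in \mathcal{M}_\nu$,
$$E[G_\rho I_A] \le \liminf_{n \to \infty} E[Z_{\nu_n} I_A]. \tag{14}$$
Furthermore, let $\{\rho_k\}_{k=1}^\infty \subset \mathcal{M}_\nu$ be an arbitrary sequence of stopping times selected as in Lemma 3.1 such that $E[G_{\rho_k} \mid \mathcal{F}_\nu]$ converges increasingly to $Z_\nu$ as $k \to \infty$. Then the monotone convergence theorem together with (14) implies
$$E[Z_\nu I_A] = \lim_{k \to \infty} E[G_{\rho_k} I_A] \le \liminf_{n \to \infty} E[Z_{\nu_n} I_A]. \tag{15}$$
Thus (12) is established by combining (13) and (15).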
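In order to tackle the existence of optimal stopping times, we proceed to construct a family of "approximately optimal" stopping times for any $\lambda \in (0, 1)$ and stopping time $\nu$. Define $\mathcal{D}^\lambda_\nu = \{\tau \in \mathcal{M}_\nu : \lambda Z_\tau \le G_\tau\}$. Then $\infty \in \mathcal{D}^\lambda_\nu$ and hence $\mathcal{D}^\lambda_\nu$ is nonempty. Moreover, if $\tau \in \mathcal{M}_\nu$ is an arbitrary stopping time, then we can derive a new stopping time $\tilde{\tau} = \tau I_{(\lambda Z_\tau \le G_\tau)} + \infty I_{(\lambda Z_\tau > G_\tau)}$, which clearly belongs to $\mathcal{D}^\lambda_\nu$. Define
$$D^\lambda_\nu = \operatorname*{ess\,inf}\{\tau : \tau \in \mathcal{D}^\lambda_\nu\}. \tag{17}$$
There are a few easy facts on $D^\lambda_\nu$.
1. For fixed $\nu$ and $\lambda$, $D^\lambda_\nu$ is well defined because $\mathcal{D}^\lambda_\nu \ne \emptyset$. Furthermore, for varying $\nu$, Lemma 3.3 shows that there is no ambiguity caused by (17). Because the stopping time $\nu$ is not necessarily in $\mathcal{M}$, this property also shows that $D^\lambda_t$ is well defined. Moreover, $D^\lambda_\nu$ can intuitively be defined almost surely by $D^\lambda_\nu(\omega) = \inf\{s : \text{there is a stopping time } \tau \in \mathcal{M}_\nu \text{ such that } s = \tau(\omega) \text{ and } \lambda Z_s \le G_s\}$.
2. Let $\{\nu^\lambda_n\}_{n=1}^\infty \subset \mathcal{D}^\lambda_\nu$ be a sequence such that $\nu^\lambda_n \downarrow D^\lambda_\nu$. By part (ii) of Assumption 1 again, $\{\nu^\lambda_n\}_{n=1}^\infty$ can be selected to be decreasing, and thus $D^\lambda_\nu = \bigwedge_n \nu^\lambda_n \in \mathcal{M}_\nu$ a.s. is also a stopping time.
3. $D^\lambda_\nu$ is separately nondecreasing in $\lambda$ and in $\nu$. By the right continuity of $G$ and the second assertion of Proposition 3, we can further deduce that $\lambda Z_{D^\lambda_\nu} \le G_{D^\lambda_\nu}$ a.s.
Combined with (7), the following proposition shows that the process $(Z_t)$ restricted to $[\nu, D^\lambda_\nu]$ is a martingale.
Proposition 4. For $\lambda \in (0, 1)$ and any stopping time $\nu \in \mathcal{M}$, $Z_\nu = E[Z_{D^\lambda_\nu} \mid \mathcal{F}_\nu]$.
Proof. Note that $Z_\nu \ge E[Z_{D^\lambda_\nu} \mid \mathcal{F}_\nu]$ by the relation in (7). For the reversed inequality, one considers an auxiliary family of random variables indexed by arbitrary stopping times $\tau$ and shows that it dominates $G_\tau$ for all stopping times $\tau$. Applying this at the stopping time $\nu$, and using that $\nu \le \tau$ implies $\nu \le D^\lambda_\nu \le D^\lambda_\tau$, we see that $Z_\nu \le E[Z_{D^\lambda_\nu} \mid \mathcal{F}_\nu]$. This completes the proof.
Since, for any fixed stopping time $\nu$, the family of stopping times $\{D^\lambda_\nu : \lambda \in (0, 1)\}$ is nondecreasing in $\lambda$, we can define
$$D^1_\nu = \lim_{\lambda \uparrow 1} D^\lambda_\nu, \tag{21}$$
which is obviously a stopping time. Under the assumptions of right-continuous paths for $G$ and condition (2), we shall show that the $D^1_\nu$ defined by (21) is the solution to the optimal stopping problem (3).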
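The role of the threshold λ can be seen on a single sample path in discrete time. The sketch below assumes, purely for illustration, that the gain values `gain_path` and the value-process values `z_path` along one path are already known and that the allowable times form a deterministic set of indices; the function name `first_threshold_time` is hypothetical. It returns the first allowable time at or after `start` at which λZ ≤ G, and ∞ if there is none, which mimics the path-wise description of the approximately optimal stopping time given above.

```python
import math
from typing import Sequence, Set


def first_threshold_time(
    gain_path: Sequence[float],
    z_path: Sequence[float],
    allowed: Set[int],
    start: int,
    lam: float,
) -> float:
    """First allowed index t >= start with lam * z_path[t] <= gain_path[t]."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    for t in range(start, len(gain_path)):
        if t in allowed and lam * z_path[t] <= gain_path[t]:
            return float(t)
    return math.inf  # "never stop" is always a candidate


if __name__ == "__main__":
    gain_path = [1.0, 3.0, 2.0]   # sample gains along one path (assumed values)
    z_path = [2.0, 3.0, 2.0]      # sample values of Z along the same path
    for lam in (0.4, 0.9):
        print(lam, first_threshold_time(gain_path, z_path, {0, 1, 2}, 0, lam))
```

For these sample values the returned time is 0 for λ = 0.4 and 1 for λ = 0.9, in line with the monotonicity in λ stated in fact 3.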
Then equation (21) implies $D^1_\nu = \lim_{n \to \infty} D^{\lambda_n}_\nu$ a.s. By the property of $Z_\nu$ together with assertion (i) above, and letting $n \to \infty$, (22) follows from the properties of conditional expectation and the fact that $G$ is also left-continuous over stopping times; this in turn implies (23) by Theorem 3.2.
The proof is thus complete.
4. Extension to partially ordered sets. Results on optimal stopping over a totally ordered time horizon are usually considered special cases of the general theory of optimal stopping for a family of random variables indexed by a partially ordered set. This general setting was first considered by Krengel and Sucheston (1981) [19], who proved a number of general theorems and applied them to functionals of a family of independent and identically distributed random variables. Further research on partially ordered sets can be found in, e.g., Mandelbaum and Vanderbei (1981) [21], Lawler and Vanderbei (1983) [20], and Nualart (1992) [25], among others. This section extends the results obtained above for restricted stopping problems to stochastic processes on a partially ordered time set. Let $S$ be a partially ordered topological space with a minimal element $0$ and a maximal element $\infty$, equipped with the operations $\vee$ and $\wedge$ such that, for any $a, b \in S$, $a \vee b$ and $a \wedge b$ are the least upper bound and the greatest lower bound of $a$ and $b$, respectively. A typical instance of $S$ is $\mathbb{R}^n_+$ with $a = (a_1, a_2, \ldots, a_n) \le b = (b_1, b_2, \ldots, b_n)$ if and only if $a_i \le b_i$ for $i = 1, \ldots, n$. Further, let $(\Omega, \mathcal{F}, P)$ be a complete probability space with a filtration $(\mathcal{F}_t)_{t \in S}$, i.e., a family of complete sub-σ-algebras of $\mathcal{F}$ such that $u \le v$ implies $\mathcal{F}_u \subseteq \mathcal{F}_v$. A stopping point $\nu$ is defined as a random point in $S$ such that $\{\nu \le s\} \in \mathcal{F}_s$ for all $s \in S$. Unlike in the case of a totally ordered $S$, for two arbitrary stopping points $\nu$ and $\tau$, the minimum $\tau \wedge \nu$ need not be a stopping point and the set $\{\tau \le \nu\}$ may not be in $\mathcal{F}_\tau$ (e.g., Walsh, 1981). Let $\mathcal{M}$ be a class of stopping points satisfying the following assumptions.
Similarly, we suppose that $(G_t)_{t \in S}$ is a family of random variables that are adapted and right-continuous (i.e., $t_0 \le t$ and $t \to t_0$ imply $G_t \to G_{t_0}$ for $t_0, t \in S$) with probability 1, and satisfy $E[\sup_{t \in S} |G_t|] < \infty$. The definition of $\mathcal{M}_\tau$ is the same as before.
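The componentwise order on $\mathbb{R}^n_+$ mentioned above as a typical instance of $S$ can be sketched in a few lines. This is only an illustration of the order and of the operations ∨ and ∧ on deterministic points; the function names `leq`, `join`, and `meet` are hypothetical, and nothing here models stopping points or filtrations.

```python
from typing import Tuple

Point = Tuple[float, ...]


def leq(a: Point, b: Point) -> bool:
    """a <= b in the componentwise partial order."""
    return len(a) == len(b) and all(x <= y for x, y in zip(a, b))


def join(a: Point, b: Point) -> Point:
    """Least upper bound (a v b): componentwise maximum."""
    return tuple(max(x, y) for x, y in zip(a, b))


def meet(a: Point, b: Point) -> Point:
    """Greatest lower bound (a ^ b): componentwise minimum."""
    return tuple(min(x, y) for x, y in zip(a, b))


if __name__ == "__main__":
    a, b = (1.0, 3.0), (2.0, 2.0)
    # a and b are incomparable: neither dominates the other.
    print(leq(a, b), leq(b, a))
    print(join(a, b), meet(a, b))
```

The two sample points are incomparable, yet their join (2.0, 3.0) and meet (1.0, 2.0) exist, which is the feature that distinguishes a partially ordered index set from the totally ordered time axis of the earlier sections.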
This proves that the family $\{E[G_\rho \mid \mathcal{F}_\nu] : \rho \in \mathcal{M}_\tau\}$ is closed under pairwise maximization. The rest of the proof is the same as that of Lemma 3.1.
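For any pair of stopping points $\nu$ and $\tau$ such that $\tau \ge \nu$, the relations of Proposition 1 (ii) hold with stopping times replaced by stopping points. Proof. Note that $\{\nu = \sigma\} \in \mathcal{F}_\nu \cap \mathcal{F}_\sigma$. The proof is a straightforward generalization of that of Proposition 1.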

5. Conclusion.
We have presented a framework for optimal stopping problems in which the allowable stopping times are restricted to certain classes. This framework unifies the classical optimal stopping problems in continuous time, in discrete time, and for semi-Markov processes. Moreover, it includes numerous scenarios beyond the classical models. We expect it to allow further applications in many dynamic decision problems, such as the optimal allocation problems for bandit processes, for which the optimal policies have so far been obtained only in the continuous-time, discrete-time, and semi-Markov settings.
In this paper, as in most of the literature on dynamic decision problems and stochastic control, there are two technical requirements: the right continuity of the gain process $G$ and the right continuity of the information filtration. It would be important and interesting to know what happens if one or both of these requirements are dropped, because they are not necessarily met in practical situations. However, complete solutions to these problems appear to be a great challenge for future research. The other technical condition is $E[\operatorname{ess\,sup}_{t \in \bar{\mathbb{R}}_+} |G_t| \mid \mathcal{F}_0] < \infty$. It means that our results apply only to gain processes that are essentially bounded by an integrable random variable. This condition is not always satisfied in practical applications. Whether it can be relaxed is another interesting and challenging problem for further research.