Exact and heuristic methods for personalized display advertising in virtual reality platforms

In this paper, motivated by a real problem faced by an online Virtual Reality (VR) platform provider, we study a personalized advertisement assignment problem. On this platform, users log in and out and change their virtual locations. A number of advertisers are willing to pay for ad locations to reach these users. Every time a user visits a new location, the company displays one of the ads. At the end of a fixed time horizon, a reward is collected which depends on the number of ads of each advertiser displayed to different users. The objective is to assign ads dynamically to maximize the expected reward. The problem is studied in a framework where the behaviors of users are modeled with two-state continuous-time Markov processes. We describe two exact and four heuristic algorithms, compare them, and conduct a sensitivity analysis over problem- and algorithm-specific parameters; these are the main contributions of the current paper. Exact algorithms suffer from the curse of dimensionality; hence, heuristic methods might be preferred in some cases. However, exact methods can also be used as part of heuristics, since the experimental analysis demonstrates that they are robust with respect to the parameters that influence their computational requirements.

1. Introduction. Virtual Reality (VR) is becoming a reality that can influence our lives substantially. Just as the Internet converted personal computers (PCs) into networking devices, and mobile/smart phones added mobility to our communication and computing experience, the advance of VR technologies is likely to bring the next "major computing and communication platform", as Mark Zuckerberg, the creator of Facebook, puts it. The pace at which VR develops and creates new ways of conducting business in various industries (e.g., entertainment, travel, advertising) is remarkable.
This trend is not surprising given the interest of the leading technology firms in VR-related products. Facebook acquired Oculus for $2 billion in 2014 and, in March 2016, introduced the Rift VR to the marketplace. The Rift VR consists of a head-mounted OLED display, headphones, positional and rotational tracking devices, and software. It simulates a 3D environment in which users can interact with the environment (and with each other) with a satisfactory sensory experience. Facebook is not alone in this race: HTC's Vive and Sony's PlayStation VR are among the other headsets pioneering VR technology that are either already on the market or will be soon. Samsung and Oculus collaborated on the Gear VR, which was released in 2015 for slightly less than $100 and uses a Samsung smartphone as the display unit. Likewise, Google's Cardboard, which is merely a phone-holder frame that works with any Android phone or iPhone, makes VR affordable for almost anyone at a price of $15.
At the moment, online gaming leads the way as the major application area of VR technologies. However, show business, social media, professional training, health care, etc. are on the verge of joining the realm of VR. The Rift VR already has a free application called Oculus Cinema, which provides a virtual theater environment. Linden Lab, the creator of Second Life, announced that it is developing a new social platform, built this time for VR headsets, where users can sell and buy properties, services, and products as in Second Life on PCs. Some start-ups are selling VR travel packages which introduce the major highlights of destinations to travelers and assist them in deciding on their next trips. Scientific research demonstrates the potential of VR in various health care applications such as training for and assessing certain surgeries, treating post-traumatic stress disorder, and improving the balance control of pediatric patients. We refer the reader to [3], [10], and [14] for health care applications of VR.
The penetration of VR into our daily lives also motivates new families of problems in the operations research/management science (OR/MS) domain. Various OR problems bound by the physical restrictions of the real world can relax some of these constraints in virtual domains. For example, it is impossible to clone a real cafeteria where you meet your friends whenever it is overcrowded. Yet this can be done very easily in a VR environment, and it might even be a good idea in order to improve the experience of the users. Replenishing the inventory of a virtual shop instantaneously with no lead time, or delivering goods to customers with no shipment cost, might be realities in virtual environments rather than merely simplifying assumptions. Therefore, VR appears as a vibrant application area for OR/MS methodologies, and it calls for solutions to the new problems emerging from its unique properties.
VR also shifts the paradigms of advertising; for example, the advertisement that a user is exposed to in a virtual environment (on billboards, etc.) is not necessarily the same as the one shown to another user walking next to her in that environment. Moreover, the abundance of data gathered from viewers at effectively no cost enables cost-efficient, targeted, and personalized advertisement campaigns in VR. As a result, various new decision-making problems arise that do not exist in conventional mass advertising.
This research is motivated by an advertisement assignment problem faced by a software company which develops and provides a virtual platform similar to Second Life. The objective of the company is to make ad assignments to its users dynamically in order to maximize its revenues. In Kilic et al. [7], the personalized advertisement assignment problem is formulated as a stochastic control problem in a model where the behavior of users is represented with two-state continuous-time Markov processes. However, Kilic et al. [7] neither develop exact or heuristic algorithms nor conduct an experimental analysis to study the behavior and performance of such algorithms. In the current paper, we follow up on this model and consider two exact and four heuristic algorithms for this formulation. The heuristics consider different subsets of constraints (explained in detail later), are simple and easy to apply, and are not affected by the curse of dimensionality. Hence, they provide easily implementable business solutions when the optimal solution is computationally expensive to obtain. We also conduct numerical analyses in order to gain insights into the structure and performance of the algorithms. The results of the experimental analyses include insights regarding (i) the conditions under which heuristics fail to perform better than random selection, (ii) the constraints that improve the performance of myopic heuristics if considered as part of the algorithm, (iii) the constraints that diminish the performance of myopic heuristics if considered as part of the algorithm, and (iv) the possibility of choosing low resolution and/or h-value parameters in order to address the curse of dimensionality associated with the exact algorithms. To sum up, the algorithms and their sensitivity analyses (over problem- and algorithm-specific parameters) are the main contributions of the current paper.
The rest of the paper is organized as follows. Section 2 provides the details of the personalized advertisement assignment problem in a virtual environment. The relevant literature is discussed in Section 3. The dynamic programming formulation and the solution approaches, namely the value iteration based approach and the finite difference based approach, along with four heuristics, are presented in Section 4. The numerical analysis and observations are provided in Section 5. Section 6 concludes the paper with final remarks.
2. Personalized advertisement. With the advance of Internet technologies, the growing number of users, and the time these users spend online, the Internet quickly became a major outlet for advertisement. A report published by Zenith Optimedia [18] states that Internet advertising grew by 19% in 2015 and forecasts an average annual growth of 13% between 2015 and 2018. According to the same report, Internet advertising as a whole will overtake TV advertising, making the Internet the largest advertising medium as of 2017.
The Internet enables publishers (i.e., the owners of the platforms on which advertisement is made) to collect large amounts of data about the users (i.e., viewers). The data can be retrieved from users' online activities (log data such as the frequency, duration, time, and date of visits, interactions with other users, etc.) or provided directly through registration forms, online questionnaires, etc. The publishers use this information to increase the effectiveness of the advertisements.
Langheinrich et al. [8] categorize Internet advertising into four types: untargeted, editorial (based on the topic of the publisher's website), targeted (e.g., behavioral, geographical, temporal, technical, contextual), and personalized advertising. The latter two, namely targeted and personalized advertising, are at the heart of the current trend in online advertising and receive more attention from practitioners and researchers. Another taxonomy of Internet advertising is based on the type of advertisement: display, paid search, and classified. Display advertisements include traditional displays (such as banners), images, social media, and online videos. Paid search, commonly referred to as search engine marketing, is the type of advertising that targets the audience based on the keywords they use in search engines. Lastly, classified advertisements are online versions of traditional classifieds. According to the report of Zenith Optimedia [18], online display advertising spending in 2015 was $74 billion, whereas paid search and classified spending were $69 billion and $17 billion, respectively. The same report indicates that classified ads are losing their share and projects that this trend will continue in the near future.
In this research, our focus is on personalized display advertisements for a VR-based social platform. These platforms are mostly real-life simulations modeled with 2-D or 3-D design technologies. In such environments, users create for themselves a second life, which does not necessarily abide by the constraints of the physical world. Revenues generated on such platforms by displaying advertisements to users constitute a significant share of the total income of the companies that develop and manage them [9].
In web browsing, a user is categorized as exposed whenever s/he visits the web page. Exposure is defined differently in virtual environments: an exposure is realized whenever the user is in the vicinity of an advertisement. Vicinity is based on the virtual distance, i.e., the size of the advertisement on the screen measured as its total area in square pixels. Moreover, the exposure is realized only if the user stays in the vicinity for a certain amount of time. There are no established industrial standards, and the terms are usually negotiated between the publishers and advertisers. Also, advertisers are willing to pay more for viewers with a particular profile. Hence, the agreement between the two parties may include certain specifications regarding the targeted viewers (e.g., demographics, location, interests, surfing habits) and exposure payment terms.
As the publisher, the manager of the VR social platform faces a decision problem every time a user is in the vicinity of a virtual advertisement location: s/he needs to decide which advertisement to display, and how to make these decisions over time so that the revenue at the end is maximized. This problem is referred to as the personalized display advertising problem, and it consists of two separate phases. In the first phase, the matching problem, the compatibility of users with the sets of specifications established by the advertisers is identified. The advertisement assignment problem is dealt with in the second phase, where a specific ad is assigned to an ad place whenever a user is in its vicinity. In the current paper, we focus only on the assignment phase and assume that the matching phase has already been conducted.
In this framework, one of the main concerns in the assignment phase is satisfying the maximum/minimum display requirements per viewer. In the advertising literature there is a consensus on the inverted U-shaped relation between ad repetition and its effect in terms of recall and attitude [12]. The same advertisement popping up at every corner of a website can annoy users, which is undesirable for both the publisher and the advertiser. Furthermore, excessive repetition of an advertisement has no impact on viewers beyond a certain point: the ad starts to wear out, and advertisers waste their money once the message has been received. Therefore, due to both annoyance and the law of diminishing returns, a maximum display number per user is a widely accepted and desirable constraint [2].
On the other hand, displaying an advertisement only a few times would be inadequate. During the first couple of exposures, viewers may feel unfamiliar with the ad and ignore the message, whereas some repetition improves their ability to remember the advertisement in the future. Thus, a minimum display number is another desirable constraint for advertisers [2]. As a result, the publisher receives a payment only if the viewer's exposure is more than a minimum display number.
Naturally, every advertiser has a budget constraint, referred to as the maximum payment constraint: the contracts typically determine the level beyond which the advertiser does not pay anymore. Advertisers also want to guarantee a total number of displays so that they reach a critical mass, i.e., the tipping point beyond which the word-of-mouth effect can be exploited [11]. We refer to this constraint as the minimum payment constraint, under which the advertiser pays the publisher only if the advertisement is displayed to more than a certain number of viewers.
3. Relevant literature. There is limited literature on personalized display advertising in VR environments since the technology itself is still emerging. However, targeted and personalized display advertising for website banners and for PC-based online social/gaming platforms has been studied to some extent. This earlier work deals with advertisement assignment problems similar to the one on which we focus in this research. Prior to 2010, the problem was researched mostly in the context of web banners; later, online social/gaming platforms also started to receive attention from researchers. Some of the earlier research in this context focuses only on the matching phase of the problem, whereas other work provides solutions to both the matching and assignment problems.
ADWIZ, developed by Langheinrich et al. [8], is among the most notable personalized advertisement systems. It utilizes the short-term interests of the users (particularly the keywords they use in search engines) to determine the advertisements assigned to them, taking into account the constraints specified by the advertisers. Various probabilities, such as keyword search probabilities and click-through rates, are periodically updated by a learning system. The advertisements are assigned to users based on display probabilities determined by a simple linear programming (LP) model whose objective is to maximize the expected revenue and whose constraints are those required by the advertisers (e.g., a target number of displays). Later, Tomlin [13] extends the LP model of ADWIZ by means of an entropy model and takes the formulation to multiple time periods.
Another line of research is on rule-based systems. Bae et al. [1], for example, propose a framework for both the matching and the advertisement assignment phases. The profiles of the users are determined based on web usage data. Next, the advertisements are assigned to users via a fuzzy rule-based system whose rules are elicited from a number of advertising experts. Later, Ha [4] modifies this framework so that the fuzzy rules utilized in the system are learned from the historical responses of users to the advertisements they were exposed to. It should be noted that neither of these papers considers the requirements specified by the advertisers, such as the minimum/maximum display limits per user and the minimum/maximum payment constraints.
Kazienko and Adamski [5] developed the AdROSA personalized advertisement assignment system, which takes these constraints into account. The user profiles are determined based on navigational history as well as ad visiting patterns, and the advertisements that are likely to generate higher income are assigned. That is, the assignment system merely favors the advertisements that appealed to other users with a similar history. Zhou et al. [19] also provide a framework that addresses both the matching and the assignment phases. Based on the web log history, ad visiting patterns, and registration history, the customer profiles are determined and matched with the advertisers' specifications. For the assignment phase, an approach similar to AdROSA is adopted, in which the advertisements favored by similar users attain a higher probability of display. Optimal advertisement assignment is left as a future research topic in that paper.
Turner et al. [16] propose a dynamic policy for the in-game advertising assignment problem. The method employs a two-stage algorithm. In the first stage, an LP is solved under various restrictions on the weekly schedules of impressions and campaigns in order to determine the service rates. These rates are used in the second stage to make assignments to ad places (called inventory elements) whenever a player enters a new game zone or level. Numerical results are reported together with a benchmark against an existing algorithm already used by a company, and they indicate a substantial improvement over that algorithm.
Turner [15] formulates the display advertising problem as a transportation problem with a quadratic objective function. In this formulation, user types are considered as source nodes and campaigns as target nodes, and the decision variables are either the number of impressions or the proportion of the expected supply carried over each arc. Even though this is a single-period problem, it can be solved repeatedly on a rolling-horizon basis to improve its performance.
Kilic and Bozkurt [6] consider the problem in a virtual reality social platform and focus on the advertisement assignment phase only. The authors propose four different heuristics that handle different subsets of the requirements set forth by the advertisers as constraints. Unlike earlier work, Kilic and Bozkurt [6] incorporate a look-ahead feature that simulates future events and makes decisions based on these simulations.
The framework of Kilic and Bozkurt [6] is revisited in Kilic et al. [7], where a stochastic model and its rigorous mathematical analysis are provided. In the latter, the problem is formulated as a continuous-time stochastic control problem in which the objective is to maximize the expected total revenue received from the advertisers, and the (dynamic) decision variables are the ads shown to users when they visit a new virtual location. In the current paper, we follow the formulation of Kilic et al. [7] and provide exact and heuristic algorithms that can be used in practice.
Every time a user (re-)enters state 1, the manager can display an ad of one of the m advertisers. This decision is a variable on the extended set of advertiser indices $\{0, 1, \dots, m\}$, with 0 representing the choice of not displaying any ad. The accumulation of these choices is stored in a cumulative exposure matrix process $A = (A_t)_{t \in [0,T]}$, where $A_t$ is an $n \times m$ matrix whose $(i,j)$'th component stores the number of times user i has been exposed to advertiser j's ad by time t. At the terminal time T, for a given reward function F, the manager of the virtual environment receives a revenue of $F(A(T))$. The function F is assumed to be an $\mathbb{R}$-valued function on some compact subset $\mathcal{N}$ of $\mathbb{N}^{n \times m}$. The compactness of the domain reflects that the users are not overexposed to ads. Clearly, there can be various requests from the advertisers, as mentioned in Section 2: advertisers may require a minimum number of displays before granting a payment, or they may offer incentives if their ads are shown more than those of their competitors. All these preferences/constraints are assumed to be embedded in the definition of the function F.
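As a minimal sketch of this bookkeeping (an illustration, not code from [7]), the Python snippet below stores the exposure matrix as a list of rows and applies a display decision $j \in \{0, 1, \dots, m\}$ for user i. The helper names are hypothetical, and the box-shaped domain checked by `in_domain` (every entry at most a per-viewer cap) is an assumed choice of the compact set $\mathcal{N}$.

```python
from typing import List

Matrix = List[List[int]]


def apply_decision(a: Matrix, i: int, j: int) -> Matrix:
    """Return a + I_{i,j} as a new matrix.

    Users are 0-based rows; advertisers keep the text's labels 1..m so that
    j = 0 can mean "no ad", which leaves the exposure matrix unchanged.
    """
    new = [row[:] for row in a]  # copy, so the caller's matrix is not mutated
    if j > 0:
        new[i][j - 1] += 1
    return new


def in_domain(a: Matrix, max_display: int) -> bool:
    """Hypothetical choice of N: every entry lies in {0, ..., max_display}."""
    return all(0 <= v <= max_display for row in a for v in row)


a0 = [[0, 0], [0, 0]]                # two users, two advertisers, no exposures yet
a1 = apply_decision(a0, i=0, j=2)    # the first user sees advertiser 2's ad
print(a1)                            # [[0, 1], [0, 0]]
print(in_domain(a1, max_display=5))  # True
```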
In this framework, the objective of the manager is to find a dynamic display policy maximizing the expected revenue $\mathbb{E}[F(A(T))]$. For notational convenience, we represent such a policy with a vector $D = (D_1, D_2, \dots)$, where $D_\ell$ is the display decision taken by the manager at time $T_\ell$, for $\ell \ge 1$. Here we set $D_\ell = 0$ if $X_{Y_\ell}(T_\ell) = 0$; that is, we show no ad if the user $Y_\ell$ logs out and becomes offline.
4.1. Value function and DP operator. The formulation above indicates that the problem is Markovian in (i) the cumulative exposure matrix, (ii) the current states of the users, and (iii) the remaining time until the end of the planning horizon. Over this state space we define the value function as
$$V(a, x, t) = \sup_{D} \mathbb{E}_{(a,x)}\big[F(A(t))\big].$$
In plain words, $V(a, x, t)$ is the maximum expected revenue the manager can obtain when the current exposure matrix is $a \in \mathcal{N}$, the current state of the users is $x \in \{0,1\}^n$, and the problem terminates at time t. Here, the expectation operator $\mathbb{E}_{(a,x)}$ corresponds to a probability measure $\mathbb{P}_{(a,x)}$ under which $A(0) = a$ and $(X_1(0), \dots, X_n(0)) = x \equiv (x_1, \dots, x_n)$ with probability one. Given this Markovian structure, the dynamic programming principle suggests that the value function should satisfy the equation $V = \mathcal{J}[V]$, where the DP operator $\mathcal{J}$ is defined in (3) for a function f on $\mathcal{N} \times \{0,1\}^n \times [0,T]$. Here, $S_{i,0} f = f$ corresponds to displaying no ad, and for $j \ge 1$ the operator $S_{i,j}$ is given by
$$S_{i,j} f(a, x, t) = f(a + I_{i,j}, x, t)$$
in terms of the $n \times m$ matrix $I_{i,j}$ whose $(i,j)$'th entry is one and all other entries are zero.
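The shift operator and the decision rule it induces can be sketched in Python by treating a continuation value f as a callable on (a, x, t). This is a minimal illustration under stated assumptions: `shift` and `best_decision` are hypothetical names, the toy f and its prices are made up, and returning $-\infty$ outside a box-shaped $\mathcal{N}$ mimics the convention for infeasible exposures that the text adopts for (3).

```python
import math
from typing import Callable, Tuple

Matrix = Tuple[Tuple[int, ...], ...]           # immutable n x m exposure matrix
State = Tuple[int, ...]                        # user states x in {0, 1}^n
ValueFn = Callable[[Matrix, State, float], float]


def shift(f: ValueFn, i: int, j: int) -> ValueFn:
    """S_{i,j} f(a, x, t) = f(a + I_{i,j}, x, t); j = 0 (no ad) gives f itself."""
    def g(a: Matrix, x: State, t: float) -> float:
        if j == 0:
            return f(a, x, t)
        rows = [list(r) for r in a]
        rows[i][j - 1] += 1                    # advertisers are labelled 1..m
        return f(tuple(map(tuple, rows)), x, t)
    return g


def best_decision(f: ValueFn, i: int, m: int, a: Matrix, x: State, t: float) -> int:
    """Return j in {0, ..., m} maximizing S_{i,j} f(a, x, t); ties go to the smallest j."""
    values = [shift(f, i, j)(a, x, t) for j in range(m + 1)]
    return max(range(m + 1), key=lambda j: (values[j], -j))


# Toy continuation value (hypothetical): -inf outside a box-shaped N, else priced exposures.
MAX_DISPLAY = 3
PRICE = (2.0, 1.0)


def f(a: Matrix, x: State, t: float) -> float:
    if any(v > MAX_DISPLAY for row in a for v in row):
        return -math.inf
    return sum(p * row[k] for row in a for k, p in enumerate(PRICE))


# The first user has already seen advertiser 1 three times, so only 0 or 2 stay feasible.
print(best_decision(f, i=0, m=2, a=((3, 0), (0, 0)), x=(1, 0), t=1.0))  # 2
```

Because the no-ad decision $j = 0$ is always evaluated, the maximum is finite whenever $a$ itself lies in $\mathcal{N}$.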
For a function f representing the value of continuing after a decision, $S_{i,j} f(a, x, t)$ gives the value of displaying advertiser j's ad to user i. This suggests that the manager should pick the decision $\arg\max_{j \in \{0, 1, \dots, m\}} S_{i,j} f(a, x, t)$, i.e., a solution of the maximization problem in (4). In (3), we assume that $f(a + I_{i,j}, \cdot, \cdot) = -\infty$ whenever $a \in \mathcal{N}$ and $a + I_{i,j} \notin \mathcal{N}$. Note that the expectation in (3) can also be written explicitly using the distribution of the first event time $T_1$; deriving its density yields an expression for $\mathcal{J}[f]$ as the sum of the two terms (9) and (10), where $\bar{e}_i$ denotes the row vector of all zeros except the i'th entry, which is one. The expression in (9) represents the value of the terminal reward when none of the users changes virtual location for the rest of the time horizon (that is, when $t < T_1$). The expression in (10), on the other hand, corresponds to the second line in (3) and gives the value of continuing when one of the users changes his/her virtual location before t. In the integral term in (10), the expression inside the brackets is the expected reward obtained from then on if user i changes virtual location when there are $t - u$ time units left until the end of the time horizon. The sum then computes an expected value with respect to the probability distribution over which user makes such a move, and finally the integral accounts for the randomness in the time of the first event $T_1$.
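If, as an assumption made here only for illustration, user i in state $x_i$ waits an exponential time with rate $\lambda_i(x_i)$ before its next event, independently of the other users, then $T_1$ is exponential with rate $\Lambda(x) = \sum_i \lambda_i(x_i)$. In that case $\mathbb{P}(T_1 > t) = e^{-\Lambda(x) t}$ would be the weight on the terminal reward in (9), $T_1$ has density $\Lambda(x) e^{-\Lambda(x) u}$, and user i is the first to move with probability $\lambda_i(x_i)/\Lambda(x)$, independently of $T_1$. These are standard facts about competing exponential clocks; the short Monte Carlo check below confirms them numerically for made-up rates.

```python
import math
import random
from typing import List, Tuple


def first_event(rates: List[float], rng: random.Random) -> Tuple[float, int]:
    """Run one independent exponential clock per user; return (T_1, index of the first mover)."""
    times = [rng.expovariate(r) for r in rates]
    i = min(range(len(rates)), key=times.__getitem__)
    return times[i], i


def check(rates: List[float], t: float, n_samples: int = 100_000, seed: int = 1) -> dict:
    """Compare simulated and closed-form P(T_1 > t) and P(Y_1 = i)."""
    rng = random.Random(seed)
    total = sum(rates)                      # Lambda(x)
    survive, first = 0, [0] * len(rates)
    for _ in range(n_samples):
        t1, i = first_event(rates, rng)
        survive += t1 > t
        first[i] += 1
    return {
        "survival_sim": survive / n_samples,
        "survival_exact": math.exp(-total * t),
        "first_sim": [c / n_samples for c in first],
        "first_exact": [r / total for r in rates],
    }


# Hypothetical rates lambda_i(x_i) for two users in their current states.
print(check(rates=[0.5, 1.5], t=0.4))
```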
In [7], it is shown that the value function is indeed the unique bounded solution of the equation $f = \mathcal{J}[f]$. If we construct the sequence of functions $U_0(a, x, t) = F(a)$ and $U_{k+1} = \mathcal{J}[U_k]$ for $k \ge 0$, then each $U_k$ admits the representation (12): $U_k$ gives the maximal expected revenue that the manager receives when s/he is allowed to take display decisions only at the first k event times $T_1, \dots, T_k$. It follows from this observation that the functions $U_k$ form a non-decreasing sequence. Moreover, they converge to V uniformly, with an explicit bound (13) on the convergence rate. It can be shown that the maximum truncated expected revenue in (12) is attained by the display policy $D^{(k)}$ in (15), which takes a maximizing decision at each of the first k event times and displays no ad (decision 0) elsewhere; here $A^{(k)}$ denotes the resulting exposure process. Note that for some large k we can obtain the function $U_k$ by carrying out the computations in (9)-(10) iteratively, so that the resulting error term for $V - U_k$ is less than some acceptable level $\varepsilon > 0$ (see (13)). For such a value of k, if we apply the policy $D^{(k)}$ in (15), the expected revenue that we receive is no more than $\varepsilon$ away from the true optimal value. In terms of the value function $V = U_\infty$, the optimal policy $D^{(\infty)}$ is constructed in the same manner as $D^{(k)}$, without truncation; here, for $k \ge 1$, $A^{(\infty)}(T_k-)$ is the exposure matrix just before time $T_k$, which results from applying the display decisions $D_1, \dots, D_{k-1}$. Once the function V is available, we can apply this optimal policy.

4.2. Computing the value function. The successive approximations above also yield a numerical method to compute/approximate the value function. For a large value of k, we can compute the function $U_k$ by applying the DP operator in (3) k times, starting with $U_0(a, x, t) = F(a)$ (see (9)-(10)). The number of iterations is set in advance so that the error term in (13) is negligible.
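A minimal Python sketch of this successive-approximation scheme is given below for a hypothetical instance with one user and two advertisers. It does not reproduce (9)-(10) verbatim; instead it assumes, for illustration only, that an offline user logs in at rate `RATE[i][0]`, and an online user triggers events at rate `RATE[i][1]`, each of which is a move to a new location (a display opportunity) with probability `BETA[i]` and a log-out otherwise. Under these assumptions, $\mathcal{J}[f](a,x,t) = F(a) e^{-\Lambda(x) t} + \int_0^t e^{-\Lambda(x) u} \sum_i \lambda_i(x_i)\, C_i[f](a, x, t-u)\, du$, where $C_i[f]$ is the value just after user i's event. The integral is approximated with the trapezoidal rule on the time grid, and a fixed number of iterations `K` stands in for the tolerance-based choice of k from (13).

```python
import itertools
import math

# --- Hypothetical toy instance (illustrative values, not the paper's data) ---
N_USERS, N_ADS, CAP = 1, 2, 2        # N = {a : 0 <= a_ij <= CAP}
PRICE = [[3.0, 2.0]]                 # toy linear reward F(a) = sum_ij PRICE[i][j] * a_ij
RATE = [[1.0, 2.0]]                  # RATE[i][s]: event rate of user i in state s
BETA = [0.6]                         # P(an online user's event is a move rather than a log-out)
T, R, K = 2.0, 40, 20                # horizon, grid steps, number of DP iterations
H = T / R

A_SET = [tuple(v[i * N_ADS:(i + 1) * N_ADS] for i in range(N_USERS))
         for v in itertools.product(range(CAP + 1), repeat=N_USERS * N_ADS)]
X_SET = list(itertools.product((0, 1), repeat=N_USERS))


def reward(a):
    return sum(PRICE[i][j] * a[i][j] for i in range(N_USERS) for j in range(N_ADS))


def bump(a, i, j):
    """a + I_{i,j} (advertisers labelled 1..m, j = 0 means no ad); None if it leaves N."""
    if j == 0:
        return a
    rows = [list(r) for r in a]
    rows[i][j - 1] += 1
    return None if rows[i][j - 1] > CAP else tuple(map(tuple, rows))


def with_state(x, i, s):
    return x[:i] + (s,) + x[i + 1:]


def best_shift(f, a, x, i, r):
    """max_j f(a + I_{i,j}, x, t_r) over decisions that keep a inside N."""
    return max(f[(b, x, r)] for b in (bump(a, i, j) for j in range(N_ADS + 1)) if b is not None)


def after_event(f, a, x, i, r):
    """Value just after user i's event (assumed dynamics), with t_r time units left."""
    if x[i] == 0:  # log-in: the user enters state 1 and an ad may be shown
        return best_shift(f, a, with_state(x, i, 1), i, r)
    return (BETA[i] * best_shift(f, a, x, i, r)                  # move to a new location
            + (1 - BETA[i]) * f[(a, with_state(x, i, 0), r)])    # log-out


def apply_J(f):
    """One DP step: F(a) e^{-Lambda t} + int_0^t e^{-Lambda u} sum_i lambda_i C_i(t - u) du."""
    g = {}
    for a in A_SET:
        for x in X_SET:
            lam = sum(RATE[i][x[i]] for i in range(N_USERS))
            cont = [sum(RATE[i][x[i]] * after_event(f, a, x, i, r) for i in range(N_USERS))
                    for r in range(R + 1)]
            for r in range(R + 1):
                vals = [math.exp(-lam * s * H) * cont[r - s] for s in range(r + 1)]
                integral = H * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
                g[(a, x, r)] = reward(a) * math.exp(-lam * r * H) + integral
    return g


U = {(a, x, r): reward(a) for a in A_SET for x in X_SET for r in range(R + 1)}  # U_0 = F
for _ in range(K):
    U = apply_J(U)  # U_{l+1} = J[U_l]

print(round(U[(((0, 0),), (0,), R)], 3))  # approx. V(0, offline, T) for the toy instance
```

Storing each iterate as a dictionary over the grid mirrors the discrete set on which the iteration is carried out; the $O(R^2)$ quadrature per state is what makes this approach expensive as the grid is refined.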
Alternatively, we can use a finite difference method to compute the value function numerically. Using a standard infinitesimal first-step analysis for the difference $V(a, x, t+h) - V(a, x, t)$ and letting $h \downarrow 0$, one can show that the value function satisfies equation (17). Equation (17), together with the boundary condition $V(a, x, 0) = F(a)$, characterizes the value function uniquely. Therefore, we can start from the boundary $t = 0$ and construct/approximate the value function via a standard finite difference scheme over a grid with a sufficiently small step size $h > 0$. In Section 5, we use both the successive approximations and the finite difference approach to evaluate the value function numerically. These methods are summarized in the algorithms below.
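For intuition, here is a sketch on the same kind of hypothetical toy instance, with the same assumed dynamics: an offline user logs in at rate `RATE[i][0]`, while an online user, at total rate `RATE[i][1]`, either moves (probability `BETA[i]`) or logs out. Under these assumptions, a first-step argument gives the system $\partial_t V = \sum_i \lambda_i(x_i)\,(C_i[V] - V)$, where $C_i[V]$ is the value just after user i's event under the best display decision. This is our assumed instance, not a reproduction of (17). The sketch integrates the system from the boundary $V(a,x,0) = F(a)$ with an explicit forward Euler step. Keeping $h\,\Lambda(x) \le 1$ makes each update a convex combination of current values, so the numerical solution stays between $F(a)$ and the largest terminal reward.

```python
import itertools

# --- Hypothetical toy instance (illustrative values, not the paper's data) ---
N_USERS, N_ADS, CAP = 1, 2, 2        # N = {a : 0 <= a_ij <= CAP}
PRICE = [[3.0, 2.0]]                 # toy linear reward F(a) = sum_ij PRICE[i][j] * a_ij
RATE = [[1.0, 2.0]]                  # RATE[i][s]: event rate of user i in state s
BETA = [0.6]                         # P(an online user's event is a move rather than a log-out)
T, R = 2.0, 400
H = T / R                            # explicit Euler step; here H * max_x Lambda(x) <= 1

A_SET = [tuple(v[i * N_ADS:(i + 1) * N_ADS] for i in range(N_USERS))
         for v in itertools.product(range(CAP + 1), repeat=N_USERS * N_ADS)]
X_SET = list(itertools.product((0, 1), repeat=N_USERS))


def reward(a):
    return sum(PRICE[i][j] * a[i][j] for i in range(N_USERS) for j in range(N_ADS))


def bump(a, i, j):
    """a + I_{i,j} (advertisers labelled 1..m, j = 0 means no ad); None if it leaves N."""
    if j == 0:
        return a
    rows = [list(r) for r in a]
    rows[i][j - 1] += 1
    return None if rows[i][j - 1] > CAP else tuple(map(tuple, rows))


def with_state(x, i, s):
    return x[:i] + (s,) + x[i + 1:]


def best_shift(v, a, x, i):
    """max_j V(a + I_{i,j}, x, t) over decisions that keep a inside N."""
    return max(v[(b, x)] for b in (bump(a, i, j) for j in range(N_ADS + 1)) if b is not None)


def drift(v, a, x):
    """Assumed right-hand side: sum_i lambda_i(x_i) * (value after user i's event - V)."""
    total = 0.0
    for i in range(N_USERS):
        if x[i] == 0:  # log-in followed by a display decision
            after = best_shift(v, a, with_state(x, i, 1), i)
        else:          # move (display decision) or log-out
            after = (BETA[i] * best_shift(v, a, x, i)
                     + (1 - BETA[i]) * v[(a, with_state(x, i, 0))])
        total += RATE[i][x[i]] * (after - v[(a, x)])
    return total


def solve():
    """March from the boundary V(a, x, 0) = F(a) to remaining time T."""
    v = {(a, x): reward(a) for a in A_SET for x in X_SET}
    for _ in range(R):
        v = {(a, x): v[(a, x)] + H * drift(v, a, x) for (a, x) in v}
    return v


V_T = solve()
print(round(V_T[(((0, 0),), (0,))], 3))  # approx. V(0, offline, T) for the toy instance
```

Each Euler step costs one pass over the states, so this scheme scales linearly in the number of time steps. By contrast, the quadrature-based successive approximations scale quadratically.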

Heuristics.
Since the exact algorithms are computationally demanding, it might be desirable to use greedy heuristics for practical purposes. Such heuristics typically select the best alternative in terms of the immediate marginal return. However, the immediate return materializes only if the contractual constraints (i.e., the minimum/maximum display limits per viewer and the minimum/maximum payment amounts) are satisfied. Hence, among all ads, only those that satisfy these constraints would be candidates, and a typical greedy heuristic would choose the one that yields the highest marginal return.
Nevertheless, particularly in the early stages of the time horizon, since neither the minimum display limits per viewer nor the minimum payment amounts are satisfied, such an approach would assign the initial ads randomly and neglect their potential marginal returns. Therefore, a greedy heuristic that neglects these constraints and assigns the ads based on the potential marginal revenue can also be considered as a possible alternative. Here, the term potential describes the marginal revenue because the heuristic treats the return as if all of the constraints were satisfied at the end of the time horizon, which in reality might not be the case. Yet this second alternative suffers from ignoring the maximum display limit per viewer and the maximum payment amount constraints, since any exposure beyond these limits yields no actual return for the publisher at the end of the time horizon. As a result, a third alternative, which considers only the maximum display limit per viewer and the maximum payment amount constraints, seems to be a good way of overcoming the disadvantages of both of these alternatives.
Algorithm 2: Value iteration algorithm.
Step 1: For a tolerance level $\varepsilon > 0$, determine k such that the error bound in (13) is less than $\varepsilon$. For a large $R \in \mathbb{N}$, define the step size (in time) $h = T/R$ and construct the discrete set $\Delta_d = \mathcal{N} \times \{0,1\}^n \times \{0, h, 2h, \dots, Rh\}$. Set $l = 0$ and $U_0(a, x, t) = F(a)$ for all $(a, x, t) \in \Delta_d$.
Step 2: Compute $\mathcal{J}[U_l](a, x, t)$ by evaluating and adding the expressions in (9)-(10), and set $U_{l+1}(a, x, t) = \mathcal{J}[U_l](a, x, t)$ for all $(a, x, t) \in \Delta_d$.
Step 3: Increment l by one. Stop if $l = k$; otherwise, go to Step 2.
In light of this discussion, we propose four additional heuristics, labeled A, B, C, and Random. Briefly speaking, all but the last are greedy heuristics; they differ from each other in their degree of shortsightedness, and each corresponds to one of the candidate greedy heuristics mentioned earlier. That is to say, the subset of constraints considered by a particular heuristic is what sets it apart from the others.
The first heuristic, Heuristic A, considers all four constraints, that is, the maximum display number, minimum display number, maximum budget, and minimum budget constraints. It checks these constraints for each advertisement, rules out those that are not going to generate any revenue due to the constraints, and assigns the ad that yields the highest immediate return among the remaining ones. Heuristic B takes into account only the maximum display and maximum budget constraints and neglects the other two during the candidate selection/elimination process. Likewise, the third heuristic, Heuristic C, neglects all of the constraints and displays the advertisement that yields the highest potential return for the user (as if there were no constraints). The steps of Heuristics A, B, and C are depicted in Algorithm 3 below.
Finally, the fourth heuristic is a purely randomized heuristic that displays one of the advertisements at random with equal probabilities. We consider the Random heuristic merely as a benchmark in the experimental analysis.
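The candidate-set logic of Heuristics A, B, and C and of the Random benchmark can be sketched in Python as follows. This is a hypothetical reading, not the authors' implementation. The prices and limits are made up. For Heuristic A, the "immediate return" is taken to be the increase in the constrained reward F caused by one more display, with a uniformly random ad chosen when no candidate survives (the early-stage behaviour described above). For Heuristics B and C, the "potential return" is taken to be the per-display price $P_{i,j}$. The maximum budget check in Heuristic B uses the payment computed while ignoring the minimum constraints.

```python
import random

# --- Hypothetical contract data for 2 users x 2 advertisers (illustrative only) ---
PRICE = [[2.0, 3.0], [4.0, 1.0]]     # P[i][j]: payment per display of ad j to user i
MIN_DISPLAY, MAX_DISPLAY = 3, 5      # per-viewer display limits
MIN_BUDGET, MAX_BUDGET = 10.0, 40.0  # per-advertiser payment limits


def potential(a, j):
    """Payment of ad j if the minimum constraints are ignored (excess displays unpaid)."""
    return sum(PRICE[i][j] * min(row[j], MAX_DISPLAY) for i, row in enumerate(a))


def payment(a, j):
    """Payment of ad j when all four constraints are enforced."""
    total = sum(PRICE[i][j] * min(row[j], MAX_DISPLAY)
                for i, row in enumerate(a) if row[j] >= MIN_DISPLAY)
    return 0.0 if total < MIN_BUDGET else min(total, MAX_BUDGET)


def reward(a):
    return sum(payment(a, j) for j in range(len(PRICE[0])))


def shown(a, i, j):
    """Copy of a with one more display of ad j to user i."""
    b = [row[:] for row in a]
    b[i][j] += 1
    return b


def choose(heuristic, a, i, rng):
    """0-based ad shown to user i under the given heuristic, or None for no ad."""
    m = len(PRICE[0])
    if heuristic == "Random":
        return rng.randrange(m)
    if heuristic == "A":  # all four constraints: keep ads with a positive immediate return
        gain = [reward(shown(a, i, j)) - reward(a) for j in range(m)]
        cands = [j for j in range(m) if gain[j] > 0]
        return max(cands, key=lambda j: gain[j]) if cands else rng.randrange(m)
    if heuristic == "B":  # only the maximum display and maximum budget constraints
        cands = [j for j in range(m)
                 if a[i][j] < MAX_DISPLAY and potential(a, j) < MAX_BUDGET]
    else:                 # "C": no constraints at all
        cands = list(range(m))
    return max(cands, key=lambda j: PRICE[i][j]) if cands else None


rng = random.Random(0)
a = [[2, 0], [3, 0]]  # ad 0 already earns 12 from user 1 (rows and ads are 0-based)
print({h: choose(h, a, 0, rng) for h in ("A", "B", "C")})  # {'A': 0, 'B': 1, 'C': 1}
```

In this example, Heuristic A prefers ad 0 because one more display unlocks a real payment of 6. Heuristics B and C go for ad 1, whose higher price is only a potential return until its minimums are met.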
Algorithm 3: Heuristic algorithms A-B-C. Let $T_1 < T_2 < \dots \le T$ be the times of successive transitions of users, and let $Y_1, Y_2, \dots$ be the indices of the users making those transitions. The heuristics determine the decisions as follows:
Step 1: For the user $Y_\ell$, create the set M of advertisements satisfying the constraints. The constraints vary among the heuristics as follows: Heuristic A: all four constraints. Heuristic B: only the two maximum constraints. Heuristic C: no constraint.
5. Numerical analysis. Here, our intention is not to provide a full-scale experimental analysis, which would be computationally prohibitive due to the number of parameters. We conduct the numerical analysis over relatively small problems in order to (1) gain insights regarding the sensitivity to the algorithm parameters, and (2) compare the performance of the heuristics with the exact methods (given in Section 4.2), which suffer greatly from the curse of dimensionality. The parameters associated with the problem and the developed algorithms are listed in Table 1. The problem-specific parameters can be categorized into two subsets: those based on the user activities, and those due to the contract between the advertisers and the publisher. The former consists of the initial states, transition rates, and probabilities, whereas the latter includes the exposure payment data, minimum/maximum display constraints, and minimum/maximum payment constraints.
Problem size refers to the number of users n and the number of advertisers m. In order to avoid excessive computing times (i.e., the curse of dimensionality), we assume a setting with two users and two advertisers in the following numerical analysis. In real-life problems, the transition rates, the β-probabilities, and the initial states can be estimated from the user activities stored in log files. In our numerical experiments, the initial states are assumed to be [0 0]; that is, both users are offline at time zero. The 'payment per display' matrix P, the transition rates $\mu_{i,j}$, and the probabilities $\beta_i$ are set as follows:
The contracts between the advertisers and the publisher shape the reward function F in practice. In particular, each contract specifies the within-constraints pricing scheme, that is, how much the advertiser pays per exposure provided that the constraints are not violated. This payment structure per exposure is given by the matrix P. At the end of the time horizon, if no constraint is violated, the total reward is $\sum_{i,j} P_{i,j} a_{i,j}$, i.e., the entrywise product of P with the exposure matrix a, summed over all entries. If a minimum exposure constraint is violated for a particular user, no payment is made by the advertiser. If the maximum exposure constraint is violated, the advertiser does not pay for the excess exposures. In any case, the total payment made by an advertiser is capped by the maximum budget constraint. Finally, no payment is made by an advertiser if the minimum budget constraint is not satisfied.
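These payment rules translate directly into a terminal reward function F. The Python sketch below is one possible reading, and several of its choices are assumptions on our part. A violated minimum exposure constraint is taken to void the advertiser's payment for that particular user only. The budget thresholds are applied per advertiser to the remaining total. The matrix P is a hypothetical placeholder. The limits are the low levels used in this section (minimum display 3, maximum display 5, minimum payment 10, maximum payment 40).

```python
from typing import Sequence

Exposure = Sequence[Sequence[int]]
Prices = Sequence[Sequence[float]]


def advertiser_payment(a: Exposure, P: Prices, j: int, min_display: int, max_display: int,
                       min_budget: float, max_budget: float) -> float:
    """Payment of advertiser j (0-based column) under the contract rules described above."""
    total = 0.0
    for i, row in enumerate(a):
        if row[j] < min_display:
            continue                                  # minimum exposure violated for user i
        total += P[i][j] * min(row[j], max_display)   # excess exposures are not paid
    if total < min_budget:
        return 0.0                                    # minimum budget not reached
    return min(total, max_budget)                     # payment capped by the maximum budget


def reward(a: Exposure, P: Prices, min_display: int, max_display: int,
           min_budget: float, max_budget: float) -> float:
    """Terminal reward F(a): the sum of the advertisers' payments."""
    return sum(advertiser_payment(a, P, j, min_display, max_display, min_budget, max_budget)
               for j in range(len(P[0])))


P = [[2.0, 3.0], [4.0, 1.0]]   # hypothetical payment-per-display matrix
a = [[3, 7], [2, 4]]           # exposures at the end of the horizon
print(reward(a, P, min_display=3, max_display=5, min_budget=10, max_budget=40))  # 19.0
```

Here advertiser 1 (column 0) pays nothing: only 6 is earned from the first user, since the second user is below the minimum display number, and 6 falls short of the minimum payment of 10. Advertiser 2 pays for five of the first user's seven exposures plus all four of the second user's, for a total of 19.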
Even though there seems to be a consensus in the advertising literature regarding the inverted U-shaped relation between repetitions and the effect of an advertisement, the optimal number of repetitions is still an open question and depends on multiple other factors (such as brand familiarity, advertisement length, etc.). There are two separate schools of thought on this matter, namely the minimalists and the repetitionists. The minimalists consider three or four as the desired number of repetitions, whereas this number is around eight for the repetitionists [17]. Schmidt and Eisend [12] conduct a meta-study of the existing literature to analyze the advertising repetition effect. It covers 37 papers that conduct experimental analyses and observes that (with only one exception) the exposure rates used in these studies range from one to eight exposures, with an average of 3.5. Therefore, it seems safe to set the minimum display constraint to three, and the maximum display constraint to either five (if one sides with the minimalists) or eight (if one follows the repetitionists' view). We use both values for the maximum display constraint and regard them as the low and high levels, respectively.
On the other hand, the minimum and maximum payment constraints are highly problem-specific. In an experimental analysis, they should be set with reference to the other problem-specific parameters, such as the exposure payment values, the problem size, and the transition rates. In the numerical analysis, we consider two levels (again low and high) for each of the minimum and maximum payment constraints, determined from the exposure payment values and the transition rates. The minimum payment constraint is set to 10 (low) or 30 (high), and the maximum payment constraint to 40 (low) or 70 (high).
As a result, we obtain eight different experimental conditions, which are listed in Table 2. The experiment numbers encode the experimental conditions: the first, second, and third digits give the level (1 = low, 2 = high) of the maximum display constraint, the minimum payment constraint, and the maximum payment constraint, respectively. For example, Experiment #111 corresponds to the case where the maximum display constraint is low (= 5), the minimum payment constraint is low (= 10), and the maximum payment constraint is also low (= 40).
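The encoding of the experiment numbers can be made explicit with a short sketch. The level values are those stated in the text; the function and constant names are hypothetical.

```python
from itertools import product

# Low (1) and high (2) levels as stated in the text.
MAX_DISPLAY = {1: 5, 2: 8}
MIN_PAYMENT = {1: 10, 2: 30}
MAX_PAYMENT = {1: 40, 2: 70}
MIN_DISPLAY = 3  # fixed in all experiments


def experiment_parameters(code: int) -> dict:
    """Decode an experiment number such as 111 or 212 into constraint values."""
    d1, d2, d3 = (int(c) for c in str(code))
    return {
        "min_display": MIN_DISPLAY,
        "max_display": MAX_DISPLAY[d1],   # first digit
        "min_payment": MIN_PAYMENT[d2],   # second digit
        "max_payment": MAX_PAYMENT[d3],   # third digit
    }


# All eight conditions of the 2 x 2 x 2 design, from 111 to 222.
EXPERIMENTS = [100 * d1 + 10 * d2 + d3 for d1, d2, d3 in product((1, 2), repeat=3)]
```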
In order to conduct the numerical study, sample paths of the two-state Markov processes representing users' actions are simulated. At the beginning of the analysis, an appropriate number of replications (i.e., sample paths) is determined so that the variation due to randomness is averaged out while it remains computationally feasible to obtain stable solutions. For this purpose, random replications are generated and the moving averages of the terminal reward are computed for each algorithm. Figure 1 depicts the results obtained in Experiment #111 for the six algorithms; similar results are obtained for the remaining seven experimental conditions. We observe that the sample means become quite stable after 1000 runs; hence, the performances of the algorithms are assessed on 1000 replications in the following analysis. For the dynamic programming based successive approximation approach and the finite difference approach, both the theoretical results (i.e., the computed expected revenues) and the revenue realizations (i.e., the sample means of the revenues generated by their policies on the given realizations) are presented whenever appropriate.
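A minimal sketch of the two ingredients of this procedure: simulating one user's two-state sample path on [0, T], and computing the running mean of a sequence of terminal rewards. It assumes that the holding time in each state is exponential with a state-dependent rate (written `mu_off` and `mu_on` here rather than the paper's µ_{i,j}), omits the β-probabilities, and reads the "moving average" as the cumulative mean over the first k replications. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def simulate_user(mu_off: float, mu_on: float, T: float, rng: random.Random,
                  state: int = 0) -> List[Tuple[float, int]]:
    """Simulate one user's two-state Markov process on [0, T].

    State 0 = offline, 1 = online. Returns (transition time, new state) pairs.
    Assumption: holding times are exponential with rate mu_off in state 0
    and mu_on in state 1.
    """
    t = 0.0
    path: List[Tuple[float, int]] = []
    while True:
        t += rng.expovariate(mu_off if state == 0 else mu_on)
        if t > T:
            return path
        state = 1 - state
        path.append((t, state))


def running_means(values: Sequence[float]) -> List[float]:
    """Mean of the first k values for k = 1, ..., len(values)."""
    means: List[float] = []
    total = 0.0
    for k, v in enumerate(values, start=1):
        total += v
        means.append(total / k)
    return means
```

For example, with `rng = random.Random(0)`, `running_means([len(simulate_user(2.0, 3.0, 1.0, rng)) for _ in range(1000)])` traces how a stand-in statistic (the number of transitions) settles as replications accumulate; in the study, each algorithm's terminal reward would take its place.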
5.1. Sensitivity analysis for algorithm specific parameters. This sensitivity analysis is conducted to tune the parameters of the finite difference and value iteration algorithms so that they perform sufficiently well in reasonable time. The analysis is useful as part of the validation and verification process of the developed algorithms. It reveals insights regarding the performance of the algorithms both in terms of the computational time and sample means of revenues.
The finite difference algorithm has the h-value as a parameter, which is the step length over time in the implementation of the algorithm (see Algorithm 1). The smaller the h-value (i.e., the larger L = T/h), the better the approximation but the higher the computational time. We conduct numerical analysis for various values of L = T/h with the time horizon T = 1. The results for Experiment #111 are depicted in Figure 2 as an example. Similar results are obtained for the remaining seven experiments; as a result, L = 100 is set and used in the rest of the analysis.
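The step-length trade-off can be illustrated on a problem with a known solution. The sketch below is not Algorithm 1: it applies an explicit (Euler) finite-difference scheme with L = T/h steps to the forward equation dp/dt = λ(1 − p) − µp for the probability p(t) that a single user is online, assuming rate λ from offline to online and µ from online to offline, and compares the result with the closed-form solution. The function names and rate values are illustrative assumptions.

```python
import math


def online_prob_exact(lam: float, mu: float, p0: float, t: float) -> float:
    """P(user online at time t) for a two-state chain with rates lam (0->1), mu (1->0)."""
    pi = lam / (lam + mu)  # stationary probability of being online
    return pi + (p0 - pi) * math.exp(-(lam + mu) * t)


def online_prob_euler(lam: float, mu: float, p0: float, T: float, L: int) -> float:
    """Explicit finite-difference approximation with L steps of length h = T / L."""
    h, p = T / L, p0
    for _ in range(L):
        p += h * (lam * (1.0 - p) - mu * p)  # dp/dt = lam (1 - p) - mu p
    return p


exact = online_prob_exact(2.0, 3.0, 0.0, 1.0)
for L in (10, 100, 1000):
    err = abs(online_prob_euler(2.0, 3.0, 0.0, 1.0, L) - exact)
    print(f"L = {L:4d}  h = {1.0 / L:.3f}  error = {err:.2e}")
```

The error shrinks roughly in proportion to h, while the work grows linearly in L, which is the same trade-off that motivates choosing a moderate L.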
The number of iterations (k) is one of the fundamental factors that affect the results of the dynamic programming based value iteration algorithm (see Algorithm 4.2). Even though a theoretical value depending on an error parameter ε > 0 is provided earlier, we conducted a numerical analysis to determine reasonably good values for this parameter in order to decrease the computational effort. In addition, similar to the L-value (or h-value) in the finite difference algorithm, the resolution parameter R (i.e., the number of partition points of the time horizon) is required to approximate the integrals in Equation (10). The higher the resolution, the better the approximation but the higher the computational time. Therefore, an appropriate value of the resolution parameter is also required, one at which dependable results are achieved in relatively short times. Figure 3 illustrates this dependence for Experiment #111: it depicts the computed expected revenues at each iteration of the value iteration algorithm for the resolution parameter values R = 10, 20, 40, 60, 80, 100, 120. From the figure, we observe that setting the number of iterations to 40 is safe, since the expected revenues are sufficiently stable at this level; this is also the case for the remaining seven experiments. Figure 4 presents the expected revenues given by the value iteration algorithm with 40 iterations at different resolution parameter values in Experiment #111. Recall that the prior analysis of the finite difference algorithm led us to set the L-value to 100. Consistent with this choice, we set the resolution parameter of the value iteration algorithm to R = 100, which is also a level at which the expected revenue has stabilized, as depicted in Figure 4.
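Equation (10) is not reproduced here, but the role of R can be illustrated with a generic quadrature. The sketch below is an assumption rather than the paper's discretization: it approximates the integral over [0, T] of an exponential holding-time density, whose exact value is 1 − e^{−µT}, with the composite trapezoidal rule on R subintervals. The error shrinks as R grows, at the cost of more function evaluations.

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], a: float, b: float, R: int) -> float:
    """Composite trapezoidal rule on [a, b] with R equal subintervals."""
    h = (b - a) / R
    inner = sum(f(a + k * h) for k in range(1, R))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


MU, T = 3.0, 1.0  # illustrative rate and horizon (assumed values)


def exp_density(s: float) -> float:
    """Density of an exponential holding time with rate MU."""
    return MU * math.exp(-MU * s)


EXACT = 1.0 - math.exp(-MU * T)  # probability that the transition occurs before T

for R in (10, 20, 40, 60, 80, 100, 120):
    err = abs(trapezoid(exp_density, 0.0, T, R) - EXACT)
    print(f"R = {R:3d}  error = {err:.2e}")
```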
The computational requirement of the value iteration algorithm grows linearly with the number of iterations. However, it grows exponentially with the resolution value, as depicted in Figure 5. Therefore, the resolution parameter is among the factors that significantly limit the practical applicability of the value iteration algorithm. On the other hand, the analysis reveals that the personalized advertisement assignment problem is actually very robust: even though the computed expected revenues deviate widely between R = 10 and R = 100, as depicted in Figure 4, the sample mean revenues obtained from 1000 replications do not differ from each other, as illustrated in Figure 6. That is to say, even though the true expected revenues cannot be determined accurately for low resolution values, the resulting decisions about which advertisement to assign to a user in a realization are reliable. This observation is important and hints at a possible approximate implementation of the value iteration algorithm in practice.
A comparison of the computational time requirements of the heuristics is not provided, since their computational times are in milliseconds. For the finite difference method the computational time is in the order of minutes, and for the value iteration approach it is in the order of hours. That is to say, the finite difference and value iteration algorithms require an excessive amount of time to determine the optimal expected revenues, because the operators are evaluated repeatedly over the state space. However, one should keep in mind that once this initial setup is complete, the optimal decisions can be stored in a look-up table, and the decision for any realization can then be made simply by consulting it.
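A minimal sketch of the look-up-table idea, assuming the decisions are computed offline on a uniform time grid with `grid_size` steps over [0, T] and that the state can be represented by a hashable value (for example, a tuple of user states and exposure counts). The callback `best_ad` is hypothetical and stands for whatever maximization the exact method performs at a grid point.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Table = Dict[Tuple[int, Hashable], int]


def build_policy_table(grid_size: int, states: Iterable[Hashable],
                       best_ad: Callable[[int, Hashable], int]) -> Table:
    """Offline step: store the decision for every (grid index, state) pair."""
    state_list = list(states)
    return {(k, s): best_ad(k, s) for k in range(grid_size + 1) for s in state_list}


def lookup(table: Table, t: float, T: float, grid_size: int, state: Hashable) -> int:
    """Online step: snap time t to the nearest grid point and read the stored ad."""
    k = min(grid_size, max(0, round(t / T * grid_size)))
    return table[(k, state)]
```

The expensive work is confined to `build_policy_table`; each online decision is then a constant-time dictionary access, which is what makes the exact methods usable after their setup time.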

5.2. Performance analysis.
The second group of analyses is conducted to assess the performances of the algorithms under various experimental conditions. Table 3 presents the sample mean revenues (SMRs) over 1000 replications attained by all six algorithms, as well as the expected revenues (ERs) calculated by the finite difference and value iteration algorithms. To assess the statistical significance of the differences, we conducted paired t-tests for the difference of means. The results reveal that, with some exceptions, all differences between the SMRs in the table are statistically significant with p-value ≤ 0.001. The exceptions, where the differences are not statistically significant, are: (1) the difference between the SMRs of the finite difference and value iteration approaches in all experimental conditions; (2) the difference between the SMRs attained by Heuristic C and Random in Experiments #121 and #122; and (3) the difference between the SMRs attained by Heuristic B and the finite difference algorithm in Experiments #211 and #212.
The finite difference and value iteration approaches outperform the heuristics in all experimental conditions, with the exception of Experiments #211 and #212, where the differences between the SMRs of these two algorithms and that of Heuristic B are not statistically significant. Recall that Heuristic B checks only the maximum advertisement display and maximum payment constraints while making a decision, and neglects the other two constraints. Experiments #211 and #212 are the cases where the maximum advertisement display constraint is high and the minimum payment constraint is low; in other words, both are loose and easier to satisfy. Therefore, ignoring the minimum payment constraint does not seem to be a problem, and Heuristic B performs well. Experiments #111 and #112 are the other experimental conditions where the minimum payment constraint is loose; Heuristic B also performs well in these experiments and yields SMRs close to the optimal expected revenues.
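Because all algorithms are evaluated on the same 1000 realizations, their revenues are paired, which is why a paired t-test applies. A stdlib-only sketch follows. The function name is hypothetical, and the p-value uses the normal approximation to the t distribution, an assumption that is reasonable for 1000 pairs but not for small samples (SciPy's `scipy.stats.ttest_rel` gives the exact t-based p-value).

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired t statistic and approximate two-sided p-value for H0: mean(x - y) = 0.

    x[r] and y[r] must be the revenues of two algorithms on the same replication r.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    d = [xi - yi for xi, yi in zip(x, y)]
    sd = statistics.stdev(d)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(d) / (sd / math.sqrt(len(d)))
    # Normal approximation to the t distribution (adequate for large n):
    # two-sided p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p
```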
Heuristic B outperforms the other three heuristics in all experimental conditions. Even though Heuristic A considers all four constraints, ignoring the minimum display and minimum payment constraints apparently improves the performance of the heuristic. This might be because, early in the time horizon, none of the advertisements satisfies these two constraints, so the heuristic effectively decides at random (ties are broken randomly), which in turn worsens its performance.
For Experiments #121 and #122, the difference between Heuristic C and Random is not statistically significant. Heuristic C does not consider any constraints and assigns the advertisement that yields the highest immediate revenue. However, this myopic strategy does not seem to differ from random assignment, particularly when the maximum display constraint is low (i.e., takes the value suggested by the minimalist view in the advertising literature) and the minimum payment constraint is high. The analysis suggests that considering the maximum advertisement display and maximum payment constraints would benefit the heuristic; practitioners who would like to employ myopic strategies should therefore take these constraints into account in their design.
We observe that in Experiments #121 and #122 the heuristic algorithms perform very poorly; note that in these cases the maximum display and minimum payment constraints are tight. This shows that, in the presence of tight constraints, the myopic heuristics A, B, and C do not necessarily outperform the purely random policy. On the other hand, heuristics such as B, which incorporate some of the constraints of the problem to mitigate their shortsightedness, might be valuable in various experimental conditions, but Heuristic B also suffers drastically in Experiments #121 and #122.
As expected, the ER values of the value iteration and finite difference approaches are very close to each other in all experiments, and the difference between their SMRs is not statistically significant either. As pointed out earlier, the value iteration algorithm is computationally more demanding than the finite difference algorithm. Furthermore, the discussion in the sensitivity analysis section regarding the robustness of the value iteration algorithm with respect to the resolution parameter R also holds for the finite difference algorithm with respect to L (or the h-value). Figure 7 depicts the SMR values of the finite difference algorithm for different L-values in Experiment #111, where one can observe that it is possible to speed up the finite difference algorithm by decreasing the L-value without losing much of the realized performance. Therefore, the finite difference algorithm seems to be the better candidate among the exact approaches for solving the advertisement assignment problem.
6. Concluding remarks. This research is inspired by a real-life problem faced by a company developing a 3D virtual social platform. The company seeks to improve and optimize its advertisement revenues. The virtual environment allows the developer company (i.e., the publisher) to personalize the advertisements and maximize its profit by targeting users that are compatible with the specifications determined by the advertisers. The problem encompasses two phases: the matching phase, where the compatibility of the users' profiles with the advertisers' specifications is determined, and the assignment phase, where the best advertisement among the set of advertisements is assigned whenever a user becomes active (i.e., whenever an opportunity to display an advertisement arises). The advertisers pay different amounts depending on the degree of compatibility determined during the matching phase. The contracts between the publisher and the advertisers impose four constraints: the maximum payment, minimum payment, maximum advertisement display number, and minimum advertisement display number constraints.
In this paper, six different personalized advertisement assignment algorithms are considered and their performances are evaluated. The first two, namely the value iteration and finite difference algorithms, follow from [7]. Additionally, three greedy heuristics with varying levels of "consciousness" and a random heuristic are proposed.
Two sets of numerical analyses are conducted for these algorithms. The first was conducted to tune the essential parameters of the dynamic programming algorithms, so that the rest of the analysis could be carried out in reasonable computational times with sufficient performance. This analysis also supports the validation and verification of the algorithms. The second set of experiments was conducted to assess and compare the performances of the developed algorithms in terms of the sample mean revenues they generate over 1000 replications in eight different experimental conditions.
The results suggest that myopic algorithms perform quite poorly, particularly when constraints such as maximum display and minimum payment are tight. Therefore, practitioners who are keen to use myopic strategies should realize that, depending on the experimental conditions, such policies might yield revenues as low as those of purely random policies (which use no information in making ad assignments). Heuristics that incorporate some of the constraints of the problem to mitigate their shortsightedness (such as Heuristic B) might perform well in some cases, yet may also suffer drastically in others.
The experimental analysis also reveals that the advertisement assignment problem is robust to a degree: decisions based on approximate expected revenues, obtained with a coarser discretization (a lower resolution R and/or a lower L-value, i.e., a larger h), might yield revenues similar to those of decisions based on the exact expected revenues, which require extensive computations. Both exact algorithms suffer from the curse of dimensionality, and the size of the state space is the major obstacle to using them in practice. Heuristic approaches based on clustering the users and advertisements to reduce the state space might be promising and are worth investigating in future research.
The personalized advertisement assignment problem is only one example of a new family of problems in which the physical limitations of the real world can be relaxed with the emergence of virtual reality environments. We believe that the OR/MS community has much to offer for the problems of the virtual world as well, and this fertile field deserves more attention from researchers.