Theoretical contributions in

stochastic optimization, optimal learning and machine learning

 

The computational and applied research in CASTLE Lab is supported by a strong foundation of theoretical research. Most of these are contributions to stochastic optimization, but recently we have been making contributions to machine learning and a field of research we are referring to as optimal learning. This work has lead to practical breakthroughs such as a fleet simulator for a major transportation company which produced an infinite dimensional stochastic dynamic program, and a stochastic, multiple scale model for energy resource planning which spanned hundreds of thousands of time periods.

L. Hannah, D. Blei and W. B. Powell, "Dirichlet Process Mixtures of Generalized Linear Models"

We propose Dirichlet Process-Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and any response that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate and give a practical example for when those conditions hold. Additionally, we provide Bayesian bounds on the distance of the estimate from the true mean function based on the number of observations and posterior samples. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART and Gaussian processes. We show that the DP-GLM is competitive with the other methods, while accommodating various inputs and outputs and being robust when confronted with heteroscedasticity.

(click here to download paper)

P. Frazier and W. B. Powell, “Asymptotic Optimality of Sequential Sampling Policies for Bayesian Information Collection”

We consider adaptive sequential sampling policies in a Bayesian framework. Under the assumptions that the sampling distribution is from an exponential family and that
the number of distinct measurement types is nite, we give sucient conditions for an adaptive sampling policy to achieve asymptotic optimality. Here, asymptotic optimality is understood to mean that the limit of the expected loss under the given sampling policy as the number of measurements allowed grows to in nity attains the minimum over all possible sampling policies. This property is important because it ensures convergence in the limit for sophisticated policies designed to maximize performance over the short-term. We then apply these sucient conditions to show asymptotic optimality of three previously proposed ranking and selection policies: OCBA for linear loss, LL(S), and LL(1). We also show how this sucient condition may be generally applied to a broad class of knowledge-gradient policies.

(click here to download paper)

J. Nascimento, W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Energy Dispatch Problem with Grid-Level Storage,”

This paper provides a convergence proof for an approximate dynamic programming algorithm that features a scalar, controllable state variable (the amount of a resource such as energy held in storage) in the presence of a discrete vector of exogenously changing state variables. This model required discretizing the amount of water held in a reservoir at a fine level to estimate a piecewise linear value function approximation. This paper provides a convergence proof for an ADP algorithm that requires no exploration, and which does not require enumerating all the possible values of the amount held in storage.

(click here to download paper)

J. Ma and W. B. Powell, “Convergence Proofs for Least Squares Policy Iteration Algorithm of High-Dimensional Infinite Horizon Markov Decision Process Problems,”

We have considerable experience developing convergence proofs for discrete or piecewise linear value function approximations that arise in the management of discrete resources. This paper provides a convergence proof for dynamic programs with continuous states and actions. The paper initially makes the assumption that the value function can be represented exactly using a finite set of known basis functions, a widely used technique in approximate dynamic programming. We propose, and prove convergence for, a practical algorithm that can handle both continuous and vector-valued states and actions, using the concept of thepost-decision state variable. The algorithm uses approximate policy iteration, that depends on inexact evaluations of a policy. The paper closes with an extension to unknown basis functions.

(click here to download paper)

Nascimento, J. and W. B. Powell, “An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem,” Mathematics of Operations Research, Vol. 34, No. 1, pp. 210-237 (2009). (c) Informs

I have worked for a number of years using piecewise linear function approximations for a broad range of complex resource allocation problems. A few years ago we proved convergence of this algorithmic strategy for two-stage problems (click here for a copy). In this latest paper, we have our first convergence proof for a multistage problem. This paper is more than a convergence proof for this particular problem class - it lays out a proof technique, which combines our work on concave approximations with theory laid out by Bertsekas and Tsitsiklis (in their Neuro-Dynamic Programming book).

(click here to download paper)

S. Dayanik, W. Powell, and K. Yamazaki, “Index policies for discounted bandit problems with availability constraints,” Advances in Applied Probability, Vol. 40, No. 2, pp. 377-400 (2008).

One of the most famous problems in information collection is the multiarmed bandit problem, where make a choice (out of a discrete set of choices), observe a reward, and use this observation to update estimates of the future value of rewards. This problem can be solved by choosing the option with the highest index (known as the Gittins index). In this paper, we present an index problem for the case where not all the choices are available each time. As a result, it is sometimes important to make an observation just because the observation is available to be made.

(click here to download paper)

Frazier, P., W. B. Powell and S. Dayanik, “A Knowledge Gradient Policy for Sequential Information Collection,” SIAM J. on Control and Optimization, Vol. 47, No. 5, pp. 2410-2439 (2008).

The knowledge gradient policy is introduced here as a method for solving the ranking and selection problem, which is an off-line version of the multiarmed bandit problem. Imagine that you have M choices (M is not too large) where you have a normally distributed belief about the value of each choice. You have a budget of N measurements to evaluate each choice to refine your distribution of belief. After your N measurements, you have to choose what appears to be the best based on your current belief. The knowledge gradient policy guides this search by always choosing to measure the choice which would produce the highest value if you only have one more measurement (the knowledge gradient can be viewed as a method of steepest ascent). The paper shows that this policy is myopically optimal (by construction), but is also asymptotically optimal, making it the only stationary policy that is both myopically and asymptotically optimal. The paper provides bounds for finite measurement budgets, and provides experimental work that shows that it works as well as, and often better, than other standard learning policies.

(click here to download paper)

Papadaki, K. and W. B. Powell, “Monotonicity in Multidimensional Markov Decision Processes for Batch Service Problems,” Operations Research Letters, Vol. 35, pp. 267-272 (2007).

Structural properties of stochastic dynamic programs are essential to understanding the nature of the solutions and in
deriving appropriate approximation techniques.We concentrate on a class of multidimensional Markov decision processes and
derive sufficient conditions for the monotonicity of the value functions. We illustrate our result in the case of the multiproduct
batch dispatch (MBD) problem.

(Click here to download paper)

George, A. and W.B. Powell, “Adaptive Stepsizes for Recursive Estimation with Applications in Approximate Dynamic Programming,” Machine Learning, Vol. 65, No. 1, pp. 167-198, (2006). (c) Springer

One of the first challenges anyone will face when using approximate dynamic programming is the choice of stepsizes. Use the wrong stepsize formula, and a perfectly good algorithm will appear not to work. Deterministic stepsize formulas can be frustrating since they have parameters that have to be tuned (difficult if you are estimating thousands of values at the same time). This paper reviews a number of popular stepsize formulas, provides a classic result for optimal stepsizes with stationary data, and derives a new optimal stepsize formula for nonstationary data. This result assumes we know the noise and bias (knowing the bias is equivalent to knowing the answer). A formula is provided when these quantities are unknown. Our result is compared to other deterministic formulas as well as stochastic stepsize rules which are proven to be convergent. The numerical work suggests that the new optimal stepsize formula (OSA) is very robust. It often is the best, and never works poorly.

(Click here to download paper)

 

Powell, W.B., A. Ruszczynski and H. Topaloglu, “Learning Algorithms for Separable Approximations of Stochastic Optimization Problems,” Mathematics of Operations Research, Vol 29, No. 4, pp. 814-836 (2004). (c) Informs.

We have been doing a lot of work on the adaptive estimation of concave functions. This paper represents a major plateau. It describes a new algorithm dubbed the Separable Projective Approximation Routine (SPAR) and includes 1) a proof that the algorithm converges when we sample all intervals infinitely often, 2) a proof that the algorithm produces an optimal solution when we only sample the optimal solution of our approximation at each iteration, when applied to separable problems, 3) a bound when the algorithm is applied to nonseparable problems such as two-stage stochastic programs with network resource, and 4) computational comparisons against deterministic approximations and variations of Benders decomposition (which is provably optimal). The experiments show that the SPAR algorithm, even when applied to nonseparable approximations, converges much more quickly than Benders decomposition.

(Click here to download paper)

R. K.-L. Cheung and W.B. Powell, "SHAPE – A Stochastic, Hybrid Approximation Procedure for Two-Stage Stochastic Programs,” Operations Research, Vol. 48, No. 1, pp. 73-79 (2000) (c) Informs.

This paper offers a provably convergent algorithm for two-stage stochastic programs. The method uses a differentiable (strictly convex) auxiliary function, and then "tilts" it using stochastic gradient information. The algorithm is very easy to implement.

(Click here to download paper)

Chen, Z.-L. and W.B. Powell, "A Convergent Cutting Plane and Partial Sampling Algorithm for Multistage Linear Programs with Recourse," Journal of Optimization Theory and Applications, Vol. 103, No. 3, pp. 497-524 (1999). (c) Plenum Publishing.

This paper provides a proof of convergence for a class of nested Bender's cutting-plane algorithm. The basic model allows only right-hand side randomness, and requires independence between stages. The algorithm is computationally tractable for problems that are much larger than any prior algorithm for which a proof of convergence has been produced. It requires a finite sample space, but it may have thousands of realizations per stage, and the technique can handle dozens (even hundreds) of stages. Numerical performance will depend on the specific problem class.

(Click here to download paper)