优化 | Operations Research论文综述(69(1)期)

『运筹OR帷幄』原创

作者：陈宇文（牛津大学在读博士）翁欣（清华大学在读博士）

编者按

运筹学遍布世界的各个角落，从数学到计算机，从控制论到自动化，从机器学习到深度学习，从数字化到社会治理，我们都可以看到运筹学方法的影子。本篇推送我们整理了期刊Operations Research第69(1)期的论文导读。所有导读均有中英文翻译，并附有原文链接。

Preservation of Supermodularity in Parametric Optimization: Necessary and Sufficient Conditions on Constraint Structures

The concept of supermodularity has received considerable attention in economics and operations research. It is closely related to the concept of complementarity in economics and has also proved to be an important tool for deriving monotonic comparative statics in parametric optimization problems and game theory models. However, only certain sufficient conditions (e.g., lattice structure) are identified in the literature to preserve the supermodularity. In “Preservation of Supermodularity in Parametric Optimization: Necessary and Sufficient Conditions on Constraint Structures,” by Chen, Zhuoyu Long, and Qi new concepts of mostly sublattice and additive mostly sublattice are introduced. With these new concepts, necessary and sufficient conditions for the constraint structures are established so that supermodularity can be preserved under various assumptions about the objective functions. Furthermore, some classes of polyhedral sets that satisfy these concepts are identified,and the results are applied to assemble-to-order systems.

1.在参数优化中保持超模性：约束结构的充要条件

作者：Xin Chen , Daniel Zhuoyu Long , Jin Qi

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.1992

超模性这一概念在经济学和运筹学领域受到了广泛的关注。它与经济学中的互补性概念密切相关，也被证明是参数优化问题和博弈论模型中的进行单调比较静态分析的重要工具。然而，以往文献中只确定了某些充分条件（如网格结构）来保持超模性。本文中，作者引入了多子格（mostly sublattice）和可加性多子格（additive mostly sublattice）的新概念，建立了在各种目标函数假设下都能保持超模性的约束结构的充要条件。此外，作者还给出了满足这些概念的几类多面体集，并将这些结果应用于按订单装配的系统。

Pooling Queues with Strategic Servers: The Effects of Customer Ownership

Pooled queues are employed in many service settings (e.g., call centers, grocery stores) as a way to improve operational performance by reducing the variability in customer arrivals. However, in some settings, such as hospital emergency departments, empirical evidence suggests that pooled queues may not always lead to better operational performance. In “Pooling Queues with Strategic Servers,” Armony, Roels, and Song develop a game-theoretic model that compares the performance of pooled vs. dedicated queues when servers choose their capacities strategically and exhibit varying scopes of customer ownership. They find that dedicated queues can yield substantial benefits when servers’degree of customer ownership is so low that they choose to operate at high utilization, and when they care much more about their customers’ processing times than about their waiting times; otherwise, the queue configuration makes little difference. These results have important implications for managers trying to determine which queue configuration to adopt.

2.服务台具有策略性时的集中排队：顾客所有权的影响

作者：Mor Armony , Guillaume Roels , Hummy Song

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2004

作为一种通过减少顾客到达的易变性来提高运营性能的方法，集中排队在许多服务中被使用（如呼叫中心、杂货店)。然而在某些情况下，如医院急诊科，经验表明集中排队不一定会带来更好的运营性能。本文中，当服务台策略性地选择它们的容量并表现出不同程度的顾客所有权时，作者开发了一个博弈论模型用于比较集中排队（pooled queues）与专用排队（dedicated queues）的性能。他们发现，当服务台的顾客所有权低到选择在高利用率下运行时，以及当他们更关心顾客的处理时间而不是等待时间时，专用排队具有实质性的优势；在其他情况下，不同排队方式的性能几乎没有差别。这些结果对于需要选择排队方式的管理者具有重要意义。

Robust Multiperiod Vehicle Routing Under Customer Order Uncertainty

In “Robust Multiperiod Vehicle Routing Under Customer Order Uncertainty,” Subramanyam, Mufalli, La´ınez-Aguirre, Pinto, and Gounaris study tactical vehicle routing operations where the distributor maintains some control over the time period in which customers are served. However, as customer service requests are received dynamically over the planning horizon, routing plans should maintain a satisfactory degree of flexibility to accommodate potential service requests that have not yet been placed. This setting constitutes a multistage optimization problem with binary recourse decisions and binary random variables, which is inherently intractable. To that end, the authors propose a tractable approximation in the form of a nonanticipative two-stage robust optimization model for which they develop a branch-and-cut solution approach. The practicality of the approach as well as the high quality of the routing plans it generates are demonstrated via a series of rolling-horizon simulations.

3.顾客订单具有不确定性时的鲁棒多周期车辆路径规划

作者：Anirudh Subramanyam, Frank Mufalli, José M. Laínez-Aguirre, Jose M. Pinto, Chrysanthos E. Gounaris

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2009

本文中，作者研究了分销商对服务顾客的时段具有一定控制权时的战术车辆路线运营。当在计划时段内动态接收客户服务请求时，路线计划应具有较好的灵活性，以容纳尚未安排的潜在服务请求。这种设定构成了一个难以直接求解的、包含二元追索权决策和二元随机变量的多阶段优化问题。作者将其近似为一个易于处理的非预期两阶段鲁棒优化模型，并开发了一个分支切割求解方法。最后，作者通过一系列滚动时域模拟试验，验证了该方法的实用性以及它生成的路线规划具有高质量。

Technical Note—Understanding the Performance of Capped Base-Stock Policies in Lost-Sales Inventory Models

Single-sourcing lost-sales inventory systems with lead times have a long and rich history in the inventory literature and the earliest study dates back to 1958. Due to the complex structure of the optimal policy and computational intractability, such inventory systems are notoriously difficult to optimize. In “Technical Note—Understanding the Performance of Capped Base-Stock Policies in Lost-Sales Inventory Models,” Xin proposes a new family of simple capped base-stock policies and provides a new perspective of constructing a practical hybrid policy combining two well-known heuristics: base-stock and constant-order policies. Each capped base-stock policy is associated with two parameters: a base-stock level and an order cap. Compared with the traditional base-stock policy, the additional order cap provides order-smoothing and constrains high future holding costs. The author numerically demonstrates its superior performance by comparing with other existing heuristics and explains why it performs so well and when it brings the most benefits. The author also builds a theoretic connection between the capped base-stock policy and the constant-order policy.

4.上限基本库存策略在需求损失库存模型中的表现

作者：Linwei Xin

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2019

关于具有提前期的单一货源需求损失库存系统的研究有着悠久而丰富的历史，最早的研究可以追溯到1958年。众所周知，由于最优策略的复杂结构和计算的难解性，这类库存系统难以优化。本文中，作者提出了一类简单的上限基本库存策略，并通过结合两种著名的启发式策略（基本库存和固定量订购策略），提供了一个新视角来构建一种实用的混合策略。每一个上限基本库存策略都与两个参数相关：基本库存水平和订购上限。与传统的基本库存策略相比，新增的订单上限确保了订购量的平稳，并控制了较高的未来持有成本。作者通过数值试验与其他已有的启发式算法比较，证明了这一方法的性能更优，并解释了性能更优的原因以及什么情况下这一方法能带来最大效益。作者还建立了上限基础库存策略与固定量订购策略的理论联系。

Targeting, Deployment, and Loss-Tolerance in Lanchester Engagements

Military forces engaged in a battle of attrition have classically been described by Lanchester equations. Existing Lanchester combat models focus on two force parameters: numbers (force size) and per-capita effectiveness (attrition rate). Whereas these two parameters are central in projecting a battle’s outcome, there are other important factors that affect the battlefield: (1) targeting capability, that is, the capacity to identify live enemy units and not dissipate fire on nontargets; (2) tactical restrictions preventing full deployment of forces; and (3) morale and tolerance of losses, that is, the capacity to endure casualties. Atkinson, Kress, and MacKay incorporate these three factors into Lanchester models in “Targeting, Deployment, and Loss-Tolerance in Lanchester Engagements” and derive force-parity equations for various combinations of these factors and obtain general implications and tradeoffs. The paper shows that more units and better weapons (higher attrition rate) are preferred over improved targeting capability and relaxed deployment restrictions, unless these are poor.

5.兰彻斯特战役中的定位、部署和损失容忍

作者：Michael P. Atkinson , Moshe Kress , Niall J. MacKay

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2022

参与消耗战的军事力量通常用兰彻斯特方程来描述。现有的兰彻斯特作战模型主要关注两个力量参数：数量（*队军**大小）和人均效能（损耗率）。尽管这两个参数是预测战斗结果的核心，但还有其他影响战场的重要因素：(1)瞄准能力，即识别敌军存活单元和不向非目标分散火力的能力；(2)避免部队全面部署的战术限制；(3)士气和对损失的容忍，即忍受伤亡的能力。本文中作者将这三个因素纳入兰彻斯特模型，得到了这些因素的一般性含义和相互之间的权衡。论文表明，除非的确很糟糕，更多的力量单位和更好的*器武**（更高的损耗率）比提高瞄准能力和放宽部署限制更重要。

The Shortest Path Interdiction Problem with Randomized Interdiction Strategies: Complexity and Algorithms

Traditional network interdiction problems limit the leader to use a deterministic interdiction strategy, but a stochastic interdiction strategy can present a greater dilemma to the follower who seeks a shortest path on the network. In "The Shortest Path Interdiction Problem with Randomized Interdiction Strategies: Complexity and Algorithms," Holzmann and Smith examine a shortest-path interdiction problem in which the leader randomizes its attacks, and the follower only knows the probability of where attacks will occur. The cost of each arc increases as a function of the number of times the arc is interdicted, and the leader seeks to maximize the follower’s shortest expected path. When the arc cost function is linear in the number of attacks that occur on the arcs, this problem is solvable in polynomial time by linear programming. However, both the special cases in which the arcs are all convex or all concave turn out to be NP-hard. The authors give general-purpose algorithms for this interdiction problem, along with a polynomial-time approximation algorithm for the case of concave cost functions.

6.使用随机阻断策略的最短路径阻断问题：复杂性与算法

作者：Tim Holzmann , J. Cole Smith

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2023

传统的网络阻断问题限制了领导者使用确定性的阻断策略，而随机阻断策略会使寻求网络最短路径的跟随者面临更大的困难。本文中，作者研究了一个最短路径阻断问题：在这个问题中，领导者将其攻击随机化，而追随者只知道攻击发生的概率。每条弧线的成本随着弧线被阻断次数的增加而增加，领导者的目标是使追随者的最短期望路径最大化。当弧成本函数与对弧的攻击次数成线性关系时，该问题可通过线性规划在多项式时间内求解。然而，弧全部为凸或凹的这两种特殊情况都是np难的。作者给出了这个阻断问题的通用算法，以及凹成本函数情况下的多项式时间近似算法。

Risk Arbitrage Opportunities for Stock Index Options

Research about equity index options has shown that option prices systematically violate rational pricing bounds for the risk-averse representative investor. These results raise the question of whether profitable trading possibilities exist in this market. Standard portfolio optimization does not apply because of the large bid-ask spreads and low quote sizes in this market. In “Risk Arbitrage Opportunities for Stock Index Options,” Post and Longarela are motivated by these complications, a system of linear inequalities is developed that completely characterizes all risk arbitrage opportunities in the presence of transaction costs and portfolio restrictions. The practical use of this system is illustrated with an application to front-month S&P500 stock index options. Smallscale portfolios seem to produce surprisingly large abnormal returns out of sample; outperformance, however, seems elusive for institutional investors because of the limited quote size, possibly reflecting data limitations.

7.股票指数期权的风险套利机会

作者：Thierry Post , Iňaki Rodríguez Longarela

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2012

对股票指数期权的研究表明，对于想规避风险的代表投资者来说，期权价格系统性地违反了合理的定价界限。这些结果带来了一个问题：在这个市场中是否存在有利可图的交易可能性。标准的投资组合优化并不适用，因为这个市场的买卖价差很大但报价规模很小。出于对这些复杂现象的兴趣，作者在本文中建立了一个线性不等式系统，这一系列不等式在存在交易成本和投资组合限制的情况下，完整地刻画了所有风险套利机会。作者以即月标普500股票指数期权为例，说明了该系统的实际应用。小规模的投资组合似乎在样本上产生了惊人的巨大异常回报，然而，由于报价规模有限（可能反映了数据的局限性），对机构投资者来说，跑赢大盘似乎难以实现。

Competitive Equilibrium and Trading Networks: A Network Flow Approach

This paper considers a network of agents who trade indivisible goods or services via bilateral contracts. Under a substitutability assumption on preferences, it is known that a competitive equilibrium exists. In“Competitive Equilibrium and Trading Networks: A Network Flow Approach,” Candogan,Epitropou, and Vohra show how to determine equilibrium outcomes as a generalized submodular flow problem. Existence of a competitive equilibrium and its equivalence to seemingly weaker notions of stability follow directly from the optimality conditions of the flow problem. The formulation enables the authors to perform comparative statics with respect to the number of buyers, sellers, and trades. In particular, they are able to shed light on the impact of new trading opportunities on the equilibrium trades, prices, and surpluses. In addition, they present algorithms for finding competitive equilibria in trading networks and testing stability.

8.竞争均衡与交易网络：一种网络流方法

作者：Ozan Candogan , Markos Epitropou , Rakesh V. Vohra

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.1997

本文考虑一个通过双边合同进行不可分割商品或服务交易的代理网络。众所周知，在偏好可替代性假设下，竞争均衡是存在的。本文中，作者展示了如何将其化为一个通用次模流问题来确定均衡结果。流问题的最优性条件直接决定了竞争均衡和对应的看似稳定性稍弱的等价是否存在。这种建模方式使作者能够对买方、卖方和交易的数量进行比较静态分析。特别地，作者还阐明了新的贸易机会对均衡贸易、价格和盈余的影响。此外，他们提出了在交易网络中寻找竞争均衡和测试稳定性的算法。

Value of Information in Bayesian Routing Games

Over the years, travelers are increasingly relying on navigation services to inform their route choices and save time. How much value—in terms of saved time—can these services indeed provide to their users? The answer is not as simple as one may think because the value of traffic information not only depends on its accuracy, but also on how widely it is shared among travelers. For instance, information on an incident has the highest value for a traveler when the traveler is the only person who knows about it and takes a detour. However, this information becomes less valuable when it is shared with more travelers, who may take the same detour and cause congestion. In “Value of Informationin Bayesian Routing Games,” Wu, Amin, and Ozdaglar use a game-theoretic approach to analyze how the relative value of information provided by different navigation services changes with their market share and how travelers may choose from different available services.

9.贝叶斯路径博弈中的信息价值

作者：Manxi Wu , Saurabh Amin , Asuman E. Ozdaglar

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.1999

这些年来，旅行者越来越多地依靠导航服务来选择路线以节省时间。就节省的时间而言，这些服务究竟能为用户提供多少价值？答案并不像人们想象的那么简单，因为交通信息的价值不仅取决于它的准确性，还取决于它在旅行者之间的共享程度。例如，当某个旅行者是唯一知道事件并绕道的人时，这一事件信息对他来说具有最高价值。然而，当这些信息被分享给更多的旅行者时，就变得不那么有价值了，他们可能也会绕道而行，进而造成拥堵。本文中，作者使用博弈理论的方法分析了不同导航服务提供的信息的相对价值如何随着市场份额的变化而变化，以及旅行者如何从不同的可用服务中进行选择。

Technical Note—Global Robust Stability in a General Price and Assortment Competition Model

In “Technical Note—Global Robust Stability in a General Price and Assortment Competition Model,” Federgruen and Hu analyze a general but parsimonious price competition model for an oligopoly in which each firm offers any number of products. The demand volumes are general piecewise affine functions of the full price vector, generated as the “regular” extension of a base set of affine functions. The model specifies a product assortment, along with their prices and demand volumes, in contrast to most commonly used demand models, such as the multinomial logit model or any of its variants. The authors show that a special equilibrium in this model has global robust stability. This means that, from any starting point, the market converges to this equilibrium when firms use a particular response mapping to dynamically adjust their own prices in response to their competitors’prices. The mapping requires each firm to only know the demand function and cost structure for its own products (but not for other firms’products).

10.通用价格和品类竞争模型中的全局鲁棒稳定性

作者：Awi Federgruen, Ming Hu

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2001

本文中，作者分析了一个通用但简洁的价格竞争模型，该模型适用于每个公司提供任意数量产品的寡头垄断。通过仿射函数基集的“常规”扩展，需求量可表示成取决于完整价格向量的广义分段仿射函数。该模型确定了一个产品组合（包括它们的价格和需求量），与最常用的需求模型（如multinomial logit 模型及其变体）进行对比。作者证明了该模型的一个特殊均衡具有全局鲁棒稳定性。这意味着，从任意点起始，当企业使用特定的响应映射来动态调整自己的价格以响应竞争对手的价格时，市场都会收敛至这一均衡。这种映射只要求每个企业知道自己产品的需求函数和成本结构，不需要知道其他企业的产品相关信息。

Technical Note—On Revenue Management with Strategic Customers Choosing When and What to Buy

In practice, when thinking about their purchasing decisions, customers usually strategize along two dimensions: (1) when to buy and (2) what to buy. That is, they might delay a purchase in anticipation of future price reductions, and/or they might purchase cheaper substitutes. Despite this, the literature has thus far dealt exclusively with either one of the two extremes whereby one of the two strategic dimension is missing. For example, a large body of work has studied forward-looking customers strategizing on when to buy but has done so merely within models in which customers have no alternatives to choose from. Conversely, another large body of work on assortment optimization and choice modeling has studied customers who choose what to buy from multiple product offerings but acting myopically. In "Technical Note—On Revenue Management with Strategic Customers Choosing When and What to Buy," Chen and Trichakis take a first step toward analyzing dynamic pricing when customers are fully strategic and choose both when and what to buy. Using a novel decomposition approach for the underlying multidimensional mechanism design problem, the paper presents theoretical and numerical performance analyses of static pricing policies.

11.顾客具有战略性时的收益管理：顾客决定何时购买、购买何种产品

作者：Yiwei Chen , Nikolaos Trichakis

原文：

https://pubsonline.informs.org/doi/abs/10.1287/opre.2020.2008

在实际中，顾客进行购买决策时通常围绕两个方面来制定策略：(1)什么时候购买和(2)买什么。也就是说，他们可能会因为预期未来价格下降而推迟购买，并且/或者他们可能会购买更便宜的替代品。但目前的文献只考虑两个战略维度中的一个方面而忽略另外一个方面。例如，大量研究关注了有前瞻性客户的择时购买策略，但都仅仅考虑了顾客没有其他产品可供选择的情况。相反，另一大类研究组合优化和选择建模的工作关注了那些从多种产品中选择但目光短浅、不考虑未来价格变化的顾客。本文作者首次提出当客户具有完全战略意识时的动态定价分析，这意味着在模型中客户能同时考虑购买产品的时间和产品的种类。对于这一潜在的多维机制设计问题，作者通过一种新的分解方法，对静态定价策略进行了理论和数值性能分析。

A Simple and Approximately Optimal Mechanism for a Buyer with Complements

Recent literature on approximately optimal revenue maximization has shown that in settings where agent valuations for items are complement free, the better of selling the items separately and bundling them together guarantees a constant fraction of the optimal revenue. However, most real world settings involve some degree of complementarity among items. The role that complementarity plays in the trade-off of simplicity versus optimality has been an obvious missing piece of the puzzle. In “A Simple and Approximately Optimal Mechanism for a Buyer with Complements,” Eden, Feldman, Friedler, Talgam-Cohen, and Weinberg show that the same simple selling mechanism— the better of selling separately and as a grand bundle— guarantees a fraction of the optimal revenue, where is a measure of the degree of complementarity. One key modeling contribution is a tractable notion of “degree of complementarity” that admits meaningful results and insights—they demonstrate that previous definitions fall short in this regard.

12 具有互补性的、针对买方的一种简单近似最优机制

作者：Alon Eden, Michal Feldman, Ophir Friedler , Inbal Talgam-Cohen , S. Matthew Weinberg

原文：https://doi.org/10.1287/opre.2020.2039

关于近似最佳收益最大化的最新文献表明，在代理人对于项目的价值评估不存在互补性的情况下，单独销售或*绑捆**销售两种方案中更优的一种，可以保证获得恒定的最佳收益。但是，大多数现实世界的场景都涉及项目之间的某种程度的互补。显然，互补性在简单性和最佳性之间的权衡中扮演的角色是一个还未解决的难题。本文中，作者指出，使用同样简单的销售机制，也就是选择单独销售或*绑捆**销售中更优的一种，能够保证达到一定比例的最佳收益，这一比例与互补程度有关。本文的一个重要的建模贡献是引入了能够度量的“互补程度”，解决了以往文献在这方面的定义缺失的问题。

On Policies for Single-Leg Revenue Management with Limited Demand Information

In “On Policies for Single-Leg Revenue Management with Limited Demand Information,” Ma, Simchi-Levi, and Teo revisit the foundational revenue management problem of dynamically pricing the limited seats on a single flight leg. The authors propose a new “Valuation Tracking” policy, which dynamically raises the price within a feasible range as seats get sold but, importantly, does not rely on any information about the remaining demand expected to occur before the flight takes off. This new policy can be seen as an adaptation of historically used “booking limits” policies, designed for flight fare classes that captured independent segments of demand, to modern times,where airlines are frequently adding sophisticated fare classes that capture overlapping demand segments. The new policy achieves the best-possible theoretical competitive ratio and also performs robustly in a comprehensive numerical comparison with all the different policies that have been proposed for single-leg revenue management under limited demand information.

13 有限需求信息下的单航段收益管理政策

作者：Will Ma , David Simchi-Levi , Chung-Piaw Teo

原文：https://doi.org/10.1287/opre.2020.2048

作者再次探讨了在单个航段上对有限座位进行动态定价的收益管理问题。作者提出了一种新的“估价跟踪”政策，该政策可在座位售出后在可行的范围内动态提高价格，但重要的是，它不依赖于航班起飞前预计会出现的剩余需求的任何信息。这项新政策可以看作是为适应当下的需求而对过往使用的“预订限制”政策的改进（针对的是捕获了独立需求细分的机票价格类别，航空公司经常添加复杂的票价类别来捕获重叠的需求细分市场）。新政策达到了最佳可能的理论竞争率，并且在有限的需求信息下，与针对单航段收益管理提出的所有不同政策相比在许多数值比较中也表现出色。

Tractable Equilibria in Sponsored Search with Endogenous Budgets

Given the scale of the sponsored search market, it is practically important yet technically difficult to understand the interplay between bidders and the ad network and its effect on the long-run state of the market. Although typical equilibrium models account for bidders strategizing over the individual bids they submit to the auctions, they ignore that bidders also strategically set their campaign budgets. In “Tractable Equilibria in Sponsored Search with Endogenous Budgets,” Ciocan and Iyer ask how this additional strategic layer affects market operation and prove that endogenizing budgets surprisingly yields simple and interpretable equilibria. Namely, these equilibria generate quasi-truthful bidding strategies guaranteeing bidders an ROI exceeding their cost per dollar of committed budget. Additionally, the ad network’s optimal allocation policy becomes greedy with high probability. Thus, in this equilibrium, the ad network need not solve computationally challenging, large-scale linear optimization problems typically required under exogenous budgets.

14 具有内生预算的赞助搜索中的易处理的均衡

作者：Dragos Florin Ciocan , Krishnamurthy Iyer

原文：https://doi.org/10.1287/opre.2020.2052

考虑到赞助搜索市场的规模，了解投标者与广告网络之间的相互作用及其对市场长期状态的影响在实践中非常重要，但是在技术上却很困难。尽管典型的均衡模型考虑了竞标者对提交给拍卖的单个竞标的策略，但他们忽略了竞标者也会战略性地设定了竞选预算。在本文中，作者研究了这一额外的战略考量如何影响市场运作，并证明了内生预算令人惊讶地产生了简单且可解释的均衡。即，这些均衡产生准真实的出价策略，保证出价人的ROI超过其每美元承诺预算的成本。此外，广告网络的最佳分配政策很有可能变得贪婪。因此，在这种平衡下，广告网络无需求解在外部预算下通常需要的复杂大规模线性优化问题。

Intertemporal Price Discrimination with Time-Varying Valuation

Typically, customers’ valuations, throughout their purchasing journey of a durable good, are assumed constant. In reality, these valuations can be changing through time due to new information gathered or a change in customers’ personal state. In “Intertemporal Price Discrimination with Time-Varying Valuations,” Araman and Fayad suggest a very general model for customers’ valuation process of a durable good. The authors analyze the optimal committed pricing policy the firm should set in this context. Interestingly, they show that cyclic policies, known to be optimal for constant valuations, remain optimal under this general setting, and suggest an algorithm to obtain them. However, the time-varying valuations feature presents a major complexity with respect to the structure and the tractability of the optimal policy. By forcing the firm to keep each price for a minimum amount of time, the authors obtain well-behaved and numerically tractable policies.

15 考虑时变价值的跨期价格区分

作者：Victor F. Araman , Bassam Fayad

原文：https://doi.org/10.1287/opre.2020.1984

通常，我们会假设客户在购买耐用品的整个过程中的估值都是恒定的。但是实际上，由于收集到的新信息或客户个人状态的改变，这些估值可能会随着时间而改变。作者在文中提出了一种非常通用的模型来评估客户对耐用品的估价过程。作者分析了在这种情况下公司应制定的最佳定价策略。有趣的是，作者在文中表明已知的对于恒定估值最理想的周期性策略在这种一般设置下仍保持最佳状态、并提出了获得这些策略的算法。但是，随时间变化的估值功能在最优政策的结构和易处理性方面存在很大的复杂性。通过强迫公司将每个价格保持最短的时间，作者获得了表现良好的和数值上容易处理的策略。

A Stochastic Assignment Problem with Unknown Eligible Probabilities

优化|OperationsResearch论文综述,69,1期

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

In “Technical Note—A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents,” Russo gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms. The author considers a Gaussian multiarmed bandit problem with discount factor γ. The Gittins index of an arm is shown to equal the γ-quantile of the posterior distribution of the arm’s mean plus an error term that vanishes as γ→1. In this sense, for sufficiently patient agents, a Gittins index measures the highest plausible mean-reward of an arm in a manner equivalent to an upper confidence bound.

17 有耐心的个体的置信区间上限和Gittins指数等价性的说明

作者：Daniel Russo

原文：https://doi.org/10.1287/opre.2020.1987

本文中，作者简短、独立地证明了Gittins指数与贝叶斯上限置信度算法之间的紧密联系。作者考虑了贴现因子为γ的高斯多臂老*机虎**问题。老*机虎**的Gittins指数等于均值的后验分布加上一个误差项，且误差项在γ→1时会消失。从这个意义上讲，对于有足够耐心的个体，Gittins指数以等价于置信度上限的方式测量老*机虎**的最高合理均值。

The Discrete Moment Problem with Nonconvex Shape Constraints

The discrete moment problem aims to find a worst-case discrete distribution that satisfies a given set of moments. In “The Discrete Moment Problem with Nonconvex Shape Constraints,” Chen, He, Jiang, Ryan, and Zhang study the discrete moment problems with additional shape constraints that guarantee the worst-case distribution is either log-concave (LC) or has an increasing failure rate (IFR) or increasing generalized failure rate (IGFR). These classes are useful in practice, with applications in revenue management, reliability, and inventory control. The authors characterize the structure of optimal extreme point distributions and show, for example, that an optimal extreme point solution to a moment problem withm moments and LC shape constraints is piecewise geometric with at most m pieces. Using this

optimality structure, they design an exact algorithm for computing optimal solutions in a low-dimensional space of parameters. The authors leverage this structure to study a robust newsvendor problem with shape constraints and compute optimal solutions.

18 具有非凸形状约束的离散矩问题

作者：Xi Chen, Simai He, Bo Jiang, Christopher Thomas Ryan, Teng Zhang

原文：https://doi.org/10.1287/opre.2020.1990

离散矩问题旨在找到满足给定矩集的最坏情况的离散分布。作者在本文中研究了具有附加形状约束的离散矩问题，这些问题可以保证最坏情况的分布是对数凹形的（LC）、有逐步增大趋势的故障率（IFR）或具有逐步增大趋势的通用故障率（IGFR）。这些类别的分布具有广泛的应用场景，可用于收入管理、可靠性和库存控制。作者描述了最佳极限点分布的结构，并表明对于一个具有m个力矩和LC形状约束的力矩问题的一个最佳极限点解决方案是最多有m部分的分段几何结构。基于这个最优结构，作者们设计了一种用于在低维参数空间中计算最优解的精确算法。作者利用这种结构来研究具有形状约束的鲁棒newsvendor问题，并计算最佳解决方案。

How to Learn When the Data Cannot be Trusted?

In many practical settings, the decision makers have to learn their best actions by experimenting with possible options and collecting feedback (data) over time. It is often assumed that the collected data can be trusted as they reflect the ground truth. But this assumption is violated when the data are generated by strategic players. Consider online advertising market in which the ad exchange (decision maker) aims at learning the best reserve prices in the repeated auctions. In this setting, the data are advertisers’ submitted bids. Such data can be strategically corrupted by advertisers to trick the learning algorithm of the ad exchange to offer them lower reserve prices in the future auctions. In “Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions,” Golrezaei, Javanmard, and Mirrokni design effective learning algorithms with sublinear regret in such environments that are robust to the strategic behavior of the players.

19 基于激励感知的动态学习：上下文拍卖中的鲁棒定价策略

作者：Negin Golrezaei, Adel Javanmard, Vahab Mirrokni

原文：https://doi.org/10.1287/opre.2020.1991

在许多实际环境中，决策者必须通过试验可能的选择并随时间收集反馈（数据）来学习最佳行动。通常情况下，我们会假设收集到的数据是真实可信的。但是，当战略参与者生成数据时，这个假设就会被*翻推**。例如在在线广告市场中，广告交易（决策者）旨在通过重复拍卖来了解最佳底价。在此种情形下，数据就是广告商提交的出价。广告客户可能会有策略性地污染此类数据，从而欺骗广告交易平台的学习算法，使得算法在未来的拍卖中为他们提供更低的底价。本文中，作者设计了在这种环境下对参与者的战略行为具有鲁棒性、且存在次线性后悔值的学习算法。

Understanding the Regret of Bandit Algorithms in Queueing Systems

Traditional scheduling problems in stochastic queueing systems assume that the statistical parameters are known a priori. In ’’Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach’’, Krishnasamy, Sen, Johari, and Shakkottai consider the problem of online scheduling in a parallel-server system when the statistical parameters are unknown. The authors study this question in the stochastic multiarmed bandits framework with the queue length as the performance objective. In contrast to the classic stochastic multiarmed bandits problem, where the regret scales logarithmically with time, they show that the queue regret (difference in expected queue length between a bandit algorithm and a genie policy) exhibits a more complex behavior. It grows logarithmically in the initial stage and eventually decays almost inversely with time. This remarkable behavior is explained through the analysis of regenerative cycle lengths, which shorten with time as the bandit algorithm learns to stabilize the queues.

20 学习队列中未知的服务费率：一种多臂老*机虎**方法

作者：Subhashini Krishnasamy , Rajat Sen , Ramesh Johari , Sanjay Shakkottai

原文：https://doi.org/10.1287/opre.2020.1995

随机排队系统中的传统调度问题会假定统计参数是先验的。本文中，作者考虑了在并行服务器系统中统计参数未知时进行在线调度的问题。作者在以队列长度为性能目标的随机多臂老*机虎**框架中研究了这个问题。与经典的（后悔值随时间呈对数线性增长）随机多臂老*机虎**问题相反，作者表明，队列后悔值（bandit算法和已知服务概率的算法在预期队列长度上的差异）表现出一种更为复杂的行为。它在初始阶段以对数形式增长、但最终随着时间几乎成反比地衰减。作者通过对再生循环长度的分析来解释这种值得注意的行为：随着bandit算法逐渐学习去稳定队列，该循环长度会随着时间而缩短。

Descending to Stability: Robust Power Control for Stochastic Wireless Networks

The explosive growth of smart devices together with their dense connections via wireless networks have become a ubiquitous feature of smart cities. These devices, typically light-weight and battery-driven, burn power when they communicate with one another to form a functional ecosystem. As such, it is both theoretically interesting and practically impactful to design robust power control algorithms that operate smoothly under the challenges brought forth by dense networks in our current Internet of Things era. In “Robust Power Management via Learning and Game Design,” Zhou, Mertikopoulos, Moustakas, Bambos, and Glynn take a novel hybrid approach that combines learning with game design to build a novel and efficient power control algorithm that is provably stable and optimal, thereby not only contributing deployable algorithms with strong performance but also opening up new avenues for distributed algorithm design at large.

21 通过学习和机制设计的鲁棒能源管理方法

作者：Zhengyuan Zhou, Panayotis Mertikopoulos, Aris L. Moustakas, Nicholas Bambos, Peter Glynn

原文：https://doi.org/10.1287/opre.2020.1996

智能设备数量的爆炸式增长及其通过无线网络的密集连接已成为智能城市的普遍特征。这些由重量轻且由电池驱动的设备在彼此通信以形成功能性生态系统时，通常会消耗电能。因此，为了让能源网络在当今物联网时代密集网络带来的挑战下能够平稳运行，设计鲁棒的功率控制算法在理论和实践上具都具有重要意义。本文中，作者采取了一种将学习与机制设计相结合的混合方法，从而打造出一种新颖有效的功率控制算法（被证明是稳定和最优的），不仅提供了具有强大的性能的可行算法，而且为整个分布式算法设计开辟了新的途径。

A New Nonsparse Learning Methodology for High-Dimensional Data Analysis Is Coming

As a popular tool for producing meaningful and interpretable models, large-scale sparse learning works efficiently in many optimization applications when the underlying structures are indeed or close to sparse. However, naively applying the existing regularization methods can result in misleading outcomes because of model misspecification. In “Nonsparse Learning with Latent Variables,” Zheng, Lv, and Lin consider nonsparse learning under the factors plus sparsity structure, which yields a joint modeling of sparse individual effects and common latent factors. A new methodology of nonsparse learning with latent variables (NSL) is proposed for joint estimation of the effects of two groups of features, one for individual effects and the other associated with the latent substructures, when the nonsparse effects are captured by the leading population principal component score vectors. The efficiency of NSL is evidenced by simulation studies and nutrient intake and human gut microbiome data analysis.

22 具有潜在变量的非稀疏学习

作者：Zemin Zheng, Jinchi Lv, Wei Lin

原文：https://doi.org/10.1287/opre.2020.2005

作为一种生成有意义、具有可解释性的模型的流行方法，大规模稀疏学习被广泛应用于结构稀疏或接近稀疏的问题。但是，由于模型错误指定，简单地使用正则化方法可能会产生错误的结果。本文中，作者考虑了潜在因素加稀疏结构下的非稀疏学习，得到稀疏个体效应和共同潜在因子的联合建模。作者提出了一种具有潜在变量的非稀疏学习方法（NSL），用于联合估计两组分别与个体效应与潜在子结构相关的特征的影响（通过主成分分析来获取非稀疏效应）。通过模拟研究以及营养摄入、人体肠道微生物组数据分析证明了NSL的效率。