Multiobjective Reinforcement Learning for Cash Planning: how?

Reinforcement Learning

Reinforcement Learning (RL) is an active research area whose popularity keeps growing. It is a type of machine learning in which an agent learns in an interactive environment by trial and error, using feedback from its own actions and experiences.

Here’s a video of a Deep reinforcement learning PacMan agent (Ref. https://www.kdnuggets.com/)

The current Reinforcement Learning framework is mainly based on single-objective performance optimization: the agent maximizes the expected return computed from scalar rewards, which come either from a univariate environment response or from a weighted aggregation of a multivariate response.
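To make the scalar-reward setting concrete, here is a minimal sketch of a weighted aggregation followed by a discounted return over one episode. The two-dimensional responses, the weights, and the discount factor are illustrative assumptions, not values from the original work; the expected return would be the average of such returns over many episodes.

```python
from typing import Sequence


def scalarize(response: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a multivariate response into one scalar reward (weighted sum)."""
    if len(response) != len(weights):
        raise ValueError("response and weights must have the same length")
    return sum(w * r for w, r in zip(weights, response))


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of scalar rewards collected along one episode."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical responses per step: (negative cost, service level).
episode = [(-2.0, 1.0), (-1.0, 0.5), (0.0, 1.0)]
weights = (0.5, 0.5)  # assumed preference weights, chosen only for illustration

scalar_rewards = [scalarize(step, weights) for step in episode]
episode_return = discounted_return(scalar_rewards, gamma=0.5)
print(scalar_rewards, episode_return)
```

Changing `weights` changes which behaviour looks best, which is why the choice of aggregation matters so much in this setting.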

Multiobjective Reinforcement Learning

In many real-world situations, trade-offs must be made among multiple conflicting objectives that differ in order of magnitude, measurement unit, and business context (e.g. costs, lead time, quality of service, profits). Aggregating such sub-rewards into a single scalar reward assumes perfect knowledge of the decision-maker's preferences and of how she perceives the importance of each objective.

ATM Cash Planning Problem

We consider the problem of learning the best ATM cash replenishment policies in an uncertain multi-objective context, given an arbitrary history of cash withdrawals that may be non-stationary and may contain hard-to-predict peaks.
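As a rough illustration of the problem setting, the sketch below replays a withdrawal history against a replenishment policy and reports one cost per objective instead of a single score. Everything in it is an assumption made for the example: the three objectives (replenishment cost, holding cost, unmet demand), the capacity and cost parameters, the demand history, and the threshold policy are hypothetical and are not taken from the original work.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class AtmState:
    cash: float  # cash currently available in the ATM


def step(state: AtmState, refill: float, demand: float,
         capacity: float = 100.0, fixed_refill_cost: float = 5.0,
         holding_rate: float = 0.01) -> Tuple[AtmState, Tuple[float, float, float]]:
    """Simulate one day: refill first, then serve withdrawals.

    Returns the next state and a cost vector to minimize:
    (replenishment cost, holding cost, unmet demand).
    """
    cash = min(state.cash + max(refill, 0.0), capacity)
    served = min(cash, demand)
    unmet = demand - served
    cash -= served
    refill_cost = fixed_refill_cost if refill > 0 else 0.0
    holding_cost = holding_rate * cash  # cost of idle cash left at day end
    return AtmState(cash), (refill_cost, holding_cost, unmet)


def evaluate(policy: Callable[[AtmState], float],
             demands: Sequence[float],
             initial_cash: float = 0.0) -> Tuple[float, ...]:
    """Accumulate each objective separately over the whole history."""
    state = AtmState(initial_cash)
    totals = [0.0, 0.0, 0.0]
    for demand in demands:
        state, costs = step(state, policy(state), demand)
        totals = [t + c for t, c in zip(totals, costs)]
    return tuple(totals)


demands = [20.0, 30.0, 90.0, 10.0]  # made-up history with one peak on day 3


def refill_when_low(state: AtmState) -> float:
    """Hypothetical threshold policy: top up whenever cash drops below 40."""
    return 100.0 if state.cash < 40.0 else 0.0


performance = evaluate(refill_when_low, demands)
print(performance)
```

In this toy run the threshold policy misses the day-3 peak, so the unmet-demand component is large even though the two cost components stay small; a single aggregated score would hide that trade-off.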

Solving approach

We propose a model-free Multiobjective Deep Reinforcement Learning approach to find, for each ATM, a near-optimal replenishment policy that outperforms the policy of the (human) operator. The idea is to disaggregate the performance of a replenishment policy into a vector of objective functions. The performance of the human policy is then a multi-dimensional reference point (Rh). The task of the deep reinforcement learning algorithm is to find a policy that generates a set of performance points that Pareto-dominate the current human reference point (Rh).
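The Pareto-dominance test against the reference point Rh can be sketched as follows. This is only an illustration of the comparison, not the author's implementation: it assumes every objective is a cost to be minimized, and the reference point and candidate points are made-up numbers.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming all objectives are minimized.

    a must be no worse than b on every objective and strictly better on
    at least one.
    """
    if len(a) != len(b):
        raise ValueError("points must have the same number of objectives")
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def points_dominating(candidates: Sequence[Tuple[float, ...]],
                      reference: Tuple[float, ...]) -> List[Tuple[float, ...]]:
    """Keep only the performance points that dominate the reference point."""
    return [p for p in candidates if dominates(p, reference)]


# Hypothetical human reference point Rh over three cost objectives.
r_h = (10.0, 2.2, 40.0)

# Hypothetical performance points produced by learned policies.
candidates = [
    (10.0, 2.0, 0.0),   # no worse everywhere, better on two objectives
    (15.0, 1.0, 0.0),   # worse on the first objective
    (10.0, 2.2, 40.0),  # identical to Rh, so not strictly better
    (5.0, 2.5, 60.0),   # better on one objective, worse on the others
]

better_than_human = points_dominating(candidates, r_h)
print(better_than_human)
```

Only the first candidate survives the filter: a point that improves one objective while degrading another is a different trade-off, not an improvement over the human policy.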

This short article is based on an extended abstract submitted by Nabil BELGASMI to the 5th World Machine Learning and Deep Learning Congress, held on August 30-31, 2018 in Dubai, UAE. The author was granted the Best Research Paper Award.

https://machinelearning.conferenceseries.com/abstract/2018/multiobjective-deep-reinforcement-learning-approach-for-atm-cash-replenishment-planning
