Multi-agent deep reinforcement learning for multi-echelon supply chain optimization
In this article, we explore how multi-echelon supply chain optimization can be approached from the reinforcement learning (RL) perspective, which generally allows a handcrafted optimization model to be replaced with a generic learning algorithm paired with a stochastic supply network simulator. We start by building a simple simulation environment that includes suppliers, factories, warehouses, and retailers, as depicted in the animation below; we then develop a deep RL model that learns how to optimize inventory and pricing decisions.
Our first step is to develop an environment that can be used to train supply chain management policies using deep RL. We choose a small-scale model with just a few products and facilities, but implement a fairly rich set of features, including transportation, pricing, and competition. This environment can be viewed as a foundational framework that can be extended or adapted to study other problem formulations. Henceforth, we refer to this environment as the World of Supply (WoS).
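To make the idea of a trainable simulation environment more concrete, here is a minimal sketch of what such an interface might look like. It is not the WoS implementation: the class names (`Facility`, `ToySupplyChainEnv`), the two-facility layout, the cost parameters, and the price-sensitive demand model are all assumptions chosen for illustration. The sketch only shows the general pattern of a `reset`/`step` loop in which an agent chooses a shipment quantity and a retail price and receives profit as the reward.

```python
import random
from dataclasses import dataclass


@dataclass
class Facility:
    """A node holding stock of a single product (hypothetical, not a WoS class)."""
    name: str
    stock: int
    unit_storage_cost: float


class ToySupplyChainEnv:
    """Toy two-echelon chain: warehouse -> retailer -> stochastic customer demand."""

    def __init__(self, max_demand=10, unit_transport_cost=0.5, seed=0):
        self.max_demand = max_demand
        self.unit_transport_cost = unit_transport_cost
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Initial stock levels and storage costs are arbitrary illustrative values.
        self.warehouse = Facility("warehouse", stock=50, unit_storage_cost=0.1)
        self.retailer = Facility("retailer", stock=10, unit_storage_cost=0.2)
        return self._observe()

    def _observe(self):
        return (self.warehouse.stock, self.retailer.stock)

    def step(self, action):
        """Apply action = (units shipped warehouse -> retailer, retail price)."""
        shipment, price = action

        # Inventory decision: ship no more than the warehouse actually holds.
        shipment = max(0, min(int(shipment), self.warehouse.stock))
        self.warehouse.stock -= shipment
        self.retailer.stock += shipment

        # Assumed demand model: a higher price lowers the upper bound on demand.
        demand_cap = max(0, int(self.max_demand - price))
        demand = self.rng.randint(0, demand_cap)
        sold = min(demand, self.retailer.stock)
        self.retailer.stock -= sold

        # Reward is profit: revenue minus transportation and storage costs.
        revenue = sold * price
        transport_cost = shipment * self.unit_transport_cost
        storage_cost = sum(
            f.stock * f.unit_storage_cost for f in (self.warehouse, self.retailer)
        )
        reward = revenue - transport_cost - storage_cost
        return self._observe(), reward, {"demand": demand, "sold": sold}


# Usage: one simulated time step with a fixed (non-learned) action.
env = ToySupplyChainEnv(seed=42)
observation = env.reset()
observation, reward, info = env.step((5, 4.0))
```

An RL agent would replace the fixed action with the output of a policy conditioned on `observation`, and a richer environment would add further facilities (suppliers, factories), multiple products, and competing agents in the same loop.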