Document Type : Research Article
Author
Department of Information Technology, Payame Noor University (PNU), P.O. Box 19395-3697, Tehran, Iran
Abstract
Relational graph structures add a layer of complexity to multi-objective combinatorial optimization (MOCO) that often renders large-scale NP-hard instances computationally prohibitive. While traditional metaheuristics like NSGA-II remain the industry standard, their reactive nature prevents them from learning policies that generalize to unseen tasks. To address this, an end-to-end Deep Reinforcement Learning (DRL) framework is introduced, integrated with a Graph Convolutional Network (GCN) specifically for the Multi-Objective Project Portfolio Selection Problem (PPSP). By mapping the structural interdependencies of projects, the GCN provides critical cues that allow a Proximal Policy Optimization (PPO) agent to construct high-quality portfolios. Training stability is ensured through a reward normalization strategy derived from weighted-sum Pareto scalarization theory. Benchmarks on Barabási-Albert and fully-connected graph instances reveal that the proposed DRL agent achieves a Hypervolume indicator 2.4 times higher than NSGA-II on 50-project tasks. Notably, interpretability analysis shows the model learns to prioritize high-degree "hub" projects with strategic synergies. Regarding scalability, the agent maintained over 90% of its Hypervolume performance when transitioned from 50 to 200 projects in a zero-shot manner, requiring no further training. This efficiency is mirrored in its computational speed; an average inference time of 12.69 ms represents a 300-fold acceleration compared to the metaheuristic baseline. Such results underscore the potential of GNN-driven structural exploitation as a robust alternative for high-speed, multi-objective optimization.
Highlights
- A GNN-enhanced PPO framework is proposed for multi-objective project portfolio selection, unifying structural graph encoding with Pareto-grounded reward normalization.
- The DRL agent achieves a Hypervolume indicator 2.4× higher than NSGA-II on structured Barabási-Albert graph instances with N=50 projects.
- Zero-shot scalability is demonstrated: an agent trained on 50 projects maintains over 90% Hypervolume when tested on 200-project instances without retraining.
- Inference completes in ≈12.69 ms, over 300× faster than NSGA-II, enabling real-time multi-objective decision support.
- Interpretability analysis confirms that the GNN learns to prioritize high-degree “hub” projects, providing transparent and explainable portfolio construction logic.
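The Hypervolume indicator used in the comparisons above measures the region of objective space dominated by a set of solutions, relative to a reference point. The following is a minimal sketch for the two-objective case, assuming both objectives are maximized and the reference point is the origin. The function names `pareto_front` and `hypervolume_2d` are illustrative, and the article's actual objectives, reference point, and implementation are not specified here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, assuming both objectives are maximized.

    The result is ordered by the first objective (descending), which means
    the second objective is strictly ascending along the returned list.
    """
    front: List[Point] = []
    best_f2 = float("-inf")
    # Visit points from largest to smallest f1; a point survives only if its
    # f2 beats every point that is at least as good on f1.
    for f1, f2 in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if f2 > best_f2:
            front.append((f1, f2))
            best_f2 = f2
    return front


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by `points` and bounded below by the reference point.

    Assumption: maximization on both axes; points that do not strictly
    improve on `ref` in both objectives contribute nothing and are dropped.
    """
    inside = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    area = 0.0
    prev_f2 = ref[1]
    # Sweep upward in f2, adding one horizontal strip per front point.
    for f1, f2 in pareto_front(inside):
        area += (f1 - ref[0]) * (f2 - prev_f2)
        prev_f2 = f2
    return area


if __name__ == "__main__":
    # Hypothetical portfolios scored on two objectives (made-up numbers).
    method_a = [(4, 1), (3, 2), (1, 3), (2, 1)]
    method_b = [(2, 1), (1, 2)]
    ratio = hypervolume_2d(method_a) / hypervolume_2d(method_b)
    print(f"Hypervolume ratio A/B: {ratio:.2f}")
```

A ratio such as the reported 2.4× would be obtained by dividing the hypervolume of one method's solution set by that of the other, using the same reference point for both.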
Keywords
- Combinatorial optimization
- Multi-objective optimization
- Deep reinforcement learning
- Graph neural networks
- Project portfolio selection
Main Subjects