In collaboration with Payame Noor University and the Iranian Society of Instrumentation and Control Engineers

Document Type : Research Article

Author

Department of Information Technology, Payame Noor University (PNU), P.O. Box 19395-3697, Tehran, Iran

Abstract

Relational graph structures add a layer of complexity to multi-objective combinatorial optimization (MOCO) that often renders large-scale NP-hard instances computationally prohibitive. While traditional metaheuristics like NSGA-II remain the industry standard, their reactive nature prevents them from learning policies that generalize to unseen tasks. To address this, an end-to-end Deep Reinforcement Learning (DRL) framework is introduced, integrated with a Graph Convolutional Network (GCN) specifically for the Multi-Objective Project Portfolio Selection Problem (PPSP). By mapping the structural interdependencies of projects, the GCN provides critical cues that allow a Proximal Policy Optimization (PPO) agent to construct high-quality portfolios. Training stability is ensured through a reward normalization strategy derived from weighted-sum Pareto scalarization theory. Benchmarks on Barab'{a}si-Albert and fully-connected graph instances reveal that the proposed DRL agent achieves a Hypervolume indicator 2.4 times higher than NSGA-II on 50-project tasks. Notably, interpretability analysis shows the model learns to prioritize high-degree "hub" projects with strategic synergies. Regarding scalability, the agent maintained over 90% of its Hypervolume performance when transitioned from 50 to 200 projects in a zero-shot manner, requiring no further training. This efficiency is mirrored in its computational speed; an average inference time of 12.69 ms represents a 300-fold acceleration compared to the metaheuristic baseline. Such results underscore the potential of GNN-driven structural exploitation as a robust alternative for high-speed, multi-objective optimization.

Highlights

  • A GNN-enhanced PPO framework is proposed for multi-objective project portfolio selection, unifying structural graph encoding with Pareto-grounded reward normalization.
  • The DRL agent achieves a Hypervolume indicator 2.4× higher than NSGA-II on structured Barabási-Albert graph instances with N=50 projects.
  • Zero-shot scalability is demonstrated: an agent trained on 50 projects maintains over 90% Hypervolume when tested on 200-project instances without retraining.
  • Inference is completed in ≈12.69 ms — over 300× faster than NSGA-II — enabling real-time multi-objective decision support.
  • Interpretability analysis confirms that the GNN learns to prioritize high-degree “hub” projects, providing transparent and explainable portfolio construction logic.

Keywords

Main Subjects

[1] Amirian, S., Amiri, M., Taghavifard, M.T. (2024). “Optimizing supply chain design for sustainability and reliability: A comparative study of augmented epsilon and normalized normal constraint methods”. Control and Optimization in Applied Mathematics, 9(1), 97– 130. https://doi.org/10.30473/coam.2023.67540.1230
[2] Angioni, D., Archetti, C., Speranza, M.G. (2025). “Neural combinatorial optimization: A tutorial”. Computers & Operations Research, 182, 107102. https://doi.org/10. 1016/j.cor.2025.107102
[3] Boschetti, M.A., Maniezzo, V. (2024). “Contemporary approaches in matheuristics: An updated survey”. Annals of Operations Research, 343(2), 663–700. https://doi.org/ 10.1007/s10479-024-06302-z
[4] Darvish, A., Sepehri, M. (2025). “Artificial intelligence-driven project portfolio optimization under deep uncertainty using adaptive reinforcement learning”. Applied Sciences, 15(23), 12713. https://doi.org/10.3390/app152312713
[5] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T. (2002). “A fast and elitist multiobjective genetic algorithm: NSGA-II”. IEEE Transactions on Evolutionary Computation, 6(2), 182–197. https://doi.org/10.1109/4235.996017
[6] Ekmekcioğlu, Ö., Pınar, M.Ç. (2023). “Graph neural networks for deep portfolio optimization”. Neural Computing and Applications, 35(28), 20663–20674. https://doi.org/ 10.1007/s00521-023-08862-w
[7] Fu, X., Gu, S. (2024). “Deep reinforcement learning algorithm based on graph weight multi-pointer network for solving multiobjective traveling salesman problem”. IEEE Access, 12, 179091–179103. https://doi.org/10.1109/ACCESS.2024.3505436
[8] Gama, F., Isufi, E., Leus, G., Ribeiro, A. (2020). “Graphs, convolutions, and neural networks: From graph filters to graph neural networks”. IEEE Signal Processing Magazine, 37(6), 128–138. https://doi.org/10.1109/MSP.2020.3016143
[9] Hu, D., He, J. (2025). “A dynamic multi-objective optimization approach for computing resource allocation in industrial model repository”. Swarm and Evolutionary Computation, 99, 102219. https://doi.org/10.1016/j.swevo.2025.102219
[10] Janinhoff, L., Klein, R., Sailer, D., Schoppa, J.M. (2024). “Out-of-home delivery in last-mile logistics: A review”. Computers & Operations Research, 168, 106686. https:// doi.org/10.1016/j.cor.2024.106686
[11] Kipf, T.N., Welling, M. (2017). “Semi-supervised classification with graph convolutional networks”. In: Proceedings of the 5th International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1609.02907
[12] Liesiö, J., Salo, A., Keisler, J.M., Morton, A. (2021). “Portfolio decision analysis: Recent developments and future prospects”. European Journal of Operational Research, 293(3), 811–825. https://doi.org/10.1016/j.ejor.2020.12.015
[13] Lim, Q.Y.E., Cao, Q., Quek, C. (2022). “Dynamic portfolio rebalancing through reinforcement learning”. Neural Computing and Applications, 34(9), 7125–7139. https: //doi.org/10.1007/s00521-021-06853-3
[14] Liu, X., Tong, X., Liu, Q. (2021). “Profiling Pareto front with multi-objective Stein variational gradient descent”. Advances in Neural Information Processing Systems (NeurIPS), 34, 14721–14733.
[15] Manchanda, S., Michel, S., Drakulic, D., Andreoli, J.M. (2023). “On the generalization of neural combinatorial optimization heuristics”. Machine Learning and Knowledge Discovery in Databases, (pp. 426–442). Springer, Cham. https://doi.org/10.1007/ 978-3-031-26412-2_27
[16] McPherson, M., Smith-Lovin, L., Cook, J.M. (2001). “Birds of a feather: Homophily in social networks”. Annual Review of Sociology, 27(1), 415–444. https://doi.org/10. 1146/annurev.soc.27.1.415
[17] Miettinen, K. (1999). Nonlinear multiobjective optimization. Springer Science & Business Media. ISBN: 978-0-7923-8278-2
[18] Mohseni, N., McMahon, P.L., Byrnes, T. (2022). “Ising machines as hardware solvers of combinatorial optimization problems”. Nature Reviews Physics, 4(6), 363–379. https: //doi.org/10.1038/s42254-022-00440-8
[19] Pereira, J.L.J., Oliver, G.A., Francisco, M.B., Cunha, S.S., Gomes, G.F. (2022). “A review of multi-objective optimization: Methods and algorithms in mechanical engineering problems”. Archives of Computational Methods in Engineering, 29(4), 2285–2308. https://doi.org/10.1007/s11831-021-09663-x
[20] Sharma, S., Kumar, V. (2022). “A comprehensive review on multi-objective optimization techniques: Past, present and future”. Archives of Computational Methods in Engineering, 29(7), 5605–5633. https://doi.org/10.1007/s11831-022-09778-9
[21] Sutton, R.S., Barto, A.G. (2018). Reinforcement learning: An introduction (2nd ed.). The MIT Press.
[22] Wang, H. (2024). “Multi-objective reinforcement learning based on nonlinear scalarization and long-short-term optimization”. Robotic Intelligence and Automation, 44(3), 475– 487. https://doi.org/10.1108/RIA-11-2023-0174
[23] Wang, S., Sun, W., Huang, M. (2024). “An adaptive large neighborhood search for the multi-depot dynamic vehicle routing problem with time windows”. Computers & Industrial Engineering, 191, 110122. https://doi.org/10.1016/j.cie.2024.110122
[24] Yuan, M., Yu, Q., Zhang, L., Lu, S., Li, Z., Pei, F. (2025). “Deep reinforcement learning based proximal policy optimization algorithm for dynamic job shop scheduling”. Computers & Operations Research, 183, 107149. https://doi.org/10.1016/j.cor.2025. 107149