<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>Payame Noor University (PNU)</PublisherName>
				<JournalTitle>Control and Optimization in Applied Mathematics</JournalTitle>
				<Issn>2383-3130</Issn>
				<Volume></Volume>
				<Issue>Articles in Press</Issue>
				<PubDate PubStatus="epublish">
					<Year>2026</Year>
					<Month>05</Month>
					<Day>21</Day>
				</PubDate>
			</Journal>
<ArticleTitle>Harnessing Relational Structures in Multi-Objective Project Portfolio Optimization: A GNN-Enhanced Deep Reinforcement Learning Framework</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage></FirstPage>
			<LastPage></LastPage>
			<ELocationID EIdType="pii">12974</ELocationID>
			
<ELocationID EIdType="doi">10.30473/coam.2026.75894.1340</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>Babak</FirstName>
					<LastName>Masoudi</LastName>
<Affiliation>Department of Information Technology, Payame Noor University (PNU), P.O. Box 19395-3697, Tehran, Iran</Affiliation>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2025</Year>
					<Month>09</Month>
					<Day>23</Day>
				</PubDate>
			</History>
		<Abstract>Relational graph structures add a layer of complexity to multi-objective combinatorial optimization (MOCO) that often renders large-scale NP-hard instances computationally prohibitive. While traditional metaheuristics like NSGA-II remain the industry standard, their reactive nature prevents them from learning policies that generalize to unseen tasks. To address this, an end-to-end Deep Reinforcement Learning (DRL) framework is introduced, integrated with a Graph Convolutional Network (GCN) specifically for the Multi-Objective Project Portfolio Selection Problem (PPSP). By mapping the structural interdependencies of projects, the GCN provides critical cues that allow a Proximal Policy Optimization (PPO) agent to construct high-quality portfolios. Training stability is ensured through a reward normalization strategy derived from weighted-sum Pareto scalarization theory. Benchmarks on Barabási-Albert and fully-connected graph instances reveal that the proposed DRL agent achieves a Hypervolume indicator 2.4 times higher than NSGA-II on 50-project tasks. Notably, interpretability analysis shows the model learns to prioritize high-degree &quot;hub&quot; projects with strategic synergies. Regarding scalability, the agent maintained over 90% of its Hypervolume performance when transitioned from 50 to 200 projects in a zero-shot manner, requiring no further training. This efficiency is mirrored in its computational speed; an average inference time of 12.69 ms represents a 300-fold acceleration compared to the metaheuristic baseline. Such results underscore the potential of GNN-driven structural exploitation as a robust alternative for high-speed, multi-objective optimization.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">Combinatorial optimization</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Multi-objective optimization</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Deep reinforcement learning</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Graph neural networks</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Project portfolio selection</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://mathco.journals.pnu.ac.ir/article_12974_b8f831cf15724b8dec1c7b27d6f78099.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
