In collaboration with Payame Noor University and the Iranian Society of Instrumentation and Control Engineers

Document Type : Research Article


1 ‎Department of Mathematics‎, ‎University of Kashan‎, ‎Kashan‎, ‎Iran.

2 ‎K.N‎. ‎Toosi University of Technology‎, ‎Tehran‎, ‎Iran‎.


‎The Proximal Stochastic Average Gradient (Prox-SAG+) is a primary method used for solving optimization problems that contain the sum of two convex functions. This kind of problem usually arises in machine learning, which utilizes a large amount of data to create component functions from a dataset. A proximal operation is applied to obtain the optimal value due to its appropriate properties. The Prox-SAG+ algorithm is faster than some other methods and has a simpler algorithm than previous ones. Moreover, using this specific operator can help to reassure that the achieved result is optimal. Additionally, it has been proven that the proposed method has an approximately geometric rate of convergence. Implementing the proposed operator makes the method more practical than other algorithms found in the literature. Numerical analysis also confirms the efficiency of the proposed scheme.


[1] Hastie, T., Tibshirani, R., Friedman, J. (2009). “The elements of statistical learning”, Data Mining, Second edition, Springer Series in Statistics, Springer, New York.
[2] Hu, C., Kwok, J.T., Pan, W. (2009). “Accelerated gradient methods for stochastic optimization and online learning”, In Advances in Neural Information Processing Systems, 22,
[3] Johnson, R., Zhang, T. (2013). “Accelerating stochastic gradient descent using predictive variance reduction”, In Advances in Neural Information Processing Systems, 26, 315-323.
[4] Le Roux, N., Schmidt, M., Bach, F. (2012). “A stochastic gradient method with an exponential convergence rate for finite training sets”, In Advances in Neural Information Processing Systems, 25, 2672-2680.
[5] Li, Z., Li, J. (2018). “A simple proximal stochastic gradient method for nonsmooth nonconvex optimization”, 32nd Conference on Neural Information Processing Systems, Canada.
[6] Nesterov, Y. (2004). “Introductory lectures on convex optimization‎: ‎A basic course”, Kluwer, Boston.
[7] Nesterov, Y. (2013). “Gradient methods for minimizing composite functions”, Mathematical Programming, Series B, 140, 125-161.
[8] Parikh, N., Boyd, S. (2013). “Proximal algorithms”, Foundations and Trends R in Optimization, 1(3), 131-223.
[9] Rockafellar, R.T. (1970). “Convex analysis”, Princeton University Press.
[10] Schmidt, M., Le Roux, N., Bach F. (2017). “Minimizing finite sums with the stochastic average gradient”, Mathematical Programming, (1-2), Series A, 113.
[11] Shalev-Shwartz, S., Zhang, T. (2012). “Proximal stochastic dual coordinate ascent”, arXiv:1211.2772, November.
[12] Shalev-Shwartz, S., Zhang, T. (2013). “Stochastic dual coordinate ascent methods for regularized loss minimization”, Journal of Machine Learning Research, 14, 567-599.
[13] Wojtowytsch, S. (2023). “Stochastic gradient descent with noise of machine learning type Part I: Discrete time analysis”, Journal of Nonlinear Science 33, 45.
[14] Wojtowytsch, S. (2021). “Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis, arXiv:2106.02588.
[15] Xiao, L., Zhang, T. (2014). “A proximal stochastic gradient method with progressive variance reduction”, SIAM Journal on Optimization, 24(4), 2057-2075.
[16] Yuan, W., Hu, F. Lu, L. (2022). ”A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference”, Applied Intelligence, 52(4), 3939–3953.
[17] Yun, J., Lozano, A.C., Yang, E. (2020). “A general family of stochastic proximal gradient methods for deep learning”, arXiv: 2007.07484 (cs).