[1] Hastie, T., Tibshirani, R., Friedman, J. (2009). “The elements of statistical learning: Data mining, inference, and prediction”, Second edition, Springer Series in Statistics, Springer, New York.
[2] Hu, C., Kwok, J.T., Pan, W. (2009). “Accelerated gradient methods for stochastic optimization and online learning”, In Advances in Neural Information Processing Systems, 22, 781-789.
[3] Johnson, R., Zhang, T. (2013). “Accelerating stochastic gradient descent using predictive variance reduction”, In Advances in Neural Information Processing Systems, 26, 315-323.
[4] Le Roux, N., Schmidt, M., Bach, F. (2012). “A stochastic gradient method with an exponential convergence rate for finite training sets”, In Advances in Neural Information Processing Systems, 25, 2672-2680.
[5] Li, Z., Li, J. (2018). “A simple proximal stochastic gradient method for nonsmooth nonconvex optimization”, 32nd Conference on Neural Information Processing Systems, Canada.
[6] Nesterov, Y. (2004). “Introductory lectures on convex optimization: A basic course”, Kluwer, Boston.
[7] Nesterov, Y. (2013). “Gradient methods for minimizing composite functions”, Mathematical Programming, Series B, 140, 125-161.
[8] Parikh, N., Boyd, S. (2013). “Proximal algorithms”, Foundations and Trends in Optimization, 1(3), 131-223.
[9] Rockafellar, R.T. (1970). “Convex analysis”, Princeton University Press.
[10] Schmidt, M., Le Roux, N., Bach, F. (2017). “Minimizing finite sums with the stochastic average gradient”, Mathematical Programming, Series A, 162(1-2), 83-112.
[11] Shalev-Shwartz, S., Zhang, T. (2012). “Proximal stochastic dual coordinate ascent”, arXiv:1211.2772, November.
[12] Shalev-Shwartz, S., Zhang, T. (2013). “Stochastic dual coordinate ascent methods for regularized loss minimization”, Journal of Machine Learning Research, 14, 567-599.
[13] Wojtowytsch, S. (2023). “Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis”, Journal of Nonlinear Science, 33, 45.
[14] Wojtowytsch, S. (2021). “Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis”, arXiv:2106.02588.
[15] Xiao, L., Zhang, T. (2014). “A proximal stochastic gradient method with progressive variance reduction”, SIAM Journal on Optimization, 24(4), 2057-2075.
[16] Yuan, W., Hu, F., Lu, L. (2022). “A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference”, Applied Intelligence, 52(4), 3939-3953.
[17] Yun, J., Lozano, A.C., Yang, E. (2020). “A general family of stochastic proximal gradient methods for deep learning”, arXiv:2007.07484.
[19] Phishing Website Detector dataset, Kaggle. https://www.kaggle.com/datasets/eswarchandt/phishing-website-detector.