Empirically Investigating Sharpness-Aware Minimization

Alex; Michael; Mingrui

doi:10.13021/jssr2022.3393

Authors

Alex Li
Michael Crawshaw
Dr. Mingrui Liu

DOI:

https://doi.org/10.13021/jssr2022.3393

Abstract

Sharpness-Aware Minimization (SAM) is an optimizer that is claimed to find more generalizable solutions by introducing a “sharpness” penalty into the loss function, causing the model to converge to flatter optima. Several studies have questioned the effectiveness of SAM. One such paper (On the Maximum Hessian Eigenvalue and Generalization) found that larger batch sizes diminish and eventually eliminate the generalization benefits of SAM over Stochastic Gradient Descent (SGD). It also found that the correlation between generalization and flatness, as measured by the maximum Hessian eigenvalue of the loss function, is highly conditional: for example, the maximum Hessian eigenvalue can be manipulated by scaling the learning rate and batch size without affecting generalization. In preliminary tests on the CIFAR-10 image classification dataset using ResNet18, we found that SAM reaches lower training accuracy than SGD but appears to generalize better to the test set. In this study, we aim to find empirical evidence of SAM’s ability to find flatter optima than SGD by measuring flatness through loss values at projected points of gradient ascent.
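To make the two ideas in the abstract concrete, here is a minimal sketch on a toy two-parameter quadratic loss, not the CIFAR-10/ResNet18 setup used in the study. It assumes the commonly described form of the SAM update (take the gradient at a point perturbed along the normalized gradient, then apply it at the original weights) and assumes one plausible reading of "loss values at projected points of gradient ascent": the loss increase after a single ascent step of radius rho. The function names, the toy loss, and the values of lr and rho are all hypothetical and chosen only for illustration.

```python
import math

def loss(w):
    # Toy quadratic (an assumption for illustration): curvature 10 along
    # w[0] (sharp direction) and 0.1 along w[1] (flat direction).
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 0.1 * w[1]]

def ascent_point(w, rho):
    # Move a distance rho from w along the normalized gradient,
    # i.e. one gradient-ascent step projected onto a ball of radius rho.
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)
    return [wi + rho * gi / norm for wi, gi in zip(w, g)]

def sam_step(w, lr, rho):
    # SAM-style update: gradient evaluated at the perturbed point,
    # applied to the original weights.
    g_adv = grad(ascent_point(w, rho))
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def sharpness_proxy(w, rho):
    # Assumed flatness measure: loss increase after one ascent step.
    # Larger values indicate a sharper neighborhood around w.
    return loss(ascent_point(w, rho)) - loss(w)

if __name__ == "__main__":
    rho = 0.05
    print("proxy along sharp direction:", sharpness_proxy([0.1, 0.0], rho))
    print("proxy along flat direction: ", sharpness_proxy([0.0, 0.1], rho))
    w = [1.0, 1.0]
    print("loss before / after one SAM step:", loss(w), loss(sam_step(w, 0.05, rho)))
```

Under these assumptions, a point displaced along the high-curvature direction yields a larger proxy value than a point displaced equally along the low-curvature direction, which is the kind of comparison one could make between optima reached by SAM and by SGD.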
