Should I try multiple optimizers when fine-tuning a pre-trained Transformer for NLP tasks? Should I tune their hyperparameters?

Nefeli Gkouti, Prodromos Malakasiotis, Stavros Toumpis, Ion Androutsopoulos

Main: Machine Learning for NLP Poster Paper

Session 9: Machine Learning for NLP (Poster)
Conference Room: Radisson
Conference Time: March 20, 09:00-10:30 (CET) (Europe/Malta)
TLDR:
Abstract: