In this work we describe and analyze a supervised learning system for word emphasis selection in phrases drawn from visual media as a part of the Semeval 2020 Shared Task 10. More specifically, we begin by briefly introducing the shared task problem and provide an analysis of interesting and relevant features present in the training dataset. We then introduce our LSTM-based model and describe its structure, input features, and limitations. Our model ultimately failed to beat the benchmark score, achieving an average match() score of 0.704 on the validation data (0.659 on the test data) but predicted 84.8% of words correctly considering a 0.5 threshold. We conclude with a thorough analysis and discussion of erroneous predictions with many examples and visualizations.
展开▼