In the paper ‘Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST!’, Frank and his co-authors Michael Hanna and Sandro Pezzelle introduce a novel dataset of sentences that are either ambiguous ("I saw the man on the hill with the telescope") or underspecified, i.e., not fully explicit ("don't spend too much"), and use it to study how well pre-trained language models (LMs) handle this tricky but very common type of language.
In everyday language use, speakers frequently utter and interpret sentences that are semantically underspecified, that is, sentences whose content is insufficient to fully convey their message or to allow a single, unambiguous interpretation.
Frank and his co-authors find that newer LMs can identify underspecified sentences reasonably well when explicitly prompted. However, interpreting such sentences correctly proves much harder for all LMs, contrary to what theoretical accounts of underspecification would predict. Overall, the study highlights the importance of using naturalistic data and communicative scenarios when evaluating LMs’ language capabilities.