https://www.academia.edu/74401277/Masked_Language_Modeling_and_the_Distributional_Hypothes…

… we use the original 16GB BookWiki corpus (the Toronto Books Corpus, Zhu et al. 2015, plus English Wikipedia) from Liu et al. (…

… and target languages, LSTM language models leverage the latent hierarchical structure of the input to obtain better performance than a random, Zipfian …