Analyzing the Performance of Bidirectional Transformer and Generalized Autoregressive Permutation Pre-trained Language Models for the Sentiment Classification Task
Abstract
With the advancement of deep learning, automatic feature extraction and the processing of larger datasets are now achievable. With the ability to model two-way context, two language representation models, Bidirectional Encoder Representations from Transformers (BERT) and Generalized Autoregressive Pretraining for Language Understanding (XLNet), have been introduced; both are pre-trained on large corpora to learn linguistic features for the sentiment classification task. The two models learn context bidirectionally but differ in their masking strategy and in the pretrain-finetune discrepancy it introduces. In this paper, the BERT-base and XLNet-base models are fine-tuned on the IMDB and Coursera datasets and compared with an RNN baseline. XLNet overcomes BERT's constraints because it uses an autoregressive permutation language modeling objective.
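The objective difference mentioned above can be illustrated with a toy sketch (not the paper's implementation; function names and the example tokens are hypothetical): under BERT's masked language modeling, each masked token is predicted independently from the same set of unmasked tokens, whereas XLNet predicts tokens autoregressively along a sampled factorization order, so each token's context grows with its position in the permutation and never contains an artificial [MASK] symbol.

```python
def bert_contexts(tokens, mask_positions):
    """BERT-style masked LM: every masked position is predicted from
    the unmasked tokens, independently of the other masked tokens."""
    visible = [t for i, t in enumerate(tokens) if i not in mask_positions]
    return {i: visible for i in mask_positions}

def xlnet_contexts(tokens, permutation):
    """XLNet-style permutation LM: for a factorization order z,
    the token at position z_t is predicted from the tokens at
    positions z_1 .. z_(t-1) -- no [MASK] symbol is ever introduced."""
    contexts = {}
    for t, pos in enumerate(permutation):
        contexts[pos] = [tokens[p] for p in permutation[:t]]
    return contexts

tokens = ["the", "movie", "was", "great"]

# BERT: positions 1 and 3 are masked; both are predicted from the
# same context and cannot condition on each other.
b = bert_contexts(tokens, {1, 3})
print(b[1])  # ['the', 'was']
print(b[3])  # ['the', 'was']

# XLNet: with factorization order [2, 0, 3, 1], position 1 ("movie")
# is predicted last and sees every other token, including "great".
x = xlnet_contexts(tokens, [2, 0, 3, 1])
print(x[2])  # []
print(x[1])  # ['was', 'the', 'great']
```

Averaged over permutations, XLNet's objective lets every token condition on tokens from both directions, which is how it captures bidirectional context while remaining autoregressive and avoiding BERT's pretrain-finetune mismatch from the [MASK] token.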