Word embeddings
Word embeddings are arguably the most widely known best practice in the recent history of NLP. It is well-known that using pre-trained embeddings helps (Kim, 2014) [12]. The optimal dimensionality of word embeddings is mostly task-dependent: a smaller dimensionality works better for more syntactic tasks such as named entity recognition (Melamud et al., 2016) [44] or part-of-speech (POS) tagging (Plank et al., 2016) [32], while a larger dimensionality is more useful for more semantic tasks such as sentiment analysis (Ruder et al., 2016) [45].
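As a minimal sketch of putting this into practice (not part of the original post), the snippet below loads pre-trained vectors into an embedding layer in PyTorch; the file name, toy vocabulary, and the 100-dimensional size (a common choice for more syntactic tasks) are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical setup: a small vocabulary and a file of pre-trained vectors
# in GloVe/word2vec text format ("word v1 v2 ... vd" per line).
vocab = {"<pad>": 0, "<unk>": 1, "the": 2, "movie": 3, "great": 4}
emb_dim = 100  # smaller dimensionalities tend to suffice for syntactic tasks

# Initialise randomly, then overwrite rows for words we have vectors for.
weights = np.random.normal(scale=0.1, size=(len(vocab), emb_dim)).astype("float32")
with open("glove.6B.100d.txt", encoding="utf-8") as f:  # assumed vector file
    for line in f:
        parts = line.rstrip().split(" ")
        word, vec = parts[0], parts[1:]
        if word in vocab and len(vec) == emb_dim:
            weights[vocab[word]] = np.asarray(vec, dtype="float32")

# freeze=False lets the pre-trained embeddings be fine-tuned on the downstream task.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(weights), freeze=False)
```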
Depth
While we will not reach the depths of computer vision for a while, neural networks in NLP have become progressively deeper. State-of-the-art approaches now regularly use deep Bi-LSTMs, typically consisting of 3-4 layers, e.g. for POS tagging (Plank et al., 2016) and semantic role labelling (He et al., 2017) [33]. Models for some tasks can be even deeper, cf. Google's NMT model with 8 encoder and 8 decoder layers (Wu et al., 2016) [20]. In most cases, however, the performance gains from making the model deeper than two layers are minimal (Reimers & Gurevych, 2017) [46].
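A minimal sketch of such a stacked Bi-LSTM tagger in PyTorch is shown below; the layer count (3), hidden size, dropout, and tag-set size are illustrative values rather than the exact settings used in the cited papers.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Word-level stacked Bi-LSTM sequence tagger (e.g. for POS tagging)."""

    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=256, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # num_layers=3 stacks three Bi-LSTM layers; dropout is applied between layers.
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=layers,
                            bidirectional=True, batch_first=True, dropout=0.3)
        self.proj = nn.Linear(2 * hidden, num_tags)  # 2x for the two directions

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> tag scores: (batch, seq_len, num_tags)
        states, _ = self.lstm(self.embed(token_ids))
        return self.proj(states)

# Example forward pass with dummy data: a batch of 8 sentences of length 25.
model = BiLSTMTagger(vocab_size=10000, num_tags=17)
scores = model(torch.randint(1, 10000, (8, 25)))
```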
These observations hold for most sequence tagging and structured prediction problems. For classification, deep or very deep models perform well only with character-level input, and shallow word-level models are still the state of the art (Zhang et al., 2015; Conneau et al., 2016; Le et al., 2017) [28, 29, 30].
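By way of illustration only, a shallow word-level classifier in the spirit of this observation can be as simple as averaged word embeddings followed by a single linear layer; this generic sketch assumes PyTorch and is not the specific architecture of any of the cited papers.

```python
import torch
import torch.nn as nn

class ShallowWordClassifier(nn.Module):
    """Averaged word embeddings followed by one linear layer (fastText-style)."""

    def __init__(self, vocab_size, num_classes, emb_dim=300):
        super().__init__()
        # EmbeddingBag with mode="mean" averages the embeddings of each sentence.
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim, mode="mean")
        self.out = nn.Linear(emb_dim, num_classes)

    def forward(self, token_ids, offsets):
        # token_ids: flat 1-D tensor of word ids; offsets mark where each sentence starts.
        return self.out(self.embed(token_ids, offsets))

# Example: 60 tokens forming 3 sentences that start at positions 0, 20, and 45.
model = ShallowWordClassifier(vocab_size=30000, num_classes=5)
logits = model(torch.randint(0, 30000, (60,)), torch.tensor([0, 20, 45]))
```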