
Elvis Saravia

Co-Founder at DAIR.AI


Welcome to the 7th issue of the NLP Newsletter. I hope you are having a wonderful day and that you and your loved ones are safe in these difficult times. We decided to publish this newsletter to bring some joy to our readers, so please read it when you have free time. For now, let’s stay focused on what matters most: our families and friends. ❤️ 💛 💚

A few updates about the NLP Newsletter and dair.ai

All French and Chinese translations of the previous issues of the NLP Newsletter are now available. Find out how you can contribute to translating previous and upcoming issues of the NLP Newsletter at this link.

We recently created two GitHub repositories that contain NLP paper summaries and PyTorch notebooks to get you started with neural networks.

Research and Publications 📙

Measuring Compositional Generalization

In the context of machine learning, compositional generalization is the ability to learn to represent meaning, and in turn sequences that are novel combinations, from what is learned in the training set. To date, it is not clear how to properly measure compositionality in neural networks. A Google AI team proposes one of the largest benchmarks for compositional generalization, using tasks such as question answering and semantic parsing. The picture below shows an example of the proposed approach, in which atoms (produce, direct, etc.) are combined into novel compounds, i.e., combinations of atoms. The idea of this work is to produce a train-test split whose examples share a similar distribution of atoms (the building blocks used to generate examples) but a different distribution of compounds (the compositions of atoms). The authors claim that this is a more reliable way to test for compositional generalization.

Credit: Google AI Blog
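To make the atom/compound idea concrete, here is a minimal sketch of how the divergence between a train and a test split could be scored. For illustration only, it assumes that each example is a list of atom strings and that a compound is simply a pair of adjacent atoms (the paper works with richer structures). Divergence is computed as 1 minus the Chernoff coefficient, with α = 0.5 for atoms and α = 0.1 for compounds; these values come from our reading of the paper, so treat them as assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

def normalize(counts: Counter) -> Dict[Hashable, float]:
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

def chernoff_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float], alpha: float) -> float:
    """1 - sum_k p_k^alpha * q_k^(1 - alpha): 0 for identical distributions, 1 for disjoint ones."""
    keys = set(p) | set(q)
    coefficient = sum(p.get(k, 0.0) ** alpha * q.get(k, 0.0) ** (1 - alpha) for k in keys)
    return max(0.0, 1.0 - coefficient)  # clamp tiny negatives caused by float rounding

def count_atoms_and_compounds(examples: List[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Assumption for this sketch: atoms are tokens, compounds are adjacent atom pairs."""
    atoms, compounds = Counter(), Counter()
    for example in examples:
        atoms.update(example)
        compounds.update(zip(example, example[1:]))
    return atoms, compounds

def split_divergences(train: List[Sequence[str]], test: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (atom divergence, compound divergence) between two splits."""
    train_atoms, train_compounds = count_atoms_and_compounds(train)
    test_atoms, test_compounds = count_atoms_and_compounds(test)
    atom_div = chernoff_divergence(normalize(train_atoms), normalize(test_atoms), alpha=0.5)
    compound_div = chernoff_divergence(normalize(train_compounds), normalize(test_compounds), alpha=0.1)
    return atom_div, compound_div

# Same atoms, recombined differently: low atom divergence, high compound divergence.
train = [["direct", "produce"], ["produce", "edit"]]
test = [["produce", "direct"], ["edit", "produce"]]
print(split_divergences(train, test))
```

A split that scores low on the first number and high on the second is the kind of split the benchmark is after.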

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

Researchers ran a comprehensive set of fine-tuning trials to better understand how weight initialization, data order, and early stopping affect the performance of pretrained language models. Through experiments that involved fine-tuning BERT hundreds of times, they found that different random seeds can produce very different results. In particular, the study reports that some weight initializations perform well across a set of tasks. All the experimental data and trials were publicly released for other researchers who want to further understand the dynamics of fine-tuning.
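One practical takeaway is to treat the seed for weight initialization and the seed for data order as separate knobs, and to launch many runs but stop the unpromising ones early. The toy sketch below reflects our own assumptions, not the paper's code: `make_run` only draws a stand-in classifier head (4 weights, an arbitrary size) and a shuffled example order, while `partial_score` and `final_score` are hypothetical callables you would back with real evaluations.

```python
import random
from typing import Callable, Sequence, Tuple

Seeds = Tuple[int, int]  # (init_seed, data_seed)

def make_run(init_seed: int, data_seed: int, n_examples: int, head_size: int = 4):
    """Draw a stand-in classifier head and a training order from two independent RNGs."""
    init_rng = random.Random(init_seed)  # controls weight initialization only
    data_rng = random.Random(data_seed)  # controls data order only
    head = [init_rng.gauss(0.0, 0.02) for _ in range(head_size)]
    order = list(range(n_examples))
    data_rng.shuffle(order)
    return head, order

def start_many_stop_early(
    seeds: Sequence[Seeds],
    partial_score: Callable[[Seeds], float],
    final_score: Callable[[Seeds], float],
    keep_fraction: float = 0.25,
) -> Seeds:
    """Score every run after a short budget, keep the best fraction, and finish only those."""
    ranked = sorted(seeds, key=partial_score, reverse=True)
    survivors = ranked[: max(1, int(len(ranked) * keep_fraction))]
    return max(survivors, key=final_score)

# Changing only the data seed leaves the initialization untouched.
head_a, order_a = make_run(init_seed=1, data_seed=2, n_examples=10)
head_b, order_b = make_run(init_seed=1, data_seed=3, n_examples=10)
assert head_a == head_b
```

Decoupling the two seeds is what lets you attribute variance to initialization versus data order separately.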

Zoom In: An Introduction to Circuits

OpenAI researchers published a piece discussing the state of interpretability of neural networks and proposing a new approach to interpreting them. Inspired by cellular biology, the authors dig deep into vision models and what they learn by inspecting the weights of the networks. Essentially, the study presents a few claims, along with supporting evidence, that the authors believe could pave the way to better interpretation of neural networks.

NLP Research Highlights — Issue #1

In a new dair.ai series called NLP Research Highlights, we provide detailed descriptions of interesting and important recent NLP research. The series is a way to keep track of NLP progress through approachable summaries of these works. In the first quarterly issue, topics range from improving language models to improving conversational agents to state-of-the-art speech recognition systems. These summaries will also be maintained here.

Learning to Simulate Complex Physics with Graph Networks

In the past few months, we have featured a lot of work on Graph Neural Networks (GNNs) due to their effectiveness not only in NLP but also in other areas such as genomics and materials science. In a recent paper, researchers propose a general framework based on graph networks that can learn simulations in different domains, such as fluids and deformable materials. The authors claim that they achieve state-of-the-art performance across domains and that their general-purpose approach is potentially the best learned physics simulator to date. Experiments include simulating materials such as goop over water, as well as interactions with rigid obstacles. They also tested a pre-trained model on out-of-distribution tasks and found promising results that suggest the framework generalizes to larger domains.

(Sanchez-Gonzalez et al., 2020)
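The core of such graph-network simulators is message passing over a graph whose nodes are particles and whose edges connect particles within a connectivity radius. Below is a minimal pure-Python sketch of one processor step. It is an illustration under assumptions, not the paper's code: the learned edge and node networks are replaced by hypothetical plain functions `edge_fn` and `node_fn`, incoming messages are summed at each receiver, and the node update is residual.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]

def radius_graph(positions: Sequence[Vec], radius: float) -> List[Tuple[int, int]]:
    """Directed (sender, receiver) edges between distinct particles closer than `radius`."""
    edges = []
    for i, p_i in enumerate(positions):
        for j, p_j in enumerate(positions):
            dist_sq = sum((a - b) ** 2 for a, b in zip(p_i, p_j))
            if i != j and dist_sq < radius ** 2:
                edges.append((j, i))  # message flows from sender j to receiver i
    return edges

def message_passing_step(
    nodes: Sequence[Vec],
    edges: Sequence[Tuple[int, int]],
    edge_fn: Callable[[Vec, Vec], Vec],
    node_fn: Callable[[Vec, Vec], Vec],
) -> List[Vec]:
    """Compute a message per edge, sum messages per receiver, then update nodes residually."""
    width = len(nodes[0])
    aggregated = [[0.0] * width for _ in nodes]
    for sender, receiver in edges:
        message = edge_fn(nodes[sender], nodes[receiver])
        aggregated[receiver] = [a + m for a, m in zip(aggregated[receiver], message)]
    return [
        [h + u for h, u in zip(h_i, node_fn(h_i, agg_i))]
        for h_i, agg_i in zip(nodes, aggregated)
    ]

positions = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
edges = radius_graph(positions, radius=1.0)  # the third particle is too far away to connect
nodes = [[1.0], [2.0], [5.0]]
updated = message_passing_step(
    nodes,
    edges,
    edge_fn=lambda sender, receiver: [sender[0] - receiver[0]],
    node_fn=lambda h, agg: [0.1 * agg[0]],
)
print(updated)
```

Stacking several such steps lets information travel several hops, which is how local particle interactions add up to the overall dynamics.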

Language-specific BERT models

Arabic BERT (AraBERT) is now available in the Hugging Face Transformers library. You can access the model here and the paper here. A Japanese version of BERT was also released recently, and there is a Polish version of BERT called Polbert.

Creativity, Ethics, and Society 🌎

Computational predictions of protein structures associated with COVID-19

DeepMind releases computationally predicted structures for proteins linked to the virus that causes COVID-19. The predictions come directly from the AlphaFold system and have not been experimentally verified. The aim of this release is to encourage contributions that help better understand the virus and how it functions.

Court cases that sound like the weirdest fights

Janelle Shane shares the results of a fun experiment where a GPT-2 model is fine-tuned to generate cases against inanimate objects. The model was fed a list of cases where the government was seizing contraband or dangerous goods and it generated cases like the ones shown in the picture below.


Toward Human-Centered Design for ML Frameworks

Google AI published the results of a large-scale survey of 645 people who used TensorFlow.js. The survey aimed to learn from non-ML software developers which features matter most to them and what their overall experience with current ML frameworks has been. Findings include that a “lack of conceptual understanding of ML” hinders the use of ML frameworks for this particular set of users. Participants also reported the need for better instructions on how to apply ML models to different problems and for more explicit support for modification.

Face and hand tracking in the browser with MediaPipe and TensorFlow.js

This awesome TensorFlow article provides a walkthrough of how to enable real-time face and hand tracking in the browser using TensorFlow.js and MediaPipe.

Credit: TensorFlow Blog

Tools and Datasets ⚙️

NLP Paper Summaries

We recently created a repository containing a list of carefully curated NLP paper summaries for some of the most interesting and important NLP papers in the past few years. The focus is to feature paper summaries and blog posts of important papers to help improve the approachability and accessibility of NLP topics and research.

A differentiable computer vision library for PyTorch.

Kornia is an open-source library built on top of PyTorch that provides a set of operators for differentiable computer vision. Capabilities include image transformations, depth estimation, and low-level image processing, to name a few. It is heavily inspired by OpenCV, but it is meant for research rather than for building production-ready applications.

Introducing DIET: state-of-the-art architecture that outperforms fine-tuning BERT and is 6X faster to train

DIET (Dual Intent and Entity Transformer) is a multitask natural language understanding (NLU) architecture proposed by Rasa. It is trained jointly on intent classification and entity recognition to improve results on both tasks. DIET can also use any of the current pre-trained embeddings, such as BERT and GloVe. Still, the main goal was a model that improves on the current state-of-the-art performance on those tasks and is faster to train (a 6X speedup is reported). The model is available in the Rasa Open Source Python library.

DIET framework

Lost in (language-specific) BERT models? BERT Lang Street is a neat website that lets you search over 30 BERT-based models covering 18 languages and 28 tasks, for a total of 177 entries. For instance, if you want to find the state-of-the-art results for sentiment classification using BERT models, you can just search for “sentiment” in the search bar (example shown in the screenshot below).


Andrey Kormilitzin releases Med7, a model for clinical NLP on electronic health records, in particular for named entity recognition (NER). The model can identify up to seven categories and is available for use with the spaCy library.

An Open Source Library for Quantum Machine Learning

TensorFlow Quantum is an open-source library that provides a toolbox for rapid prototyping of quantum ML research, allowing ML models to be applied to problems ranging from medicine to materials.

Fast and Easy Infinitely Wide Networks with Neural Tangents

Neural Tangents is an open-source library that allows researchers to build and train infinite-width models and finite neural networks using JAX. Read the blog post of the release here and get access to the library here.

Articles and Blog posts ✍️

From PyTorch to JAX: towards neural net frameworks that purify stateful code

Sabrina J. Mielke published an article that provides a walkthrough of how to build and train neural networks using JAX. The article focuses on comparing the inner workings of PyTorch and JAX when building neural networks, which helps to better understand some of the benefits and differences of JAX.
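The central idea of the JAX style is that model state is passed in and returned explicitly instead of being mutated inside an object. Here is a dependency-free sketch of that pattern for a tiny linear model. The hand-written `grad` stands in for what `jax.grad(loss)` would compute automatically, and the parameter names, learning rate, and step count are our own assumptions.

```python
from typing import NamedTuple, Sequence

class Params(NamedTuple):
    """Immutable parameters for the toy model y = w * x + b."""
    w: float
    b: float

def loss(params: Params, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error; a pure function of its inputs."""
    return sum((params.w * x + params.b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(params: Params, xs: Sequence[float], ys: Sequence[float]) -> Params:
    """Hand-derived gradient of `loss`; in JAX this would be jax.grad(loss)."""
    residuals = [params.w * x + params.b - y for x, y in zip(xs, ys)]
    n = len(xs)
    return Params(
        w=sum(2 * r * x for r, x in zip(residuals, xs)) / n,
        b=sum(2 * r for r in residuals) / n,
    )

def gd_step(params: Params, xs: Sequence[float], ys: Sequence[float], lr: float) -> Params:
    """Return new parameters; the old ones are never modified."""
    g = grad(params, xs, ys)
    return Params(w=params.w - lr * g.w, b=params.b - lr * g.b)

xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # points on y = 2x + 1
params = Params(w=0.0, b=0.0)
for _ in range(500):
    params = gd_step(params, xs, ys, lr=0.05)
print(params, loss(params, xs, ys))
```

Because every function here is pure, the same code maps naturally onto transformations such as `jax.grad` and `jax.jit`, which expect functions without hidden state.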


Why do we still use 18-year old BLEU?

In this blog post, Ehud Reiter discusses why we still use old evaluation techniques like BLEU to evaluate NLP models on tasks such as machine translation. As a researcher in the field, he also considers the implications for the techniques used to evaluate more recent tasks.
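For readers who have not looked inside the metric, here is a minimal sentence-level BLEU sketch. It computes clipped (modified) n-gram precisions for n = 1 to 4, combines them with a geometric mean, and multiplies the result by a brevity penalty. It applies no smoothing, and the tie-breaking rule for the reference length is our assumption. For reported results, an established implementation such as sacreBLEU is the safer choice.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate: Sequence[str], references: List[Sequence[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights over n = 1..max_n."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        max_ref = Counter()
        for ref in references:
            max_ref |= ngram_counts(ref, n)  # keep the max count of each n-gram across references
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the geometric mean
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Effective reference length: closest to c, preferring the shorter one on ties (assumption).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)

reference = "the cat is on the mat".split()
print(sentence_bleu("the the the the the the the".split(), [reference], max_n=1))  # ≈ 0.2857 (2/7)
```

The clipping step is what stops a degenerate output like "the the the ..." from scoring a perfect unigram precision, and the brevity penalty is what stops very short outputs from gaming precision.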

Introducing BART

BART is a new model proposed by Facebook: a denoising autoencoder for pretraining seq2seq models that improves performance on downstream text generation tasks such as abstractive summarization. Sam Shleifer provides a nice summary of BART and of how he integrated it into the Hugging Face Transformers repo.
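One of BART's noising schemes is text infilling. As we read the paper, contiguous spans with lengths drawn from a Poisson distribution (λ = 3) are each replaced by a single mask token, so the model must also infer how many tokens are missing. The sketch below is a simplified, hypothetical version of that corruption step in plain Python. Knuth's Poisson sampler, the 30% masking budget, the retry cap, and the `<mask>` string are our assumptions, and this is not the fairseq or Transformers implementation.

```python
import math
import random
from typing import List, Sequence

def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for small lambda."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def text_infilling(
    tokens: Sequence[str],
    rng: random.Random,
    mask_ratio: float = 0.3,
    lam: float = 3.0,
    mask: str = "<mask>",
    max_attempts: int = 100,
) -> List[str]:
    """Replace random spans with a single mask token until about mask_ratio of tokens are hidden."""
    out = list(tokens)
    budget = int(round(len(tokens) * mask_ratio))  # number of original tokens still to hide
    attempts = 0
    while budget > 0 and attempts < max_attempts:
        attempts += 1
        span = min(sample_poisson(rng, lam), budget)
        start = rng.randrange(len(out))
        window = out[start:start + span]
        if mask in window or out[start] == mask:
            continue  # keep spans disjoint and avoid stacking masks
        out[start:start + span] = [mask]  # a zero-length span simply inserts a mask
        budget -= len(window)
    return out

tokens = "the quick brown fox jumps over the lazy dog".split()
print(text_infilling(tokens, random.Random(0)))
```

The seq2seq model is then trained to reconstruct the original token sequence from this corrupted input.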

A Survey of Long-Term Context in Transformers

Madison May recently wrote an interesting survey of ways to improve how Transformer-based approaches handle long-term context, covering Sparse Transformers, Adaptive Span Transformers, Transformer-XL, Compressive Transformers, Reformer, and the Routing Transformer. We also touched on some of these topics in the dair.ai publication and in this list of paper summaries.
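As a concrete example of one of these ideas, the Sparse Transformer restricts each position to a subset of earlier positions instead of all of them. Below is a small sketch of a causal "strided" pattern in that spirit. Position i attends to the previous `stride` positions plus every `stride`-th position before that. The exact pattern and the stride value are illustrative assumptions, not a reproduction of any library's implementation.

```python
from typing import List

def strided_sparse_mask(n: int, stride: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [j <= i and ((i - j) < stride or (i - j) % stride == 0) for j in range(n)]
        for i in range(n)
    ]

def density(mask: List[List[bool]]) -> float:
    """Fraction of all n * n (query, key) pairs that are attended."""
    n = len(mask)
    return sum(map(sum, mask)) / (n * n)

mask = strided_sparse_mask(n=16, stride=4)
print(sum(mask[15]))  # 7 keys instead of 16 under full causal attention
print(density(mask))
```

With a stride near the square root of the sequence length, each row attends to roughly that many positions, which is where the sub-quadratic cost of this family of models comes from.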

“Mind your language, GPT-2”: how to control style and content in automatic text writing

Despite the impressive fluency automatic text writing has exhibited in the past year, it is still challenging to control attributes like structure or content of the machine-written text. In a recent blog post, Manuel Tonneau discusses the recent progress and the perspectives in the field of controllable text generation, from Hugging Face’s GPT-2 model fine-tuned on arXiv to Google’s T5, with mentions of Salesforce’s CTRL and Uber AI’s PPLM.

Education 🎓

Talk: The Future of NLP in Python

In one of our previous newsletters, we featured Thinc, a functional deep learning library focused on compatibility with other existing libraries. This set of slides, from the talk Ines Montani gave at PyCon Colombia, introduces the library in a bit more detail.

Transformers Notebooks

Hugging Face published a set of Colab notebooks that help you get started with their popular Transformers library. The notebooks cover topics such as tokenization, setting up NLP pipelines, and training a language model on custom data.

TensorFlow 2.0 in 7 hours

Check out this ~7-hour free course on TensorFlow 2.0 containing topics that range from basic neural networks to NLP with RNNs to an introduction to reinforcement learning.

DeepMind: The Podcast

DeepMind has released all episodes (in the form of a YouTube playlist) for their podcast which features scientists, researchers, and engineers discussing topics that range from AGI to neuroscience to robotics.

Machine Learning and Deep Learning Courses

Berkeley is publicly releasing the entire syllabus for its course on “Deep Unsupervised Learning”, which mainly focuses on the theoretical aspects of self-supervised learning and generative models. Topics include latent variable models, autoregressive models, flow models, and self-supervised learning, to name a few. YouTube videos and slides are available.

We also found this impressive list of advanced online courses on machine learning, NLP and deep learning.

And here is another course called “Introduction to Machine Learning” which includes topics such as supervised regression, performance evaluation, random forests, parameter tuning, practical advice, and much more.

Noteworthy Mentions ⭐️

The previous NLP Newsletter (Issue #6) is available here.

Connor Shorten published a video explaining the ELECTRA model, which proposes a technique called replaced token detection to pre-train Transformers more efficiently. If you are interested, we also wrote a short summary of the model here.
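In replaced token detection, a small generator fills in some masked positions, and the discriminator is trained to label every token as original or replaced. The data-preparation step can be sketched as follows. Here `generator_sample` is a hypothetical stand-in for the small masked language model, and the 15% masking rate and the `[MASK]` string are assumptions. As in the paper, a position where the generator happens to produce the original token is labeled as original.

```python
import random
from typing import Callable, List, Sequence, Tuple

def make_rtd_example(
    tokens: Sequence[str],
    rng: random.Random,
    generator_sample: Callable[[List[str], int], str],
    mask_prob: float = 0.15,
    mask: str = "[MASK]",
) -> Tuple[List[str], List[int]]:
    """Return (corrupted tokens, labels), where label 1 means 'replaced'."""
    positions = [i for i in range(len(tokens)) if rng.random() < mask_prob]
    masked = list(tokens)
    for i in positions:
        masked[i] = mask  # the generator only sees the masked input
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = generator_sample(masked, i)
    # Compare against the original, so a lucky "correct" sample counts as original.
    labels = [int(c != o) for c, o in zip(corrupted, tokens)]
    return corrupted, labels

tokens = ["the", "chef", "cooked", "the", "meal"]
corrupted, labels = make_rtd_example(
    tokens, random.Random(0), generator_sample=lambda masked, i: "ate", mask_prob=0.5
)
print(corrupted, labels)
```

Because the discriminator receives a label for every token rather than only for the masked ones, each example provides more training signal than standard masked language modeling.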

Rachael Tatman is working on a new series called NLP for Developers, which aims to go more in depth on different NLP methods, when to use them, and common issues you may run into.

DeepMind releases AlphaGo — The Movie on YouTube to celebrate the 4th anniversary of AlphaGo beating Lee Sedol at the game of Go.

OpenMined has open positions for Research Engineer and Research Scientist roles, which are a good opportunity to get involved with privacy-preserving AI.

If you have any datasets, projects, blog posts, tutorials, or papers that you wish to share in the next iteration of the NLP Newsletter, please feel free to reach out to me at ellfae@gmail.com or DM me on Twitter.

Subscribe 🔖 to the NLP Newsletter to receive future issues in your inbox.