<p><img src="/images/prompt-engineering-guide.png" alt="" /></p>
<p><br />
This guide is a non-exhaustive collection of learning materials and tools for prompt engineering, including guides, examples, papers, and much more. The repo is intended as a research and educational reference for practitioners and developers.</p>
<p><br />
Find the guide here: <a href="https://github.com/dair-ai/Prompt-Engineering-Guide">Prompt Engineering Guide</a></p>
<p><a href="https://dair.ai/projects/prompt-engineering/">Prompt Engineering Guide</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on January 07, 2023.</p>
<p><img src="/images/ml-youtube-courses.png" alt="" /></p>
<p><br />
At DAIR.AI we ❤️ open AI education. We have created a new repo to index and organize some of the best and most recent machine learning courses available on YouTube.</p>
<p><br />
Find the collection here: <a href="https://github.com/dair-ai/ML-YouTube-Courses">ML YouTube Courses</a></p>
<p><a href="https://dair.ai/projects/ml-youtube-courses/">ML YouTube Courses</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on January 06, 2023.</p>
<p><img src="/images/ml-course-notes.png" alt="" /></p>
<p><br />
A place to collaborate and share lecture notes on all topics related to machine learning, NLP, and AI.</p>
<p><br />
Find all the notes here: <a href="https://github.com/dair-ai/ML-Course-Notes">Machine Learning Course Notes</a></p>
<p><a href="https://dair.ai/projects/ml-course-notes/">Machine Learning Course Notes</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on January 05, 2023.</p>
<p><img src="/images/discord.png" alt="" /></p>
<p><br />
🎉 Happy new year to all!</p>
<p><br />
We have so many exciting new announcements and events for our community. To start the new year, we are happy to announce our new <a href="https://discord.gg/SKgkVT8BGJ"><strong>Discord channel</strong></a>.</p>
<p><br />
The idea is to create an inclusive and vibrant community of learners, researchers, and developers in the AI space. It’s dedicated to learning, asking questions, discussing, and sharing all the exciting trends and developments in machine learning and AI.</p>
<p><br />
See you there!</p>
<p><br />
<a href="https://twitter.com/omarsar0">Elvis</a></p>
<p><a href="https://dair.ai/posts/new-discord-dair/">New Discord Group for DAIR.AI</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on January 28, 2021.</p>
<p><img src="https://cdn-images-1.medium.com/max/800/0*SfmXR6C5pvmRH2_B.png" alt="" /></p>
<p><br />
Hello everyone,</p>
<p><br />
Welcome to the 14th issue of the NLP Newsletter. First of all, thank you for taking the time to read it. A few things are changing in the newsletter moving forward, and for the better. We will focus on a few important machine learning and NLP themes centered on three pillars I believe matter for our community: <strong><em>education</em></strong>, <strong><em>research</em></strong>, and <strong><em>technologies</em></strong>. In fact, these are the same pillars that we at dair.ai focus on and build our initiatives and projects around. I hope you like the new format, as it gives us the flexibility to discuss some important topics in more depth than we usually do.</p>
<p><br />
I can’t emphasize enough how important it is to keep learning in a fast-paced field like machine learning. Whether you sit at the cutting edge of research or deploy large-scale ML models into production every day, there is always room to learn something new each week. The question is how to keep yourself motivated to learn. I have identified this as an opportunity for us to connect and share what matters to us. Therefore, I have started a new <em>learning group</em> called <a href="https://github.com/dair-ai/keep-learning-ml">Keep Learning ML</a>. Every Friday we will meet and have fun sharing something new we learned over the week. It could be an NLP or ML paper, a tool, a demo, a philosophical view, a pressing issue, etc.</p>
<p><br />
In this issue, we cover topics that range from the importance of taking NLP beyond English to resources for monitoring ML systems to a conversation on the future of conversational AI systems.</p>
<p><br />
Special thanks to <a href="https://twitter.com/_skeshaw">Keshaw Singh</a> and <a href="https://twitter.com/manisnesan">Manikandan Sivanesan</a> for significantly contributing towards this edition of the NLP Newsletter.</p>
<p><br />
<em>Enjoy reading,</em></p>
<p><br />
<em>Elvis</em></p>
<h1 id="top-stories">Top Stories</h1>
<h2 id="showcasing-gpt-3-powered-applications">Showcasing GPT-3-powered applications</h2>
<p>This year OpenAI has been working on granting access to their machine learning (ML) models via an <a href="https://openai.com/blog/openai-api/">API</a>. The process of getting access to the API requires a <a href="https://forms.office.com/Pages/ResponsePage.aspx?id=VsqMpNrmTkioFJyEllK8s0v5E5gdyQhOuZCXNuMR8i1UQjFWVTVUVEpGNkg3U1FNRDVVRFg3U0w4Vi4u">formal application</a> stating your intended use. A few developers and researchers have received access to the API and have been showcasing how language models like GPT-3 can power all sorts of creative applications that involve some level of automation.</p>
<p><br />
A developer built a site generator and another built a <a href="https://losslesshq.com/">regex generator</a>. If you are interested in seeing more use cases of GPT-3, Yaser Martinez Palenzuela has put together a collection of <a href="https://github.com/elyase/awesome-gpt3">demos and applications</a> powered by GPT-3. It is becoming clear that models like GPT-3 have huge potential for real-world applications; however, it remains unclear whether these applications are safe and how to deal with the harmful prejudices or biases that are common to these types of models, which are typically pretrained on large-scale open internet data.</p>
<p><br />
<strong><em>Learn more 🎓</em></strong></p>
<p><br />
If you would like to find out more about how GPT-3 works, Jay Alammar has prepared a <a href="https://jalammar.github.io/how-gpt3-works-visualizations-animations/">set of animations</a> that explain important steps in the Transformer architecture with examples and key transformations that take place in the GPT-3 language model.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*mSEp2sRTqWJOcbj7.png" alt="Source: Jay Alammar" /></p>
<p><em>Source: <a href="https://jalammar.github.io/how-gpt3-works-visualizations-animations/">Jay Alammar</a></em></p>
<p><br />
<strong><em>Stay informed 🎯</em></strong></p>
<p><br />
With so much buzz around emerging technologies, GPT-3 being a case in point, this article from Page Street Labs proposes an intuitive framework for making sense of the “hype” around them, including a better understanding of the term “hype” itself. More precisely, users can be placed into one of four quadrants of a 2x2 visualization (shown below) based on their direct experience with a technology and the polarity of the hype (positive/negative). Most of the useful signals for assessing capabilities and promise come from those with a good knowledge of the underlying principles or hands-on experience with the technology: those who create it or “see the light” generate much of the positive buzz, while those on the other side warn about false alarms. Readers are encouraged to read the original <a href="https://pagestlabs.substack.com/p/gpt-3-and-a-typology-of-hype">article in full</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*kFJsX7bzrLOf8B1P.png" alt="Source: Page Street Labs" /></p>
<p><em>Source: <a href="https://pagestlabs.substack.com/p/gpt-3-and-a-typology-of-hype">Page Street Labs</a></em></p>
<h2 id="datasets-to-explore-scholarly-articles">Datasets to explore scholarly articles</h2>
<p>Earlier this year the Allen Institute for AI released a large dataset of <a href="https://allenai.org/data/cord-19">COVID-19 related scholarly articles</a>. While most of these articles contain studies on COVID-19 and other coronaviruses, many data scientists and researchers began using the dataset to perform interesting analyses and build interactive applications with semantic search capabilities. The idea was to extract new insights from text about the virus that researchers and experts in the field could use to discover interesting facts, in some sense accelerating the rate of discovery.</p>
<p><br />
Following in these footsteps, arXiv.org has <a href="https://blogs.cornell.edu/arxiv/2020/08/05/leveraging-machine-learning-to-fuel-new-discoveries-with-the-arxiv-dataset/">released</a> a large-scale dataset containing 1.7 million articles, with the intention of providing scholarly articles — from different domains such as biology and computer science — in a more accessible and machine-readable format. The call to action is to empower researchers and machine learning practitioners to build tools that accelerate new discoveries through ML-powered applications such as trend analysis, paper recommendation engines, knowledge graph construction, and even semantic search interfaces.</p>
<p><br />
Following the release of this dataset, Elsevier also recently <a href="https://data.mendeley.com/datasets/zm33cdndxs/2">released</a> a 40K CC-BY full-text corpus containing scientific articles that could be used for NLP and ML research.</p>
<p><br />
<strong><em>Call to Action 💡</em></strong></p>
<p><br />
We at dair.ai have initiated a project that aims to use the arXiv dataset to explore new ways of extracting insights from scholarly articles to fuel new discoveries. In addition to this research effort, we are putting together a team to build NLP-powered applications with semantic search capabilities, providing the community with an open-source solution to easily find trends and other interesting insights and keep informed about different areas of research such as machine learning. Check this <a href="https://github.com/dair-ai/arxiv_analysis">announcement</a> for more details and join our <a href="https://join.slack.com/t/dairai/shared_invite/zt-dv2dwzj7-F9HT047jIGkunNKv88lQ~g">Slack group</a> for more information.</p>
<h2 id="why-is-it-important-to-monitor-machine-learning-models">Why is it Important to Monitor Machine Learning Models?</h2>
<p>Imagine a scenario where we have built and deployed a machine learning (ML) system for our users. The question now is how we can ensure that the system consistently performs as expected over time. Depending on the risk tolerance requirements for the system, the impact of failure ranges from a minor inconvenience to life-threatening situations. Traditional Application Performance Management (APM) practices, such as monitoring user experience metrics (e.g., latency) and system resource metrics (e.g., CPU and memory usage), are also applicable to ML monitoring. These are essential for identifying known failure modes and setting up an alerting infrastructure to perform corrective actions. In addition, for ML systems we also need to monitor these additional factors:</p>
<ul>
<li>Changes in input data quality such as missing columns or unexpected inputs during inference time</li>
<li>Changes in the relationship between input and output over a period of time. This causes a gradual degradation of the model as the underlying assumptions shift, commonly referred to as concept drift.</li>
<li>The robustness of predictions due to any changes involving data, features, hyperparameters, model settings, etc. This is referred to as the CACE principle: <em>Changing Anything Changes Everything</em>.</li>
</ul>
<p>If you are interested in learning more about why monitoring ML systems matters, along with a few real-world examples, take a look at the <a href="https://mlinproduction.com/why-is-it-important-to-monitor-machine-learning-models/">full article</a>.</p>
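<p>As a concrete illustration of the first factor above, here is a minimal sketch of input drift detection using a two-sample Kolmogorov–Smirnov test. This is not from the linked article; the function name, threshold, and simulated shift are illustrative assumptions:</p>
<pre><code class="language-python">import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_col, live_col, alpha=0.01):
    """Flag drift when the live feature distribution differs
    significantly from the training distribution (two-sample KS test)."""
    stat, p_value = ks_2samp(train_col, live_col)
    return p_value &lt; alpha, stat

# Hypothetical usage: a feature whose mean shifted in production.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.4, scale=1.0, size=5_000)  # simulated shift
drifted, stat = detect_feature_drift(train, live)
print(f"drift detected: {drifted} (KS statistic = {stat:.3f})")
</code></pre>
<p>In practice, a check like this would run on a schedule for each feature and feed into the same alerting infrastructure used for latency and resource metrics.</p>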
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*TaJ_SWmAmU_8goRO.png" alt="Monitoring in ML Lifecycle — Figure source" /></p>
<p><em>Monitoring in ML Lifecycle — <a href="https://martinfowler.com/articles/cd4ml.html">Figure source</a></em></p>
<p><br />
<strong><em>Learn more 🎓</em></strong></p>
<p><br />
If you are interested in continuing to learn about monitoring machine learning systems and MLOps in general, we share a few references below:</p>
<ul>
<li><a href="https://github.com/eugeneyan/applied-ml">Applied ML</a> is a repository with a list of curated papers, articles, and blogs on data science & machine learning in production</li>
<li>This <a href="https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/">article</a> provides a comprehensive guide covering the complexity and importance of monitoring, along with practical guidance on monitoring ML systems.</li>
<li>Continuous integration is a software development practice of automatically testing each change made to the code base. This helps detect the granular change that causes tests to fail and enables the team to fix integration problems early in the development cycle. In this <a href="https://www.youtube.com/watch?v=9BgIDqAzfuA&feature=youtu.be">video</a>, learn more about how this practice can be applied to machine learning projects with tools such as <a href="https://github.com/iterative/cml">Continuous Machine Learning</a> (CML).</li>
</ul>
<h2 id="big-bird-for-longer-sequences">Big Bird for longer sequences</h2>
<p><img src="https://cdn-images-1.medium.com/max/800/1*LhjnhIRT8TGzFRAa63THyA.png" alt="The figure shows how Big Bird is able to hold the properties of three different attention mechanisms — Zaheer et al. (2020)." /></p>
<p><em>The figure shows how Big Bird is able to hold the properties of three different attention mechanisms — <a href="https://arxiv.org/abs/2007.14062">Zaheer et al. (2020)</a>.</em></p>
<p><br />
It is well known that Transformer-based language models like BERT, which rely on the self-attention mechanism, have quadratic complexity in the number of tokens. <a href="https://arxiv.org/abs/2007.14062">Big Bird</a> is a Transformer-based model that aims to more effectively support NLP tasks requiring longer contexts by reducing the complexity of the attention mechanism to linear in the number of tokens.</p>
<p><br />
Why is this important? Processing and extracting valuable information from longer sequences is useful when dealing with long text such as books or scholarly articles. In such cases, we want to minimize the memory footprint, which is why reducing the complexity of the attention mechanism in the language modeling architecture matters. Reducing the complexity is important, but so is keeping the original properties of the model. Big Bird achieves this by viewing self-attention as a fully connected graph and taking advantage of graph properties, specifically how fast information flows between pairs of nodes. The authors claim that with the proposed sparse attention, their model can handle <em>“sequences with length of up to 8x of what was previously possible using similar hardware.”</em></p>
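<p>For intuition, here is a minimal sketch of the kind of sparse attention mask Big Bird combines: sliding-window, random, and global connections. The block sizes and counts below are illustrative assumptions, not the paper's exact configuration:</p>
<pre><code class="language-python">import numpy as np

def bigbird_style_mask(n_tokens, window=3, n_random=2, n_global=2, seed=0):
    """Combine the three attention patterns used in Big Bird:
    a sliding window, a few random links, and global tokens."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    for i in range(n_tokens):
        # sliding-window attention over nearby tokens
        lo, hi = max(0, i - window), min(n_tokens, i + window + 1)
        mask[i, lo:hi] = True
        # a few random links per token keep information flowing fast
        mask[i, rng.choice(n_tokens, size=n_random, replace=False)] = True
    # global tokens attend everywhere and are attended to by all tokens
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

mask = bigbird_style_mask(16)
print(f"attention density: {mask.mean():.2f} (full attention would be 1.00)")
</code></pre>
<p>Because each token attends to a roughly constant number of positions, the cost grows linearly with sequence length rather than quadratically.</p>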
<p><br />
<strong><em>Learn more 🎓</em></strong></p>
<p><br />
If you want to build better intuitions about the design choices behind Big Bird, Yannic Kilcher provides an accessible explanation in this <a href="https://www.youtube.com/watch?v=WVPE62Gk3EM&t=678s">video</a>.</p>
<h2 id="breaking-into-deep-learning-and-nlp">Breaking into deep learning and NLP</h2>
<p>One of the biggest changes many of us had to get used to this year, due to the pandemic, is learning remotely. This has also opened up learning opportunities not only for local communities but also for learners all around the world. In this portion of the newsletter, we share a few resources for those looking to break into deep learning or NLP.</p>
<p><br />
<strong><em>Dive into Deep Learning Study group</em></strong></p>
<p><br />
This past weekend, dair.ai hosted the first session of the new deep learning study group. The session lasted more than one hour and focused on a broad overview of deep learning. More than 150 learners joined the live session from all over the world (find the recording <a href="https://www.youtube.com/watch?v=xS3_b0BsSes">here</a>). The second session will cover a few preliminaries such as probability and statistics, linear algebra, and other concepts important for studying and applying deep learning. If you are interested in joining upcoming sessions, find out more about this study group <a href="https://github.com/dair-ai/d2l-study-group">here</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*nwPo0Xyi9GuEMFK12zZkVg.png" alt="The content structure of the deep learning study program" /></p>
<p><em>The content structure of the deep learning study program</em></p>
<p><br />
<strong><em>Breaking into NLP by deeplearning.ai</em></strong></p>
<p><br />
Recently, deeplearning.ai released a new <a href="https://www.deeplearning.ai/natural-language-processing-specialization/">specialization focused on NLP</a>. In a recent panel discussion, Andrew Ng was joined by experts in the field to discuss interesting topics around “breaking into NLP”. The discussion emphasized trends in NLP and offered advice for students of ML and NLP. Elvis wrote a detailed <a href="https://twitter.com/omarsar0/status/1288776352460673024?s=20">Twitter thread</a> about his takeaways from the session, which range from advice for students to interesting research areas and trends in NLP.</p>
<p><br />
<strong><em>LxMLS Lisbon Machine Learning School</em></strong></p>
<p><br />
The 2020 LxMLS at <a href="http://tecnico.ulisboa.pt/en/">Instituto Superior Técnico</a> (IST) took place remotely, and all lectures were delivered and publicly streamed online. This is considered one of the best programs in Europe to learn about NLP. Several well-known NLP researchers have either attended the program in the past or taught in it. The recent <a href="http://lxmls.it.pt/2020/?page_id=19">10th edition of LxMLS</a> included lectures on NLP topics ranging from preliminaries to modeling sequential data to applying reinforcement learning in the context of NLP. You can find all the lecture <a href="https://www.youtube.com/channel/UCkVFZWgT1jR75UvSLGP9_mw/videos">recordings</a> on YouTube.</p>
<p><br />
<strong><em>Deep Learning for Computer Vision</em></strong></p>
<p><br />
Justin Johnson recently announced the publication of all <a href="https://www.youtube.com/playlist?list=PL5-TkQAfAZFbzxjBHtzdVCWE0Zbhomg7r">video lectures</a> for his new course on Deep Learning for Computer Vision. According to Justin, this course is an evolution of the <a href="http://cs231n.stanford.edu/2019/">CS231n</a> course that he and others delivered at Stanford. All the content has been refreshed, and lectures now include new topics like 3D vision and Transformers applied in the context of computer vision.</p>
<p><br />
<strong><em>Stay informed 🎯</em></strong></p>
<p><br />
<a href="https://nlpwithfriends.com/">NLP with Friends</a> is an effort to bring students together to discuss interesting research topics related to NLP. Talks are being hosted weekly on Zoom so you can join the sessions remotely.</p>
<p><br />
<strong><em>Learn more 🎓</em></strong></p>
<p><br />
<em>Here are some NLP-related competitions and workshops we found worth getting involved in:</em></p>
<ul>
<li><a href="https://www.kaggle.com/c/contradictory-my-dear-watson"><strong><em>Contradictory, My Dear Watson: Detecting contradiction and entailment in multilingual text using TPUs.</em></strong></a> This is a playground type competition based on Natural Language Inferencing (NLI) to determine whether pairs of sentences are related. Participants are challenged to create an NLI model from a dataset including text from 15 different languages.</li>
<li><a href="https://hasocfire.github.io/hasoc/2020/call_for_participation.html"><strong><em>Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC)</em></strong></a> <strong>**</strong>provides a forum and data challenge for promoting multilingual research on detecting problematic content. This year the dataset contains 10K annotated tweets from English, German, and Hindi. The focus of the first subtask is to detect hate, offensive, or profane content in the text. The second subtask is more granular to discriminate and classify the respective type. There is a separate sub-track for Dravidian CodeMix (this was shared in our previous newsletter). The deadline for registration is 30 August 2020.</li>
</ul>
<h2 id="why-you-should-do-nlp-beyond-english">Why you should do NLP Beyond English</h2>
<p>In this recent <a href="https://ruder.io/nlp-beyond-english">article</a>, Sebastian Ruder makes an argument for why NLP researchers should focus on languages other than English. To begin with, the blog highlights the huge disparity in online data availability between a handful of high-resource languages (including English) and thousands of other languages. The main discussion is centered around the factors that should encourage more research initiatives in other languages as seen from societal, linguistic, machine learning, cultural and normative, and cognitive perspectives.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*61agh1MtYJRPZGDG.png" alt="Language resource distribution of Joshi et al. (2020). Groups 5 and 4 are languages that are well studied while other groups are largely neglected." /></p>
<p><em>Language resource distribution of <a href="https://arxiv.org/abs/2004.09095">Joshi et al. (2020)</a>. Groups 5 and 4 are languages that are well studied while other groups are largely neglected.</em></p>
<p><br />
One of the talking points is how English-specific models can limit access to open knowledge due to language barriers, cause bias and discrimination against non-English speakers, and, in extreme cases, even endanger a language itself. From a linguistic perspective, many resource-rich languages are morphologically poor, meaning we miss out on generalization capacity by ignoring other languages whose phenomena could help models learn this information. From an ML viewpoint, given that most languages have limited data available, throwing a lot of data at the problem and hoping for the best cannot be the solution. Approaches that are language-aware and can work with little data are more likely to make a real impact. The blog ends with a call to action for future research, including the creation of datasets in multiple languages, evaluating approaches across several languages, and following the <a href="https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/">Bender Rule</a>.</p>
<h2 id="covost-v2-expanding-the-largest-most-diverse-multilingual-speech-to-text-translation-data-set">CoVoST V2: Expanding the largest, most diverse multilingual speech-to-text translation data set</h2>
<p>Multilingual datasets make it possible to better test the robustness of machine learning models for language modeling and speech recognition. One area of machine learning that could benefit from this type of data is multilingual speech translation. These models can help remove barriers that are common in online communication tools, where users come from different cultures. They can also enable richer conversations among people who are not proficient in certain languages and amplify their voices online. There is a range of other possible uses for the dataset and applications that depend on it, such as building smart composers or more accessible search tools.</p>
<p><br />
Facebook AI <a href="https://ai.facebook.com/blog/covost-v2-expanding-the-largest-most-diverse-multilingual-speech-to-text-translation-data-set/">released</a> CoVoST V2, which they consider the largest multilingual speech-to-text dataset available to date. The new version adds new languages to the first version, for a total of 2,900 hours of speech. Before releasing it, the researchers checked data quality and ensured the data varies across dimensions such as age, gender, and accent. The hope is that this new dataset fosters research on multilingual speech translation and enables a single model to support many language pairs, especially pairs with less data.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*QMQ1t222en_RAY_i.png" alt="Source: Facebook AI Blog" /></p>
<p><em>Source: <a href="https://ai.facebook.com/blog/covost-v2-expanding-the-largest-most-diverse-multilingual-speech-to-text-translation-data-set/">Facebook AI Blog</a></em></p>
<h2 id="the-future-of-conversational-ai-systems">The future of conversational AI systems</h2>
<p>In a recent <a href="https://www.youtube.com/watch?v=41-FNujbKac&list=PL75e0qA87dlGP51yZ0dyNup-vwu0Rlv86&index=24">panel discussion</a> about how much conversational AI developers should know about ML and linguistics, Vladimir Vlasov, Emily Bender, Thomas Wolf, and Anna Rogers shared their views and concerns. The overall consensus is that current language models are capable of remarkable results on some tasks, but the experts believe we are not testing the models for the right things if we really want to understand how well they do at language modeling. Just because a pretrained language model can already be used to build creative applications does not imply that we have solved language modeling.</p>
<p><br />
Other interesting topics of discussion were ways to improve the evaluation of language models and to better understand what the models are really learning. At the moment, it is difficult to tell what aspects of language these models actually capture. This becomes even more difficult as more of these pretrained models, relying on different architectures and built on different objectives, keep rapidly emerging. The call to action is to focus more on standardizing evaluation methods and to deal with the question of what these models are really capturing and how to avoid their harmful use from a practical standpoint, such as when building conversational AI systems.</p>
<p><br />
If you want more takeaways, feel free to check out this Twitter <a href="https://twitter.com/omarsar0/status/1291725568640245760?s=20">thread</a> or check the panel discussion directly.</p>
<h1 id="noteworthy-mentions-️">Noteworthy Mentions ⭐️</h1>
<p><strong><em>Understanding and Implementing SimCLR in PyTorch — An ELI5 Guide</em></strong></p>
<p><br />
Previously, we covered the SimCLR pre-training framework used for learning rich visual representations, which achieves performance improvements over self-supervised and semi-supervised learning methods on ImageNet. SimCLR is based on contrastive learning, which attempts to train a model to <a href="https://amitness.com/2020/03/illustrated-simclr/"><em>distinguish between similar and dissimilar things</em></a>. Marcin recently wrote an <a href="https://zablo.net/blog/post/understanding-implementing-simclr-guide-eli5-pytorch/">extensive article</a> providing a code walkthrough and explanation of the framework using PyTorch.</p>
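<p>As a rough sketch of the contrastive objective at the heart of SimCLR, here is the NT-Xent loss in PyTorch. The batch size, embedding dimension, and random inputs are illustrative; see the linked article for the full training loop:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent: for each embedding, its positive is the other augmented
    view of the same image; every other embedding is a negative."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                # cosine similarities
    sim.fill_diagonal_(float("-inf"))            # exclude self-similarity
    n = z1.size(0)
    # row i's positive sits at index i + n (and vice versa)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Toy usage with random "projection head" outputs for a batch of 8 images.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
</code></pre>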
<p><br />
<strong><em>The Lottery Ticket Hypothesis</em></strong></p>
<p><br />
Is it possible to find subnetworks that perform similarly to the original neural network on specific tasks? Using pruning techniques, current research argues that it is indeed possible, and that the way the models are initialized has a lot to do with achieving this effect. Learn more about this area of research in this <a href="https://medium.com/dair-ai/the-lottery-ticket-hypothesis-7cd4eae3faaa">short summary</a>.</p>
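<p>For illustration, here is a minimal sketch of one round of the magnitude-prune-and-rewind procedure used in lottery-ticket experiments. The "trained" weights below are simulated stand-ins for an actual training run:</p>
<pre><code class="language-python">import torch

def magnitude_prune(weight, mask, fraction=0.2):
    """Zero out the smallest-magnitude weights that are still alive
    (one pruning round of the iterative lottery-ticket procedure)."""
    alive = weight[mask.bool()].abs()
    k = int(fraction * alive.numel())
    if k == 0:
        return mask
    threshold = alive.kthvalue(k).values
    return mask * (weight.abs() > threshold).float()

torch.manual_seed(0)
init_weight = torch.randn(300, 784)  # saved dense initialization
trained_weight = init_weight + 0.1 * torch.randn(300, 784)  # "after training"
mask = torch.ones_like(init_weight)

mask = magnitude_prune(trained_weight, mask, fraction=0.2)
# Rewind: the candidate "winning ticket" is the original initialization
# restricted to the surviving connections; the original paper iterates
# train -> prune -> rewind several times.
ticket = init_weight * mask
print(f"sparsity after one round: {1 - mask.mean().item():.2f}")
</code></pre>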
<p><br />
<strong><em>NodeNet: A Graph Regularised Neural Network for Node Classification</em></strong></p>
<p><br />
Graph-based learning algorithms utilize the data and related information effectively to build superior models. Neural Graph Learning (NGL) is one such technique that utilizes a traditional machine learning algorithm with a modified loss function to leverage the edges in the graph structure. This <a href="https://arxiv.org/abs/2006.09022">work</a> proposes a model using NGL — NodeNet, to solve the node classification task for citation graphs. The authors claim that NodeNet achieves state of the art results on papers with code for Pubmed and Citeseer datasets.</p>
<p><br />
<strong>When are Contextual Embeddings Worth Using?</strong></p>
<p><br />
In this <a href="https://medium.com/dair-ai/when-are-contextual-embeddings-worth-using-b509008cc325">blog post,</a> Viktor Karlsson summarizes a paper that discusses situations when it might actually make sense to use contextual embeddings (e.g., BERT) and when it may not be worth it.</p>
<p><br />
<strong><em>Simple and Efficient Deep Learning for Natural Language Processing, with Moshe Wasserblat, Intel AI</em></strong></p>
<p><br />
In this <a href="https://youtu.be/Bgr684dPJ6U">talk</a>, Moshe Wasserblat, Intel AI, presents simple and efficient deep learning methods for NLP. The speaker provides background on popular optimization vectors for NLP and great tips on distilling BERT for vastly faster inference with a sustainable accuracy penalty. Results on the <a href="https://github.com/dair-ai/emotion_dataset">dair.ai emotion dataset</a> and other popular benchmarks were also discussed.</p>
<p><br />
<strong><em>DeText: A deep NLP framework for intelligent text understanding</em></strong></p>
<p><br />
<a href="https://engineering.linkedin.com/blog/2020/open-sourcing-detext">DeText</a> is a framework for leveraging deep learning-based NLP technologies like BERT for text understanding. DeText offers features that make it feasible to use large models that require heavy computation costs out of the box such as BERT. With this framework, it is possible to implement neural ranking for search and recommender systems.</p>
<p><br />
<a href="https://dair.ai/newsletter/"><em>Subscribe</em></a> <em>🔖 to the NLP Newsletter to receive future issues in your inbox.</em></p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_14-en/">NLP Newsletter 14 [EN]: NLP Beyond English, Big Bird, Monitoring ML Models, Breaking into NLP, arXiv Dataset,…</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on August 11, 2020.</p>
<p><img src="https://cdn-images-1.medium.com/max/800/0*SfmXR6C5pvmRH2_B.png" alt="" /></p>
<h1 id="avant-propos-delvis">Avant-propos d’Elvis</h1>
<p>Bonjour à tous,</p>
<p><br />
Bienvenue au 14ème numéro de la Newsletter consacrée au NLP. Tout d’abord, merci de prendre le temps de lire la newsletter. Certaines choses changent dans celle-ci et c’est pour le mieux. Nous allons nous concentrer sur quelques thèmes importants de l’apprentissage machine et du NLP, centrés sur trois piliers que je crois importants pour notre communauté : <strong><em>éducation</em></strong>, <strong><em>recherche</em></strong>, et <strong><em>technologies</em></strong>. En fait, ce sont les mêmes piliers sur lesquels nous, à dair.ai, nous concentrons et construisons nos initiatives et nos projets. J’espère que vous aimez le nouveau format car il nous permet de discuter de certains sujets importants plus en profondeur que d’habitude.</p>
<p><br />
Je ne peux insister davantage sur l’importance de continuer à apprendre dans un domaine aussi rapide que l’apprentissage machine. Que vous soyez à la pointe de la recherche ou que vous mettiez en production des modèles de ML à grande échelle chaque jour, il est toujours possible d’apprendre quelque chose de nouveau chaque semaine. La question est de savoir comment vous gardez votre motivation pour apprendre. J’ai identifié cela comme une opportunité pour nous de nous connecter et de partager ce qui compte pour nous. C’est pourquoi j’ai créé un nouveau <em>groupe d’apprentissage</em> appelé <a href="https://github.com/dair-ai/keep-learning-ml">Keep Learning ML</a>. Chaque vendredi, nous nous réunirons et nous nous amuserons en partageant ce que nous avons appris au cours de la semaine. Il peut s’agir d’un article de NLP ou de ML, d’un outil, d’une démo, d’un point de vue philosophique, d’un problème urgent, etc.</p>
<p><br />
Dans ce numéro, nous abordons des sujets qui vont de l’importance d’appliquer le NLP à d’autres langues que l’anglais aux ressources pour la surveillance des systèmes de ML, en passant par une conversation sur l’avenir des systèmes d’IA conversationnelle.</p>
<p><br />
Nous remercions tout particulièrement <a href="https://twitter.com/_skeshaw">Keshaw Singh</a> et <a href="https://twitter.com/manisnesan">Manikandan Sivanesan</a> pour leur contribution significative à cette édition de la lettre d’information de la NLP.</p>
<p><br />
<em>Bonne lecture</em></p>
<h1 id="top-stories">Top Stories</h1>
<h2 id="présentation-des-applications-basée-sur-le-gpt-3">Présentation des applications basée sur le GPT-3</h2>
<p>Cette année, OpenAI a travaillé sur l’accès de ses modèles d’apprentissage machine (ML) via une <a href="https://openai.com/blog/openai-api/">API</a>. Le processus d’accès à l’API nécessite une <a href="https://forms.office.com/Pages/ResponsePage.aspx?id=VsqMpNrmTkioFJyEllK8s0v5E5gdyQhOuZCXNuMR8i1UQjFWVTVUVEpGNkg3U1FNRDVVRFg3U0w4Vi4u">demande formelle</a> et l’indication de votre objectif d’utilisation. Quelques développeurs et chercheurs ont reçu l’accès à l’API et ont présenté différentes applications du GPT-3.</p>
<p><br />
Un développeur a construit un générateur de site et un autre un <a href="https://losslesshq.com/">générateur de regex</a>. Si vous souhaitez voir d’autres cas d’utilisation de GPT-3, Yaser Martinez Palenzuela a rassemblé une collection de <a href="https://github.com/elyase/awesome-gpt3">démos et applications</a> qui sont alimentées par GPT-3. Il devient évident que des modèles comme le GPT-3 ont un énorme potentiel pour être utilisés dans des applications du monde réel, cependant, il n’est pas clair si ces applications sont sûres et comment traiter les préjugés nuisibles ou les biais qui sont communs avec ce type de modèles qui sont généralement pré-entraînés sur des données Internet ouvertes à grande échelle.</p>
<p><br />
<strong><em>En savoir plus 🎓</em></strong></p>
<p><br />
Si vous souhaitez en savoir plus sur le fonctionnement du GPT-3, Jay Alammar a préparé une <a href="https://jalammar.github.io/how-gpt3-works-visualizations-animations/">série d’animations</a> qui explique les étapes importantes de l’architecture du transformer avec des exemples et des transformations clés qui ont lieu dans le modèle de langage du GPT-3.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*mSEp2sRTqWJOcbj7.png" alt="Source : Jay Alammar" />
<em>Source : <a href="https://jalammar.github.io/how-gpt3-works-visualizations-animations/">Jay Alammar</a></em></p>
<p><br />
<strong><em>Rester informé 🎯</em></strong></p>
<p><br />
Avec tout le buzz autour des technologies émergentes, GPT-3 en étant un exempel, cet article publié par Page Street Labs propose un cadre intuitif pour donner un sens au “battage” qui les entoure. Il s’agit notamment de mieux comprendre l’utilisation du terme “hype” lui-même. Plus précisément, les utilisateurs peuvent être placés dans l’un des quatre quadrants d’une visualisation 2x2 (voir figure ci-dessous) en fonction de leur expérience directe avec certaines technologies ainsi que de la polarité du battage (positif/négatif). La plupart des signaux utiles pour évaluer les capacités et les promesses proviennent de ceux qui ont une bonne connaissance des principes sous-jacents ou qui ont une expérience de la technologie : ceux qui la créent ou qui “voient la lumière” sont parmi ceux qui génèrent le buzz positif, tandis que ceux qui se trouvent de l’autre côté mettent en garde contre les fausses alarmes. Les lecteurs sont encouragés à lire l’<a href="https://pagestlabs.substack.com/p/gpt-3-and-a-typology-of-hype">article original</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*kFJsX7bzrLOf8B1P.png" alt="Source : Page Street Labs" />
<em>Source : <a href="https://pagestlabs.substack.com/p/gpt-3-and-a-typology-of-hype">Page Street Labs</a></em></p>
<h2 id="jeux-de-données-pour-explorer-les-articles-scientifiques">Jeux de données pour explorer les articles scientifiques</h2>
<p>Plus tôt cette année, l’Institut Allen IA a publié un vaste ensemble de données d’<a href="https://allenai.org/data/cord-19">articles scientifiques liés à la COVID-19</a>. Alors que la plupart de ces articles contenaient des études sur la COVID-19 et d’autres coronavirus, de nombreux scientifiques et chercheurs ont commencé à l’utiliser pour effectuer des analyses intéressantes et construire des applications interactives qui permettent certaines capacités de recherche sémantique. L’idée était d’extraire de nouvelles informations des textes qui pourraient être utilisées par les chercheurs et les experts dans ce domaine pour découvrir des faits intéressants sur le virus, ce qui, dans un certain sens, accélérerait le rythme des découvertes.</p>
<p><br />
Dans la foulée, arXiv.org a <a href="https://blogs.cornell.edu/arxiv/2020/08/05/leveraging-machine-learning-to-fuel-new-discoveries-with-the-arxiv-dataset/">publié</a> un jeu de données contenant 1,7 million d’articles dans le but de fournir des articles scientifiques - dans différents domaines tels que la biologie et l’informatique - dans un format plus accessible et lisible par les machines. L’appel à l’action vise à donner aux chercheurs et aux praticiens de l’apprentissage machine les moyens de créer des outils pour accélérer les nouvelles découvertes à l’aide d’applications basées sur le ML, telles que l’analyse des tendances, les moteurs de recommandation de papier, la construction de graphiques de connaissances et même les interfaces de recherche sémantique.</p>
<p><br />
Suite à la publication de ce jeu de données, Elsevier a également récemment <a href="https://data.mendeley.com/datasets/zm33cdndxs/2">publié</a> un corpus de 40 000 textes CC-BY contenant des articles scientifiques qui pourraient être utilisés pour la recherche en NLP et en ML.</p>
<p><br />
<strong><em>Appel à l’action 💡</em></strong></p>
<p><br />
Chez dair.ai, nous avons lancé un projet dont l’objectif est d’utiliser l’ensemble de données arXiv pour explorer de nouvelles façons d’extraire des informations d’articles scientifiques afin d’alimenter de nouvelles découvertes. En plus de cet effort de recherche, nous mettons sur pied une équipe chargée de créer des applications basées sur le NLP avec des capacités de recherche sémantique afin de fournir à la communauté une solution open-source pour trouver facilement des tendances et d’autres idées intéressantes pour se tenir informé des différents domaines de recherche tels que l’apprentissage machine. Consultez cette <a href="https://github.com/dair-ai/arxiv_analysis">annonce</a> pour plus de détails et rejoignez notre <a href="https://join.slack.com/t/dairai/shared_invite/zt-dv2dwzj7-F9HT047jIGkunNKv88lQ~g">groupe Slack</a> pour plus d’informations.</p>
<h2 id="pourquoi-est-il-important-de-surveiller-les-modèles-dapprentissage-automatique-">Pourquoi est-il important de surveiller les modèles d’apprentissage automatique ?</h2>
<p>Imaginez un scénario dans lequel nous avons construit et déployé un système d’apprentissage machine (ML) pour nos utilisateurs. La question est maintenant de savoir comment nous pouvons garantir que le système fonctionne constamment comme prévu sur une période donnée. Selon les exigences de tolérance du système, l’impact d’une défaillance va d’un inconvénient mineur à des situations mettant la vie en danger. Les pratiques traditionnelles de gestion des performances des applications, telles que la surveillance des mesures de l’expérience utilisateur (par exemple, la latence) et des ressources du système (par exemple, l’utilisation du processeur et de la mémoire), sont également applicables à la surveillance des systèmes de ML. Elles sont essentielles pour identifier les modes de défaillance connus et mettre en place une infrastructure d’alerte pour effectuer des actions correctives. En outre, pour les systèmes de ML, nous devons également surveiller les facteurs suivants :</p>
<ul>
<li>les changements dans la qualité des données d’entrée tels que les colonnes manquantes ou les entrées inattendues pendant le temps d’inférence</li>
<li>les changements dans la relation entre l’entrée et la sortie sur une période de temps. Cela entraîne une dégradation progressive du modèle due à la modification des hypothèses sous-jacentes, communément appelée “dérive conceptuelle”.</li>
<li>la robustesse des prédictions due à toute modification des données, des caractéristiques, des hyperparamètres, des paramètres du modèle, etc… C’est ce qu’on appelle le principe CACE : * Changing Anything Changes Everything (Changer quelque chose change tout)*.
Si vous souhaitez en savoir plus sur l’importance de la surveillance des systèmes de blanchiment d’argent et découvrir quelques exemples concrets, consultez l’<a href="https://mlinproduction.com/why-is-it-important-to-monitor-machine-learning-models/">article complet</a>.</li>
</ul>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*TaJ_SWmAmU_8goRO.png" alt="Surveillance dans le cycle de vie du ML - Source de la figure" />
<em>Surveillance dans le cycle de vie du ML - <a href="https://martinfowler.com/articles/cd4ml.html">Source de la figure</a></em></p>
<p><br />
<strong><em>En savoir plus 🎓</em></strong>
<br />
Si vous souhaitez continuer à vous informer sur la surveillance des systèmes d’apprentissage machine et sur les MLOps en général, nous partageons quelques références ci-dessous :</p>
<ul>
<li><a href="https://github.com/eugeneyan/applied-ml">Applied ML</a> est un répertoire avec une liste d’articles et de blogs sur la science des données et l’apprentissage machine en production</li>
<li>Cet <a href="https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/">article</a> est un guide complet sur la complexité et l’importance de la surveillance, et fournit des conseils pratiques sur la surveillance des systèmes de blanchiment d’argent.</li>
<li>L’intégration continue est une pratique de développement de logiciels qui consiste à tester automatiquement chaque modification apportée à la base de code. Cela permet de détecter le changement granulaire qui entraîne l’échec des tests et permet à l’équipe de résoudre le problème d’intégration au début du cycle de développement. Dans cette <a href="https://www.youtube.com/watch?v=9BgIDqAzfuA&feature=youtu.be">vidéo</a>, découvrez comment cette pratique peut être appliquée aux projets d’apprentissage machine avec des outils tels que <a href="https://github.com/iterative/cml">Continuous Machine Learning</a> (CML).</li>
</ul>
<h2 id="big-bird-pour-des-séquences-plus-longues">Big Bird pour des séquences plus longues</h2>
<p><img src="https://cdn-images-1.medium.com/max/800/1*LhjnhIRT8TGzFRAa63THyA.png" alt="La figure montre comment Big Bird est capable de maintenir les propriétés de trois mécanismes d'attention différents - Zaheer et al. (2020)" />
<em>La figure montre comment Big Bird est capable de maintenir les propriétés de trois mécanismes d’attention différents - <a href="https://arxiv.org/abs/2007.14062">Zaheer et al. (2020)</a>.</em></p>
<p><br />
Il est bien connu que les modèles de langage basés sur des transformers reposent sur le mécanisme d’auto-attention ont une complexité quadratique dans le nombre de tokens. <a href="https://arxiv.org/abs/2007.14062">Big Bird</a> est un modèle basé sur Transformer qui vise à soutenir plus efficacement les tâches de NLP nécessitant des contextes plus longs en réduisant la complexité du mécanisme d’attention à une complexité linéaire dans le nombre de tokens.</p>
<p><br />
Pourquoi est-ce important ? Le traitement et l’extraction d’informations à partir de séquences plus longues sont utiles lorsqu’il s’agit de textes longs tels que des livres ou des articles scientifiques. Dans de tels cas, nous voudrions minimiser l’empreinte mémoire, c’est pourquoi il est important de réduire la complexité de la composante du mécanisme d’attention dans l’architecture de modélisation du langage. La réduction de la complexité est importante, tout comme le fait de conserver les propriétés originales du modèle. La façon dont Big Bird y parvient est de considérer l’auto-attention comme un graphe entièrement connecté et de tirer parti des propriétés du graphe, en augmentant notamment la vitesse de circulation des informations entre les paires de nœuds. Les auteurs affirment qu’avec la nouvelle attention réduite proposée, leur modèle peut gérer des séquences <em>“d’une longueur pouvant atteindre 8 fois ce qui était possible auparavant avec un matériel similaire “</em>.</p>
<p><br />
<strong><em>En savoir plus 🎓</em></strong></p>
<p><br />
Si vous voulez avoir plus d’informations sur les choix de conception de Big Bird, Yannic Kilcher fournit une explication de ce modèle dans cette <a href="https://www.youtube.com/watch?v=WVPE62Gk3EM&t=678s">vidéo</a>.</p>
<h2 id="pénétrer-dans-lapprentissage-profond-et-le-nlp">Pénétrer dans l’apprentissage profond et le NLP</h2>
<p>L’un des plus grands changements auxquels beaucoup d’entre nous ont dû s’habituer cette année en raison de la pandémie est l’idée d’apprendre à distance. Cela a également ouvert de nombreuses possibilités d’apprentissage non seulement pour les communautés locales mais aussi pour les étudiants du monde entier. Dans cette partie de la newsletter, nous partageons quelques ressources pour ceux qui cherchent à se lancer dans l’apprentissage approfondi ou le NLP.</p>
<p><br />
<strong><em>Groupe d’étude « Dive into Deep Learning »</em></strong></p>
<p><br />
Le week-end dernier, dair.ai a accueilli la première session du nouveau groupe d’étude sur l’apprentissage approfondi. La session a duré plus d’une heure et s’est concentrée sur un large aperçu de l’apprentissage approfondi. Plus de 150 personnes venant du monde entier ont participé à la session en direct (voir l’enregistrement <a href="https://www.youtube.com/watch?v=xS3_b0BsSes">ici</a>). La deuxième session visera à couvrir quelques préliminaires tels que la probabilité et les statistiques, l’algèbre linéaire, et d’autres concepts importants pour l’étude et l’application des concepts de l’apprentissage approfondi. Si vous souhaitez vous joindre aux prochaines sessions, découvrez ce groupe d’étude <a href="https://github.com/dair-ai/d2l-study-group">ici</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*nwPo0Xyi9GuEMFK12zZkVg.png" alt="La structure du contenu du programme d'études d'apprentissage approfondi" />
<em>La structure du contenu du programme d’étude de l’apprentissage profond</em></p>
<p><br />
<strong><em>S’initier au NLP grâce à deeplearning.ai</em></strong></p>
<p><br />
Récemment, deeplearning.ai a publié une nouvelle <a href="https://www.deeplearning.ai/natural-language-processing-specialization/">spécialisation axée sur le NLP</a>. Lors d’une récente table ronde, Andrew Ng a été rejoint par des experts du domaine et a discuté de sujets intéressants autour de “l’entrée dans le NLP”. La discussion a mis l’accent sur les tendances duNLP et sur d’autres conseils pour les étudiants. Elvis a écrit un <a href="https://twitter.com/omarsar0/status/1288776352460673024?s=20">fil de discussion</a> sur les points qu’il a retenus de cette session, qui vont des conseils aux étudiants aux domaines de recherche intéressants et aux tendances du NLP.</p>
<p><br />
<strong><em>LxMLS Lisbon Machine Learning School</em></strong></p>
<p><br />
Le LxMLS 2020 à l’<a href="http://tecnico.ulisboa.pt/en/">Instituto Superior Técnico</a>(IST) a eu lieu à distance et toutes les conférences ont été données et diffusées publiquement en ligne. Ce programme est considéré comme l’un des meilleurs programmes d’apprentissage de la NLP en Europe. Plusieurs chercheurs renommés ont soit participé au programme dans le passé, soit enseigné dans le cadre du programme. La <a href="http://lxmls.it.pt/2020/?page_id=19">10e édition du LxMLS</a> comprenait des conférences par exemple sur la modélisation de données séquentielles ou encore l’application de l’apprentissage par renforcement dans le contexte du NLP. Vous pouvez trouver toutes <a href="https://www.youtube.com/channel/UCkVFZWgT1jR75UvSLGP9_mw/videos">les conférences</a> sur YouTube.</p>
<p><br />
<strong><em>Apprentissage profond pour la vision par ordinateur</em></strong></p>
<p><br />
Justin Johnson a récemment annoncé qu’ils ont publié toutes les <a href="https://www.youtube.com/playlist?list=PL5-TkQAfAZFbzxjBHtzdVCWE0Zbhomg7r">conférences vidéo</a> pour leur nouveau cours sur l’apprentissage approfondi de la vision par ordinateur. Selon Justin, il s’agit d’une évolution du cours <a href="http://cs231n.stanford.edu/2019/">CS231n</a> qui a été dispensé à Stanford par lui et d’autres. Tout le contenu a été actualisé et les conférences incluent maintenant de nouveaux sujets comme la vision 3D et les transformers appliqués dans le contexte de la vision par ordinateur.</p>
<p><br />
<strong><em>Rester informé 🎯</em></strong></p>
<p><br />
<a href="https://nlpwithfriends.com/">NLP with Friends</a> est un effort pour rassembler les étudiants afin de discuter de sujets de recherche intéressants liés au NLP. Des discussions sont organisées chaque semaine sur Zoom afin que vous puissiez vous joindre aux sessions à distance.</p>
<p><br />
<strong><em>En savoir plus 🎓</em></strong></p>
<p><br />
<em>Voici quelques concours et ateliers liés au NLP que nous avons trouvés utiles pour vous faire participer:</em></p>
<ul>
<li><a href="https://www.kaggle.com/c/contradictory-my-dear-watson"><strong><em>Contradictoire, mon cher Watson : Détecter la contradiction et l’implication dans un texte multilingue en utilisant les TPU.</em></strong></a>. Il s’agit d’un concours de type “terrain de jeu” basé sur l’inférence en langage naturel (NLI) pour déterminer si des paires de phrases sont liées. Les participants doivent créer un modèle de NLI à partir d’un ensemble de données comprenant des textes de 15 langues différentes.</li>
<li><a href="https://hasocfire.github.io/hasoc/2020/call_for_participation.html"><strong><em>Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC)</em></strong></a> fournit un forum et un défi pour promouvoir la recherche multilingue sur la détection des contenus problématiques. Cette année, l’ensemble de données contient 10 000 tweets annotés en anglais, allemand et hindi. La première sous-tâche consiste à détecter les contenus haineux, offensants ou profanes dans le texte. La deuxième sous-tâche est plus granulaire pour discriminer et classer les tweets. Il existe une sous-piste distincte pour le CodeMix de Dravidian (nous en avons parlé dans notre précédente newsletter). La date limite d’inscription est fixée au 30 août 2020.</li>
</ul>
<h2 id="pourquoi-vous-devriez-faire-du-nlp-sur-des-langues-autres-que-langlais">Pourquoi vous devriez faire du NLP sur des langues autres que l’anglais</h2>
<p>Dans ce récent <a href="https://ruder.io/nlp-beyond-english">article</a>, Sebastian Ruder explique pourquoi les chercheurs en NLP devraient se concentrer sur d’autres langues que l’anglais. Pour commencer, le blog souligne l’énorme disparité dans la disponibilité des données en ligne entre une poignée de langues à haute ressource (dont l’anglais, le français, l’espagnol, …) et des milliers d’autres langues. La discussion principale est centrée sur les facteurs qui devraient encourager davantage d’initiatives de recherche dans d’autres langues, du point de vue sociétal, linguistique, de l’apprentissage machine, culturel et normatif, et cognitif.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*61agh1MtYJRPZGDG.png" alt="Distribution des ressources linguistiques de Joshi et al. (2020). Les groupes 5 et 4 sont des langues qui sont bien étudiées alors que les autres groupes sont largement négligés" />
<em>Distribution des ressources linguistiques de <a href="https://arxiv.org/abs/2004.09095">Joshi et al. (2020)</a>. Les groupes 5 et 4 sont des langues qui sont bien étudiées alors que les autres groupes sont largement négligés.</em></p>
<p><br />
L’un des points abordés est la manière dont les modèles spécifiques à l’anglais peuvent limiter l’accès à la connaissance en raison des barrières linguistiques, provoquer des préjugés et des discriminations à l’encontre des non-anglophones, ainsi que mettre en danger une langue elle-même dans des cas extrêmes. D’un point de vue linguistique, de nombreuses langues riches en ressources sont morphologiquement pauvres, ce qui signifie que nous passons à côté de la capacité de généralisation en ignorant les autres langues qui peuvent aider les modèles à apprendre ces informations. Considérer les choses d’un point de vue ML, étant donné que la plupart des langues ont des données limitées disponibles, jeter beaucoup de données et espérer le meilleur résultat ne peut être la solution. Les approches qui tiennent compte des langues et qui peuvent fonctionner avec peu de données ont plus de chances d’avoir un réel impact. Le blog se termine par un appel à l’action pour de futures recherches, notamment la création d’ensembles de données dans plusieurs langues, l’évaluation d’une approche dans plusieurs langues et le respect de la <a href="https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/">règle de Bender</a>.</p>
<h2 id="covost-v2--développer-lensemble-de-données-de-traduction-de-la-parole-au-texte-multilingue-le-plus-vaste-et-le-plus-diversifié">CoVoST V2 : Développer l’ensemble de données de traduction de la parole au texte multilingue le plus vaste et le plus diversifié</h2>
<p>Les ensembles de données multilingues permettent de mieux tester la robustesse des modèles d’apprentissage automatique qui visent à traiter soit la modélisation du langage, soit la reconnaissance vocale. Un domaine de l’apprentissage automatique qui pourrait bénéficier de ce type de données est celui des applications de traduction vocale multilingue. Ces types de modèles peuvent contribuer à lever les obstacles qui sont très courants avec les outils de communication en ligne, où les utilisateurs proviennent de cultures différentes. Ils peuvent également contribuer à enrichir les conversations des personnes qui ne maîtrisent pas certaines langues et à amplifier leur voix en ligne. Il existe toute une série d’autres façons d’utiliser l’ensemble de données et les applications qui en dépendent, comme la création de compositeurs intelligents ou d’outils de recherche plus accessibles.</p>
<p><br />
Facebook AI [publié] (https://ai.facebook.com/blog/covost-v2-expanding-the-largest-most-diverse-multilingual-speech-to-text-translation-data-set/) CoVoST V2, qu’ils considèrent comme le plus grand ensemble de données multilingues parole-texte disponible à ce jour. La nouvelle version de l’ensemble de données ajoute de nouvelles langues à la première version précédente, avec un total de 2900 heures de parole. Avant de la publier, les chercheurs ont vérifié la qualité des données et ont constaté qu’elle variait selon des critères tels que l’âge, le sexe et les accents. Avec ce nouvel ensemble de données, on espère qu’il favorisera la recherche en matière de traduction vocale multilingue et qu’il permettra à un seul modèle de prendre en charge de nombreuses paires de langues, en particulier pour les paires comportant moins de données.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*QMQ1t222en_RAY_i.png" alt="Source : Blog AI de Facebook" />
<em>Source : <a href="https://ai.facebook.com/blog/covost-v2-expanding-the-largest-most-diverse-multilingual-speech-to-text-translation-data-set/">Blog Facebook AI</a></em></p>
<h2 id="lavenir-des-systèmes-dia-conversationnelle">L’avenir des systèmes d’IA conversationnelle</h2>
<p>Lors d’une récente <a href="https://www.youtube.com/watch?v=41-FNujbKac&list=PL75e0qA87dlGP51yZ0dyNup-vwu0Rlv86&index=24">table ronde</a> sur ce que les développeurs d’IA conversationnelle devraient savoir sur le ML et la linguistique, Vladimir Vlasov, Emily Bender, Thomas Wolf et Anna Rogers ont partagé leurs points de vue et leurs préoccupations. Le consensus général est que les modèles linguistiques actuels sont capables d’obtenir des résultats remarquables dans certaines tâches, mais les experts pensent que nous ne testons pas les modèles pour les bonnes choses. Ce n’est pas parce qu’un modèle linguistique pré-entraîné peut déjà être utilisé pour construire des applications créatives que cela signifie que nous avons résolu le problème de la modélisation linguistique.</p>
<p><br />
Other interesting discussion topics covered ways to improve the evaluation of language models and to better understand what the models actually learn. At the moment, it is really hard to say which aspects of language these models actually capture. This becomes even harder as a growing number of pretrained models, relying on different architectures and built on different objectives, continue to emerge rapidly. We therefore need to focus more on standardizing evaluation methods, and to look into what these models actually capture and how to avoid their harmful use from a practical standpoint, for example when building conversational AI systems.</p>
<p><br />
Feel free to check out this thread on <a href="https://twitter.com/omarsar0/status/1291725568640245760?s=20">Twitter</a> or go straight to the panel discussion.</p>
<h1 id="mentions-spéciales-️">Mentions spéciales ⭐️</h1>
<p><strong><em>Comprendre et mettre en œuvre SimCLR en PyTorch - Un guide ELI5</em></strong></p>
<p><br />
Auparavant, nous avons abordé le cadre SimCLR utilisé pour entraîner des représentations visuelles riches et pour améliorer les performances par rapport aux méthodes d’apprentissage auto-supervisé et semi-supervisé sur ImageNet. SimCLR est basé sur un apprentissage contrastif qui tente d’entraîner un modèle à <a href="https://amitness.com/2020/03/illustrated-simclr/"><em>distinguer les choses similaires et dissemblables</em></a>*. Marcin a récemment écrit un <a href="https://zablo.net/blog/post/understanding-implementing-simclr-guide-eli5-pytorch/">article détaillé</a> fournissant une description et une explication du code de ce cadre en utilisant PyTorch.</p>
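<p>To make the contrastive objective concrete, here is a minimal NT-Xent loss sketch in PyTorch. This is our own condensed illustration, not code from Marcin's article; <code>z1</code> and <code>z2</code> are assumed to be the embeddings of two augmented views of the same image batch:</p>
<pre><code class="language-python">
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of paired augmented views z1, z2 (N x D)."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x D, unit norm
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # a view is never its own pair
    n = z1.size(0)
    # The positive for row i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
</code></pre>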
<p><br />
<strong><em>The lottery ticket hypothesis</em></strong></p>
<p><br />
Is it possible to find subnetworks that perform similarly to the original neural network on specific tasks? Using pruning techniques, current research argues that it is indeed possible, claiming that the way models are initialized has a lot to do with achieving this effect. To learn more about this area of research, check out this <a href="https://medium.com/dair-ai/the-lottery-ticket-hypothesis-7cd4eae3faaa">brief summary</a>.</p>
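<p>To make the prune-and-rewind recipe concrete, here is a minimal single-round sketch using PyTorch's pruning utilities. The tiny model and pruning fraction are illustrative assumptions; the original research prunes iteratively over many rounds:</p>
<pre><code class="language-python">
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Linear(784, 10)
initial_state = copy.deepcopy(model.state_dict())  # remember the original init

# ... train the model here ...

# Prune the 20% smallest-magnitude weights, then rewind the survivors
# to their initial values: the candidate "winning ticket".
prune.l1_unstructured(model, name="weight", amount=0.2)
mask = model.weight_mask.clone()
prune.remove(model, "weight")                      # make the pruning permanent
with torch.no_grad():
    model.weight.copy_(initial_state["weight"] * mask)
</code></pre>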
<p><br />
<strong><em>NodeNet: a graph-regularised neural network for node classification</em></strong></p>
<p><br />
Graph-based learning algorithms make efficient use of data and relational information to build better models. Neural graph learning (NGL) is one such technique: it uses a traditional machine learning algorithm with a modified loss function to exploit the edges of the graph structure. This <a href="https://arxiv.org/abs/2006.09022">work</a> proposes NodeNet, a model using NGL, to solve the node classification task for citation graphs. The authors claim that NodeNet achieves state-of-the-art results on the Pubmed and Citeseer datasets, as tracked on Papers with Code.</p>
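<p>A minimal sketch of what such a graph-regularised objective can look like (our own illustration of the general NGL idea, not NodeNet's exact loss):</p>
<pre><code class="language-python">
import torch
import torch.nn.functional as F

def graph_regularized_loss(logits, labels, embeddings, edges, alpha=0.1):
    """Supervised loss plus a penalty pulling embeddings of neighbours together.

    edges: LongTensor of shape (E, 2) listing (u, v) node-index pairs.
    """
    task_loss = F.cross_entropy(logits, labels)
    u, v = edges[:, 0], edges[:, 1]
    neighbour_loss = (embeddings[u] - embeddings[v]).pow(2).sum(dim=1).mean()
    return task_loss + alpha * neighbour_loss
</code></pre>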
<p><br />
<strong>When are contextual embeddings worth using?</strong></p>
<p><br />
In this <a href="https://medium.com/dair-ai/when-are-contextual-embeddings-worth-using-b509008cc325">blog post</a>, Viktor Karlsson summarizes a paper that discusses situations where it might make sense to use contextual embeddings (e.g., BERT) and where it is not worth it.</p>
<p><br />
<strong><em>Simple and effective natural language processing, with Moshe Wasserblat, Intel AI</em></strong></p>
<p><br />
In this <a href="https://youtu.be/Bgr684dPJ6U">talk</a>, Moshe Wasserblat of Intel AI presents simple and effective deep learning methods for NLP. The speaker shares insights into the most popular optimization approaches and tips on distilling BERT for much faster inference with an acceptable accuracy penalty. Results on the <a href="https://github.com/dair-ai/emotion_dataset">dair.ai emotion dataset</a> and other popular benchmarks are also covered.</p>
<p><br />
<strong><em>DeText: an NLP framework for intelligent text understanding</em></strong></p>
<p><br />
<a href="https://engineering.linkedin.com/blog/2020/open-sourcing-detext">DeText</a> is a framework for leveraging technologies such as BERT for text understanding. DeText offers features that make it possible to use large models that would otherwise require high computational costs out of the box. With this framework, neural ranking can be implemented for search and recommender systems.</p>
<hr />
<p>You can find the previous newsletter <a href="https://dair.ai/NLP_Newsletter_-13_-FR/">here</a></p>
<p><br />
If you have any datasets, projects, blog posts, tutorials, or papers that you would like to share in the next edition of the newsletter, you can use this <a href="https://forms.gle/3b7Q2w2bzsXE6uYo9">form</a>.</p>
<p><br />
<a href="https://dair.ai/newsletter/">Subscribe</a> to receive future issues in your inbox.</p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_-14_-FR/">NLP Newsletter 14 [FR]: NLP Beyond English, Big Bird, Monitoring ML Models, Breaking into NLP, arXiv Dataset,…</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on August 11, 2020.</p>
https://dair.ai/posts/making-monolingual-sentence-embeddings-multilingual-using-knowledge-distillation2020-07-16T00:00:00+00:002020-07-16T00:00:00+00:00Viktor Karlssonhttps://dair.aiviktor2karlsson@gmail.com
<p>Encoding the semantics of words and sentences is something we take for granted that state-of-the-art NLP systems are capable of. SentenceBERT provides illustrative examples of how we can make the best use of transformer-based architectures in tasks such as clustering and semantic textual similarity. This model is, however, limited to processing sequences of text from a <strong>single language</strong>, which in some cases can be the factor preventing us from deploying such a model into production. It would therefore be interesting to figure out a way to extend these models into the realm of <strong>multilinguality</strong>, which is what Reimers et al. study in <a href="https://arxiv.org/pdf/2004.09813.pdf">Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation</a>. This article is a summary of that research paper, where I also share my thoughts and reflections on their contribution and findings.</p>
<h2 id="introduction">Introduction</h2>
<p>Multilingual models can produce token, and therefore by extension, sentence embeddings for multiple languages at once. While this capability extends the possible use cases, it comes with a small caveat: <strong>There is no guarantee that vector spaces across languages are aligned</strong>. This basically means that the same word or sentence, translated into different languages and processed by the model, could be assigned vector representations that are neither similar nor close in the embedding space. This prevents us from performing tasks such as information retrieval, clustering, and semantic textual similarity <em>across</em> languages.</p>
<p><br />
However, this is not to say that such tasks are impossible to perform <em>within</em> a single language. Semantically meaningful sentence embeddings can be, and have been, generated successfully through models such as <a href="https://arxiv.org/abs/1908.10084">SentenceBERT (SBERT)</a>. If you haven’t already read that paper, I recommend having a look at my <a href="https://link.medium.com/lnaKDerqT7">paper summary, which covers the motivation, implementation, related works, and results achieved by the authors.</a> In short, SBERT is trained to generate sentence embeddings that preserve the input sequence’s semantics. This is achieved by mapping similar sentences close to each other and dissimilar ones further apart.</p>
<p><br />
What if we could extend this capability of assigning semantically meaningful representations to work both within and across a wider set of languages? That would open up lots of interesting use-cases. This, in fact, is exactly what Reimers et al. study in <a href="https://arxiv.org/pdf/2004.09813.pdf">Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation</a>. They transfer the capabilities of SBERT to multilingual models such as <a href="https://arxiv.org/pdf/1911.02116.pdf">XLM-Roberta (XLM-R)</a> through a novel <em>knowledge distillation</em> training process.</p>
<h2 id="proposed-knowledge-distillation-procedure">Proposed knowledge distillation procedure</h2>
<p>Knowledge distillation enables the behaviour of one model, referred to as the teacher, to be taught to another, the student. In one configuration, this method can <em>distill</em> the <em>knowledge</em> of a large, possibly state-of-the-art model into a much smaller one, <a href="https://medium.com/dair-ai/tinybert-size-does-matter-but-how-you-train-it-can-be-more-important-a5834831fa7d">which can retain 95%+ of the teacher’s performance while reducing the number of parameters by a factor of 7</a>! It can, however, also be used when one model has properties we would like to transfer onto another one, without necessarily reducing model size. This is more akin to what Reimers et al. focused on.</p>
<p><br />
They employ SBERT as the teacher (T) and use a multilingual model as its student (S). Enabling the student to mimic the behaviour of the teacher requires translated (parallel) sentences from one or more source languages <em>(s)</em> to one or more target languages <em>(t)</em>: \([(s_1, t_1), (s_2, t_2), \dots, (s_N, t_N)]\). The requirement is that the teacher can process the source language(s) <em>s</em> while the multilingual student can deal with the target ones <em>t</em>.</p>
<p><br />
The knowledge distillation training objective is constructed as follows: Train the student model S so that \(S(s_i) \approx T(s_i)\) and \(S(t_i) \approx T(s_i)\). In other words, we want the student’s sentence representations for both languages in the pair to be close to the teacher’s embedding in the source language. This can be achieved through minimising the mean-squared loss over a mini-batch <em>B</em>:</p>
<p><br />
<img src="../images/summary-making-monolingual-senence-embeddings-multilingual-using-knowledge-distillation/knowledge-distillation-loss-function.png" alt="Knowledge distillation objective for teacher model T and student S and sentence pairs (t_j, s_j) part of a mini batch B." /></p>
<p><br />
A schematic overview of the training procedure therefore looks something like this.</p>
<p><br />
<img src="../images/summary-making-monolingual-senence-embeddings-multilingual-using-knowledge-distillation/training-process-schematic.png" alt="Schematic overview of knowledge distillation training process proposed in [source](https://arxiv.org/pdf/2004.09813.pdf)" /></p>
<p><em>Note: A benefit of this procedure is that it allows training for a goal objective on a high-resource language to gain the sought-after properties. These can then be transferred to lower-resource languages, where it might be more useful for the application at hand.</em></p>
<h2 id="experiments">Experiments</h2>
<p>The model is evaluated in three scenarios: <em>Multilingual semantic textual similarity</em> (mSTS), <em>Bitext retrieval</em> and how <em>training data size affects performance.</em> Let’s go through each of these, one by one.</p>
<h3 id="multilingual-semantic-textual-similarity">Multilingual semantic textual similarity</h3>
<p>Here, models are evaluated based on how well they encode the semantics both within and across different languages. This is enabled through a multilingual STS dataset where sentence pairs from different languages are ranked based on their similarity.
The authors compare five different systems in this task, most of which will be used in later experiments, so it’s worth keeping them in mind.</p>
<ol>
<li>Multilingual language models without specific training for aligning vector spaces across languages. Specifically Multilingual-BERT (mBERT) and XLM-Roberta (XLM-R).</li>
<li>Multilingual language models trained on English STS data. This should bias the embeddings to be close if semantically similar, at least for English. These models are referred to as mBERT-nli-stsb and XLM-R-nli-stsb.</li>
<li>LASER, an encoder-decoder LSTM trained for translation between 90+ languages. A max-pooling strategy is applied over the encoder outputs in order to generate a fixed size representation for each sequence.</li>
<li>Multilingual Universal Sentence Encoder (mUSE), a dual-encoder transformer architecture trained on SNLI data and parallel corpora over 16 languages.</li>
<li>Multilingual models trained through the proposed knowledge distillation process. Specifically mBERT, DistilmBERT and XLM-R distilled with knowledge from SBERT-nli-stsb, which will be referred to as “&lt;multilingual model&gt; ← SBERT-nli-stsb”.</li>
</ol>
<p><strong>Same-language STS evaluation</strong>
<br />
These models are first evaluated based on how well they can compare similarity between sentences from a single language at a time (see table below). What we find is that multilingual models without STS-specific training perform the worst, across all languages. This is in line with what previous work has found:</p>
<blockquote>
<p>The output from vanilla-transformer models such as BERT and RoBERTa is ill-suited for semantic comparison using cosine distance.</p>
</blockquote>
<p>The monolingual SBERT-nli-stsb, trained only on <em>English</em> data, performs well above random when computing similarity between <em>Spanish</em> sentences. This in and of itself is a surprising finding! For Arabic, on the other hand, performance is worse, which might be due to the amount of <em>out-of-vocabulary</em> tokens present in Arabic. SBERT uses a <a href="https://arxiv.org/pdf/1609.08144.pdf">WordPiece tokenizer</a>, which is suboptimal for multilingual tasks due to the language-specific assumptions needed during training of this tokenizer. In multilingual settings, it is preferable to use <a href="https://github.com/google/sentencepiece">SentencePiece tokenization</a>, which is designed to address these shortcomings.</p>
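<p>For reference, training and applying a SentencePiece tokenizer takes only a few lines; the file names and vocabulary size below are placeholder choices of ours:</p>
<pre><code class="language-python">
import sentencepiece as spm

# Train a language-agnostic subword tokenizer on a raw text corpus.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="ml_tok", vocab_size=8000)

sp = spm.SentencePieceProcessor(model_file="ml_tok.model")
print(sp.encode("This works the same for any language.", out_type=str))
</code></pre>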
<p><br />
The best performance is achieved through the proposed distillation process. It especially improves on the baseline for Arabic.</p>
<p><br />
<img src="../images/summary-making-monolingual-senence-embeddings-multilingual-using-knowledge-distillation/results-monolingual-sts.png" alt="Same-language STS evaluation for both baseline, related works and models proposed here. Scores are 100 x Spearman rank correlation. [Source](https://arxiv.org/pdf/2004.09813.pdf)" /></p>
<p><br />
<strong>Across-language STS evaluation</strong>
<br />
The second STS task used to evaluate the models is more challenging, as it asks models to evaluate the similarity of sentence pairs across languages. As is apparent from the table below, most multilingual models struggle and perform significantly worse compared to the same-language STS task above. It is therefore safe to conclude that the multilingual models create meaningful representations within each language, but these vector spaces are not aligned across languages.</p>
<p><br />
<img src="../images/summary-making-monolingual-senence-embeddings-multilingual-using-knowledge-distillation/results-monolingual-sts.png" alt="Across-language STS evaluation. Scores are 100 x Spearman rank correlation. [Source](https://arxiv.org/pdf/2004.09813.pdf)" /></p>
<p>LASER performs best among the baseline algorithms, with an improvement of more than 10 points. However, the multilingual knowledge-distilled models outperform it by a significant margin, again providing a 10-point increase across the board. One possible reason why LASER cannot achieve competitive results here is how it was trained: to generate translations. This capability does not necessarily translate (no pun intended) to comparing semantics, as two sentences do not need to be exact translations of each other in order to be semantically equivalent.</p>
<h3 id="bitext-retrieval">Bitext retrieval</h3>
<p>This task asks the models to identify translated sentences in large corpora of different languages, where only a small set of sentences has a translated equivalent in another language. By definition, this task does not necessarily lend itself to models whose strength is finding semantically similar sentences. Translated sentences are probably assigned similar vectors, but the opposite does not hold: just because two sentence embeddings are similar does not imply that the sentences are translations of each other.</p>
<p><br />
These challenges are reflected in the results, where all the multilingual transformer models struggle. The proposed knowledge-distilled XLM-R significantly outperforms these baselines but is itself outperformed by both mUSE and LASER. Given the discussion above, this is what we should expect.</p>
<p><br />
<img src="../images/summary-making-monolingual-senence-embeddings-multilingual-using-knowledge-distillation/results-bucc-bitext-retrieval.png" alt="F1 scores for BUCC Bitext mining task._ [Source](https://arxiv.org/pdf/2004.09813.pdf)" /></p>
<h3 id="training-dataset-effect-on-performance-for-different-languages">Training dataset effect on performance for different languages</h3>
<p>The final experiment performed in this paper studies the effect of training data characteristics and size on mSTS performance. This is done by training a bilingual XLM-R on a variety of datasets consisting of English-Arabic or English-German sentence pairs.
Training models in this way has an obvious advantage: each model only needs to know two languages at a time, which in the results shows a 1-2 point improvement compared to the multilingual models that had to handle 10 languages. This effect is known as the <em>curse of multilinguality</em>, which states that</p>
<blockquote>
<p>Adding more languages to a model’s training data degrades its performance if the capacity of the model remains the same.</p>
</blockquote>
<p>When it comes to dataset size, the expected trend of “more data = better performance” holds true for EN-DE but not necessarily for EN-AR. Here, dataset complexity has much more impact than size: the model trained on a dataset of 27k sentence pairs significantly outperforms one trained on 8M sentence pairs!</p>
<h1 id="conclusion">Conclusion</h1>
<p>So, what have we learned? Well, we’ve seen that knowledge distillation can be used in more ways than just for compressing the knowledge of a large model into a smaller one. We also learned about the training process used to transfer the <em>properties</em> of a model specialised in capturing sentence semantics to a multilingual model, which aligned its vector spaces across languages. This allows us to perform clustering and semantic relatedness measurements without having to worry about language, with great performance!</p>
<p><a href="https://dair.ai/posts/making-monolingual-sentence-embeddings-multilingual-using-knowledge-distillation/">Making monolingual sentence embeddings multilingual using knowledge distillation</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on July 16, 2020.</p>
https://dair.ai/posts/Summary-MobileBERT2020-07-16T00:00:00+00:002020-07-16T00:00:00+00:00Viktor Karlssonhttps://dair.aiviktor2karlsson@gmail.com
<p>As the size of NLP models increases into the hundreds of billions of parameters, so does the importance of being able to create more compact representations of these models. <strong>Knowledge distillation</strong> has successfully enabled this, where in one instance <a href="https://medium.com/dair-ai/tinybert-size-does-matter-but-how-you-train-it-can-be-more-important-a5834831fa7d">96% of the teacher’s performance was retained in a 7x smaller model</a>. However, knowledge distillation is still considered an afterthought when designing the teacher models, which could reduce its effectiveness, leaving potential performance improvements for the student on the table.</p>
<p><br />
Further, the difficulty of fine-tuning small student models after the initial distillation, without degrading their performance, requires us to both pre-train and fine-tune the teachers on the tasks we want the student to be able to perform. Training a student model through knowledge distillation will therefore require <em>more</em> training compared to only training the teacher, which limits the benefits of a student model to inference time.</p>
<p><br />
What would be possible if, instead, knowledge distillation was put front and centre during the design and training of the teacher model? Could we design and successfully train a model that is <em>supposed</em> to be distilled, and could the distilled version successfully be fine-tuned on any downstream task? These are some of the questions addressed in <a href="https://www.aclweb.org/anthology/2020.acl-main.195.pdf">MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices</a>, which this article will provide a summary of.</p>
<h1 id="the-mobilebert-architectures">The MobileBERT architectures</h1>
<p><img src="../../images/summary-mobilebert/architecture-teacher-and-student.jpg" alt="Architecture visualisation of transformer blocks within (a) BERT, (b) MobileBERT teacher and (c) MobileBERT student. The green trapezoids marked with “Linear” are referred to as bottlenecks. [Source](https://www.aclweb.org/anthology/2020.acl-main.195.pdf)" /></p>
<h2 id="linear">Linear</h2>
<p>Knowledge distillation requires us to compare teacher and student representations so that the difference between them can be minimised. This is straightforward when both matrices or vectors have the same dimension. MobileBERT therefore introduces a <em>bottleneck</em> layer into the transformer block. This allows the input to both student and teacher to be equivalent in size while their internal representations can differ. These bottlenecks are shown as green trapezoids marked with “Linear” in the figure above. In this particular case, the shared dimension is 512, while the internal representation sizes for teacher and student are 1024 and 128 respectively. This allows us to use a BERT-large (340M parameters) equivalent model to train a 25M-parameter student.</p>
<p><br />
Further, since input and output dimensions of each transformer block are the same for both models, it is possible to transfer both embedding and classifier parameters from teacher to student simply by copying them!</p>
<h2 id="multi-head-attention">Multi-Head Attention</h2>
<p>The observant reader will have noticed that the input to the Multi-Head Attention block (MHA) is not the output of the preceding linear projection. Instead, the initial input is used. There is no motivation for this design choice in the paper, which leaves us to speculate. I believe the reason is the increased degrees of freedom it allows: we separate how the model is forced to process the information into two streams, one fed into the MHA block and the other used as a skip-connection. (It is also quite easy to convince oneself that using the output of the linear projection would not change the behaviour of the MHA block, due to its initial linear transform.)</p>
<h2 id="stacked-ffn">Stacked FFN</h2>
<p>To achieve high enough capacity within the small student, the authors introduce what they call <em>stacked FFN</em>, shown as a dashed box within the student model overview in the image above. Stacked FFNs simply repeat the Feed Forward + Add & Norm blocks 4 times, a number chosen to achieve a good parameter ratio between the MHA and FFN blocks. Ablation studies in this work show that the best performance is achieved when this ratio is in the range 0.4-0.6.</p>
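<p>To make the bottleneck idea concrete, here is a heavily simplified sketch of such a block in PyTorch. This is our own illustration, not the paper's exact architecture: in particular, it routes the bottleneck output into the attention module (the real MobileBERT feeds the raw block input to MHA, as discussed above) and keeps standard LayerNorms. Setting <code>d_inner=1024</code> gives a teacher-like block and <code>d_inner=128</code> a student-like one; both produce 512-dimensional outputs that can be compared directly during distillation.</p>
<pre><code class="language-python">
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Simplified bottlenecked transformer block with stacked FFNs (sketch)."""
    def __init__(self, d_shared=512, d_inner=128, n_heads=4, n_ffn=4):
        super().__init__()
        self.down = nn.Linear(d_shared, d_inner)          # entry bottleneck
        self.attn = nn.MultiheadAttention(d_inner, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_inner)
        # "Stacked FFN": repeat the Feed Forward + Add & Norm unit n_ffn times.
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(d_inner, 4 * d_inner), nn.ReLU(),
                          nn.Linear(4 * d_inner, d_inner))
            for _ in range(n_ffn))
        self.ffn_norms = nn.ModuleList(nn.LayerNorm(d_inner) for _ in range(n_ffn))
        self.up = nn.Linear(d_inner, d_shared)            # exit bottleneck

    def forward(self, x):                                 # x: (batch, seq, d_shared)
        h = self.down(x)
        h = self.attn_norm(h + self.attn(h, h, h, need_weights=False)[0])
        for ffn, norm in zip(self.ffns, self.ffn_norms):
            h = norm(h + ffn(h))
        return x + self.up(h)                             # back to the shared width
</code></pre>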
<h2 id="operational-optimisations">Operational optimisations</h2>
<p>Because one of the goals is to enable fast inference on resource-limited devices, the authors identified two areas where their architecture could be further improved:</p>
<ol>
<li>Replace the smooth GeLU activation function with ReLU</li>
<li>Swap the normalisation operation for an element-wise linear transformation</li>
</ol>
<h1 id="proposed-knowledge-distillation-objectives">Proposed knowledge distillation objectives</h1>
<p>To achieve knowledge transfer between the proposed teacher and student, the authors apply knowledge distillation at three stages of the model:</p>
<ol>
<li><strong>Feature map transfer</strong> - Allows the student to mimic the teacher at each transformer layer output. In the architecture image, this is shown as a dashed arrow between the outputs of the models (a minimal sketch of this and the next objective follows this list).</li>
<li><strong>Attention map transfer</strong> - Which tokens the teacher attend to at different layers and heads is another important property we want the student to learn. This is enabled through minimising the difference between the attention distributions (the KL-divergence) at each layer and head.</li>
<li><strong>Pre-training distillation</strong> - It is also possible to use distillation during pre-training through combining both Masked Language Modelling and Next Sentence Prediction tasks in a linear combination.</li>
</ol>
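<p>A minimal sketch of the first two per-layer terms, assuming aligned hidden states and attention probabilities have been extracted from both models; the tensor shapes below are our own assumptions:</p>
<pre><code class="language-python">
import torch
import torch.nn.functional as F

def layer_transfer_losses(t_hidden, s_hidden, t_attn, s_attn):
    """Per-layer distillation terms (sketch): feature-map MSE plus
    attention-map KL divergence, averaged over the batch.

    t_hidden, s_hidden: (batch, seq, d_shared) layer outputs
    t_attn, s_attn:     (batch, heads, seq, seq) attention probabilities
    """
    feature_loss = F.mse_loss(s_hidden, t_hidden)
    # KL between teacher and student attention distributions per head/position.
    attention_loss = F.kl_div(s_attn.clamp_min(1e-9).log(),
                              t_attn, reduction="batchmean")
    return feature_loss + attention_loss
</code></pre>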
<p>With these objectives, there is more than one way to perform knowledge distillation. The authors propose three alternatives:</p>
<ol>
<li><strong>Auxiliary knowledge transfer</strong> - The layer-wise knowledge transfer objectives are minimised together with the main objectives: masked language modelling and next sentence prediction. This can be considered the simplest approach.</li>
<li><strong>Joint knowledge transfer</strong> - Instead of trying to achieve all objectives at once, it is possible to separate knowledge distillation and pre-training into two stages of training. First, all layer-wise knowledge distillation objectives are minimised until convergence; then further training with the pre-training objective is performed.</li>
<li><strong>Progressive knowledge transfer</strong> - The two-step approach can be taken even further. Errors not yet properly minimised in early layers will propagate and affect the training of later layers if all layers are trained simultaneously. It might, therefore, be better to train one layer at a time while freezing or reducing the learning rate of previous layers.</li>
</ol>
<p><br />
<img src="../../images/summary-mobilebert/training-strategies.jpg" alt="Knowledge transfer techniques. (a) Auxiliary knowledge transfer, (b) joint knowledge transfer, (c) progressive knowledge transfer. [Source](https://www.aclweb.org/anthology/2020.acl-main.195.pdf)" /></p>
<h1 id="experimental-results">Experimental results</h1>
<p>The authors evaluate their proposed MobileBERT in three configurations: the main model with 25M parameters (MobileBERT), the same model without the operational optimisations (MobileBERT w/o OPT), and a model with only 15M parameters (MobileBERT-tiny). These models were compared both to baseline algorithms such as ELMo, GPT and BERT-base, as well as to related distillation work: BERT-PKD, <a href="https://medium.com/dair-ai/tl-dr-distillbert-8fb0f9e3c03d">DistilBERT</a> and <a href="https://medium.com/dair-ai/tinybert-size-does-matter-but-how-you-train-it-can-be-more-important-a5834831fa7d">TinyBERT</a>.</p>
<p><br />
Training these variations of MobileBERT was found to be most effective through the progressive knowledge transfer process, which consistently outperformed the other two by a significant margin.</p>
<p><br />
<img src="../../images/summary-mobilebert/results-glue.jpg" alt="Experimental results on the GLUE benchmark. [Source](https://www.aclweb.org/anthology/2020.acl-main.195.pdf)" /></p>
<p><br />
What we find is that MobileBERT w/o OPT outperforms the much larger BERT-base by 0.2 average GLUE score while being 4x smaller. MobileBERT, on the other hand, is only 0.6 points behind BERT-base while having much faster inference - 62 ms for a sequence of 128 tokens on a Pixel 4 phone! Its performance is nevertheless still competitive, as it outperforms GPT and ELMo by a significant margin.</p>
<blockquote>
<p>It’s therefore safe to conclude that it is possible to create a distilled model that is both performant and fast on resource-limited devices!</p>
</blockquote>
<p>MobileBERT-tiny achieves slightly better performance compared to TinyBERT. This becomes even more impressive when you consider how TinyBERT was fine-tuned for the GLUE tasks. Remember, prior to this work it was not possible to fine-tune the students due to their small capacity. TinyBERT’s teacher, BERT-base, therefore had to be fine-tuned before its knowledge could be distilled into TinyBERT. That is <strong>not</strong> the case for MobileBERT: it has been fine-tuned by itself on GLUE, which proves that it is possible to create a task-agnostic model through the proposed distillation process!</p>
<h1 id="conclusion">Conclusion</h1>
<p>MobileBERT introduces bottlenecks in the transformer blocks, which allow us to more easily distil the knowledge of larger teachers into smaller students. This technique lets us reduce the width rather than the depth of the student, which is known to produce a more capable model. This work highlights the fact that <em>it is possible to create a student model which by itself can be fine-tuned after the initial distillation process.</em></p>
<p><br />
Further, the results also show that this holds true in practice too as <em>MobileBERT is able to reach 99.2% of BERT-base’s performance on GLUE with 4x fewer parameters and 5.5x faster inference on a Pixel 4 phone!</em></p>
<p><a href="https://dair.ai/posts/Summary-MobileBERT/">MobileBERT</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on July 16, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_#13_[FR]2020-07-15T00:00:00+00:002020-07-15T00:00:00+00:00Loïck BOURDOIShttps://dair.ai
<p><img src="https://cdn-images-1.medium.com/max/1200/1*DFP4TyFn1lS2rNK8au2H2Q.png" alt="" /></p>
<h1 id="avant-propos-delvis">Avant-propos d’Elvis</h1>
<p>Bienvenue au treizième numéro de la lettre d’information consacrée au NLP.</p>
<p><br />
Dans ce numéro, nous abordons des sujets qui vont des travaux présentés lors de la <a href="https://acl2020.org/">conférence ACL</a> aux outils permettant d’améliorer l’exploration des documents et des codes, en passant par plusieurs recommandations utiles en NLP.</p>
<p><br />
Nous remercions tout particulièrement <a href="https://twitter.com/_skeshaw">Keshaw Singh</a> et <a href="https://twitter.com/manisnesan">Manikandan Sivanesan</a> pour leur contribution significative à cette édition de la lettre d’information de la NLP.</p>
<p><br />
<strong><em>A few updates on the NLP Newsletter and on dair.ai:</em></strong></p>
<ul>
<li>In one of our upcoming talks, Dr. Juan M. Banda will discuss the motivation and rationale behind his Social Media Mining Toolkit (<a href="https://github.com/thepanacealab/SMMT">SMMT</a>) and how to use it to define frameworks for large-scale social media data gathering for NLP and ML research projects. He will outline all the lessons learned, mistakes, and hard decisions made to produce and maintain a large-scale dataset of COVID-19 Twitter conversations. The dataset comprises more than <a href="https://zenodo.org/record/3911930">424 million tweets in 60+ languages and from 60+ countries</a>.</li>
<li>We recently hosted a live stream on <a href="https://www.youtube.com/watch?v=O2TZPrwhPhE">how to get started with NLP</a>. If you are just getting started with NLP and are looking for research tips, feel free to check out the discussion. If you would like to be notified of future streams, you can subscribe to the <a href="https://www.youtube.com/watch?v=O2TZPrwhPhE">YouTube channel</a> or the <a href="https://www.meetup.com/dair-ai/">Meetup page</a>.</li>
<li>In the <a href="https://www.meetup.com/dair-ai/events/271794687/">upcoming paper discussion</a>, we will discuss the paper titled “Deep Learning Based Text Classification: A Comprehensive Overview”.</li>
</ul>
<h1 id="publications-">Publications 📙</h1>
<p><strong><em>Beyond Accuracy: Behavioral Testing of NLP Models with CheckList</em></strong>
<br />
One of the most common strategies for measuring generalization in NLP models is the evaluation of held-out test sets. While useful, this approach has two major drawbacks: overestimating a model’s generalization capability and failing to determine its failure points. In a work presented at this year’s ACL (which also won the best paper award), <a href="https://www.aclweb.org/anthology/2020.acl-main.442">Ribeiro et al. (2020)</a> propose a more comprehensive evaluation methodology that is both model-agnostic and task-agnostic.</p>
<p><br />
It applies the behavioral-testing principle of “decoupling testing from implementation” by treating the model as a black box, which makes it possible to compare different models trained on different data. The <a href="https://github.com/marcotcr/checklist">code</a>, with the help of templates and other abstractions, allows users to easily generate a large number of test cases. The work also contains multiple user studies demonstrating the effectiveness of this framework in identifying failure points in both commercial and state-of-the-art models.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*TEs-JJz3P2_o2eYxJgaQYA.png" alt="" /></p>
<p><em>CheckListing a commercial sentiment analysis model (</em><a href="https://www.aclweb.org/anthology/2020.acl-main.442.pdf"><em>source</em></a><em>)</em></p>
<p><br />
<strong><em>TaBERT: a new model for understanding queries over tabular data</em></strong>
<br />
Most approaches today are trained to learn from free-form language but not from database tables. <a href="https://ai.facebook.com/blog/tabert-a-new-model-for-understanding-queries-over-tabular-data">TaBERT</a> is the first model to support a joint understanding of natural language sentences and tabular data. This joint understanding of information across different data formats matters in areas such as question understanding, where a model needs the ability to semantically parse over databases in order to address a query. TaBERT can enable different business use cases, such as directly asking questions about a product when the answer exists in a particular e-commerce database of products or transactions.</p>
<p><br />
<strong><em>Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data</em></strong>
<br />
In another award-winning <a href="https://www.aclweb.org/anthology/2020.acl-main.463">publication</a> from this year’s ACL conference, professors Emily M. Bender and Alexander Koller advocate for a clear understanding of the distinction between form and meaning in contemporary NLP research. Focusing on the debate around whether large pretrained language models like BERT and GPT-2 “understand” language, they make the following argument: “<em>the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning</em>.” The paper also contains several insightful thought experiments, including one they call “the octopus test”. Finally, the authors call for a more top-down approach to future research in NLP, and propose some best practices for navigating the challenges that lie therein.</p>
<p><br />
<strong><em>Tackling Climate Change with Machine Learning</em></strong>
<br />
Can we use machine learning methods to reduce greenhouse gas emissions? This <a href="https://arxiv.org/abs/1906.05433v2">paper</a> looks at the landscape of machine learning methods for mitigating this issue and potentially tackling climate change. Besides providing a comprehensive survey of ML methods applied in different sectors (e.g., transportation, electricity systems, buildings and cities) and to specific problems (e.g., urban planning, supply-chain optimization), the authors also provide recommendations, a call for collaboration, and business opportunities for tackling climate change.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*pkgq7AR0Df0De2cq.png" alt="" /></p>
<p><em>Figure by <a href="https://arxiv.org/abs/1906.05433v2">Rolnick et al. (2020)</a></em></p>
<p><br />
<strong><em>Contextual embeddings: when are they worth it?</em></strong>
<br />
Deep contextual embeddings such as ELMo and BERT have been widely used in industry in recent years, in addition to enabling rapid progress on several benchmarks such as GLUE. Beyond the significant time and memory costs of training, there are additional costs when fine-tuning or running inference on downstream tasks. In a work presented at the recently concluded ACL, <a href="https://www.aclweb.org/anthology/2020.acl-main.236">Arora et al. (2020)</a> evaluate the benefits of using BERT embeddings compared to non-contextual embeddings (GloVe, random). Through their experiments on downstream benchmark tasks such as named entity recognition (NER), sentiment analysis, and natural language understanding tasks (GLUE), they show that it is often possible to get within 5-10% of the absolute performance of BERT embeddings using GloVe or random embeddings.</p>
<p><br />
<strong><em>Defining and evaluating fair natural language generation</em></strong>
<br />
Presented at the ACL 2020 Widening NLP Workshop, the <a href="http://www.winlp.org/wp-content/uploads/2020/final_papers/45_Paper.pdf">work</a> of Catherine Yeo and Alyssa Chen focuses on the biases that appear in the language generation task of sentence completion. In particular, they present a mathematical framework for fairness, followed by an evaluation of gender bias in GPT-2 and XLNet. Their analysis provides a theoretical formulation for defining bias in NLG and empirical evidence that existing language generation models embed gender bias.</p>
<p><br />
<strong><em>Smart To-Do: automatic generation of to-do items from emails</em></strong>
<br />
Many of us are familiar with the <a href="https://research.google/pubs/pub45189/">Smart Reply</a> feature we see in our email applications. <a href="https://www.aclweb.org/anthology/2020.acl-main.767">Mukherjee et al. (2020)</a> explore a new way to boost user productivity by automatically creating to-do lists from email threads. What sets this task apart from other language generation tasks (e.g., email conversation summarization, news headlines) is its <em>action-focused</em> nature, i.e., identifying the specific task(s) to be performed.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*IKKk0Eqm2hRW-yMQcbigIg.png" alt="" /></p>
<p><em>Sample To-Do list generation (</em><a href="https://www.aclweb.org/anthology/2020.acl-main.767.pdf"><em>source</em></a><em>)</em></p>
<h1 id="outils-et-jeux-de-données-️">Outils et jeux de données ⚙️</h1>
<p><strong><em>Transformer v3.0</em></strong>
<br />
L’équipe de Hugging Face a <a href="https://github.com/huggingface/transformers/releases/tag/v3.0.0">publié</a> une nouvelle version de sa librairie Transformers. Dans la nouvelle version 3.0 de Transformers, ils ont amélioré la documentation, renforcé les capacités de tokenisation et proposé plusieurs améliorations et ajouts de modèles.</p>
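<p>If you have not tried the library before, the high-level <code>pipeline</code> API is the quickest way to experiment (a minimal sketch; the default model it downloads is chosen by the library, not by us):</p>
<pre><code class="language-python">
from transformers import pipeline

# One line to get a ready-to-use model for a standard task.
classifier = pipeline("sentiment-analysis")
print(classifier("The new tokenizer API is much nicer to work with."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
</code></pre>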
<p><br />
<strong><em>Texthero</em></strong>
<br />
<a href="https://texthero.org/">Texthero</a> est une boîte à outils Python permettant de travailler plus efficacement avec des ensembles de données textuelles. Elle peut être utilisée par les personnes qui se lancent dans le NLP et qui cherchent à construire rapidement un pipeline NLP pour comprendre les données avant de les modéliser. C’est également un excellent outil pour enseigner les concepts de NLP puisqu’il offre une API pour interagir facilement et efficacement avec des ensembles de données textuelles.</p>
<p><br />
<strong><em>Papers with Code Methods</em></strong>
<br />
The Papers with Code team recently launched a new feature called <a href="https://paperswithcode.com/methods">Methods</a> that lets users better search, browse, and discover the various building blocks of machine learning, such as optimizers, activation functions, attention, and much more. With this feature, you can now easily track progress on methods in NLP. You can even follow the usage of these methods over time and the tasks they support.
<br />
<img src="https://cdn-images-1.medium.com/max/800/1*ew_6dxwMIWZt6qSBQus5QQ.png" alt="" /></p>
<p><a href="https://paperswithcode.com/methods"><em>Papers with Code</em></a></p>
<p><br />
<strong><em>Code Finder for Research Papers</em></strong>
<br />
<a href="https://chrome.google.com/webstore/detail/code-finder-for-research/aikkeehnlfpamidigaffhfmgbkdeheil">Cette extension de navigateur gratuite récemment publiée</a> est utile pour trouver et afficher automatiquement des liens vers des implémentations de code pour les documents ML n’importe où sur le web, comme Google Search, Arxiv, Twitter, Scholar et d’autres sites.</p>
<h1 id="articles-et-blog-️">Articles et Blog ✍️</h1>
<p><strong><em>Rendre multilingue des phrases monolingues par la distillation des connaissances</em></strong>
<br />
Le codage de la sémantique des mots et des phrases est une chose que nous considérons comme allant de soi et dont les systèmes de NLP de pointe sont capables. SentenceBERT fournit des exemples illustrant la manière dont nous pouvons utiliser au mieux les architectures basées sur des transformers dans des tâches telles que le regroupement et la similarité sémantique des textes. Ce modèle est toutefois limité au traitement de séquences de texte provenant d’une seule langue, ce qui, dans certains cas, peut être le facteur qui vous empêche de déployer un tel modèle en production. Il serait donc intéressant de trouver un moyen d’étendre ces modèles au domaine du multilinguisme, ce que Reimers et al. étudient dans leur travail intitulé “<a href="https://arxiv.org/pdf/2004.09813.pdf"><em>Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation</em></a>”.
Cet <a href="https://medium.com/dair-ai/making-monolingual-sentence-embeddings-multilingual-using-knowledge-distillation-59d8a7713672">article</a> fournit un résumé de ce travail, où Viktor Karlsson partage également ses réflexions sur la contribution et les résultats des auteurs.</p>
<p><br />
<strong><em>DeViSE in PyTorch</em></strong>
<br />
This <a href="https://medium.com/@vijayabhaskar96/fun-project-devise-on-pytorch-83eb09694d41">blog post</a> presents the Deep Visual-Semantic embedding model (DeViSE) implemented in PyTorch. DeViSE uses the word vectors of the labels as targets, which makes it easier to learn the semantic meaning of the labels. The model is trained to identify visual objects using both labeled image data and semantic information from unannotated text. These models can then be used to generate interesting results for tasks such as keyword search, reverse image search, and searching for images by keyword.</p>
<p><br />
<strong><em>Discovering protein structure and function through language modeling</em></strong>
<br />
Language models have proven very effective at encoding sequential information such as natural language sentences, which is useful for building highly predictive models that handle a wide range of NLP tasks. As they have improved, Transformer models have also been adopted in other domains, such as <a href="https://ai.facebook.com/blog/end-to-end-object-detection-with-transformers/">computer vision for object detection</a>. It is no surprise that the underlying attention mechanism used in these language models can be applied effectively to other challenging, high-impact problems, such as discovering protein structure.</p>
<p><br />
<a href="https://blog.einstein.ai/provis/">Recent work</a> from a Salesforce research group shows the potential of using a Transformer language model to recover the structure and functional properties of proteins by training the model to predict masked amino acids in a protein sequence. Since this information can be processed sequentially, a pretraining strategy similar to BERT’s can be applied to large-scale unlabeled protein sequences. The attention mechanism is shown to capture contact relationships that could be useful for predicting protein interactions and fueling scientific discovery in biology.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*H20KG_4EnTW7iNSq.png" alt="" /></p>
<p><em>Figure source:</em> <a href="https://blog.einstein.ai/provis/"><em>Salesforce Einstein</em></a></p>
<p><br />
<strong><em>Cleaning text data with dynamic embedding visualisation</em></strong>
<br />
Having high-quality training data is paramount for machine translation tasks, and such data is harder to come by for low-resource languages. In this <a href="https://t.co/JmAJn6L6HG">blog post</a>, Morgan McGuire demonstrates how to use techniques such as multilingual contextual embeddings, together with dimensionality reduction using UMAP, to interactively identify noisy clusters and remove them in order to improve datasets. One animation in the post shows a random noisy cluster containing Arabic text and website footers within an Irish-English dataset.</p>
<h1 id="education-">Education 🎓</h1>
<p><strong><em>Ethical & Responsible NLP</em></strong>
<br />
Rachel Tatman tackles important topics in her keynote <a href="https://slideslive.com/38929585/what-i-wont-build">What I won’t build</a> at <a href="https://twitter.com/WiNLPWorkshop">@WiNLPWorkshop</a>. Researchers and practitioners need to determine whether the systems they build can cause harm, discrimination, or privacy violations. Rachel invites us to ask a series of questions about the users, the use of the system, and its effect on systemic inequality. She also advocates raising others’ awareness of the risks and shows how organizing a coordinated effort can produce results.</p>
<p><br />
<strong><em>Full Stack Deep Learning</em></strong>
<br />
This new course called <a href="https://course.fullstackdeeplearning.com/">Full Stack Deep Learning</a> aims to provide the knowledge needed to deploy deep learning models in production. Among other topics, this free online course covers setting up machine learning projects, data management, training and debugging, and testing and deployment.
<br />
<img src="https://cdn-images-1.medium.com/max/800/0*ps_3B2O9_nuwIZLW.png" alt="" /></p>
<p><br />
<strong><em>Reinforcement Learning Tutorial</em></strong>
<br />
<a href="https://github.com/eemlcommunity/PracticalSessions2020/blob/master/rl/EEML2020_RL_Tutorial.ipynb">In this reinforcement learning tutorial</a> (available as a Google Colab), Feryal demonstrates important RL concepts, including algorithms such as policy iteration, Q-learning, and neural fitted Q. In addition, a short introduction to deep reinforcement learning is also covered, including explanations and code for the deep Q-network (DQN) algorithm.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*bMU-UL-wPPGmZMKo.png" alt="" /></p>
<h1 id="rester-informé-">Rester informé 🎯</h1>
<p>Si vous cherchez d’autres aperçus et faits marquants de l’ACL de cette année, les liens suivants peuvent vous intéresser :</p>
<ul>
<li><a href="https://medium.com/@vered1986/highlights-of-acl-2020-4ef9f27a4f0c">Highlights of ACL 2020</a> (by Vered Shwartz)</li>
<li><a href="https://medium.com/@yoav.goldberg/the-missing-pieces-in-virtual-acl-a05327cf9a18">The missing pieces in virtual-ACL</a> (by Yoav Goldberg)</li>
<li><a href="https://medium.com/@maggie0/top-takeaways-from-an-acl-2020-mentoring-session-on-career-planning-becoming-a-research-leader-5c79ce75b98c">Takeaways from ACL 2020 Mentoring Session on Career Planning & becoming a research leader</a> (by Zhijing Jin)</li>
</ul>
<p>Quelques suggestions de challenges en cours si vous cherchez des idées pour démarrer et mettre en pratique vos connaissances en NLP et en ML :</p>
<ul>
<li><a href="https://dravidian-codemix.github.io/2020/index.html">Dravidian-CodeMix</a> - analyser des sentiments pour les langues dravidiennes dans le texte codé que l’on trouve dans les médias sociaux</li>
<li><a href="http://nlc2cmd.us-east.mybluemix.net/">NLC2CMD</a> - traduire les descriptions en anglaise des dans leur syntaxe Bash correspondante</li>
<li><a href="https://knowledgepit.ml/predicting-escalations-in-customer-support/">IEEE BigData 2020 Cup</a> - un défi d’exploration de données pour prédire l’augmentation de l’assistance technique aux clients en utilisant des techniques de langage naturel</li>
</ul>
<h1 id="mentions-spéciales-️">Mentions spéciales ⭐️</h1>
<ul>
<li>Amit Chaudhary a publié un <a href="https://amitness.com/2020/06/fasttext-embeddings/">article</a> qui passe en revue les défis posés par l’algorithme Word2Vec et la façon dont FastText les résout en utilisant des informations de sous-mots.</li>
<li>Dans cet article, George Ho <a href="https://eigenfoo.xyz/transformers-in-nlp/">résume</a> les tendances récentes en NLP. Il fournit un bref résumé des méthodes récentes, y compris d’autres aspects de ces modèles tels que la mise à l’échelle et les différences de représentation.</li>
<li>Kostas Stathoulopoulos a créé cet <a href="http://acl-explorer.eu-west-2.elasticbeanstalk.com/">outil de recherche</a> pour explorer et découvrir des articles récents ou passés de l’ACL . Vous pouvez rechercher des publications par auteur, le domaine d’étude, l’année, le titre de l’article, etc.</li>
<li><a href="https://www.dampfkraft.com/nlp/cutlet-python-romaji-converter.html">cutlet</a> permet de convertir le japonais en romaji. Contrairement aux outils existants, il utilise le même dictionnaire qu’un tokenizer japonais commun et a la possibilité d’utiliser l’orthographe originale pour les mots de prêt étrangers.</li>
</ul>
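<p>A minimal usage sketch of cutlet (the example sentence and printed output are our own illustration):</p>
<pre><code class="language-python">
import cutlet

katsu = cutlet.Cutlet()            # defaults to Hepburn romanization
print(katsu.romaji("カツカレーは美味しい"))
# e.g. "Katsu karee wa oishii"
</code></pre>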
<hr />
<p>You can find the previous newsletter <a href="https://dair.ai/NLP_Newsletter_-12_-FR/">here</a></p>
<p><br />
If you have any datasets, projects, blog posts, tutorials, or papers that you would like to share in the next edition of the newsletter, you can use this <a href="https://forms.gle/3b7Q2w2bzsXE6uYo9">form</a>.</p>
<p><br />
<a href="https://dair.ai/newsletter/">Subscribe</a> to receive future issues in your inbox.</p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_-13_-FR/">NLP Newsletter 13 [FR]: ACL Highlights, TaBERT, Texthero, ML Methods, Climbing towards NLU,…</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on July 15, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_13_en2020-07-13T00:00:00+00:002020-07-13T00:00:00+00:00Elvis Saraviahttps://dair.aiellfae@gmail.com
<p><img src="https://cdn-images-1.medium.com/max/1200/1*DFP4TyFn1lS2rNK8au2H2Q.png" alt="" /></p>
<p><br />
Hello everyone! Welcome to the 13th issue of the NLP Newsletter. In this issue, we cover topics that range from interesting works presented at the <a href="https://acl2020.org/">ACL conference</a> to tools for improving the exploration of papers and code to several useful NLP tool recommendations.</p>
<p><br />
Special thanks to <a href="https://twitter.com/_skeshaw">Keshaw Singh</a> and <a href="https://twitter.com/manisnesan">Manikandan Sivanesan</a> for significantly contributing towards this edition of the NLP Newsletter.</p>
<h1 id="dairai-updates">dair.ai updates</h1>
<ul>
<li>In one of our upcoming talks, Dr. Juan M. Banda will discuss the motivation and rationale behind their Social Media Mining Toolkit (<a href="https://github.com/thepanacealab/SMMT">SMMT</a>), and how to use it to define frameworks for large-scale social media data gathering for NLP and machine learning research projects. They will outline all the lessons learned, mistakes, and hard decisions made to produce and maintain a publicly available large-scale dataset of COVID-19 Twitter chatter data featuring over <a href="https://zenodo.org/record/3911930">424 Million Tweets in 60+ languages and from 60+ countries</a>.</li>
<li>We recently hosted a live stream talking about <a href="https://www.youtube.com/watch?v=O2TZPrwhPhE">how to get started with NLP</a>. If you are just getting started with NLP and you are looking for research tips feel free to check out the talk. We are hosting more live streams like this in the future, so if you want to get notified you can either subscribe to the <a href="https://www.youtube.com/watch?v=O2TZPrwhPhE">YouTube channel</a> or the <a href="https://www.meetup.com/dair-ai/">Meetup page</a>.</li>
<li>In the <a href="https://www.meetup.com/dair-ai/events/271794687/">upcoming paper discussion</a>, we will discuss the paper titled “Deep Learning Based Text Classification: A Comprehensive Overview”.</li>
</ul>
<h1 id="research-and-publications-">Research and Publications 📙</h1>
<p><strong><em>Beyond Accuracy: Behavioral Testing of NLP Models with CheckList</em></strong></p>
<p><br />
One of the most common strategies for measuring generalization in NLP models is via evaluation of held-out test sets. While useful, this approach has two significant drawbacks — overestimation of a model’s generalization capability, and the inability to determine its failure points. In a work presented at this year’s ACL (which also won the best paper award), <a href="https://www.aclweb.org/anthology/2020.acl-main.442">Ribeiro et al. (2020)</a> propose a more comprehensive evaluation methodology which is model-agnostic as well as task-agnostic.
It applies the behavioral testing principle of “decoupling testing from implementation” by treating the model as a black box, allowing comparison of different models trained on different data. The <a href="https://github.com/marcotcr/checklist">code</a>, with the help of templates and other abstractions, allows users to generate a large number of test cases easily. The work also contains multiple user studies, demonstrating the effectiveness of this framework in identifying critical failure points in both commercial and state-of-the-art models.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*TEs-JJz3P2_o2eYxJgaQYA.png" alt="" /></p>
<p><em>CheckListing a commercial sentiment analysis model (</em><a href="https://www.aclweb.org/anthology/2020.acl-main.442.pdf"><em>source</em></a><em>)</em></p>
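<p>For a flavor of how the templating works, here is a minimal sketch using the <a href="https://github.com/marcotcr/checklist">checklist</a> package; the calls follow its public README, so treat the exact arguments as an assumption:</p>
<pre><code class="language-python"># Generate many sentiment test cases from a single template.
from checklist.editor import Editor

editor = Editor()
# "{a:adj}" fills the slot with each adjective and picks "a"/"an" automatically.
ret = editor.template("This is {a:adj} movie.", adj=["good", "great", "awful", "boring"])
print(ret.data[:2])  # ['This is a good movie.', 'This is a great movie.']
</code></pre>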
<p><br />
<strong><em>TaBERT: A new model for understanding queries over tabular data</em></strong></p>
<p><br />
Most approaches today are trained to learn from free-form natural language but not from database (DB) tables. <a href="https://ai.facebook.com/blog/tabert-a-new-model-for-understanding-queries-over-tabular-data">TaBERT</a> is the first model to support a joint understanding of natural language sentences and tabular data. This joint understanding of information across different data formats is important in areas such as question answering, where a model needs the ability to semantically parse over a database to answer a query. TaBERT can enable different business use cases, such as directly asking questions about a product where the answer lives in a particular e-commerce database of products or transactions.</p>
<p><br />
<strong><em>Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data</em></strong></p>
<p><br />
In another award-winning <a href="https://www.aclweb.org/anthology/2020.acl-main.463">publication</a> from this year’s ACL conference, professors Emily M. Bender and Alexander Koller advocate for a clear understanding of the distinction between form and meaning in contemporary NLP research. Focusing on the debate around whether large pretrained language models like BERT and GPT-2 “understand” language, they argue the following — “<em>the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning</em>.” The paper also contains several insightful thought experiments, including one they call “the octopus test.” Finally, the authors call for a more top-down approach to future research in NLP, and propose some best practices on how to navigate the challenges that lie therein.</p>
<p><br />
<strong><em>Tackling Climate Change with Machine Learning</em></strong></p>
<p><br />
Can we use machine learning methods to reduce greenhouse gas emissions? This <a href="https://arxiv.org/abs/1906.05433v2">recent paper</a> surveys the landscape of ML methods for mitigating this issue and potentially tackling climate change. Besides providing a comprehensive survey of ML methods applied in different sectors (e.g., transportation, electricity systems, buildings and cities) and to specific problems (e.g., urban planning, optimizing supply chains), the authors also provide recommendations, a call for collaboration, and business opportunities to tackle climate change.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*pkgq7AR0Df0De2cq.png" alt="" /></p>
<p><em>Figure by</em> <a href="https://arxiv.org/abs/1906.05433v2"><em>Rolnick et al. (2020)</em></a></p>
<p><br />
<strong><em>Contextual Embeddings: When Are They Worth It?</em></strong></p>
<p><br />
Deep contextual embeddings like ELMo and BERT have found widespread use in industry in recent years, in addition to enabling rapid progress on several benchmarks like GLUE. Besides incurring significant time and memory costs during pretraining, there are additional costs during fine-tuning and inference on downstream tasks as well. In a work presented at the recently concluded ACL, <a href="https://www.aclweb.org/anthology/2020.acl-main.236">Arora et al. (2020)</a> assess the benefits of using BERT embeddings relative to non-contextual (GloVe, random) embeddings. Through experiments on benchmark downstream tasks such as named entity recognition (NER), sentiment analysis, and natural language understanding (GLUE), they show that it is often possible to get within 5–10% of the absolute performance of BERT embeddings when using GloVe or random embeddings.</p>
<p><br />
<strong><em>Defining and Evaluating Fair Natural Language Generation</em></strong></p>
<p><br />
Presented at the ACL 2020 Widening NLP Workshop, the <a href="http://www.winlp.org/wp-content/uploads/2020/final_papers/45_Paper.pdf">work</a> of Catherine Yeo and Alyssa Chen focuses on the biases that emerge in the language generation task of sentence completion. In particular, they introduce a mathematical framework of fairness for NLG followed by an evaluation of gender biases in GPT-2 and XLNet. Their analysis provides a theoretical formulation for defining biases in NLG and empirical evidence that existing language generation models embed gender bias.</p>
<p><br />
<strong><em>Smart To-Do: Automatic Generation of To-Do Items from Emails</em></strong></p>
<p><br />
Many of us are familiar with the <a href="https://research.google/pubs/pub45189/">Smart Reply</a> feature we see on our email applications. <a href="https://www.aclweb.org/anthology/2020.acl-main.767">Mukherjee et al. (2020)</a> explore a new way to boost user productivity, by automatically creating to-do lists from email threads. What differentiates this task from other language generation tasks (e.g. email conversation summarization, news headlines) is its <em>action-focused</em> nature, i.e., identifying specific task(s) to be performed.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*IKKk0Eqm2hRW-yMQcbigIg.png" alt="" /></p>
<p><em>Sample To-Do list generation (</em><a href="https://www.aclweb.org/anthology/2020.acl-main.767.pdf"><em>source</em></a><em>)</em></p>
<h1 id="tools-and-datasets-️">Tools and Datasets ⚙️</h1>
<p><strong><em>Transformers v3.0</em></strong></p>
<p><br />
The Hugging Face team has <a href="https://github.com/huggingface/transformers/releases/tag/v3.0.0">released</a> a new version of their popular Transformers library. Transformers v3.0 brings improved documentation, enhanced tokenization capabilities, and several model improvements and additions.</p>
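<p>As a quick taste of the library, the high-level <code>pipeline</code> API wraps tokenization and a pretrained model behind a single call; a minimal sketch (the default checkpoint it downloads is chosen by the library, not specified here):</p>
<pre><code class="language-python">from transformers import pipeline

# Downloads a default sentiment model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("The new tokenizers are noticeably faster."))
# [{'label': 'POSITIVE', 'score': 0.99...}]
</code></pre>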
<p><br />
<strong><em>Texthero</em></strong></p>
<p><br />
<a href="https://texthero.org/">Texthero</a> is a Python toolkit for working more efficiently with text-based datasets. It can be used by people getting started in NLP and who are seeking to quickly build a NLP pipeline to understand the data before modeling. This is also a great tool for teaching about NLP concepts since it offers a high-level API to easily and efficiently interact with textual datasets.</p>
<p><br />
<strong><em>Papers with Code Methods</em></strong></p>
<p><br />
The Papers with Code team recently released a new feature called <a href="https://paperswithcode.com/methods">Methods</a> that allows users to better search, navigate, and learn about the different building blocks of machine learning such as optimizers, activations, attention, and much more. With this feature, you can now easily track methodological progress in the field of NLP, monitor the usage of these methods over time, and see which tasks they support.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*ew_6dxwMIWZt6qSBQus5QQ.png" alt="" /></p>
<p><a href="https://paperswithcode.com/methods"><em>Papers with Code</em></a></p>
<p><br />
<strong><em>Code Finder for Research Papers</em></strong></p>
<p><br />
This recently released free <a href="https://chrome.google.com/webstore/detail/code-finder-for-research/aikkeehnlfpamidigaffhfmgbkdeheil">browser extension</a> is useful for automatically finding and showing links to code implementations for ML papers anywhere on the web such as Google Search, Arxiv, Twitter, Scholar, and other sites.</p>
<h1 id="articles-and-blog-posts-️">Articles and Blog posts ✍️</h1>
<p><strong><em>Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation</em></strong></p>
<p><br />
Encoding the semantics of words and sentences is a capability we often take for granted in state-of-the-art NLP systems. SentenceBERT provides illustrative examples of how to make the best use of Transformer-based architectures in tasks such as clustering and semantic textual similarity. The model is, however, limited to processing text from a single language, which in some cases can prevent you from deploying it to production. It would therefore be interesting to extend these models into the realm of multilinguality, which is what Reimers et al. study in their work titled “<a href="https://arxiv.org/pdf/2004.09813.pdf"><em>Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation</em></a><em>”.</em> This <a href="https://medium.com/dair-ai/making-monolingual-sentence-embeddings-multilingual-using-knowledge-distillation-59d8a7713672">article</a> provides a summary of the work, in which Viktor Karlsson also shares his thoughts and reflections on the authors’ contributions and findings.</p>
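<p>To make this concrete, here is a minimal sketch using the <a href="https://github.com/UKPLab/sentence-transformers">sentence-transformers</a> package with one of the distilled multilingual checkpoints released alongside this line of work (treat the exact checkpoint name as an assumption):</p>
<pre><code class="language-python">import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased")
emb = model.encode(["The cat sits on the mat.", "Le chat est assis sur le tapis."])
# Parallel sentences across languages should land close together in embedding space.
cos = np.dot(emb[0], emb[1]) / (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1]))
print(round(float(cos), 3))
</code></pre>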
<p><br />
<strong><em>DeViSe on PyTorch</em></strong></p>
<p><br />
This <a href="https://medium.com/@vijayabhaskar96/fun-project-devise-on-pytorch-83eb09694d41">blog post</a> showcases the Deep Visual-Semantic (DeViSe) embedding model implemented in PyTorch. DeViSe uses word vectors of labels as the targets which facilitate the learning of the semantic meaning of the labels. The model is trained to identify the visual objects using labeled image data & semantic information from unannotated text. Such models can then be used to generate interesting results for tasks such as keyword search, image reverse search, and image to keyword search.</p>
<p><br />
<strong><em>Discovering Protein Structure and Function Through Language Modeling</em></strong></p>
<p><br />
Transformer language models have been shown to be very effective at encoding sequential information, such as natural language sentences, which is useful for building highly predictive models that perform a wide range of NLP tasks. As Transformer models keep improving, they have also seen adoption in other areas, such as <a href="https://ai.facebook.com/blog/end-to-end-object-detection-with-transformers/">computer vision for object detection</a>. It should come as no surprise that the underlying attention mechanism used in these language models can be applied effectively to other difficult, high-impact problems such as discovering protein structure.</p>
<p><br />
<a href="https://blog.einstein.ai/provis/">Recent work</a> from a Salesforce research group shows the potential of using a Transformer language model to recover the high-level structure and functional properties of proteins by training the model to predict masked amino acids in a protein sequence. As this information can be processed sequentially, a similar BERT-based language model pretraining strategy can be used and applied to large scale unlabeled protein sequences. The attention mechanism is shown to capture contact relationships which could be useful for protein interaction prediction and fuel scientific discovery in Biology.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*H20KG_4EnTW7iNSq.png" alt="" /></p>
<p><em>Figure source:</em> <a href="https://blog.einstein.ai/provis/"><em>Salesforce Einstein</em></a></p>
<p><br />
<strong><em>Text Data Cleanup using Dynamic Embedding Visualisation</em></strong></p>
<p><br />
Having high-quality training data for machine translation tasks is paramount and is even more challenging for low-resource languages. In this <a href="https://t.co/JmAJn6L6HG">blog post</a>, Morgan McGuire demonstrates how to use techniques such as multilingual contextual embeddings, along with dimensionality reduction via UMAP, to interactively identify noisy clusters and remove them to improve the datasets. The following animation showcases a random noisy cluster containing Arabic and website-footer data in an Irish-English dataset.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*xC4HkE_Vx-Q9YDRP.gif" alt="" /></p>
<p><a href="https://www.ntentional.com/images/copied_from_nb/my_icons/20200629_text_clustering/bokeh4_2020-06-30.gif"><em>Source</em></a></p>
<h1 id="education-">Education 🎓</h1>
<p><strong><em>Ethical & Responsible NLP</em></strong></p>
<p><br />
Rachel Tatman addresses important topics in her keynote <a href="https://slideslive.com/38929585/what-i-wont-build">What I won’t build</a> at <a href="https://twitter.com/WiNLPWorkshop">@WiNLPWorkshop</a>. Researchers and practitioners should determine whether the systems they build can cause harm, discriminate, or invade privacy. Rachel urges us to ask a series of questions about a system’s users, its usage, and its effect on systemic inequality. She also advocates for educating others about the risks and shows how an organized, coordinated effort can deliver results.</p>
<p><br />
<strong><em>Full Stack Deep Learning</em></strong></p>
<p><br />
This new course called <a href="https://course.fullstackdeeplearning.com/">Full Stack Deep Learning</a> aims to provide the knowledge needed to deploy deep learning models in production. Topics covered in this free online course include how to set up machine learning projects, data management, training and debugging, and testing and deployment, among others.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*ps_3B2O9_nuwIZLW.png" alt="" /></p>
<p><br />
<strong><em>Reinforcement Learning Tutorial</em></strong></p>
<p><br />
In this comprehensive reinforcement learning <a href="https://github.com/eemlcommunity/PracticalSessions2020/blob/master/rl/EEML2020_RL_Tutorial.ipynb">tutorial</a> (available as a Google Colab), Feryal demonstrates important RL concepts, including algorithms such as policy iteration, Q-learning, and Neural Fitted Q. A short introduction to deep reinforcement learning is also covered, with explanations and code for the Deep Q-Network (DQN) algorithm.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*bMU-UL-wPPGmZMKo.png" alt="" /></p>
<h1 id="stay-informed-">Stay Informed 🎯</h1>
<p>If you are looking for further overviews and highlights from ACL this year the following links may interest you:</p>
<ul>
<li><a href="https://medium.com/@vered1986/highlights-of-acl-2020-4ef9f27a4f0c">Highlights of ACL 2020</a> (by Vered Shwartz)</li>
<li><a href="https://medium.com/@yoav.goldberg/the-missing-pieces-in-virtual-acl-a05327cf9a18">The missing pieces in virtual-ACL</a> (by Yoav Goldberg)</li>
<li><a href="https://medium.com/@maggie0/top-takeaways-from-an-acl-2020-mentoring-session-on-career-planning-becoming-a-research-leader-5c79ce75b98c">Takeaways from ACL 2020 Mentoring Session on Career Planning & becoming a research leader</a> (by Zhijing Jin)</li>
</ul>
<p>We would also like to suggest some ongoing challenges if you are looking for ideas to get started and apply NLP and machine learning in practice:</p>
<ul>
<li><a href="https://dravidian-codemix.github.io/2020/index.html">Dravidian-CodeMix</a> — sentiment analysis for Dravidian languages in the code-mixed text found in social media</li>
<li><a href="http://nlc2cmd.us-east.mybluemix.net/">NLC2CMD</a> — translate English descriptions of command-line tasks to their corresponding Bash syntax</li>
<li><a href="https://knowledgepit.ml/predicting-escalations-in-customer-support/">IEEE BigData 2020 Cup</a> — a data mining challenge to predict escalations in customer technical support using natural language techniques</li>
</ul>
<h1 id="noteworthy-mentions-️">Noteworthy Mentions ⭐️</h1>
<ul>
<li>Amit Chaudhary published an <a href="https://amitness.com/2020/06/fasttext-embeddings/">article</a> that goes over challenges with the Word2Vec algorithm and how FastText solves those challenges by using sub-word information.</li>
<li>In this blog post, George Ho <a href="https://eigenfoo.xyz/transformers-in-nlp/">summarizes</a> the recent general trends in natural language processing. He provides a short summary of recent methods including other aspects of these models such as scaling and differences in representations.</li>
<li>Kostas Stathoulopoulos built this <a href="http://acl-explorer.eu-west-2.elasticbeanstalk.com/">search tool</a> for exploring and discovering recent and past ACL papers. You can search for publications by author, the field of study, year, paper title, etc.</li>
<li><a href="https://www.dampfkraft.com/nlp/cutlet-python-romaji-converter.html">cutlet</a> converts Japanese to romaji. Unlike existing tools, it uses the same dictionary as a common Japanese tokenizer and has the option to use the original spelling for foreign loanwords.</li>
</ul>
<p><a href="https://dair.ai/newsletter/"><em>Subscribe</em></a> <em>🔖 to the NLP Newsletter to receive future issues in your inbox.</em></p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_13_en/">NLP Newsletter 13 [EN]: ACL Highlights, TaBERT, Texthero, ML Methods, Climbing towards NLU,…</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on July 13, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_#12_[FR]2020-07-02T00:00:00+00:002020-07-02T00:00:00+00:00Loïck BOURDOIShttps://dair.ai
<p><img src="https://cdn-images-1.medium.com/max/1200/1*g36Zf0zqinVfWEfocBa0gA.png" alt="" /></p>
<h1 id="avant-propos-delvis">Avant-propos d’Elvis</h1>
<p>Bienvenue au douzième numéro de la lettre d’information consacrée au NLP.</p>
<p><br />
It has been about a month since we last published an issue of the newsletter. The hiatus is over and we are happy to bring you more of the work coming out of the machine learning and natural language processing communities over the past few weeks.</p>
<p><br />
We have taken the time to think about how to improve the newsletter and have received excellent feedback. Thank you for your support.</p>
<p><br />
<strong><em>A few updates on the NLP Newsletter and dair.ai:</em></strong></p>
<p><br />
The dair.ai community has been producing incredible work and helping to further democratize education, research, and technology. Here is what we have been up to over the past few weeks:</p>
<ul>
<li><a href="https://github.com/dair-ai/ml-visuals">ML Visuals</a> est un nouvel effort de collaboration pour aider la communauté d’apprentissage machine à améliorer la communication scientifique en fournissant gratuitement des visuels et des figures professionnelles en rapport avec ML. Vous êtes libre d’utiliser les visuels dans vos présentations ou vos articles de blog.</li>
<li>Notre <a href="https://github.com/dair-ai/ml-nlp-paper-discussions">discussion hebdomadaire</a> tente de réunir des experts et des débutants pour s’informer mutuellement sur les publications parues récemment en NLP et ML. Il n’y a pas d’exigences pour participer, il suffit d’apporter votre volonté d’apprendre et nous serons heureux de vous aider en répondant à vos questions et en nous engageant dans des discussions plus approfondies sur les articles de ML.</li>
<li>Début août, nous lancerons notre premier groupe d’étude. Nous couvrirons l’excellent livre intitulé <a href="https://d2l.ai/index.html">“Dive into Deep Learning”</a> d’Aston Zhang, Zack C. Lipton, Mu Li et Alex J. Smola. Pour en savoir plus sur ce programme, consultez notre <a href="https://www.meetup.com/dair-ai/events/271394829/">page Meetup</a>. Il n’y a pas de conditions préalables mais nous vous fournirons de nombreux documents à lire pour être mieux préparé aux leçons.</li>
<li>Vous pouvez consulter nos récentes conférences sur cette <a href="https://www.youtube.com/channel/UCyna_OxOWL7IEuOwb7WhmxQ?view_as=subscriber">chaîne YouTube</a>. Cet effort vise à mieux faire connaître le travail des ingénieurs et chercheurs en NLP. Si vous souhaitez donner une conférence, veuillez consulter cet <a href="https://github.com/dair-ai/dair-ai.github.io/wiki/Call-for-Talks">appel à conférence</a>.</li>
</ul>
<h1 id="publications-">Publications 📙</h1>
<p><strong><em>Language Models are Few-Shot Learners</em></strong></p>
<p><br />
So far, we have witnessed the success of Transformer models across a whole range of NLP tasks. Recently, <a href="https://arxiv.org/abs/2005.14165">Brown et al. (2020)</a> proposed GPT-3, an autoregressive language model that builds on GPT-2, with a size of 175 billion parameters. It is the largest LM ever trained and aims to answer the question of whether scaling up an LM (in terms of size) improves performance on many NLP tasks. Moreover, the bigger question is whether the scaled-up LM can perform <em>few-shot learning</em> on these tasks and how that compares with other learning paradigms like fine-tuning, one-shot learning, and zero-shot learning. Interestingly, the model performs very well on a variety of tasks but underperforms on tasks that require some level of common-sense reasoning. The advantage of the large LM seems to be that it does not require fine-tuning (in a variety of cases), which means that at some point it could become possible to easily extend learning to even more complex and novel downstream tasks without needing to collect supervised datasets.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*QuDYXo8McJoYwCdlosVhWw.png" alt="" /></p>
<p><em>Source:</em> <a href="https://arxiv.org/abs/2005.14165"><em>Brown et al. (2020)</em></a></p>
<p><br />
<strong><em>Generating SOAP Notes from Doctor-Patient Conversations</em></strong></p>
<p><br />
Electronic documentation of health records involves a rigorous and lengthy process, typically prepared manually by physicians who spend long hours on this task. Motivated by this, <a href="https://arxiv.org/abs/2005.01795">Khrisna et al. (2020)</a> propose an approach to help automate the generation of this documentation in the form of SOAP notes. The authors experiment with different ML approaches that leverage the conversations between physicians and patients during a visit. Their approach combines extractive and abstractive modules trained on clinical conversations and achieves high ROUGE scores on the task of drafting SOAP notes.</p>
<p><br />
<strong><em>BLEURT: Learning Robust Metrics for Text Generation</em></strong></p>
<p><br />
It is well known in NLP that certain evaluation metrics (e.g., BLEU and ROUGE) are not the most reliable. <a href="https://arxiv.org/abs/2004.04696">Sellam et al. (2020)</a> propose an evaluation metric called BLEURT that can better model human judgments. This text generation metric is based on BERT and aims to achieve expressivity and robustness through pretraining on large amounts of synthetic data. Compared to other metrics (e.g., Meteor and BLEU) using vanilla BERT models, BLEURT tends to better model human assessments and thus performs better in terms of accuracy.</p>
<p><br />
If you would like an update on the different evaluation metrics used in NLP, this <a href="https://arxiv.org/abs/2006.14799">recent survey</a> provides an in-depth discussion of evaluation in NLP.</p>
<p><br />
<strong><em>Differentiable Reasoning over Text</em></strong></p>
<p><br />
Current search engines typically let you use a query to retrieve relevant pages or information. However, they do not perform as well when answering queries that involve several documents, as is the case with multi-hop question answering. Current methods use either retrieve + read (backed by a DNN) or a knowledge base to perform some form of extraction that helps address this particular task and find reasonable answers to such questions. The latter works well until the information scales up and traversal over the knowledge graph becomes infeasible; traversing the graph efficiently is essential to arrive at an answer. Bhuwan Dhingra proposes an <a href="https://blog.ml.cmu.edu/2020/05/15/differentiable-reasoning-over-text/">end-to-end system</a> in which the traversal operation is differentiable and can be trained efficiently, even on corpora as large as Wikipedia. With this approach, the method is able to reason over text and answer questions, even those that require multiple hops. The author also provides a demo showcasing the system being used for multi-hop question answering.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*fxVzcy6AV8V8fVp2.png" alt="" /></p>
<p><em>Source:</em> <a href="https://blog.ml.cmu.edu/2020/05/15/differentiable-reasoning-over-text/"><em>CMU Blog</em></a></p>
<p><br />
<strong><em>DE⫶TR: End-to-End Object Detection with Transformers</em></strong></p>
<p><br />
<a href="https://ai.facebook.com/research/publications/end-to-end-object-detection-with-transformers">Carion et al. (2020)</a> proposent un nouvel algorithme de détection d’objets qui exploite l’architecture de l’encodeur-décodeur du Transformer pour la détection d’objets. Le DETR, comme on appelle le modèle, est un système non-autoregressif de bout en bout qui fait des prédictions en parallèle, ce qui permet au modèle d’être rapide et efficace. La nouveauté réside dans l’utilisation directe d’un bloc Transformer pour effectuer la tâche de détection d’objet, qui est présentée comme un problème d’image à image. Cette tâche est chaînée avec un CNN qui extrait les informations locales des images. Cela signifie que le bloc Transformer est chargé de raisonner sur l’image dans son ensemble et de produire en parallèle l’ensemble final de prédictions. L’idée générale est de permettre le raisonnement des relations entre les objets et le contexte global de l’image, ce qui est utile pour la prédiction des objets dans une image.
<br />
<img src="https://cdn-images-1.medium.com/max/800/1*QjhCl88V3GzuXZWl-SGHHg.png" alt="" /></p>
<p><em>DETR:</em> <a href="https://ai.facebook.com/research/publications/end-to-end-object-detection-with-transformers"><em>source</em></a></p>
<p><br />
<strong><em>Survey Papers</em></strong></p>
<p><br />
If you are getting started with deep learning-based NLP, most people will recommend that you first learn to write code for classification tasks or data collection pipelines. These surveys can help you develop your intuition for those tasks:</p>
<ul>
<li><a href="https://arxiv.org/abs/2004.03705">Deep Learning Based Text Classification: A Comprehensive Review</a></li>
<li><a href="https://dl.acm.org/doi/pdf/10.1145/3347145?download=true">Contextual Word Representations: Putting Words into Computers</a></li>
<li><a href="https://arxiv.org/abs/2003.01200">Natural Language Processing Advancements By Deep Learning: A Survey</a></li>
<li><a href="https://arxiv.org/abs/1811.03402">A Survey on Data Collection for Machine Learning: a Big Data — AI Integration Perspective</a></li>
</ul>
<p><br />
<strong><em>Discovering Symbolic Models from Deep Learning with Inductive Biases</em></strong></p>
<p><br />
<a href="https://arxiv.org/abs/2006.11287">Cranmer et al. (2020)</a> ont développé une approche de réseau neuronal graphique (GNN) pour apprendre des représentations de faible dimension qui sont ensuite exploitées pour découvrir et extraire des relations physiques par régression symbolique. En tirant parti des forts biais inductifs des GNN, le cadre proposé peut être appliqué sur des données à grande échelle et entraîné à adapter les expressions symboliques aux fonctions internes apprises par le modèle. En utilisant les GNN, les auteurs ont pu entraîner le modèle à apprendre des représentations interprétables et à améliorer la généralisation de celui-ci. Les cas d’utilisation abordés avec la méthodologie comprennent la redécouverte des lois de la force, la redécouverte des Hamiltoniens, et l’application à un défi astrophysique du monde réel (<em>prévision de la quantité de matière en excès pour un halo de matière noire.</em>).</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*gS5npj4Y1CjGp64VXSMy4A.png" alt="" /></p>
<p><em>Figure by</em> <a href="https://arxiv.org/abs/2006.11287"><em>Cranmer et al. (2020)</em></a></p>
<h1 id="outils-et-jeux-de-données-️">Outils et jeux de données ⚙️</h1>
<p><strong><em>NLP de HuggingFace</em></strong></p>
<p><br />
HuggingFace has released a Python library called <a href="https://github.com/huggingface/nlp">nlp</a> that lets you easily share and load data/metrics, with access to more than 100 NLP datasets. Benefits of the library include interoperability with other ML libraries, fast execution, efficient memory usage, smart caching, and much more. The library also comes with a <a href="https://huggingface.co/nlp/viewer/?dataset=glue&config=cola">website</a> for exploring the datasets.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*jTnEcrpdG2h4Yjj7WZ9zwQ.png" alt="" /></p>
<p><br />
<strong><em>Hateful Memes Challenge</em></strong></p>
<p><br />
The Hateful Memes Challenge is a competition to help build more effective multimodal systems against hate speech. As part of this challenge, a large-scale dataset called <a href="https://www.drivendata.org/competitions/64/hateful-memes/">Hateful Memes</a> is provided. It combines text and images, which makes it a difficult task. The dataset was created by Facebook AI and is hosted by DrivenData. The prize pool is $100,000, and the competition is also part of the NeurIPS competition track. You can also find <a href="https://github.com/facebookresearch/mmf/tree/master/projects/hateful_memes">starter code</a> to get familiar with the task.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*Gww3vx0kiT33gCxjKyk43w.png" alt="" /></p>
<p><br />
<strong><em>TextAttack</em></strong></p>
<p><br />
<a href="https://github.com/QData/TextAttack">TextAttack</a> est une librairie Python permettant de développer différentes attaques adverses en NLP et d’examiner les résultats des modèles, d’accroître la généralisation des modèles par l’augmentation de données et d’entraîner facilement les modèles de NLP à l’aide de commandes de base.</p>
<p><br />
<strong><em>GameGAN</em></strong></p>
<p><br />
NVIDIA trained a new AI model called <a href="https://blogs.nvidia.com/blog/2020/05/22/gamegan-research-pacman-anniversary/">GameGAN</a> that takes as input 50,000 episodes of the popular game PAC-MAN and learns the rules of the environment by watching the screenplay of an agent moving through the game. NVIDIA claims this is the first neural network model capable of mimicking a computer game engineer using GANs. This capability can be used by game developers to automate the generation of layouts for different game levels or even to build more sophisticated simulation systems.</p>
<p><br />
<strong><em>Question Understanding: COVID-Q: 1,600+ Questions about COVID-19</em></strong></p>
<p><br />
We have recently seen an explosion of NLP applications used to better understand COVID-19-related datasets. Recently, a team of researchers created a dataset of roughly 1,600 COVID-related questions, annotated by category and question type. Here are some useful links if you would like to learn more about the project: <a href="https://github.com/JerryWei03/COVID-Q">dataset on GitHub</a>, <a href="https://arxiv.org/abs/2005.12522">paper</a>, and <a href="https://towardsdatascience.com/what-are-people-asking-about-covid-19-a-new-question-classification-dataset-adcaeaddcce4">blog post</a>. If you would like to know how such a dataset is built, one of the authors will share their experience in one of our <a href="https://www.meetup.com/dair-ai/events/271420297/">online meetups</a>.</p>
<h1 id="articles-et-blog-️">Articles et Blog ✍️</h1>
<p><strong><em>Recettes pour construire un chatbot à domaine ouvert</em></strong></p>
<p><br />
Constanza Fierro recently <a href="https://medium.com/dair-ai/recipes-for-building-an-open-domain-chatbot-488e98f658a7">published</a> an article on the steps involved in building an open-domain chatbot. The article summarizes a conversational agent proposed by Facebook AI, BlenderBot, which improves conversation through fine-tuning on datasets that emphasize personality, empathy, and knowledge. One of the novelties of this work is the ability to train models that generate and hold more human-like dialogue, even at smaller model sizes.</p>
<p><br />
<strong><em>Machine Learning on Graphs: A Model and Comprehensive Taxonomy</em></strong></p>
<p><br />
Cette <a href="https://arxiv.org/abs/2005.03675">étude</a> fournit une taxonomie complète des approches qui visent à apprendre les représentations graphiques. Les auteurs présentent un nouveau cadre pour unifier les différents paradigmes qui existent pour les méthodes d’apprentissage des représentations graphiques. Cette unification est importante pour mieux comprendre l’intuition derrière les méthodes et pour faire progresser ce domaine de recherche.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*sh_MUYhT1P1t-Vo44rzjpw.png" alt="" /></p>
<p><em>Source:</em> <a href="https://arxiv.org/abs/2005.03675"><em>https://arxiv.org/abs/2005.03675</em></a></p>
<p><br />
<strong><em>Zero-Shot Learning in NLP</em></strong></p>
<p><br />
One of the ambitious goals of ML researchers is to build AI systems capable of <em>zero-shot learning</em>, which in the context of NLP simply means designing and training a model to perform a task it was not explicitly trained on. In other words, you can tackle new NLP tasks without any fine-tuning, as GPT-2 did for machine translation. If you would like to learn more about approaches to this topic, Joe Davison wrote a <a href="https://joeddav.github.io/blog/2020/05/29/ZSL.html">detailed blog post</a> on it that even includes a demo and a Colab. Another <a href="https://amitness.com/2020/05/zero-shot-text-classification/">illustrated guide</a> worth checking out is by Amit Chaudhary, who explains how zero-shot learning is used for text classification.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*i0L_fbFMBBNF7QFU.png" alt="" /></p>
<p><em>Source:</em> <a href="https://amitness.com/2020/05/zero-shot-text-classification/"><em>Amit Chaudhary</em></a></p>
<p><br />
<strong><em>AI Research, Reproducibility, and Incentives</em></strong></p>
<p><br />
In a recent <a href="https://dennybritz.com/blog/ai-replication-incentives/">blog post</a>, Denny Britz discusses issues around reproducibility in deep learning as well as academic incentive systems, in particular how these drive certain research trends in the NLP community. Topics covered include the differences between reproduction and replication, compute budgets, evaluation protocols, misunderstandings around open source, and top-down versus bottom-up incentives. It is an interesting read because it addresses topics such as compute budgets and reproducibility that are usually missing from scientific reports. Denny also discusses the winning lottery ticket idea, which states that finding a model variant that works for your experiments does not imply it will generalize to data from different distributions. In fact, in most cases the losing tickets, that is, the rest of the failed variations, are not reported, and what you get is usually a polished paper. So how can we reproduce the full path to the conclusion?</p>
<h1 id="education-">Education 🎓</h1>
<p><strong><em>Fun Python</em></strong></p>
<p><br />
Rada Mihalcea <a href="https://web.eecs.umich.edu/~mihalcea/urls/FunPython.pdf">released</a> a complete series of Python notebooks for getting started with Python. The material covers basic Python concepts and was designed for students aged 10 to 12.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*6RD-wwbui3D8ZQOUV6nr5g.jpeg" alt="" /></p>
<p><br />
<strong><em>DeepMind x UCL: Deep Learning Lecture Series</em></strong></p>
<p><br />
DeepMind has released a series of <a href="https://www.youtube.com/playlist?list=PLqYmG7hTraZCDxZ44o4p3N5Anz3lLRVZF">free video lectures</a> covering machine learning topics that range from advanced models for computer vision to generative adversarial networks to unsupervised representation learning.</p>
<p><br />
<strong><em>Keras Code Examples</em></strong></p>
<p><br />
Over the past few months, the community has added several <a href="https://keras.io/examples/">code examples</a> to the Keras website. The examples range from NLP models to computer vision algorithms to generative learning architectures.</p>
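<p>In the same spirit as those examples, here is a minimal Keras classifier on MNIST (the hyperparameters are illustrative assumptions):</p>
<pre><code class="language-python">from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST and flatten each 28x28 image into a 784-dim vector in [0, 1].
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)
</code></pre>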
<p><br />
<strong><em>Applied Machine Learning 2020</em></strong></p>
<p><br />
Andreas Muller is publishing <a href="https://www.youtube.com/watch?v=d79mzijMAw0&list=PL_pVmAaAnxIRnSw6wiCpSvshFyCREZmlM">videos</a> from his course, Applied Machine Learning 2020. It covers topics such as an introduction to neural networks, time series and forecasting, clustering, and more.</p>
<p><br />
<strong><em>Deep Learning Drizzle</em></strong></p>
<p><br />
In case you are having trouble finding NLP or ML courses, this <a href="https://deep-learning-drizzle.github.io/">website</a> hosts one of the most comprehensive databases of online courses. Most of them are available as video lectures.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*yV716lh60cVP0oHzKbPMXA.png" alt="" /></p>
<p><em>Source: Deep Learning Drizzle</em></p>
<p><br />
<strong><em>CMU Neural Nets for NLP 2020</em></strong></p>
<p><br />
Graham Neubig has published all the <a href="https://www.youtube.com/playlist?list=PL8PYTP1V4I8CJ7nMxMC8aXv8WqKYwj-aJ">video lectures</a> of the course “Neural Networks for NLP” (2020 edition). The course covers topics such as CNNs for text, efficiency tricks for NLP, attention, and multi-task and multilingual learning. It also includes companion notebooks with implementations of some of the concepts covered in the course.</p>
<p><br />
<strong><em>PyTorch Recipes</em></strong></p>
<p><br />
<a href="https://pytorch.org/tutorials/recipes/recipes_index.html">PyTorch Recipes</a> est une collection de tutoriels PyTorch qui vise à enseigner aux utilisateurs les caractéristiques spécifiques de PyTorch. Ils sont destinés à être facilement applicables et sont différents des longs tutoriels qui sont également disponibles sur le site web.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*jcDapaNSSNuNx8f314WJKg.png" alt="" /></p>
<h1 id="rester-informé-">Rester informé 🎯</h1>
<p>Ces dernières années, nous avons assisté à une explosion de projets et de documents sur le ML. Il est devenu difficile de suivre ce qui se passe et les tendances en matière de ML. Nous avons également vu des efforts incroyables de la part de la communauté pour aider à distiller cette information. À partir de ce numéro de la newsletter, nous allons inclure une section présentant certaines des ressources qui devraient aider les lecteurs à suivre et à rester informés sur les questions intéressantes et urgentes en matière de ML. Voici la liste de ce numéro :</p>
<ul>
<li><a href="https://www.underratedml.com/"><strong>Underrated ML Podcast</strong></a> <a href="https://www.underratedml.com/"></a> : un podcast qui présente des idées sous-estimées sur le ML</li>
<li><a href="https://paperswithcode.com/"><strong>Papers with Code</strong></a> <a href="https://paperswithcode.com/"></a> : site web qui permet d’améliorer l’accessibilité des dernières publications via l’explication de codes.</li>
<li><a href="https://madewithml.com/"><strong>Made with ML</strong></a> : plateforme communautaire qui permet de se tenir au courant des derniers projets de ML.</li>
<li><a href="https://github.com/dair-ai/ml-nlp-paper-discussions"><strong>ML Paper Discussions</strong></a> : discussion hebdomadaire sur les derniers articles de ML et de NLP</li>
<li><a href="https://www.youtube.com/c/yannickilcher"><strong>Yannic Kilcher</strong></a> : chaîne YouTube fournissant d’excellentes explications d’articles.</li>
</ul>
<h1 id="mentions-spéciales-️">Mentions spéciales ⭐️</h1>
<ul>
<li>Deeplearning.ai has <a href="https://www.coursera.org/specializations/natural-language-processing">released</a> the first two courses of its NLP specialization. Courses 3 and 4 will be published soon.</li>
<li>This <a href="https://www.dropbox.com/s/ec3y4khbk38e29i/NeuralNetworksEN.pdf?dl=0">article</a>, written by Gabriel Peyré, presents a mathematical overview of discriminative and generative neural networks.</li>
<li><a href="https://arxiv.org/abs/2004.10934">YOLOv4</a> is the latest update to the popular object detection algorithm, aiming to provide a faster way to localize and classify objects.</li>
<li>Suraj Patil shares this <a href="https://colab.research.google.com/drive/176NSaYjc2eeI-78oLH_F9-YV3po3qQQO?usp=sharing">tutorial</a> on how to fine-tune the T5 model using the Transformers library (see the sketch after this list).</li>
<li><a href="http://research.baidu.com/Blog/index-view?id=134">VidPress</a> is one of the latest tools built by Baidu to create videos directly from text articles.</li>
<li>Finally, some <a href="https://github.com/kushalj001/pytorch-question-answering">implementations</a> of question answering models in PyTorch, by <a href="https://twitter.com/kushalj001">Kushal</a>.</li>
</ul>
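<p>As a companion to the T5 tutorial mentioned above, here is a minimal sketch of loading T5 with the Transformers library; the checkpoint and prompt are illustrative assumptions:</p>
<pre><code class="language-python">from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 casts every task as text-to-text; the prefix selects the task.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</code></pre>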
<hr />
<p>You can find the previous newsletter <a href="https://dair.ai/NLP_Newsletter_-11_-FR/">here</a>.</p>
<p><br />
If you have any datasets, projects, blog posts, tutorials, or papers that you would like to share in the next issue of the newsletter, please use this <a href="https://forms.gle/3b7Q2w2bzsXE6uYo9">form</a>.</p>
<p><br />
<a href="https://dair.ai/newsletter/">Abonnez-vous</a> pour recevoir les prochains numéros dans votre boîte mail.</p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_-12_-FR/">NLP Newsletter #12 [FR]: Hateful Memes, TextAttack, DETR, BLEURT, GameGAN, Survey Papers,…</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on July 02, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_12_en2020-06-30T00:00:00+00:002020-06-30T00:00:00+00:00Elvis Saraviahttps://dair.aiellfae@gmail.com
<p><img src="https://cdn-images-1.medium.com/max/1200/1*g36Zf0zqinVfWEfocBa0gA.png" alt="" /></p>
<p><br />
Hello everyone! Welcome to the 12th issue of the NLP Newsletter. In this issue, we cover topics that range from progress in language modeling to Transformer-based object detection to how to stay informed with ML.</p>
<p><br />
It has been a month or so since we last published an issue of the NLP Newsletter. The hiatus is over and we are happy to bring back more of the interesting and creative works that have been coming out of the machine learning and natural language processing communities in the past few weeks.</p>
<p><br />
We have taken the time to think about how to improve the newsletter. We have received excellent feedback and we thank you for all the support.</p>
<h1 id="dairai-updates">dair.ai updates</h1>
<p>The dair.ai community has been producing incredible work and helping towards improving the democratization of education, research, and technologies. Here is what we have been up to the last few weeks:</p>
<ul>
<li><a href="https://github.com/dair-ai/ml-visuals">ML Visuals</a> is a new collaborative effort to help the machine learning community in improving science communication by providing free professional and compelling ML-related visuals and figures. You are free to use the visuals in your presentations or blog posts.</li>
<li>Our <a href="https://github.com/dair-ai/ml-nlp-paper-discussions">weekly paper discussion</a> attempts to bring together experts and beginners to help educate each other about recent NLP and ML papers. There are no requirements to join, just bring your willingness to learn and we will be happy to help along the way by answering questions and engaging in deeper discussions about ML papers.</li>
<li>At the beginning of August, we are launching our first study group. We will be covering the excellent book called <a href="https://d2l.ai/index.html">“Dive into Deep Learning”</a> by Aston Zhang, Zack C. Lipton, Mu Li, and Alex J. Smola. Learn more about this program on our <a href="https://www.meetup.com/dair-ai/events/271394829/">Meetup page</a>. There are no prerequisites but we will provide plenty of reading material to be better prepared for the lessons.</li>
<li>Take a look at our recent talks on this <a href="https://www.youtube.com/channel/UCyna_OxOWL7IEuOwb7WhmxQ?view_as=subscriber">YouTube channel</a>. This effort aims to raise awareness of the work of emerging NLP engineers and researchers. If you would like to give a talk, please take a look at this <a href="https://github.com/dair-ai/dair-ai.github.io/wiki/Call-for-Talks">Call for Talks</a>.</li>
</ul>
<h1 id="research-and-publications-">Research and Publications 📙</h1>
<p><strong><em>Language Models are Few-Shot Learners</em></strong></p>
<p><br />
So far we have witnessed the success of Transformer models for a range of NLP tasks. Recently, <a href="https://arxiv.org/abs/2005.14165">Brown et al. (2020)</a> proposed GPT-3, an autoregressive language model that builds on GPT-2, with a size of 175 billion parameters. This is the biggest LM ever trained and aims to answer the question of whether scaling up the LM (in terms of size) improves the performance on many NLP tasks. In addition, the bigger question is whether the scaled-up LM can perform <em>few-shot learning</em> on these tasks and how that compares with other learning paradigms like fine-tuning, one-shot learning, and zero-shot learning. Interestingly enough, the model does very well on a variety of tasks but underperforms when dealing with tasks requiring some level of common-sense reasoning. The benefit of the large LM seems to be that it doesn’t require fine-tuning (in a variety of cases), which means that at some point it could be possible to easily expand to even more complex and novel downstream tasks without the need to collect supervised datasets.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*QuDYXo8McJoYwCdlosVhWw.png" alt="" /></p>
<p><em>Source:</em> <a href="https://arxiv.org/abs/2005.14165"><em>Brown et al. (2020)</em></a></p>
<p><br />
<strong><em>Generating SOAP Notes from Doctor-Patient Conversations</em></strong></p>
<p><br />
Electronic health record (EHR) documentation involves a rigorous and long process typically prepared manually by physicians. This can lead to stress and burnout, as physicians need to spend long hours on the task. Motivated by this, <a href="https://arxiv.org/abs/2005.01795">Khrisna et al. (2020)</a> propose an approach to help automate the generation of documentation in the form of SOAP notes. The authors experiment with different ML approaches leveraging the conversations that happen between physicians and patients during a visit. Their approach combines extractive and abstractive modules trained on clinical conversations and achieves high ROUGE scores on the task of drafting SOAP notes.</p>
<p><br />
<strong><em>BLEURT: Learning Robust Metrics for Text Generation</em></strong></p>
<p><br />
It is well known in the field of NLP that certain evaluation metrics (e.g., BLEU and ROUGE) are not the most reliable due to their poor correlation with human judgments; thus, there have been more efforts recently to improve these metrics. <a href="https://arxiv.org/abs/2004.04696">Sellam et al. (2020)</a> propose a learned evaluation metric called BLEURT that can better model human judgments. The text generation metric is based on BERT and aims to satisfy expressivity and robustness through pretraining on large amounts of synthetic data. When compared to other metrics (e.g., Meteor and BLEU) using vanilla BERT models, BLEURT tends to better model human assessment and thus performs better in terms of accuracy.</p>
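<p>A minimal sketch of using the released package, following the google-research/bleurt README (the checkpoint path below is an assumption; checkpoints are downloaded separately):</p>
<pre><code class="language-python">from bleurt import score

scorer = score.BleurtScorer("bleurt-base-128")  # path to a downloaded checkpoint
scores = scorer.score(references=["The cat sat on the mat."],
                      candidates=["A cat was sitting on the mat."])
print(scores)  # one learned quality score per candidate
</code></pre>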
<p><br />
If you want an update of the different evaluation metrics used in NLP, this <a href="https://arxiv.org/abs/2006.14799">recent survey</a> provides an in-depth discussion of evaluation in NLP.</p>
<p><br />
<strong><em>Differentiable Reasoning over Text</em></strong></p>
<p><br />
Current search engines typically allow the use of a query to obtain relevant pages or information. However, they don’t do so well when retrieving answers to queries that involve multiple documents to arrive at an answer, as is the case with multi-hop question answering. Current methods use either retrieve + read (supported by a DNN) or a knowledge base to perform some form of extraction to help address that particular task and find reasonable answers to those questions. The latter works well until the information scales up and the traversal of the knowledge graph becomes infeasible; traversing the graph efficiently is important to arrive at an answer. Bhuwan Dhingra proposes an <a href="https://blog.ml.cmu.edu/2020/05/15/differentiable-reasoning-over-text/">end-to-end system</a> where the traversal operation is made differentiable and can be trained efficiently and effectively, even on a corpus as large as the entire Wikipedia dump. Using this approach, the method is able to reason about text and answer questions, even those that require multiple hops. The author also provides a demo that showcases the system being used for multi-hop question answering.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*fxVzcy6AV8V8fVp2.png" alt="" /></p>
<p><em>Source:</em> <a href="https://blog.ml.cmu.edu/2020/05/15/differentiable-reasoning-over-text/"><em>CMU Blog</em></a></p>
<p><br />
<strong><em>DE⫶TR: End-to-End Object Detection with Transformers</em></strong></p>
<p><br />
<a href="https://ai.facebook.com/research/publications/end-to-end-object-detection-with-transformers">Carion et al. (2020)</a> propose a novel object detection algorithm that leverages the Transformer encoder-decoder architecture for object detection. DETR, as the model is called, is a non-autoregressive end-to-end system that makes predictions in parallel which allows the model to be fast and efficient. The novelty is in the direct use of a Transformer block to perform the object detection task which is framed as an image-to-set problem. This is chained with a CNN component that extracts the local information from images. This means that the Transformer component is in charge of reasoning about the image as a whole and output the final set of predictions in parallel. Overall the idea is to allow the reasoning of the relations between objects and the global image context which is useful for the prediction of objects in an image.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*QjhCl88V3GzuXZWl-SGHHg.png" alt="" /></p>
<p><em>DETR high-level architecture —</em> <a href="https://ai.facebook.com/research/publications/end-to-end-object-detection-with-transformers"><em>source</em></a></p>
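<p>Pretrained DETR models are exposed through Torch Hub; a minimal sketch (the entry-point name follows the facebookresearch/detr README, and the random tensor stands in for a normalized image batch):</p>
<pre><code class="language-python">import torch

model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
model.eval()

img = torch.rand(1, 3, 800, 1200)  # stand-in for a normalized RGB image batch
with torch.no_grad():
    out = model(img)
print(out["pred_logits"].shape, out["pred_boxes"].shape)
# torch.Size([1, 100, 92]) torch.Size([1, 100, 4]) — a parallel set of 100 predictions
</code></pre>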
<p><br />
<strong><em>Survey Papers</em></strong></p>
<p><br />
If you are getting started with deep learning-based NLP most people recommend you start by learning to write code for classification tasks or data collection pipelines. Those are great but you also need to build intuition on the tasks or processes you are writing code for. These survey papers may help build intuition on deep learning based NLP and data collection:</p>
<ul>
<li><a href="https://arxiv.org/abs/2004.03705">Deep Learning Based Text Classification: A Comprehensive Review</a></li>
<li><a href="https://dl.acm.org/doi/pdf/10.1145/3347145?download=true">Contextual Word Representations: Putting Words into Computers</a></li>
<li><a href="https://arxiv.org/abs/2003.01200">Natural Language Processing Advancements By Deep Learning: A Survey</a></li>
<li><a href="https://arxiv.org/abs/1811.03402">A Survey on Data Collection for Machine Learning: a Big Data — AI Integration Perspective</a></li>
</ul>
<p><br />
<strong><em>Discovering Symbolic Models from Deep Learning with Inductive Biases</em></strong></p>
<p><br />
<a href="https://arxiv.org/abs/2006.11287">Cranmer et al. (2020)</a> developed a Graph Neural Network (GNN) approach to learning low-dimensionality representations that are then operated on to discover and extract physical relations through symbolic regression. By leveraging the strong inductive biases of GNNs, the proposed framework can be applied on large-scale data and trained to fit symbolic expressions to the internal functions learned by the model. By using GNs, the authors were able to train the model to learn interpretable representations and improve the generalization of it. The use cases addressed with the methodology include rediscovering of force laws, rediscovering Hamiltonians, and the application to a real-world astrophysical challenge (<em>predicting the excess amount of matter for a dark matter halo.</em>).</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*gS5npj4Y1CjGp64VXSMy4A.png" alt="" /></p>
<p><em>Figure by</em> <a href="https://arxiv.org/abs/2006.11287"><em>Cranmer et al. (2020)</em></a></p>
<h1 id="tools-and-datasets-️">Tools and Datasets ⚙️</h1>
<p><strong><em>NLP datasets by HuggingFace</em></strong></p>
<p><br />
HuggingFace releases a Python library called <a href="https://github.com/huggingface/nlp">nlp</a> which allows you to easily share and load data/metrics with access to ~100 NLP datasets. Some benefits of the library include interoperability with other ML libraries, fast execution, efficient memory usage, smart caching, and much more. Accompanying the library, they also provide a <a href="https://huggingface.co/nlp/viewer/?dataset=glue&config=cola">website</a> for exploring datasets.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*jTnEcrpdG2h4Yjj7WZ9zwQ.png" alt="" /></p>
<p><br />
<strong><em>Hateful Memes Challenge</em></strong></p>
<p><br />
The Hateful Memes Challenge is a competition to help build more effective multimodal systems for detecting hate speech. As part of the challenge, a large-scale dataset called <a href="https://www.drivendata.org/competitions/64/hateful-memes/">Hateful Memes</a> is provided that combines text and images, making it a challenging task. The dataset was created by Facebook AI and is hosted by DrivenData. There is a $100,000 total prize pool, and the competition is also part of the NeurIPS competition track. You will also find <a href="https://github.com/facebookresearch/mmf/tree/master/projects/hateful_memes">starter code</a> to get familiar with the task.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*Gww3vx0kiT33gCxjKyk43w.png" alt="" /></p>
<p><br />
<strong><em>TextAttack</em></strong></p>
<p><br />
<a href="https://github.com/QData/TextAttack">TextAttack</a> is a new Python framework for developing different NLP adversarial attacks and examining model outputs, increasing model generalization via data augmentation, and easily training NLP models using basic commands.</p>
<p><br />
<strong><em>GameGAN</em></strong></p>
<p><br />
NVIDIA trained a new AI model called <a href="https://blogs.nvidia.com/blog/2020/05/22/gamegan-research-pacman-anniversary/">GameGAN</a> that takes as input 50,000 episodes of the popular game PAC-MAN and learns the rules of the environment by watching the gameplay of an agent moving through the game. NVIDIA claims that this is the first neural network model able to mimic a computer game engine using GANs. This capability could be used by game developers to automate the generation of layouts for different game levels or even to build more sophisticated simulator systems.</p>
<p><br />
<strong><em>Question Understanding: COVID-Q: 1,600+ Questions about COVID-19</em></strong></p>
<p><br />
We have recently seen an explosion of NLP applications used to better understand COVID-19 related datasets. Recently, a team of researchers created a dataset consisting of ~1,600 COVID-related questions annotated by question category and question type. Here are some useful links if you would like to know more about the project: <a href="https://github.com/JerryWei03/COVID-Q">dataset on GitHub</a>, <a href="https://arxiv.org/abs/2005.12522">paper</a>, and <a href="https://towardsdatascience.com/what-are-people-asking-about-covid-19-a-new-question-classification-dataset-adcaeaddcce4">blog post</a>. If you are interested in how such a dataset is created, one of the authors will present their experience creating it at one of our online <a href="https://www.meetup.com/dair-ai/events/271420297/">meetups</a>.</p>
<h1 id="articles-and-blog-posts-️">Articles and Blog posts ✍️</h1>
<p><strong><em>Recipes for building an open-domain chatbot</em></strong></p>
<p><br />
Constanza Fierro recently <a href="https://medium.com/dair-ai/recipes-for-building-an-open-domain-chatbot-488e98f658a7">published</a> an article discussing recipes for building an open-domain chatbot. The article summarizes a conversational agent proposed by Facebook AI, BlenderBot, which improves conversation through fine-tuning on datasets that focus on personality, empathy, and knowledge. One of the novelties of this work is the ability to generate more human-like dialog even with smaller models.</p>
<p><br />
<strong><em>Machine Learning on Graphs: A Model and Comprehensive Taxonomy</em></strong></p>
<p><br />
This survey <a href="https://arxiv.org/abs/2005.03675">paper</a> provides a comprehensive taxonomy of approaches that aim to learn graph representations. The authors introduce a new framework for unifying the different paradigms that exist for graph representation learning methods. This unification is important for better understanding the intuition behind the methods and helps further progress in this area of research.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*sh_MUYhT1P1t-Vo44rzjpw.png" alt="" /></p>
<p><em>Source:</em> <a href="https://arxiv.org/abs/2005.03675"><em>https://arxiv.org/abs/2005.03675</em></a></p>
<p><br />
<strong><em>Zero-Shot Learning in Modern NLP</em></strong></p>
<p><br />
One of the ambitious goals of ML researchers is to build AI systems capable of <em>zero-shot learning</em>, which in the context of NLP means designing a model that can perform a task it wasn’t explicitly trained to do. In other words, the model can perform novel NLP tasks without any fine-tuning, similar to what GPT-2 achieved on machine translation. If you want to learn more about recent approaches to zero-shot learning in NLP, Joe Davison has written an <a href="https://joeddav.github.io/blog/2020/05/29/ZSL.html">extensive blog post</a> about the topic, which even includes a demo and a Colab notebook.
Here is another <a href="https://amitness.com/2020/05/zero-shot-text-classification/">illustrated guide</a> by Amit Chaudhary explaining how zero-shot learning is used for text classification.</p>
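<p>To make the idea concrete, here is a minimal sketch of the NLI-based zero-shot classification approach described in those posts: each candidate label becomes a hypothesis, and an off-the-shelf NLI model scores whether the input text entails it. The model is a published MNLI checkpoint; the label template and example are illustrative assumptions:</p>
<pre><code class="language-python">import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# An NLI model fine-tuned on MNLI, used here as a zero-shot classifier.
name = 'facebook/bart-large-mnli'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

text = 'The new GPU doubles training throughput on large language models.'
labels = ['technology', 'sports', 'politics']

scores = {}
for label in labels:
    hypothesis = f'This text is about {label}.'
    inputs = tokenizer(text, hypothesis, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    # The model outputs [contradiction, neutral, entailment]; we use the
    # entailment probability as the score for this label.
    probs = logits.softmax(dim=-1)[0]
    scores[label] = probs[2].item()

print(max(scores, key=scores.get))
</code></pre>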
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*i0L_fbFMBBNF7QFU.png" alt="" /></p>
<p><em>Source:</em> <a href="https://amitness.com/2020/05/zero-shot-text-classification/"><em>Amit Chaudhary</em></a></p>
<p><br />
<strong><em>AI Research, Replicability and Incentives</em></strong></p>
<p><br />
In a recent <a href="https://dennybritz.com/blog/ai-replication-incentives/">blog post</a>, Denny Britz discusses the issues of deep learning replicability and academic incentive systems and how these are driving some research trends in the community. Some of the topics discussed are the differences between reproduction and replication, computational budget, evaluation protocols, misunderstanding of open source, and top-down and bottom incentives. It’s an interesting article because it touches on topics such as the computational budget and reproducibility which are typically lacking in scientific reports. Denny also discusses the idea of the winning lottery ticket which states that just because you found a variant of the model that works for your experiments it doesn’t imply that it will generalize to data on different data distribution. In fact, in the majority of cases, the losing tickets or the rest of failed variations are not reported and what you get is typically a polished paper. So how can we replicate the full path to the conclusion?</p>
<h1 id="education-">Education 🎓</h1>
<p><strong><em>Fun Python</em></strong></p>
<p><br />
Rada Mihalcea <a href="https://web.eecs.umich.edu/~mihalcea/urls/FunPython.pdf">releases</a> a comprehensive series of Python notebooks for getting up to speed with Python. The material covers basic Python concepts and was designed for students aged 10–12.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*6RD-wwbui3D8ZQOUV6nr5g.jpeg" alt="" /></p>
<p><br />
<strong><em>Deep Mind x UCL Deep Learning Lecture Series</em></strong></p>
<p><br />
DeepMind released a series of <a href="https://www.youtube.com/playlist?list=PLqYmG7hTraZCDxZ44o4p3N5Anz3lLRVZF">free video lectures</a> covering topics in machine learning that range from advanced models for computer vision to generative adversarial networks to unsupervised representation learning.</p>
<p><br />
<strong><em>Keras Code Examples</em></strong></p>
<p><br />
Over the past months, the community has been adding several recipes and <a href="https://keras.io/examples/">code examples</a> to the Keras website. The examples range from NLP models to computer vision algorithms to generative deep learning architectures. These community-driven resources help practitioners better understand how to train ML models for their own tasks and projects. If you can contribute, please do so; it helps people who are getting started.</p>
<p><br />
<strong><em>Applied Machine Learning 2020</em></strong></p>
<p><br />
Andreas Muller releases <a href="https://www.youtube.com/watch?v=d79mzijMAw0&list=PL_pVmAaAnxIRnSw6wiCpSvshFyCREZmlM">video recordings</a> for his course, Applied Machine Learning 2020. It covers topics like an introduction to neural networks, time series and forecasting, topic modeling, and clustering.</p>
<p><br />
<strong><em>Deep Learning Drizzle</em></strong></p>
<p><br />
Just in case you have a hard time finding NLP or ML courses, this <a href="https://deep-learning-drizzle.github.io/">neat website</a> has one of the most comprehensive databases of online courses. Most of them are available as video lectures!</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*yV716lh60cVP0oHzKbPMXA.png" alt="" /></p>
<p><em>Source: Deep Learning Drizzle</em></p>
<p><br />
<strong><em>CMU Neural Nets for NLP 2020</em></strong></p>
<p><br />
Graham Neubig releases all <a href="https://www.youtube.com/playlist?list=PL8PYTP1V4I8CJ7nMxMC8aXv8WqKYwj-aJ">video lectures</a> for the course called Neural Networks for NLP (2020 edition). The course covers topics like CNNs for text, efficiency tricks for NLP, attention, multitask and multilingual learning. It also contains accompanying notebooks with implementations of certain concepts covered in the course.</p>
<p><br />
<strong><em>PyTorch Recipes</em></strong></p>
<p><br />
<a href="https://pytorch.org/tutorials/recipes/recipes_index.html">PyTorch Recipes</a> are a collection of bite-sized PyTorch tutorials that aim to teach users about specific PyTorch features. They are meant to be easily consumed and are different from the lengthy tutorials that are also available on the website.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*jcDapaNSSNuNx8f314WJKg.png" alt="" /></p>
<h1 id="stay-informed-">Stay Informed 🎯</h1>
<p>In the last few years, we have witnessed an explosion of ML projects and papers. It has made it difficult to keep track of what’s happening and trending in ML. We have also seen incredible efforts from the community to help distill this fast-paced information. Starting today, we will include a special section in this newsletter showcasing some of the great resources that should help readers to keep track and stay educated on interesting and pressing issues in ML. Here is this week’s list:</p>
<ul>
<li><a href="https://www.underratedml.com/"><strong>Underrated ML Podcast</strong></a> <a href="https://www.underratedml.com/"></a>— a podcast that pitches underrated ML ideas</li>
<li><a href="https://paperswithcode.com/"><strong>Papers with Code</strong></a> <a href="https://paperswithcode.com/"></a>— is a website to keep track of ML results and leaderboards and surface the latest ML papers with code to improve accessibility and accelerating progress.</li>
<li><a href="https://madewithml.com/"><strong>Made with ML</strong></a> — is a community-driven platform to stay up to date with the latest ML projects.</li>
<li><a href="https://github.com/dair-ai/ml-nlp-paper-discussions"><strong>ML Paper Discussions</strong></a> — a weekly discussion on recent ML and NLP papers</li>
<li><a href="https://www.youtube.com/c/yannickilcher"><strong>Yannic Kilcher</strong></a> <em>**</em>— a YouTube channel providing excellent paper explanations</li>
</ul>
<h1 id="noteworthy-mentions-️">Noteworthy Mentions ⭐️</h1>
<ul>
<li>Deeplearning.ai <a href="https://www.coursera.org/specializations/natural-language-processing">releases</a> the first two courses for their new NLP Specialization. Courses 3 and 4 will be released soon.</li>
<li>This <a href="https://www.dropbox.com/s/ec3y4khbk38e29i/NeuralNetworksEN.pdf?dl=0">article</a> presents a mathematical overview of discriminative neural networks and generative neural networks. It was written by Gabriel Peyré.</li>
<li><a href="https://arxiv.org/abs/2004.10934">YOLOv4</a> is the latest update of the popular object detection algorithm which aims to provide a faster algorithm to locate and classify objects.</li>
<li>T5 is one of the latest works in NLP that aims to incorporate the lessons and techniques from previous works and establish a unified framework for addressing text-based tasks. If you are unfamiliar with how to use T5, Suraj Patil provides this <a href="https://colab.research.google.com/drive/176NSaYjc2eeI-78oLH_F9-YV3po3qQQO?usp=sharing">tutorial</a> on how to fine-tune T5 using Transformers.</li>
<li><a href="http://research.baidu.com/Blog/index-view?id=134">VidPress</a> is one of the latest tools built by Baidu to create videos directly from text articles.</li>
<li>Here are some excellent <a href="https://github.com/kushalj001/pytorch-question-answering">paper implementations</a> on the task of question answering using PyTorch. It provides detailed descriptions and code walkthroughs. This is work done by <a href="https://twitter.com/kushalj001">Kushal</a>.</li>
</ul>
<p><a href="https://dair.ai/newsletter/"><em>Subscribe</em></a> <em>🔖 to the NLP Newsletter to receive future issues in your inbox.</em></p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_12_en/">NLP Newsletter #12 [EN]: Hateful Memes, TextAttack, DETR, BLEURT, GameGAN, Survey Papers,…</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on June 30, 2020.</p>
https://dair.ai/posts/Critique-of-Taylor-s-Law-for-Human-Linguistic-Sequences2020-05-30T00:00:00+00:002020-05-30T00:00:00+00:00Vaibhav Jadehttps://dair.ai
<p>Summary and critique of the paper <a href="https://www.aclweb.org/anthology/P18-1105.pdf">‘Taylor’s Law for Human Linguistic Sequences’</a></p>
<h3 id="scope-of-the-paper">Scope of the paper</h3>
<p>The paper discusses the statistical significance of Taylor’s power law on natural text. Taylor’s power law has been reported to hold in many natural and social domains, alongside derivations from Zipf’s law and Heaps’ law. This paper attempts one such fluctuation analysis using Taylor’s law.</p>
<h3 id="taylors-law">Taylor’s law</h3>
<p>The law states that, for any fixed species, the fluctuations in the size of a population (characterized by the standard deviation σ) can be approximated as a constant times the average population μ raised to a power α, i.e., σ = c·μ<sup>α</sup>.</p>
<h3 id="meaning-of-taylors-law-in-the-context-of-the-paper">Meaning of Taylor’s law in the context of the paper</h3>
<p>In the context of natural language, the paper considers the frequency of occurrence of a word within particular segments of text. Consider a set of words W and segments of length N. We count the number of occurrences of each word in every segment of a particular document. Using these counts over the segments, we compute the mean and variance for each word in W. Taylor’s power law analysis is then performed per word. As reported in previous work, the exponent takes a value in the range 0.5 to 1.</p>
<h4 id="significance-of-alpha--05">Significance of alpha = 0.5</h4>
<p>The paper proves that for an IID process, alpha comes out to be 0.5. That would mean that words occur independently of each other, which is the complete opposite of natural language. It also means the exponent is greater than 0.5 for natural texts.</p>
<h4 id="significance-of-alpha--10">Significance of alpha = 1.0</h4>
<p>For alpha = 1.0, the paper shows that all segments always contain the same proportion of the words. This would partly be possible for some words in programming languages, where certain words always occur together.</p>
<p><br />
This shows that natural languages reside between these two extremes. It also shows how the co-occurrence of words as a measure for natural language is justified by this method, in accordance with Taylor’s law, and indicates that Taylor’s exponent is partly related to grammaticality.</p>
<h3 id="estimation-of-taylors-exponent">Estimation of Taylor’s exponent</h3>
<p>For a particular document, we have a mean and variance for each word. To estimate Taylor’s exponent, the paper proposes to first transform the equation by taking logarithms and then estimate the exponent via least-squares linear regression.</p>
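<p>A minimal sketch of the whole procedure (segmentation, per-word statistics, and the log-log regression) might look as follows; the whitespace tokenization and the segment size are illustrative assumptions:</p>
<pre><code class="language-python">import numpy as np

def taylor_exponent(tokens, segment_size=5620):
    # Split the token stream into non-overlapping segments.
    n_seg = len(tokens) // segment_size
    segments = [tokens[i * segment_size:(i + 1) * segment_size] for i in range(n_seg)]

    # Count occurrences of every word in every segment.
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((n_seg, len(vocab)))
    for s, seg in enumerate(segments):
        for w in seg:
            counts[s, index[w]] += 1

    # Per-word mean and standard deviation across segments.
    mu = counts.mean(axis=0)
    sigma = counts.std(axis=0)
    mask = (mu > 0) & (sigma > 0)  # avoid log(0)

    # Fit log(sigma) = alpha * log(mu) + c by least squares.
    alpha, _ = np.polyfit(np.log(mu[mask]), np.log(sigma[mask]), 1)
    return alpha

# Usage: tokens = open('mobydick.txt').read().split()
#        print(taylor_exponent(tokens))
</code></pre>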
<h3 id="outcomes-of-the-paper">Outcomes of the paper</h3>
<h4 id="analysis-of-the-texts">Analysis of the texts</h4>
<p>The paper considers texts from 14 languages, news articles, enwiki8 data, child utterance data, 4 programming-language sources, and musical data converted to text. Using regression with a segment size of 5,620, the exponent was found to be around 0.58 for natural texts. The paper also considers how the exponent varies with segment size and concludes that it’s better to use a larger size.
For enwiki8, child utterances, programming languages, and music, the exponent was found to be higher, indicating a larger number of fixed forms in those texts.</p>
<p><br />
A log-log plot of standard deviation versus mean shows low deviation from the estimated Taylor exponent for almost all words across the different text settings.</p>
<h4 id="evaluation-of-machine-generated-text">Evaluation of machine generated text</h4>
<p>When the experiment was run on randomly sampled data from ‘Moby Dick’, the exponent was found to be 0.5, as expected from random sampling. The paper also tested text generated by a character-based LSTM trained on the complete works of Shakespeare. The exponent was again 0.5, the same as for an IID process, showing that the model was unable to capture word-level correlations. For a machine translation model, however, the paper found that the generated text had a Taylor exponent equivalent to that of the original text.</p>
<h3 id="merits-of-the-paper">Merits of the paper</h3>
<ul>
<li>Proves that Taylor’s power law analysis is relevant in the context of human linguistics.</li>
<li>Taylor’s exponent can be used as a continuous quantification based on lexical fluctuations, i.e., placing texts on a spectrum between natural texts and fixed-form texts such as programming languages or music.</li>
<li>The variation of Taylor’s exponent was lower than in analyses applied to other contexts, showing the strong relevance of such fluctuation analysis for human linguistics.</li>
<li>It can be used as a measure to compare model performance based on the underlying language, aside from application performance.</li>
<li>It also showed the limitations of character-based LSTMs, which could not capture the fluctuations of the underlying natural text.</li>
</ul>
<h3 id="limitations-of-the-paper">Limitations of the paper</h3>
<ul>
<li>The paper shows that the exponent depends only slightly on document size, although it varies greatly with segment size. Hence, a document needs to contain a large number of words so that we can form segments of large size as well as a sufficient number of such segments. Such analysis therefore can’t be performed on short texts like tweets or emails.</li>
<li>The paper uses books and news articles as data sources, but there are many more forms of sources that should be considered.</li>
<li>As stated in the paper, Taylor’s exponent could not act as differentiators between languages.</li>
</ul>
<h3 id="possible-extensions-and-applications-of-the-paper">Possible extensions and applications of the paper</h3>
<ul>
<li>While calculating the frequencies, the paper does not perform lemmatization, i.e., words are considered in their surface forms and semantically equivalent words are not grouped together when counting frequencies. Since the analysis is done on multiple languages, this preserves uniformity when comparing languages, but fluctuation analysis with lemmatization might provide important insights.</li>
<li>As covered in the limitations section, the analysis could be applied to various forms of natural text such as emails, scripts, legal documents, screenplays, and poems, where the limitation on text size can be somewhat mitigated.</li>
<li>It would also be interesting to see the analysis applied to the above-mentioned text formats in different languages and across different scripts. Doing such analysis on a variety of Indian languages would also be insightful, given their diversity.</li>
<li>As stated in the paper, other fluctuation analyses can be performed, for example with Zipf’s law or Heaps’ law.</li>
<li>As discussed in the paper’s related work, Taylor’s power law has been studied across multiple domains such as ecology, life science, physics, finance, and human dynamics (shown by Eisler, Bartos, and Kertész, 2007); the paper itself is one such analysis applied to human linguistics.
As stated in the definition, for complex systems of interacting elements, the activity (standard deviation) of an element increases with its average activity (mean), so we may find Taylor’s power law behavior underlying these sorts of systems.</li>
</ul>
<p><a href="https://dair.ai/posts/Critique-of-Taylor-s-Law-for-Human-Linguistic-Sequences/">Critique of Taylor's Law for Human Linguistic Sequences</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on May 30, 2020.</p>
https://dair.ai/posts/BART-Summary2020-05-11T00:00:00+00:002020-05-11T00:00:00+00:00Antonio Lopardohttps://dair.aiantonio.lopardo@outlook.com
<script type="text/javascript" async="" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<p><img src="https://firebasestorage.googleapis.com/v0/b/firescript-577a2.appspot.com/o/imgs%2Fapp%2FAntonioLprd%2FMp23o_Xx8j.jpg?alt=media&token=72dbb3c1-93a5-4b9d-8cec-3e006952568e" alt="" /></p>
<blockquote>
<p>Paper summary: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, Oct. 2019. (<a href="https://arxiv.org/abs/1910.13461">link</a>)</p>
</blockquote>
<h2 id="why-is-this-important"><strong>Why is this important?</strong></h2>
<p>In this paper, Lewis et al. present valuable comparative work on different pre-training techniques and show how this kind of work can be used to guide large pre-training experiments reaching state-of-the-art (SOTA) results.</p>
<h2 id="what-does-it-propose"><strong>What does it propose?</strong></h2>
<p>The authors propose a framework to compare pre-training techniques and language model (LM) objectives. This framework focuses on how these techniques can be viewed as <strong>corrupting text with an arbitrary noising function while the Language Model is tasked with denoising it.</strong> After some comparative experiments using this framework, BART is introduced as a transformer-based LM that reaches SOTA performance.</p>
<h2 id="how-does-it-work"><strong>How does it work?</strong></h2>
<h3 id="the-framework"><strong>The Framework</strong></h3>
<p><img src="https://lh3.googleusercontent.com/_ZYOOgt3efQF8LlFM_rmlJdiQyj3bkFeKfeihhbOK3w-UvPUPvFX9K_YFMh7SIURsyFclNwkL8oVByH3XlQKXPnhZYO8IFY54nhFBlE9wuk0vJBKxI1Ci_7xnbePqT8thQC-vB1ZUvs" alt="image" />
The idea behind the proposed framework is simple: the authors suggest that decoupling language models from the functions with which the text is corrupted makes it easier to compare different pre-training techniques and see how they perform on similar models across diverse benchmarks. Viewed this way, pre-training is a sequence of repeated steps (a small sketch follows the list below):</p>
<ul>
<li>Apply a noising function to the text</li>
<li>The language model attempts to reconstruct the text</li>
<li>Then calculate the loss function (typically cross-entropy over the original text), back-propagate the gradients, and update the model’s weights.</li>
</ul>
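<p>In code, one such pre-training step could be sketched as below. This is a schematic under assumed names (a generic seq2seq <code>model</code> and a <code>noise_fn</code>), not the paper’s implementation:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def pretraining_step(model, optimizer, noise_fn, token_ids):
    """One denoising pre-training step: corrupt, reconstruct, update."""
    corrupted = noise_fn(token_ids)                # apply an arbitrary noising function
    logits = model(corrupted, targets=token_ids)   # seq2seq reconstruction of the original
    # Cross-entropy between the reconstruction and the original text.
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), token_ids.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
</code></pre>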
<h3 id="comparing-different-text-noising-techniques-and-lm-objectives"><strong>Comparing different text-noising techniques and LM Objectives</strong></h3>
<p><img src="https://firebasestorage.googleapis.com/v0/b/firescript-577a2.appspot.com/o/imgs%2Fapp%2FAntonioLprd%2FdqcEKBSd0Y.png?alt=media&token=2cee8cc9-cb39-472f-9377-82c995a7ee85" alt="" /></p>
<p><br />
In the first experiment, using the framework introduced at the beginning of the article, the authors compared different pre-training techniques and LM objectives on a smaller-than-usual model, BART-base. The model uses a 6-layer, transformer-based seq2seq architecture for autoencoding as introduced by <a href="https://arxiv.org/pdf/1706.03762.pdf">Vaswani et al</a>. The pre-training techniques compared in the experiments can be divided into those that work at the token level and those that work at the sentence level; a sketch of these noising functions follows the descriptions below:</p>
<p><br />
<strong>Token Masking:</strong> random tokens are sampled and replaced with [MASK]</p>
<p><br />
<strong>Token Deletion:</strong> similar to masking but the sampled tokens are deleted and the model has to add a new token in their place.</p>
<p><br />
<strong>Token Infilling:</strong> a number of text spans, i.e. contiguous groups of tokens, are sampled, and each is replaced by the [MASK] token.</p>
<p><br />
<strong>Sentence Permutation:</strong> random shuffling of the document’s sentences.</p>
<p><br />
<strong>Document Rotation:</strong> a token is chosen at random to be the start of the document, and the section before the starting token is appended at the end.</p>
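<p>These corruptions are simple to state in code. Below is a minimal sketch over lists of tokens/sentences; the sampling probabilities and the Poisson span length are illustrative assumptions (BART samples span lengths from a Poisson distribution, but the exact parameters here are not from the paper):</p>
<pre><code class="language-python">import random
import numpy as np

MASK = '[MASK]'

def token_masking(tokens, p=0.15):
    # Replace randomly sampled tokens with [MASK].
    return [t if random.random() >= p else MASK for t in tokens]

def token_deletion(tokens, p=0.15):
    # Delete randomly sampled tokens; the model must infer which positions are missing.
    return [t for t in tokens if random.random() >= p]

def text_infilling(tokens, lam=3.0):
    # Replace one contiguous span (Poisson-distributed length) with a single [MASK].
    span = int(np.random.poisson(lam))
    start = random.randrange(max(1, len(tokens) - span))
    return tokens[:start] + [MASK] + tokens[start + span:]

def sentence_permutation(sentences):
    # Shuffle the document's sentences.
    return random.sample(sentences, len(sentences))

def document_rotation(tokens):
    # Pick a random token to start the document; append the prefix at the end.
    start = random.randrange(len(tokens))
    return tokens[start:] + tokens[:start]
</code></pre>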
<p><br />
Intuitively, the techniques that work at the sentence level should help the LM learn the different roles of sentences in a paragraph or longer text, and in the process help with natural language generation (NLG) tasks.</p>
<p><br />
Besides the pre-training techniques, the authors also compare different LM objectives focusing on the ones used by BERT and GPT as well as techniques that tried to incorporate the best of both worlds:</p>
<p><br />
<strong>Autoregressive, left to right, LM</strong> (<a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf">GPT-2</a>)</p>
<p><br />
<strong>Masked LM</strong> (<a href="https://arxiv.org/abs/1810.04805">BERT</a>) replace 15% of the tokens with the [MASK] token and predict the corresponding words.</p>
<p><br />
<strong>Permuted LM</strong> (<a href="https://arxiv.org/pdf/1906.08237.pdf">XLNet</a>) left-to-right, autoregressive LM training, but with the order of the words to predict chosen at random.</p>
<p><br />
<strong>Multitask Masked LM</strong> (<a href="https://arxiv.org/pdf/1905.03197.pdf">UniLM</a>) a combination of right-to-left, left-to-right, and bidirectional objectives, each used ⅓ of the time with shared parameters.</p>
<p><br />
<strong>Masked Seq2Seq</strong> (<a href="https://arxiv.org/pdf/1905.02450.pdf">MASS</a>) masking a span containing 50% of the tokens and training to predict the masked tokens.</p>
<h3 id="results-of-the-first-experiment"><strong>Results of the first experiment</strong></h3>
<p><img src="https://lh6.googleusercontent.com/YxPMuTJc7EY2rFYIXUBVIYMjV-rloyEj2UmgJ6pbxyyMDCWzdwu4KOgRErOKJcmDe4QfC7LO-2bGE6-_0pCF1lRwJfFbjGvBbuk73oFQ8AgMdHAYDNIwDH8HlEcBI15SQKTUMIUhc_8" alt="" /></p>
<p><br />
From the results of this first experiments the authors draw some important conclusions.</p>
<p><br />
<strong>Token masking is crucial</strong></p>
<p><br />
Only the configurations with token masking or its variations achieve consistently great performance on different tasks.</p>
<p><br />
<strong>Left-to-right pre-training improves NLG</strong></p>
<p><br />
The classical LM objective, despite not doing well on inference or question answering tasks, achieves SOTA on ELI5 (Explain Like I’m 5).</p>
<p><br />
<strong>Bidirectional encoders are crucial for QA</strong></p>
<p><br />
Ignoring future context hinders the performance of left-to-right models.</p>
<p><br />
While pre-training techniques and LM objectives are important, the authors note that they do not provide the full picture. They report that their permuted language model performs much worse than XLNet because BART lacks some of the valuable architectural innovations introduced in XLNet.</p>
<h3 id="results-of-the-large-scale-pre-training-experiment"><strong>Results of the large-scale pre-training experiment</strong></h3>
<p>After the comparative experiment, the authors trained a 12-layer, transformer-based architecture for autoencoding, using hyperparameters similar to <a href="https://arxiv.org/pdf/1907.11692.pdf">RoBERTa</a>. They used both a form of token masking at 30% and sentence permutation as pre-training text-noising techniques and ran the model on 160GB of news, books, stories, and web text, similar to what was done for RoBERTa.</p>
<p><br />
<img src="https://lh4.googleusercontent.com/grHkzO3a2zUtdo-_AkH3sGhWmfGQ9_n4bAD0wm78kzVLpuzYFFdSwycnLIevdGUb5jVMJpGdA46LvZN_k0PDrCCObljSvgfcTo6PHevfpa5ZonMXn-C5tEXsUW1V33akbAIINi7whkA" alt="" /></p>
<p><br />
BART performs best on abstractive summarization tasks, especially on the <strong>XSum</strong> benchmark, which contains very few summaries whose phrases also appear in the original text. Besides surpassing the previous best summarization systems by a considerable margin, BART also does well on natural language inference (NLI) tasks and QA, where it is on par with SOTA results.</p>
<h3 id="qualitative-analysis"><strong>Qualitative Analysis</strong></h3>
<p>The paper also features examples of the summaries produced by BART that can really give a sense of how well it does on the XSum dataset:</p>
<p><br />
<img src="https://firebasestorage.googleapis.com/v0/b/firescript-577a2.appspot.com/o/imgs%2Fapp%2FAntonioLprd%2F7kziiycrsc.png?alt=media&token=a6812c61-8d4b-4f0b-ac0c-e36c470dad45" alt="" /></p>
<p><br />
If you want to summarize some text of your own, we have set up a <a href="https://colab.research.google.com/drive/1ufNmxZz3v8LloGAYFJvTM2Git2aNe83R?usp=sharing">Google Colab notebook</a> using the Hugging Face library.</p>
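<p>For reference, a minimal summarization sketch with the Hugging Face library looks roughly like the following; <code>facebook/bart-large-cnn</code> is a published checkpoint, while the generation parameters are illustrative:</p>
<pre><code class="language-python">from transformers import BartForConditionalGeneration, BartTokenizer

name = 'facebook/bart-large-cnn'  # BART fine-tuned for summarization
tokenizer = BartTokenizer.from_pretrained(name)
model = BartForConditionalGeneration.from_pretrained(name)

article = "..."  # long input text to summarize

inputs = tokenizer([article], max_length=1024, truncation=True, return_tensors='pt')
summary_ids = model.generate(
    inputs['input_ids'],
    num_beams=4,        # beam search
    max_length=120,     # cap the summary length
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
</code></pre>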
<p><a href="https://dair.ai/posts/BART-Summary/">BART: Are all pretraining techniques created equal?</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on May 11, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_11_tr2020-05-04T00:00:00+00:002020-05-04T00:00:00+00:00Harun Uzhttps://dair.ai
<p><img src="https://cdn-images-1.medium.com/max/1200/1*X_c0mVECV9rtl6ozuE-oPA.png" alt="" /></p>
<p><br />
Welcome to the 11th issue of the NLP Newsletter. In this issue, we cover topics that range from reinforcement learning frameworks for tax policy design to state-of-the-art conversational AI systems to tools for improving text generation.</p>
<h1 id="dairai-sitesinden-güncellemeler">dair.ai sitesinden güncellemeler</h1>
<ul>
<li>We have released a <a href="https://github.com/dair-ai/emotion_dataset">dataset</a> that can be used for text-based emotion research. The repo includes a Colab <a href="https://colab.research.google.com/drive/1nwCE6b9PXIKhv2hvbqf1oZKIGkXMTi1X#scrollTo=t23zHggkEpc-">notebook</a> that shows how to fine-tune pretrained BERT models for the task of emotion classification. More recently, a model that can easily be integrated into an NLP pipeline was fine-tuned on our dataset and is <a href="https://huggingface.co/mrm8488/distilroberta-base-finetuned-sentiment">available</a> on HuggingFace.</li>
<li>We recently held our first-ever paper reading session. Over 120 people registered and most of them joined the remote event. The first discussion was on the <a href="https://arxiv.org/abs/1910.10683%27">T5 paper</a>. We will also host a second session with an in-depth discussion of the paper. Everyone is invited to the event, which can be found <a href="https://www.meetup.com/dair-ai/events/270419989/">here</a>. To find out more about future events, you can join our <a href="https://www.meetup.com/dair-ai">Meetup group</a> or our <a href="https://join.slack.com/t/dairai/shared_invite/zt-dv2dwzj7-F9HT047jIGkunNKv88lQ~g">Slack channel</a>. <em>You can also receive information about upcoming events by <a href="https://dair.ai/newsletter/">subscribing</a> to the NLP Newsletter 🔖.</em></li>
</ul>
<h1 id="araştırma-ve-yayınlar-">Araştırma ve Yayınlar 📙</h1>
<p><strong><em>OpenAI’s Jukebox</em></strong></p>
<p><br />
Jukebox, one of OpenAI’s latest works, is a neural network architecture trained to generate music (from scratch) in various genres and artistic styles. The model, inspired by a quantization-based approach called VQ-VAE, is fed genre, artist, and lyrics as input and produces a novel audio sample. The main idea is to process and compress long raw audio inputs with a multi-level autoencoder, reducing the dimensionality while preserving the essential musical information. Transformers are then used to generate codes from which the VQ-VAE decoder reconstructs raw audio. You can find more details about this work in the <a href="https://openai.com/blog/jukebox/">blog post</a> or the <a href="https://cdn.openai.com/papers/jukebox.pdf">paper</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*JvZuEnM8O4B2PmYSi3RivA.png" alt="" /></p>
<p><br />
<strong><em>HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data</em></strong></p>
<p><br />
Most question answering (QA) datasets so far have focused on homogeneous information. <a href="https://github.com/wenhuchen/HybridQA">HybridQA</a> is a large-scale QA dataset created to encourage research and methods that require reasoning over heterogeneous information. The multi-hop QA dataset consists of structured Wikipedia tables and unstructured entities linked to free-form corpora from the tables. The authors also discuss two baselines that highlight the advantages of working with heterogeneous information as opposed to using only the table or only the text; they note, however, that the results fall well behind human performance, which calls for QA systems that can better reason over heterogeneous data and address <em>coverage</em> issues.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*ojEoPUGxzskGUc1F.png" alt="" /></p>
<p><em><a href="https://github.com/wenhuchen/HybridQA">Kaynak</a></em></p>
<p><br />
<strong><em>A SOTA Chatbot</em></strong></p>
<p><br />
Facebook AI <a href="https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot">built</a> and open-sourced Blender, an AI-based model they describe as <em>the largest-ever open-domain chatbot</em>. Following the success of <a href="https://arxiv.org/abs/2001.09977">Meena</a> (a conversational AI system introduced by Google), Facebook AI proposes a model that blends conversational skills such as empathy and personality to produce high-quality conversations. The model was trained using a Transformer-based architecture (with up to 9.4 billion parameters) on roughly 1.5 billion training examples. It was then fine-tuned on the <a href="https://arxiv.org/abs/2004.08449">Blended Skill Talk</a> dataset, which aims to instill defined, desirable personal traits that can improve the model’s conversational skills. The authors claim that the model can produce responses that human evaluators judged more human-like than those produced by Meena.</p>
<p><br />
<strong><em>TLDR: Extreme Summarization of Scientific Documents</em></strong></p>
<p><br />
Şu <a href="https://arxiv.org/abs/2004.15011">bildiri</a>, yeni bir görev olan bilimsel bildirilerin TLDR’sinin üretilmesi (İng. TLDR generation) için yeni bir yaklaşım ile birlikte veri seti (SCITLDR) önermektedir. TLDR’ler bilimsel makalelerin yoğun özetleri ve bir alternatif olarak tanımlanmaktadır. TLDR’ler, yazarlar tarafından önerildiği üzere bir bildirinin ne hakkında olduğunu hızlıca anlama yolu olarak hizmet edebilmekte ve okuyucuya bildiriyi okumaya devam edip etmemeye karar vermesinde yardımcı olabilmektedir. İnsan üretimi özetlerin çeşitliliğinden dolayı çoklu TLDR’ler hakemlerden değerlendirme yoluyla elde edilmektedir. Çoklu-görev ince ayar programı (başlık üretimi ve TLDR üretimini içermektedir) olan BART-tabanlı bir model nihai görev için kullanılmıştır.</p>
<p>Note: TLDR is an abbreviation of <em>too long, didn’t read</em> and is generally used to mark a short passage at the end of a long document that sums up in a few sentences what the document says.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*eeRZ4T1CG0lhgHr_ia-8MA.png" alt="" /></p>
<p><em><a href="https://arxiv.org/abs/2004.15011">Cachola et al. (2020)</a></em></p>
<p><br />
<strong><em>WT5?! Training Text-to-Text Models to Explain their Predictions</em></strong></p>
<p><br />
The <em>Text-to-Text Transfer Transformer</em> (<a href="https://arxiv.org/abs/1910.10683">T5</a>) is a method for bringing together the latest advances in NLP transfer learning into a single, unified framework. The work argues that many NLP tasks can be formulated in a text-to-text format, assuming both the inputs and the outputs are text. The authors claim that this “<em>framework provides consistent training objectives for both pre-training and fine-tuning</em>”. T5 is essentially an encoder-decoder Transformer that applies various improvements, in particular to the attention components of the model. The model was pre-trained on the newly released <a href="https://www.tensorflow.org/datasets/catalog/c4">Colossal Clean Crawled Corpus</a> dataset and achieved state-of-the-art results on NLP tasks such as summarization, question answering, and text classification.</p>
<p><br />
A new work that follows T5, <a href="https://arxiv.org/abs/2004.14546">WT5</a> (short for “Why, T5?”), fine-tunes the Transformer-based T5 model to produce explanations for its own predictions, which can help to better understand why the model makes particular predictions. The model is fed examples carrying target explanations and target labels. The input text, prefixed with a task tag (e.g., sentiment), can also have an “explain” tag prepended to the actual text (see the example in the image below, and the usage sketch that follows). This enables semi-supervised learning by providing the model with fully labeled data and a limited number of examples that carry explanation labels. The authors report that their approach produces good results on out-of-domain data and state-of-the-art qualitative and quantitative results on explainability benchmarks. This work offers an interesting baseline for better understanding the predictions of text-based models, but the authors emphasize that the approach is only a superficial improvement in interpretability and can be taken further.</p>
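<p>As a small sketch of the text-to-text format with the Hugging Face transformers library (the <code>summarize:</code> task prefix comes from the T5 paper; the checkpoint choice and example text are illustrative):</p>
<pre><code class="language-python">from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

# Every task is cast as text in, text out; the prefix selects the task.
text = ('summarize: The quick brown fox jumped over the lazy dog. '
        'It then ran into the forest and was never seen again.')
inputs = tokenizer(text, return_tensors='pt')
output_ids = model.generate(inputs['input_ids'], max_length=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
</code></pre>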
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*5TFSiu_G6ofmwM9FhpokHQ.png" alt="" /></p>
<p><em>Narang et al. (2020)</em></p>
<h1 id="araçlar-ve-veri-setleri-️">Araçlar ve Veri Setleri ⚙️</h1>
<p><strong><em>NVIDIA’s Medical Imaging Framework</em></strong></p>
<p><br />
<a href="https://blogs.nvidia.com/blog/2020/04/21/monai-open-source-framework-ai-healthcare/?ncid=so-twit-79443#cid=ix11_so-twit_en-us">MONAI</a>, sağlık alanında bilimsel gelişmeleri desteklemek için geliştirilen bir tıbbi görüntüleme AI framework’udur. Yayımlama notlarında bildirildiği üzere, MONAI sağlık hizmet verisi ile başa çıkabilmek için kullanıcı dostu ve alana optimize bir kütüphane sağlamayı hedeflemektedir. Benzer diğer kütüphaneler gibi ayrıca alana özel veri işleme ve dönüştürme araçları, bu alanda yaygın kullanılan sinir ağı modelleri ve değerlendirme yöntemlerine erişim ve sonuçları yeniden üretebilme yetenekleri sağlamaktadır.</p>
<p><br />
<strong><em>Python Game Boy Emulator</em></strong></p>
<p><br />
<a href="https://github.com/Baekalfen/PyBoy">PyBoy</a>, Python’da yazılmış ve Game Boy donanımı ile çalıştığı makine arasına arabirim kurmaya yardımcı olan bir araçtır. İçerisinde oyun ile etkileşime geçen AI tabanlı bir ajanın (İng. agent) eğitilebilmesi için deneysel bir ortam da bulunmaktadır.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*WDJdaEyRjK660-0b6KvVjQ.png" alt="" /></p>
<p><br />
<strong><em>Jupyter Notebooks as PDFs</em></strong></p>
<p><br />
Have you ever wanted to turn your notebooks into a tidy PDF? This Jupyter <a href="https://github.com/betatim/notebook-as-pdf">extension</a> by Tim Head lets you produce a PDF from your notebook with minimal additional dependencies and attach the notebook itself to the PDF for reproducibility.</p>
<p><br />
<strong><em>Towards More Realistic Conversational AI Systems</em></strong></p>
<p><br />
Transformers now includes access to <a href="https://huggingface.co/transformers/model_doc/dialogpt.html">DialoGPT</a>, the first conversational response model available in the library. <a href="https://www.microsoft.com/en-us/research/publication/dialogpt-large-scale-generative-pre-training-for-conversational-response-generation/">DialoGPT</a> is a large-scale conversational response generation model proposed by Microsoft. Unlike previous models that relied on text data such as Wikipedia and news, it uses massive amounts of multi-turn conversations extracted from Reddit comments. DialoGPT is a GPT-based autoregressive language model that aims to provide large-scale pre-training for response generation, paving the way for conversational AI that better reflects human interaction.</p>
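<p>A minimal chat-loop sketch, following the usage pattern from the DialoGPT model card (generation settings and example turns are illustrative):</p>
<pre><code class="language-python">import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-medium')
model = AutoModelForCausalLM.from_pretrained('microsoft/DialoGPT-medium')

history = None
for turn in ['Hello, how are you?', 'What are you up to today?']:
    # Encode the user turn and append the end-of-sentence token.
    new_ids = tokenizer.encode(turn + tokenizer.eos_token, return_tensors='pt')
    input_ids = new_ids if history is None else torch.cat([history, new_ids], dim=-1)
    # Generate a response conditioned on the whole dialog history.
    history = model.generate(input_ids, max_length=200,
                             pad_token_id=tokenizer.eos_token_id)
    reply = tokenizer.decode(history[0, input_ids.shape[-1]:], skip_special_tokens=True)
    print('Bot:', reply)
</code></pre>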
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*HTtADQcR20iRxvPh3DhJ2Q.png" alt="" /></p>
<p><br />
<strong><em>TorchServe and TorchElastic for Kubernetes, new PyTorch libraries for serving and training models at scale</em></strong></p>
<p><br />
<a href="https://medium.com/pytorch/torchserve-and-torchelastic-for-kubernetes-new-pytorch-libraries-for-serving-and-training-models-2efd12e09adc">TorchServe</a>, geliştiricilerin süreçte zorlanmalarını azaltmayı hedeflerken modellerini eğitme ve servis etmelerine olanak sağlayan açık-kaynaklı bir kütüphanedir. Bu araç PyTorch’un üzerine inşa edilmiş olup geliştiricilerin modellerini job olarak AWS üzerinde çalıştırmalarına izin vermektedir. Torchserve’nin kolay uygulama, temiz sonuç çıkarım API’leri, raporlama, gerçek zamanlı sonuç servisi metrikleri ve kolay model yönetimi gibi yetenekleri eğitilmiş modellere sunmak için kabul görmüş bir yol olması beklenmektedir.</p>
<p><br />
<strong><em>MLSUM: The Multilingual Summarization Corpus</em></strong></p>
<p><br />
To encourage and strengthen multilingual work in NLP, Thomas Scialom and fellow researchers recently <a href="https://arxiv.org/abs/2004.14900">introduced</a> a multilingual summarization corpus. The dataset, obtained from newspapers, contains roughly 1.5 million articles in French, German, Spanish, Russian, and Turkish.</p>
<p><br />
<strong><em>Made with ML</em></strong></p>
<p><br />
In case you missed it, Goku Mohandas built the website Made with ML, which aims to provide a tool for discovering relevant and interesting ML projects. It acts as a platform where users can share their projects with the community. One of the latest additions to the site is a <a href="https://madewithml.com/topics/">curated topics</a> section that helps users quickly find relevant projects.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*eoyqzd6XYVBOqOnU_jqjNA.png" alt="" /></p>
<h1 id="makaleler-ve-blog-gönderileri-️">Makaleler ve Blog Gönderileri ✍️</h1>
<p><strong><em>What’s new for Transformers at the ICLR 2020 conference?</em></strong></p>
<p><br />
ICLR, one of the premier conferences in machine learning, had to be held virtually this year due to travel restrictions around the world. Being a top conference, it always comes with the expectation of great work, especially improvements built on top of past breakthroughs. For example, Transformers have achieved state-of-the-art results on many NLP tasks, and two works proposing ways to improve these models were accepted at ICLR.</p>
<p><br />
Şu <a href="https://towardsdatascience.com/whats-new-for-transformers-at-the-iclr-2020-conference-4285a4294792">makale</a>, yapısal revizyonlar (<a href="https://openreview.net/pdf?id=H1eA7AEtvS">ALBERT</a>, <a href="https://openreview.net/pdf?id=rkgNKkHtvB">Reformer</a> ve <a href="https://openreview.net/pdf?id=r1eIiCNYwS">Transformer-XH</a> gibi), yeni öğrenme prosedürleri (<a href="https://openreview.net/pdf?id=r1xMH1BtvB">ELECTRA</a> ve <a href="https://openreview.net/pdf?id=BJlzm64tDH">Pretrained Encyclopedia</a> gibi) ve büyük ölçekte çıkarım, metin üretimi, görsel-dilbilimsel temsiller gibi diğer alanlarda geliştirmeler içeren Transformatörler’le ilgili çalışmaları özetlemektedir. İlginç bir <a href="https://openreview.net/pdf?id=HJlnC1rKPB">bildiri</a>, Transformatör yapılarının, potansiyel bir CNN genelleştirmesi olduğunu öne süren ilginç bulgularıyla, konvolüsyon ile öz-dikkat(İng. self-attention) katmanlarının ortak yönlerini açıklayan detaylı bir analiz sağlamaktadır.</p>
<p><br />
If you would like to learn more about the other works published at ICLR this year, you can browse <a href="https://paperswithcode.com/conference/iclr-2020-1/official">Papers with Code</a>.</p>
<p><br />
ICLR has made all of the <a href="https://iclr.cc/virtual_2020/papers.html?filter=keywords">conference talks</a> available.</p>
<p><br />
<strong><em>The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies</em></strong></p>
<p><br />
Reinforcement learning has enabled AI labs to achieve breakthroughs in AI. Aiming to tackle global problems with AI systems, in particular tax policy design, a group of researchers introduced the <a href="https://blog.einstein.ai/the-ai-economist/">AI Economist</a>, a reinforcement learning framework that aims to learn <em>dynamic tax policies</em> using only simulation and data-driven solutions. Some of the results obtained with the AI Economist are promising, giving the framework the potential to improve social welfare and the current state of economic inequality.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*erIkiJKxa6jJgJEyNAnvbA.png" alt="" /></p>
<p><br />
<strong><em>On bringing common-sense reasoning capabilities to AI systems</em></strong></p>
<p><br />
It is often argued that many of today’s AI systems lack common-sense reasoning capabilities. This detailed <a href="https://www.quantamagazine.org/common-sense-comes-to-computers-20200430/">article</a> recounts a brief history of the problem and how researchers working on the latest technologies are making progress in this area. As one might expect, much of the recent work teaches neural networks (language models in particular) to understand the world faster and more effectively by building a knowledge base. This can be thought of as an attempt to combine symbolic reasoning with neural networks to cope with <em>coverage</em> and <em>precision</em> problems.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*X6kfr8dyhvhjhQAPD-oh2A.png" alt="" /></p>
<p><em><a href="https://arxiv.org/abs/1906.05317">COMET</a> — Bosselut et al. (2019)</em></p>
<p><br />
<strong><em>Keeping up with the BERTs: a review of the main NLP benchmarks</em></strong></p>
<p><br />
What can NLP do better than humans, and where is there still room for improvement? In a recent <a href="https://creatext.ai/blog-posts/nlp-benchmarking-superglue-xtreme">blog post</a>, Manuel Tonneau reviews model performance on the GLUE benchmark and identifies the tasks where NLP systems lead and those where humans are still superior. The SuperGLUE and XTREME benchmarks are also presented as attempts to raise the bar and motivate research on new tasks and new languages.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*7tD3nxDuzazUFGSWNZOXWQ.png" alt="" /></p>
<p><em><a href="https://super.gluebenchmark.com/leaderboard/">SuperGLUE kıyaslamaları</a></em></p>
<p><br />
<strong><em>Benchmarking the Triton (TensorRT) Inference Server for serving Transformer models</em></strong></p>
<p><br />
Şu detaylı <a href="https://blog.einstein.ai/benchmarking-tensorrt-inference-server/">blog gönderisi</a>, üretimde kullanılacak Transformatör tabanlı dil modellerinin servis edilmesi için yapılan ilginç kıyaslama deneylerini ele almaktadır. Yazarlar, modelleri ve deneyleri farklı ayarlar ve seçeneklerle tutmak için <a href="https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/index.html">NVIDIA’nın Triton (TensorRT) Inference Sunucusunu</a> kullanmaktadırlar. Ayrıca TensorFlow ve PyTorch tabanlı modellerin karşılaştırılabilir sonuçlarını sağlamaktadırlar. Rapor, eşzamanlı servis gecikmesi, girdinin eşzamanlı serviste işlenme süresi ve yığın (İng. batch) ve sekans uzunluğu gibi diğer yapılandırma ayarlarıyla model servis etmenin farklı yönlerinden elde edilmiş sonuçları içermektedir. Model servis etmenin birçok yönü raporda bulunmamaktadır; ancak yazarlar model versiyonlamayı ve nesne tespiti gibi farklı görevleri test etmekle oldukça ilgililerdir. Bu tarz rehberler, insanların modellerini üretime koymalarına yardımcı olacak kıyaslama modelleri için en iyi yöntemleri ve teknikleri sağlamaktadır.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*zTIAB4-Q4TdEX32RpEC-hg.png" alt="" /></p>
<p><em>Latency and throughput of different models — <a href="https://blog.einstein.ai/benchmarking-tensorrt-inference-server/">source</a></em></p>
<p><br />
<strong><em>A Visual Guide to Recurrent Layers in Keras</em></strong></p>
<p><br />
This <a href="https://amitness.com/2020/04/recurrent-layers-keras/">article</a> by Amit Chaudhary provides a visual explanation of the recurrent layers available in Keras and the effects of various arguments on the input and output. The article is meant to help build a better understanding of how to interact with Keras RNN layers when preparing and processing data. It is a useful guide for beginners interested in language modeling with RNN models.</p>
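<p>For instance, the effect of the <code>return_sequences</code> and <code>return_state</code> arguments can be verified directly from output shapes (a small sketch with tf.keras; the layer and input sizes are arbitrary):</p>
<pre><code class="language-python">import numpy as np
import tensorflow as tf

# A batch of 2 sequences, each with 5 timesteps of 3 features.
x = np.random.rand(2, 5, 3).astype('float32')

# Default: only the output at the last timestep is returned.
print(tf.keras.layers.LSTM(4)(x).shape)                         # (2, 4)

# return_sequences=True: one output per timestep.
print(tf.keras.layers.LSTM(4, return_sequences=True)(x).shape)  # (2, 5, 4)

# return_state=True: also return the final hidden and cell states.
out, h, c = tf.keras.layers.LSTM(4, return_state=True)(x)
print(out.shape, h.shape, c.shape)                              # (2, 4) (2, 4) (2, 4)
</code></pre>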
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*XvJRVz993m5TaOTilodcmg.png" alt="" /></p>
<h1 id="eğitim-">Eğitim 🎓</h1>
<p><strong><em>Practical Deep Learning Book for Cloud, Mobile &amp; Edge</em></strong></p>
<p><br />
If you are interested in porting your deep learning models to the cloud, mobile, or edge devices, take a look at this <a href="https://www.practicaldeeplearning.ai/">book</a> by Anirudh Koul, Siddha Ganju, and Meher Kasam. The book is titled “Practical Deep Learning Book for Cloud, Mobile &amp; Edge” and focuses on topics such as tuning and deploying your computer vision models, an introduction to over forty industry use cases, and using transfer learning to train models quickly.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*XE_--TTGI9fQCTij.jpg" alt="" /></p>
<p><br />
<strong><em>Machine learning courses</em></strong></p>
<ul>
<li>Stanford has made available newly recorded <a href="https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU">videos</a> of the ML course taught by Andrew Ng. The course provides content that can serve students who are new to the world of machine learning well.</li>
<li>As ML and NLP systems move into production for real-world use, building more reliable and privacy-preserving systems becomes important. This <a href="https://cseweb.ucsd.edu/classes/sp20/cse291-b/index.html">course</a> covers topics in <em>trustworthy machine learning</em>.</li>
<li>Thomas Wolf published this comprehensive <a href="https://www.youtube.com/watch?v=G5lmya6eKtc">video overview</a> explaining the latest trends and upcoming topics in NLP transfer learning.</li>
</ul>
<p><br />
<strong><em>Learning about GANs</em></strong></p>
<p><br />
This <a href="https://www.youtube.com/watch?v=1CT-kxjYbFU&feature=youtu.be">video lecture</a> by Pieter Abbeel provides a detailed overview of generative adversarial networks, which today are used in all kinds of creative applications such as generating realistic images and digital painting. The lecture is part of the <a href="https://sites.google.com/view/berkeley-cs294-158-sp20/home">Deep Unsupervised Learning</a> course currently taught at UC Berkeley. Below you can see an outline of the lecture.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*1BKlAYDwyUheMMRl59TFFQ.png" alt="" /></p>
<p><br />
<strong><em>Differential Calculus for Deep Learning</em></strong></p>
<p><br />
Aurélien Geron shared an interesting Colab <a href="https://colab.research.google.com/github/ageron/handson-ml2/blob/master/math_differential_calculus.ipynb#scrollTo=mnywx0pgMCLA">notebook</a> that aims to teach the basic concepts of differential calculus, such as derivatives, partial derivatives, and gradients. These topics are of great importance in deep learning, and Geron summarizes them and walks through implementations with easy-to-understand visualizations. He also recommends looking at another <a href="https://github.com/ageron/handson-ml2/blob/master/extra_autodiff.ipynb">notebook</a> on auto-differentiation.</p>
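<p>The core idea these notebooks build toward, computing gradients automatically rather than by hand, fits in a few lines. Here is a small sketch using PyTorch autograd rather than the notebook’s own code; the function is an arbitrary example:</p>
<pre><code class="language-python">import torch

# f(x, y) = x^2 * y + y + 2, evaluated at (x, y) = (3, 4).
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)
f = x**2 * y + y + 2

# Backpropagation fills in df/dx and df/dy.
f.backward()
print(x.grad)  # df/dx = 2*x*y = 24
print(y.grad)  # df/dy = x^2 + 1 = 10
</code></pre>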
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*NU6s4j-GE5PjaDsCsgcuNw.png" alt="" /></p>
<h1 id="bunlara-da-göz-atın-️">Bunlara da göz atın ⭐️</h1>
<ul>
<li>Andrej Karpathy recently <a href="https://www.youtube.com/watch?v=hx7BXih7zx8&feature=youtu.be">shared</a> the latest developments in AI technologies at Tesla in their push toward full self-driving. Topics include the modeling of HydraNet, data engines, evaluation metrics, and how to run inference efficiently with these large-scale neural network models.</li>
<li>Here is a neat <a href="https://github.com/Machine-Learning-Tokyo/Interactive_Tools">repo</a> prepared by MLT containing interactive machine learning, deep learning, and math tools.</li>
<li>A new <a href="https://arxiv.org/abs/2004.08900">paper</a> aims to provide a brief overview of the cost of training large NLP models and how these costs can change.</li>
<li>Springer recently made hundreds of books freely available, with topics ranging from mathematics to deep learning. This <a href="https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189">article</a> summarizes some of the machine learning books that are free to download.</li>
<li>Kra-Mania is an application built with <a href="https://github.com/deepset-ai/haystack">Haystack</a> (a tool for question answering and search), using an open question answering (QA) dataset created from the Seinfeld TV series. This <a href="https://colab.research.google.com/drive/17kZqK2i0CYzR6ZDjL6ULEtpDUOYwQAbK">guide</a> shows how easily QA pipelines can be built with the library. You can also try the app at this <a href="https://kra-mania.firebaseapp.com/">link</a>.</li>
<li>Explainability is the process by which researchers aim to better understand deep neural networks. As AI systems are deployed in critical real-world domains, explainability is becoming an important and active research area. This <a href="https://arxiv.org/abs/2004.14545">paper</a> provides a <em>“field guide” to deep learning explainability for the uninitiated</em>.</li>
<li>Check out this short <a href="https://ai.stanford.edu/blog/data-augmentation/">review</a> covering work on data augmentation, one of the most popular recent research areas in ML and NLP.</li>
<li>In a previous issue of the newsletter, we covered Longformer, a Transformer variant that improves performance on various NLP tasks, particularly those involving long documents. In this <a href="https://www.youtube.com/watch?v=_8KNb5iqblE&t=463s">video</a>, Yannic Kilcher gives a nice explanation of the innovations proposed in this work.</li>
</ul>
<p><br />
If you have a finished dataset, project, blog post, tutorial, or paper that you would like to share in the next issue of the NLP Newsletter, please submit it directly using this <a href="https://forms.gle/3b7Q2w2bzsXE6uYo9">form</a>.</p>
<p><br />
<em>If you want to receive future issues of the NLP Newsletter in your inbox, 🔖 <a href="https://dair.ai/newsletter/">subscribe</a>.</em></p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_11_tr/">NLP Haber Bülteni #11 [TR]: Jukebox, HybridQA, TLDR üretimi, Blender: SOTA Sohbet Robotu, TorchServe, AI Economist, WT5,...</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on May 04, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_11_en2020-05-04T00:00:00+00:002020-05-04T00:00:00+00:00Elvis Saraviahttps://dair.aiellfae@gmail.com
<p><img src="https://cdn-images-1.medium.com/max/1200/1*X_c0mVECV9rtl6ozuE-oPA.png" alt="" /></p>
<p><br />
Welcome to the 11th issue of the NLP Newsletter. In this issue, we cover topics that range from reinforcement learning frameworks for tax policy design to state-of-the-art conversational AI to improving text generation frameworks.</p>
<h1 id="dairai-updates">dair.ai updates</h1>
<ul>
<li>We have released a <a href="https://github.com/dair-ai/emotion_dataset">dataset</a> that can be used for text-based emotion research. The repository includes a <a href="https://colab.research.google.com/drive/1nwCE6b9PXIKhv2hvbqf1oZKIGkXMTi1X#scrollTo=t23zHggkEpc-">notebook</a> that shows how to fine-tune pretrained BERT models for the task of emotion classification. More recently, a model was fine-tuned on our dataset and <a href="https://huggingface.co/mrm8488/distilroberta-base-finetuned-sentiment">hosted</a> on HuggingFace, which can easily be integrated into an NLP pipeline (see the sketch after this list).</li>
<li>We recently held our first-ever paper reading session. Over 124 people registered and a huge portion of that group participated in the remote event. The first discussion was on the <a href="https://arxiv.org/abs/1910.10683%27">T5 paper</a>. We are hosting a second session where we will have an in-depth discussion of the paper. All are invited to the event posted <a href="https://www.meetup.com/dair-ai/events/270419989/">here</a>. To find out more about future events, join our <a href="https://www.meetup.com/dair-ai">Meetup group</a>, or join the discussion in our <a href="https://join.slack.com/t/dairai/shared_invite/zt-dv2dwzj7-F9HT047jIGkunNKv88lQ~g">Slack group</a>. You can also <a href="https://dair.ai/newsletter/"><em>subscribe</em></a> <em>🔖 to the NLP Newsletter to receive information about future events.</em></li>
</ul>
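<p><br />
As referenced in the first item above, here is a minimal sketch of integrating the hosted fine-tuned model into an NLP pipeline via the transformers library. The model id is taken from the link in that item; the exact output labels depend on the model’s config and should be verified on its hub page.</p>
<pre><code># A minimal sketch: load the hosted fine-tuned model into a pipeline.
# The model id comes from the HuggingFace link above; output label names
# depend on that model's config and should be verified on its hub page.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="mrm8488/distilroberta-base-finetuned-sentiment")

print(classifier("I can't wait for the next paper reading session!"))
# e.g. [{'label': 'joy', 'score': 0.98}] (illustrative output only)
</code></pre>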
<h1 id="research-and-publications-">Research and Publications 📙</h1>
<p><strong><em>OpenAI’s Jukebox</em></strong></p>
<p><br />
The latest work from OpenAI is called Jukebox, which is essentially a neural network architecture trained to generate music (from scratch) in various genres and artistic styles. The model, based on a quantization-based approach called VQ-VAE, is fed genre, artist, and lyrics as input and outputs a novel audio sample. The idea is to process and compress long raw audio inputs via a multi-level autoencoder, reducing the dimensionality while preserving the essential musical information. Transformers are then used to generate codes that are reconstructed into raw audio via the VQ-VAE decoder. More details of this work can be found in this <a href="https://openai.com/blog/jukebox/">blog post</a> or the <a href="https://cdn.openai.com/papers/jukebox.pdf">full paper</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*JvZuEnM8O4B2PmYSi3RivA.png" alt="" /></p>
<p><br />
<strong><em>HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data</em></strong></p>
<p><br />
So far, most question answering datasets focus on homogeneous information. <a href="https://github.com/wenhuchen/HybridQA">HybridQA</a> is a large-scale question answering dataset for encouraging research and methods that require reasoning on heterogeneous information. The (multi-hop) QA dataset consists of a structured Wikipedia table and unstructured information in the form of entities in the table linking to free-form corpora. The authors also discuss two baselines where they highlight the advantages of working with heterogeneous information as opposed to just using the table or text alone. However, they do point out that results are far behind human performance and this calls for QA systems that can better reason over heterogeneous information and address <em>coverage</em> problems.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*ojEoPUGxzskGUc1F.png" alt="" /></p>
<p><em><a href="https://github.com/wenhuchen/HybridQA">Source</a></em></p>
<p><br />
<strong><em>A state-of-the-art open-source chatbot</em></strong></p>
<p><br />
Facebook AI has <a href="https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot">built</a> and open-sourced Blender, an AI-based model which they refer to as the <em>largest-ever open-domain chatbot</em>. Following the success of <a href="https://arxiv.org/abs/2001.09977">Meena</a> (a recent conversational AI system proposed by Google), they proposed a model that blends conversational skills like empathy and personality to improve the generated conversation quality. The model was trained using a Transformer-based model (with up to 9.4 billion parameters) on ~1.5 billion training samples. Then it was fine-tuned using a dataset (<a href="https://arxiv.org/abs/2004.08449">Blended Skill Talk</a>) that aims to provide the identified desirable traits that could improve the conversational abilities of the model. The authors claim that the model is able to generate responses that human evaluators deemed more human than those generated by Meena.</p>
<p><br />
<strong><em>TLDR: Extreme Summarization of Scientific Documents</em></strong></p>
<p><br />
This <a href="https://arxiv.org/abs/2004.15011">paper</a> proposes an approach, including a dataset (SCITLDR), for the novel task of <em>TLDR generation</em> of scientific papers. In this work, TLDRs are defined as an alternative and compact summarization of the scientific article. TLDRs, as suggested by the authors, can serve as a way to quickly understand what a paper is about and potentially help the reader decide whether they want to continue reading the paper. Due to variation in human-generated summaries, multiple TLDRs are obtained from experts via a peer review style. A BART-based model with a multitask fine-tuning schedule (including title generation and TLDR generation) was used for the final task.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*eeRZ4T1CG0lhgHr_ia-8MA.png" alt="" /></p>
<p><em><a href="https://arxiv.org/abs/2004.15011">Cachola et al. (2020)</a></em></p>
<p><br />
<strong><em>WT5?! Training Text-to-Text Models to Explain their Predictions</em></strong></p>
<p><br />
<em>Text-to-Text Transfer Transformer</em> (<a href="https://arxiv.org/abs/1910.10683">T5</a>) is a method that brings together all the recent improvements in NLP transfer learning models into one unified framework. This work proposes that most NLP tasks can be formulated in a text-to-text format, suggesting that both the inputs and outputs are texts. The authors claim that this “<em>framework provides a consistent training objective both for pre-training and fine-tuning</em>”. T5 is essentially an encoder-decoder Transformer that applies various improvements in particular to the attention components of the model. The model was pre-trained on a newly released dataset called <a href="https://www.tensorflow.org/datasets/catalog/c4">Colossal Clean Crawled Corpus</a> and achieved SOTA results on NLP tasks such as summarization, question answering, and text classification.</p>
<p><br />
New follow-up work called <a href="https://arxiv.org/abs/2004.14546">WT5</a> (shorthand for “Why, T5?”) fine-tunes a Transformer-based T5 model to produce explanations for the predictions it makes. This can help to provide more understanding of why a model is making certain predictions. The model is fed examples with target explanations and with only target labels. The input text, which includes a task prefix (e.g. sentiment) and the actual text, can also have an “explain” tag prepended (see the example in the figure below). This enables semi-supervised learning, where fully labeled data is provided to the model and only a limited number of examples carry the explanation tags. The authors report quantitative and qualitative results demonstrating that their approach achieves state-of-the-art results on explainability datasets, including the ability to perform well on out-of-domain data. This work presents an interesting basic model that can be used to better understand the predictions of text-based models, but as the authors emphasize, the approach is only a surface-level improvement to interpretability and there is still room for improvement.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*5TFSiu_G6ofmwM9FhpokHQ.png" alt="" /></p>
<p><em>Narang et al. (2020)</em></p>
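<p><br />
To make the input scheme concrete, here is a minimal sketch of the WT5-style “explain” formatting using the transformers library. The public t5-small checkpoint is used as a stand-in since the fine-tuned WT5 weights are not assumed to be available; without that fine-tuning it will not actually produce explanations, but the prompt format is the point.</p>
<pre><code># A sketch of the WT5-style text-to-text input format described above.
# t5-small is a stand-in checkpoint: a WT5 model fine-tuned as in the
# paper would emit outputs like "negative explanation: ...".
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Task prefix plus the "explain" tag, as in the figure above.
text = "explain sentiment: the acting was flat and the plot made no sense"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</code></pre>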
<h1 id="tools-and-datasets-️">Tools and Datasets ⚙️</h1>
<p><strong><em>NVIDIA’s Medical Imaging Framework</em></strong></p>
<p><br />
<a href="https://blogs.nvidia.com/blog/2020/04/21/monai-open-source-framework-ai-healthcare/?ncid=so-twit-79443#cid=ix11_so-twit_en-us">MONAI</a> is a medical imaging AI framework to support scientific development in healthcare. As reported in the release notes, MONAI aims to provide a user-friendly and domain-optimized library for dealing with healthcare-data. Similar to other libraries, it also provides domain-specific data processing and transformation tools, neural network models commonly used in the space, including access to evaluation methods and the ability to reproduce results.</p>
<p><br />
<strong><em>A Python Game Boy Emulator</em></strong></p>
<p><br />
<a href="https://github.com/Baekalfen/PyBoy">PyBoy</a> is a tool built with Python to help interfacing with Game Boy hardware. It even includes an experimental wrapper to train an AI-based agent that interacts with the game.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*WDJdaEyRjK660-0b6KvVjQ.png" alt="" /></p>
<p><br />
<strong><em>Jupyter Notebooks as PDF</em></strong></p>
<p><br />
Have you ever wanted to properly render your notebooks as PDFs? Check out this Jupyter <a href="https://github.com/betatim/notebook-as-pdf">extension</a> written by Tim Head that lets you produce PDFs from your notebooks with minimal plugin requirements and allows the notebook itself to be attached to the PDF for reproducibility.</p>
<p><br />
<strong><em>On Building More Realistic Conversational AI systems</em></strong></p>
<p><br />
Transformers now includes <a href="https://huggingface.co/transformers/model_doc/dialogpt.html">DialoGPT</a>, giving access to the first conversational response model available in the library. <a href="https://www.microsoft.com/en-us/research/publication/dialogpt-large-scale-generative-pre-training-for-conversational-response-generation/">DialoGPT</a> is a large-scale neural conversational response generation model proposed by Microsoft. It differs from previous models that depend on general text data such as wiki and news since it uses massive amounts of conversations extracted from Reddit comments. DialoGPT is based on GPT, an autoregressive language model, and aims to provide large-scale pretraining for response generation, enabling conversational AI that is more representative of human interaction.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*HTtADQcR20iRxvPh3DhJ2Q.png" alt="" /></p>
<p><br />
<strong><em>TorchServe and TorchElastic for Kubernetes, new PyTorch libraries for serving and training models at scale</em></strong></p>
<p><br />
<a href="https://medium.com/pytorch/torchserve-and-torchelastic-for-kubernetes-new-pytorch-libraries-for-serving-and-training-models-2efd12e09adc">TorchServe</a> is an open-source library that allows developers to train and serve their models while aiming to reduce friction in the process. The tool is built on top of PyTorch and allows developers to deploy their model as jobs using AWS. Torchserve is meant to be the canonical way to serve trained models providing features such as secure deployment, clean inference APIs, logging and real-time metrics of inference service, and easy model management.</p>
<p><br />
<strong><em>MLSUM: The Multilingual Summarization Corpus</em></strong></p>
<p><br />
To encourage and strengthen multilingual research in NLP, Thomas Scialom and other researchers recently <a href="https://arxiv.org/abs/2004.14900">proposed</a> a multilingual summarization corpus. The dataset was obtained from newspapers and contains ~1.5 million articles in French, German, Spanish, Russian, and Turkish.</p>
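<p><br />
A hedged sketch of loading the corpus: MLSUM is distributed via the Hugging Face datasets hub, though the dataset id, the per-language config names, and the field names below are assumptions worth verifying against the hub.</p>
<pre><code># A sketch of loading MLSUM through the datasets library; the dataset id,
# config name ("fr"), and field names are assumptions to verify on the hub.
from datasets import load_dataset

mlsum_fr = load_dataset("mlsum", "fr", split="train")
example = mlsum_fr[0]
print(example["title"])
print(example["summary"])
</code></pre>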
<p><br />
<strong><em>Made with ML</em></strong></p>
<p><br />
In case you missed it, Goku Mohandas has built a website called Made with ML that aims to provide a tool to discover relevant and interesting ML projects. It’s a platform that allows makers to share their projects with the community. A recent upgrade to the website includes a section that provides carefully <a href="https://madewithml.com/topics/">curated topics</a> that can help users to quickly find relevant projects.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*eoyqzd6XYVBOqOnU_jqjNA.png" alt="" /></p>
<h1 id="articles-and-blog-posts-️">Articles and Blog posts ✍️</h1>
<p><strong><em>What’s new for Transformers at the ICLR 2020 Conference?</em></strong></p>
<p><br />
One of the premier conferences in machine learning, ICLR, had to be held virtually this year due to the travel restrictions imposed by countries all over the world. Being a top conference, there are always expectations of novel work, especially improvements on previous works that have been considered groundbreaking. As an example, Transformers have been shown to produce state-of-the-art results on a variety of NLP tasks, and there were a couple of accepted works at ICLR proposing ways to improve such models.</p>
<p><br />
This <a href="https://towardsdatascience.com/whats-new-for-transformers-at-the-iclr-2020-conference-4285a4294792">article</a> summarizes some of the works related to Transformers which include architectural revisions (e.g. <a href="https://openreview.net/pdf?id=H1eA7AEtvS">ALBERT</a>, <a href="https://openreview.net/pdf?id=rkgNKkHtvB">Reformer</a>, and <a href="https://openreview.net/pdf?id=r1eIiCNYwS">Transformer-XH</a>), novel learning procedures (e.g. <a href="https://openreview.net/pdf?id=r1xMH1BtvB">ELECTRA</a> and <a href="https://openreview.net/pdf?id=BJlzm64tDH">Pretrained Encyclopedia</a>), and improving other domains such as large-scale retrieval, text generation, and visual-linguistic representations. One <a href="https://openreview.net/pdf?id=HJlnC1rKPB">interesting paper</a> even provides a detailed analysis describing the common aspects of self-attention and convolutional layers, with interesting findings suggesting that Transformer architectures are a potential generalization of CNNs.</p>
<p><br />
If you are interested in learning more about other works published at ICLR this year, you can check out the <a href="https://paperswithcode.com/conference/iclr-2020-1/official">Papers with Code website</a>.</p>
<p><br />
ICLR just made all the <a href="https://iclr.cc/virtual_2020/papers.html?filter=keywords">conference talks</a> available as open-access.</p>
<p><br />
<strong><em>The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies</em></strong></p>
<p><br />
Reinforcement learning has allowed AI labs to produce some of the most groundbreaking advancements in the field of AI. In an effort to tackle global problems with AI systems, specifically tax policy design, a group of researchers proposed a reinforcement learning framework (<a href="https://blog.einstein.ai/the-ai-economist/">AI economist</a>) that aims to learn <em>dynamic tax policies</em> purely through simulation and data-driven solutions. Some of the improvements obtained by the AI Economist show promising results and schedules that could lead to a framework that potentially improves social outcomes and the state of economic inequality.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*erIkiJKxa6jJgJEyNAnvbA.png" alt="" /></p>
<p><br />
<strong><em>On bringing common-sense reasoning abilities to AI systems</em></strong></p>
<p><br />
It is argued that one of the capabilities lacking in many of today’s AI systems is common-sense reasoning. This detailed <a href="https://www.quantamagazine.org/common-sense-comes-to-computers-20200430/">article</a> provides a brief history of this problem and how researchers working on the cutting-edge are beginning to make progress in this aspect of the field. Not surprisingly, many of the recent efforts include the building of knowledge bases to teach a neural network (specifically language models) to learn faster and more efficiently about the world. This can be considered as an effort to combine symbolic reasoning with neural networks to deal with the problems of <em>coverage</em> and model <em>brittleness</em>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*X6kfr8dyhvhjhQAPD-oh2A.png" alt="" /></p>
<p><em><a href="https://arxiv.org/abs/1906.05317">COMET</a> — Bosselut et al. (2019)</em></p>
<p><br />
<strong><em>Keeping up with the BERTs: a review of the main NLP benchmarks</em></strong></p>
<p><br />
What can NLP do better than humans and where is there still room for improvement? In a <a href="https://creatext.ai/blog-posts/nlp-benchmarking-superglue-xtreme">recent blog post</a>, Manuel Tonneau reviews model performance on the GLUE benchmark, identifying tasks where NLP systems excel already and others on which humans still have the lead. The SuperGLUE and XTREME benchmarks are also presented as an initiative to set the bar higher and further motivate research on new tasks and new languages.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*7tD3nxDuzazUFGSWNZOXWQ.png" alt="" /></p>
<p><em><a href="https://super.gluebenchmark.com/leaderboard/">SuperGLUE benchmark</a></em></p>
<p><br />
<strong><em>Benchmarking Triton (TensorRT) Inference Server for Transformer Models</em></strong></p>
<p><br />
This detailed <a href="https://blog.einstein.ai/benchmarking-tensorrt-inference-server/">blog post</a> discusses interesting benchmarking experiments for serving Transformer-based language models for production use. The authors use <a href="https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/index.html">NVIDIA’s Triton (TensorRT) Inference Server</a> for hosting the models and experiment with different configurations and setups to provide comparable results between TensorFlow- and PyTorch-served models. The report includes results obtained on different aspects of model serving, such as latency with concurrency, throughput with concurrency, and other configurations involving batch size and sequence length. Many aspects of model serving are missing from the report, but the authors are interested in testing model versioning and different tasks such as object detection. Such guides provide best practices and benchmarking techniques that are useful for practitioners putting their models in production.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*zTIAB4-Q4TdEX32RpEC-hg.png" alt="" /></p>
<p><em>Latency and throughput for different models — <a href="https://blog.einstein.ai/benchmarking-tensorrt-inference-server/">source</a></em></p>
<p><br />
<strong><em>A Visual Guide to Recurrent Layers in Keras</em></strong></p>
<p><br />
This <a href="https://amitness.com/2020/04/recurrent-layers-keras/">article</a> by Amit Chaudhary provides a visual explanation of recurrent layers available in Keras and the effect of various arguments on the input and output. This is meant to provide a better understanding of how to interact with Keras RNN layers when preparing and processing the data. A useful tutorial for beginners interested in modeling language with RNN models.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*XvJRVz993m5TaOTilodcmg.png" alt="" /></p>
<h1 id="education-">Education 🎓</h1>
<p><strong><em>Practical Deep Learning Book for Cloud, Mobile & Edge</em></strong></p>
<p><br />
If you are interested in taking your deep learning models to the cloud, mobile, and edge devices, this is a relevant <a href="https://www.practicaldeeplearning.ai/">book</a> written by Anirudh Koul, Siddha Ganju, and Meher Kasam. The book is titled “Practical Deep Learning Book for Cloud, Mobile & Edge” and covers topics ranging from tuning and deploying your computer vision models, to an introduction of 40+ industry case studies, to the use of transfer learning to train models quickly.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*XE_--TTGI9fQCTij.jpg" alt="" /></p>
<p><br />
<strong><em>ML courses</em></strong></p>
<ul>
<li>Stanford has made available a newly recorded set of <a href="https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU">videos</a> of the ML course taught by Andrew Ng. This course provides content that could serve well for students getting started in the world of machine learning.</li>
<li>As we move ML and NLP systems into production for real-world use, it becomes crucial for building more trustworthy and privacy-preserving systems. This <a href="https://cseweb.ucsd.edu/classes/sp20/cse291-b/index.html">course</a> covers topics in <em>trustworthy machine learning</em>.</li>
<li>Thomas Wolf recorded this comprehensive <a href="https://www.youtube.com/watch?v=G5lmya6eKtc">video-based summary</a> explaining the recent trends and future topics in transfer learning for NLP.</li>
</ul>
<p><br />
<strong><em>Learning about GAN</em></strong></p>
<p><br />
This <a href="https://www.youtube.com/watch?v=1CT-kxjYbFU&feature=youtu.be">video lecture</a> by Pieter Abbeel provides a comprehensive overview of generative adversarial networks (GANs) which are being used for all sorts of creative applications today, from generating realistic images to digital painting. The lecture is part of the <a href="https://sites.google.com/view/berkeley-cs294-158-sp20/home">Deep Unsupervised Learning</a> course currently being delivered at UC Berkley. See the outline of the lecture below.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*1BKlAYDwyUheMMRl59TFFQ.png" alt="" /></p>
<p><br />
<strong><em>Differential Calculus for Deep Learning</em></strong></p>
<p><br />
Aurélien Geron shares an interesting Colab <a href="https://colab.research.google.com/github/ageron/handson-ml2/blob/master/math_differential_calculus.ipynb#scrollTo=mnywx0pgMCLA">notebook</a> that aims to introduce the basic concepts of differential calculus such as derivatives, partial derivatives, and gradients. These topics are all important in the field of deep learning, and Geron summarizes the concepts along with implementations, including easy-to-understand visualizations to guide the learner. He also recommends looking at another <a href="https://github.com/ageron/handson-ml2/blob/master/extra_autodiff.ipynb">notebook</a> on auto-differentiation.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*NU6s4j-GE5PjaDsCsgcuNw.png" alt="" /></p>
<h1 id="noteworthy-mentions-️">Noteworthy Mentions ⭐️</h1>
<ul>
<li>Andrej Karpathy <a href="https://www.youtube.com/watch?v=hx7BXih7zx8&feature=youtu.be">shares</a> some of the recent developments in AI technology related to Tesla’s efforts toward full self-driving. Topics include modeling of HydraNets, data engines, evaluation metrics, and how to efficiently perform inference on these large-scale neural network models.</li>
<li>This is a neat <a href="https://github.com/Machine-Learning-Tokyo/Interactive_Tools">repository</a> prepared by MLT containing a list of interactive tools for machine learning, deep learning, and mathematics.</li>
<li>A recent <a href="https://arxiv.org/abs/2004.08900">paper</a> aims to provide a concise overview of the costs associated with training large NLP models and how to derive these costs.</li>
<li>Recently, Springer has made freely available 100s of books with titles ranging from maths to deep learning. This <a href="https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189">article</a> summarizes some of the machine learning related books that are available to download for free.</li>
<li>Kra-Mania is a simple question-answering app built with <a href="https://github.com/deepset-ai/haystack">Haystack</a> (tools for question answering and search) using an open QA dataset built from the Seinfeld show. This <a href="https://colab.research.google.com/drive/17kZqK2i0CYzR6ZDjL6ULEtpDUOYwQAbK">tutorial</a> shows how easy it is to build QA pipelines with the library. And this <a href="https://kra-mania.firebaseapp.com/">link</a> takes you to the demo app.</li>
<li>Explainability is the process by which researchers aim to better understand deep neural networks. It’s an important and active area of study as AI systems are being used in real-world critical domains. This <a href="https://arxiv.org/abs/2004.14545">paper</a> provides a <em>“field guide” to deep learning explainability for the uninitiated</em>.</li>
<li>Here is a short <a href="https://ai.stanford.edu/blog/data-augmentation/">survey</a> describing recent works on data augmentation which has recently become a popular area of study in ML and NLP.</li>
<li>In the previous newsletter, we featured the Longformer, a variation of the Transformer which improves performance on various NLP tasks, particularly for longer documents. In this <a href="https://www.youtube.com/watch?v=_8KNb5iqblE&t=463s">video</a>, Yannic Kilcher provides a great explanation of the novelty proposed in this work.</li>
</ul>
<p><br />
If you have any finished datasets, projects, blog posts, tutorials, or papers that you wish to share in the next issue of the NLP Newsletter, please submit them directly using this <a href="https://forms.gle/3b7Q2w2bzsXE6uYo9">form</a>.</p>
<p><br />
<a href="https://dair.ai/newsletter/"><em>Subscribe</em></a> <em>🔖 to the NLP Newsletter to receive future issues in your inbox.</em></p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_11_en/">NLP Newsletter #11 [EN]: Jukebox, HybridQA, TLDR generation, Blender: the SOTA Chatbot, TorchServe, AI Economist, WT5,...</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on May 04, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_#11_[FR]2020-05-04T00:00:00+00:002020-05-04T00:00:00+00:00Loïck BOURDOIShttps://dair.ai
<p><img src="https://cdn-images-1.medium.com/max/1200/1*X_c0mVECV9rtl6ozuE-oPA.png" alt="" /></p>
<h1 id="avant-propos-delvis">Avant-propos d’Elvis</h1>
<p>Welcome to the eleventh issue of the NLP Newsletter.</p>
<p><strong><em>A few updates on the NLP Newsletter and on dair.ai:</em></strong></p>
<ul>
<li>
<p>We have published a <a href="https://github.com/dair-ai/emotion_dataset">dataset</a> that can be used for text-based emotion research. The repository includes a <a href="https://colab.research.google.com/drive/1nwCE6b9PXIKhv2hvbqf1oZKIGkXMTi1X#scrollTo=t23zHggkEpc-">notebook</a> that shows how to fine-tune pretrained BERT models for the task of emotion classification. More recently, a model was fine-tuned on our dataset and <a href="https://huggingface.co/mrm8488/distilroberta-base-finetuned-sentiment">hosted</a> on HuggingFace, allowing simple integration into an NLP pipeline.</p>
</li>
<li>
<p>We recently held our first-ever paper reading session. Over 124 people registered and a large portion of that group participated in the remote event. The first discussion was on the <a href="https://arxiv.org/abs/1910.10683%27">T5</a> paper. We are organizing a second session where we will have an in-depth discussion of the paper. To find out more about upcoming events, join our <a href="https://www.meetup.com/dair-ai">Meetup</a> group, or join the discussion in our <a href="https://join.slack.com/t/dairai/shared_invite/zt-dv2dwzj7-F9HT047jIGkunNKv88lQ~g">Slack group</a>.</p>
</li>
</ul>
<h1 id="publications-">Publications 📙</h1>
<p><strong><em>OpenAI’s Jukebox</em></strong></p>
<p><br />
The latest work unveiled by OpenAI is called Jukebox and is essentially a neural network architecture trained to generate music (from scratch) in various genres and artistic styles. The model, based on a quantization approach called VQ-VAE, is fed the genre, artist, and lyrics and produces a novel audio sample. The idea is to process and compress long raw audio inputs via a multi-level autoencoder, reducing the dimensionality while preserving the essential musical information. Transformers are then used to generate codes that are reconstructed into raw audio via the VQ-VAE decoder. More details on this work are available on the <a href="https://openai.com/blog/jukebox/">OpenAI blog</a> or in the <a href="https://cdn.openai.com/papers/jukebox.pdf">full paper</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*JvZuEnM8O4B2PmYSi3RivA.png" alt="" /></p>
<p><br />
<strong><em>HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data</em></strong></p>
<p><br />
So far, most question answering datasets focus on homogeneous information. <a href="https://github.com/wenhuchen/HybridQA">HybridQA</a> is a large-scale question answering dataset intended to encourage research and methods that require reasoning over heterogeneous information. The dataset consists of structured Wikipedia tables and unstructured information in the form of entities linking to free-form corpora. The authors also introduce two baselines that highlight the advantages of working with heterogeneous information compared to using homogeneous information alone. However, they point out that results are far behind human performance, which calls for QA systems that can better reason over heterogeneous information.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*ojEoPUGxzskGUc1F.png" alt="" /></p>
<p><em><a href="https://github.com/wenhuchen/HybridQA">Source</a></em></p>
<p><br />
<strong><em>A state-of-the-art open-source chatbot</em></strong></p>
<p><br />
Facebook AI has <a href="https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot">built</a> and open-sourced Blender, an AI-based model they refer to as the <em>largest-ever open-domain chatbot</em>. Following the success of <a href="https://arxiv.org/abs/2001.09977">Meena</a> (a recent conversational AI system proposed by Google), they proposed a model that blends conversational skills such as empathy and personality to improve the quality of the generated conversation. The model was trained using a Transformer-based model (with up to 9.4 billion parameters) on roughly 1.5 billion training samples. It was then fine-tuned using a dataset (<a href="https://arxiv.org/abs/2004.08449">Blended Skill Talk</a>) that aims to provide the identified desirable traits that could improve the model’s conversational abilities. The authors claim that the model is able to generate responses that human evaluators deemed more human than those generated by Meena.</p>
<p><br />
<strong><em>TLDR: Extreme Summarization of Scientific Documents</em></strong></p>
<p><br />
Ce <a href="https://arxiv.org/abs/2004.15011">document</a> propose une approche, y compris un jeu de données (SCITLDR), pour la nouvelle tâche de <em>génération de TLDR</em> d’articles scientifiques.
TLDR étant les initiales de « too long ; didn’t read » en anglais. Ce sigle est utilisé pour indiquer que ce qui suit est un résumé du texte trop long.
Dans ce travail, les TLDR sont définis comme une alternative et un résumé compact de l’article scientifique. Les TLDR, comme le suggèrent les auteurs, peuvent servir de moyen de comprendre rapidement le sujet d’un article et éventuellement aider le lecteur à décider s’il veut continuer à lire l’article. Pour la tâche finale, un modèle basé sur BART avec un fine-tuning multitâche (incluant la génération de titres et la génération de TLDR) a été utilisé.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*eeRZ4T1CG0lhgHr_ia-8MA.png" alt="" /></p>
<p><em><a href="https://arxiv.org/abs/2004.15011">Cachola et al. (2020)</a></em></p>
<p><br />
<strong><em>WT5?! Training Text-to-Text Models to Explain their Predictions</em></strong></p>
<p><br />
A new work called <a href="https://arxiv.org/abs/2004.14546">WT5</a> (short for “Why, T5?”) fine-tunes Google’s T5 model to produce explanations for the predictions it makes. This can help in better understanding why a model makes certain predictions. The model is fed examples with target explanations and target labels. The input text, which includes a task prefix (e.g., sentiment) and the actual text, can also be prepended with an “explain” tag (see the example in the figure below). This enables semi-supervised learning where fully labeled data is provided to the model and only a limited number of examples carry the explanation tags. The authors report quantitative and qualitative results demonstrating that their approach achieves state-of-the-art results on explainability datasets, including the ability to perform well on out-of-domain data. This work presents an interesting basic model that can be used to better understand the predictions of text-based models, but as the authors point out, the approach is only a surface-level improvement of interpretability and there is room for improvement.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*5TFSiu_G6ofmwM9FhpokHQ.png" alt="" /></p>
<p><em>Narang et al. (2020)</em></p>
<h1 id="outils-et-jeux-de-données-️">Outils et jeux de données ⚙️</h1>
<p><strong><em>NVIDIA’s Medical Imaging Framework</em></strong></p>
<p><br />
<a href="https://blogs.nvidia.com/blog/2020/04/21/monai-open-source-framework-ai-healthcare/?ncid=so-twit-79443#cid=ix11_so-twit_en-us">MONAI</a> est un framework d’IA en imagerie médicale destiné à soutenir le développement scientifique dans le domaine des soins de santé. Comme indiqué dans les notes de publication, MONAI vise à fournir une bibliothèque conviviale et optimisée pour le traitement des données relatives aux soins de santé. Comme d’autres bibliothèques, elle fournit également des outils de traitement et de transformation des données spécifiques à un domaine, des modèles de réseaux neuronaux couramment utilisés dans l’espace, y compris l’accès à des méthodes d’évaluation et la possibilité de reproduire les résultats.</p>
<p><br />
<strong><em>A Python Game Boy emulator</em></strong></p>
<p><br />
<a href="https://github.com/Baekalfen/PyBoy">PyBoy</a> est un outil construit en Python capable de gérer une interface Game Boy. Il comprend aussi une enveloppe expérimentale pour entraîner un agent basé sur l’IA qui interagit avec le jeu.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*WDJdaEyRjK660-0b6KvVjQ.png" alt="" /></p>
<p><br />
<strong><em>Jupyter Notebooks as PDF</em></strong></p>
<p><br />
Have you ever wanted to convert your notebooks to PDF? This <a href="https://github.com/betatim/notebook-as-pdf">Jupyter extension</a> written by Tim Head lets you produce PDFs from your notebooks with minimal plugin requirements and allows the notebooks to be attached to the PDF for reproducibility.</p>
<p><br />
<strong><em>On building more realistic conversational AI systems</em></strong></p>
<p><br />
The Transformers library now includes <a href="https://huggingface.co/transformers/model_doc/dialogpt.html">DialoGPT</a>. <a href="https://www.microsoft.com/en-us/research/publication/dialogpt-large-scale-generative-pre-training-for-conversational-response-generation/">DialoGPT</a> is a large-scale neural conversational response generation model proposed by Microsoft. It differs from previous models that depend on general text data such as Wikipedia and news articles, as it uses massive amounts of conversations extracted from Reddit comments. DialoGPT is based on the GPT autoregressive language model and aims to provide large-scale pretraining for response generation, enabling conversational AI that is more representative of human interaction.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*HTtADQcR20iRxvPh3DhJ2Q.png" alt="" /></p>
<p><br />
<strong><em>TorchServe and TorchElastic for Kubernetes, new PyTorch libraries for serving and training models at scale</em></strong></p>
<p><br />
<a href="https://medium.com/pytorch/torchserve-and-torchelastic-for-kubernetes-new-pytorch-libraries-for-serving-and-training-models-2efd12e09adc">TorchServe</a> est une librairie open-source qui permet aux développeurs d’entraîner leurs modèles tout en visant à réduire les frictions dans le processus. L’outil est construit sur PyTorch et permet aux développeurs de déployer leurs modèles en tant que travaux en utilisant AWS. Torchserve est conçu comme la manière canonique de servir les modèles entraînés en fournissant des fonctionnalités telles que le déploiement sécurisé, des API d’inférence, les mesures en temps réel du service d’inférence, et une gestion facile des modèles.</p>
<p><br />
<strong><em>MLSUM: The Multilingual Summarization Corpus</em></strong></p>
<p><br />
To encourage and strengthen multilingual research in NLP, researchers from ReciTAL and the CNRS recently <a href="https://arxiv.org/abs/2004.14900">proposed</a> a multilingual summarization corpus. The dataset was obtained from newspapers and contains roughly 1.5 million articles in French, German, Spanish, Russian, and Turkish.</p>
<p><br />
<strong><em>Made with ML</em></strong></p>
<p><br />
In case you missed it, Goku Mohandas has built a website called “Made with ML” that aims to provide a tool for discovering relevant and interesting ML projects. It is a platform that allows makers to share their projects with the community. A recent update to the website includes a section providing carefully curated <a href="https://madewithml.com/topics/">topics</a> that can help users quickly find relevant projects.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*eoyqzd6XYVBOqOnU_jqjNA.png" alt="" /></p>
<h1 id="articles-et-blog-️">Articles et Blog ✍️</h1>
<p><strong><em>Quelles sont les nouveautés pour les Transformers lors de la conférence ICLR 2020 ?</em></strong></p>
<p><br />
One of the most important machine learning conferences, ICLR, had to be held virtually this year due to the travel restrictions imposed by countries all over the world.
Below are some papers presented at the conference.</p>
<p><br />
This <a href="https://towardsdatascience.com/whats-new-for-transformers-at-the-iclr-2020-conference-4285a4294792">article</a> summarizes some of the Transformer-related works, including architectural revisions (e.g. <a href="https://openreview.net/pdf?id=H1eA7AEtvS">ALBERT</a>, <a href="https://openreview.net/pdf?id=rkgNKkHtvB">Reformer</a>, and <a href="https://openreview.net/pdf?id=r1eIiCNYwS">Transformer-XH</a>), new learning procedures (e.g. <a href="https://openreview.net/pdf?id=r1xMH1BtvB">ELECTRA</a> and <a href="https://openreview.net/pdf?id=BJlzm64tDH">Pretrained Encyclopedia</a>), and improvements to other areas such as large-scale retrieval, text generation, and visual-linguistic representations. One <a href="https://openreview.net/pdf?id=HJlnC1rKPB">paper</a> provides a detailed analysis describing the common aspects of self-attention and convolutional layers, with interesting findings suggesting that Transformer architectures are a potential generalization of CNNs.</p>
<p><br />
If you would like to know more about other works published at ICLR this year, you can check the <a href="https://paperswithcode.com/conference/iclr-2020-1/official">Papers with Code website</a>.</p>
<p><br />
Finally, ICLR has just made all the <a href="https://iclr.cc/virtual_2020/papers.html?filter=keywords">conference talks</a> freely available.</p>
<p><br />
<strong><em>The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies</em></strong></p>
<p><br />
A group of researchers proposed a reinforcement learning framework (the <a href="https://blog.einstein.ai/the-ai-economist/">AI Economist</a>) that aims to learn <em>dynamic tax policies</em> purely through simulation and data-driven solutions. Some of the improvements obtained by the AI Economist show promising results and schedules that could lead to a framework that potentially improves social outcomes and the state of economic inequality.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*erIkiJKxa6jJgJEyNAnvbA.png" alt="" /></p>
<p><br />
<strong><em>On bringing common-sense reasoning abilities to AI systems</em></strong></p>
<p><br />
One of the capabilities lacking in many of today’s AI systems is common-sense reasoning. This <a href="https://www.quantamagazine.org/common-sense-comes-to-computers-20200430/">article</a> presents a brief history of this problem and explains how researchers are beginning to make progress in this area. Many of the recent efforts include building knowledge bases to train a neural network (in particular, language models) to learn about the world faster and more efficiently. This can be seen as an effort to combine symbolic reasoning with neural networks in order to address the problems of <em>coverage</em> and model <em>brittleness</em>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*X6kfr8dyhvhjhQAPD-oh2A.png" alt="" /></p>
<p><em><a href="https://arxiv.org/abs/1906.05317">COMET</a> — Bosselut et al. (2019)</em></p>
<p><br />
<strong><em>A review of the main NLP benchmarks</em></strong></p>
<p><br />
What can NLP do better than humans, and where is there still room for improvement? In a <a href="https://creatext.ai/blog-posts/nlp-benchmarking-superglue-xtreme">recent blog post</a>, Manuel Tonneau reviews model performance on the GLUE benchmark, identifying tasks where NLP systems already excel and those where humans still have the lead. The SuperGLUE and XTREME benchmarks are also presented as an initiative to set the bar higher and further motivate research on new tasks and new languages.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*7tD3nxDuzazUFGSWNZOXWQ.png" alt="" /></p>
<p><em><a href="https://super.gluebenchmark.com/leaderboard/">SuperGLUE benchmark</a></em></p>
<p><br />
<strong><em>Triton (TensorRT) Inference Server for Transformer models</em></strong></p>
<p><br />
In this <a href="https://blog.einstein.ai/benchmarking-tensorrt-inference-server/">blog post</a>, the authors use <a href="https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/index.html">NVIDIA’s Triton (TensorRT) Inference Server</a> to host the models and experiment with different configurations in order to provide comparable results between TensorFlow- and PyTorch-served models. The report includes results on various aspects of model serving, such as latency with concurrency, throughput with concurrency, and other configurations involving batch size and sequence length. Many aspects of model serving are missing from the report, but the authors are interested in testing model versioning and different tasks such as object detection. Such guides provide best practices and benchmarking techniques that are useful for practitioners putting their models in production.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*zTIAB4-Q4TdEX32RpEC-hg.png" alt="" /></p>
<p><em>Latency and throughput for different models — <a href="https://blog.einstein.ai/benchmarking-tensorrt-inference-server/">source</a></em></p>
<p><br />
<strong><em>A visual guide to recurrent layers in Keras</em></strong></p>
<p><br />
Cet <a href="https://amitness.com/2020/04/recurrent-layers-keras/">article</a> d’Amit Chaudhary fournit une explication visuelle des couches récurrentes disponibles dans Keras et de l’effet de divers arguments sur l’entrée et la sortie. Cela vise à fournir une meilleure compréhension de la façon d’interagir avec les couches RNN de Keras lors de la préparation et du traitement des données. Un tutoriel utile pour les débutants intéressés par le langage de modélisation avec les modèles RNN.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*XvJRVz993m5TaOTilodcmg.png" alt="" /></p>
<h1 id="education-">Education 🎓</h1>
<p><strong><em>Livre d’apprentissage approfondi pour le cloud, le mobile et les périphériques</em></strong></p>
<p><br />
If you are interested in using your deep learning models in the cloud, on mobile, and on edge devices, here is a <a href="https://www.practicaldeeplearning.ai/">book</a> written by Anirudh Koul, Siddha Ganju, and Meher Kasam. The book is titled “Practical Deep Learning Book for Cloud, Mobile & Edge” and covers topics ranging from fine-tuning and deploying your computer vision models, to an introduction of more than 40 industry case studies, to using transfer learning to train models quickly.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*XE_--TTGI9fQCTij.jpg" alt="" /></p>
<p><br />
<strong><em>ML courses</em></strong></p>
<ul>
<li>Stanford has made available a set of <a href="https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU">videos</a> of the ML course taught by Andrew Ng. This course provides content that could be useful for students getting started in the world of machine learning.</li>
<li>As we put ML and NLP systems into production for real-world use, it becomes crucial to build systems that are more trustworthy and privacy-preserving. This <a href="https://cseweb.ucsd.edu/classes/sp20/cse291-b/index.html">course</a> covers topics in trustworthy machine learning.</li>
<li>Thomas Wolf recorded a <a href="https://www.youtube.com/watch?v=G5lmya6eKtc">video</a> explaining recent trends and future topics in transfer learning for NLP.</li>
</ul>
<p><br />
<strong><em>A lecture on GANs</em></strong></p>
<p><br />
Cette <a href="https://www.youtube.com/watch?v=1CT-kxjYbFU&feature=youtu.be">conférence vidéo</a> de Pieter Abbeel donne un aperçu complet des GAN qui sont utilisés aujourd’hui pour toutes sortes d’applications créatives, de la production d’images réalistes à la peinture numérique. Cette conférence fait partie du cours <a href="https://sites.google.com/view/berkeley-cs294-158-sp20/home">Deep Unsupervised Learning</a> actuellement dispensé à l’université de Berkley. Voir le plan de la conférence ci-dessous.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*1BKlAYDwyUheMMRl59TFFQ.png" alt="" /></p>
<p><br />
<strong><em>Differential calculus for deep learning</em></strong></p>
<p><br />
Aurélien Geron shares a <a href="https://colab.research.google.com/github/ageron/handson-ml2/blob/master/math_differential_calculus.ipynb#scrollTo=mnywx0pgMCLA">notebook</a> that aims to introduce the basic concepts of differential calculus such as derivatives, partial derivatives, and gradients. These topics are all important in the field of deep learning, and Geron summarizes the concepts along with implementations, including easy-to-understand visualizations to guide the learner. He also recommends looking at another <a href="https://github.com/ageron/handson-ml2/blob/master/extra_autodiff.ipynb">notebook</a> on auto-differentiation.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*NU6s4j-GE5PjaDsCsgcuNw.png" alt="" /></p>
<h1 id="mentions-spéciales-️">Mentions spéciales ⭐️</h1>
<ul>
<li>Andrej Karpathy <a href="https://www.youtube.com/watch?v=hx7BXih7zx8&feature=youtu.be">shares</a> some of the recent developments at Tesla toward full self-driving. Topics covered include the modeling of HydraNets, data engines, evaluation metrics, and how to efficiently perform inference on these large-scale neural network models.</li>
<li>Here is a <a href="https://github.com/Machine-Learning-Tokyo/Interactive_Tools">repository</a> prepared by MLT containing a list of interactive tools for machine learning, deep learning, and mathematics.</li>
<li>A recent <a href="https://arxiv.org/abs/2004.08900">paper</a> aims to provide a concise overview of the costs associated with training large NLP models and how to calculate these costs.</li>
<li>Springer has made freely available hundreds of books with titles ranging from mathematics to deep learning. This <a href="https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189">article</a> summarizes some of the machine learning books that can be downloaded for free.</li>
<li>Kra-Mania is a simple question-answering app built with <a href="https://github.com/deepset-ai/haystack">Haystack</a> using an open QA dataset built from the Seinfeld show. This <a href="https://colab.research.google.com/drive/17kZqK2i0CYzR6ZDjL6ULEtpDUOYwQAbK">tutorial</a> shows how easy it is to build QA pipelines with the library. And this <a href="https://kra-mania.firebaseapp.com/">link</a> takes you to the demo app.</li>
<li>Explainability is the process by which researchers aim to better understand deep neural networks. This <a href="https://arxiv.org/abs/2004.14545">paper</a> provides a <em>“field guide” to deep learning explainability for the uninitiated</em>.</li>
<li>Here is a short <a href="https://ai.stanford.edu/blog/data-augmentation/">survey</a> describing recent work on data augmentation, which has become a popular area of study in ML and NLP.</li>
<li>In the previous newsletter, we featured the Longformer, a variant of the Transformer that improves performance on various NLP tasks, particularly for long documents. In this <a href="https://www.youtube.com/watch?v=_8KNb5iqblE&t=463s">video</a>, Yannic Kilcher explains the novelty proposed in this work.</li>
</ul>
<hr />
<p>You can find the previous newsletter <a href="https://dair.ai/NLP_Newsletter_-10_-FR/">here</a>.</p>
<p><br />
If you have any datasets, projects, blog posts, tutorials, or papers that you wish to share in the next issue of the newsletter, you can use this <a href="https://forms.gle/3b7Q2w2bzsXE6uYo9">form</a>.</p>
<p><br />
<a href="https://dair.ai/newsletter/">Abonnez-vous</a> pour recevoir les prochains numéros dans votre boîte mail.</p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_-11_-FR/">NLP Newsletter #11 [FR]: Jukebox, HybridQA, TLDR generation, Blender: the SOTA Chatbot, TorchServe, AI Economist, WT5,...</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on May 04, 2020.</p>
https://dair.ai/posts/NLP_Newsletter_10_[TR]2020-04-30T00:00:00+00:002020-04-30T00:00:00+00:00Harun Uzhttps://dair.ai
<p><img src="https://cdn-images-1.medium.com/max/1200/1*WxbP3uKvd2GB6B-NaxtiIw.png" alt="" /></p>
<p><br />
Welcome to the 10th issue of the NLP Newsletter. We hope you are well and staying safe. In this issue, we cover topics ranging from best practices for language models to reproducibility in machine learning and privacy and security in natural language processing (NLP).</p>
<h1 id="dairai-sitesinden-güncellemeler-️">dair.ai sitesinden güncellemeler 🔬🎓⚙️</h1>
<ul>
<li>
<p>To help with efforts around the <a href="https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge">COVID-19 Open Research Dataset</a> and to extract insights from the scientific literature, we published a <a href="https://github.com/dair-ai/covid_19_search_application">notebook</a> that walks through the steps of building a simple text similarity search application using open-source tools and publicly available pretrained language models (a generic sketch of this kind of embedding-based search follows this list).</p>
</li>
<li>
<p>Last week we delivered a virtual training on “Deep Learning for Modern NLP” at the <a href="https://odsc.com/boston/">Open Data Science Conference</a>. You can find the materials <a href="https://github.com/dair-ai/odsc_2020_nlp">here</a>.</p>
</li>
<li>
<p>Last week, together with members of our community, we published two articles. One is about <a href="https://medium.com/dair-ai/unsupervised-progressive-learning-upl-a-new-problem-for-ai-9a1c68c70a28">unsupervised progressive learning</a>, a problem that involves analyzing a sequence of unlabeled data vectors (a data stream) and learning representations from them. The second <a href="https://medium.com/dair-ai/structural-scaffolds-for-citation-intent-classification-in-scientific-publications-e5acd2f0ebf9">article</a> summarizes an approach for citation intent classification using ELMo.</p>
</li>
<li>
<p>We recently published a <a href="https://colab.research.google.com/drive/1nwCE6b9PXIKhv2hvbqf1oZKIGkXMTi1X">notebook</a> to help you fine-tune pretrained language models for the task of emotion classification.</p>
</li>
</ul>
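<p><br />
As referenced in the first item above, here is a generic sketch of embedding-based text similarity search (not the notebook’s exact code): encode documents and a query with a pretrained sentence encoder, then rank by cosine similarity. The encoder model id is an assumption.</p>
<pre><code># A generic sketch of embedding-based similarity search; the encoder
# model id is an assumption, not the notebook's exact setup.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["Risk factors for COVID-19 patients",
          "Transmission dynamics of the virus",
          "Deep learning for protein folding"]
query = "What are the risk factors of the disease?"

corpus_emb = model.encode(corpus)        # shape: (len(corpus), dim)
query_emb = model.encode([query])[0]     # shape: (dim,)

# Cosine similarity between the query and every document.
scores = corpus_emb @ query_emb / (
    np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(query_emb))
print(corpus[int(np.argmax(scores))])    # best-matching document
</code></pre>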
<h1 id="araştırma-and-yayınlar-">Araştırma and Yayınlar 📙</h1>
<p><strong><em>XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization</em></strong></p>
<p><br />
Earlier this week, researchers at Google AI and DeepMind published an interesting benchmark called <a href="https://arxiv.org/abs/2003.11080">XTREME</a>, which aims to evaluate the cross-lingual generalization capabilities of language models that learn multilingual representations. The benchmark tests on 40 languages and 9 different tasks that require reasoning about different levels of meaning, either syntactic or semantic. The paper also provides baseline results using state-of-the-art models for multilingual representations such as mBERT, XLM, and MMTE.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/0*kk7J1fCht_VZR_su.png" alt="" /></p>
<p><em>Source:</em> <a href="https://ai.googleblog.com/2020/04/xtreme-massively-multilingual-multi.html"><em>Google AI Blog</em></a></p>
<p><br />
<strong><em>Evaluating Machines by their Real-World Language Use</em></strong></p>
<p><br />
Language models have been shown to perform relatively well on a variety of tasks such as question answering (QA) and sequence labeling. However, a new <a href="https://arxiv.org/abs/2004.03607">paper</a> proposes a framework and benchmark to better evaluate whether language models can succeed at real-world language use, which involves more complex settings (e.g., generating helpful advice for current situations). Empirical results demonstrate that state-of-the-art models such as T5 generate advice that is as helpful as human-written advice in only 9% of cases. These results point to shortcomings in language models’ ability to understand and model real-world knowledge and common-sense reasoning.</p>
<p><br />
<strong><em>Give your Text Representation Models some Love: the Case for Basque</em></strong></p>
<p><br />
Can training monolingual models (FastText word embeddings and BERT) on large language-specific datasets produce better results than pre-trained multilingual versions? In a recent <a href="https://arxiv.org/abs/2004.00033">paper</a>, researchers examine the effect and performance of such models using larger Basque corpora. The results show that these models do indeed produce better results on downstream tasks such as topic classification, sentiment classification, and PoS tagging for Basque. It could be interesting to test whether this holds for other languages and whether any interesting results or new challenges arise.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*rN7mNPz0os7kd8rBboliIg.png" alt="" /></p>
<p><em>Figure source:</em> <a href="https://arxiv.org/abs/2004.00033"><em>Agerri et al. (2020)</em></a></p>
<p><br />
<strong><em>Advancing Self-Supervised and Semi-Supervised Learning with SimCLR</em></strong></p>
<p><br />
In a previous <a href="https://medium.com/dair-ai/nlp-newsletter-the-annotated-gpt-2-understanding-self-distillation-haiku-ganilla-sparkwiki-b0f47f595c82">issue</a> of this newsletter, we featured SimCLR, a framework developed by Google AI for <em>contrastive self-supervised learning</em> of visual representations, which improves image classification results in different settings such as transfer learning and semi-supervised learning. Learning visual representations from unlabeled data is a new approach to self-supervised and semi-supervised learning. The <a href="https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html">results</a> show that it achieves state-of-the-art results on ImageNet (relying on only 1% of the labeled data), suggesting that the method can also be beneficial in low-resource settings.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*kGiv7LFJW1g_R6m2XblSSA.png" alt="" /></p>
<p><em>Source:</em> <a href="https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html"><em>Google AI Blog</em></a></p>
<p><br />
It is worth mentioning that self-supervised learning is one of the most important topics in the field. If you want to learn more, you can check out the following:</p>
<ul>
<li><a href="https://www.nytimes.com/2020/04/08/technology/ai-computers-learning-supervised-unsupervised.html">Computers Already Learn From Us. But Can They Teach Themselves?</a></li>
<li><a href="https://amitness.com/2020/02/illustrated-self-supervised-learning/">The Illustrated Self-Supervised Learning</a></li>
<li><a href="https://www.fast.ai/2020/01/13/self_supervised/">Self-supervised learning and computer vision</a></li>
</ul>
<p><br />
<strong><em>Byte Pair Encoding is Suboptimal for Language Model Pretraining</em></strong></p>
<p><br />
Kaj Bostrom and Greg Durrett published a <a href="https://arxiv.org/pdf/2004.03720.pdf">paper</a> in which they evaluate whether the Byte Pair Encoding (BPE) tokenization algorithm, commonly used for pretraining language models, is actually the optimal choice. In other words, they propose a direct evaluation of the impact of tokenization on the performance of language models, which, according to the authors, is rarely examined in the literature. To do this, they pretrain language models from scratch in controlled experiments, applying different tokenization algorithms, namely unigram and BPE. The resulting pretrained language models are then tested on several downstream tasks. The results show that unigram tokenization matches or outperforms the more commonly used BPE.</p>
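<p>To make the comparison concrete, here is a small sketch (assuming the Hugging Face <code>tokenizers</code> library, which is not necessarily what the paper itself uses) that trains a BPE and a unigram tokenizer on the same toy corpus and compares their segmentations:</p>
<pre><code>from tokenizers import Tokenizer
from tokenizers.models import BPE, Unigram
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer, UnigramTrainer

# Toy corpus; real experiments would use a large pretraining corpus.
corpus = ["the tokenization of words matters for pretraining"] * 100

def train(model, trainer):
    tokenizer = Tokenizer(model)
    tokenizer.pre_tokenizer = Whitespace()
    tokenizer.train_from_iterator(corpus, trainer)
    return tokenizer

bpe = train(BPE(unk_token="[UNK]"), BpeTrainer(vocab_size=60, special_tokens=["[UNK]"]))
uni = train(Unigram(), UnigramTrainer(vocab_size=60, special_tokens=["[UNK]"]))

# The two algorithms can segment the same word quite differently.
print(bpe.encode("pretraining").tokens)
print(uni.encode("pretraining").tokens)
</code></pre>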
<p><br />
<strong><em>Longformer: The Long-Document Transformer</em></strong></p>
<p><br />
Researchers at Allen AI published a new Transformer-based model called <a href="https://arxiv.org/abs/2004.05150">Longformer</a> that aims to perform more efficiently on long texts. It is a well-known limitation of Transformer-based models that they are computationally expensive, because the self-attention operation scales quadratically with sequence length, which limits the ability to process longer sequences. Recently there have been many efforts, such as <a href="https://arxiv.org/abs/2001.04451">Reformer</a> and <a href="https://arxiv.org/abs/1904.10509">Sparse Transformers</a>, to make Transformers applicable to long documents. Longformer combines character-level modeling with self-attention (a mix of local and global attention) to consume less memory and to be effective at modeling long documents. The authors also show that their pretrained model outperforms other methods when applied to document-level downstream tasks, including question answering (QA) and text classification.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*uTxVqLtO_nQaDw4OedUUtQ.png" alt="" /></p>
<p><em>Figure source:</em> <a href="https://arxiv.org/abs/2004.05150"><em>Beltagy et al. (2020)</em></a></p>
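<p>The efficiency gain is easy to picture: instead of letting every token attend to every other token, each token attends only to a fixed-size window around itself. A rough NumPy sketch of such a sliding-window attention mask (not the authors’ implementation) looks like this:</p>
<pre><code>import numpy as np

def sliding_window_mask(seq_len, window):
    # mask[i, j] is True when token i is allowed to attend to token j.
    positions = np.arange(seq_len)
    return np.abs(positions[:, None] - positions[None, :]) <= window // 2

# Each row has at most window + 1 allowed positions, so the cost grows
# linearly with sequence length instead of quadratically.
print(sliding_window_mask(seq_len=8, window=4).astype(int))
</code></pre>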
<h1 id="yaratıcılık-etik-ve-toplum-">Yaratıcılık, Etik ve Toplum 🌎</h1>
<p><strong><em>Makine Öğrenmesinde Tekrarlanabilirlik</em></strong></p>
<ul>
<li>
<p>Reproducibility has been an ongoing topic of discussion in the machine learning community. To encourage more open, transparent, and accessible science, there have been many efforts around reproducibility. If you want to understand where the field of machine learning stands in terms of reproducibility, check out this <a href="https://arxiv.org/abs/2003.12206">publication</a> by Joelle Pineau and others.</p>
</li>
<li>
<p>More recently, and inspired by these efforts, the Papers with Code team (now part of Facebook AI) published a <a href="https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501">blog post</a> describing a useful <a href="https://github.com/paperswithcode/releasing-research-code">reproducibility checklist</a> to <em>facilitate reproducible research presented at major machine learning conferences</em>. The checklist assesses code submissions on the following:</p>
</li>
</ul>
<p><img src="https://cdn-images-1.medium.com/max/800/1*BQH6F1J3TE1T_GREv5xSew.png" alt="" /></p>
<p><em>Source:</em> <a href="https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501"><em>Papers with Code</em></a></p>
<ul>
<li>On the topic of open science and reproducibility, here is an interesting <a href="https://twitter.com/srush_nlp/status/1245825437240102913?s=20">post</a> by an NLP researcher offering a bounty to anyone who can reproduce the results of a paper that another researcher was unable to reproduce.</li>
</ul>
<p><br />
<strong><em>Privacy and Security in NLP</em></strong></p>
<p><br />
Can a pretrained language model be stolen, and does exposing it through APIs have any security implications? In a new paper, researchers set out to test BERT-based APIs for security, particularly with regard to using queries to steal the model. In short, they found that an adversary can steal a fine-tuned model by feeding it gibberish sequences and fine-tuning their own model on the victim model’s predicted labels. You can read more about model extraction attacks <a href="http://www.cleverhans.io/2020/04/06/stealing-bert.html">here</a>.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*K9ZD4USdovdyHXomB7csfA.png" alt="" /></p>
<p>The model extraction pipeline applied to a victim model trained on SQuAD (<a href="http://www.cleverhans.io/2020/04/06/stealing-bert.html">Source</a>).</p>
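<p>The attack described above is conceptually simple. The illustrative sketch below (with a toy stand-in for the victim API, not the authors’ code) shows the extraction loop: query the victim with gibberish and keep its predicted labels as training data for a copy of the model:</p>
<pre><code>import random

VOCAB = ["apple", "quantum", "river", "blue", "run", "seven"]

def victim_api(text):
    # Toy stand-in for a deployed fine-tuned model served over an API.
    return "positive" if "blue" in text else "negative"

def gibberish(length=8):
    # Nonsensical queries sampled from a word list.
    return " ".join(random.choices(VOCAB, k=length))

# The attacker labels random queries with the victim's predictions and can
# then fine-tune their own model on these (query, label) pairs.
extraction_set = [(q, victim_api(q)) for q in (gibberish() for _ in range(1000))]
print(extraction_set[:3])
</code></pre>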
<p><br />
Another interesting <a href="https://arxiv.org/abs/2004.06660">paper</a>, accepted at ACL 2020, investigates whether pretrained language models are vulnerable to attacks. The authors develop a <em>poisoning</em> method that can inject vulnerabilities into pretrained weights, exposing these models to serious threats. These vulnerabilities make it possible for an attacker to expose backdoors in the models and manipulate their predictions by injecting arbitrary keywords. To test this, the pretrained models were used on downstream tasks whose datasets were injected with specific keywords meant to force the model to misclassify instances.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*s4QscGOeDiN6tHOfM99pww.png" alt="" /></p>
<p><em>Figure source:</em> <a href="https://arxiv.org/abs/2004.06660"><em>Kurita et al. (2020)</em></a></p>
<p><br />
<strong><em>A Series of AI-Based COVID-19 Applications and Research</em></strong></p>
<ul>
<li>
<p>COVID-19 has proven to be one of the greatest challenges of modern times. Researchers from all over the world are trying to find ways to help understand COVID-19 and contribute to its study, from search engines to dataset releases. Sebastian Ruder recently published a new <a href="http://newsletter.ruder.io/issues/covid-19-edition-236509">issue</a> of his newsletter highlighting a few interesting projects that AI researchers are working on.</p>
</li>
<li>
<p>On the topic of COVID-19, researchers at Allen AI will discuss the popular COVID-19 Open Research Dataset (CORD-19) at a <a href="https://www.meetup.com/NY-NLP/events/269849442">virtual meetup</a> toward the end of this month.</p>
</li>
<li>
<p>The CORD-19 dataset is being used by many researchers to build NLP-powered applications such as search engines. For an example, see this <a href="https://openreview.net/forum?id=PlUA_mgGaPq">paper</a>, which implements a search engine that can help researchers retrieve CORD-19-related results reported in scientific articles. According to the authors, such tools can help inform evidence-based decision making.</p>
</li>
<li>
<p>ArCOV-19 is an Arabic COVID-19 Twitter dataset covering the period from January 27 to March 31, 2020 (and still ongoing). It is the first publicly available Arabic Twitter dataset covering the COVID-19 pandemic, and it includes around 748k popular tweets (per Twitter’s search criteria) along with the most popular subsets of those tweets. The subsets include both retweets and conversation threads (replies to tweets). <a href="https://gitlab.com/bigirqu/ArCOV-19">ArCOV-19</a> is designed to enable research in a number of areas such as natural language processing, data science, and social computing, among others.</p>
</li>
</ul>
<h1 id="araçlar-ve-veri-setleri-️">Araçlar ve Veri Setleri ⚙️</h1>
<p><strong><em>Python’da Makine Öğrenmesi: Veri Bilimi, Makine Öğrenmesi ve Yapay Zeka’da Ana Geliştirme ve Teknoloji Akımları</em></strong>
Bir araç ya da veri seti değil; ancak Sebastian Raschka, Joshua Patterson ve Corey Nolet’in bu harika <a href="https://www.mdpi.com/2078-2489/11/4/193">bildirisi</a>, özellikle Python programlama dili üzerine odaklanarak makine öğrenmesinde yeni teknoloji akımlarının geliştirilmesi üzerine kapsamlı bir genel bakış sağlamaktadır.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*OUpM4KS2uvT7zWlMYqy8RQ.png" alt="" /></p>
<p><em>Image source:</em> <a href="https://www.mdpi.com/2078-2489/11/4/193"><em>Raschka et al. (2020)</em></a></p>
<p><br />
<strong><em>Interpretability and Explainability in Machine Learning</em></strong></p>
<p><br />
The HuggingFace team released a visualization tool called exBERT that allows you to visualize representations learned by language models such as BERT and RoBERTa. This feature has been integrated into their <a href="https://huggingface.co/models?filter=exbert">model pages</a> and aims at a better understanding of how language models learn and what properties these learned representations encode.</p>
<p><br />
OpenAI recently released a web application called <a href="https://microscope.openai.com/models">Microscope</a>, which contains a collection of visualizations obtained from significant layers and neurons of various vision models that are often studied in the context of interpretability. The main goal is to make it easier to analyze and share interesting insights drawn from the features learned in neural networks.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*4VdcqSSyzWDMvVDPEuKzIQ.png" alt="" /></p>
<p><br />
<strong><em>CloudCV: ViLBERT Multi-Task Demo</em></strong></p>
<p><br />
In a previous issue of <a href="https://dair.ai/NLP_Research_Highlights_-_Issue_-1/">NLP Research Highlights</a>, we featured multi-task ViLBERT, a technique for improving vision-and-language models that can be used for caption-based image retrieval and visual question answering (VQA). The authors now provide a <a href="https://vilbert.cloudcv.org/">web application</a> where the models can be tested on 18 different vision and language tasks such as VQA and pointing question answering.</p>
<p><br />
<strong><em>A Twitter dataset of 150+ million tweets related to COVID-19 for open research</em></strong></p>
<p><br />
Given the level of interest in the COVID-19 global pandemic, researchers are releasing a <a href="https://zenodo.org/record/3738018">dataset</a> of tweets about ordinary conversations around COVID-19 collected from Twitter. Since the first release, additional data from new collaborators has been added, allowing the resource to grow to its current size. Dedicated data collection started on March 11 and yields more than 4 million tweets a day.</p>
<p><br />
<strong><em>A tiny autograd engine</em></strong></p>
<p><br />
Andrej Karpathy recently released a library called <a href="https://github.com/karpathy/micrograd">micrograd</a>, which provides the ability to build and train a neural network using a simple and intuitive interface. In fact, he wrote the whole library in roughly 150 lines of code, which he claims makes it the tiniest autograd engine out there. Ideally, such libraries can be used for educational purposes.</p>
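<p>For a taste of the interface (following the usage shown in the repository’s README), a scalar <code>Value</code> tracks its gradient and <code>backward()</code> runs reverse-mode autodiff:</p>
<pre><code>from micrograd.engine import Value

a = Value(2.0)
b = Value(3.0)
c = (a * b + a).relu()  # builds a tiny computation graph

c.backward()            # backpropagate through the graph
print(a.grad, b.grad)   # dc/da = b + 1 = 4.0, dc/db = a = 2.0
</code></pre>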
<h1 id="makaleler-ve-blog-gönderileri-️">Makaleler ve Blog gönderileri ✍️</h1>
<p><strong><em>Transformatör Ailesi ve En Son Gelişmeler</em></strong></p>
<p><br />
In a new and timely blog post, Lilian Weng summarizes some of the most recent developments of Transformer models. The <a href="https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html">article</a> offers nice notation, a historical review, and the latest improvements, such as longer attention spans (Transformer-XL) and reduced computation and memory consumption.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*i-4V-EIirg2cvGMVLd8BWA.png" alt="" /></p>
<p><br />
Model compression is an important area of research in NLP due to the nature and large size of pretrained language models. Ideally, as these models continue to produce state-of-the-art results across a wide variety of NLP tasks, it becomes important to reduce their computational cost so as to make them feasible in production. Madison May recently published another excellent <a href="https://www.pragmatic.ml/a-survey-of-methods-for-model-compression-in-nlp/">article</a> summarizing a few techniques used for model compression, with a focus on NLP.</p>
<h1 id="eğitim-">Eğitim 🎓</h1>
<p><strong><em>Alec Radford’dan Dil Modelleri üzerine Konuk Dersi</em></strong></p>
<p><br />
If you are curious about the theoretical aspects of the techniques used in language models such as CBOW, Word2Vec, ELMo, GPT, BERT, ELECTRA, and T5, you might be interested in this great <a href="https://www.youtube.com/watch?v=BnpB3GrpsfM">guest lecture</a> by Alec Radford (a researcher at OpenAI). It was delivered as part of Pieter Abbeel’s ongoing <a href="https://sites.google.com/view/berkeley-cs294-158-sp20/home">course</a> on deep unsupervised learning techniques.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*GUxoCXqhozkp_aaRxpT3Sg.png" alt="" /></p>
<p><br />
<strong><em>Python NumPy Tutorial (with Jupyter and Colab)</em></strong></p>
<p><br />
Stanford’s popular online course Convolutional Neural Networks for Visual Recognition now includes a link to a Google Colab notebook for its <a href="">introductory guide</a> to NumPy. It is a fairly extensive tutorial, but it is very good for beginners.</p>
<p><br />
<strong><em>New mobile neural network architectures</em></strong></p>
<p><br />
If you are interested in building neural networks for mobile and lightweight devices, this comprehensive <a href="https://machinethink.net/blog/mobile-architectures/">blog post</a> may be for you. The article covers a range of neural network designs and includes speed benchmarks.</p>
<p><br />
<strong><em>Data-Driven Sentence Simplification: Survey and Benchmark</em></strong></p>
<p><br />
Sentence simplification aims to modify a sentence to make it easier to read and understand. This <a href="https://www.mitpressjournals.org/doi/full/10.1162/coli_a_00370">survey paper</a> focuses on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm these days. It also includes a benchmark of the different approaches on different datasets, comparing them and highlighting their strengths and limitations.</p>
<p><br />
<strong><em>Advanced Topics in Machine Learning</em></strong></p>
<p><br />
Yisong Yue published all the lecture videos for his <a href="https://sites.google.com/view/cs-159-spring-2020/lectures?authuser=0">Data-Driven Algorithm Design</a> course. It covers advanced topics in machine learning such as Bayesian optimization, differentiable computation, and imitation learning.</p>
<p><br />
<img src="https://cdn-images-1.medium.com/max/800/1*8YFTbEPUw3Bqio70xP0WXQ.png" alt="" /></p>
<h1 id="bunlara-da-göz-atın-️">Bunlara da göz atın ⭐️</h1>
<p>Önceki NLP Haber Bülteni sayılarına <a href="https://github.com/dair-ai/nlp_newsletter">buradan</a> erişebilirsiniz.</p>
<p><br />
Harvard is <a href="https://online-learning.harvard.edu/catalog?keywords=&paid%5B1%5D=1&max_price=&start_date_range%5Bmin%5D%5Bdate%5D=&start_date_range%5Bmax%5D%5Bdate%5D=">offering</a> a number of its self-paced online courses for free.</p>
<p><br />
<a href="https://github.com/zaidalyafeai/ARBML">ARBML</a>, web, komut satırı ve notebook gibi birçok arayüz ile gerçek zamanlı deneyim şansı sunarak Arapça NLP ve makine öğrenmesi projelerinin implementasyonlarını sağlamaktadır.</p>
<p><br />
<a href="https://nlpdashboard.com">NLP Kontol Paneli(NLP Dashboard)</a>, haber hikayelerinin ve metinlerin istatistiksel analizi ve varlık ismi tanıma (İng. named entity recognition) işlemlerinin yapılabileceği eğlenceli bir web uygulamasıdır. spaCy, Flask ve Python ile geliştirilmiştir.</p>
<p><br />
In case you haven’t checked it out, Connor Shorten maintains this informative <a href="https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw?sub_confirmation=1">YouTube channel</a> where he summarizes interesting and recent machine learning papers. He provides short, concise, and excellent summaries of each work while also covering the important details. He has also started a <a href="https://www.youtube.com/channel/UCMLtBahI5DMrt0NPvDSoIRQ">podcast</a> with other researchers in the field.</p>
<p><br />
<a href="https://github.com/microsoft/nlp-recipes">İşte</a> metin sınıflandırma, metinsel doğrulama(İng. textual entailment), metin özetleme ve soru cevaplama gibi NLP işlemlerinin en iyi yöntemlerini ve önerileri (notebook’lar ve açıklamalar ile birlikte) sağlayan etkileyici, zengin bir repo.</p>
<hr />
<p>If you have a recent dataset, project, blog post, tutorial, or paper that you would like to share in the next issue of the NLP Newsletter, please submit it directly using this <a href="https://forms.gle/3b7Q2w2bzsXE6uYo9">form</a>.</p>
<p><br />
<em>🔖 To receive future issues of the NLP Newsletter in your inbox,</em> <a href="https://dair.ai/newsletter/"><em>subscribe here</em></a>.</p>
<p><a href="https://dair.ai/posts/NLP_Newsletter_10_-TR/">NLP Haber Bülteni #10 [TR]: Makine Öğrenmesinde Tekrarlanabilirlik, NLP'de Mahremiyet ve Güvenlik, XTREME, Longformer, VilBERT, exBERT,…</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on April 30, 2020.</p>
https://dair.ai/posts/attention-is-all-you-need2020-04-29T00:00:00+00:002020-04-29T00:00:00+00:00Constanza Fierrohttps://dair.aiconstanza.fierro94@gmail.com
<script type="text/javascript" async="" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<blockquote>
<p>Paper summary: Attention is all you need, Dec. 2017. (<a href="https://arxiv.org/abs/1706.03762">link</a>)</p>
</blockquote>
<h2 id="why-is-it-important">Why is it important?</h2>
<p>This is the paper that first introduced the transformer architecture, which allowed language models to grow far bigger than before thanks to being easily parallelizable. Consequently, models such as BERT and GPT achieved far better results on diverse NLP tasks.</p>
<h2 id="what-does-it-propose">What does it propose?</h2>
<p>This work proposed a network architecture for neural machine translation (NMT). The new model is based entirely on the attention mechanism, contrary to the then-standard approach of using recurrent networks with attention. The architecture was tested on two NMT tasks and outperformed the best existing models, in addition to using fewer resources. Furthermore, the model was also successfully tested on a different task (English constituency parsing).</p>
<p><br />
The inherently sequential nature of RNNs precludes parallelization within training examples. Moreover, the best RNN architectures don’t rely solely on one or a couple of hidden states; they use attention to attend to the most relevant hidden states. That’s why the architecture presented in this paper is so relevant and impactful: it achieves better results while getting rid of the sequentiality of RNNs.</p>
<h2 id="how-does-it-work">How does it work?</h2>
<p>The Transformer is an encoder-decoder architecture; the encoder corresponds to the left side of the image below (Figure 1) and the decoder to the right. In this paper the authors introduced the multi-head self-attention layer and the positional encodings used in the architecture (details in the next two sections).</p>
<p><br />
Essentially, token embeddings are summed with their positional encodings and used as inputs to the encoder and decoder. The encoder is composed of a stack of N=6 layers; we can see one such layer in Figure 1. The decoder is also composed of a stack of N=6 identical layers, each of which has the two sub-layers of the encoder plus a third sub-layer that performs multi-head attention over the output of the encoder stack. Each feed-forward and multi-head self-attention layer is followed by a residual connection and a layer normalization, thus the output of each sub-layer is \(\text{LayerNorm}(x+\text{SubLayer}(x))\).</p>
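<p>As a quick illustration, the sub-layer pattern above fits in a few lines of PyTorch (a sketch for clarity, not the reference implementation; the dropout corresponds to the regularization details listed below):</p>
<pre><code>import torch.nn as nn

class SublayerConnection(nn.Module):
    """Computes LayerNorm(x + Dropout(SubLayer(x)))."""

    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Residual connection around the sub-layer, then layer normalization.
        return self.norm(x + self.dropout(sublayer(x)))
</code></pre>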
<p><br />
Some extra details:</p>
<ul>
<li>Byte pair encoding is used to define the tokens of the text.</li>
<li>The feed-forward network consists of two linear transformations with a ReLU in between.</li>
<li>Regularizations:
<ul>
<li>Dropout on the output of each sub-layer (before it’s added and normalized).</li>
<li>Label smoothing.</li>
</ul>
</li>
<li>Beam search is used to generate the text.</li>
</ul>
<p>(For further explanation of these concepts, you can check my <a href="https://cfierro94.github.io/nlp-deep-dive/attention-is-all-you-need">deep dive of this paper</a>.)</p>
<figure>
<img src="../../images/summary-attention-is-all-you-need/architecture.png" alt="Figure 1. The Transformer architecture." style="width:60%; display: block;margin-left: auto;margin-right: auto;" />
<figcaption style="text-align: center;">Figure 1. The Transformer architecture.</figcaption>
</figure>
<h3 id="positional-encoding">Positional Encoding</h3>
<p><strong>Motivation</strong>: Since there’s no recurrence, a positional encoding vector is added to the token embedding to inject information about its position in the text.</p>
<p><br />
The \(PE(w_t)\), the positional encoding for the word \(w\) at position \(t\), is a vector of dimension \(d_{model}\) equal to the embedding dimension. We compute each dimension \(i\) of this vector as follows:</p>
\[PE_i(w_t) = \left\{\begin{array}{ll} sin(k_j*t) & \text{if} \quad i=2j \\ cos(k_j* t) & \text{if} \quad i=2j+1 \\\end{array} \right. \\[20pt]\text{where,} \quad k_j = \frac{1}{10000^{2i/d_{model}}}\]
<p>Which give as,</p>
\[PE(w_t) = \begin{bmatrix}sin(k_0t)\\cos(k_0t)\\... \\sin(k_{d/2}t)\\cos(k_{d/2}t)\end{bmatrix}\]
<p>We can think of this as a bit representation of numbers, with each dimension of the vector acting as a bit: each bit changes with its own period, and we can tell that one number is bigger than another from which bits are active and in what order. More on this intuition in <a href="https://kazemnejad.com/blog/transformer_architecture_positional_encoding/">this blog post</a>.</p>
<p><br />
The authors showed that learned positional embeddings performed just as well as these on their test sets, but they chose the sinusoidal version because it may extrapolate better to sequences longer than those seen in training.</p>
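<p>A small NumPy sketch of the sinusoidal encoding defined above (vectorized over all positions and dimensions):</p>
<pre><code>import numpy as np

def positional_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, None]               # t = 0 .. max_len-1
    dims = np.arange(0, d_model, 2)[None, :]              # even indices i = 2j
    angles = positions / np.power(10000, dims / d_model)  # k_j * t
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                          # even dimensions
    pe[:, 1::2] = np.cos(angles)                          # odd dimensions
    return pe

print(positional_encoding(max_len=4, d_model=8).round(3))
</code></pre>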
<h3 id="attention-layer">Attention Layer</h3>
<p>Instead of having just one attention function, the authors found it beneficial to linearly project the queries, keys, and values multiple times and run the attention mechanisms in parallel (Figure 2). Each of the attention mechanisms is a scaled dot-product attention (explained below).</p>
<p><br />
The intuition behind is that having just one attention will lead to average all the different aspects of the text, whereas when we do parallel attention we are able to look at each of these details separately (the subject, the intention, the action, etc).</p>
<figure>
<img src="../../images/summary-attention-is-all-you-need/attentions.png" alt="Figure 2. Attention mechanisms of the Transformer." style="width:70%; display: block;margin-left: auto;margin-right: auto;" />
<figcaption style="text-align: center;">Figure 2. Attention mechanisms of the Transformer.</figcaption>
</figure>
<h4 id="scaled-dot-product-attention">Scaled Dot-Product Attention</h4>
<p>Attention is a function that takes a query and a set of key-value pairs as inputs, and computes a weighted sum of the values, where the weights are obtained from a compatibility function between the query and the corresponding key.</p>
<p><br />
The specific attention used here, is called <em>scaled dot-product</em> because the compatibility function used is:</p>
\[\text{weight}(q,k_i) =\text{softmax}\left(\frac{q\cdot k_i}{\sqrt{d_k}}\right) \quad \text{where} \; q,k_i \in \mathbb{R}^{d_k}\]
<p>The authors decided to use dot-product attention over additive attention because it is faster to compute and more space-efficient. However, dot-product attention has been shown to perform worse for larger input dimensions (\(d_k\)), so they added the scaling factor \(1/\sqrt{d_k}\) to counteract the dot products growing too large in magnitude (which they suspect is the problem).</p>
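<p>In code, the whole mechanism fits in a few lines (a minimal NumPy sketch, without batching or multiple heads):</p>
<pre><code>import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # scaled compatibility scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # effectively -inf before softmax
    return softmax(scores) @ V                 # weighted sum of the values

Q = K = V = np.random.randn(5, 64)             # 5 tokens, d_k = 64
output = attention(Q, K, V)                    # shape (5, 64)
</code></pre>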
<h4 id="masked-multi-head-self-attention">Masked Multi-Head Self-Attention</h4>
<p>During training we don’t want to show the complete output sentence to the model; instead we present the words one by one so that no extra information flows into the decoder. That’s why Figure 2 shows a “Mask opt.” step, which refers to setting those score vectors to -inf, making them 0 after the softmax. Figure 3 helps show how this affects the overall architecture.</p>
<figure>
<img src="../../images/summary-attention-is-all-you-need/mask-transformer.png" alt="Figure 3. The Transformer architecture masking the output." style="width:70%; display: block;margin-left: auto;margin-right: auto;" />
<figcaption style="text-align: center;">Figure 3. The Transformer architecture masking the output.</figcaption>
</figure>
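<p>Reusing the <code>attention</code> sketch from the previous section, the mask is just a lower-triangular boolean matrix, so that each position can only attend to itself and earlier positions:</p>
<pre><code>import numpy as np

causal_mask = np.tril(np.ones((5, 5), dtype=bool))  # True on and below the diagonal
masked_output = attention(Q, K, V, mask=causal_mask)
</code></pre>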
<h2 id="whats-next">What’s next!</h2>
<p>BERT and GPT are just some of the applications the Transformer can have; it has also been applied to <a href="http://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.html">images</a>, <a href="https://arxiv.org/abs/1710.10903">graph networks</a>, and <a href="https://arxiv.org/abs/1805.08318">GANs</a>, among others, achieving state-of-the-art results. It has also been useful for interpreting parts of the models that use it (<a href="https://arxiv.org/abs/1910.05276">https://arxiv.org/abs/1910.05276</a>).</p>
<p><a href="https://dair.ai/posts/attention-is-all-you-need/">Attention is all you need, the transformer architecture</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on April 29, 2020.</p>
https://dair.ai/posts/NLP_Year_in_Review — 20192020-04-25T00:00:00+00:002020-04-25T00:00:00+00:00Elvis Saraviahttps://dair.aiellfae@gmail.com
<p><img src="https://miro.medium.com/max/1842/1*T08rCNctBW5zUvol0gfIig.png" alt="" /></p>
<p><br />
2019 was an impressive year for the field of natural language processing (NLP). In this blog post, I want to highlight some of the most important stories related to machine learning and NLP that I came across in 2019. I will mostly focus on NLP but I will also highlight a few interesting stories related to AI in general. The headlines are in no particular order. Stories may include publications, engineering efforts, yearly reports, the release of educational resources, etc.</p>
<p><br />
<em>Warning! This is a very long article so before you get started I would suggest bookmarking the article if you wish to read it in parts. I have also published the PDF version of this article which you can find at the end of the post.</em></p>
<h3 id="table-of-content"><strong>Table of Content</strong></h3>
<ul>
<li>Publications</li>
<li>ML/NLP Creativity and Society</li>
<li>ML/NLP Tools and Datasets</li>
<li>Articles and Blog Posts</li>
<li>Ethics in AI</li>
<li>ML/NLP Education</li>
</ul>
<h3 id="publications-"><strong>Publications 📙</strong></h3>
<p>Google AI introduces <a href="https://ai.googleblog.com/2019/12/albert-lite-bert-for-self-supervised.html">ALBERT</a>, which is a lite version of <a href="https://arxiv.org/abs/1810.04805">BERT</a> for self-supervised learning of contextualized language representations. The main improvements are reducing redundancy and allocating the model’s capacity more efficiently. The method advances state-of-the-art performance on 12 NLP tasks.</p>
<p><br />
Earlier this year, researchers at NVIDIA published a <a href="https://arxiv.org/pdf/1812.04948.pdf">popular paper</a> (coined StyleGAN) which proposed an alternative generator architecture for GANs, adopted from <a href="https://en.wikipedia.org/wiki/Neural_Style_Transfer">style transfer</a>. Here is a <a href="https://arxiv.org/pdf/1912.04958v1.pdf">follow-up work</a> that focuses on improvements such as redesigning the generator normalization process.</p>
<p><br />
<img src="https://miro.medium.com/max/1058/1*Oc0EmkL9Scp5WZ4m5bq-cA.png" alt="" />
<em>The top row shows target images and the bottom row shows synthesized images — <a href="https://arxiv.org/pdf/1912.04958v1.pdf">source</a></em></p>
<p><br />
One of my favorite papers this year was <a href="https://code2seq.org/">code2seq</a>, which is a method for generating natural language sequences from the structured representation of code. Such research can pave the way for applications such as automated code summarization and documentation.</p>
<p><br />
Ever wondered if it’s possible to train a biomedical language model for biomedical text mining? The answer is <a href="https://arxiv.org/abs/1901.08746">BioBERT</a> which is a contextualized approach for extracting important information from biomedical literature.</p>
<p><br />
After the release of BERT, Facebook researchers published <a href="https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/">RoBERTa</a> which introduced new methods for optimization to improve upon BERT and produced state-of-the-art results on a wide variety of NLP benchmarks.</p>
<p><br />
Researchers from Facebook AI also recently published a <a href="https://ai.facebook.com/blog/making-transformer-networks-simpler-and-more-efficient/">method</a> based on an all-attention layer for improving the efficiency of a Transformer language model. More work from this research group includes a <a href="https://ai.facebook.com/blog/-teaching-ai-to-plan-using-language-in-a-new-open-source-strategy-game/">method</a> to teach AI systems how to plan using natural language.</p>
<p><br />
<img src="https://miro.medium.com/max/968/1*0ZYSozImqnmFHxKmREpzmw.png" alt="" /></p>
<p><em>Explainability continues to be an important topic in machine learning and NLP. This <a href="https://arxiv.org/abs/1910.10045">paper</a> provides a comprehensive overview of works addressing explainability, taxonomies, and opportunities for future research.</em></p>
<p><br />
Sebastian Ruder published his <a href="https://ruder.io/thesis/">thesis</a> on Neural Transfer Learning for Natural Language Processing.</p>
<p><br />
A group of researchers developed a <a href="https://arxiv.org/abs/1910.04980">method</a> to perform emotion recognition in the context of conversation which could pave the way to affective dialogue generation. Another related work involves a GNN approach called <a href="https://www.aclweb.org/anthology/D19-1015.pdf">DialogueGCN</a> to detect emotions in conversations. This research paper also provides <a href="https://github.com/SenticNet/conv-emotion/tree/master/DialogueGCN">code implementation</a>.</p>
<p><br />
The Google AI Quantum team published a <a href="https://www.nature.com/articles/s41586-019-1666-5">paper</a> in Nature where they claim to have developed a quantum computer that is faster than the world’s largest supercomputer. Read more about their experiments <a href="https://ai.googleblog.com/2019/10/quantum-supremacy-using-programmable.html">here</a>.</p>
<p><br />
As mentioned earlier, one of the areas of neural network architectures that require a lot of improvement is explainability. This <a href="https://arxiv.org/abs/1908.04626">paper</a> discusses the limitations of attention as a reliable approach for explainability in the context of language modeling.</p>
<p><br />
<a href="https://arxiv.org/abs/1904.11694">Neural Logic Machine</a> is a neural-symbolic network architecture that is able to do well at both inductive learning and logic reasoning. The model does significantly well on tasks such as sorting arrays and finding shortest paths.</p>
<p><br />
<img src="https://miro.medium.com/max/1298/1*t7rfBC1pdn0wGgE1L0VNXw.png" alt="" /></p>
<p><br />
And <a href="https://arxiv.org/abs/1909.03186">here</a> is a paper that applies Transformer language models to Extractive and Abstractive Neural document summarization.</p>
<p><br />
Researchers developed a method that focuses on using comparisons to build and train ML models. Instead of requiring large amounts of feature-label pairs, this <a href="https://blog.ml.cmu.edu/2019/03/29/building-machine-learning-models-via-comparisons/">technique</a> compares images with previously seen images to decide whether the image should be of a certain label.</p>
<p><br />
Nelson Liu and others presented a <a href="https://arxiv.org/abs/1903.08855">paper</a> discussing the type of linguistic knowledge being captured by pretrained contextualizers such as BERT and ELMo.</p>
<p><br />
<a href="https://arxiv.org/abs/1906.08237">XLNet</a> is a pretraining method for NLP that showed improvements upon BERT on 20 tasks. I wrote a summary of this great work <a href="https://medium.com/dair-ai/xlnet-outperforms-bert-on-several-nlp-tasks-9ec867bb563b">here</a>.</p>
<p><br />
This <a href="https://arxiv.org/abs/1901.11373">work</a> from DeepMind reports the results from an extensive empirical investigation that aims to evaluate language understanding models applied to a variety of tasks. Such extensive analysis is important to better understand what language models capture so as to improve their efficiency.</p>
<p><br />
<a href="https://arxiv.org/abs/1908.03557">VisualBERT</a> is a simple and robust framework for modeling vision-and-language tasks including VQA and Flickr30K, among others. This approach leverages a stack of Transformer layers coupled with self-attention to align elements in a piece of text and the regions of an image.</p>
<p><br />
This <a href="https://arxiv.org/abs/1903.05987">work</a> provides a detailed analysis comparing NLP transfer learning methods along with guidelines for NLP practitioners.</p>
<p><br />
Alex Wang and Kyunghyun propose an <a href="https://arxiv.org/abs/1902.04094">implementation of BERT</a> that is able to produce high-quality, fluent generations. Here is a Colab <a href="https://colab.research.google.com/drive/1MxKZGtQ9SSBjTK5ArsZ5LKhkztzg52RV">notebook</a> to try it.</p>
<p><br />
Facebook researchers published code (<a href="https://github.com/facebookresearch/XLM">PyTorch implementation</a>) for XLM which is a model for cross-lingual model pretraining.</p>
<p><br />
This <a href="https://www.cl.uni-heidelberg.de/statnlpgroup/blog/rl4nmt/">works</a> provides a comprehensive analysis of the application of reinforcement learning algorithms for neural machine translation.</p>
<p><br />
This survey <a href="https://jair.org/index.php/jair/article/view/11640">paper</a> published in JAIR provides a comprehensive overview of the training, evaluation, and use of cross-lingual word embedding models.</p>
<p><br />
The Gradient published an excellent <a href="https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/">article</a> detailing the current limitations of reinforcement learning and also providing a potential path forward with hierarchical reinforcement learning. And in a timely manner, a couple of folks published an excellent set of <a href="https://github.com/araffin/rl-tutorial-jnrr19/blob/master/1_getting_started.ipynb">tutorials</a> to get started with reinforcement learning.</p>
<p><br />
This <a href="https://arxiv.org/abs/1902.06006">paper provides</a> a light introduction to contextual word representations.</p>
<h3 id="mlnlp-creativity-and-society-"><strong>ML/NLP Creativity and Society 🎨</strong></h3>
<p><em>Machine learning has been applied to solve real-world problems but it has also been applied in interesting and creative ways. ML creativity is as important as any other research area in AI because at the end of the day we wish to build AI systems that will help shape our culture and society.</em></p>
<p><br />
Towards the end of this year, Gary Marcus and Yoshua Bengio <a href="https://www.zdnet.com/article/devils-in-the-details-in-bengio-marcus-ai-debate/">debated</a> on the topics of deep learning, symbolic AI and the idea of hybrid AI systems.</p>
<p><br />
The <a href="https://hai.stanford.edu/ai-index/2019">2019 AI Index Report</a> was finally released and provides a comprehensive analysis of the state of AI which can be used to better understand the progress of AI in general.</p>
<p><br />
<a href="https://en.wikipedia.org/wiki/Commonsense_reasoning">Commonsense reasoning</a> continues to be an important area of research as we aim to build artificial intelligence systems that not are only able to make a prediction on the data provided but also understand and can reason about those decisions. This type of technology can be used in conversational AI where the goal is to enable an intelligent agent to have more natural conversations with people. Check out this <a href="https://www.forbes.com/sites/ayurellahornmuller/2018/12/31/the-art-of-ai-storytelling-how-one-30-under-30-scientist-is-teaching-devices-to-make-assumptions/#61de7c52a4f0">interview</a> with Nasrin Mostafazadeh having a discussion on commonsense reasoning and applications such as storytelling and language understanding. You can also check out this recent <a href="https://arxiv.org/abs/1906.02361">paper</a> on how to leverage language models for commonsense reasoning.</p>
<p><br />
<a href="https://openai.com/blog/introducing-activation-atlases/">Activation Atlases</a> is a technique developed by researchers at Google and Open AI to better understand and visualize the interactions happening between neurons of a neural network.</p>
<p><br />
<img src="https://miro.medium.com/max/1600/0*MQUIQ6n7i1RwfCbK.jpg" alt="" /></p>
<p><em>An activation atlas of the InceptionV1 vision classification network reveals many fully realized features, such as electronics, buildings, food, animal ears, plants, and watery backgrounds.” — <a href="https://openai.com/blog/introducing-activation-atlases/">source</a></em></p>
<p><br />
Check out the <a href="https://fcrc.acm.org/turing-lecture-at-fcrc-2019">Turing Lecture</a> delivered by Geoffrey Hinton and Yann LeCun who were <a href="https://medium.com/dair-ai/turing-award-goes-to-deep-learning-pioneers-38d37cc6d0dd?source=collection_home---4------10-----------------------">awarded</a>, together with Yoshua Bengio, the Turing Award this year.</p>
<p><br />
Tackling climate change with machine learning is discussed in this <a href="https://arxiv.org/abs/1906.05433">paper</a>.</p>
<p><br />
OpenAI published an extensive <a href="https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf">report</a> discussing the social impacts of language models covering topics like beneficial use and potential misuse of the technology.</p>
<p><br />
Emotion analysis continues to be used in a diverse range of applications. <a href="https://themojifier.com/">The Mojifier</a> is a cool project that looks at an image, detects the emotion, and replaces the face with the emojis matching the emotion detected.</p>
<p><br />
Work on radiology with the use of AI techniques has also been trending this year. Here is a nice <a href="https://arxiv.org/abs/1903.11726">summary</a> of trends and perspectives in this area of study.</p>
<p><br />
Researchers from NYU also released a <a href="https://medium.com/@jasonphang/deep-neural-networks-improve-radiologists-performance-in-breast-cancer-screening-565eb2bd3c9f">Pytorch implementation</a> of a deep neural network that improves radiologists’ performance on breast cancer screening. And here is a major <a href="https://physionet.org/content/mimic-cxr/2.0.0/">dataset</a> release called MIMIC-CXR which consists of a database of chest Xrays and text radiology reports.</p>
<p><br />
The New York Times wrote a <a href="https://www.nytimes.com/2019/01/02/obituaries/karen-sparck-jones-overlooked.html">piece</a> on Karen Spärck Jones remembering the seminal contributions she made to NLP and information retrieval.</p>
<p><br />
OpenAI Five <a href="https://openai.com/blog/openai-five-defeats-dota-2-world-champions/">became</a> the first AI system to beat a world champion at an esports game.</p>
<p><br />
The <a href="https://jfgagne.ai/talent-2019/">Global AI Talent Report</a> provides a detailed report of the worldwide AI talent pool and the demand for AI talent globally.</p>
<p><br />
If you haven’t subscribed already, the DeepMind team has an excellent <a href="https://deepmind.com/blog?filters=%7B%22category%22:%5B%22Podcasts%22%5D%7D">podcast</a> where participants discuss the most pressing topics involving AI. Talking about AI potential, Demis Hassabis did an <a href="https://worldin.economist.com/article/17385/edition2020demis-hassabis-predicts-ai-will-supercharge-science?utm_medium=pr&utm_source=inf-a&utm_campaign=worldin">interview</a> with The Economist where he spoke about futuristic ideas such as using AI as an extension to the human mind to potentially find solutions to important scientific problems.</p>
<p><br />
This year also witnessed incredible advancements in ML for health applications. For instance, researchers at Massachusetts General Hospital <a href="https://venturebeat.com/2019/01/04/massachusetts-generals-ai-can-spot-brain-hemorrhages-as-accurately-as-humans/">developed</a> an AI system capable of spotting brain hemorrhages as accurately as humans.</p>
<p><br />
<img src="https://miro.medium.com/max/1104/0*HQad0irUNeJ79Ib9" alt="" />
<em>“Brain scans analyzed by the AI system.”</em></p>
<p><br />
Janelle Shane summarizes a set of <a href="https://aiweirdness.com/post/181621835642/10-things-artificial-intelligence-did-in-2018">“weird” experiments</a> showing how machine learning can be used in creative ways to conduct fun experimentation. Sometimes this is the type of experiment that’s needed to really understand what an AI system is actually doing and not doing. Some experiments include neural networks generating fake snakes and telling jokes.</p>
<p><br />
<img src="https://miro.medium.com/max/400/0*5DZujahMQxmWHG-J.png" alt="" />
<a href="https://aiweirdness.com/post/181621835642/10-things-artificial-intelligence-did-in-2018">Snake Species</a></p>
<p><br />
<a href="https://www.blog.google/topics/machine-learning/hunting-planets-machine-learning/">Learn</a> to find planets with machine learning models build on top of TensorFlow.</p>
<p><br />
OpenAI <a href="https://openai.com/blog/better-language-models/#sample1">discusses</a> the implications of releasing large-scale unsupervised language models, including the potential for malicious use cases.</p>
<p><br />
This <a href="https://colab.research.google.com/github/google/nucleus/blob/master/nucleus/examples/dna_sequencing_error_correction.ipynb">Colab notebook</a> provides a great introduction on how to use Nucleus and TensorFlow for “DNA Sequencing Error Correction”. And here is a great detailed <a href="https://blog.floydhub.com/exploring-dna-with-deep-learning/">post</a> on the use of deep learning architectures for exploring DNA.</p>
<p><br />
<img src="https://miro.medium.com/max/819/1*m6Olf8Vu5M0VLdd8-TU2Nw.jpeg" alt="" /></p>
<p><em><a href="https://raw.githubusercontent.com/google/nucleus/master/nucleus/examples/images/consensus-approach-overview.jpg">source</a></em></p>
<p><br />
Alexander Rush is a Harvard NLP researcher who wrote an important article about the <a href="http://nlp.seas.harvard.edu/NamedTensor">issues</a> with tensors and how some current libraries expose them. He also went on to talk about a proposal for tensors with named indices.</p>
<hr />
<h3 id="mlnlp-tools-and-datasets-️"><strong>ML/NLP Tools and Datasets ⚙️</strong></h3>
<p><em>Here I highlight stories related to software and datasets that have assisted in enabling NLP and machine learning research and engineering.</em></p>
<p><br />
Hugging Face released a popular Transformer <a href="https://github.com/huggingface/transformers">library</a> based on Pytorch names pytorch-transformers. It allows NLP practitioners and researchers to easily use state-of-the-art general-purpose architectures such as BERT, GPT-2, and XLM, among others. If you are interested in how to use pytorch-transformers there are a few places to start but I really liked this detailed <a href="https://rsilveira79.github.io/fermenting_gradients/machine_learning/nlp/pytorch/pytorch-transformer-squad/">tutorial</a> by Roberto Silveira showing how to use the library for machine comprehension</p>
<p><br />
<img src="https://miro.medium.com/max/425/0*wHSaulTrUQzX4mRd.png" alt="" /></p>
<p><em><a href="https://github.com/huggingface/transformers">source</a></em></p>
<p><br />
TensorFlow 2.0 was released with a bunch of <a href="https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8">new features</a>. Read more about best practices <a href="https://medium.com/tensorflow/effective-tensorflow-2-0-best-practices-and-whats-changed-a0ca48767aff">here</a>. François Chollet also wrote an extensive overview of the new features in this <a href="https://colab.research.google.com/drive/1UCJt8EYjlzCs1H1d1X0iDGYJsHKwu-NO">Colab notebook</a>.</p>
<p><br />
PyTorch 1.3 was <a href="https://ai.facebook.com/blog/pytorch-13-adds-mobile-privacy-quantization-and-named-tensors/">released</a> with a ton of new features including named tensors and other front-end improvements.</p>
<p><br />
The Allen Institute for AI released <a href="https://iconary.allenai.org/">Iconary</a> which is an AI system that can play Pictionary-style games with a human. This work incorporates visual/language learning systems and commonsense reasoning. They also published a new commonsense reasoning <a href="https://arxiv.org/abs/1908.05739">benchmark</a> called Abductive-NLI.</p>
<p><br />
spaCy <a href="https://explosion.ai/blog/spacy-transformers">releases</a> a new library to incorporate Transformer language models into spaCy itself, so as to be able to extract features and use them in spaCy NLP pipelines. This effort is built on top of the popular Transformers library developed by Hugging Face. Maximilien Roberti also <a href="https://towardsdatascience.com/fastai-with-transformers-bert-roberta-xlnet-xlm-distilbert-4f41ee18ecb2">wrote</a> a nice article on how to combine fast.ai code with pytorch-transformers.</p>
<p><br />
The Facebook AI team released <a href="https://phyre.ai/">PHYRE</a> which is a benchmark for physical reasoning aiming to test the physical reasoning of AI systems through solving various physics puzzles.</p>
<p><br />
<img src="https://miro.medium.com/max/416/1*1sNrObRoffXbfX2D61bXTw.gif" alt="" /></p>
<p><em><a href="https://phyre.ai/">source</a></em></p>
<p><br />
StanfordNLP released <a href="https://stanfordnlp.github.io/stanfordnlp/">StanfordNLP 0.2.0</a>, which is a Python library for natural language analysis. You can perform different types of linguistic analysis such as lemmatization and part-of-speech tagging on over 70 different languages.</p>
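<p>A quick sketch of the pipeline (assuming the English models have been downloaded once via <code>stanfordnlp.download('en')</code>):</p>
<pre><code>import stanfordnlp

nlp = stanfordnlp.Pipeline(lang="en")  # tokenization, lemmas, POS, parsing
doc = nlp("Barack Obama was born in Hawaii.")
for word in doc.sentences[0].words:
    print(word.text, word.lemma, word.upos)  # token, lemma, part of speech
</code></pre>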
<p><br />
<a href="https://cs.stanford.edu/people/dorarad/gqa/">GQA</a> is a visual question answering dataset for enabling research related to visual reasoning.</p>
<p><br />
exBERT is a visual interactive tool to explore the embeddings and attention of Transformer language models. You can find the paper <a href="https://arxiv.org/abs/1910.05276">here</a> and the demo <a href="http://exbert.net/">here</a>.</p>
<p><br />
<img src="https://miro.medium.com/max/1600/0*eHjrWea2jeGqhvI_.png" alt="" /></p>
<p><em>exBERT — <a href="http://exbert.net/">source</a></em></p>
<p><br />
Distill published an <a href="https://distill.pub/2019/memorization-in-rnns/">article</a> on how to visualize memorization in Recurrent Neural Networks (RNNs).</p>
<p><br />
<a href="https://mathpix.com/">Mathpix</a> is a tool that lets you take a picture of an equation and then it provides you with the latex version.</p>
<p><br />
<img src="https://miro.medium.com/max/400/1*d3BlwVO1E9ndiLVOj4BPcQ.gif" alt="" /></p>
<p><em><a href="https://mathpix.com/">source</a></em></p>
<p><br />
<a href="https://parl.ai/">Parl.ai</a> is a platform that hosts many popular datasets for all works involving dialog and conversational AI.</p>
<p><br />
Uber researchers released <a href="https://uber.github.io/ludwig/">Ludwig</a>, an open-source tool that allows users to easily train and test deep learning models with just a few lines of code. The whole idea is to avoid any coding while training and testing models.</p>
<p><br />
Google AI researchers release <a href="https://ai.googleblog.com/2019/01/natural-questions-new-corpus-and.html">“Natural Questions”</a> which is a large-scale corpus for training and evaluating open-domain question answering systems.</p>
<h3 id="articles-and-blog-posts-️"><strong>Articles and Blog posts ✍️</strong></h3>
<p><em>This year witnessed an explosion of data science writers and enthusiasts. This is great for our field and encourages healthy discussion and learning. Here I list a few interesting and must-see articles and blog posts I came across:</em></p>
<p><br />
Christian Perone provides an <a href="http://blog.christianperone.com/2019/01/mle/">excellent introduction</a> to maximum likelihood estimation (MLE) and maximum a posteriori (MAP) which are important principles to understand how parameters of a model are estimated.</p>
<p><br />
Reiichiro Nakano published a <a href="https://reiinakano.com/2019/06/21/robust-neural-style-transfer.html">blog post</a> discussing neural style transfer with adversarially robust classifiers. A Colab <a href="https://colab.research.google.com/github/reiinakano/adversarially-robust-neural-style-transfer/blob/master/Robust_Neural_Style_Transfer.ipynb">notebook</a> was also provided.</p>
<p><br />
Saif M. Mohammad started a great <a href="https://medium.com/@nlpscholar/state-of-nlp-cbf768492f90">series</a> discussing a diachronic analysis of the ACL Anthology.</p>
<p><br />
<img src="https://miro.medium.com/max/1570/0*zfM9ED6W74NyxMln.png" alt="" />
<em>“Graphs showing average academic age, median academic age, and percentage of first-time publishers in AA over time.” — <a href="https://medium.com/@nlpscholar/state-of-nlp-cbf768492f90">source</a></em></p>
<p><br />
The question is: can a language model learn syntax? Using structural probes, this <a href="https://nlp.stanford.edu/~johnhew/structural-probe.html">work</a> aims to show that it is possible to do so using contextualized representations and a method for finding tree structures.</p>
<p><br />
Andrej Karpathy wrote a <a href="https://karpathy.github.io/2019/04/25/recipe/">blog post</a> summarizing best practices and a recipe on how to effectively train neural networks.</p>
<p><br />
Google AI researchers and other researchers collaborated to <a href="https://www.blog.google/products/search/search-language-understanding-bert">improve</a> the understanding of search queries using BERT models. Contextualized approaches like BERT are well suited to understanding the intent behind search queries.</p>
<p><br />
<a href="https://medium.com/@lessw/new-state-of-the-art-ai-optimizer-rectified-adam-radam-5d854730807b">Rectified Adam</a> (RAdam) is a new optimization technique based on Adam optimizer that helps to improve AI architectures. There are several efforts in coming up with better and more stable optimizers but the authors claim to focus on other aspects of optimizations that are just as important for delivering improved convergence.</p>
<p><br />
With a lot of development in machine learning tools recently, there are also many discussions on how to implement ML systems that enable solutions to practical problems. Chip Huyen <a href="https://github.com/chiphuyen/machine-learning-systems-design/blob/master/build/build1/consolidated.pdf">wrote</a> an interesting chapter discussing machine learning system design, emphasizing topics such as hyperparameter tuning and data pipelines.</p>
<p><br />
NVIDIA <a href="https://techcrunch.com/2019/08/13/nvidia-breaks-records-in-training-and-inference-for-real-time-conversational-ai/">breaks the record</a> for the largest language model trained to date, with billions of parameters.</p>
<p><br />
Abigail See wrote this excellent blog <a href="http://www.abigailsee.com/2019/08/13/what-makes-a-good-conversation.html">post</a> about what makes a good conversation in the context of natural language generation systems.</p>
<p><br />
Google AI <a href="https://ai.googleblog.com/2019/09/announcing-two-new-natural-language.html">published</a> two natural language dialog datasets with the goal of using more complex and natural dialog data to improve personalization in conversational applications such as digital assistants.</p>
<p><br />
Deep reinforcement learning continues to be one of the most widely discussed topics in AI and has even attracted interest from psychology and neuroscience. Read more about some highlights in this <a href="https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0">paper</a> published in Trends in Cognitive Sciences.</p>
<p><br />
Samira Abnar wrote this excellent <a href="https://staff.fnwi.uva.nl/s.abnar/?p=108">blog post</a> summarizing the main building blocks behind Transformers and capsule networks and the connections between them. Adam Kosiorek also wrote this magnificent <a href="http://akosiorek.github.io/ml/2019/06/23/stacked_capsule_autoencoders.html">piece</a> on stacked capsule autoencoders (an unsupervised version of capsule networks), which were applied to object detection.</p>
<p><br />
<img src="https://miro.medium.com/max/819/0*R4x3osVbWIuUUPrn.png" alt="" /></p>
<p><em><a href="https://staff.fnwi.uva.nl/s.abnar/?p=108">source</a></em></p>
<p><br />
Researchers published an <a href="https://distill.pub/2019/visual-exploration-gaussian-processes/">interactive article</a> on Distill offering a visual exploration of Gaussian processes.</p>
<p><br />
Through this Distill <a href="https://distill.pub/2019/gan-open-problems/">publication</a>, Augustus Odena makes a call to researchers to address several important open questions about GANs.</p>
<p><br />
Here is a PyTorch <a href="https://github.com/zaidalyafeai/Notebooks/blob/master/Deep_GCN_Spam.ipynb">implementation</a> of graph convolutional networks (GCNs) used for classifying spammers vs. non-spammers.</p>
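<p><br />
For intuition, a single GCN layer computes H’ = ReLU(Â H W), where Â is the adjacency matrix with added self-loops, symmetrically normalized by node degree. A minimal PyTorch sketch of one such layer (illustrative; not the code from the linked notebook):</p>
<pre><code class="language-python">import torch

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + torch.eye(A.size(0))            # add self-loops
    d = A_hat.sum(dim=1)                        # node degrees
    D_inv_sqrt = torch.diag(d.pow(-0.5))        # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
    return torch.relu(A_norm @ H @ W)

# toy graph: 4 nodes, 3-dim node features, 2 output channels
A = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
H = torch.randn(4, 3)
W = torch.randn(3, 2)
print(gcn_layer(A, H, W).shape)  # torch.Size([4, 2])
</code></pre>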
<p><br />
At the beginning of the year, VentureBeat released a list of <a href="https://venturebeat.com/2019/01/02/ai-predictions-for-2019-from-yann-lecun-hilary-mason-andrew-ng-and-rumman-chowdhury/">predictions</a> for 2019 made by experts such as Rumman Chowdhury, Hilary Mason, Andrew Ng, and Yann LeCun. Check it out to see if their predictions were right.</p>
<p><br />
<a href="https://medium.com/huggingface/multi-label-text-classification-using-bert-the-mighty-transformer-69714fa3fb3d">Learn</a> how to finetune BERT to perform multi-label text classification.</p>
<p><br />
Due to the popularity of BERT, in the past few months many researchers developed methods to “compress” BERT with the idea of building faster, smaller, and more memory-efficient versions of the original model. Mitchell A. Gordon <a href="http://mitchgordon.me/machine/learning/2019/11/18/all-the-ways-to-compress-BERT.html">wrote</a> a summary of the types of compression and the methods developed around this objective.</p>
<p><br />
Superintelligence continued to be a topic of debate among experts. It’s an important topic that requires a proper understanding of frameworks and policies, and careful observation. I found this interesting <a href="https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf">series of comprehensive essays</a> (a technical report by K. Eric Drexler) useful for understanding some of the issues and considerations around superintelligence.</p>
<p><br />
Eric Jang wrote a nice <a href="https://blog.evjang.com/2019/02/maml-jax.html">blog post</a> introducing the concept of meta-learning, which aims to build and train machine learning models that not only predict well but also learn how to learn.</p>
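<p><br />
The core trick in MAML-style meta-learning is differentiating through an inner gradient step. Here is a toy one-dimensional sketch of that idea (my own illustration, not Eric’s JAX code), where each task t has loss (w - t)^2 and the gradients are worked out analytically:</p>
<pre><code class="language-python">import numpy as np

inner_lr, outer_lr = 0.1, 0.05
w = 0.0                               # meta-parameter shared across tasks
tasks = np.array([1.0, 2.0, 3.0])     # task t has loss L_t(w) = (w - t)^2

for step in range(200):
    grad = 0.0
    for t in tasks:
        w_adapted = w - inner_lr * 2 * (w - t)        # one inner gradient step
        # outer gradient of L_t(w_adapted) w.r.t. w, through the inner step:
        # dL/dw = 2 * (w_adapted - t) * (1 - 2 * inner_lr)
        grad += 2 * (w_adapted - t) * (1 - 2 * inner_lr)
    w -= outer_lr * grad / len(tasks)                 # meta-update

print(w)  # settles near 2.0, the point from which all tasks adapt fastest
</code></pre>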
<p><br />
A <a href="https://ruder.io/aaai-2019-highlights/">summary</a> of AAAI 2019 highlights by Sebastian Ruder.</p>
<p><br />
Graph neural networks were heavily discussed this year. David Mack wrote a <a href="https://medium.com/octavian-ai/finding-shortest-paths-with-graph-networks-807c5bbfc9c8">nice visual article</a> about using this technique together with attention to perform shortest-path calculations.</p>
<p><br />
Bayesian approaches remain an interesting subject, in particular how they can be applied to neural networks to avoid common issues like overfitting. Here is a <a href="https://medium.com/neuralspace/bayesian-neural-network-series-post-1-need-for-bayesian-networks-e209e66b70b2">list</a> of suggested reads by Kumar Shridhar on the topic.</p>
<p><br />
<img src="https://miro.medium.com/max/1600/0*E8ScZUhm9npwaZQm.png" alt="" /></p>
<p><em>“Network with point-estimates as weights vs Network with probability distribution as weights” - <a href="https://arxiv.org/pdf/1806.05978.pdf">Source</a></em></p>
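<p><br />
As a sketch of the idea in the figure above, here is a minimal Bayesian linear layer using the reparameterization trick (Bayes-by-backprop style; my own illustration, with the KL regularizer on the weights omitted for brevity):</p>
<pre><code class="language-python">import torch

class BayesianLinear(torch.nn.Module):
    """Linear layer whose weights are distributions, not point estimates."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu = torch.nn.Parameter(torch.zeros(n_out, n_in))
        self.w_rho = torch.nn.Parameter(torch.full((n_out, n_in), -3.0))

    def forward(self, x):
        sigma = torch.nn.functional.softplus(self.w_rho)   # keep sigma positive
        # reparameterization trick: sample w = mu + sigma * eps, eps ~ N(0, 1)
        w = self.w_mu + sigma * torch.randn_like(sigma)
        return x @ w.t()

layer = BayesianLinear(4, 2)
x = torch.randn(8, 4)
# every forward pass samples fresh weights; averaging predictions over
# samples gives an estimate of the model's uncertainty
preds = torch.stack([layer(x) for _ in range(10)])
print(preds.mean(0).shape, preds.std(0).shape)
</code></pre>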
<h3 id="ethics-in-ai-"><strong>Ethics in AI 🚨</strong></h3>
<p><em>Perhaps one of the most highly discussed aspects of AI systems this year was ethics, including discussions around bias, fairness, and transparency. In this section, I provide a list of interesting stories and papers around this topic:</em></p>
<p><br />
The paper titled <a href="http://papers.nips.cc/paper/8035-does-mitigating-mls-impact-disparity-require-treatment-disparity">“Does mitigating ML’s impact disparity require treatment disparity?”</a> discusses the consequences of applying disparate learning processes through experiments conducted on real-world datasets.</p>
<p><br />
HuggingFace published an <a href="https://medium.com/huggingface/ethical-analysis-of-the-open-sourcing-of-a-state-of-the-art-conversational-ai-852113c324b2">article</a> discussing ethics in the context of open-sourcing NLP technology for conversational AI.</p>
<p><br />
Being able to quantify the role of ethics in AI research is an important endeavor going forward as we continue to introduce AI-based technologies to society. This <a href="https://arxiv.org/abs/1809.08328">paper</a> provides a broad analysis of the measures and “use of ethics-related research in leading AI, machine learning and robotics venues.”</p>
<p><br />
This <a href="https://arxiv.org/abs/1903.03862">work</a> presented at NAACL 2019 discusses how debiasing methods can cover up gender bias in word embeddings.</p>
<p><br />
<a href="https://www.youtube.com/watch?v=A2Jtqi_oa2Y]">Listen</a> to Zachary Lipton presenting his paper “Troubling Trends in ML Scholarship”. I also wrote a summary of this interesting paper which you can find <a href="https://medium.com/dair-ai/an-overview-of-troubling-trends-in-machine-learning-scholarship-582df3caa518?source=false---------0">here</a>.</p>
<p><br />
Gary Marcus and Ernest Davis published their book <a href="https://www.amazon.com/Rebooting-AI-Building-Artificial-Intelligence/dp/1524748250">“Rebooting AI: Building Artificial Intelligence We Can Trust”</a>. The book discusses the steps we must take to achieve robust artificial intelligence. On the topic of AI progress, François Chollet also wrote an impressive <a href="https://arxiv.org/abs/1911.01547">paper</a> making a case for better ways to measure intelligence.</p>
<p><br />
Check out this Udacity <a href="https://www.udacity.com/course/secure-and-private-ai--ud185">course</a> created by Andrew Trask on topics such as differential privacy, federated learning, and encrypted AI. On the topic of privacy, Emma Bluemke wrote this great <a href="https://blog.openmined.org/federated-learning-differential-privacy-and-encrypted-computation-for-medical-imaging/">post</a> discussing how one may go about training machine learning models while preserving patient privacy.</p>
<p><br />
At the beginning of this year, Mariya Yao <a href="https://www.topbots.com/most-important-ai-ethics-research/">posted</a> a comprehensive list of research paper summaries involving AI ethics. Although the referenced papers are from 2018, I believe they are still relevant today.</p>
<h3 id="mlnlp-education-"><strong>ML/NLP Education 🎓</strong></h3>
<p><em>Here I feature a list of educational resources, writers, and people doing amazing work educating others about difficult ML/NLP concepts and topics:</em></p>
<p><br />
CMU released materials and syllabus for their <a href="http://phontron.com/class/nn4nlp2019/">“Neural Networks for NLP”</a> course.</p>
<p><br />
<a href="https://twitter.com/omarsar0">Elvis Saravia</a> and <a href="https://github.com/soujanyaporia">Soujanya Poria</a> released a project called <a href="https://nlpoverview.com/">NLP-Overview</a> that is intended to help students and practitioners to get a condensed overview of modern deep learning techniques applied to NLP, including theory, algorithms, applications, and state of the art results — <a href="https://github.com/omarsar/nlp_overview">Link</a></p>
<p><br />
<img src="https://miro.medium.com/max/992/0*FN-wy7HwrSN1dmW0.png" alt="" /></p>
<p><em><a href="https://nlpoverview.com/">NLP Overview</a></em></p>
<p><br />
Microsoft Research Lab published a free <a href="https://www.datasciencecentral.com/profiles/blogs/new-book-foundations-of-data-science-from-microsoft-research-lab">ebook</a> on the foundations of data science, with topics ranging from Markov chain Monte Carlo to random graphs.</p>
<p><br />
<a href="https://mml-book.github.io/">“Mathematics for Machine Learning”</a> is a free ebook introducing the most important mathematical concepts used in machine learning. It also includes a few Jupyter notebook tutorials describing the machine learning parts. Jean Gallier and Jocelyn Quaintance wrote an <a href="https://www.cis.upenn.edu/~jean/math-deep.pdf">extensive free ebook</a> covering mathematical concepts used in machine learning.</p>
<p><br />
Stanford releases a <a href="https://www.youtube.com/playlist?list=PLoROMvodv4rObpMCir6rNNUlFAn56Js20">playlist of videos</a> for its course on “Natural Language Understanding”.</p>
<p><br />
On the topic of learning, OpenAI put together this great <a href="https://openai.com/blog/learning-day/">list</a> of suggestions on how to keep learning and improving your machine learning skills. Apparently, their employees use these methods on a daily basis to keep learning and expanding their knowledge.</p>
<p><br />
<img src="https://miro.medium.com/max/792/1*VIcTGpIn2GoHQbZx3rPucQ.png" alt="" /></p>
<p><em><a href="https://openai.com/blog/learning-day/">source</a></em></p>
<p><br />
Adrian Rosebrock <a href="https://www.pyimagesearch.com/start-here/">published</a> an 81-page guide on how to do computer vision with Python and OpenCV.</p>
<p><br />
Emily M. Bender and Alex Lascarides published a <a href="http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1451">book</a> titled “Linguistic Fundamentals for NLP”. The main idea behind the book is to discuss what “meaning” is in the field of NLP by providing a proper foundation in semantics and pragmatics.</p>
<p><br />
Elad Hazan published his <a href="https://drive.google.com/file/d/1GIDnw7T-NT4Do3eC0B5kYJlzwOs6nzIO/view">lecture notes</a> on “Optimization for Machine Learning”, which present the training of machine learning models as an optimization problem, with beautiful math and notation. Deeplearning.ai also published a <a href="https://www.deeplearning.ai/ai-notes/optimization/">great article</a> that discusses parameter optimization in neural networks using a more visual and interactive approach.</p>
<p><br />
Andreas Mueller published a <a href="https://www.youtube.com/playlist?list=PL_pVmAaAnxIQGzQS2oI3OWEPT-dpmwTfA">playlist</a> of videos for a new course in “Applied Machine Learning”.</p>
<p><br />
Fast.ai <a href="https://www.fast.ai/2019/06/28/course-p2v3/">releases</a> its new MOOC titled “Deep Learning from the Foundations”.</p>
<p><br />
MIT published all <a href="https://www.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI">videos</a> and syllabus for their course on “Introduction to Deep Learning”.</p>
<p><br />
Chip Huyen <a href="https://twitter.com/chipro/status/1157772112876060672">tweeted</a> an impressive list of free online courses to get started with machine learning.</p>
<p><br />
Andrew Trask <a href="https://github.com/iamtrask/Grokking-Deep-Learning">published</a> his book titled “Grokking Deep Learning”. The book serves as a great starter for understanding the fundamental building blocks of neural network architectures.</p>
<p><br />
Sebastian Raschka uploaded <a href="https://github.com/rasbt/deeplearning-models">80 notebooks</a> about how to implement different deep learning models such as RNNs and CNNs. The great thing is that the models are all implemented in both PyTorch and TensorFlow.</p>
<p><br />
Here is a great <a href="https://medium.com/@d3lm/understand-tensorflow-by-mimicking-its-api-from-scratch-faa55787170d">tutorial</a> that goes deep into understanding how TensorFlow works. And here is <a href="http://blog.christianperone.com/2018/03/pytorch-internal-architecture-tour/">one</a> by Christian Perone for PyTorch.</p>
<p><br />
Fast.ai also published a course titled “Intro to NLP” accompanied by a <a href="https://www.youtube.com/playlist?list=PLtmWHNX-gukKocXQOkQjuVxglSDYWsSh9">playlist</a>. Topics range from sentiment analysis to topic modeling to the Transformer.</p>
<p><br />
<a href="https://ipam.wistia.com/medias/excbyr8gvv">Learn</a> about Graph Convolutional Neural Networks for Molecular Generation in this talk by Xavier Bresson. Slides can be found <a href="http://helper.ipam.ucla.edu/publications/glws4/glws4_16076.pdf">here</a>. And here is a paper discussing how to <a href="https://arxiv.org/abs/1905.12265">pre-train</a> GNNs.</p>
<p><br />
On the topic of graph networks, some engineers <a href="https://www.eurekalert.org/pub_releases/2019-06/uoc--eug060719.php">use them</a> to predict the properties of molecules and crystals. The Google AI team also published an excellent <a href="https://ai.googleblog.com/2019/10/learning-to-smell-using-deep-learning.html">blog post</a> explaining how they use GNNs for odor prediction. If you are interested in getting started with graph neural networks, here is a comprehensive <a href="https://arxiv.org/pdf/1812.08434.pdf">overview</a> of the different GNNs and their applications.</p>
<p><br />
Here is a <a href="https://www.youtube.com/playlist?list=PLFInMJnvb3owAddRh4qk2gCX25kGLDay-">playlist</a> of videos on unsupervised learning methods such as PCA by Rene Vidal from Johns Hopkins University.</p>
<p><br />
If you are ever interested in converting a pretrained TensorFlow model to PyTorch, Thomas Wolf has you covered in this <a href="https://medium.com/huggingface/from-tensorflow-to-pytorch-265f40ef2a28">blog post</a>.</p>
<p><br />
Want to learn about generative deep learning? David Foster wrote a great <a href="https://www.oreilly.com/library/view/generative-deep-learning/9781492041931/">book</a> that teaches data scientists how to apply GANs and encoder-decoder models to tasks such as painting, writing, and composing music. Here is the <a href="https://github.com/davidADSP/GDL_code">official repository</a> accompanying the book; it includes TensorFlow code. There is also an <a href="https://github.com/MLSlayer/Generative-Deep-Learning-Code-in-Pytorch">effort</a> to port the code to PyTorch.</p>
<p><br />
A Colab <a href="https://colab.research.google.com/drive/1rjjjA7teiZVHJCMTVD8KlZNu3EjS7Dmu#scrollTo=T9xtzFTJ1Uwf">notebook</a> containing code blocks to practice and learn about causal inference concepts such as interventions, counterfactuals, etc.</p>
<p><br />
Here are the <a href="https://github.com/huggingface/naacl_transfer_learning_tutorial">materials</a> for the NAACL 2019 tutorial on “Transfer Learning in Natural Language Processing” delivered by Sebastian Ruder, Matthew Peters, Swabha Swayamdipta and Thomas Wolf. They also provided an accompanying Google Colab <a href="https://colab.research.google.com/drive/1iDHCYIrWswIKp-n-pOg69xLoZO09MEgf">notebook</a> to get started.</p>
<p><br />
Another great <a href="https://jalammar.github.io/visual-numpy/">blog post</a> from Jay Alammar on the topic of data representation. He also wrote many other interesting illustrated guides that include <a href="http://jalammar.github.io/illustrated-gpt2/">GPT-2</a> and <a href="http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/">BERT</a>. Peter Bloem also published a very detailed <a href="http://peterbloem.nl/blog/transformers">blog post</a> explaining all the bits that make up a Transformer.</p>
<p><br />
<img src="https://miro.medium.com/max/1094/1*MIS8yQRqbP-c6eldo9K4QQ.png" alt="" /></p>
<p><em>A visual illustration of basic self-attention — <a href="http://peterbloem.nl/blog/transformers">source</a></em></p>
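<p><br />
To make the basic self-attention in the figure above concrete, here is a minimal NumPy sketch (single head, no learned projections; the scaling by the square root of the dimension follows the standard scaled dot-product formulation):</p>
<pre><code class="language-python">import numpy as np

def self_attention(X):
    """Basic self-attention: every output is a weighted mix of all inputs."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X                              # mix the value vectors

X = np.random.randn(5, 8)       # 5 tokens, 8-dim embeddings
print(self_attention(X).shape)  # (5, 8)
</code></pre>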
<p><br />
Here is a nice overview of trends in NLP at ACL 2019, written by <a href="https://www.mihaileric.com/posts/nlp-trends-acl-2019/">Mihail Eric</a>. Some topics include infusing knowledge into NLP architectures, interpretability, and reducing bias among others. Here are a couple more overviews if you are interested: <a href="https://medium.com/@mgalkin/knowledge-graphs-in-natural-language-processing-acl-2019-7a14eb20fce8">link 2</a> and <a href="http://noecasas.com/post/acl2019/">link 3</a>.</p>
<p><br />
The full syllabus for CS231n 2019 edition was <a href="http://cs231n.stanford.edu/syllabus.html">released</a> by Stanford.</p>
<p><br />
David Abel <a href="https://david-abel.github.io/notes/iclr_2019.pdf">posted</a> a set of notes for ICLR 2019. He was also kind enough to provide an <a href="https://david-abel.github.io/notes/neurips_2019.pdf">impressive</a> summary of NeurIPS 2019.</p>
<p><br />
This is an excellent <a href="http://d2l.ai/">book</a> that provides learners with a proper introduction to deep learning with notebooks provided as well.</p>
<p><br />
<img src="https://miro.medium.com/max/560/0*_Rr1ogWtztlm3ffH.png" alt="" />
<em><a href="http://d2l.ai/">source</a></em></p>
<p><br />
An illustrated <a href="http://jalammar.github.io/illustrated-bert/">guide</a> to BERT, ELMo, and co. for transfer learning in NLP.</p>
<p><br />
<img src="https://miro.medium.com/max/1098/0*wywWzpKhTISgCtq5.png" alt="" /></p>
<p><br />
Fast.ai <a href="https://www.fast.ai/2019/01/24/course-v3/">releases</a> its 2019 edition of the “Practical Deep Learning for Coders” course.</p>
<p><br />
<a href="https://sites.google.com/view/berkeley-cs294-158-sp19/home">Learn</a> about deep unsupervised learning in this fantastic course taught by Pieter Abbeel and others.</p>
<p><br />
Gilbert Strang <a href="http://math.mit.edu/~gs/learningfromdata/">released</a> a new book, “Linear Algebra and Learning from Data”, on linear algebra and its role in neural networks.</p>
<p><br />
Caltech provided the entire syllabus, lecture slides, and video playlist for their course on <a href="http://tensorlab.cms.caltech.edu/users/anima/cs165.html">“Foundations of Machine Learning”</a>.</p>
<p><br />
The “<a href="https://scipy-lectures.org/">Scipy Lecture Notes</a>” is a series of tutorials that teach you how to master tools such as matplotlib, NumPy, and SciPy.</p>
<p><br />
Here is an excellent <a href="https://peterroelants.github.io/posts/gaussian-process-tutorial/">tutorial</a> on understanding Gaussian processes. (Notebooks provided).</p>
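<p><br />
As a small companion sketch (my own, assuming an RBF kernel), sampling random functions from a GP prior takes only a few lines:</p>
<pre><code class="language-python">import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Covariance k(x, x') = exp(-(x - x')^2 / (2 * l^2))."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

x = np.linspace(-5, 5, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability
rng = np.random.default_rng(0)
# each draw from N(0, K) is one random smooth function evaluated at x
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(samples.shape)  # (3, 100)
</code></pre>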
<p><br />
This is a must-read <a href="https://lilianweng.github.io/lil-log/2019/01/31/generalized-language-models.html">article</a> in which Lilian Weng provides a deep dive into generalized language models such as ULMFiT, OpenAI GPT-2, and BERT.</p>
<p><br />
<a href="https://paperswithcode.com/">Papers with Code</a> is a website that shows a curated list of machine learning papers with code and state-of-the-art results.</p>
<p><br />
Christoph Molnar released the first edition of “<a href="https://christophm.github.io/interpretable-ml-book/">Interpretable Machine Learning</a>” which is a book that touches on important techniques used to better interpret machine learning algorithms.</p>
<p><br />
David Bamman <a href="http://people.ischool.berkeley.edu/~dbamman/nlp18.html">releases</a> the full syllabus and slides for the NLP course offered at UC Berkeley.</p>
<p><br />
Berkeley <a href="https://github.com/dbamman/anlp19">releases</a> all materials for their “Applied NLP” class.</p>
<p><br />
Aerin Kim is a senior research engineer at Microsoft who <a href="https://towardsdatascience.com/@aerinykim">writes</a> about topics related to applied math and deep learning, including intuitions for conditional independence, the gamma distribution, and perplexity.</p>
<p><br />
Tai-Danae Bradley wrote this <a href="https://www.math3ma.com/blog/matrices-as-tensor-network-diagrams">blog post</a> discussing how to think about matrices and tensors as tensor network diagrams. The article uses incredible visuals that help the reader better understand certain transformations and operations on matrices.</p>
<p><br />
<img src="https://miro.medium.com/max/1600/0*TLepZbfBk5EjK1E0.jpg" alt="" /></p>
<p><em><a href="https://www.math3ma.com/blog/matrices-as-tensor-network-diagrams">source</a></em></p>
<hr />
<p>I hope you found the links useful. I wish you a successful and healthy 2020!
Due to the holidays, I didn’t get much of a chance to proofread the article, so any feedback or corrections are welcome!</p>
<blockquote>
<p><a href="https://github.com/omarsar/nlp_highlights">PDF version</a></p>
</blockquote>
<p><a href="https://dair.ai/posts/NLP_Year_in_Review-2019/">NLP Year in Review — 2019</a> was originally published by DAIR.AI at <a href="https://dair.ai">DAIR.AI</a> on April 25, 2020.</p>