
Natural Language Processing in Python (English)


Date and time

04.10.2022, 09:00 - 07.10.2022, 13:30


Zoom (Online)
Munich, Germany


Trainer: Dr. Matthias Aßenmacher

Course duration: 4 days, each from 09:00 to 13:30 CET (online live seminar)

Course language: English (we can switch to German if all participants are fluent in German)

Over the last five to six years, researchers have achieved several breakthroughs in text mining and natural language processing (NLP). These breakthroughs rest largely on three factors:

- Conceptually new frameworks from the field of Deep Learning,

- significant improvements in computational resources, and

- (significantly) larger amounts of available data (Big Data).

In this course, we will start with the basics of text processing in Python and learn about classical feature engineering approaches from the field of Machine Learning. We will then have a detailed look at the methodology that can be seen as the beginning of a new era in NLP: so-called word embeddings (word vectors). We will further discuss the integration of these word embeddings into modern Deep Learning architectures, in particular deep recurrent neural networks (RNNs). Since the so-called "attention" mechanism and transfer learning form the basis of current state-of-the-art models such as BERT, we will cover these two topics in detail in the final part of the course.
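To give a flavor of what word embeddings are: each word is mapped to a dense vector, and semantic relatedness is measured geometrically, commonly via cosine similarity. A minimal sketch with made-up toy vectors (the words, dimensions, and values are purely illustrative; real embeddings such as Word2Vec typically have hundreds of dimensions):

```python
import numpy as np

# Toy 4-dimensional "embeddings" (values invented for illustration).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.85, 0.75, 0.2, 0.25]),
    "apple": np.array([0.1, 0.2, 0.9, 0.8]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_royal = cosine_similarity(vectors["king"], vectors["queen"])
sim_fruit = cosine_similarity(vectors["king"], vectors["apple"])
# Semantically related words lie closer together in embedding space,
# so sim_royal comes out larger than sim_fruit here.
```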

Overall, the course will cover the following topics:

Part 1: We will first illustrate the importance of NLP with some examples. We will then introduce the handling of text data and its possible representations in Machine Learning. Finally, fully connected neural networks (FCNNs) will be introduced as an important basis for the rest of the course.
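A classical text representation of the kind covered in Part 1 is the bag-of-words model: each document becomes a count vector over a fixed vocabulary, which an FCNN or a classical classifier can consume. A minimal sketch (illustrative only, not the course's exact material):

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]

# Build a fixed vocabulary from all documents.
vocab = sorted({token for doc in docs for token in doc.split()})

def bag_of_words(doc):
    """Map a document to a count vector over the shared vocabulary."""
    counts = Counter(doc.split())
    return [counts[token] for token in vocab]

features = [bag_of_words(doc) for doc in docs]
# Every document now has the same fixed-length numeric representation,
# regardless of its original length.
```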

Part 2: We will deal exclusively with so-called neural representations of texts. We will start with the idea of language modeling using the neural probabilistic language model (Bengio et al., 2003). Then, the Word2Vec framework (Mikolov et al., 2013), the Doc2Vec framework (Le and Mikolov, 2014), and the FastText framework (Bojanowski et al., 2017) are introduced. Each of these frameworks will be accompanied by hands-on sessions for practical implementation of what has been learned.

Part 3: We will focus on Deep Learning and current state-of-the-art architectures. We will take an in-depth look at existing transfer learning resources and apply what we have learned in a final hands-on session.
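The attention mechanism underlying state-of-the-art architectures such as BERT can be sketched with a few lines of numpy: queries are compared against keys, the scaled scores are normalized with a softmax, and the result weights the values. This is an illustrative sketch only; real Transformer implementations additionally use learned projection matrices and multiple attention heads.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # query-key similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 query positions, dimension 4
K = rng.normal(size=(5, 4))  # 5 key/value positions
V = rng.normal(size=(5, 4))
output, weights = attention(Q, K, V)
# Each output row is a convex combination of the value vectors.
```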

Hands-On Sessions: For the hands-on parts of the course, exercises will be provided as Jupyter notebooks that participants can work through themselves.

Prerequisites: Basic knowledge of Python and Supervised Machine Learning methods


Essential Data Science Training GmbH
