Blog

Scipy 2020 Day Three

The third day of SciPy 2020 was filled with interesting and foundational tutorial content on deep learning, including a short primer on the PyTorch library. I also found the time to watch some interesting SciPy talks on Enthought's SciPy YouTube channel.

SciPy 2020 Day Two

Day two of the SciPy 2020 conference was also very informative. Apart from some connectivity issues with my internet provider, which led to my missing the latter half of the awesome Dask tutorial and prevented me from listening to other talks, everything went equally smoothly.

SciPy 2020 Day One

I am very happy 😄 to participate in the 2020 edition of the SciPy conference, which is being held online because of the measures taken to prevent the spread of COVID-19. Although it is the first online edition of the SciPy conference, everything runs smoothly thanks to the tremendous work of the organizers and the community.

Missing Data Imputation Using Generative Adversarial Nets

Missing data, especially missing data points in time series, is a pervasive issue for many applications that rely on precise and complete data sets. In the financial sector, for example, missing tick data can lead to skewed forecasts and thus to wrong decisions with high losses.

Interpretable Discrete Representation Learning on Time Series

Effective and efficient time series representation learning is an important topic for a vast array of applications such as clustering. However, many of the approaches currently in use are difficult to interpret. In many areas it is important that intermediate learned representations are easy to interpret, so that downstream processing remains efficient.

Time Series Data Clustering Distance Measures

As ubiquitous as time series are, it is often of interest to identify clusters of similar time series in order to gain better insight into the structure of the available data. However, unsupervised learning from time series data has its own stumbling blocks. For this reason, the following article presents some helpful time-series-specific distance measures and basic procedures for working successfully with time series data.
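
As a minimal sketch of one such time-series-specific distance measure, the snippet below implements dynamic time warping (DTW) with plain NumPy; the series and the comparison with the Euclidean distance are illustrative assumptions, not taken from the article itself.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two 1-D series, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three possible alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Two series with the same shape, shifted by one step:
# DTW considers them nearly identical, the Euclidean distance does not.
x = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
print(dtw_distance(x, y), np.linalg.norm(x - y))
```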

Tidy Data, Tidy Types, and Tidy Operations

The notion of tidy data is a concept originating in the R community and is used with great success in many libraries and frameworks today. Tidy data, together with proper data types and semantically allowed operations, simplifies data science, machine learning and data stewardship by a large margin. In this article we highlight the core properties of “Tidy Data, Tidy Types, and Tidy Operations” with the help of a concise example and show how those properties can be achieved and maintained step by step.
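
As a minimal sketch of what tidying looks like in pandas (the wide population table below is an illustrative assumption; the article works through its own example):

```python
import pandas as pd

# Illustrative "messy" wide table: one column per year instead of one observation per row.
wide = pd.DataFrame(
    {"city": ["Berlin", "Munich"], "2018": [3.6, 1.5], "2019": [3.7, 1.6]}
)

# Tidy form: every variable is a column, every observation is a row.
tidy = wide.melt(id_vars="city", var_name="year", value_name="population_millions")

# Proper data types make semantically meaningful operations possible downstream.
tidy["year"] = tidy["year"].astype(int)
print(tidy)
```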

Unsupervised Skill Discovery in Deep Reinforcement Learning

Scientists from Google AI have published exciting research on unsupervised skill discovery in deep reinforcement learning. Essentially, it becomes possible to learn model dynamics and promising skills without supervision in a model-free reinforcement learning environment, which subsequently enables the use of model-based planning methods in model-free reinforcement learning setups.

Personalize Learning to Rank Results through Reinforcement Learning

Learning to optimally rank and personalize search results is a difficult and important topic in scientific information retrieval as well as in online retail, where we typically want to bias customer query results towards specific preferences in order to increase revenue. Reinforcement learning, as a generic and flexible learning framework, is able to bias, i.e. personalize, learning-to-rank results at scale, so that externally specified goals, e.g. an increase in sales and thus revenue, can be achieved. This article introduces learning to rank and reinforcement learning in a problem-specific way and is accompanied by the example project ‘cli-ranker’, a command line tool that applies reinforcement learning principles to learn user preferences for text document ranking.

Time as a Machine Learning Feature

Quite often, cyclic data is not sufficiently transformed for machine learning algorithms: the feature representation misses the implicit properties of cyclic features, which often results in misleading distance measures, e.g. hour 23 and hour 0 appearing far apart even though they are adjacent. This article introduces the cyclic feature transformation for time-based features as a mini-howto.
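
As a minimal sketch of the idea, the snippet below maps an hour-of-day feature onto the unit circle with a sine/cosine pair; the DataFrame and the column name "hour" are illustrative assumptions, not the article's own example.

```python
import numpy as np
import pandas as pd

# Illustrative data: hours of the day, including the wrap-around from 23 to 0.
df = pd.DataFrame({"hour": [22, 23, 0, 1]})

# Project the cyclic feature onto the unit circle so that 23 and 0 end up close together.
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)

print(df)
```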