Blog

Stochastic regression imputation

Stochastic regression imputation can be considered a refinement of regression imputation: it addresses the correlation bias by adding noise drawn from the regression residuals to the imputed values. This post discusses the advantages of stochastic regression imputation with examples in Python.
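The mechanism described above can be illustrated with a minimal NumPy sketch on synthetic data (all variable names and the toy data-generating process here are illustrative assumptions, not taken from the post itself): fit a regression on the observed cases, then impute missing values with the regression prediction plus noise scaled by the residual standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends linearly on x; ~30% of y is missing.
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)
missing = rng.random(200) < 0.3
y_obs = y.copy()
y_obs[missing] = np.nan

# Fit a least-squares regression on the observed cases only.
obs = ~missing
slope, intercept = np.polyfit(x[obs], y_obs[obs], 1)
residuals = y_obs[obs] - (slope * x[obs] + intercept)
sigma = residuals.std(ddof=2)  # residual standard deviation

# Deterministic regression imputation would stop at the prediction;
# stochastic regression imputation adds noise from the residual spread.
y_imp = y_obs.copy()
y_imp[missing] = (slope * x[missing] + intercept
                  + rng.normal(scale=sigma, size=missing.sum()))
```

Because the added noise restores the natural scatter around the regression line, the imputed values no longer lie exactly on it, which is what keeps the correlation between `x` and `y` from being artificially inflated.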

The persistent problem of missing data

Deploying descriptive, predictive and prescriptive machine learning solutions is difficult even with complete data, and more difficult still in the face of missing data. A gentle introduction to the causes of missing data and the difficulties they create.

SciPy 2020 Day Three

The third day of SciPy 2020 was filled with interesting and foundational tutorial content on deep learning, including a short primer on the PyTorch library. I also found the time to watch some interesting talks from Enthought's SciPy YouTube channel.

SciPy 2020 Day Two

Day two of the SciPy 2020 conference was also very informative. Except for some connectivity issues with my internet provider, which led to me missing the latter half of the awesome Dask tutorial and prevented me from listening to some other talks, everything went equally smoothly.

SciPy 2020 Day One

I am very happy 😄 to participate in the 2020 edition of the SciPy conference, which is held online due to the measures preventing the spread of COVID-19. Although it is the first online edition of the SciPy conference, everything runs smoothly thanks to the tremendous efforts of the organizers and the community.

Missing Data Imputation Using Generative Adversarial Nets

Missing data, especially missing data points in time series, are a pervasive issue for many applications relying on precise and complete data sets. For example, in the financial sector, missing tick data can lead to deviating forecasts and thus to wrong decisions with high losses.

Interpretable Discrete Representation Learning on Time Series

Effective and efficient time series representation learning is an important topic for a vast array of applications such as clustering. However, many currently used approaches are difficult to interpret. In many areas it is important that intermediate learned representations are easy to interpret for efficient downstream processing.

Time Series Data Clustering Distance Measures

As ubiquitous as time series are, it is often of interest to identify clusters of similar time series in order to gain better insight into the structure of the available data. However, unsupervised learning from time series data has its own stumbling blocks. For this reason, the following article presents some helpful time series specific distance metrics and basic procedures to work successfully with time series data.
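One widely used time-series-specific distance measure is dynamic time warping (DTW), which aligns series that are shifted or stretched relative to each other before accumulating pointwise costs. The following is a minimal textbook-style sketch (the function name and the toy series are illustrative, not taken from the article):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

# A phase-shifted sine: pointwise comparison penalizes the shift,
# while the warping path can realign the two series.
t = np.linspace(0, 2 * np.pi, 50)
s1 = np.sin(t)
s2 = np.sin(t + 0.5)
```

By construction the DTW cost of the optimal path can never exceed the cost of the purely diagonal (pointwise) alignment, which is why it copes better with temporal shifts than a plain elementwise distance.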

Tidy Data, Tidy Types, and Tidy Operations

The notion of tidy data is a concept known from R and used with great success in many libraries and frameworks today. Tidy data, together with proper data types and semantically allowed operations, simplifies data science, machine learning and data stewardship by a large margin. In this article we highlight the core properties of "Tidy Data, Tidy Types, and Tidy Operations" with the help of a concise example, and show how those properties can be successively achieved and maintained.
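The core tidy-data property — one variable per column, one observation per row — can be sketched with a tiny pandas example (the column names and values here are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Untidy "wide" layout: the year variable is spread across column headers.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [10, 20],
    "2020": [12, 25],
})

# Tidy "long" layout: one variable per column, one observation per row.
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")

# Tidy types: the year is an integer, not a string inherited from headers.
tidy["year"] = tidy["year"].astype(int)
```

Once the data is in this shape, tidy operations such as grouping, filtering and joining map directly onto the variables instead of onto positional column names.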

Unsupervised Skill Discovery in Deep Reinforcement Learning

Scientists from Google AI have published exciting research regarding unsupervised skill discovery in deep reinforcement learning. Essentially, it becomes possible to use unsupervised learning methods to learn model dynamics and promising skills in a model-free reinforcement learning environment, subsequently enabling the use of model-based planning methods in model-free reinforcement learning setups.