Deploying descriptive, predictive and prescriptive machine learning solutions with complete data is difficult, and it is even more difficult in the face of missing data. This article gives a gentle introduction to the reasons for missing data and the difficulties it creates.
Effective and efficient time series representation learning is an important topic for a wide range of applications, such as clustering. However, many currently used approaches are difficult to interpret. In many areas, it is important that intermediate learned representations are easy to interpret so that downstream processing remains efficient.
Time series are ubiquitous, and it is often of interest to identify clusters of similar time series in order to gain better insight into the structure of the available data. However, unsupervised learning from time series data has its own stumbling blocks. For this reason, the following article presents some helpful time-series-specific distance metrics and basic procedures for working successfully with time series data.
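To make the idea of a time-series-specific distance metric concrete, here is a minimal sketch in Python. The summary above does not name the metrics the article covers, so the choice of dynamic time warping (DTW) is an assumption made purely for illustration, as are the function names and the toy series. The sketch compares DTW with the plain Euclidean distance on two series that have the same shape but are shifted by one time step.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Illustrative choice of metric (an assumption, not taken from the article).
    Uses the absolute difference as local cost and no warping window.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance for equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy data: the same bump, shifted by one time step
series_a = [0, 0, 1, 2, 1, 0, 0]
series_b = [0, 1, 2, 1, 0, 0, 0]

print(euclidean_distance(series_a, series_b))  # 2.0
print(dtw_distance(series_a, series_b))        # 0.0
```

The Euclidean distance penalizes the shift because it compares values index by index, whereas DTW may stretch the time axis and therefore treats the two series as identical in shape. Whether that invariance is desirable depends on the clustering task at hand.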
This first part of the article is an attempt to categorize and structure current state-of-the-art (SoA) clustering approaches, drawing on the work of Aljalbout et al. and Min et al. It supports my current research at the TECO Institute of the Karlsruhe Institute of Technology on missing data imputation in large stock quotation and trade data sets, where clustering would be an obvious first step for retrieving highly correlated quotations or trade movements.