Missing data, especially missing data points in time series, are a pervasive problem for the many applications that rely on precise and complete data sets. In the financial sector, for example, missing tick data can distort forecasts and thus lead to wrong decisions and high losses.
Effective and efficient time series representation learning is an important topic for a vast array of applications, such as clustering. Many of the approaches currently in use, however, are difficult to interpret. In many areas it is important that intermediate learned representations are easy to interpret so that they can be processed efficiently downstream.
Because time series are ubiquitous, it is often of interest to identify clusters of similar time series in order to gain better insight into the structure of the available data. However, unsupervised learning from time series data has its own stumbling blocks. This article therefore presents some helpful time-series-specific distance metrics and basic procedures for working successfully with time series data.
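To give a feel for what a time-series-specific distance can look like, here is a minimal sketch of dynamic time warping (DTW), one commonly used measure. Whether the article covers DTW specifically is an assumption; the function name `dtw_distance` and the toy series are made up for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series (hypothetical helper).

    Uses the absolute difference as the local cost and the classic
    O(len(a) * len(b)) dynamic program without any window constraint.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # cost[i, j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # a advances, b stays
                cost[i, j - 1],      # b advances, a stays
                cost[i - 1, j - 1],  # both advance
            )
    return float(cost[n, m])


# Two toy series with the same bump, shifted by one time step.
x = np.array([0, 1, 2, 1, 0, 0], dtype=float)
y = np.array([0, 0, 1, 2, 1, 0], dtype=float)

euclidean = float(np.sqrt(np.sum((x - y) ** 2)))
warped = dtw_distance(x, y)
print(f"Euclidean: {euclidean:.1f}, DTW: {warped:.1f}")
```

The point-wise Euclidean distance penalizes the shift, while DTW is allowed to stretch the time axis and treats the two shapes as identical. That tolerance to misalignment is one reason such measures are popular as inputs to clustering.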
Outside my professional life as a data scientist and software engineer at All.In Data, where we primarily focus on Microsoft Azure and Amazon Web Services cloud development, I am a firm believer in hosting the services I use on my own systems.
A very quick primer on understanding and handling time series and time series decomposition in pandas.
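As a rough illustration of the idea, the sketch below performs a classical additive decomposition (observed = trend + seasonal + residual) using only pandas and NumPy. It is not taken from the article: the helper `decompose_additive`, the synthetic daily series, and the weekly period of 7 are all assumptions chosen for the example, and the plain centered moving average is only properly centered for odd periods.

```python
import numpy as np
import pandas as pd


def decompose_additive(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a series into trend, seasonal and residual parts (hypothetical helper).

    Assumes an odd `period` so that the centered rolling window is symmetric.
    """
    # Trend: centered moving average over one full seasonal cycle.
    trend = series.rolling(window=period, center=True).mean()

    # Seasonal: average detrended value at each position within the cycle,
    # shifted so the seasonal component sums to zero over one period.
    detrended = series - trend
    phase = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(phase).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[phase].to_numpy(), index=series.index)

    # Residual: whatever trend and seasonality do not explain.
    residual = series - trend - seasonal

    return pd.DataFrame(
        {"observed": series, "trend": trend, "seasonal": seasonal, "residual": residual}
    )


# Synthetic daily data: a linear trend plus a weekly pattern that sums to zero.
index = pd.date_range("2023-01-01", periods=70, freq="D")
t = np.arange(len(index))
pattern = np.array([0, 1, 2, 3, 2, 1, -9], dtype=float)
series = pd.Series(0.5 * t + pattern[t % 7], index=index)

parts = decompose_additive(series, period=7)
print(parts.iloc[3:10].round(2))
```

The first and last three rows have no trend estimate because the centered window does not fit there. For real data, a dedicated routine such as `seasonal_decompose` from statsmodels handles even periods and edge cases more carefully.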