This article covers experiences from combining clustering algorithms with Recurrence Analysis. The results originate from my Master’s Thesis, which was supervised by Prof. Leser at Humboldt-Universität zu Berlin, the German Research Centre for Geosciences and the Potsdam Institute for Climate Impact Research. This might be interesting for physicists and people working with nonlinear phenomena and time series ensembles. If you are interested in ideas, data, or software, I’d be happy to pass them on. For a brief introduction on Recurrence Plots, scroll to the end of the page, or refer to the excellent collection of materials at http://www.recurrence-plot.tk/.
Why cluster Recurrence Plots?
Identify typical behaviors in time series sets
Recurrence Plots are a powerful tool to detect and describe nonlinear dynamics. Assuming that the phenomena in your data are revealed by this technique, similar Recurrence Plots will correspond to similar behavior. Clustering will find groups of similar Recurrence Plots, where each group corresponds to a typical behavior in your data set. Both the meaning of similarity and behavior are challenging to define, but be assured that Recurrence Analysis has been successfully used in practice many times. The combination with clustering algorithms is pretty new, but not unprecedented.
Lange et al.  have clustered 14 years of remote sensing data to find regions with similar photosynthetic activity. For each location on the globe, one time series is available. By computing the Recurrence Plot of the time series and applying a clustering algorithm, similar Recurrence Plots, i.e. similar behavior is identified.
Yuan et al.  have analyzed network traffic using Recurrence Analysis and applied clustering to distinguish under-attack situations from normal situations. Clustering Recurrence Plots has also been applied to better understand urban car traffic .
Monitoring and Understanding System Behavior
Real-time monitoring and anomaly detection are important in industrial engineering scenarios. One typically observes several variables of a running system (e.g. temperatures, or even the sound produced by the system) and computes the Recurrence Plot over a time window. If the Recurrence Plot changes, this might indicate that something is broken. Cutting machines produce complicated vibration patterns when working well , and Recurrence Plots are necessary to capture their nonlinear properties. When something goes wrong, the vibrations change. Online clustering might be used to conclude a change of system behavior from the fact that current Recurrence Plots fall into a different cluster than the cluster that represents normal behavior.
Finally, simply exploring a large set of time series might require aggregation, as visual inspection of thousands of Recurrence Plots is infeasible for a researcher. Using clustering for explorative analysis, we might find structure in a data set, or focus on interesting types of behavior only.
Gold Standard Similarities
Since I hadn’t worked with Recurrence Plots when I started my thesis, I had no idea of what Recurrence Plots should be similar and what that meant. Dr. Norbert Marwan helped me to create an evaluation framework, a selection of example systems that the clustering should be able to distinguish. We selected a simple sine wave, two parameterizations of both the Rössler and the Lorenz system, two first order autoregressive processes and gaussian and gamma noise.
From each system, I generated 100 time series (25k observations each) by leaving the system’s parameters fixed and varying the initial conditions (except for the sine wave, where the wave length was varied). The primary goal was to abstract from the variation of the time series within each class, so the 100 time series from each system were defined to expose identical behavior (although their Recurrence Plots look different). We were looking for a similarity measure that reflects the nine classes.
We had to create our own evaluation data set, since there are none available (like the Iris data set for clustering). To get feedback on our evaluation data set, I created an online experiment for manually assessing the similarity between recurrence plots. Feel free to participate to see how the different Recurrence Plots look like. There’s an overview at the end of the page, too.
The goal was to see what happened if an expert clustered the Recurrence Plots manually. However, we felt it was simpler and more conclusive to let the experts quantify the similarity between the Recurrence Plots. By now, I’m pretty sure that was not the best experimental setup. We found that expert assessments strongly differed and this is probably due to the sequential nature of the assessment procedure. Although the participants were able to go back and revise their quantifications, I think most didn’t. If the experts would have had to manually group printed Recurrence Plots on a table, there would have probably been much less disagreement.
Until now, about 30 experts rated the similarity between representative Recurrence Plots from the evaluation set. The task was to “judge the similarity of the Recurrence Plots by comparing the dynamics of the underlying processes, as conveyed by the patterns in the plot.”
The results were astonishingly different. Even among people who stated to have worked at least for 5 years with Recurrence Plots, the assessments where vastly different. A few examples are shown below.
Even if the experimental setup might not have been optimal, it seems as if there is no such thing as a universal similarity between Recurrence Plots. Thus, specifying and quantifying similarity is important. The rest of my thesis was focused on similarity measures that would reveal the nine original systems, when used in a clustering algorithm.
The Solution: RQA Vectors
There are various options how to approach the problem of quantifying Recurrence Plot similarity. The straightforward approach is to use Recurrence Quantification Analysis (RQA) or Complex Network measures, which leaves us with a real-valued feature space of moderate dimensionality. This turned out to work very well even under strong observational noise. RQA measures are established features of Recurrence Plots, so this approach is pretty natural.
I also tried image distance measures, like compression distance, which didn’t work at all. Another class of promising approaches is to leverage the additional information contained in intermediate representations of the Recurrence Plot, like the full line length histograms (instead of the RQA measures that describe them), spatiograms or complex networks. Xu et al.  published an interesting article about distinguishing different types of dynamics by counting common subgraph patterns in the complex network. I mainly focused on RQA measures, as this approach gave best results with least effort.
For each RP, 16 RQA measures were computed, representing that Recurrence Plot as a 16 dimensional real numbered vector. In the video below, we see the distribution of the 900 vectors, projected down to the first three principal components. Dissimilarity between Recurrence Plots was then measured as Euclidean Distance between the vectors. Separation is good, and a standard clustering algorithm like k-means performs very well in discovering the clusters.
The oscillator systems (green cluster) are special, as we varied the wavelength of the sine wave, which results in a strong scatter of RQA measures. I’ll briefly return to this issue in the Problems section.
Robustness against Observational Noise
For practical applications, robustness with respect to observational noise is very important. I evaluated the clustering quality in a wide range of scenarios, varying the amount of noise, the number of clusters, the random seeds, etc. The below image shows the effect of 10% and 80% noise. Looking at the Recurrence Plot, the diagonal line structures are almost completely destroyed. Surprisingly, the experiments gave good results for up to 80% noise. I think its mainly the white vertical lines that account for this. That’s why I developed an idea for extended quantification of RPs based on white vertical lines.
Exclude Maximum Line Length RQA Measures
Some RQA Measures are more susceptible to observational noise than others. The measures related to maximum line lengths (diagonal, vertical and white vertical) and their reciprocals (divergence) are strongly affected. In my experience, removing them from the feature set improves the separation of the clusters and thus the performance of clustering algorithms. This effect is even stronger under high amounts of noise. The key insight is that the RQA measures are strongly disturbed, but still distinctive.
I also found that the RQA measures change gradually as we add more observational noise, which is encouraging, since a predictable impact supports the impression of robustness.
Measuring Clustering Quality
Finally, the clustering quality for various noise levels is shown below in terms of Adjusted Mutual Information. A value of one corresponds to a perfect clustering.
The red box plots correspond to considering all 16 RQA measures. The blue box plots correspond to leaving the maximum line length measures out. The box plots summarize the variation introduced both by the approximative nature of the clustering algorithms and selecting random sample of 600 RPs. Before applying the clustering algorithms, I projected the data down to the first k principal components, where k was chosen such that at least 95% of the variance is explained. I also compared this to directly applying the clustering algorithm, but the results where not as good.
If found that K-Means performed more predictably and approximately as well as the Gaussian Mixture Models. It is also computationally much less demanding. Up to around 80% noise, clustering quality is very satisfactory. Detailed evaluation results can be found in my Master’s Thesis.
Problems and Future Work
Estimating Recurrence Parameters
To compute the Recurrence Plots from the original time series data, a recurrence threshold must be selected. In addition, a method called Time Delay Embedding, which is often used to reconstruct multivariate time series from a univariate time series, requires additional parameters. In practice and for single time series, these are often tuned by an experienced researcher. Several heuristics and methods for chosing these parameters exist, but none works in all circumstances.
In real-world data with observational noise, these heuristics work even worse. Specifically, False Nearest Neighbors and Mutual Information stopped working at very small amounts of noise. Still, an automated way to chose them is necessary to compute the Recurrence Plots for large sets of time series. Applying multiple heuristics and finding a consensus might be a simple, interesting alternative.
RQA Measures Depend on the Sampling
Working with the same time series at higher or lower resolutions may result in strongly different RQA measures (as for instance diagonal lines become longer and shorter, when increasing or decreasing sampling time). This might pose a problem in some data sets, as the underlying behavior is the same. When applying for instance Wavelet Packet Decomposition to analyze time series on multiple scales, scale invariant descriptors of the Recurrence Plot might be desirable.
- My Java Implementation of Recurrence Quantification Analysis
- An extensive bibliography of Applications, Foundations and Software for Recurrence Analysis
- The online experiment for assessing similarity of Recurrence Plots
References H. Lange and S. Boese, “Recurrence Quantification and Recurrence Network Analysis of Global Photosynthetic Activity,” in Recurrence Quantification Analysis, no. 12, C. L. Webber Jr and N. Marwan, Eds. Springer International Publishing, 2014, pp. 349–374.  J. Yuan, R. Yuan, and X. Chen, “Network Anomaly Detection based on Multi-scale Dynamic Characteristics of Traffic,” International Journal of Computers Communications & Control, vol. 9, no. 1, pp. 101–112, Jan. 2014.  E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, “Temporal Evolution of Short-Term Urban Traffic Flow: A Nonlinear Dynamics Approach,” Computer-Aided Civil and Infrastructure Engineering, vol. 23, no. 7, pp. 536–548, Oct. 2008.  H. Yang and Y. Chen, “Heterogeneous recurrence monitoring and control of nonlinear stochastic processes,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 24, no. 1, p. 013138, Mar. 2014.  A. Syta and G. Litak, “Vibration Analysis in Cutting Materials,” in Recurrence Quantification Analysis, no. 12, C. L. Webber Jr and N. Marwan, Eds. Springer International Publishing, 2014, pp. 279–290.  X. Xu, J. Zhang, and M. Small, “Superfamily phenomena and motifs of networks induced from time series,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 50, pp. 19601–19605, Dec. 2008.
Appendix 1: What is a Recurrence Plot?
For a brief introduction to Recurrence Plots and Recurrence Quantification Analysis, please refer to the excellent collection of materials at http://www.recurrence-plot.tk/. In a nutshell, Recurrence Plots reveal repetitive behavior in a time series, which in turn tells something about the observed system. This can be applied to various kinds of data, including medical (e.g. EEG, ECG), financial, and climate data (e.g. remote sensing). Figure 1 shows a three-dimensional time series, once with a time axis and once in the space of the three dimensions.
The Recurrence Plot is a 2D plot with time on both axes. It contains a black dot at location (i, j) iff the distance between the time series at time i and at time j is smaller than a threshold ε. The Recurrence Plot for the above time series looks is shown below. The diagonal lines arise because the time series repeatedly traverses the two orbits. The checkerboard structure reflects the fact that it is either in one or the other orbit.
Appendix 2: Systems in the Evaluation Framework
|Periodic motion, Sine wave||OSC|
|Rössler System, Standard Parameters
a=0.2, b=0.2, c=5.7
|Rössler System, Funnel Regime
a=0.2925, b=0.1, c=8.5
|Lorenz System, Standard Parameters
sigma=10, beta=2.66, rho=28, dt=0.025
|Lorenz System, Noisy Oscillation
sigma=10, beta=2.66, rho=198, dt=0.025
|Autoregressive process, negative correlation
x(t) = -0.95 x(t-1)+N(0,0.1)
|Autoregressive process, positive correlation
x(t) = 0.95 x(t-1)+N(0,0.1)
|Standard Normal Noise||N|