In this example, Claudio Conti has 12 sensors on his main water pipeline. Each sensor records the water pressure over time. From time to time, faults in his system cause a dramatic shift in the water pressure. Claudio wants to find the exact points in time at which such shifts (also called ruptures) occur.
import matplotlib.pyplot as plt
import ruptures as rpt
import pandas as pd
import numpy as np
Here we generate a sample 12-dimensional time series to use as the data set. Each of the 12 dimensions corresponds to one of the sensors in Claudio's pipe system. The obfuscated data set contains only 2 of the 12 sensors, which prevents the data scientist from extracting all the knowledge from the obfuscated sample. In total, the data set contains 5 ruptures.
n_samples, dim, sigma = 1000, 12, 7
n_bkps = 5 # number of breakpoints
signal, bkps = rpt.pw_constant(n_samples, dim, n_bkps, noise_std=sigma)
signal_obfuscated = signal[:,0:2]
signal_full = signal
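# Uncomment to (re)write the CSV files that the detection cells below read.
# Regenerating draws new random data, so the detected ruptures will change.
#np.savetxt("data_set_obfuscated.csv", signal_obfuscated, delimiter=",", fmt='%f')
#np.savetxt("data_set_full.csv", signal_full, delimiter=",", fmt='%f')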
Vanessa Martinez is the data scientist who has been connected with Claudio Conti. She receives obfuscated data from 2 of the 12 sensors and uses it to pinpoint the ruptures. With this limited data set, she can find 3 of the 5 ruptures.
# detection
signal_obfuscated = np.genfromtxt('data_set_obfuscated.csv', delimiter=",")
algo_obfuscated = rpt.Pelt(model="rbf").fit(signal_obfuscated)
result_obfuscated = algo_obfuscated.predict(pen=10)
rpt.display(signal_obfuscated, [], result_obfuscated)
plt.show()
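Claudio is interested in points in time, whereas `result_obfuscated` is a list of sample indices whose last entry is the length of the signal. The following is a minimal sketch of how such indices could be mapped to timestamps. It assumes the sensors are sampled at a fixed interval; the helper `bkps_to_timestamps`, the start time, the interval and the example indices are all hypothetical and not part of the original notebook.

```python
from datetime import datetime, timedelta


def bkps_to_timestamps(bkps, start, sample_interval):
    """Map breakpoint indices to timestamps (hypothetical helper).

    Assumes a fixed sampling interval. The last entry of a ruptures
    result is the signal length, not a real rupture, so it is dropped.
    """
    inner = sorted(bkps)[:-1]
    return [start + idx * sample_interval for idx in inner]


# Illustrative values only, not output from the notebook
start_example = datetime(2024, 1, 1, 0, 0, 0)   # assumed time of the first sample
interval_example = timedelta(seconds=60)        # assumed: one sample per minute
example_bkps = [120, 480, 1000]                 # made-up indices, 1000 = signal length

times = bkps_to_timestamps(example_bkps, start_example, interval_example)
for t in times:
    print(t.isoformat())
```

In the notebook, `example_bkps` would be replaced by `result_obfuscated` or `result_full`, and the start time and interval by the real values from Claudio's sensors.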
After assembling the algorithm in the exploration job, the data owner starts the execution job. With all 12 sensors, the same program that Vanessa created finds all 5 ruptures precisely.
signal_full = np.genfromtxt('data_set_full.csv', delimiter=",")
algo_full = rpt.Pelt(model="rbf").fit(signal_full)
result_full = algo_full.predict(pen=10)
rpt.display(signal_full, [], result_full)
plt.show()
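Claims such as "3 of 5" or "all 5" ruptures can be checked numerically rather than read off the plots. Below is a rough sketch that counts how many true breakpoints have a detected breakpoint within a tolerance. The helper `count_recovered`, the `margin` of 10 samples and the example indices are assumptions made for illustration. The matching is deliberately simple: one detected breakpoint may be counted for two true ones if they lie close together.

```python
def count_recovered(true_bkps, detected_bkps, margin=10):
    """Count true breakpoints with a detection within `margin` samples.

    Hypothetical helper. Both lists are expected in the ruptures
    convention, where the final entry is the signal length; that
    entry is dropped before comparing.
    """
    true_inner = sorted(true_bkps)[:-1]
    detected_inner = sorted(detected_bkps)[:-1]
    recovered = 0
    for t in true_inner:
        if any(abs(t - d) <= margin for d in detected_inner):
            recovered += 1
    return recovered, len(true_inner)


# Made-up indices for illustration only, not output from the notebook
true_example = [150, 320, 480, 650, 830, 1000]
detected_example = [152, 478, 655, 1000]

found, total = count_recovered(true_example, detected_example)
print(f"{found} of {total} ruptures recovered")
```

In the notebook, the true breakpoints are the `bkps` returned by `rpt.pw_constant`, so the same check could be run as `count_recovered(bkps, result_obfuscated)` and `count_recovered(bkps, result_full)`, provided the CSV files were written from the same generated signal.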