%% Cell type:markdown id: tags:
# _Residual Selection for Consistency Based Diagnosis Using Machine Learning Models_
by Erik Frisk <erik.frisk@liu.se> and Mattias Krysander <mattias.krysander@liu.se>
%% Cell type:markdown id: tags:
Code corresponds to the paper "_Residual Selection for Consistency Based Diagnosis Using Machine Learning Models_" published at IFAC Safeprocess 2018 in Warsaw, Poland.
Note that the plots are not identical to the results in the paper, where a Matlab implementation of the machine learning algorithms was used. However, the methodology is the same and the results are similar.
%% Cell type:markdown id: tags:
## Basic python imports
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
from diagutil import BoxOff
import diagutil as du
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
```
%% Cell type:markdown id: tags:
## Load the data
%% Cell type:markdown id: tags:
The data is loaded into a dictionary with four fields:
* modes - an array with the names of the no-fault and fault modes
* res - an array with the 42 residuals
* mode - a vector indicating which fault is active at each sample
* fsm - a fault signature matrix based on the model structure
%% Cell type:code id: tags:
``` python
data = du.loadmat('../data/data.mat')['data']
nf = len(data['modes'])
nr = data['res'].shape[1]
```
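%% Cell type:markdown id: tags:
As a quick sanity check of the structure described above, the fields and their shapes can be printed directly (a minimal sketch; the shape comments assume the data file accompanying the paper, with 42 residuals):
%% Cell type:code id: tags:
``` python
# Inspect the loaded dictionary: mode names, residual matrix,
# mode vector, and fault signature matrix.
print('modes:', data['modes'])
print('res:', data['res'].shape)    # (number of samples, 42 residuals)
print('mode:', data['mode'].shape)  # one mode label per sample
print('fsm:', data['fsm'].shape)    # (42 residuals, number of modes)
```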
%% Cell type:markdown id: tags:
# Preprocess data
%% Cell type:markdown id: tags:
Preprocess the data in two steps:
1. Take absolute values of the residuals (absdata)
2. Threshold the data (thdata)
The data is normalized so that a threshold at 1 corresponds to a false alarm probability of approximately 1%.
%% Cell type:code id: tags:
``` python
absdata = data.copy()
absdata['res'] = np.abs(absdata['res'])
thdata = absdata.copy()
thdata['res'] = thdata['res'] >= 1
```
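%% Cell type:markdown id: tags:
The stated normalization (threshold 1 giving roughly 1% false alarm probability) can be checked empirically on the no-fault data. A minimal sketch, assuming mode index 0 is the no-fault class NF:
%% Cell type:code id: tags:
``` python
# Fraction of no-fault samples above the threshold, per residual;
# values should be in the vicinity of 0.01 if the normalization holds.
nf_res = absdata['res'][absdata['mode'] == 0, :]
fa_rate = np.mean(nf_res >= 1, axis=0)
print('False alarm rate per residual: min %.3f, mean %.3f, max %.3f'
      % (fa_rate.min(), fa_rate.mean(), fa_rate.max()))
```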
%% Cell type:markdown id: tags:
Plot the first 7 residuals in all fault modes. Residuals plotted in red are supposed to alarm for the fault according to the fault signature matrix.
(Fig. 4 in the paper)
%% Cell type:code id: tags:
``` python
plt.figure(10, clear=True, figsize=(10, 10))
for ri in range(7):
    for fm in range(nf):
        plt.subplot(7, 8, ri*nf + fm + 1)
        if absdata['fsm'][ri, fm]==0:
            plt.plot(absdata['res'][absdata['mode']==fm, ri], 'b', lw=0.3)
        else:
            plt.plot(absdata['res'][absdata['mode']==fm, ri], 'r', lw=0.3)
        plt.gca().tick_params(labelsize=6)
        plt.ylim(0, 3)
        BoxOff()
        if fm==0:
            plt.ylabel('res-%d' % (ri+1), fontsize=8)
        if ri==0:
            plt.title(absdata['modes'][fm], fontsize=8)
plt.tight_layout(w_pad=-0.75, h_pad=0)
```
%% Cell type:markdown id: tags:
# Basic analysis - performance of all 42 residuals
%% Cell type:code id: tags:
``` python
ts = np.zeros((nr, nf))
for ri in range(nr):
    for fm in range(nf):
        Nfm = np.sum(absdata['mode']==fm)
        Nalarm = np.sum(absdata['res'][absdata['mode']==fm, ri]>=1)
        ts[ri, fm] = Nalarm/Nfm
```
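%% Cell type:markdown id: tags:
The matrix ts holds the empirical alarm rate of each residual in each mode. It is not used further below, but a side-by-side comparison with the structural fault signature matrix gives a quick view of which structurally sensitive residuals actually respond in the data (illustrative sketch; the 0.5 alarm-rate cut-off is an arbitrary choice, not from the paper):
%% Cell type:code id: tags:
``` python
# Structural sensitivity (fsm) vs. empirical detections (alarm rate >= 0.5)
plt.figure(figsize=(6, 8))
plt.subplot(1, 2, 1)
plt.spy(data['fsm'], marker='o', markersize=3)
plt.title('Structural sensitivity')
plt.subplot(1, 2, 2)
plt.spy(ts >= 0.5, marker='o', markersize=3)
plt.title('Alarm rate >= 0.5')
plt.tight_layout()
```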
%% Cell type:markdown id: tags:
Plot the ideal fault isolability matrix corresponding to the fault signature matrix.
(Fig. 6 in the paper)
%% Cell type:code id: tags:
``` python
im = du.IsolabilityMatrix(data['fsm'])
plt.figure(20, clear=True, figsize=(6, 6))
plt.spy(im[1:, 1:], marker='o', color='b')
# nf-1 fault modes remain after excluding NF
plt.xticks(np.arange(nf-1), data['modes'][1:])
plt.yticks(np.arange(nf-1), data['modes'][1:])
plt.title('Isolability matrix')
plt.gca().xaxis.tick_bottom()
```
%% Cell type:markdown id: tags:
Compute consistency based diagnoses and the corresponding confusion matrix based on all 42 thresholded residuals. The confusion matrix should be compared with the ideal fault isolability matrix above.
(Fig. 7 in the paper)
%% Cell type:code id: tags:
``` python
_, C = du.DiagnosesAndConfusionMatrix(thdata)
plt.figure(30, clear=True, figsize=(6, 6))
du.PlotConfusionMatrix(C)
plt.xticks(np.arange(nf), data['modes'])
plt.yticks(np.arange(nf), data['modes'])
plt.title('Fault Isolation Performance matrix')
plt.tight_layout()
```
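%% Cell type:markdown id: tags:
The diagnosis logic is encapsulated in du.DiagnosesAndConfusionMatrix. The underlying consistency test is the standard one: a fault mode is a diagnosis for a sample if every triggered residual is structurally sensitive to that mode. A minimal sketch of that test (an illustration only, not the implementation in diagutil):
%% Cell type:code id: tags:
``` python
def consistent_modes(alarms, fsm):
    """Return a boolean vector: mode fm is consistent with the alarmed
    residuals if no triggered residual has a zero in column fm of the
    fault signature matrix."""
    return np.array([not np.any(alarms & (fsm[:, fm] == 0))
                     for fm in range(fsm.shape[1])])

# Example: diagnosis statement for the first thresholded sample
print(consistent_modes(thdata['res'][0], data['fsm']))
```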
%% Cell type:markdown id: tags:
## Test selection using Random Forest Classifiers
%% Cell type:markdown id: tags:
First, build a random forest classifier based on the thresholded data. Here, the ensemble consists of 300 trees.
%% Cell type:code id: tags:
``` python
rf = RandomForestClassifier(n_estimators=300)
rf.fit(thdata['res'], thdata['mode'])
sortIdx = np.argsort(rf.feature_importances_)[::-1]
```
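%% Cell type:markdown id: tags:
The ranking induced by the variable importances is what drives the residual selection below; it can be listed directly (an illustrative check; residuals are numbered from 1 as in the plots):
%% Cell type:code id: tags:
``` python
# Residual numbers (1-based) in decreasing order of importance
print('Full ranking:', sortIdx + 1)
print('Top 12:', sortIdx[:12] + 1)
```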
%% Cell type:markdown id: tags:
Plot the confusion matrix for the random forest classifier on the training data.
(Fig. 8 in the paper)
%% Cell type:code id: tags:
``` python
s = np.diag([1/sum(thdata['mode']==mi) for mi in range(nf)])
C = s@confusion_matrix(thdata['mode'], rf.predict(thdata['res']))*100
plt.figure(31, clear=True, figsize=(6, 6))
du.PlotConfusionMatrix(C/100)
plt.xticks(np.arange(nf), data['modes'])
plt.yticks(np.arange(nf), data['modes'])
plt.title('Fault Isolation Performance matrix')
plt.tight_layout()
```
%% Cell type:markdown id: tags:
Plot the variable importance, sorted, to get a ranking of predictor/residual usefulness in the classifier. Note that this classifier is not meant to be used in the diagnosis system.
(Fig. 10 in the paper)
%% Cell type:code id: tags:
``` python
plt.figure(40, clear=True, figsize=(9, 6))
plt.plot(rf.feature_importances_[sortIdx])
plt.yticks(fontsize=8)
plt.xticks(range(nr), sortIdx+1, fontsize=8, rotation=90)
plt.xlabel('Predictors')
plt.ylabel('Importance')
plt.title('Predictor importance')
BoxOff()
```
%% Cell type:markdown id: tags:
Compute performance measures for false alarm (FA), missed detection (MD), aggregated fault isolation (FI), and the probability of maximum isolability performance (FI-max) when selecting residuals according to the ranking computed above.
\begin{align*}
p_{\text{FA}} &= 1 - P(NF\in D|NF)\\
p_{\text{MD}} &= \frac{1}{n_{f}}\sum_{f_{i}\in \tilde{\mathcal{F}}} P(NF\in D|f_{i})\\
p_{\text{FI}} &= \frac{1}{n_{f}^{2}}\sum_{f_{i}\in \tilde{\mathcal{F}}} P(NF\notin D|f_{i})\sum_{f_{j}\in \tilde{\mathcal{F}}}|P(f_{j}\in D|f_{i})-I_{ij}|\\
p_{\text{FI-max}} &= P(D=F_{f_i}|f_i)
\end{align*}
where $F_{f_i}$ is the set of faults not structurally isolable from fault $f_i$.
%% Cell type:code id: tags:
``` python
pfa = np.zeros(nr-1)
pmd = np.zeros(nr-1)
pfi = np.zeros(nr-1)
pmfi = np.zeros((nr-1, nf))
# Make sure the isolability matrix for NF corresponds to the diagnosis statement
# computed by DiagnosesAndConfusionMatrix
imk = du.IsolabilityMatrix(data['fsm'])
imk[0] = np.zeros(nf)
imk[0, 0] = 1
for k in range(1, nr):
    dx, C = du.DiagnosesAndConfusionMatrix(thdata, residx=sortIdx[0:k])
    pfa[k-1] = 1-C[0, 0]
    pmd[k-1] = np.mean(C[1:, 0])
    pfi[k-1] = np.mean(np.diag(1-C[1:, 0])@np.abs(C[1:, 1:]-im[1:, 1:]))
    for fi in range(nf):
        pmfi[k-1, fi] = np.mean(np.all(dx[thdata['mode'] == fi] == imk[fi, :], axis=1))
```
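%% Cell type:markdown id: tags:
The residual-set sizes examined below are marked on the curves in the next figure. A purely illustrative heuristic for flagging candidate sizes programmatically could look as follows (the tolerance is an arbitrary assumption, not part of the paper):
%% Cell type:code id: tags:
``` python
# Smallest number of residuals for which each aggregated measure
# drops below an (arbitrary) tolerance.
tol = 1e-3
for name, p in [('FA', pfa), ('MD', pmd), ('FI', pfi)]:
    below = np.where(p <= tol)[0]
    if below.size > 0:
        print('%s <= %g first reached with %d residuals' % (name, tol, below[0] + 1))
    else:
        print('%s never reaches %g' % (name, tol))
```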
%% Cell type:markdown id: tags:
Plot the three aggregated performance measures against the number of selected residuals.
(Fig. 11 in the paper)
%% Cell type:code id: tags:
``` python
num_res = [10, 12, 26, 27]
plt.figure(50, clear=True, figsize=(9, 7))
plt.plot(range(1, nr), pfa, 'r', label='False alarm probability')
plt.plot(range(1, nr), pmd, 'b', label='Missed detection probability')
plt.plot(range(1, nr), pfi, 'y', label='False isolation probability')
for ni in num_res:
    # pfi[k-1] corresponds to k selected residuals
    plt.plot(ni, pfi[ni-1], 'kx')
plt.legend()
plt.xlabel('Number of selected residuals')
plt.ylabel('Probability')
BoxOff()
```
%% Cell type:markdown id: tags:
Plot the probability of maximum fault isolation performance for each fault.
(Fig. 12 in the paper)
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(10, 10))
for k in range(nf):
    plt.plot(pmfi[:, k], label=thdata['modes'][k])
plt.legend(loc='upper right')
BoxOff()
```
%% Cell type:markdown id: tags:
Compute and display confusion matrices corresponding to selecting 10, 12, 26, and 27 residuals. The results should be compared to the confusion matrix above where all 42 residuals were used.
(Fig. 13 in the paper)
%% Cell type:code id: tags:
``` python
plt.figure(80, clear=True, figsize=(12, 12))
for k, ni in enumerate(num_res):
    _, C = du.DiagnosesAndConfusionMatrix(thdata, residx=sortIdx[0:ni])
    plt.subplot(2, 2, k+1)
    du.PlotConfusionMatrix(C)
    plt.title('No of tests: %d' % ni)
    plt.xticks(np.arange(nf), data['modes'])
    plt.yticks(np.arange(nf), data['modes'])
plt.tight_layout()
```
%% Cell type:markdown id: tags:
Compare the performance when using 12 residuals with that of all 42 residuals.
%% Cell type:code id: tags:
``` python
ntests = 12
plt.figure(90, clear=True, figsize=(12, 12))
plt.subplot(1, 2, 1)
_, C = du.DiagnosesAndConfusionMatrix(thdata, residx=sortIdx[0:ntests])
du.PlotConfusionMatrix(C)
plt.title('No of tests: %d' % ntests)
plt.xticks(np.arange(nf), data['modes'])
plt.yticks(np.arange(nf), data['modes'])
plt.subplot(1, 2, 2)
_, C = du.DiagnosesAndConfusionMatrix(thdata)
du.PlotConfusionMatrix(C)
plt.title('No of tests: 42')
plt.xticks(np.arange(nf), data['modes'])
plt.yticks(np.arange(nf), data['modes'])
plt.tight_layout()
```
%% Cell type:code id: tags:
``` python
```