"Plot the 7 first residuals in all fault modes. The residuals plotted in red are supposed to alarm for the fault according to the fault sensitivity matrix."
"Plot the 7 first residuals in all fault modes. The residuals plotted in red are supposed to alarm for the fault according to the fault sensitivity matrix.\n",
"\n",
"(Fig. 4 in the paper)"
]
]
},
},
{
{
...
@@ -153,7 +155,9 @@
...
@@ -153,7 +155,9 @@
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"Plot the ideal fault isolability matrix corresponding to the fault sensitivity matrix."
"Plot the ideal fault isolability matrix corresponding to the fault sensitivity matrix. \n",
"\n",
"(Fig. 6 in the paper)"
]
]
},
},
{
{
...
@@ -175,7 +179,9 @@
...
@@ -175,7 +179,9 @@
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"Compute consistency based diagnoses and the corresponding confusion matrix based on all 42 thresholded residuals. The confusion matrix should be compared with the ideal fault isolation matrix above."
"Compute consistency based diagnoses and the corresponding confusion matrix based on all 42 thresholded residuals. The confusion matrix should be compared with the ideal fault isolation matrix above.\n",
"\n",
"(Fig. 7 in the paper)"
]
]
},
},
{
{
...
@@ -222,7 +228,9 @@
...
@@ -222,7 +228,9 @@
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"Plot the confusion matrix for the random forest classifier for training data"
"Plot the confusion matrix for the random forest classifier for training data\n",
"\n",
"(Fig. 8 in the paper)"
]
]
},
},
{
{
...
@@ -231,7 +239,8 @@
...
@@ -231,7 +239,8 @@
"metadata": {},
"metadata": {},
"outputs": [],
"outputs": [],
"source": [
"source": [
"C = np.diag([1/sum(thdata['mode']==mi) for mi in range(nf)])@confusion_matrix(thdata['mode'], rf.predict(thdata['res']))*100\n",
"s = np.diag([1/sum(thdata['mode']==mi) for mi in range(nf)])\n",
"Plot the variable importance, sorted, to get a ranking of predictor/residual usefullness in the classifier. Note that this classifier is not meant to be used in the diagnosis system."
"Plot the variable importance, sorted, to get a ranking of predictor/residual usefullness in the classifier. Note that this classifier is not meant to be used in the diagnosis system.\n",
"\n",
"(Fig. 10 in the paper)"
]
]
},
},
{
{
...
@@ -268,13 +279,16 @@
...
@@ -268,13 +279,16 @@
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"Compute performance measures on false-alarm (FA), missed detection (MD), and an aggregated fault isolation (FI) when selecting residuals according to the ranking computed above.\n",
"Compute performance measures on false-alarm (FA), missed detection (MD), aggregated fault isolation (FI) and the probability of maximum isolability performance (FI-max)\n",
"when selecting residuals according to the ranking computed above.\n",
"Plot the three performance measures agains the number of selected residuals."
"Plot the three aggregated performance measures agains the number of selected residuals.\n",
"\n",
"(Fig. 11 in the paper)"
]
]
},
},
{
{
...
@@ -326,7 +352,31 @@
...
@@ -326,7 +352,31 @@
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {},
"metadata": {},
"source": [
"source": [
"Compute and display confusion matrices corresponding to selecting 10, 12, 26, and 27 residuals. The results should be compared to the confusion matrix above where all 42 residuals were used."
"Plot the probability of maximum fault isolation performance for each fault.\n",
"Compute and display confusion matrices corresponding to selecting 10, 12, 26, and 27 residuals. The results should be compared to the confusion matrix above where all 42 residuals were used.\n",
"\n",
"(Fig. 13 in the paper)"
]
]
},
},
{
{
...
...
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
# _Residual Selection for Consistency Based Diagnosis Using Machine Learning Models_
# _Residual Selection for Consistency Based Diagnosis Using Machine Learning Models_
by Erik Frisk <erik.frisk@liu.se> and Mattias Krysander <mattias.krysander@liu.se>
by Erik Frisk <erik.frisk@liu.se> and Mattias Krysander <mattias.krysander@liu.se>
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
Code corresponds to the paper "_Residual Selection for Consistency Based Diagnosis Using Machine Learning Models_" published at IFAC Safeprocess 2018 in Warsaw, Poland.
Code corresponds to the paper "_Residual Selection for Consistency Based Diagnosis Using Machine Learning Models_" published at IFAC Safeprocess 2018 in Warsaw, Poland.
Note that the plots are not identical to the results in the paper, where a Matlab implementation of the machine learning algorithms was used. However, the methodology is the same and the results are similar.
Note that the plots are not identical to the results in the paper, where a Matlab implementation of the machine learning algorithms was used. However, the methodology is the same and the results are similar.
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
## Basic python imports
## Basic python imports
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
import numpy as np
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
from diagutil import BoxOff
from diagutil import BoxOff
import diagutil as du
import diagutil as du
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import confusion_matrix
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
## Load the data
## Load the data
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
The data is loaded into a dictionary with 4 fields
The data is loaded into a dictionary with 4 fields
* modes - an array with names of no-fault and fault modes
* modes - an array with names of no-fault and fault modes
* res - An array with the 42 residuals
* res - An array with the 42 residuals
* mode - a vector indicating which fault is active at each sample
* mode - a vector indicating which fault is active at each sample
* fsm - A fault signature matrix based on model structure
* fsm - A fault signature matrix based on model structure
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
data=du.loadmat('../data/data.mat')['data']
data=du.loadmat('../data/data.mat')['data']
nf=len(data['modes'])
nf=len(data['modes'])
nr=data['res'].shape[1]
nr=data['res'].shape[1]
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
## Preprocess data
## Preprocess data
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
Preprocesses data in two steps
Preprocesses data in two steps
1. Take absolute values of residuals (absdata)
1. Take absolute values of residuals (absdata)
2. Threshold data (thdata)
2. Threshold data (thdata)
The data is normalized so that a threshold at 1 corresponds to probability of false alarm of approximately 1%.
The data is normalized so that a threshold at 1 corresponds to probability of false alarm of approximately 1%.
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
absdata=data.copy()
absdata=data.copy()
absdata['res']=np.abs(absdata['res'])
absdata['res']=np.abs(absdata['res'])
thdata=absdata.copy()
thdata=absdata.copy()
thdata['res']=thdata['res']>=1
thdata['res']=thdata['res']>=1
```
```
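%% Cell type:markdown id: tags:
The loaded residuals are already normalized, so the notebook does not show how the scaling was obtained. As a minimal sketch of one way to get a threshold of 1 at roughly 1% false alarm, each residual can be divided by the 99% quantile of its absolute value in fault-free data. The function name `normalize_residuals`, the synthetic data, and the convention that mode index 0 is the no-fault mode are assumptions made for illustration.
%% Cell type:code id: tags:
```python
import numpy as np

def normalize_residuals(res, mode, p_fa=0.01):
    """Scale residuals so that |r| >= 1 occurs with probability ~p_fa without faults.

    Assumption: mode == 0 marks fault-free samples.
    """
    nf_abs = np.abs(res[mode == 0])
    # Per-residual (1 - p_fa) quantile of |r| in fault-free data
    J = np.quantile(nf_abs, 1 - p_fa, axis=0)
    return res / J

# Synthetic fault-free residuals with very different scales (illustration only)
rng = np.random.default_rng(0)
res = rng.normal(size=(20000, 3)) * np.array([0.5, 2.0, 10.0])
mode = np.zeros(20000, dtype=int)

res_n = normalize_residuals(res, mode)
# Empirical false-alarm rate per residual with the threshold at 1
fa_rate = np.mean(np.abs(res_n) >= 1, axis=0)
print(fa_rate)
```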
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
Plot the 7 first residuals in all fault modes. The residuals plotted in red are supposed to alarm for the fault according to the fault sensitivity matrix.
Plot the 7 first residuals in all fault modes. The residuals plotted in red are supposed to alarm for the fault according to the fault sensitivity matrix.
Plot the ideal fault isolability matrix corresponding to the fault sensitivity matrix.
Plot the ideal fault isolability matrix corresponding to the fault sensitivity matrix.
(Fig. 6 in the paper)
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
im=du.IsolabilityMatrix(data['fsm'])
im=du.IsolabilityMatrix(data['fsm'])
plt.figure(20,clear=True,figsize=(6,6))
plt.figure(20,clear=True,figsize=(6,6))
plt.spy(im[1:,1:],marker='o',color='b')
plt.spy(im[1:,1:],marker='o',color='b')
plt.xticks(np.arange(nf-1),data['modes'][1:])
plt.xticks(np.arange(nf-1),data['modes'][1:])
plt.yticks(np.arange(nf-1),data['modes'][1:])
plt.yticks(np.arange(nf-1),data['modes'][1:])
plt.title('Isolability matrix')
plt.title('Isolability matrix')
plt.gca().xaxis.tick_bottom()
plt.gca().xaxis.tick_bottom()
```
```
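%% Cell type:markdown id: tags:
To make the relation between the fault signature matrix and the isolability matrix concrete, the sketch below derives one from the other on a toy example. It is not the `du.IsolabilityMatrix` implementation, whose internals are not shown here. It assumes rows of `fsm` are residuals and columns are faults, and that entry `(i, j)` equal to 1 means fault `i` cannot be isolated from fault `j`.
%% Cell type:code id: tags:
```python
import numpy as np

def isolability_matrix(fsm):
    """Ideal single-fault isolability matrix from a fault signature matrix.

    im[i, j] == 0 when some residual is sensitive to fault i but not to
    fault j, i.e. fault i can be isolated from fault j.
    """
    fsm = np.asarray(fsm, dtype=bool)
    n_faults = fsm.shape[1]
    im = np.ones((n_faults, n_faults), dtype=int)
    for row in fsm:
        sensitive = np.flatnonzero(row)
        insensitive = np.flatnonzero(~row)
        # This residual separates every sensitive fault from every insensitive one
        im[np.ix_(sensitive, insensitive)] = 0
    return im

# Toy example: 2 residuals, 3 faults
fsm = [[1, 1, 0],
       [0, 1, 1]]
im = isolability_matrix(fsm)
print(im)
```
%% Cell type:markdown id: tags:
In the toy example, every residual that reacts to fault 0 also reacts to fault 1, so fault 0 cannot be isolated from fault 1, while fault 1 can be isolated from both of the others.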
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
Compute consistency based diagnoses and the corresponding confusion matrix based on all 42 thresholded residuals. The confusion matrix should be compared with the ideal fault isolation matrix above.
Compute consistency based diagnoses and the corresponding confusion matrix based on all 42 thresholded residuals. The confusion matrix should be compared with the ideal fault isolation matrix above.
(Fig. 7 in the paper)
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
_,C=du.DiagnosesAndConfusionMatrix(thdata)
_,C=du.DiagnosesAndConfusionMatrix(thdata)
plt.figure(30,clear=True,figsize=(6,6))
plt.figure(30,clear=True,figsize=(6,6))
du.PlotConfusionMatrix(C)
du.PlotConfusionMatrix(C)
plt.xticks(np.arange(nf),data['modes'])
plt.xticks(np.arange(nf),data['modes'])
plt.yticks(np.arange(nf),data['modes'])
plt.yticks(np.arange(nf),data['modes'])
plt.title('Fault Isolation Performance matrix')
plt.title('Fault Isolation Performance matrix')
plt.tight_layout()
plt.tight_layout()
```
```
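%% Cell type:markdown id: tags:
The consistency-based reasoning behind the confusion matrix can be sketched for the single-fault case: a fault is a diagnosis if every residual that has alarmed is sensitive to it. The function below is a hypothetical illustration, not the code in `du.DiagnosesAndConfusionMatrix`, and it assumes the columns of `fsm` are the candidate faults.
%% Cell type:code id: tags:
```python
import numpy as np

def single_fault_diagnoses(fsm, alarms):
    """Return a boolean vector marking faults consistent with the alarms.

    A fault is consistent if all alarming residuals are sensitive to it.
    With no alarms, every fault remains a candidate.
    """
    fsm = np.asarray(fsm, dtype=bool)
    alarms = np.asarray(alarms, dtype=bool)
    # Keep only the rows of alarming residuals and require sensitivity in all of them
    return fsm[alarms].all(axis=0)

fsm = [[1, 1, 0],
       [0, 1, 1]]
d_first = single_fault_diagnoses(fsm, [True, False])  # only residual 0 alarms
d_both = single_fault_diagnoses(fsm, [True, True])    # both residuals alarm
d_none = single_fault_diagnoses(fsm, [False, False])  # no alarms
print(d_first, d_both, d_none)
```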
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
## Test selection using Random Forest Classifiers
## Test selection using Random Forest Classifiers
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
First, build a random forest classifier based on the thresholded data. Here, 300 trees are trained in the tree ensemble.
First, build a random forest classifier based on the thresholded data. Here, 300 trees are trained in the tree ensemble.
%% Cell type:code id: tags:
%% Cell type:code id: tags:
``` python
``` python
rf=RandomForestClassifier(n_estimators=300)
rf=RandomForestClassifier(n_estimators=300)
rf.fit(thdata['res'],thdata['mode'])
rf.fit(thdata['res'],thdata['mode'])
sortIdx=np.argsort(rf.feature_importances_)[::-1]
sortIdx=np.argsort(rf.feature_importances_)[::-1]
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
Plot the confusion matrix for the random forest classifier for training data
Plot the confusion matrix for the random forest classifier for training data
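The notebook scales the confusion matrix by `np.diag([1/sum(thdata['mode']==mi) ...])` so that each row shows percentages of the samples in that mode. A NumPy-only sketch of the same row normalization is given below. The helper name `row_normalized_confusion` and the toy labels are assumptions for illustration.
%% Cell type:code id: tags:
```python
import numpy as np

def row_normalized_confusion(y_true, y_pred, n_classes):
    """Confusion matrix in percent, each row normalized by its class count."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    C = np.zeros((n_classes, n_classes))
    # Count occurrences of each (true, predicted) pair
    np.add.at(C, (y_true, y_pred), 1)
    row_sums = C.sum(axis=1, keepdims=True)
    # Guard against classes without samples
    return 100 * C / np.where(row_sums == 0, 1, row_sums)

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]
Cn = row_normalized_confusion(y_true, y_pred, 3)
print(np.round(Cn, 1))
```
%% Cell type:markdown id: tags: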
Plot the variable importance, sorted, to get a ranking of predictor/residual usefulness in the classifier. Note that this classifier is not meant to be used in the diagnosis system.
Plot the variable importance, sorted, to get a ranking of predictor/residual usefulness in the classifier. Note that this classifier is not meant to be used in the diagnosis system.
Compute performance measures on false-alarm (FA), missed detection (MD), and an aggregated fault isolation (FI) when selecting residuals according to the ranking computed above.
Compute performance measures on false-alarm (FA), missed detection (MD), aggregated fault isolation (FI) and the probability of maximum isolability performance (FI-max)
when selecting residuals according to the ranking computed above.
Plot the probability of maximum fault isolation performance for each fault.
(Fig. 12 in the paper)
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(10,10))
for k in range(nf):
    plt.plot(pmfi[:,k],label=thdata['modes'][k])
plt.legend(loc='upper right')
BoxOff()
```
%% Cell type:markdown id: tags:
Compute and display confusion matrices corresponding to selecting 10, 12, 26, and 27 residuals. The results should be compared to the confusion matrix above where all 42 residuals were used.
Compute and display confusion matrices corresponding to selecting 10, 12, 26, and 27 residuals. The results should be compared to the confusion matrix above where all 42 residuals were used.