Artificial Intelligence & Machine Learning Driven Prediction of External Corrosion Failure
1.0 Introduction
IDARE data scientists developed an algorithm that uses machine-learning-enabled artificial intelligence to identify the governing factors of external corrosion driven failures from actual historical data. The algorithm can i) process unstructured inspection reports, ii) identify the importance of the features that resulted in external corrosion failures, and iii) predict the external corrosion risk category with up to 96% reliability.
To identify the governing factors for external corrosion, several supervised and unsupervised machine learning algorithms were used. PHMSA public pipeline incident reports were used to train the machine learning algorithms and validate the models. The following machine learning techniques were used:
- Artificial Neural Network with deep learning
- Gradient Boosting Machine (GBM)
- Support Vector Regression (SVR)
- SVR with Radial Basis Function (SVR RBF)
- Decision Tree
- Random Forest
- Linear Regression
- Multilinear Regression
2.0 History Data
- PHMSA Onshore Pipeline Failure History (2010-Present)
- Processed unstructured data
- Applied 7 ML algorithms
- Identified governing factors
- Tested on out-of-sample data
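As a minimal sketch of the out-of-sample testing step, the snippet below holds back a fraction of the records before any model is trained. The function name `holdout_split`, the 20% test fraction, and the toy `incidents` records are assumptions made for illustration; the source does not state how its split was performed.

```python
import random

def holdout_split(records, test_fraction=0.2, seed=42):
    """Shuffle records and hold out a fraction as out-of-sample test data."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # First n_test shuffled records become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical incident records; real rows would come from the PHMSA file.
incidents = [{"id": i} for i in range(10)]
train, test = holdout_split(incidents)
```

Fixing the seed keeps the split reproducible, so the same records stay unseen by every model being compared.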
The data type and structure of the raw data are shown below, illustrating the kind of unstructured data involved.
Using 7 different machine learning algorithms, the importance factors of the following features for external corrosion failure were identified.
3.0 Identification of Governing Factors for Failure
From the raw historical data, which includes about 489 features for different kinds of onshore pipeline failure, the factors that cause failure due to external and internal corrosion are identified. The relative importance of each factor is also estimated, which provides important insight into failures due to corrosion.
The following figure shows the importance factors from the gradient boosting machine, because that algorithm provided better accuracy than the others.
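One model-agnostic way to estimate relative feature importance is permutation importance: shuffle one feature column at a time and measure how much the prediction error grows. The sketch below illustrates that idea only. It is not the built-in importance measure of a gradient boosting machine, and `toy_model`, `rows`, and `targets` are hypothetical stand-ins for a trained model and real data.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def permutation_importance(predict, rows, targets, seed=0):
    """Increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(rmse(targets, [predict(r) for r in permuted]) - baseline)
    return importances

# Hypothetical stand-in for a trained model: it depends on feature 0 only.
def toy_model(row):
    return 2.0 * row[0]

rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, targets)
```

A feature the model ignores scores zero, while a feature the model relies on scores higher the more the error grows once it is scrambled.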
3.0 Prediction of Corrosion Failure
The model was trained on PHMSA accident reports from 2010 to present. All of the machine learning algorithms were trained as predictive models.
Download the hazardous liquid accident data from the PHMSA website using the following link (https://www.phmsa.dot.gov/sites/phmsa.dot.gov/files/data_statistics/pipeline/accident_hazardous_liquid_jan2010_present.zip). Download the files -> change the file extension to csv -> upload the csv file -> Submit. Then compare the result with the cause detail of external corrosion.
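For readers who want to check the external corrosion records themselves, the rows can be filtered with Python's standard `csv` module. This is a rough sketch: the column names `CAUSE_DETAILS` and `REPORT_NUMBER` and the sample rows are assumptions made for illustration and should be checked against the header of the downloaded file.

```python
import csv
import io

def external_corrosion_rows(csv_file, cause_column="CAUSE_DETAILS"):
    """Return rows whose cause-detail column mentions external corrosion."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if "EXTERNAL CORROSION" in (row.get(cause_column) or "").upper()
    ]

# Tiny in-memory sample standing in for the downloaded file (made-up rows).
sample = io.StringIO(
    "REPORT_NUMBER,CAUSE_DETAILS\n"
    "1,EXTERNAL CORROSION\n"
    "2,INTERNAL CORROSION\n"
    "3,External Corrosion\n"
)
matches = external_corrosion_rows(sample)
```

With the real file, pass an open file handle, for example `open(path, newline="")`, in place of `sample`. If the file turns out not to be comma-delimited, `csv.DictReader` accepts a `delimiter` argument.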
The live map above illustrates offshore assets and relevant data sourced from data analytics. Results from real-time data analytics can be fed in at the backend, so the map can be updated dynamically. This is a presentation of live dynamic documents, which are newly introduced in the industry.
4.0 Analysis Results
Based on the above failure data, an assessment has been performed to find the governing factors of corrosion-related failures.
The following chart shows the root mean square error (RMSE) of each model, used to validate the model predictions.
Estimated RMSE value on Predicted Data
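For reference, RMSE is the square root of the mean squared difference between observed and predicted values. A minimal sketch follows; the `actual` and `predicted` numbers are made up for illustration and are not results from the models above.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values for illustration only.
actual = [1.0, 0.0, 1.0, 1.0]
predicted = [0.5, 0.5, 1.0, 0.0]
error = rmse(actual, predicted)
```

A lower RMSE on the out-of-sample data indicates predictions that sit closer to the recorded outcomes, which is how the models can be ranked against each other.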
The following chart illustrates the failure predictions made by the ML models on the test (out-of-sample) data.

Confusion matrix for prediction of external corrosion failure
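A confusion matrix of this kind can be tallied from paired actual and predicted labels. The sketch below assumes binary labels, with 1 meaning an external corrosion failure and 0 meaning any other outcome. The label encoding and the sample values are illustrative assumptions, not the data behind the matrix above.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = failure)."""
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(1, 1)],  # failure correctly predicted
        "fp": counts[(0, 1)],  # failure predicted but did not occur
        "fn": counts[(1, 0)],  # failure missed by the model
        "tn": counts[(0, 0)],  # non-failure correctly predicted
    }

# Made-up labels for illustration only.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
```

For failure prediction the false negatives usually deserve the closest look, since they are the failures the model did not flag.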
The following table shows the risk of the analyzed existing pipeline at different scales of risk.
The following chart shows the failure probability distribution predicted by the different machine learning algorithms.

Predicted failure probability for the raw test data
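Predicted failure probabilities can be turned into risk categories by binning them against thresholds. The sketch below shows one way to do that. The three labels and the 0.33 and 0.66 cut-offs are hypothetical, since the source does not state how its risk scale is defined.

```python
def risk_category(probability, thresholds=((0.33, "Low"), (0.66, "Medium"))):
    """Map a failure probability in [0, 1] to a risk label (assumed cut-offs)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper_bound, label in thresholds:
        if probability < upper_bound:
            return label
    return "High"  # anything at or above the last cut-off

# Made-up probabilities for illustration only.
predicted = [0.05, 0.40, 0.90]
categories = [risk_category(p) for p in predicted]
```

In practice the cut-offs would be set from the operator's own risk matrix rather than fixed in code.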