SQL ML Services: Predictive Maintenance Modeling Guide

Predicting machine failures before they happen is an important scenario for manufacturing. This template addresses the same problem as the other Predictive Maintenance template, but approaches it differently.

Deploy to Azure (SQL Server)

In this template:

  • There are five data sources, namely telemetry, errors, maintenance, machines, and failures

  • Data ingestion, feature engineering and data preparation are done using SQL code

  • Data visualization and multi-class classification modeling are done via R code running on SQL Server

Implementation prerequisites:

The directory structure for this template is as follows:

Implementation setup overview:


Input data overview:

  • Telemetry.csv: The telemetry time-series data consists of voltage, rotation, pressure and vibration measurements (an illustrative table layout is sketched after this list).

  • Errors.csv: The error logs contain non-breaking errors thrown while the machine is still operational; these do not qualify as failures. The error dates and times are rounded to the closest hour since the telemetry data is collected at an hourly rate.

  • Maint.csv: The scheduled and unscheduled maintenance records, which correspond to both regular inspection of components and failures. A record is generated when a component is replaced during a scheduled inspection or replaced due to a breakdown.

  • Machines.csv: This data set includes the machine model and age (years in service).

  • Failures.csv: These are the records of component replacements due to failures. Each record has a date and time, machine ID and failed component type associated with it.
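
As a rough illustration of the input format, the telemetry table might look like the sketch below. The column names are assumptions based on the descriptions above, not necessarily the schema used by the template's ingestion script.

```sql
-- Illustrative telemetry table only; the actual column names used by the
-- template's ingestion script may differ.
CREATE TABLE telemetry (
    datetime  DATETIME NOT NULL,  -- hourly timestamp of the reading
    machineID INT      NOT NULL,  -- machine identifier
    volt      FLOAT    NOT NULL,  -- voltage measurement
    rotate    FLOAT    NOT NULL,  -- rotation measurement
    pressure  FLOAT    NOT NULL,  -- pressure measurement
    vibration FLOAT    NOT NULL   -- vibration measurement
);
```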

Feature engineering overview:

The first step in predictive maintenance applications is feature engineering, which combines the different data sources to create features that best describe a machine's health condition at a given point in time.

  • Lag Features from Telemetry: Telemetry data almost always comes with time-stamps, which makes it suitable for calculating lag features. In this template, the rolling mean and standard deviation of the telemetry data over the last 3-hour lag window are calculated every 3 hours (see the first sketch after this list).

  • Lag Features from Errors: Similar to telemetry, errors also come with time-stamps. However, unlike telemetry, which has numerical values, errors have categorical values denoting the type of error that occurred at a time-stamp. In this case, aggregation methods such as averaging do not apply. Hence, counting the different categories is a more viable approach: lagging counts of the different error types that occurred in the lag window are calculated (see the second sketch after this list).

  • Days Since Last Replacement from Maintenance: This data set contains component replacement records. A relevant feature from this data set is how long it has been since each component was last replaced (see the third sketch after this list).

  • Machine Features: The machine features are used directly since they hold descriptive information about the type of the machines and their age, which is defined as the years in service.
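
A minimal T-SQL sketch of the rolling telemetry aggregates, assuming the illustrative telemetry columns shown earlier; the template's pdm_feature_engineering.sql is the authoritative implementation:

```sql
-- Rolling mean and standard deviation of a telemetry signal over the last
-- 3 hours, kept only for every 3rd hour (sketch; column names are assumed).
SELECT machineID, datetime, voltmean_3h, voltsd_3h
FROM (
    SELECT
        machineID,
        datetime,
        AVG(volt)   OVER (PARTITION BY machineID ORDER BY datetime
                          ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS voltmean_3h,
        STDEV(volt) OVER (PARTITION BY machineID ORDER BY datetime
                          ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS voltsd_3h
        -- rotate, pressure and vibration are aggregated the same way
    FROM telemetry
) AS t
WHERE DATEPART(hour, datetime) % 3 = 0;  -- keep one record per 3-hour step
```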
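A comparable sketch for the lagging error counts; the error type column (errorID) and the 24-hour window are assumptions:

```sql
-- Count of each error type over an assumed 24-hour lag window, evaluated
-- at every telemetry timestamp (sketch; table and column names are assumed).
SELECT
    t.machineID,
    t.datetime,
    SUM(CASE WHEN e.errorID = 'error1' THEN 1 ELSE 0 END) AS error1count,
    SUM(CASE WHEN e.errorID = 'error2' THEN 1 ELSE 0 END) AS error2count
    -- remaining error types follow the same pattern
FROM telemetry AS t
LEFT JOIN errors AS e
       ON e.machineID = t.machineID
      AND e.datetime  >  DATEADD(hour, -24, t.datetime)
      AND e.datetime  <= t.datetime
GROUP BY t.machineID, t.datetime;
```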
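And a sketch of the days-since-last-replacement feature from the maintenance records; the maint table and its comp column are assumptions:

```sql
-- Days since component 'comp1' was last replaced, evaluated at each
-- telemetry timestamp (sketch; table and column names are assumed).
SELECT
    t.machineID,
    t.datetime,
    DATEDIFF(day, MAX(m.datetime), t.datetime) AS comp1_days_since_replacement
FROM telemetry AS t
JOIN maint AS m
  ON m.machineID = t.machineID
 AND m.comp      = 'comp1'
 AND m.datetime <= t.datetime
GROUP BY t.machineID, t.datetime;
```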

Label Construction:

The prediction problem for this example scenario is to compute the probability that a machine will fail in the next 24 hours due to the failure of a certain component (component 1, 2, 3 or 4). Records within the 24 hours preceding a component failure are labeled with that component; the rest of the records are labeled "none", indicating there is no failure within the next 24 hours. The labeling rule is sketched below.
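
A minimal sketch of that labeling rule in T-SQL, assuming a features table keyed by machineID and datetime and a failures table with a failure column naming the failed component (both names are assumptions):

```sql
-- Label each feature record with the component that fails within the next
-- 24 hours, or 'none' if no failure follows (sketch; names are assumed).
SELECT
    f.machineID,
    f.datetime,
    COALESCE(fl.failure, 'none') AS failure
FROM features AS f
OUTER APPLY (
    SELECT TOP (1) failure
    FROM failures
    WHERE machineID = f.machineID
      AND datetime >= f.datetime
      AND datetime <  DATEADD(hour, 24, f.datetime)
    ORDER BY datetime
) AS fl;
```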

Modeling: Training, Validation and Evaluation

For predictive maintenance problems, a time-dependent splitting strategy is used to estimate performance: the model is validated and tested on examples that are later in time than the training examples. For a time-dependent split, a point in time is picked; the model is trained on examples up to that point in time and validated on the examples after it, under the assumption that the future data after the splitting point is not known (see the sketch below).
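
A sketch of such a split expressed as date filters over the labeled feature set; the table name and cutoff date are purely illustrative:

```sql
-- Time-dependent split: train on records up to the cutoff, test on records
-- after it (table name and cutoff date are illustrative, not from the template).
DECLARE @split_date DATETIME = '2015-08-01';

SELECT * INTO train_data
FROM labeled_features
WHERE datetime <= @split_date;

SELECT * INTO test_data
FROM labeled_features
WHERE datetime > @split_date;
```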

Implementation process:

  • Create your SQL Server instance, then enable ML Services (a configuration sketch follows these steps).

  • Install an R IDE with the Microsoft R Client. Verify that your R code can access your SQL Server database with your credentials.

  • In SQL Server, create a database where you would like to load the datasets and perform feature engineering. Then run the SQL scripts in this order:

    • pdm_data_ingestion.sql (edit the folder path to where you have saved the CSV files)
    • pdm_feature_engineering.sql
  • In your R IDE, run the following scripts:

    • pdm_visualization.R
    • pdm_modeling.R

    Ensure that you edit both R scripts with your SQL Server credentials.

  • The final dataset is loaded into SQL Server along with the evaluation metrics table (metrics_df).
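
For the first step above, enabling ML Services typically comes down to turning on external script execution on the instance, for example:

```sql
-- Enable external script execution so R code can run in-database
-- (requires SQL Server ML Services / R Services to be installed).
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
-- The SQL Server service may need a restart before the setting takes effect.
```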

Operationalization (next steps):

The SQL and R code can be converted into SQL stored procedures that run inside the SQL Server environment. A PowerShell script can then be used to invoke the modeling steps end to end. A sketch of such a stored procedure follows.
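
As a rough sketch of that step, an R script can be wrapped in a stored procedure via sp_execute_external_script; the procedure name, input query and R body below are placeholders, not the template's code:

```sql
-- Sketch of wrapping the R modeling step in a stored procedure; the
-- procedure name, input query and R script body are placeholders.
CREATE PROCEDURE dbo.train_pdm_model
AS
BEGIN
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # the training code from pdm_modeling.R would go here, operating
            # on the input data frame InputDataSet
            OutputDataSet <- InputDataSet
        ',
        @input_data_1 = N'SELECT * FROM labeled_features';
END;
```

A PowerShell script could then call such procedures in sequence, for example with Invoke-Sqlcmd.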