How to use the Process Based Analytics

A user guide explaining how the software is used


Once you have deployed our component, either by the suggested method of building a Docker container or by installing the required dependencies manually, you can use Python to run the scripts in the component’s root directory.


Testing the installation

Before you use the component any further, we recommend running a short test to check that everything was set up correctly. For this test, set the following parameters inside the c.json configuration file:

  • “upto”: 2
  • “bagging_size”: 0.001
  • “validationdata_split”: 0.001
  • “testdata_split”: 0.0001
  • “max_learning_steps”: 150

This causes the component to perform a very short training run, so that every sub-component executes quickly.
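If you prefer to apply these test settings programmatically rather than editing the file by hand, a minimal sketch (assuming c.json sits in the component’s root directory and the snippet is run from there) could look like this:

import json

# Load the existing configuration from the component's root directory.
with open("c.json", "r") as f:
    config = json.load(f)

# Shorten the training as recommended for the installation test.
config.update({
    "upto": 2,
    "bagging_size": 0.001,
    "validationdata_split": 0.001,
    "testdata_split": 0.0001,
    "max_learning_steps": 150,
})

# Write the adjusted configuration back.
with open("c.json", "w") as f:
    json.dump(config, f, indent=4)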

Running the component

To run the component, simply execute the advanced_analytics_component script inside your running Docker container or your virtual environment (depending on how you chose to install the component):

$ python3 advanced_analytics_component.py

Note that before you run the component, you should always clear the directories pointed to by the predictive_output_directory and prescriptions_save_path parameters of the c.json configuration file. These folders do not need to exist, but if they do, they must be empty.
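As a small convenience, the sketch below checks both directories before a run. It reads the paths from c.json; this helper is not part of the component itself:

import json
import os

with open("c.json") as f:
    config = json.load(f)

for key in ("predictive_output_directory", "prescriptions_save_path"):
    path = config[key]
    # The directory may be missing entirely, but if it exists it must be empty.
    if os.path.isdir(path) and os.listdir(path):
        raise SystemExit(f"{key} points to a non-empty directory: {path}")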

If you do not want to run the entire component but only parts of it, you can also run the train_inductors.py, unite_ensemble_predictions.py, or create_prescriptions.py scripts individually.
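Each of these scripts is invoked in the same way as the main component script:

$ python3 train_inductors.py
$ python3 unite_ensemble_predictions.py
$ python3 create_prescriptions.py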

Expected output

The expected output of each step is described below. The first script (train_inductors) trains the supervised learning ensemble on the selected dataset. It produces one csv-file for each model inside the ensemble; these csv-files contain the models’ predictions on the test-split of the chosen dataset and are created in the directory pointed to by the predictive_output_directory parameter of the c.json configuration file. The same folder will also contain the saved weights of the trained models, if saving them was enabled in the configuration.

The second script (unite_ensemble_predictions) reads the predictions of the models inside the ensemble and combines them into one comprehensive ensemble prediction for each step of each business process inside the test-split of the dataset. These ensemble predictions are saved in a csv-file in the same directory.
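The exact aggregation logic lives in unite_ensemble_predictions.py; purely as an illustration of the idea, the sketch below averages per-model prediction files. The file pattern and column name are hypothetical and need not match the component’s actual CSV schema:

import glob
import pandas as pd

# Hypothetical file pattern and column name, for illustration only.
prediction_files = glob.glob("predictive_output/*_predictions.csv")
frames = [pd.read_csv(path) for path in prediction_files]

# Average the individual models' predictions into one ensemble prediction.
ensemble = sum(frame["prediction"] for frame in frames) / len(frames)
ensemble.to_csv("predictive_output/ensemble_prediction.csv", index=False)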

The third script (create_prescriptions) uses the ensemble predictions to make prescriptions about whether to step in and possibly save a failing business process at the point at which each ensemble prediction was made. These prescriptions, along with useful information about the training process, are then saved to the directory that the prescriptions_save_path parameter of the c.json configuration file points to.

Explainable Predictive Process Monitoring

The generation of explanations involves the input (i.e., the instance to be explained), the corresponding prediction, and the black-box model. The output of this module is a set of counterfactual rules generated by an interpretable model in the form of a decision tree. The example below shows such a generated explanation for a loan application process prediction.

Given an instance of interest with input features {RequestedAmount = 25000, LoanGoal = Existing loan take over, ApplicationType = New credit, CreditScore = 0}, whose corresponding prediction is Denied, the corresponding counterfactual explanations according to the interpretable model could be: {CreditScore > 323} → Pending and {CreditScore ≤ 323, LoanGoal = Existing loan take over, ApplicationType != New credit} → Pending.
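As a rough illustration of how rules of this form can be read off an interpretable surrogate, the sketch below fits a shallow decision tree to black-box predictions and prints its decision paths. It uses scikit-learn and synthetic placeholder data, and it is not the component’s actual implementation:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder data: one encoded feature (e.g. CreditScore) and the
# black-box model's predicted outcomes (0 = Denied, 1 = Pending).
rng = np.random.default_rng(0)
X = rng.integers(0, 600, size=(200, 1))
y = (X[:, 0] > 323).astype(int)  # illustrative black-box predictions

# Fit a shallow, interpretable surrogate tree to the black-box predictions.
surrogate = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The printed decision paths correspond to rules such as
# {CreditScore > 323} -> Pending.
print(export_text(surrogate, feature_names=["CreditScore"]))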

Configuration options

The configuration of the component is (almost, see below) handled entirely by the c.json configuration file inside the root folder of the source code. Please note that this is only a temporary solution and subject to change in later iterations of the component.

Within the c.json configuration file you will find the following configurable values, listed here with their datatypes. Please keep the remaining values as they are.

  • predictive_output_directory (string): Path to a directory into which the ensemble predictions are to be saved. This directory will also contain the trained models making up the ensemble. If the path does not exist, it will be created. However, the path should not point to a directory with any content in it.
  • prescriptions_save_path (string): Path to a directory into which the reinforcement learning prescriptions are to be saved. If the path does not exist, it will be created. However, the path should not point to a directory with any content in it.
  • verbose (bool): Setting this to true results in additional messages being displayed on the command line.
  • tensorboard (bool): Enables TensorBoard logging for the ensemble training.
  • save_model (bool): Enabling this parameter allows you to save the ensemble models.
  • max_epochs (int): Maximum number of epochs to be used in the training of each model inside the ensemble.
  • upto (int): Number of models to be included in the ensemble.
  • bagging_size (double): Relative portion of the training data to be used for training each model inside the ensemble.
  • traindata_split (double): Relative portion of the dataset to be used for training the ensemble.
  • validationdata_split (double): Relative portion of the dataset to be used for validating the training of the ensemble.
  • testdata_split (double): Relative portion of the dataset to be used for further training the reinforcement learning agent.
  • neurons (int): Hyperparameter for the ensemble training. Corresponds to the number of artificial neurons in each layer of the neural network.
  • layers (int): Hyperparameter for the ensemble training. Corresponds to the number of layers in the neural network.
  • learningrate (double): Hyperparameter for the ensemble training. The learning rate defines the step size at each iteration while moving toward a minimum of the loss function. This value usually ranges between 0 and 1.
  • batch_size (int): Hyperparameter for the ensemble training. Number of data instances to be used together by the stochastic gradient descent optimizer.
  • dropout (double): Hyperparameter for the ensemble training. Corresponds to the percentage of neurons to be excluded during the training of each batch.
  • traindata_shuffle (bool): Whether to shuffle the order of the business processes in the portion of the dataset used for training the ensemble.
  • bagging_putback (bool): Whether the portions used to train the models that make up the ensemble may overlap with each other.
  • max_episode_number (int): The maximum number of business processes used to train the reinforcement learning agent. Setting this to 0 makes the agent train on every available process.
  • max_learning_steps (int): The maximum number of learning steps to be taken by the reinforcement learning agent.
  • n_steps (int): Hyperparameter for the PPO reinforcement learning algorithm. Length of the horizon.
  • nminibatches (int): Hyperparameter for the PPO reinforcement learning algorithm. Number of minibatches to be cut from the horizon.
  • gamma (double): Hyperparameter for the PPO reinforcement learning algorithm. Discount factor.
  • rl_learning_rate (double): Hyperparameter for the PPO reinforcement learning algorithm. Number multiplied with the clipped loss function before it is passed to the optimizer.
  • ent_coef (double): Hyperparameter for the PPO reinforcement learning algorithm. Number multiplied with the entropy portion of the clipped loss function. Higher values lead to more random changes in the policy network.
  • vf_coef (double): Hyperparameter for the PPO reinforcement learning algorithm. Number multiplied with the value function loss in the clipped loss function.
  • cliprange (double): Hyperparameter for the PPO reinforcement learning algorithm. Clip range for the policy loss.
  • save_path (string): Path of a file, generated automatically, in which the policy and value networks’ weights are saved after training the reinforcement learning agent. Leave as the empty string to not save the networks’ weights.
  • load_path (string): Path to a file previously created using the save_path parameter. The weights from this file are used to initialize the reinforcement learning agent’s policy and value networks. Leave as the empty string to initialize the networks randomly.

The selection of the dataset to be trained on is currently not covered by the configuration file. To change the selected dataset, modify the advanced_analytics_component.py script by using one of the already imported data-definition scripts as the argument of the “train” function call. Data-definition scripts exist for each of the four currently supported benchmark datasets.

As with the advanced_analytics_component.py script, you can also just alter the data-definition selection inside the train_inductors.py script, if you only want to train the ensemble and not run the entire component.

