Set up kg-ontoml to call NEAT to train classifiers and get metrics (AUC, precision, recall) #16

Closed
justaddcoffee opened this issue Mar 25, 2022 · 4 comments · Fixed by #19

@justaddcoffee

Per our conversation today with @GuoJing @caufieldjh in the OntoML meeting, we'd like to set up kg-ontoml to call NEAT and train classifiers (logistic regression, random forest, and MLP).

For each graph we want to run the learning task on, we will write a NEAT.yaml and upload an embedding file; the existing kg-hub-scheduler will then train three classifiers (logistic regression, random forest, and an MLP whose layers will not change across experiments) and emit metrics such as validation AUC, precision, and recall. The NEAT.yaml should have no graph block, an embedding block pointing to an embedding file that already exists (a sketch of that block follows the classifier example below), and a classifier block that looks roughly like this:

classifier:
  edge_method: Average # one of EdgeTransformer.methods: Hadamard, Sum, Average, L1, AbsoluteL1, L2, or alternatively a lambda
  classifiers:  # a list of classifiers to be trained
    - type: neural network
      model:
        outfile: "model_mlp_test_yaml.h5"
        classifier_history_file_name: "mlp_classifier_history.json"
        type: tensorflow.keras.models.Sequential
        layers:
          - type: tensorflow.keras.layers.Input
            parameters:
              shape: 868   # must match embedding_size up above
          - type: tensorflow.keras.layers.Dense
            parameters:
              units: 128
              activation: relu
          - type: tensorflow.keras.layers.Dense
            parameters:
              units: 32
              activation: relu
              # TODO: fix this:
              # activity_regularizer: tensorflow.keras.regularizers.l1_l2(l1=1e-5, l2=1e-4)
          - type: tensorflow.keras.layers.Dropout
            parameters:
              rate: 0.5
          - type: tensorflow.keras.layers.Dense
            parameters:
              units: 16
              activation: relu
          - type: tensorflow.keras.layers.Dense
            parameters:
              units: 1
              activation: sigmoid
      model_compile:
        loss: binary_crossentropy
        optimizer: nadam
        metrics:  # these can be tensorflow objects or a string that tensorflow understands, e.g. 'accuracy'
          - type: tensorflow.keras.metrics.AUC
            parameters:
              curve: PR
              name: auprc
          - type: tensorflow.keras.metrics.AUC
            parameters:
              curve: ROC
              name: auroc
          - type: tensorflow.keras.metrics.Recall
            parameters:
              name: Recall
          - type: tensorflow.keras.metrics.Precision
            parameters:
              name: Precision
          - type: accuracy
      model_fit:
        parameters:
          batch_size: 4096
          epochs: 5  # typically much higher
          callbacks:
            - type: tensorflow.keras.callbacks.EarlyStopping
              parameters:
                monitor: val_loss
                patience: 5
                min_delta: 0.001  # min improvement to be considered progress
            - type: tensorflow.keras.callbacks.ReduceLROnPlateau
    - type: Decision Tree
      model:
        outfile: "model_decision_tree_test_yaml.h5"
        type: sklearn.tree.DecisionTreeClassifier
        parameters:
          max_depth: 30
          random_state: 42
    - type: Random Forest
      model:
        outfile: "model_random_forest_test_yaml.h5"
        type: sklearn.ensemble.RandomForestClassifier
        parameters:
          n_estimators: 500
          max_depth: 30
          n_jobs: 8  # cpu count
          random_state: 42
    - type: Logistic Regression
      model:
        outfile: "model_lr_test_yaml.h5"
        type: sklearn.linear_model.LogisticRegression
        parameters:
          random_state: 42
          max_iter: 1000
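
For reference, here is a minimal sketch of the embedding block described above. The embedding_file_name key is taken from the discussion below; the block name and exact nesting are assumptions (check the NEAT docs), and the filename is hypothetical:

embedding:
  # hypothetical pre-built embedding file; its dimensionality must match
  # the Input shape (868) in the classifier block above
  embedding_file_name: "existing_embedding.tsv"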

Guojing also has a GNN set up that will likely do well on this learning task. For this, he will also produce embeddings, which we can run through the above NEAT pipeline to assess how it does with the HP-MP task. (Guojing will also investigate using the GNN directly on this HP-MP task, without making embeddings, but this won't be a part of the feature described in this ticket.)

@caufieldjh

See also #5

@caufieldjh

For purposes of uploading an embedding, this may mean allowing embedding_file_name to be a URL, much as with the graph, so that NEAT can move it from S3 to gcloud.
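
For instance, something like the following (a hypothetical sketch: the bucket URL is invented, and the block shape is assumed to match the embedding sketch above):

embedding:
  # hypothetical S3 URL for NEAT to fetch and stage on gcloud
  embedding_file_name: "https://example-bucket.s3.amazonaws.com/existing_embedding.tsv"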

@caufieldjh commented Mar 30, 2022

Will probably need to take care of this too:
Knowledge-Graph-Hub/neat-ml#43
(for purposes of parsing whether a URL refers to a compressed file in general)
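
Presumably the case to handle looks like the following (hypothetical URL; deciding from the extension alone whether decompression is needed is what neat-ml#43 covers):

embedding:
  # hypothetical compressed remote file; the .gz extension is the only hint
  # that NEAT must decompress before use
  embedding_file_name: "https://example-bucket.s3.amazonaws.com/existing_embedding.tsv.gz"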

@caufieldjh linked a pull request Mar 30, 2022 that will close this issue
@caufieldjh

Blocked until Knowledge-Graph-Hub/neat-ml#64 is resolved. We can provide a graph and dummy positive/negative graphs, but this appears to lead to other errors.
