This repository contains all of the primitives developed by the teams at KUNGFU.AI, Yonder, and New Knowledge for the D3M program.
kf-d3m-primitives requires Python 3.6, and the easiest way to install it is via pip:
pip install kf-d3m-primitives
The latest versions of the D3M datasets can be downloaded by running the following script from inside the cloned repository directory. D3M GitLab credentials are required.
python download_datasets.py
To build a Docker image with kf-d3m-primitives installed on top of the D3M program image, run:
make build
To download the large static volumes that some of the primitives need in order to run and be tested, run:
make volumes
To run the image with the downloaded datasets and static volumes mounted, run:
make run
To test that each primitive's produce method and, where applicable, its set_training_data, fit, get_params, and set_params methods can be called successfully within D3M pipelines, run the following command, which also checks that the predictions each pipeline produces on the test sets can be scored by the D3M runtime:
make test
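For readers new to the D3M primitive interface, the sketch below walks through the method sequence the test command exercises: set_training_data, then fit, then produce, plus a get_params/set_params round trip. It is a minimal, hypothetical stand-in in plain Python: MeanRegressorSketch and check_primitive_lifecycle are illustrative names, not part of d3m or kf-d3m-primitives, and the real D3M base classes add hyperparameters, random seeds and CallResult return wrappers that are omitted here. The round-trip assertion is one plausible way to exercise get_params and set_params, not necessarily what make test does.

```python
from typing import Dict, List, Optional


class MeanRegressorSketch:
    """Hypothetical stand-in for a supervised D3M primitive (not a d3m class)."""

    def __init__(self) -> None:
        self._inputs = []  # type: List[float]
        self._outputs = []  # type: List[float]
        self._mean = None  # type: Optional[float]

    def set_training_data(self, inputs: List[float], outputs: List[float]) -> None:
        self._inputs, self._outputs = inputs, outputs

    def fit(self) -> None:
        # "Training" here just stores the mean target value.
        self._mean = sum(self._outputs) / len(self._outputs)

    def produce(self, inputs: List[float]) -> List[float]:
        if self._mean is None:
            raise RuntimeError("produce() called before fit() or set_params()")
        return [self._mean for _ in inputs]

    def get_params(self) -> Dict[str, Optional[float]]:
        return {"mean": self._mean}

    def set_params(self, params: Dict[str, Optional[float]]) -> None:
        self._mean = params["mean"]


def check_primitive_lifecycle(train_x: List[float], train_y: List[float],
                              test_x: List[float]) -> List[float]:
    """Call every lifecycle method and check that a fresh instance restored
    with set_params() reproduces the fitted instance's predictions."""
    fitted = MeanRegressorSketch()
    fitted.set_training_data(inputs=train_x, outputs=train_y)
    fitted.fit()
    predictions = fitted.produce(test_x)

    restored = MeanRegressorSketch()
    restored.set_params(fitted.get_params())
    assert restored.produce(test_x) == predictions
    return predictions
```

For example, check_primitive_lifecycle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 20.0]) returns [4.0, 4.0], the mean training target repeated once per test input.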
To generate JSON annotations for all primitives, with the directory structure required for D3M submission, run:
make annotations
To generate yml.gz pipeline run documents for all CPU-dependent pipelines, with the directory structure required for D3M submission, run:
make pipelines-cpu
To generate yml.gz pipeline run documents for all GPU-dependent pipelines, with the directory structure required for D3M submission, run:
make pipelines-gpu
- DataCleaningPrimitive: wrapper of the data cleaning primitive based on the punk library.
- DukePrimitive: wrapper of the Duke library in the D3M infrastructure.
- SimonPrimitive: LSTM-FCN neural network trained on 18 different semantic types, which infers the semantic type of each column. Base library here.
- GoatForwardPrimitive: geocodes location names into lat/long pairs by sending requests to a photon geocoding server (based on OpenStreetMap).
- GoatReversePrimitive: reverse-geocodes lat/long pairs into geographic names of varying granularity by sending requests to a photon geocoding server (based on OpenStreetMap).
- StorcPrimitive: wrapper of tslearn's k-means implementations.
- SpectralClustering: wrapper of Spectral Clustering.
- PcaFeaturesPrimitive: wrapper of the punk feature ranker in the D3M infrastructure.
- RfFeaturesPrimitive: wrapper of the punk rrfeatures library in the D3M infrastructure.
- TsnePrimitive: wrapper of TSNE.
- Sent2VecPrimitive: converts sentences into numerical feature representations. Base library here.
- GatorPrimitive: Inception V3 model pretrained on ImageNet and fine-tuned for classification.
- ObjectDetectionRNPrimitive: wrapper of the Keras implementation of Retinanet from this repo. The original Retinanet paper can be found here.
- KaninePrimitive: wrapper of KNeighborsTimeSeriesClassifier.
- LstmFcnPrimitive: wrapper of LSTM Fully Convolutional Networks for Time Series Classification.
- DeepArPrimitive: wrapper of DeepAR - a recurrent, autoregressive, probabilistic time series forecasting method from GluonTS.
- NBEATSPrimitive: wrapper of N-BEATS - Neural basis expansion analysis for interpretable time series forecasting from GluonTS.
- VarPrimitive: wrapper of VAR for multivariate time series and auto_arima for univariate time series.
- shap_explainers: wrapper of Lundberg's Shapley value implementation for tree models. It is currently integrated into d3m.primitives.learner.random_forest.DistilEnsembleForest as the produce_shap_values() method.
- RemoteSensingPretrainedPrimitive: featurizes remote sensing imagery using pre-trained models that were optimized with a self-supervised objective. There are two inference models that correspond to two pretext tasks: Augmented Multiscale Deep InfoMax and Momentum Contrast. The implementation of the inference models comes from this repo.
- MlpClassifierPrimitive: trains a two-layer neural network classifier on featurized remote sensing imagery. Produces heatmap visualizations for predictions using the gradient-based Grad-CAM technique.
- ImageRetrievalPrimitive: retrieves semantically similar images from an index of unannotated images using heuristics. Supports an iterative, human-in-the-loop retrieval pipeline.
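GoatForwardPrimitive and GoatReversePrimitive both talk to a photon server over HTTP. As a rough, hypothetical sketch of what such requests look like, the helpers below build forward (/api?q=...) and reverse (/reverse?lat=...&lon=...) query URLs against photon's HTTP API and read the first result out of a GeoJSON response. The base URL http://localhost:2322 (photon's usual default port), the helper names, and the hand-written sample response are assumptions for illustration; the primitives' actual server address and request parameters may differ. GeoJSON stores coordinates as [lon, lat], so the order has to be swapped to obtain lat/long pairs.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlencode

# Assumption: a self-hosted photon instance; the primitives' real server may differ.
PHOTON_BASE = "http://localhost:2322"


def forward_url(query: str, limit: int = 1, base: str = PHOTON_BASE) -> str:
    """Build a photon forward-geocoding URL for a place name (hypothetical helper)."""
    return base.rstrip("/") + "/api?" + urlencode([("q", query), ("limit", limit)])


def reverse_url(lat: float, lon: float, base: str = PHOTON_BASE) -> str:
    """Build a photon reverse-geocoding URL for a lat/long pair (hypothetical helper)."""
    return base.rstrip("/") + "/reverse?" + urlencode([("lat", lat), ("lon", lon)])


def first_lat_lon(geojson: Dict[str, Any]) -> Tuple[float, float]:
    """Return (lat, lon) of the first feature; GeoJSON coordinates are [lon, lat]."""
    lon, lat = geojson["features"][0]["geometry"]["coordinates"]
    return lat, lon


# Hand-written sample shaped like a GeoJSON FeatureCollection (not real photon output).
sample_response = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-97.74, 30.27]},
            "properties": {"name": "Austin"},
        }
    ],
}
```

With these assumptions, forward_url("Austin, Texas") gives http://localhost:2322/api?q=Austin%2C+Texas&limit=1, and first_lat_lon(sample_response) returns (30.27, -97.74).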
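StorcPrimitive delegates clustering to tslearn. To make the idea concrete, here is a minimal pure-Python k-means (Lloyd's algorithm) over equal-length time series using squared Euclidean distance, one of the metrics tslearn's TimeSeriesKMeans supports alongside DTW-based variants. It is an illustrative sketch, not tslearn's or the primitive's implementation: the function names, the deterministic initialization from the first k series, and the fixed iteration count are assumptions chosen to keep the example short.

```python
from typing import List, Sequence, Tuple

Series = Sequence[float]


def squared_distance(a: Series, b: Series) -> float:
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_series(series: List[Series], k: int,
                  n_iter: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means on equal-length series (illustrative sketch)."""
    # Deterministic initialization: the first k series are the starting centroids.
    centroids = [list(s) for s in series[:k]]
    labels = [0] * len(series)
    for _ in range(n_iter):
        # Assignment step: each series joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(s, centroids[c]))
                  for s in series]
        # Update step: each centroid becomes the pointwise mean of its members.
        for c in range(k):
            members = [s for s, label in zip(series, labels) if label == c]
            if members:
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels, centroids
```

For example, kmeans_series([[0, 0, 0], [5, 5, 5], [0, 1, 0], [5, 6, 5]], k=2) returns labels [0, 1, 0, 1] and centroids [[0.0, 0.5, 0.0], [5.0, 5.5, 5.0]].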
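KaninePrimitive wraps tslearn's KNeighborsTimeSeriesClassifier, which by default compares series with dynamic time warping (DTW). The following is a minimal pure-Python sketch of that idea, a k-nearest-neighbour vote under DTW distance, and not the tslearn code itself: dtw_distance and knn_predict are hypothetical names, and the distance follows the common convention of taking the square root of the accumulated squared differences along the optimal warping path.

```python
import math
from collections import Counter
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW distance with squared pointwise cost and a final square root."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def knn_predict(train_x: List[Sequence[float]], train_y: List[str],
                query: Sequence[float], k: int = 1) -> str:
    """Majority label among the k training series closest to the query under DTW."""
    ranked = sorted(range(len(train_x)), key=lambda i: dtw_distance(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Here knn_predict([[0, 0, 1, 1], [1, 1, 0, 0]], ["up", "down"], [0, 0, 0, 1, 1]) returns "up": DTW lets the query's extra leading 0 align with the rising series at zero cost, even though the two series have different lengths and plain Euclidean distance would not apply.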
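shap_explainers relies on Lundberg's implementation for tree models, which computes Shapley values efficiently for tree ensembles. The quantity being computed can be illustrated with a brute-force sketch: each feature's Shapley value is its marginal contribution averaged over all subsets S of the other features, weighted by |S|!(n-|S|-1)!/n!. The code below is that textbook exact enumeration, not the tree-specific algorithm or the produce_shap_values() implementation; exact_shapley, model_value, and filling in "absent" features from a single baseline vector are assumptions made for illustration, and the enumeration is exponential in the number of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(value: ValueFn, n_features: int) -> List[float]:
    """Exact Shapley values by enumerating every subset of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets of this size.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model_value(model: Callable[[Sequence[float]], float],
                x: Sequence[float], baseline: Sequence[float]) -> ValueFn:
    """Coalition value: model output with features outside S set to the baseline."""
    def value(s: FrozenSet[int]) -> float:
        mixed = [x[j] if j in s else baseline[j] for j in range(len(x))]
        return model(mixed)
    return value
```

For model(v) = v[0] * v[1] at x = [2.0, 3.0] with a zero baseline, exact_shapley returns [3.0, 3.0]: the interaction is split evenly, and the values sum to model(x) - model(baseline) = 6, the efficiency property that Shapley-value explanations satisfy.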
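ImageRetrievalPrimitive ranks unannotated images by semantic similarity and refines the ranking with human feedback. Since its exact heuristics are not specified here, the following is only an illustrative sketch of one common approach: rank image feature vectors by cosine similarity to a query vector, then move the query toward images a user marks relevant and away from those marked irrelevant (the classic Rocchio update). The function names, the Rocchio weights alpha=1.0, beta=0.75 and gamma=0.15, and the toy two-dimensional vectors are assumptions, not the primitive's actual method.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Vector, index: Dict[str, Vector]) -> List[str]:
    """Image ids ordered from most to least similar to the query."""
    return sorted(index, key=lambda key: cosine(query, index[key]), reverse=True)


def _mean(vectors: List[Vector], dim: int) -> List[float]:
    if not vectors:
        return [0.0] * dim
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def rocchio_update(query: Vector, relevant: List[Vector], irrelevant: List[Vector],
                   alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> List[float]:
    """Move the query toward relevant feedback vectors and away from irrelevant ones."""
    dim = len(query)
    pos = _mean(relevant, dim)
    neg = _mean(irrelevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
```

With index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, rank([1.0, 0.1], index) gives ["a", "c", "b"]; after marking "b" relevant and "a" irrelevant, the updated query is roughly [0.85, 0.85], which puts "c" first.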