Ransomware production version #1176

Open
wants to merge 2 commits into
base: branch-23.11
3 changes: 3 additions & 0 deletions .idea/.gitignore
17 changes: 17 additions & 0 deletions .idea/Morpheus.iml
6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml
4 changes: 4 additions & 0 deletions .idea/misc.xml
8 changes: 8 additions & 0 deletions .idea/modules.xml
6 changes: 6 additions & 0 deletions .idea/vcs.xml

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions ci/scripts/gitutils.py
@@ -404,7 +404,7 @@ def get_merge_target():

def determine_merge_commit(current_branch="HEAD"):
"""
When running outside of CI, this will estimate the target merge commit hash of `current_branch` by finding a common
When running outside of CI, this will estimate the target merge commit hash of `current_branch` by finding a common2
Review comment (Contributor): Find/replace error

ancester with the remote branch 'branch-{major}.{minor}' where {major} and {minor} are determined from the repo
version.

@@ -416,7 +416,7 @@ def determine_merge_commit(current_branch="HEAD"):
Returns
-------
str
The common commit hash ID
The common2 commit hash ID
Review comment (Contributor): Find/replace error

"""

remote_branch = get_merge_target()
14 changes: 7 additions & 7 deletions docs/source/cloud_deployment_guide.md
@@ -177,7 +177,7 @@ pod/sdk-cli-helper 1/1 Running 0 41s
Connect to the **sdk-cli-helper** container and copy the models to `/common`, which is mapped to `/opt/morpheus/common` on the host and where MLflow will have access to model files.

```bash
kubectl -n $NAMESPACE exec sdk-cli-helper -- cp -RL /workspace/models /common
kubectl -n $NAMESPACE exec sdk-cli-helper -- cp -RL /workspace/models /common2
```

### Install Morpheus MLflow
@@ -263,7 +263,7 @@ Delete deployed models from Morpheus AI Engine:
Now that we've figured out how to deploy models let's move on to the next step. Now it's time to deploy the relevant models, which have already been copied to `/opt/morpheus/common/models` which are bound to `/common/models` within the MLflow pod.

```bash
(mlflow) root@mlflow-6d98:/mlflow# ls -lrt /common/models
(mlflow) root@mlflow-6d98:/mlflow# ls -lrt /common2/models
```

Output:
@@ -292,7 +292,7 @@ Publish and deploy sid-minibert-onnx model:
```bash
(mlflow) root@mlflow-6d98:/mlflow# python publish_model_to_mlflow.py \
--model_name sid-minibert-onnx \
--model_directory /common/models/triton-model-repo/sid-minibert-onnx \
--model_directory /common2/models/triton-model-repo/sid-minibert-onnx \
--flavor triton
```

@@ -309,7 +309,7 @@ Publish and deploy phishing-bert-onnx model:
```bash
(mlflow) root@mlflow-6d98:/mlflow# python publish_model_to_mlflow.py \
--model_name phishing-bert-onnx \
--model_directory /common/models/triton-model-repo/phishing-bert-onnx \
--model_directory /common2/models/triton-model-repo/phishing-bert-onnx \
--flavor triton
```
```bash
@@ -325,7 +325,7 @@ Publish and deploy abp-nvsmi-xgb model:
```bash
(mlflow) root@mlflow-6d98:/mlflow# python publish_model_to_mlflow.py \
--model_name abp-nvsmi-xgb \
--model_directory /common/models/triton-model-repo/abp-nvsmi-xgb \
--model_directory /common2/models/triton-model-repo/abp-nvsmi-xgb \
--flavor triton
```

@@ -400,7 +400,7 @@ helm delete -n $NAMESPACE <YOUR_RELEASE_NAME>
To publish messages to a Kafka topic, we need to copy datasets to locations where they can be accessed from the host.

```bash
kubectl -n $NAMESPACE exec sdk-cli-helper -- cp -R /workspace/examples/data /common
kubectl -n $NAMESPACE exec sdk-cli-helper -- cp -R /workspace/examples/data /common2
```

Refer to the [Morpheus CLI Overview](https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/docs/source/basics/overview.rst) and [Building a Pipeline](https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/docs/source/basics/building_a_pipeline.rst) documentation for more information regarding the commands.
@@ -539,7 +539,7 @@ Make sure you create input and output Kafka topics before you start the pipeline
kubectl -n $NAMESPACE exec -it deploy/broker -c broker -- kafka-console-producer.sh \
--broker-list broker:9092 \
--topic <YOUR_INPUT_KAFKA_TOPIC> < \
<YOUR_INPUT_DATA_FILE_PATH_EXAMPLE: /opt/morpheus/common/data/email.jsonlines>
<YOUR_INPUT_DATA_FILE_PATH_EXAMPLE: /opt/morpheus/common2/data/email.jsonlines>
```

> **Note**: This should be used for development purposes only via this developer kit. Loading from the file into Kafka should not be used in production deployments of Morpheus.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -17,7 +17,7 @@
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# This file does only contain a selection of the most common2 options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config

@@ -85,7 +85,7 @@ class DFPFileToDataFrameStage(PreallocatorMixin, SinglePortStage):
Input schema for the DataFrame.
filter_null : bool, optional
Whether to filter null values from the DataFrame.
file_type : `morpheus.common.FileTypes`, optional
file_type : `morpheus.common2.FileTypes`, optional
File type of the input files. If `FileTypes.Auto`, the file type will be inferred from the file extension.
parser_kwargs : dict, optional
Keyword arguments to pass to the DataFrame parser.
33 changes: 14 additions & 19 deletions examples/ransomware_detection/README.md
@@ -43,7 +43,7 @@ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/model
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
--load-model ransomw-model-short-rf
--load-model ransomware_model_tl
Review comment (Contributor): All of our models use hyphens instead of underscores. Can you rename the model to `ransomware-model-tl`?

```

##### Verify Model Deployment
@@ -53,7 +53,7 @@ Once Triton server finishes starting up, it will display the status of all loade
+----------------------------+---------+--------+
| Model | Version | Status |
+----------------------------+---------+--------+
| ransomw-model-short-rf | 1 | READY |
| ransomware_model_tl | 1 | READY |
+----------------------------+---------+--------+
```

@@ -72,10 +72,12 @@ Run the following from the `examples/ransomware_detection` directory to start th
```bash
python run.py --server_url=localhost:8001 \
--sliding_window=3 \
--model_name=ransomw-model-short-rf \
--model_name=ransomware_model_tl \
Review comment (Contributor): Could we include some high-level information about the model, including details about the dataset used for training? It would also help to explain how to generate the training dataset, or to specify the data required to run the inference pipeline.

It would be valuable to create a notebook that demonstrates how to train the model on a sample dataset and run it through the pipeline to showcase its ransomware detection capabilities.

An explanation of the output structure generated by the pipeline would also make the documentation much easier to follow.


--conf_file=./config/ransomware_detection.yaml \
--input_glob=${MORPHEUS_ROOT}/examples/data/appshield/*/snapshot-*/*.json \
--output_file=./ransomware_detection_output.jsonlines
--input_topic=ransomware_input \
--output_topic=ransomware_output \
--bootstrap_servers broker:9092 \
--group_id ransomware_group
```

Input features for a short model can be taken from every three snapshots sequence, such as (1, 2, 3), or (2, 3, 4). The sliding window represents the number of subsequent snapshots that need to be taken into consideration when generating the input for a model. Sliding window for the medium model is `5` and for the long model it is `10`.
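
The windowing described above can be illustrated with a short sketch. It is not taken from the example's pipeline code: the function name `sliding_windows` and the use of plain integers as snapshot IDs are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def sliding_windows(snapshot_ids: Sequence[int], window: int) -> List[Tuple[int, ...]]:
    """Return every run of `window` consecutive snapshots, in order.

    Hypothetical helper: the real pipeline may group snapshots differently.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    # One window starts at each index that still leaves `window` items to take.
    return [tuple(snapshot_ids[i:i + window]) for i in range(len(snapshot_ids) - window + 1)]


# Short model: every three-snapshot sequence out of snapshots 1..4.
print(sliding_windows([1, 2, 3, 4], window=3))  # [(1, 2, 3), (2, 3, 4)]
```

Under the same assumptions, `window=5` and `window=10` would produce the sequences for the medium and long models; fewer snapshots than the window size yields no input at all.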
@@ -108,20 +110,13 @@ Options:
--server_url TEXT Tritonserver url [required]
--sliding_window INTEGER RANGE Sliding window to be used for model input
request [x>=1]
--input_glob TEXT Input glob pattern to match files to read.
--input_topic TEXT Input Kafka topic for receiving the
Review comment (Contributor): Could you remove the Dask-related options?

data [required]
--output_topic TEXT Output Kafka topic for receiving the
data [required]
--bootstrap_servers TEXT Kafka bootstrap server [required]
For example,
'./input_dir/*/snapshot-*/*.json' would read
all files with the 'json' extension in the
directory 'input_dir'. [required]
--watch_directory BOOLEAN The watch directory option instructs this
stage to not close down once all files have
been read. Instead it will read all files
that match the 'input_glob' pattern, and
then continue to watch the directory for
additional files. Any new files that are
added that match the glob will then be
processed.
--output_file TEXT The path to the file where the inference
output will be saved.
broker:9092
--group_id TEXT Kafka group_id topic [required]
--help Show this message and exit.
```
14 changes: 0 additions & 14 deletions examples/ransomware_detection/common/__init__.py
@@ -1,14 +0,0 @@
# SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Review comment (Contributor), on lines -1 to -14: Why was this copyright removed?

37 changes: 7 additions & 30 deletions examples/ransomware_detection/common/feature_constants.py
@@ -15,38 +15,15 @@

class FeatureConstants():

FILE_EXTN_EXP = '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.CPL'

FULL_MEMORY_ADDRESS = 2147483647

HANDLES_TYPES = [('Directory', 'directory'), ('TpWorkerFactory', 'tpworkerfactory'),
('WaitCompletionPacket', 'waitcompletionpacket'), ('Section', 'section'), ('File', 'file'),
('Mutant', 'mutant'), ('Event', 'event'), ('Semaphore', 'semaphore'), ('Key', 'key'),
('IoCompletion', 'iocompletion'), ('ALPC Port', 'alpc port'), ('Thread', 'thread')]

HANDLES_TYPES_2 = [('IoCompletionReserve', 'iocompletionreserve'), ('Desktop', 'desktop'),
('EtwRegistration', 'etwregistration'), ('WindowStation', 'windowstation')]

PROTECTIONS = {
'PAGE_EXECUTE_READWRITE ': 'page_execute_readwrite',
'PAGE_NOACCESS ': 'page_noaccess',
'PAGE_EXECUTE_WRITECOPY ': 'page_execute_writecopy',
'PAGE_READONLY ': 'page_readonly',
'PAGE_READWRITE ': 'page_readwrite'
'PAGE_READONLY ': 'PAGE_READONLY_RATIO',
'PAGE_EXECUTE_WRITECOPY ': 'PAGE_EXECUTE_WRITECOPY_RATIO',
'PAGE_READWRITE ': 'PAGE_READWRITE_RATIO',
'PAGE_NOACCESS ': 'PAGE_NOACCESS_RATIO',
'PAGE_EXECUTE_READWRITE ': 'PAGE_EXECUTE_READWRITE_RATIO'
}

WAIT_REASON_LIST = ['9', '31', '13']

VAD = 'Vad '

VADS = 'VadS'

PAGE_NOACCESS = 'PAGE_NOACCESS '

PAGE_EXECUTE_READWRITE = 'PAGE_EXECUTE_READWRITE '

PAGE_EXECUTE_WRITECOPY = 'PAGE_EXECUTE_WRITECOPY '

PAGE_READONLY = 'PAGE_READONLY '

PAGE_READWRITE = 'PAGE_READWRITE '
STATE_LIST = [2,4,5]
WAIT_REASON_LIST = [9,13,15,22,31]