Commit

Merge pull request #39 from nishabalamurugan/custom-keyword-search
added custom keyword search  feature
dinpraka authored Sep 26, 2024
2 parents 1c37564 + 9a5e7bf commit 12597cc
Showing 9 changed files with 1,046 additions and 11 deletions.
108 changes: 99 additions & 9 deletions README.md
@@ -23,6 +23,7 @@ Designed and Developed by Comcast Cybersecurity Research and Development Team</p
- [Enterprise Github Secrets Detection](#enterprise-github-secrets-detection)
- [Public Github Secrets Detection](#public-github-secrets-detection)
- [ML Model Training](#ml-model-training)
- [Custom Keyword Scan](#custom-keyword-scan)
- [License](#license)

## Overview
@@ -78,7 +79,7 @@ Designed and Developed by Comcast Cybersecurity Research and Development Team</p
```
pip list --outdated
```

## Search Patterns

- There are two ways to define configurations in xGitGuard
@@ -121,7 +122,6 @@ Designed and Developed by Comcast Cybersecurity Research and Development Team</p

#### Running Enterprise Secret Detection


- Traverse into the `github-enterprise` script folder

```
@@ -233,7 +233,6 @@ python enterprise_key_detections.py -o org_name #Ex: python enterprise_ke
python enterprise_key_detections.py -r org_name/repo_name #Ex: python enterprise_key_detections.py -r test_org/public_docker
```


##### Detections With ML Filter

xGitGuard also has an additional ML filter where users can collect their organization/targeted data and train their model. Having this ML filter helps in reducing the false positives from the detection.
@@ -323,9 +322,9 @@ optional arguments:
#### Running Public Credential Secrets Detection

- Traverse into the `github-public` script folder
```
cd github-public
```
```
cd github-public
```

> **Note:** User needs to remove the sample content from primary_keywords.csv and add primary keywords like targeted domain names to be searched in public GitHub.
@@ -524,8 +523,6 @@ To use ML Feature, ML training is mandatory. It includes data collection, featur

> **Note:** Labelling the collected secret is an important process to improve the ML prediction.


- Traverse into the "ml_training" folder

```
@@ -620,7 +617,6 @@ To use ML Feature, ML training is mandatory. It includes data collection, featur

> **Note:** Labelling the collected secret is an important process to use the ML effectively.

- Traverse into the "models" folder

```
@@ -726,6 +722,100 @@ Traverse into the "ml_training" folder
> - If persisted model **xgitguard\output\public\_\*xgg\*.pickle** is not present in the output folder, then use feature engineered data to create a model and persist it.
> - By default, when feature-engineered data collected in Public mode is not available, the model is created from enterprise-based engineered data.
## Custom Keyword Scan

- Traverse into the `custom-keyword-search` script folder

```
cd custom-keyword-search
```

#### Running Enterprise Keyword Search

#### Enterprise Custom Keyword Search Process

Add the keywords to search for to `config/enterprise_keywords.csv`.

```
# Run with given configs,
python enterprise_keyword_search.py
```
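The keyword file added in this commit contains a single header, `keyword`. As a minimal sketch of how such a file could be read, assuming one keyword per row under that header, the snippet below loads the values with the standard `csv` module. The function name `load_keywords`, the sample keywords, and the choice to drop duplicates are illustrative assumptions, not the scanner's actual implementation.

```python
import csv
import io


def load_keywords(csv_file):
    """Return the non-empty values of the `keyword` column, in file order.

    Hypothetical helper: duplicates are dropped so each keyword is searched once.
    """
    reader = csv.DictReader(csv_file)
    keywords = []
    for row in reader:
        value = (row.get("keyword") or "").strip()
        if value and value not in keywords:
            keywords.append(value)
    return keywords


# In-memory stand-in for config/enterprise_keywords.csv; the keywords are made up.
sample = io.StringIO("keyword\ninternal-api.example.com\n\nproject-falcon\n")
print(load_keywords(sample))
```

With a real file, the same function would be called on `open(path, newline="")` instead of the in-memory sample.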

##### Command to Run Enterprise Scanner for targeted organization

```
# Run for targeted org,
python enterprise_keyword_search.py -o org_name #Ex: python enterprise_keyword_search.py -o test_ccs
```

##### Command to Run Enterprise Scanner for targeted repo

```
# Run for targeted repo,
python enterprise_keyword_search.py -r org_name/repo_name #Ex: python enterprise_keyword_search.py -r test_ccs/ccs_repo_1
```

##### Command-Line Arguments for Enterprise Keyword Scanner

```
Run usage:
enterprise_keyword_search.py [-h] [-e Enterprise Keywords] [-o org_name] [-r repo_name] [-l Logger Level] [-c Console Logging]
optional arguments:
-h, --help show this help message and exit
-e Enterprise Keywords, --enterprise_keywords Enterprise Keywords
Pass the Enterprise Keywords list as a comma-separated string. This is an optional argument. Keywords can also be provided in the `enterprise_keywords.csv` file located in the `config` directory.
-o pass org name, --org Pass the targeted org list as a comma-separated string
-r pass repo name, --repo Pass the targeted repo list as a comma-separated string
-l Logger Level, --log_level Logger Level
Pass the Logging level as CRITICAL - 50, ERROR - 40, WARNING - 30, INFO - 20, DEBUG - 10. Default is 20
-c Console Logging, --console_logging Console Logging
Pass the Console Logging as Yes or No. Default is Yes
```
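To make the "comma-separated string" arguments concrete, here is a minimal `argparse` sketch that mirrors the flags documented above. It is not the code in `enterprise_keyword_search.py`: the helper `split_csv`, the use of `choices` for `-c`, and the trimming of whitespace around items are assumptions made for illustration.

```python
import argparse


def split_csv(value):
    """Turn 'a, b,,c' into ['a', 'b', 'c'] (hypothetical helper)."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    # Flag names and defaults follow the usage text above; everything else is assumed.
    parser = argparse.ArgumentParser(prog="enterprise_keyword_search.py")
    parser.add_argument("-e", "--enterprise_keywords", type=split_csv, default=[])
    parser.add_argument("-o", "--org", type=split_csv, default=[])
    parser.add_argument("-r", "--repo", type=split_csv, default=[])
    parser.add_argument("-l", "--log_level", type=int, default=20)
    parser.add_argument("-c", "--console_logging", choices=["Yes", "No"], default="Yes")
    return parser


args = build_parser().parse_args(["-o", "test_ccs, other_org", "-l", "10"])
print(args.org, args.log_level, args.console_logging)
```

Parsing each list once at the argument boundary keeps the rest of a script working with plain Python lists rather than raw strings.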

#### Running Public Keyword Search

#### Public Custom Keyword Search Process

Add the keywords to search for to `config/public_keywords.csv`.

```
# Run with given configs,
python public_keyword_search.py
```

##### Command to Run Public Scanner for targeted organization

```
# Run for targeted org,
python public_keyword_search.py -o org_name #Ex: python public_keyword_search.py -o test_org
```

##### Command to Run Public Scanner for targeted repo

```
# Run for targeted repo,
python public_keyword_search.py -r org_name/repo_name #Ex: python public_keyword_search.py -r test_org/public_docker
```

##### Command-Line Arguments for Public Keyword Scanner

```
Run usage:
public_keyword_search.py [-h] [-p Public Keywords] [-o org_name] [-r repo_name] [-l Logger Level] [-c Console Logging]
optional arguments:
-h, --help show this help message and exit
-p Public Keywords, --public_keywords Public Keywords
Pass the Public Keywords list as a comma-separated string. This is an optional argument. Keywords can also be provided in the `public_keywords.csv` file located in the `config` directory.
-o pass org name, --org Pass the targeted org list as a comma-separated string
-r pass repo name, --repo Pass the targeted repo list as a comma-separated string
-l Logger Level, --log_level Logger Level
Pass the Logging level as CRITICAL - 50, ERROR - 40, WARNING - 30, INFO - 20, DEBUG - 10. Default is 20
-c Console Logging, --console_logging Console Logging
Pass the Console Logging as Yes or No. Default is Yes
```
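The numeric values accepted by `-l` match the level constants of Python's standard `logging` module. The sketch below prints that mapping and shows the effect of passing 10 instead of the default 20. The logger name `keyword_search_demo` and the helper `make_logger` are hypothetical and not taken from the scanner.

```python
import logging

# The documented numbers are the standard library's own level constants.
for name in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"):
    print(name, getattr(logging, name))


def make_logger(level=20):
    """Return a demo logger set to the given numeric level (hypothetical helper)."""
    logger = logging.getLogger("keyword_search_demo")
    logger.setLevel(level)
    return logger


# With the default (20, INFO), DEBUG messages are filtered out; with 10 they pass.
print(make_logger(20).isEnabledFor(logging.DEBUG))
print(make_logger(10).isEnabledFor(logging.DEBUG))
```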

### Additional Important Notes

- Users can update confidence_values.csv for the secondary_keys, secondary_creds, and extensions values, scoring each from 0 (lowest) to 5 (highest) to indicate how suspicious the associated keyword is.
2 changes: 1 addition & 1 deletion xgitguard/common/github_calls.py
@@ -71,7 +71,7 @@ def run_github_search(self, search_query, extension, org=[], repo=[]):
)
sys.exit(1)

- if not extension and extension == "others":
+ if not extension or extension == "others" or len(extension) == 0:
response = self.__github_api_get_params(
search_query, org_qualifiers, repo_qualifiers
)
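Note on this hunk: the removed condition `not extension and extension == "others"` can never be true, because a value equal to `"others"` is non-empty and therefore truthy. The sketch below copies both conditions into stand-alone functions so they can be compared on a few inputs. The names `old_condition` and `new_condition` and the sample values are illustrative only.

```python
def old_condition(extension):
    # Removed line: requires extension to be falsy AND equal to "others".
    return bool(not extension and extension == "others")


def new_condition(extension):
    # Added line: missing, empty, or "others" all take the no-extension path.
    return bool(not extension or extension == "others" or len(extension) == 0)


for value in (None, "", [], "others", "py"):
    print(repr(value), old_condition(value), new_condition(value))
```

For strings and lists, `len(extension) == 0` is already covered by `not extension`, so that last clause looks redundant but harmless.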
1 change: 1 addition & 0 deletions xgitguard/config/enterprise_keywords.csv
@@ -0,0 +1 @@
keyword
1 change: 1 addition & 0 deletions xgitguard/config/public_keywords.csv
@@ -0,0 +1 @@
keyword
29 changes: 29 additions & 0 deletions xgitguard/config/xgg_configs.yaml
@@ -104,3 +104,32 @@ secrets:
"Day",
"Hour",
]
keywords:
public_data_columns:
[
"Source",
"Second_Key",
"URL",
"Owner",
"Repo_Name",
"Commit_Details",
"Detected_Timestamp",
"Year",
"Month",
"Day",
"Hour",
]
enterprise_data_columns:
[
"Source",
"Second_Key",
"URL",
"Owner",
"Repo_Name",
"Commit_Details",
"Detected_Timestamp",
"Year",
"Month",
"Day",
"Hour",
]
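As a rough sketch of how the column list above could shape an output record, the snippet below writes one row with `csv.DictWriter`, using the configured names as field names. The helper `build_row`, the sample values, and the assumption that `Year`, `Month`, `Day`, and `Hour` are derived from the detection timestamp are all illustrative; the commit does not show how the scanner fills these columns.

```python
import csv
import io
from datetime import datetime, timezone

# Column names copied from the `keywords` section of xgg_configs.yaml.
COLUMNS = [
    "Source", "Second_Key", "URL", "Owner", "Repo_Name", "Commit_Details",
    "Detected_Timestamp", "Year", "Month", "Day", "Hour",
]


def build_row(source, keyword, url, owner, repo_name, commit_details, detected_at):
    """Map one detection onto the configured columns (hypothetical helper)."""
    return {
        "Source": source,
        "Second_Key": keyword,
        "URL": url,
        "Owner": owner,
        "Repo_Name": repo_name,
        "Commit_Details": commit_details,
        "Detected_Timestamp": detected_at.isoformat(),
        # Assumed to be derived from the timestamp for easier grouping.
        "Year": detected_at.year,
        "Month": detected_at.month,
        "Day": detected_at.day,
        "Hour": detected_at.hour,
    }


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(build_row(
    "GitHub", "project-falcon", "https://example.com/file", "test_org",
    "public_docker", "{}", datetime(2024, 9, 26, 14, 0, tzinfo=timezone.utc),
))
print(buffer.getvalue())
```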
Empty file.