Cumulus 3742 record generation script (#3679)
* little POC

* attempting s3 interface

* more intelligent naming?

* ci specific test target so that we don't do this locally

* expanded boilerplate to check how it works

* new deliberate failure

* using jq

* boilerplate expanded all around

* pull out the @cumulus prefix

* deliberate failure to look at structure

* a bunch of chances to fail to get a wider set of errors

* guaranteed failure

* forced failure in logger

* fix pipe failing and fix random failures

* fix copypasta

* date without slashes to keep from overnesting

* remove deliberate test failures

* better naming for bucket

* works but no tests yet

* trim down and parse more info

* Initial commit

* WIP testing needs new date format

* new date format

* string format updates and tests turned on

* point to the right bucket

* remove unneeded import

* remove unneeded dev dep

* code cleanup

* add test:ci to unit failure archive code

* small cleanup

* switch to scripts to reduce the amount of little bits of code

* fix grep symbol

* pass in aws credentials from system

* remove deliberate failure

* add scripts to tsconfig.eslint.json

* linter fixes

* changelog

* how bout this?

* push to s3 needs to happen outside docker container

* prepare target

* refactor to get aws credentials into image

* remove the aws push from unit-tests

* turn off the ci script in this for the moment

* get aws to work

* how bout this to get units to work

* tsc:listEmittedFiles

* test fixes

* remove and ignore the stream output from ci

* remove mkdir from package.json

* cleanup aws creds leftover

* Fix shell script logic

* check deliberate failure again

* pipe standard error to tee

* coverage numbers

* random failures

* why didn't a real error end up in s3

* creating the target file explicitly

* why no report?

* no typo backslash

* exit more exitly?

* correctly call bash function

* will this solve my problem

* but why don't outputs come out anymore?

* explicit build dir?

* use /tmp directory?

* see if I can turn log tailing back on

* central set of CUMULUS_UNIT_TEST_DATA

* remove deliberate failures

* make sure unit-logs file is removed

* delete file in docker if it was created in docker

* get defaults so local stack still works

* Unit test refactor

* Refactor pipeline

* Revert "Merge branch 'jk/CUMULUS-1/debug/http-test' into CUMULUS-3720-unit-test-failure-archive"

This reverts commit f823f24, reversing
changes made to 004d7f3.

* Revert "Merge branch 'jk/CUMULUS-1/debug/http-test' into CUMULUS-3720-unit-test-failure-archive"

This reverts commit 004d7f3, reversing
changes made to ac0b449.

* WIP breaking trying to switch to ts

* looks like this needs to have specified an environment prefix

* why tsc failure

* fix ts errors in main.ts

* lint fix

* lower case enforcement and ensure bucket exists

* WIP brought resources into the same directory

* hardcoded strings to lowercase

* WIP guessing at gran/file/exec upload

* changed test coverage

* make sure tail is spitting out all of its data before being killed

* adding changelog

* DISABLE_PG_SSL

* WIP granules, files and executions work but collections don't

* puts up many collections/granules/files/executions/granule-executions

* works though not user friendly

* switch to top level parallelization using pmap

* de-parallelize execution-granules upload

* flow down from user arguments

* less chatty debug

* use env vars to set DEPLOYMENT and bucket

* proper bucket

* changelog

* linter error

* executions per granule ratio argument

* pass around models to share pointers between threads

* properly handle granules

* add tryCatches

* temp check what our args are after parsing

* stupid log levels

* remove debug output

* typehints and typing improvements

* tests broken wip

* much less kludgey generator but still doesn't work

* tests added

* more jsdocs

* files better paired to collection configuration

* docstring for files

* env variable configuration

* pg db collections rather than api

* error install

* move provider to pg model and remove DEPLOYMENT and INTERNAL_BUCKET

* add Readme for invocation clarity

* cleanup

* linter fixes

* bugfixes

* remove chatty printout

* knex typing

* missed Knex specification

* rounding out unit tests

* linter error

* don't know where this came in

* add progress bar to avoid integer spam but let user know

* name fix (it's not pretending)

* sampleFileName corrected

* lint fixes

* fix leftover debug variable

* typo fix in readme

* refactor to put loaders in separate file

* tests expanded and improved

* PR feedback on var naming and configurability

* PR feedback on generator readability

* need longer unique IDs for granule and file, being limited by collisions

* typo fix

* pr feedback readme fix

* pull const declaration outside loop as let

* little nit for this to be more realistic

* switch to fake collections and providers

* coverage update and added superficial tests for collection/provider

* get rid of unnecessary json files

* lint fixes

* migrate to inserts

* varied statuses

* cleaned up variance branch

* coverage

* lint fix

* error fix in batch params

* fix typing in insert

* test for params returned from iterableGenerator

* remove swallowErrors arg and just use pmap stopOn

* lint error

* add end to end test for the main function

* simplify for ci debug, serial end to end

---------

Co-authored-by: Jonathan Kovarik <[email protected]>
Co-authored-by: Jonathan Kovarik <[email protected]>
3 people authored Jun 18, 2024
1 parent 77fc472 commit 4c65286
Showing 12 changed files with 1,284 additions and 7 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -105,7 +105,8 @@ operations (e.g. `PREFIX-AsyncOperationEcsLogs`).
to granules table

### Added

- **CUMULUS-3742**
  - Script for dumping data into the Postgres database for testing and replicating issues
- **CUMULUS-3614**
  - `tf-modules/monitoring` module now deploys Glue table for querying dead-letter-archive messages.
- **CUMULUS-3616**
4 changes: 3 additions & 1 deletion packages/db/src/index.ts
@@ -64,7 +64,9 @@ export {
  PostgresFile,
  PostgresFileRecord,
} from './types/file';

export {
  PostgresGranuleExecution,
} from './types/granule-execution';
export {
  translateApiAsyncOperationToPostgresAsyncOperation,
  translatePostgresAsyncOperationToApiAsyncOperation,
44 changes: 40 additions & 4 deletions packages/db/src/models/granules-executions.ts
@@ -15,21 +15,21 @@ export default class GranulesExecutionsPgModel {
  }

  async create(
-    knexTransaction: Knex.Transaction,
+    knexTransaction: Knex | Knex.Transaction,
    item: PostgresGranuleExecution
  ) {
    return await knexTransaction(this.tableName).insert(item);
  }

  async exists(
-    knexTransaction: Knex.Transaction,
+    knexTransaction: Knex | Knex.Transaction,
    item: PostgresGranuleExecution
  ) {
    return isRecordDefined(await knexTransaction(this.tableName).where(item).first());
  }

  async upsert(
-    knexTransaction: Knex.Transaction,
+    knexTransaction: Knex | Knex.Transaction,
    item: PostgresGranuleExecution
  ) {
    return await knexTransaction(this.tableName)
@@ -38,7 +38,25 @@
      .merge()
      .returning('*');
  }

  /**
   * Creates multiple granuleExecutions in Postgres
   *
   * @param {Knex | Knex.Transaction} knexOrTransaction - DB client or transaction
   * @param {PostgresGranuleExecution[]} items - Records to insert into the DB
   * @param {string | Array<string>} returningFields - A string or array of strings
   *   of columns to return. Defaults to '*' (all columns).
   * @returns {Promise<PostgresGranuleExecution[]>} An array of objects holding
   *   the specified column(s) of each inserted record.
   */
  async insert(
    knexOrTransaction: Knex | Knex.Transaction,
    items: PostgresGranuleExecution[],
    returningFields: string | string[] = '*'
  ): Promise<PostgresGranuleExecution[]> {
    return await knexOrTransaction(this.tableName)
      .insert(items)
      .returning(returningFields);
  }

  /**
   * Get execution_cumulus_id column values from the granule_cumulus_id
   *
@@ -98,6 +116,24 @@
    return knexTransaction<PostgresGranuleExecution>(this.tableName)
      .where(query);
  }

  /**
   * Counts granuleExecutions matching the given query parameters
   *
   * @param {Knex | Knex.Transaction} knexOrTransaction - DB client or transaction
   * @param {([string, string, string] | [Partial<PostgresGranuleExecution>])[]} params -
   *   a list of [column, operator, value] triples and/or single-element arrays
   *   holding a partial record, each applied as a where clause
   * @returns the result of the count query
   */
  async count(
    knexOrTransaction: Knex | Knex.Transaction,
    params: ([string, string, string] | [Partial<PostgresGranuleExecution>])[]
  ) {
    const query = knexOrTransaction(this.tableName)
      .where((builder) => {
        params.forEach((param) => {
          if (param.length === 3) {
            builder.where(...param);
          }
          if (param.length === 1) {
            builder.where(param[0]);
          }
        });
      })
      .count();
    return await query;
  }
}

export { GranulesExecutionsPgModel };
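
A minimal usage sketch for the new `insert` and `count` methods (not part of this diff; it assumes a configured database and the `getKnexClient` helper exported by `@cumulus/db`):

```js
const { getKnexClient, GranulesExecutionsPgModel } = require('@cumulus/db');

const demo = async () => {
  const knex = await getKnexClient();
  const model = new GranulesExecutionsPgModel();

  // insert() bulk-inserts join records in a single statement and
  // returns the inserted rows (returningFields defaults to '*')
  await model.insert(knex, [
    { granule_cumulus_id: 1, execution_cumulus_id: 10 },
    { granule_cumulus_id: 1, execution_cumulus_id: 11 },
  ]);

  // count() takes [column, operator, value] triples and/or single-element
  // arrays holding a partial record, each applied as a where clause
  const result = await model.count(knex, [['granule_cumulus_id', '=', '1']]);
  console.log(result); // e.g. [ { count: '2' } ]
};
```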
2 changes: 1 addition & 1 deletion packages/db/src/test-utils.ts
@@ -73,7 +73,7 @@ export const fakeCollectionRecordFactory = (
  params: Partial<PostgresCollection>
): PostgresCollection => ({
  name: cryptoRandomString({ length: 5 }),
-  version: '0.0.0',
+  version: '001',
  sample_file_name: 'file.txt',
  granule_id_extraction_regex: 'fake-regex',
  granule_id_validation_regex: 'fake-regex',
8 changes: 8 additions & 0 deletions scripts/generate_records/.nycrc.json
@@ -0,0 +1,8 @@
{
  "extends": "../../nyc.config.js",
  "all": true,
  "statements": 76,
  "functions": 78,
  "branches": 65,
  "lines": 77
}
22 changes: 22 additions & 0 deletions scripts/generate_records/README.md
@@ -0,0 +1,22 @@
# Generate DB records

This script (`generate_db_records.js`) pushes up large quantities of realistic Cumulus database entries for scaled testing purposes.

## Installation

Install with `npm install` in this directory (the script is also installed as part of Cumulus when installing the whole of cumulus-core).

`generate_db_records.js` is tested to run with both Node v16.19.0 and v20.12.2.

## Configuration

The script can be configured through command line arguments, environment variables, or both; command line arguments take precedence when both are supplied. An example invocation follows the table below.

| Argument | Environment | Default | Description |
| --- | :----: | :----: | --- |
| --collections <br>-c | COLLECTIONS | 1 | Number of collections. The granule count applies <br> to *each* collection, not divided among them |
| --granules_k <br>-g | GRANULES_K | 10 | Number of granules, in thousands |
| --executionsPerGranule <br>-e | EXECUTIONS_PER_GRANULE | 2:2 | Number of executions *x* per batch of granules *g*, <br> in the format 'x:g', i.e. \<executionsPerBatch>:\<granulesPerBatch> |
| --files <br>-f | FILES | 1 | Number of files per granule |
| --concurrency <br>-C | CONCURRENCY | 1 | Number of parallel upload threads; <br> concurrency should usually be >100 |
| --variance <br>-v | VARIANCE | false | Randomize executions and granules per batch, <br> adding up to 6 granules and/or executions to a given batch |
| --swallowErrors <br>-s | SWALLOW_ERRORS | true | Swallow data upload errors and continue |
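
For example, a typical invocation might look like this (values are illustrative; flags and environment variables are interchangeable per the table above):

```sh
# 2 collections with 50k granules each, 2 files per granule,
# 3 executions per batch of 2 granules, 150 parallel threads
node generate_db_records.js --collections 2 --granules_k 50 --files 2 \
  --executionsPerGranule 3:2 --concurrency 150

# the same configuration expressed through environment variables
COLLECTIONS=2 GRANULES_K=50 FILES=2 EXECUTIONS_PER_GRANULE=3:2 CONCURRENCY=150 \
  node generate_db_records.js
```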

235 changes: 235 additions & 0 deletions scripts/generate_records/db_record_loaders.js
@@ -0,0 +1,235 @@
// @ts-check

const {
  CollectionPgModel,
  ProviderPgModel,
  fakeGranuleRecordFactory,
  fakeFileRecordFactory,
  fakeExecutionRecordFactory,
  fakeRuleRecordFactory,
  RulePgModel,
  fakeCollectionRecordFactory,
  fakeProviderRecordFactory,
} = require('@cumulus/db');
const { randomString } = require('@cumulus/common/test-utils');
const range = require('lodash/range');
const { randomInt } = require('crypto');

/**
 * @typedef {import('@cumulus/db').PostgresFile} PostgresFile
 * @typedef {import('@cumulus/db').PostgresGranule} PostgresGranule
 * @typedef {import('@cumulus/db').PostgresCollection} PostgresCollection
 * @typedef {import('@cumulus/db').GranulesExecutionsPgModel} GranulesExecutionsPgModel
 * @typedef {import('@cumulus/db').ExecutionPgModel} ExecutionPgModel
 * @typedef {import('@cumulus/db').GranulePgModel} GranulePgModel
 * @typedef {import('@cumulus/db').FilePgModel} FilePgModel
 * @typedef {import('@cumulus/db').PostgresGranuleExecution} PostgresGranuleExecution
 * @typedef {import('@cumulus/db/dist/types/granule').GranuleStatus} GranuleStatus
 * @typedef {import('knex').Knex} Knex
 * @typedef {{
 *   geModel: GranulesExecutionsPgModel,
 *   executionModel: ExecutionPgModel,
 *   granuleModel: GranulePgModel,
 *   fileModel: FilePgModel
 * }} ModelSet
 * @typedef {{
 *   name: string,
 *   version: string,
 * }} CollectionDetails
 */

/**
 * upload executions corresponding to collection with collectionCumulusId
 *
 * @param {Knex} knex
 * @param {number} collectionCumulusId
 * @param {number} executionCount
 * @param {ExecutionPgModel} model
 * @returns {Promise<Array<number>>} - cumulusId for each successfully uploaded execution
 */
const loadExecutions = async (
  knex,
  collectionCumulusId,
  executionCount,
  model
) => {
  if (executionCount === 0) {
    return [];
  }
  const executions = range(executionCount).map(() => fakeExecutionRecordFactory(
    { collection_cumulus_id: collectionCumulusId }
  ));
  const executionOutputs = await model.insert(knex, executions);

  return executionOutputs.map((executionOutput) => executionOutput.cumulus_id);
};

/**
 * upload granuleExecutions corresponding to each pair
 * within list of granuleCumulusIds and executionCumulusIds
 *
 * @param {Knex} knex
 * @param {Array<number>} granuleCumulusIds
 * @param {Array<number>} executionCumulusIds
 * @param {GranulesExecutionsPgModel} model
 * @returns {Promise<Array<PostgresGranuleExecution>>} - granuleExecutions
 */
const loadGranulesExecutions = async (
  knex,
  granuleCumulusIds,
  executionCumulusIds,
  model
) => {
  if (granuleCumulusIds.length === 0 || executionCumulusIds.length === 0) {
    return [];
  }
  const granulesExecutions = granuleCumulusIds.map((granuleCumulusId) => (
    executionCumulusIds.map((executionCumulusId) => ({
      granule_cumulus_id: granuleCumulusId,
      execution_cumulus_id: executionCumulusId,
    }))
  )).flat();

  return await model.insert(knex, granulesExecutions);
};

/**
 * upload granules corresponding to collection with collectionCumulusId
 *
 * @param {Knex} knex
 * @param {number} collectionCumulusId
 * @param {number} providerCumulusId
 * @param {number} granuleCount
 * @param {GranulePgModel} model
 * @returns {Promise<Array<number>>} - cumulusId for each successfully uploaded granule
 */
const loadGranules = async (
  knex,
  collectionCumulusId,
  providerCumulusId,
  granuleCount,
  model
) => {
  if (granuleCount === 0) {
    return [];
  }
  const granules = range(granuleCount).map(() => /** @type {PostgresGranule} */(
    fakeGranuleRecordFactory({
      granule_id: randomString(7),
      collection_cumulus_id: collectionCumulusId,
      provider_cumulus_id: providerCumulusId,
      status: /** @type {GranuleStatus} */(['completed', 'failed', 'running', 'queued'][randomInt(4)]),
    })
  ));
  const granuleOutputs = await model.insert(knex, granules);

  return granuleOutputs.map((g) => g.cumulus_id);
};

/**
 * upload files corresponding to granule with granuleCumulusId
 *
 * @param {Knex} knex
 * @param {number} granuleCumulusId
 * @param {number} fileCount
 * @param {FilePgModel} model
 * @returns {Promise<Array<number>>} - cumulusId for each successfully uploaded file
 */
const loadFiles = async (
  knex,
  granuleCumulusId,
  fileCount,
  model
) => {
  if (fileCount === 0) {
    return [];
  }
  const files = range(fileCount).map((i) => /** @type {PostgresFile} */(fakeFileRecordFactory({
    bucket: `${i}`,
    granule_cumulus_id: granuleCumulusId,
    key: randomString(8),
  })));
  const uploadedFiles = await model.insert(knex, files);

  return uploadedFiles.map((uploadedFile) => uploadedFile.cumulus_id);
};

/**
 * add provider through ProviderPgModel call
 *
 * @param {Knex} knex
 * @returns {Promise<number>} - the provider's cumulus_id
 */
const loadProvider = async (knex) => {
  const providerJson = fakeProviderRecordFactory({});
  const providerModel = new ProviderPgModel();
  const [{ cumulus_id: providerId }] = await providerModel.upsert(
    knex,
    providerJson
  );
  return providerId;
};

/**
 * add collection through CollectionPgModel call
 *
 * @param {Knex} knex
 * @param {number} files - number of files per granule
 * @param {number | null} collectionNumber
 * @returns {Promise<number>} - the collection's cumulus_id
 */
const loadCollection = async (knex, files, collectionNumber = null) => {
  const collectionJson = fakeCollectionRecordFactory({
    // range(files), not new Array(files): map() skips the holes in an
    // un-filled array, which would leave this files config empty
    files: JSON.stringify(range(files).map((i) => ({
      bucket: `${i}`,
      regex: `^.*${i}$`,
      sampleFileName: `538.${i}`,
    }))),
  });
  if (collectionNumber !== null) {
    collectionJson.name = `DUMMY_${collectionNumber.toString().padStart(3, '0')}`;
  }
  const collectionModel = new CollectionPgModel();
  const [{ cumulus_id: cumulusId }] = await collectionModel.upsert(
    knex,
    collectionJson
  );
  return cumulusId;
};

/**
 * add rule to database
 *
 * @param {Knex} knex
 * @param {number | undefined} collectionCumulusId
 * @param {number | undefined} providerCumulusId
 * @returns {Promise<void>}
 */
const loadRule = async (
  knex,
  collectionCumulusId,
  providerCumulusId
) => {
  const ruleModel = new RulePgModel();
  const rule = fakeRuleRecordFactory({
    collection_cumulus_id: collectionCumulusId,
    provider_cumulus_id: providerCumulusId,
  });
  await ruleModel.upsert(knex, rule);
};

module.exports = {
  loadGranules,
  loadGranulesExecutions,
  loadFiles,
  loadExecutions,
  loadCollection,
  loadProvider,
  loadRule,
};
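
A sketch of how these loaders compose (not part of this commit; the real orchestration lives in the generator script, and this assumes `getKnexClient` and the model classes exported by `@cumulus/db`):

```js
const {
  getKnexClient,
  GranulePgModel,
  ExecutionPgModel,
  GranulesExecutionsPgModel,
  FilePgModel,
} = require('@cumulus/db');
const loaders = require('./db_record_loaders');

(async () => {
  const knex = await getKnexClient();

  // collections and providers come first; granules reference both
  const collectionCumulusId = await loaders.loadCollection(knex, 2, 0);
  const providerCumulusId = await loaders.loadProvider(knex);

  // one batch: 10 granules sharing 5 executions
  const granuleIds = await loaders.loadGranules(
    knex, collectionCumulusId, providerCumulusId, 10, new GranulePgModel()
  );
  const executionIds = await loaders.loadExecutions(
    knex, collectionCumulusId, 5, new ExecutionPgModel()
  );

  // join every granule in the batch to every execution in the batch
  await loaders.loadGranulesExecutions(
    knex, granuleIds, executionIds, new GranulesExecutionsPgModel()
  );

  // two files per granule
  await Promise.all(granuleIds.map(
    (granuleId) => loaders.loadFiles(knex, granuleId, 2, new FilePgModel())
  ));
})();
```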
