Commit 2637465

Merge pull request #1 from mojaloop/feat/performance-maintenance-characterisation-als
feat(mojaloop/#3400): benchmarking performance for ALS - mojaloop/project#3400
- created documentation, and scenarios for characterizing ALS
- documented performance data from ALS characterization runs

feat(mojaloop/#3424): analyse als perf results - mojaloop/project#3424
- Added analysis for scenarios 1-16
- re-factored structure into domains (i.e. fspiop-discovery, etc)
- added status/summary to the top of domain readmes
2 parents 5bb6e1e + 15877d6 commit 2637465

File tree

196 files changed

+837403
-0
lines changed

README.md

+68
@@ -0,0 +1,68 @@
1+
# ML Performance Characterization Repository
2+
3+
Here you will find Performance Characterizations for the Mojaloop Services.
4+
5+
## 1. High-Level Characterization Scenarios
6+
7+
| High-level Scenario | Description | Documentation | Notes |
8+
|---|---|---|---|
9+
| 1. | FSPIOP Discovery | [./fspiop-discovery/README](./fspiop-discovery/README.md) | Done |
10+
| 2. | ~~FSPIOP Agreement~~ | | To Be Done |
11+
| 3. | ~~FSPIOP Transfers~~ | [./fspiop-transfers/README](./fspiop-transfers/README.md) | To Be Done |
12+
| 4. | ~~FSPIOP Discovery + Agreement + Transfers~~ | | To Be Done |
13+
14+
## 2. Capturing End-to-end Metrics
15+
16+
We have two approaches to capture the End-to-end metrics of a transaction.
17+
18+
#### 2.1 Tracestate Headers
19+
20+
The [Tracestate](https://github.com/mojaloop/mojaloop-specification/blob/master/fspiop-api/documents/Tracing%20v1.0.md#table-4--data-model-for-tracestate-list-member-values) header is part of the [Mojaloop Specification](https://github.com/mojaloop/mojaloop-specification/blob/master/fspiop-api/documents/Tracing%20v1.0.md) which conforms to the [W3C](https://github.com/mojaloop/mojaloop-specification/blob/master/fspiop-api/documents/Tracing%20v1.0.md#5-references) Tracing standards.
21+
22+
As such, we are able to take advantage of this header by propagating the following key-value pairs during the End-to-end transaction:
23+
24+
| tracestate-key | tracestate-value | Notes |
25+
|---|---|---|
26+
| tx_end2end_start_ts | [timestamp](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/now) | Generated by the Test-runner (i.e. K6) |
27+
| tx_callback_start_ts | [timestamp](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/now) | Generated by the Payee Participant Simulator (e.g. when receiving the FSPIOP GET /parties Request) |
28+
29+
Example header: `tracestate=tx_end2end_start_ts={{TIMESTAMP}}, tx_callback_start_ts={{TIMESTAMP}}`
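
A minimal sketch of how these two key-value pairs could be written to and read back from a `tracestate` value, assuming the simplified comma-separated `key=value` layout shown in the example header. The function names (`buildTracestate`, `parseTracestate`, `elapsedSince`) are hypothetical and are not taken from the Mojaloop simulators, which may encode these members differently.

```javascript
// Hypothetical helpers; names and value layout are assumptions for illustration.

// Serialize key-value pairs into a comma-separated tracestate value.
function buildTracestate(entries) {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${value}`)
    .join(',');
}

// Parse a tracestate value back into an object, tolerating spaces after commas.
function parseTracestate(headerValue) {
  const members = {};
  for (const member of headerValue.split(',')) {
    const trimmed = member.trim();
    const separator = trimmed.indexOf('=');
    if (separator <= 0) continue; // skip empty or malformed members
    members[trimmed.slice(0, separator)] = trimmed.slice(separator + 1);
  }
  return members;
}

// Milliseconds elapsed since each propagated timestamp, measured at `nowMs`.
function elapsedSince(headerValue, nowMs) {
  const members = parseTracestate(headerValue);
  return {
    end2endMs: nowMs - Number(members.tx_end2end_start_ts),
    callbackMs: nowMs - Number(members.tx_callback_start_ts),
  };
}

// The test-runner stamps the start of the transaction...
const outgoing = buildTracestate({ tx_end2end_start_ts: Date.now() });
// ...and the receiver of the callback computes durations on arrival.
const incoming = `${outgoing},tx_callback_start_ts=${Date.now()}`;
console.log(elapsedSince(incoming, Date.now()));
```

Because both timestamps travel with the request, the receiver can derive durations without any shared state, provided the clocks of the participating hosts are reasonably in sync.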
30+
31+
#### 2.2 WebSocket Subscriptions
32+
33+
The Simulators (i.e. "Callback Handler Service") have been developed to support a simple WebSocket (WS) mechanism that allows the Test Executor (i.e. K6) to subscribe to Callback events.
34+
35+
For example, let's take the FSPIOP GET /parties use-case. Here we have K6 subscribe to a Callback via a WS on the Payer Participant Simulator based on the following properties:
36+
37+
1. The **TraceID**
38+
2. The HTTP **Operation** (i.e. PUT)
39+
3. The Party **ID** (i.e. MSISDN Number)
40+
41+
This ensures that the K6 subscription-notification will be unique for each test.
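
The three properties above can be combined into a single subscription key. The sketch below is an assumption about how such matching could work, using an in-memory registry in place of a real WebSocket connection. `subscriptionKey` and `CallbackRegistry` are hypothetical names, and the key layout is not taken from the Callback Handler Service.

```javascript
// Hypothetical sketch: the key layout and class are illustrative assumptions.

// Combine TraceID, HTTP operation and Party ID into one unique key.
function subscriptionKey(traceId, operation, partyId) {
  return [traceId, operation.toUpperCase(), partyId].join('/');
}

// Tracks one pending subscriber per key and resolves it when a callback arrives.
class CallbackRegistry {
  constructor() {
    this.waiters = new Map();
  }

  // Returns a promise that resolves with the callback payload for `key`.
  subscribe(key) {
    return new Promise((resolve) => {
      this.waiters.set(key, resolve);
    });
  }

  // Delivers `payload` to the subscriber of `key`; returns false if nobody is waiting.
  notify(key, payload) {
    const resolve = this.waiters.get(key);
    if (!resolve) return false;
    this.waiters.delete(key);
    resolve(payload);
    return true;
  }
}

const registry = new CallbackRegistry();
const key = subscriptionKey('trace-abc', 'put', '27713803912');
registry.subscribe(key).then((payload) => console.log('callback received', payload));
registry.notify(key, { status: 200 });
```

Including the TraceID in the key is what keeps concurrent iterations for the same Party ID from receiving each other's notifications.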
42+
43+
We gain two benefits by using this approach:
44+
45+
1. The K6 Runner will only iterate once the current request is completed End-to-end, which means that our execution strategy is closer to a real-world scenario.
46+
2. The K6 Runner will be able to report on the End-to-end duration and operations per second.
47+
48+
The downside of this approach is that it only works well when we have a single Payer Participant Simulator. It's possible that we can support scaling the Payer Participant Simulator by having the K6 Runners subscribe to multiple instances, but that is currently not supported.
49+
50+
## 3. Types of tests
51+
52+
| Test Type | Description |
53+
|---|---|
54+
| **Smoke** | Validates that the scripts work and that our target env/system performs adequately under minimal load. |
55+
| **Average-load** | Assess how the system performs under expected normal conditions. |
56+
| **Stress** | Assess how the system performs at its limits when load exceeds the expected average. |
57+
| **Spike** | Validates the behavior and survival of the system in cases of sudden, short, and massive increases in activity. |
58+
| **Breakpoint** | Gradually increase load to identify the capacity limits of the system. |
59+
60+
[Reference](https://k6.io/docs/test-types/load-test-types/#different-tests-for-different-goals).
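
To make the difference between these test types concrete, the sketch below maps each type to a load shape for k6's `ramping-vus` executor. The helper `stagesFor`, the durations and the multipliers are all hypothetical values chosen for illustration; they are not the profiles used in this repository.

```javascript
// Hypothetical stage profiles; every duration and multiplier is an assumption.
function stagesFor(testType, averageVUs) {
  switch (testType) {
    case 'smoke': // minimal load, short run
      return [{ duration: '1m', target: 1 }];
    case 'average-load': // ramp up, hold at the expected load, ramp down
      return [
        { duration: '5m', target: averageVUs },
        { duration: '30m', target: averageVUs },
        { duration: '5m', target: 0 },
      ];
    case 'stress': // same shape, above the expected average
      return [
        { duration: '5m', target: averageVUs * 2 },
        { duration: '30m', target: averageVUs * 2 },
        { duration: '5m', target: 0 },
      ];
    case 'spike': // sudden, short and massive increase
      return [
        { duration: '1m', target: averageVUs * 10 },
        { duration: '1m', target: 0 },
      ];
    case 'breakpoint': // keep increasing until the system gives way
      return [{ duration: '2h', target: averageVUs * 20 }];
    default:
      throw new Error(`Unknown test type: ${testType}`);
  }
}

const options = {
  scenarios: {
    accountLookup: {
      executor: 'ramping-vus',
      exec: 'accountLookupScenarios',
      startVUs: 1,
      stages: stagesFor('stress', 10),
    },
  },
};
console.log(JSON.stringify(options, null, 2));
```

In an actual k6 script the `options` object would be exported (`export const options = ...`) rather than printed.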
61+
62+
## 4. Tools Used
63+
64+
| Tool | Description |
65+
|---|---|
66+
| **ml-core-test-harness** | The [ml-core-test-harness](https://github.com/mojaloop/ml-core-test-harness) is a lightweight Docker Compose-based test harness used by the Mojaloop community to execute Functional and, now, Performance-Characterization tests. |
67+
| **K6** | [Grafana k6](https://k6.io/docs/) is an open-source load testing tool. |
68+
| **Docker Compose** | [Docker Compose](https://docs.docker.com/compose/) is a tool for defining and running multi-container Docker applications. |

Template-DATE/s0-testId/README.md

+53
@@ -0,0 +1,53 @@
1+
# Scenario 1 - FSPIOP Discovery GET Parties with Sims-only - ALS:v14.2.2, scale:1, k6vu:1
2+
3+
```yaml
4+
testid:
5+
- 1690367402771
6+
params: "&from=XXXXX&to=XXXXXXXX"
7+
ACCOUNT_LOOKUP_SERVICE_VERSION: v14.2.2
8+
```
9+
10+
## Environment
11+
12+
- m6i.2xlarge
13+
- 8 CPU - 3.5 GHz 3rd Generation Intel Xeon Scalable processors (Ice Lake 8375C)
14+
- 32 GB RAM
15+
- https://gist.github.com/mdebarros/6d9ac90f33c96031cbce6b9a3ea8048e
16+
17+
## k6 Test Config
18+
19+
```jsonc
20+
{
21+
"scenarios": { // define scenarios
22+
// warm-up
23+
"accountLookup": { // original scenario for accountLookup
24+
"executor": "ramping-vus",
25+
"exec": "accountLookupScenarios",
26+
"startVUs": 1,
27+
"stages": [
28+
{ "duration": "2m", "target": 1 },
29+
{ "duration": "5m", "target": 1 },
30+
]
31+
},
32+
},
33+
"thresholds": {
34+
"iteration_duration": [ "p(95)<1000" ],
35+
"http_req_failed": [ "rate<0.01" ],
36+
"http_req_duration": [ "p(95)<1000" ]
37+
}
38+
}
39+
```
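
To see how long a run of this configuration lasts, the stage durations can be summed. The sketch below assumes durations are strings built from `h`, `m`, `s` and `ms` components, as in the config above. `durationToMs` and `totalStageMs` are hypothetical helpers, not part of k6.

```javascript
// Hypothetical helpers for summing stage durations; not part of the k6 API.
const UNIT_MS = { ms: 1, s: 1000, m: 60000, h: 3600000 };

// Convert a duration string such as "2m" or "1m30s" to milliseconds.
function durationToMs(text) {
  let total = 0;
  let consumed = '';
  for (const [whole, amount, unit] of text.matchAll(/(\d+(?:\.\d+)?)(ms|s|m|h)/g)) {
    total += Number(amount) * UNIT_MS[unit];
    consumed += whole;
  }
  // Reject input containing anything the pattern did not recognise.
  if (consumed !== text) throw new Error(`Unsupported duration: ${text}`);
  return total;
}

// Total planned duration of a list of stages, in milliseconds.
function totalStageMs(stages) {
  return stages.reduce((sum, stage) => sum + durationToMs(stage.duration), 0);
}

const stages = [
  { duration: '2m', target: 1 },
  { duration: '5m', target: 1 },
];
console.log(`${totalStageMs(stages) / 60000} minutes`);
```

For the two stages above this gives a planned run of 7 minutes, which is useful when choosing the `from` and `to` bounds of the dashboard window.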
40+
41+
## Snapshots
42+
43+
- [Docker](INSERT_LINK_HERE)
44+
- [K6](INSERT_LINK_HERE)
45+
- [Callback Handler Service](INSERT_LINK_HERE)
46+
- [Account Lookup Service](INSERT_LINK_HERE)
47+
- [Nodejs moja_als](INSERT_LINK_HERE)
48+
- [Nodejs cbs](INSERT_LINK_HERE)
49+
- [MySQL](INSERT_LINK_HERE)
50+
51+
## Observations
52+
53+
## Recommendations

Template-DATE/s0-testId/images/.gitignore

Whitespace-only changes.

Template-DATE/s0-testId/logs/.gitignore

Whitespace-only changes.

Template-DATE/s0-testId/snapshots/.gitignore

Whitespace-only changes.
@@ -0,0 +1,70 @@
1+
# Scenario 1: FSPIOP Discovery GET Parties with Sims-only - ALS:v14.2.2, scale:1, k6vu:1
2+
3+
The End-to-end operation from the K6 test-runner included the following HTTP operations for each *iteration*:
4+
5+
1. ADMIN GET /participants request to the Central-Ledger to validate payerFspId. <-- sync response
6+
2. ADMIN GET /participants request to the Central-Ledger to validate payeeFspId. <-- sync response
7+
3. ORACLE GET /participants request to the Oracle to resolve FSPID for payeeId. <-- sync response
8+
4. FSPIOP GET /parties request to the ALS <-- async callback response
9+
5. WS Subscription to the `Callback-Handler` Service for Callback Response notifications
10+
11+
```conf
12+
var-testid=1690367402771
13+
params=&var-testid=1690367402771&from=1690367297867&to=1690368635328
14+
```
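
The `from` and `to` values appear to be epoch timestamps in milliseconds that bound the observation window. This is an assumption based on their magnitude, as the source does not state it. Under that assumption, the sketch below extracts the window from the `params` string; `observationWindow` is a hypothetical helper.

```javascript
// Hypothetical helper; assumes `from` and `to` are epoch milliseconds.
function observationWindow(params) {
  const query = new URLSearchParams(params);
  if (!query.has('from') || !query.has('to')) {
    throw new Error('params must contain both "from" and "to"');
  }
  const from = Number(query.get('from'));
  const to = Number(query.get('to'));
  if (!Number.isFinite(from) || !Number.isFinite(to) || to < from) {
    throw new Error('"from" and "to" must be numeric and ordered');
  }
  return {
    testId: query.get('var-testid'),
    from: new Date(from).toISOString(),
    to: new Date(to).toISOString(),
    durationMs: to - from,
  };
}

const windowInfo = observationWindow(
  '&var-testid=1690367402771&from=1690367297867&to=1690368635328'
);
console.log(windowInfo, `${(windowInfo.durationMs / 60000).toFixed(1)} minutes`);
```

For the values above the window spans 1,337,461 ms, or roughly 22 minutes, which is slightly longer than the 19 minutes of configured stages.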
15+
16+
## Environment
17+
18+
- m6i.2xlarge
19+
- 8 CPU - 3.5 GHz 3rd Generation Intel Xeon Scalable processors (Ice Lake 8375C)
20+
- 32 GB RAM
21+
- https://gist.github.com/mdebarros/6d9ac90f33c96031cbce6b9a3ea8048e
22+
23+
## k6 Test Config
24+
25+
```jsonc
26+
{
27+
"scenarios": { // define scenarios
28+
// warm-up
29+
"accountLookup": { // original scenario for accountLookup
30+
"executor": "ramping-vus",
31+
"exec": "accountLookupScenarios",
32+
"startVUs": 1,
33+
"stages": [
34+
{ "duration": "2m", "target": 1 },
35+
{ "duration": "15m", "target": 1 },
36+
{ "duration": "2m", "target": 0 }
37+
]
38+
},
39+
},
40+
"thresholds": {
41+
"iteration_duration": [ "p(95)<1000" ],
42+
"http_req_failed": [ "rate<0.01" ],
43+
"http_req_duration": [ "p(95)<1000" ]
44+
}
45+
}
46+
```
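
The `p(95)<1000` thresholds require the 95th percentile of the metric to stay below 1000 ms. The sketch below illustrates that check on a list of samples using a nearest-rank percentile. It is an approximation for explanation only and not k6's own percentile calculation; `percentile` and `checkThreshold` are hypothetical helpers, and the `rate<0.01` form is not handled.

```javascript
// Hypothetical helpers; nearest-rank percentile, not k6's internal algorithm.

// Return the p-th percentile (0 < p <= 100) of `samples` using nearest rank.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Evaluate an expression of the form "p(N)<LIMIT" against the samples.
function checkThreshold(samples, expression) {
  const match = /^p\((\d+(?:\.\d+)?)\)<(\d+(?:\.\d+)?)$/.exec(expression.replace(/\s+/g, ''));
  if (!match) throw new Error(`Unsupported threshold: ${expression}`);
  const value = percentile(samples, Number(match[1]));
  return { value, passed: value < Number(match[2]) };
}

// 90 fast iterations and 10 slow ones: the slow tail breaches the threshold.
const durationsMs = Array.from({ length: 100 }, (_, i) => (i < 90 ? 2 : 1500));
console.log(checkThreshold(durationsMs, 'p(95)<1000'));
```

A percentile threshold tolerates a small number of slow outliers, whereas an average-based check would hide them or be skewed by them.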
47+
48+
## Snapshots
49+
50+
- https://snapshots.raintank.io/dashboard/snapshot/ie215NIaFsLwXIzrebhe7Dqy1QCkSFcG
51+
- https://snapshots.raintank.io/dashboard/snapshot/ysBZLoJedpygVROKbERs287JNLjuz8k5
52+
- https://snapshots.raintank.io/dashboard/snapshot/Nab8aB5S31oK3ey1hqb3LGq31lGJNokr
53+
- https://snapshots.raintank.io/dashboard/snapshot/i16hs25XuA5NJ7B2eNHiF1gD0I4XYX3Q
54+
55+
## Observations
56+
57+
- `Callback-Handler` Simulator Service is able to handle `400+ Ops/s` End-to-end, while sustaining an average duration of just over `2ms`. This is shown by the following dashboards/metrics:
58+
- [K6](./images/Official%20k6%20Test%20Result.png)
59+
- `Iteration Rate` (Mean) = `461 Ops/s`
60+
- `Iteration Duration (avg)` (Mean) = `2.22ms`
61+
- [Callback Handler Svc](./images/Supporting%20Services%20-%20Callback%20Hander%20Service.png)
62+
- `op:fspiop_put_parties_end2end - success:true` - observe the `E2E, Request, Response Calculations Processed Per Second` Graph. Note the Mean includes the pre/post run.
63+
- `op:fspiop_put_parties_end2end - success:true` - observe the `E2E, Request, Response Performance Timing Calculations`. Mean is `1.86ms`.
64+
- The `op:fspiop_put_parties_request` and `op:fspiop_put_parties_response` metrics fall in line with the observed `Ops/s`. Most of the duration is spent in the `request`, because the Callback-Handler sends out the **Async** `FSPIOP PUT /parties` callback response.
65+
- [Docker Node Monitoring](./images/docker-prometheus-monitoring.png)
66+
- `Callback-Handler` services show no observable resource constraint in either memory or CPU usage.
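
As a rough sanity check on the K6 figures above: with a single VU that only starts a new iteration once the previous one has completed, the iteration rate is approximately the inverse of the mean iteration duration. The sketch below applies that relationship; `closedLoopThroughput` is a hypothetical helper, and the estimate ignores ramp-up/ramp-down periods and any time spent between iterations.

```javascript
// Hypothetical helper: closed-loop throughput estimate (ops per second).
function closedLoopThroughput(virtualUsers, avgIterationMs) {
  if (avgIterationMs <= 0) throw new RangeError('avgIterationMs must be positive');
  return virtualUsers / (avgIterationMs / 1000);
}

// One VU with a mean iteration duration of 2.22 ms.
const estimate = closedLoopThroughput(1, 2.22);
console.log(`${estimate.toFixed(0)} Ops/s`);
```

The estimate of about 450 Ops/s is in the same range as the reported mean `Iteration Rate` of `461 Ops/s`. The two means are aggregated differently, so an exact match is not expected.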
67+
68+
## Recommendations
69+
70+
- Observe `Scenario #2+` and compare the `Callback-Handler`'s metrics against this **baseline** to determine if there are any issues with either the Mocked Simulators (i.e. `Callback-Handlers`) or the **Async** `FSPIOP PUT /parties` callback response.

fspiop-discovery/20230726/s1-1690367402771/images/.gitignore

Whitespace-only changes.

fspiop-discovery/20230726/s1-1690367402771/logs/.gitignore

Whitespace-only changes.
