Copying some Discord chat with @davidgasquez over:
Here's a reworked and hopefully similar set of steps:
This is a thread on some of the metrics modeling discussions we've been having, as we move from a small number of static metrics to a large number of metrics that can be applied on a timeseries.
Metric schema
@ryscheng proposed the following as a metrics_v0 yesterday:
Sample metrics
Here are some examples of different types of metrics:
Gas Fees
This is a static metric that simply sums gas fees by project / event_source / time_interval. (The event_source represents the chain, and the time_interval options are "7 DAYS", "30 DAYS", ... , "ALL".)
Contributors
This is another static metric that counts the number of unique contributors by project / event_source / time_interval. (The event_source will always be GitHub for now, and the time_interval options are "7 DAYS", "30 DAYS", ... , "ALL".)
New Contributors
This is a more complex static metric that calculates the number of new contributors by project / event_source / time_interval. (The event_source will always be GitHub for now, and the time_interval options are "7 DAYS", "30 DAYS", ... , "ALL".)
New Contributors
This is an even more complex static metric that does some math on the composition of contributors by project / event_source / time_interval. (The event_source will always be GitHub for now, and the time_interval options are "7 DAYS", "30 DAYS", ... , "ALL".)
Full-time active developers
This is a v0 timeseries metric that counts the number of developers who have made 10+ commits to a project in a 30-day period. It constructs a synthetic calendar and applies a 30-day rolling window.
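The following is a minimal Python sketch of the full-time active developers logic. It is not the SQL model discussed in this thread. It assumes a hypothetical input row of (day, developer_id, project_id, commits) and samples once per day. The function builds a synthetic daily calendar, sums each developer's commits over a trailing 30-day window, and counts developers at or above 10 commits per project per day.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical input row: (day, developer_id, project_id, commits)
Event = tuple[date, str, str, int]


def fulltime_developers(
    events: list[Event],
    start: date,
    end: date,
    window_days: int = 30,
    min_commits: int = 10,
) -> dict[tuple[str, date], int]:
    """Per project and calendar day, count developers whose commits in the
    trailing `window_days` days (inclusive of that day) reach `min_commits`."""
    # Daily commit totals keyed by (project, developer).
    daily: dict[tuple[str, str], dict[date, int]] = defaultdict(lambda: defaultdict(int))
    for day, dev, project, commits in events:
        daily[(project, dev)][day] += commits

    projects = {project for project, _dev in daily}
    result: dict[tuple[str, date], int] = {}
    # Synthetic calendar: one sample per day, so days with no qualifying
    # developers still appear with a count of 0.
    for offset in range((end - start).days + 1):
        sample_day = start + timedelta(days=offset)
        window_start = sample_day - timedelta(days=window_days - 1)
        for project in projects:
            result[(project, sample_day)] = 0
        for (project, _dev), by_day in daily.items():
            total = sum(n for d, n in by_day.items() if window_start <= d <= sample_day)
            if total >= min_commits:
                result[(project, sample_day)] += 1
    return result
```

Scanning every developer for every day keeps the sketch readable, but it is quadratic. A warehouse version would more likely use a windowed sum over a generated date array.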
Transformation Steps
For each of these metrics, there appears to be a general pattern of transformation steps:
0. From staging to raw events
Currently, the int_events table has both raw events (e.g., COMMIT_CODE) and bucketed events (e.g., CONTRACT_INVOCATION_SUCCESS_DAILY_COUNT). The int_events table also has fields that are not strictly necessary (e.g., to_artifact_name, to_artifact_type). One proposal is to remove all the superfluous fields and keep just:
time, from_artifact_id, to_artifact_id, event_source, event_type, amount
We should also keep raw timestamps instead of bucketed ones, e.g., a CONTRACT_INVOCATION_SUCCESS event with a specific timestamp. One downside is that this will magnify the number of events we have, i.e., a single token transfer could produce events for gas, contract_invocation, usd_amount, donation, etc.
1. Filtering events
All events have a filtering step, which could easily be parametrized in the metric definition, e.g.:
These could be expanded to include both types (set by the event source provider) and tags (set by different models, e.g., from_artifact_ids associated with trusted Farcaster users).
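One way to picture a parametrized filter that covers both provider-set types and model-set tags is the hypothetical Python sketch below. The EventFilter and filter_events names, the dict-shaped events, and the tag lookup keyed by from_artifact_id are all assumptions, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventFilter:
    """Hypothetical filter block of a metric definition.

    An empty set means "no constraint" on that dimension.
    """
    event_sources: frozenset[str] = frozenset()
    event_types: frozenset[str] = frozenset()
    required_tags: frozenset[str] = frozenset()  # tags assigned by other models

    def matches(self, event: dict, tags: set[str]) -> bool:
        if self.event_sources and event["event_source"] not in self.event_sources:
            return False
        if self.event_types and event["event_type"] not in self.event_types:
            return False
        # Every required tag must be attached to the event's from_artifact_id.
        return self.required_tags <= tags


def filter_events(
    events: list[dict],
    flt: EventFilter,
    tags_by_artifact: dict[str, set[str]],
) -> list[dict]:
    return [
        e for e in events
        if flt.matches(e, tags_by_artifact.get(e["from_artifact_id"], set()))
    ]
```

For example, a filter built with required_tags={"trusted_farcaster_user"} keeps only events whose from_artifact_id carries that (hypothetical) tag.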
2. Deriving intermediate metrics
Once events have been filtered, there is usually a step where some intermediate transformation is needed. For instance:
gas_fees / 1e18
cast(amount > 0 as int64)
case when user_stats.first_day >= time_intervals.start_date then events.from_artifact_id end
This is usually an important part of the business logic.
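For readers less used to the SQL, the three expressions above translate roughly into these Python helpers. They are hypothetical translations, not code from the project. The gas-fee helper assumes gas_fees is denominated in wei, and None stands in for SQL NULL.

```python
from datetime import date
from typing import Optional


def gas_fees_native(gas_fees_wei: int) -> float:
    # Mirrors `gas_fees / 1e18`; assumes wei -> native-token (e.g. ETH) units.
    return gas_fees_wei / 1e18


def active_flag(amount: float) -> int:
    # Mirrors `cast(amount > 0 as int64)`: 1 if any activity, else 0.
    return int(amount > 0)


def new_contributor_id(
    from_artifact_id: str, first_day: date, interval_start: date
) -> Optional[str]:
    # Mirrors `case when user_stats.first_day >= time_intervals.start_date
    #          then events.from_artifact_id end`; None plays the role of NULL.
    return from_artifact_id if first_day >= interval_start else None


def count_new_contributors(rows: list[tuple[str, date]], interval_start: date) -> int:
    # count(distinct ...) ignores NULLs, so drop None before counting.
    ids = {new_contributor_id(a, first, interval_start) for a, first in rows}
    ids.discard(None)
    return len(ids)
```

The last helper shows why the CASE expression returns NULL instead of 0: a distinct count over the resulting column counts only contributors whose first day falls inside the interval.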
3. Building a timeseries
For metrics that have rolling windows (e.g., fulltime_developers), it may be necessary to create a utility calendar and add ephemeral events with 0 amounts. There's some logic around defining a window_interval and a sampling_interval, e.g.:
An alternative implementation for a related metric might be:
4. Aggregating by entity type and applying remaining business logic
We should avoid having to define every metric for every artifact / project / collection. Thus, we'd like some generalized version of:
... where the to_id could be a project_id or collection_id. Then we perform our remaining business logic operations.
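A rough Python sketch of the "define once, aggregate to any entity" idea follows. It assumes a hypothetical mapping from to_artifact_id to one or more to_id values (project_ids or collection_ids) and is not the project's actual model. Additive metrics can be summed directly. Distinct-count metrics need de-duplication per entity, so that a contributor to two repos in the same project counts once.

```python
from collections import defaultdict


def aggregate_to_entity(
    rows: list[tuple[str, float]],
    artifact_to_entities: dict[str, list[str]],
) -> dict[str, float]:
    """Sum artifact-level amounts up to whichever entity the mapping targets."""
    totals: dict[str, float] = defaultdict(float)
    for to_artifact_id, amount in rows:
        for to_id in artifact_to_entities.get(to_artifact_id, []):
            totals[to_id] += amount
    return dict(totals)


def count_distinct_by_entity(
    rows: list[tuple[str, str]],
    artifact_to_entities: dict[str, list[str]],
) -> dict[str, int]:
    """Distinct from_artifact_ids per entity; one contributor to two repos counts once."""
    seen: dict[str, set[str]] = defaultdict(set)
    for from_artifact_id, to_artifact_id in rows:
        for to_id in artifact_to_entities.get(to_artifact_id, []):
            seen[to_id].add(from_artifact_id)
    return {to_id: len(ids) for to_id, ids in seen.items()}
```

Swapping in a project mapping or a collection mapping is the only change needed to move between entity types. Artifacts absent from the mapping are dropped.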
For example, with bus_factor we have:
5. Agg functions
Finally, we can apply standard agg and limit functions to the raw metric models. These will mostly be min, max, avg, std, and limit 1, since the sum and count/count_unique agg funcs will already have been done upstream.
Curious what @ryscheng @ravenac95 think!