* Replace user_pseudo_id with client_key
* Remove stream_id from fct_ga4__user_ids
* Add descriptions for client_key throughout package
* restore stream_id to several models
* Move client_key creation to stg_ga4__events
---------
Co-authored-by: Adam Ribaudo <[email protected]>
README.md
+2 -2
@@ -20,11 +20,11 @@ Features include:
 | stg_ga4__event_items | Contains item data associated with e-commerce events (Purchase, add to cart, etc) |
 | stg_ga4__event_to_query_string_params | Mapping between each event and any query parameters & values that were contained in the event's `page_location` field |
 | stg_ga4__user_properties | Finds the most recent occurance of specified user_properties for each user |
-| stg_ga4__derived_user_properties | Finds the most recent occurance of specific event_params value and assigns them to a user_pseudo_id. Derived user properties are specified as variables (see documentation below) |
+| stg_ga4__derived_user_properties | Finds the most recent occurance of specific event_params value and assigns them to a client_key. Derived user properties are specified as variables (see documentation below) |
 | stg_ga4__derived_session_properties | Finds the most recent occurance of specific event_params or user_properties value and assigns them to a session's session_key. Derived session properties are specified as variables (see documentation below) |
 | stg_ga4__session_conversions_daily | Produces daily counts of conversions per session. The list of conversion events to include is configurable (see documentation below) |
 | stg_ga4__sessions_traffic_sources | Finds the first source, medium, campaign, content, paid search term (from UTM tracking), and default channel grouping for each session. |
-|dim_ga4__user_pseudo_ids| Dimension table for user devices as indicated by user_pseudo_ids. Contains attributes such as first and last page viewed.|
+|dim_ga4__client_keys| Dimension table for user devices as indicated by client_keys. Contains attributes such as first and last page viewed.|
 | dim_ga4__sessions | Dimension table for sessions which contains useful attributes such as geography, device information, and acquisition data. Can be expensive to run on large installs (see `dim_ga4__sessions_daily`) |
 | dim_ga4__sessions_daily | Query-optimized session dimension table that is incremental and partitioned on date. Assumes that each partition is contained within a single day |
 | fct_ga4__pages | Fact table for pages which aggregates common page metrics by page_location, date, and hour. |
models/marts/core/core.yml
+10 -8
@@ -7,10 +7,11 @@ models:
       - name: session_key
         tests:
           - unique
-  - name: dim_ga4__user_pseudo_ids
-    description: Dimension table for user devices (user_pseudo_id) which includes data from the first and last event produced. Unique on user_pseudo_id
+  - name: dim_ga4__client_keys
+    description: Dimension table for user devices (client_key) which includes data from the first and last event produced. Unique on client_key
     columns:
-      - name: user_pseudo_id
+      - name: client_key
+        description: Hashed combination of user_pseudo_id and stream_id
         tests:
           - unique
   - name: fct_ga4__sessions
@@ -26,15 +27,16 @@ models:
         description: The total engagement time for that page_location.
       - name: avg_engagement_time_denominator
         description: Use avg_engagement_time_denominator to calculate the average engagement time, which is derived by dividing the sum of total engagement time by the product of the sum of the denominator and 1000 to get the average engagement time in seconds (average_engagement_time = sum(total_engagement_time_msec)/(sum(avg_engagement_time_denominator) *1000 )). The denominator excludes page_view events where no engagement time is recorded for the page_location within a session. However, it includes subsequent page_view events to a page_location that has previously recorded a page_view event in the same session, even if the subsequent event has no recorded engagement time.
-  - name: fct_ga4__user_pseudo_ids
-    description: Fact table with aggregate metrics at the level of the user's device (as indicated by the user_pseudo_id). Metrics are aggregated from fct_ga4__sessions.
+  - name: fct_ga4__client_keys
+    description: Fact table with aggregate metrics at the level of the user's device (as indicated by the client_key). Metrics are aggregated from fct_ga4__sessions.
     columns:
-      - name: user_pseudo_id
+      - name: client_key
+        description: Hashed combination of user_pseudo_id and stream_id
         tests:
           - unique
   - name: fct_ga4__user_ids
-    description: Fact table with aggregate metrics at the level of the user_id when one is present, otherwise at the device level (as indicated by the user_pseudo_id). Metrics are aggregated from fct_ga4__user_pseudo_ids.
+    description: Fact table with aggregate metrics at the level of the user_id when one is present, otherwise at the device level (as indicated by the client_key). Metrics are aggregated from fct_ga4__client_keys.
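
The `avg_engagement_time_denominator` description in this file spells out a formula that is easy to misread in prose. The following is a minimal Python sketch of that arithmetic, not code from the package. It assumes each row is a dict carrying the two columns named in the formula (`total_engagement_time_msec` and `avg_engagement_time_denominator`); the function name and the choice to return `None` for a zero denominator are illustrative assumptions.

```python
def average_engagement_time_seconds(rows):
    """Average engagement time in seconds, following the formula in the
    column description: sum(msec) / (sum(denominator) * 1000)."""
    total_msec = sum(r["total_engagement_time_msec"] for r in rows)
    denominator = sum(r["avg_engagement_time_denominator"] for r in rows)
    if denominator == 0:
        # Assumption: report "no value" rather than dividing by zero.
        return None
    return total_msec / (denominator * 1000)


# Hypothetical sample rows for one page_location.
sample_rows = [
    {"total_engagement_time_msec": 30000, "avg_engagement_time_denominator": 2},
    {"total_engagement_time_msec": 15000, "avg_engagement_time_denominator": 1},
]
average = average_engagement_time_seconds(sample_rows)
```

Summing both columns before dividing matters: averaging per-row averages would weight each row equally regardless of how many page views it covers.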
models/marts/core/fct_ga4__sessions.sql
+1 -1
@@ -1,7 +1,7 @@
 -- Stay mindful of performance/cost when leavin this model enabled. Making this model incremental on date is not possible because there's no way to create a single record per session AND partition on date.
 
 select
-    user_pseudo_id,
+    client_key,
     session_key,
     stream_id,
     min(session_partition_min_timestamp) as session_start_timestamp,
models/marts/core/fct_ga4__sessions_daily.sql
+2 -2
@@ -35,7 +35,7 @@ with session_metrics as (
     select
         session_key,
         session_partition_key,
-        user_pseudo_id,
+        client_key,
         stream_id,
         min(event_date_dt) as session_partition_date, -- Date of the session partition, does not represent the true session start date which, in GA4, can span multiple days
         min(event_timestamp) as session_partition_min_timestamp,
models/staging/stg_ga4__client_key_first_last_events.sql
+10 -10
@@ -4,24 +4,24 @@
 
 with first_last_event as (
     select
-        user_pseudo_id,
-        FIRST_VALUE(event_key) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_event,
-        LAST_VALUE(event_key) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_event,
+        client_key,
+        FIRST_VALUE(event_key) OVER (PARTITION BY client_key ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_event,
+        LAST_VALUE(event_key) OVER (PARTITION BY client_key ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_event,
         stream_id
     from {{ref('stg_ga4__events')}}
-    where user_pseudo_id is not null --remove users with privacy settings enabled
+    where client_key is not null --remove users with privacy settings enabled
     description: Captures the first and last event completed by the user's device in order to pull in the first and last geo, device, and traffic source seen from the user
     columns:
-      - name: user_pseudo_id
+      - name: client_key
+        description: Hashed combination of user_pseudo_id and stream_id
models/staging/stg_ga4__client_key_first_last_pageviews.sql
+10 -10
@@ -4,34 +4,34 @@
 
 with page_views_first_last as (
     select
-        user_pseudo_id,
-        FIRST_VALUE(event_key) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_page_view_event_key,
-        LAST_VALUE(event_key) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_page_view_event_key
+        client_key,
+        FIRST_VALUE(event_key) OVER (PARTITION BY client_key ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_page_view_event_key,
+        LAST_VALUE(event_key) OVER (PARTITION BY client_key ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_page_view_event_key
     from {{ref('stg_ga4__event_page_view')}}
-    where user_pseudo_id is not null -- Remove users with privacy settings enabled
+    where client_key is not null -- Remove users with privacy settings enabled
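
The two first/last staging models above both pick the earliest and latest `event_key` per `client_key` using `FIRST_VALUE` / `LAST_VALUE` over an unbounded frame. As a rough illustration of what that window computes, here is a Python sketch under stated assumptions: events are plain dicts with `client_key`, `event_key`, and `event_timestamp` fields, and the function name is hypothetical. Unlike the SQL window functions, which emit one row per input event, this sketch collapses directly to one entry per client.

```python
from collections import defaultdict


def first_last_event_keys(events):
    """Return {client_key: (first_event_key, last_event_key)}."""
    by_client = defaultdict(list)
    for event in events:
        # Mirrors `where client_key is not null`: skip events with no key.
        if event["client_key"] is None:
            continue
        by_client[event["client_key"]].append(event)

    result = {}
    for key, client_events in by_client.items():
        # Mirrors `ORDER BY event_timestamp` within each partition.
        ordered = sorted(client_events, key=lambda e: e["event_timestamp"])
        result[key] = (ordered[0]["event_key"], ordered[-1]["event_key"])
    return result


# Hypothetical sample data.
sample_events = [
    {"client_key": "a", "event_key": "e1", "event_timestamp": 10},
    {"client_key": "a", "event_key": "e2", "event_timestamp": 30},
    {"client_key": "a", "event_key": "e3", "event_timestamp": 20},
    {"client_key": None, "event_key": "e4", "event_timestamp": 5},
    {"client_key": "b", "event_key": "e5", "event_timestamp": 7},
]
first_last = first_last_event_keys(sample_events)
```

Because the partition is now `client_key` rather than `user_pseudo_id`, the same device ID appearing in two streams is treated as two separate clients.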
models/staging/stg_ga4__derived_user_properties.yml
+3 -2
@@ -2,8 +2,9 @@ version: 2
 
 models:
   - name: stg_ga4__derived_user_properties
-    description: Optional model that will pull out the most recent instance of a particular event parameter for each device (user_pseudo_id). Later used in the dim_ga4__user_pseudo_id dimension table.
+    description: Optional model that will pull out the most recent instance of a particular event parameter for each device (client_key). Later used in the dim_ga4__client_key dimension table.
     columns:
-      - name: user_pseudo_id
+      - name: client_key
+        description: Hashed combination of user_pseudo_id and stream_id
models/staging/stg_ga4__events.sql
+9 -3
@@ -7,12 +7,18 @@ with base_events as (
         select * from {{ref('base_ga4__events_intraday')}}
     {% endif %}
 ),
--- Add unique key for sessions. session_key will be null if user_pseudo_id is null due to consent being denied. ga_session_id may be null during audience trigger events.
+-- Add key that captures a combination of stream_id and user_pseudo_id to uniquely identify a 'client' (aka. a device) within a single stream
+include_client_key as (
+    select *
+    , to_base64(md5(concat(user_pseudo_id, stream_id))) as client_key
+    from base_events
+),
+-- Add key for sessions. session_key will be null if client_key is null due to consent being denied. ga_session_id may be null during audience trigger events.
 include_session_key as (
     select
         *,
-        to_base64(md5(CONCAT(stream_id, user_pseudo_id, CAST(session_id as STRING)))) as session_key -- Surrogate key to determine unique session across streams and users. Sessions do NOT reset after midnight in GA4
-    from base_events
+        to_base64(md5(CONCAT(client_key, CAST(session_id as STRING)))) as session_key
+    from include_client_key
 ),
 -- Add a key that combines session key and date. Useful when working with session table within date-partitioned tables
models/staging/stg_ga4__events.yml
+2
@@ -4,6 +4,8 @@ models:
   - name: stg_ga4__events
     description: Staging model that generates keys for users, sessions, and events. Also parses URLs to remove query string params as defined in project config.
     columns:
+      - name: client_key
+        description: Surrogate key created from stream_id and user_pseudo_id. Provides a way to uniquely identify a user's device within a stream. Important when using the package to combine data across properties and streams.
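
To make the key derivation concrete, the sketch below mirrors in Python what `to_base64(md5(concat(user_pseudo_id, stream_id)))` and the new `session_key` expression appear to compute. It is an approximation for illustration, not package code. It assumes both inputs are strings, that the hash is taken over their UTF-8 bytes, and that a null in any input yields a null key (matching how `CONCAT` propagates nulls in BigQuery). The helper names are hypothetical.

```python
import base64
import hashlib


def _b64_md5(text):
    """Base64-encode the raw 16-byte MD5 digest of a UTF-8 string."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def client_key(user_pseudo_id, stream_id):
    """Sketch of the client_key surrogate key from stg_ga4__events."""
    # Assumption: a null input (e.g. consent denied) gives a null key.
    if user_pseudo_id is None or stream_id is None:
        return None
    return _b64_md5(user_pseudo_id + stream_id)


def session_key(client_key_value, session_id):
    """Sketch of session_key, now derived from client_key and session_id."""
    if client_key_value is None or session_id is None:
        return None
    return _b64_md5(client_key_value + str(session_id))


# Hypothetical IDs: the same device ID in two streams gets two client keys.
key_stream_1 = client_key("1234.5678", "1111")
key_stream_2 = client_key("1234.5678", "2222")
session = session_key(key_stream_1, 1700000000)
```

Since `session_key` is built on top of `client_key`, a null `client_key` propagates to a null `session_key`, which is the behaviour the updated SQL comment describes.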