Commit 5c58a9b

thomaspinder and Thomas Pinder authored
Build nbs (#10)
* Build nbs
* Build nbs
* Build nbs
* Build nbs

---------

Co-authored-by: Thomas Pinder <[email protected]>
1 parent 175bb0f commit 5c58a9b

9 files changed, +716 -501 lines changed

docs/examples/azcausal.ipynb

+200
@@ -0,0 +1,200 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "153aaca2",
6+
"metadata": {},
7+
"source": [
8+
"# AZCausal Integration\n",
9+
"\n",
10+
"Amazon's [AZCausal](https://github.com/amazon-science/azcausal) library provides the\n",
11+
"functionality to fit synthetic control and difference-in-differences models to your\n",
12+
"data. Integrating the synthetic data generating process of `causal_validation` with\n",
13+
"AZCausal is trivial, as we show in this notebook. To start, we'll simulate a toy\n",
14+
"dataset."
15+
]
16+
},
17+
{
18+
"cell_type": "code",
19+
"execution_count": null,
20+
"id": "b134b49f",
21+
"metadata": {},
22+
"outputs": [],
23+
"source": [
24+
"from azcausal.estimators.panel.sdid import SDID\n",
25+
"import scipy.stats as st\n",
26+
"\n",
27+
"from causal_validation import (\n",
28+
" Config,\n",
29+
" simulate,\n",
30+
")\n",
31+
"from causal_validation.effects import StaticEffect\n",
32+
"from causal_validation.plotters import plot\n",
33+
"from causal_validation.transforms import (\n",
34+
" Periodic,\n",
35+
" Trend,\n",
36+
")\n",
37+
"from causal_validation.transforms.parameter import UnitVaryingParameter"
38+
]
39+
},
40+
{
41+
"cell_type": "code",
42+
"execution_count": null,
43+
"id": "86ba26f3",
44+
"metadata": {},
45+
"outputs": [],
46+
"source": [
47+
"cfg = Config(\n",
48+
" n_control_units=10,\n",
49+
" n_pre_intervention_timepoints=60,\n",
50+
" n_post_intervention_timepoints=30,\n",
51+
" seed=123,\n",
52+
")\n",
53+
"\n",
54+
"linear_trend = Trend(degree=1, coefficient=0.05)\n",
55+
"data = linear_trend(simulate(cfg))\n",
56+
"plot(data)"
57+
]
58+
},
59+
{
60+
"cell_type": "markdown",
61+
"id": "ae979b7b",
62+
"metadata": {
63+
"title": "We'll now simulate a 5% lift in the treatment group's observations. This"
64+
},
65+
"source": [
66+
"We'll now simulate a 5% lift in the treatment group's observations. This will inflate the treated group's observations in the post-intervention window."
67+
]
68+
},
69+
{
70+
"cell_type": "code",
71+
"execution_count": null,
72+
"id": "45f9e99f",
73+
"metadata": {},
74+
"outputs": [],
75+
"source": [
76+
"TRUE_EFFECT = 0.05\n",
77+
"effect = StaticEffect(effect=TRUE_EFFECT)\n",
78+
"inflated_data = effect(data)\n",
79+
"plot(inflated_data)"
80+
]
81+
},
82+
{
83+
"cell_type": "markdown",
84+
"id": "0ff7c192",
85+
"metadata": {},
86+
"source": [
87+
"## Fitting a model\n",
88+
"\n",
89+
"We now have a simple toy dataset to which we can apply a model. For this demonstration\n",
90+
"we shall use the Synthetic Difference-in-Differences model implemented in AZCausal;\n",
91+
"however, the approach shown here will work for any model implemented in AZCausal. To\n",
92+
"achieve this, we must first coerce the data into a format that is digestible for\n",
93+
"AZCausal. Through the `.to_azcausal()` method implemented here, this is\n",
94+
"straightforward to achieve. Once we have an AZCausal-compatible dataset, the modelling\n",
95+
"is very simple by virtue of the clean design of AZCausal."
96+
]
97+
},
98+
{
99+
"cell_type": "code",
100+
"execution_count": null,
101+
"id": "db0f85d8",
102+
"metadata": {},
103+
"outputs": [],
104+
"source": [
105+
"panel = inflated_data.to_azcausal()\n",
106+
"model = SDID()\n",
107+
"result = model.fit(panel)\n",
108+
"print(f\"Delta: {TRUE_EFFECT - result.effect.percentage().value / 100}\")\n",
109+
"print(result.summary(title=\"Synthetic Data Experiment\"))"
110+
]
111+
},
112+
{
113+
"cell_type": "markdown",
114+
"id": "5c71b479",
115+
"metadata": {
116+
"title": "We see that SDID has done an excellent job of estimating the treatment"
117+
},
118+
"source": [
119+
"We see that SDID has done an excellent job of estimating the treatment effect. However, given the simplicity of the data, this is not surprising. With the\n",
120+
"functionality within this package though we can easily construct more complex datasets\n",
121+
"in an effort to fully stress-test any new model and identify its limitations.\n",
122+
"\n",
123+
"To achieve this, we'll simulate 10 control units, 60 pre-intervention time points, and\n",
124+
"30 post-intervention time points according to the following process: $$ \\begin{align}\n",
125+
"\\mu_{n, t} & \\sim\\mathcal{N}(20, 0.5^2)\\\\\n",
126+
"\\alpha_{n} & \\sim \\mathcal{N}(0, 1^2)\\\\\n",
127+
"\\beta_{n} & \\sim \\mathcal{N}(0.05, 0.01^2)\\\\\n",
128+
"\\nu_n & \\sim \\mathcal{N}(1, 1^2)\\\\\n",
129+
"\\gamma_n & \\sim \\operatorname{Student-t}_{10}(1, 1^2)\\\\\n",
130+
"\\mathbf{Y}_{n, t} & = \\mu_{n, t} + \\alpha_{n} + \\beta_{n}t + \\nu_n\\sin\\left(3\\times\n",
131+
"2\\pi t + \\gamma_n\\right) + \\delta_{t, n} \\end{align} $$ where the true treatment effect\n",
132+
"$\\delta_{t, n}$ is 5% when $n=1$ and $t\\geq 60$ and 0 otherwise. Meanwhile,\n",
133+
"$\\mathbf{Y}$ is the matrix of observations, long in the number of time points and wide\n",
134+
"in the number of units."
135+
]
136+
},
137+
{
138+
"cell_type": "code",
139+
"execution_count": null,
140+
"id": "59d6a88b",
141+
"metadata": {},
142+
"outputs": [],
143+
"source": [
144+
"cfg = Config(\n",
145+
" n_control_units=10,\n",
146+
" n_pre_intervention_timepoints=60,\n",
147+
" n_post_intervention_timepoints=30,\n",
148+
" global_mean=20,\n",
149+
" global_scale=1,\n",
150+
" seed=123,\n",
151+
")\n",
152+
"\n",
153+
"intercept = UnitVaryingParameter(sampling_dist=st.norm(loc=0.0, scale=1))\n",
154+
"coefficient = UnitVaryingParameter(sampling_dist=st.norm(loc=0.05, scale=0.01))\n",
155+
"linear_trend = Trend(degree=1, coefficient=coefficient, intercept=intercept)\n",
156+
"\n",
157+
"amplitude = UnitVaryingParameter(sampling_dist=st.norm(loc=1.0, scale=2))\n",
158+
"shift = UnitVaryingParameter(sampling_dist=st.t(df=10))\n",
159+
"periodic = Periodic(amplitude=amplitude, shift=shift, frequency=3)\n",
160+
"\n",
161+
"data = effect(periodic(linear_trend(simulate(cfg))))\n",
162+
"plot(data)"
163+
]
164+
},
165+
{
166+
"cell_type": "markdown",
167+
"id": "5268b01a",
168+
"metadata": {
169+
"title": "As before, we may now go about estimating the treatment. However, this"
170+
},
171+
"source": [
172+
"As before, we may now go about estimating the treatment. However, this time we see that the delta between the estimated and true effect is much larger than\n",
173+
"before."
174+
]
175+
},
176+
{
177+
"cell_type": "code",
178+
"execution_count": null,
179+
"id": "71d101a2",
180+
"metadata": {},
181+
"outputs": [],
182+
"source": [
183+
"panel = data.to_azcausal()\n",
184+
"model = SDID()\n",
185+
"result = model.fit(panel)\n",
186+
"print(f\"Delta: {100*(TRUE_EFFECT - result.effect.percentage().value / 100): .2f}%\")\n",
187+
"print(result.summary(title=\"Synthetic Data Experiment\"))"
188+
]
189+
}
190+
],
191+
"metadata": {
192+
"jupytext": {
193+
"cell_metadata_filter": "title,-all",
194+
"main_language": "python",
195+
"notebook_metadata_filter": "-all"
196+
}
197+
},
198+
"nbformat": 4,
199+
"nbformat_minor": 5
200+
}
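
As a cross-check on the equation in the notebook's penultimate markdown cell, the data-generating process can be sketched in plain NumPy without `causal_validation`. This is only an illustrative sketch and not the package's implementation. The function name `simulate_panel`, the use of column 0 as the treated unit, the integer time index in the trend, the rescaling of time to [0, 1] inside the sine, and the multiplicative form of the 5% lift are all assumptions. The amplitude and shift distributions follow the notebook's code cell (`st.norm(loc=1.0, scale=2)` and `st.t(df=10)`), and the noise level of 0.5 follows the equation.

```python
import numpy as np


def simulate_panel(n_control=10, n_pre=60, n_post=30, effect=0.05, seed=123):
    """Draw a (time x units) panel; column 0 is the treated unit (assumption)."""
    rng = np.random.default_rng(seed)
    n_time = n_pre + n_post
    n_units = n_control + 1

    # Assumptions: integer time steps drive the trend, while time is rescaled
    # to [0, 1] inside the sine so that frequency 3 gives three full cycles.
    t = np.arange(n_time)[:, None]
    t_unit = np.linspace(0.0, 1.0, n_time)[:, None]

    mu = rng.normal(20.0, 0.5, size=(n_time, n_units))  # mu_{n,t}: baseline plus noise
    alpha = rng.normal(0.0, 1.0, size=n_units)          # alpha_n: unit intercepts
    beta = rng.normal(0.05, 0.01, size=n_units)         # beta_n: unit trend slopes
    nu = rng.normal(1.0, 2.0, size=n_units)             # nu_n: sine amplitudes
    gamma = rng.standard_t(df=10, size=n_units)         # gamma_n: sine phase shifts

    y = mu + alpha + beta * t + nu * np.sin(3 * 2 * np.pi * t_unit + gamma)

    # Assumed multiplicative lift on the treated unit after the intervention.
    y[n_pre:, 0] *= 1.0 + effect
    return y


panel = simulate_panel()
print(panel.shape)
```

Calling `simulate_panel(effect=0.0)` with the same seed returns the untreated counterfactual, which is convenient when comparing an estimator's output with the known lift.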
