<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Eric Cramer</title>
<link>http://emcramer.github.io/</link>
<atom:link href="http://emcramer.github.io/index.xml" rel="self" type="application/rss+xml" />
<description>Eric Cramer</description>
<generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© 2021</copyright><lastBuildDate>Fri, 23 Apr 2021 09:00:00 -0800</lastBuildDate>
<image>
<url>http://emcramer.github.io/images/icon_hud9b055eee01d4de0eb2efdace9b98bee_9571_512x512_fill_lanczos_center_2.png</url>
<title>Eric Cramer</title>
<link>http://emcramer.github.io/</link>
</image>
<item>
<title>The association of the Stanford Expectations of Treatment Scale (SETS) with expectations on pain and opioid dose in a patient-centered prescription opioid tapering program</title>
<link>http://emcramer.github.io/talk/aapm2021/</link>
<pubDate>Fri, 23 Apr 2021 09:00:00 -0800</pubDate>
<guid>http://emcramer.github.io/talk/aapm2021/</guid>
<description><p>The Stanford Expectations of Treatment Scale (SETS) is a tool developed to measure patient outcome expectancy prior to treatment. It has been validated in patients receiving surgical and pain interventions, but its relationship with expectancies regarding opioid use tapering has not previously been examined. We aim to characterize the relationship between the SETS scores and patient expectancy regarding opioid tapering and pain levels post tapering.</p>
</description>
</item>
<item>
<title>Acute Pain Predictors of Remote Postoperative Pain Resolution After Hand Surgery</title>
<link>http://emcramer.github.io/publication/hah-2021/</link>
<pubDate>Sun, 18 Apr 2021 00:00:00 +0000</pubDate>
<guid>http://emcramer.github.io/publication/hah-2021/</guid>
<description><script type="text/javascript" src="https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js"></script>
<div class="altmetric-embed" data-badge-type="donut" data-altmetric-id="104265284" data-doi="10.1007/s40122-021-00263-y"></div>
</description>
</item>
<item>
<title>Development and validation of the Collaborative Health Outcomes Information Registry body map</title>
<link>http://emcramer.github.io/publication/scherrer-2021/</link>
<pubDate>Sun, 24 Jan 2021 16:44:05 -0800</pubDate>
<guid>http://emcramer.github.io/publication/scherrer-2021/</guid>
<description><script type="text/javascript" src="https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js"></script><div class="altmetric-embed" data-badge-type="donut" data-altmetric-id="98609954" />
<p><strong>Introduction:</strong>
Critical for the diagnosis and treatment of chronic pain is the anatomical distribution of pain. Several body maps allow patients to indicate pain areas on paper; however, each has its limitations.</p>
<p><strong>Objectives:</strong>
To provide a comprehensive body map that can be universally applied across pain conditions, we developed the electronic Collaborative Health Outcomes Information Registry (CHOIR) self-report body map by performing an environmental scan and assessing existing body maps.</p>
<p><strong>Methods:</strong>
After initial validation using a Delphi technique, we compared (1) pain location questionnaire responses of 530 participants with chronic pain with (2) their pain endorsements on the CHOIR body map (CBM) graphic. A subset of participants (n = 278) repeated the survey 1 week later to assess test–retest reliability. Finally, we interviewed a patient cohort from a tertiary pain management clinic (n = 28) to identify reasons for endorsement discordances.</p>
<p><strong>Results:</strong>
The intraclass correlation coefficient between the total number of body areas endorsed on the survey and those from the body map was 0.86 and improved to 0.93 at follow-up. The intraclass correlation coefficient of the 2 body map graphics separated by 1 week was 0.93. Further examination demonstrated high consistency between the questionnaire and CBM graphic (&lt;10% discordance) in most body areas except for the back and shoulders (≈15–19% discordance). Participants attributed inconsistencies to misinterpretation of body regions and laterality, the latter of which was addressed by modifying the instructions.</p>
<p><strong>Conclusions:</strong>
Our data suggest that the CBM is a valid and reliable instrument for assessing the distribution of pain.</p>
</description>
</item>
<item>
<title>COVID-19 Tracker</title>
<link>http://emcramer.github.io/project/covid19tracker/</link>
<pubDate>Sat, 28 Mar 2020 11:36:43 -0800</pubDate>
<guid>http://emcramer.github.io/project/covid19tracker/</guid>
<description><h1 id="tracking-the-covid-19-pandemic">Tracking the COVID-19 Pandemic</h1>
<p>After my county instituted a shelter-in-place lockdown due to the COVID-19 pandemic (caused by the SARS-CoV-2 virus), I was curious to know how bad the situation was. The nonpartisan website <a href="https://usafacts.org/">usafacts.org/</a> publishes national, state, and county level data for the United States, and provides a substantial amount of information for making your own calculations and conclusions.</p>
<p>I am particularly interested in the <strong>change rate</strong>, the change in the number of new cases of COVID-19 per day. This simple calculation gives an estimate of how quickly the virus is spreading, and whether measures put in place to curb the disease are working.</p>
<p>I intend to add other metrics to this project, such as R0 calculations and forecasting.</p>
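<p>As a back-of-the-envelope sketch of the change-rate calculation (the cumulative counts below are made up for illustration; the app itself pulls real data from usafacts.org), the change rate is just the day-over-day difference in new cases:</p>

```r
# hypothetical cumulative case counts over six days (made-up numbers)
cumulative = c(10, 18, 30, 50, 85, 140)
# daily new cases: difference between consecutive cumulative totals
new_cases = diff(cumulative)
# change rate: day-over-day change in the number of new cases
change_rate = diff(new_cases)
print(new_cases)    # 8 12 20 35 55
print(change_rate)  # 4 8 15 20
```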
<iframe width="720" height="480" scrolling="no" frameborder="no" src="https://eric-cramer.shinyapps.io/covid19tracker/"></iframe>
<p>Open the tracking tool in a separate window <a href="https://eric-cramer.shinyapps.io/covid19tracker/">here</a>. GitHub repository <a href="https://github.com/emcramer/covid19project">here</a>.</p>
</description>
</item>
<item>
<title>Multiple linear regression in a distributed system</title>
<link>http://emcramer.github.io/post/multiple-linear-regression-in-a-distributed-system/</link>
<pubDate>Sat, 15 Feb 2020 00:00:00 +0000</pubDate>
<guid>http://emcramer.github.io/post/multiple-linear-regression-in-a-distributed-system/</guid>
<description><p><img src="http://emcramer.github.io/post/2020-02-15-multiple-linear-regression-in-a-distributed-system_files/linear_regression.png" alt="Credit: XKCD"></p>
<p>In a previous <a href="https://emcramer.github.io/post/starting-distributed-computing/">post</a> I talked about adapting a linear regression algorithm so it can be used in a distributed system. Essentially, a master computer oversees computations run on local data, and the algorithm pauses midway through to send summary statistics to the master. In this way, the master receives enough information to reconstruct the model without seeing the underlying data.</p>
<p><img src="http://emcramer.github.io/post/2020-02-15-multiple-linear-regression-in-a-distributed-system_files/example-distcomp.png" alt="Example distributed computation system."></p>
<p>For a linear regression model, we can simply have the master iteratively pass candidate <code>\(\beta\)</code> values to the workers, which then return their local sum of squared residuals. By minimizing the sum of the squared residuals on each iteration, the master can find the optimal values for the <code>\(\beta\)</code>s in <code>\(y=\beta_0 + \beta_1x\)</code>.</p>
<p>We can expand this from simple linear regression with a single predictor to multiple linear regression with several predictors ($y=\beta_0 + \beta_1x_1 + \dots +\beta_nx_n$). The sum of the squared residuals is then the summary statistic of a matrix operation, so a master controller can pass a vector of <code>\(\beta\)</code> parameters to the workers on each iteration and receive the RSS in return:</p>
<p><code>$$RSS_{local}=\sum_{i=1}^m\left(\beta_0 + \begin{bmatrix} x_{i,1} &amp; \dots &amp; x_{i,n} \end{bmatrix}\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_n \end{bmatrix} - y_i\right)^2$$</code></p>
<p>With a few minor changes in <a href="https://rextester.com/TDCUWC73705">code</a> from my previous post, we can adjust our loss function to accommodate a vector of <code>\(\beta\)</code>s.</p>
<pre><code class="language-r"># define a residual sum of squares function to handle multiple sites
multi.min.RSS &lt;- function(sites, par){
  rs &lt;- 0
  # calculate the residuals from each data source
  for(site in sites){
    tmp_mat &lt;- as.matrix(site$data[, 1:(ncol(site$data)-2)])
    tmps &lt;- par[1] + tmp_mat %*% par[-1] - site$data$y
    rs &lt;- rs + sum(tmps^2)
  }
  # return the sum of the squared residuals
  return(rs)
}
</code></pre>
<p>Now all we need to do is simulate some multivariate data and test everything out.</p>
<pre><code class="language-r"># general function for simulating a sample data set given parameters
sim.data &lt;- function(mu, sig, amt, seed, mpar, nl){
  # Simulate data for the practice
  set.seed(seed)
  x &lt;- replicate(length(mpar)-1, rnorm(n=amt, mean=mu, sd=sig))
  # create the &quot;true&quot; equation for the regression
  a.true &lt;- mpar[-1]
  b.true &lt;- mpar[1]
  y &lt;- x %*% a.true + b.true
  # set the noise level
  noise &lt;- rnorm(n=amt, mean=0, sd=nl)
  d &lt;- data.frame(x
                  , &quot;y_true&quot;=y
                  , &quot;y&quot;=y + noise)
  return(d)
}
true_vals &lt;- c(2,4,6,8)
sim.data1 &lt;- sim.data(10, 2, 100, 2020, true_vals, 1)
sim.data2 &lt;- sim.data(10, 2, 100, 2019, true_vals, 1)
sites &lt;- list(site1 = list(data=sim.data1), site2 = list(data=sim.data2))
knitr::kable(head(sim.data1))
</code></pre>
<table>
<thead>
<tr>
<th align="right">X1</th>
<th align="right">X2</th>
<th align="right">X3</th>
<th align="right">y_true</th>
<th align="right">y</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">10.753944</td>
<td align="right">6.542432</td>
<td align="right">8.540934</td>
<td align="right">152.5978</td>
<td align="right">153.5145</td>
</tr>
<tr>
<td align="right">10.603097</td>
<td align="right">8.017478</td>
<td align="right">11.702755</td>
<td align="right">186.1393</td>
<td align="right">185.9124</td>
</tr>
<tr>
<td align="right">7.803954</td>
<td align="right">8.828989</td>
<td align="right">9.207017</td>
<td align="right">159.8459</td>
<td align="right">161.0281</td>
</tr>
<tr>
<td align="right">7.739188</td>
<td align="right">10.767043</td>
<td align="right">10.813357</td>
<td align="right">184.0659</td>
<td align="right">185.5874</td>
</tr>
<tr>
<td align="right">4.406931</td>
<td align="right">11.493330</td>
<td align="right">7.922893</td>
<td align="right">151.9708</td>
<td align="right">153.4088</td>
</tr>
<tr>
<td align="right">11.441147</td>
<td align="right">8.143158</td>
<td align="right">7.488237</td>
<td align="right">156.5294</td>
<td align="right">158.8567</td>
</tr>
</tbody>
</table>
<pre><code class="language-r">knitr::kable(head(sim.data2))
</code></pre>
<table>
<thead>
<tr>
<th align="right">X1</th>
<th align="right">X2</th>
<th align="right">X3</th>
<th align="right">y_true</th>
<th align="right">y</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">11.477045</td>
<td align="right">8.309900</td>
<td align="right">11.441690</td>
<td align="right">189.3011</td>
<td align="right">189.1410</td>
</tr>
<tr>
<td align="right">8.970479</td>
<td align="right">11.715855</td>
<td align="right">9.210739</td>
<td align="right">181.8630</td>
<td align="right">181.7065</td>
</tr>
<tr>
<td align="right">6.719637</td>
<td align="right">8.632787</td>
<td align="right">11.965325</td>
<td align="right">176.3979</td>
<td align="right">177.0496</td>
</tr>
<tr>
<td align="right">11.832074</td>
<td align="right">9.978611</td>
<td align="right">7.191396</td>
<td align="right">166.7311</td>
<td align="right">167.2667</td>
</tr>
<tr>
<td align="right">7.465036</td>
<td align="right">7.191671</td>
<td align="right">11.600407</td>
<td align="right">167.8134</td>
<td align="right">168.0076</td>
</tr>
<tr>
<td align="right">11.476496</td>
<td align="right">12.783553</td>
<td align="right">8.498515</td>
<td align="right">192.5954</td>
<td align="right">192.8948</td>
</tr>
</tbody>
</table>
<p>We can test our loss function with a call to <code>optim</code> and compare the results to the base R linear modeling function (and the true values of our simulation).</p>
<pre><code class="language-r">param.fit &lt;- optim(par=c(0,0,0,0),
                   fn = multi.min.RSS,
                   hessian = TRUE,
                   sites=sites)
# stack the data frames vertically for later verification
sim.data3 &lt;- as.data.frame(rbind(sim.data1, sim.data2))
mlm &lt;- lm(y~., data=sim.data3[,-4])
d &lt;- data.frame(&quot;True Betas&quot;=true_vals
                , &quot;Base R Coefficients&quot;=coef(mlm)
                , &quot;Distributed Coefficients&quot;=param.fit$par)
knitr::kable(d)
</code></pre>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="right">True.Betas</th>
<th align="right">Base.R.Coefficients</th>
<th align="right">Distributed.Coefficients</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">(Intercept)</td>
<td align="right">2</td>
<td align="right">2.604539</td>
<td align="right">2.642250</td>
</tr>
<tr>
<td align="left">X1</td>
<td align="right">4</td>
<td align="right">4.018984</td>
<td align="right">4.021583</td>
</tr>
<tr>
<td align="left">X2</td>
<td align="right">6</td>
<td align="right">5.936988</td>
<td align="right">5.935438</td>
</tr>
<tr>
<td align="left">X3</td>
<td align="right">8</td>
<td align="right">7.978939</td>
<td align="right">7.973534</td>
</tr>
</tbody>
</table>
<p>Not far off!</p>
<p>You can run the full code <a href="https://rextester.com/PGLE10656">here</a>.</p>
</description>
</item>
<item>
<title>Predicting CRPS Limb Affectation</title>
<link>http://emcramer.github.io/project/apa2018/</link>
<pubDate>Tue, 21 Jan 2020 16:16:25 -0800</pubDate>
<guid>http://emcramer.github.io/project/apa2018/</guid>
<description><h1 id="predicting-crps-limb-affectation-from-physical-and-psychological-factors">Predicting CRPS Limb Affectation from Physical and Psychological Factors</h1>
<p><a href="https://www.ninds.nih.gov/Disorders/Patient-Caregiver-Education/Fact-Sheets/Complex-Regional-Pain-Syndrome-Fact-Sheet">Complex Regional Pain Syndrome (CRPS)</a> is a severe and rare chronic pain condition that often spreads from an initially affected limb to other parts of the body. The underlying etiology and factors that influence the spread of CRPS are not well understood. Previous research has sought to explain the mechanism, onset, and pain severity of CRPS; however, the contribution of psychosocial factors to CRPS affectation has not been investigated fully.</p>
<p>I extracted data from the <a href="https://choir.stanford.edu/">Collaborative Health Outcomes Information Registry (CHOIR)</a>, which is an electronic patient registry and learning health system. Then I trained a random forest model to describe the role of psychophysical and psychosocial factors as predictors of CRPS limb affectation. To train my model, I defined several &ldquo;classes&rdquo; of CRPS affectation such as ipsilateral and contralateral spread.</p>
<p>I presented my findings at the American Psychological Association's 126th annual convention in San Francisco, and my presentation received the Society for Health Psychology's Outstanding Poster Presentation award.</p>
<p><img src="apa-sfhp-award-2.jpg" alt="Award certificate"></p>
</description>
</item>
<item>
<title>Modeling Calmodulin</title>
<link>http://emcramer.github.io/project/bmi214/</link>
<pubDate>Tue, 21 Jan 2020 16:03:23 -0800</pubDate>
<guid>http://emcramer.github.io/project/bmi214/</guid>
<description><h1 id="modeling-calmodulin">Modeling Calmodulin</h1>
<p>One of our projects in <a href="http://explorecourses.stanford.edu/search?view=catalog&amp;filter-coursestatus-Active=on&amp;page=0&amp;catalog=&amp;academicYear=20152016&amp;q=cs+107e&amp;collapse=">BMI 214 (Representational Algorithms for Molecular Biology)</a>, taught by <a href="https://en.wikipedia.org/wiki/Russ_Altman">Dr. Russ Altman</a>, was to create a molecular dynamics simulation of a protein. <a href="https://en.wikipedia.org/wiki/Molecular_dynamics">Molecular dynamics</a> uses computers to simulate the interactions of atoms and molecules over a period of time under known laws of physics. I represented a fragment of the protein <a href="https://en.wikipedia.org/wiki/Calmodulin">calmodulin</a> by simulating the interactions of its component atoms (each atom within each amino acid). This entails modeling how each atom's mass, velocity, energy, and forces change over time, calculating the effect of each moment in time on the subsequent moment.</p>
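<p>A toy one-dimensional sketch of that time-stepping idea (not the actual project code; a unit mass and spring constant are assumed for illustration) is a single particle on a harmonic spring, advanced with a semi-implicit Euler step:</p>

```r
# toy 1-D time-stepping sketch (hypothetical unit constants, not the project code)
mass = 1.0   # particle mass
k = 1.0      # spring constant of the harmonic potential
dt = 0.01    # time step
x = 1.0      # initial position
v = 0.0      # initial velocity
for (step in 1:1000) {
  force = -k * x                # force at the current moment
  v = v + (force / mass) * dt   # velocity updated from the previous moment
  x = x + v * dt                # position updated from the new velocity
}
print(x)  # the particle oscillates; after t = 10 it sits near cos(10)
```

<p>A real molecular dynamics run does the same loop, but with thousands of atoms, forces summed from bonded and non-bonded interaction terms, and a more careful integrator.</p>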
</description>
</item>
<item>
<title>Exon Mutability Score</title>
<link>http://emcramer.github.io/project/bmi273/</link>
<pubDate>Tue, 21 Jan 2020 11:36:43 -0800</pubDate>
<guid>http://emcramer.github.io/project/bmi273/</guid>
<description><h1 id="class-project">Class Project</h1>
<p>For the final project of <a href="http://explorecourses.stanford.edu/search?view=catalog&amp;filter-coursestatus-Active=on&amp;page=0&amp;catalog=&amp;academicYear=20172018&amp;q=biomedin273a&amp;collapse=">BMI 273A (The Human Genome Source Code)</a>, taught by <a href="http://bejerano.stanford.edu/pi.html">Dr. Gill Bejerano</a>, I worked with a group to devise a statistic to measure how &ldquo;mutable&rdquo; a given exon is (some mutations impact the function of an exonic product, while others do not, i.e. synonymous vs. nonsynonymous mutations). We used the <a href="http://exac.broadinstitute.org/">ExAC</a> dataset to isolate exons and calculate an <em>EMTM</em> score, our version of the <a href="https://doi.org/10.1371/journal.pgen.1003709">RVIS score</a>. We presented our metric, and we maintain the results, explorations, and code on <a href="https://github.com/ostrowr/cs273a-project">GitHub</a>.</p>
<p><img src="rvis-lollipop.png" alt="Exon mutabilities of each chromosome based on our mutability score."></p>
</description>
</item>
<item>
<title>Starting distributed computing</title>
<link>http://emcramer.github.io/post/starting-distributed-computing/</link>
<pubDate>Thu, 16 Jan 2020 00:00:00 +0000</pubDate>
<guid>http://emcramer.github.io/post/starting-distributed-computing/</guid>
<description>
<div id="quick-intro" class="section level2">
<h2>Quick Intro</h2>
<p>As hospitals, care providers, and private companies collect more data, they develop rich databases that can be used to improve patient care (e.g. through precision medicine). Research institutions, however, often cannot share their data with each other because of privacy concerns and HIPAA compliance. This poses a hurdle to inter-institutional collaboration and creates a research bottleneck: an unfortunate instance where good data security practices block collaborative work that has the potential to solve problems such as <a href="https://medium.com/better-programming/bias-racist-robots-and-ai-the-problems-in-the-coding-that-coders-fail-to-see-305f6f324793">bias in AI</a>.</p>
<p>One way we may circumvent this issue is with distributed computation. Through an appropriately configured distributed computing service, it is possible to fit models on data that could be combined by <a href="https://datacarpentry.org/stata-economics/img/append-merge.png">stacking vertically</a> (i.e., datasets sharing the same columns) but that is otherwise kept electronically separate.</p>
<div class="figure">
<img src="https://datacarpentry.org/stata-economics/img/append-merge.png" alt="" />
<p class="caption">Partitioning data</p>
</div>
<p>The underlying premise is that most (all?) computations for modelling require multiple steps. If we take values calculated during an intermediate step of a computation performed at individual sites, then we can aggregate these values in a central location to produce a final model. Since the sites don’t talk to each other, and the central location only receives a summary statistic, <strong>none of the underlying information gets shared</strong>. Each institution or entity can hold on to its data and maintain its security while still helping the common good.</p>
<p>That is the vision and purpose of the <a href="https://cran.r-project.org/web/packages/distcomp/index.html"><code>distcomp</code></a> R package, which makes the distributed computation process simpler through a series of GUIs that walk a user through the process. The only hang-up is that there are currently very few computations implemented for this distributed process.</p>
<p>That is why, in this post, I am going to prototype a distributed computation before adding it to the <code>distcomp</code> package. I am starting small, with linear regression (which was not included in the initial list of possible distributed computations).</p>
</div>
<div id="prototyping-locally" class="section level2">
<h2>Prototyping Locally</h2>
<p>To do a proper distributed computation, you need to have multiple <em>sites</em> and a <em>master</em> controlling instance. This involves configuring a server or VM. But you don’t really need to do that to <em>prototype</em> a computation and make sure you can integrate values at some intermediate step.</p>
<p>Consider computing the linear regression for some data set by minimizing the residual sum of squares. Given some set of data points <span class="math inline">\((x_i, y_i), i=1,...,n\)</span>, the residual (error) of a model&rsquo;s prediction is <span class="math inline">\(r_i = y_i - f(x_i, \beta)\)</span>. If this model is linear, <span class="math inline">\(f(x) = \beta_1 x + \beta_0\)</span>, then we can optimize it by minimizing the sum of the squared residuals: <span class="math inline">\(\sum_{i=1}^n r_i^2\)</span>.</p>
<p>It is at this step that we can split up the computation. We have the master send out the parameters (e.g. <span class="math inline">\(\beta_0, \beta_1\)</span>, etc.) for the computation to each of the participating sites. I can re-create this scenario locally by simulating two separate data sets.</p>
<pre class="r"><code># general function for simulating a sample data set given parameters
sim.data &lt;- function(mu, sig, amt, seed, mpar, nl){
  # Simulate data for the practice
  set.seed(seed)
  x &lt;- rnorm(n=amt, mean=mu, sd=sig)
  # create the &quot;true&quot; equation for the regression
  a.true &lt;- mpar[1]
  b.true &lt;- mpar[2]
  y &lt;- x*a.true + b.true
  # set the noise level
  noise &lt;- rnorm(n=amt, mean=0, sd=nl)
  d &lt;- cbind(x, y, y + noise)
  colnames(d) &lt;- c(&quot;x&quot;, &quot;y_true&quot;, &quot;y&quot;)
  return(as.data.frame(d))
}
sim.data1 &lt;- sim.data(10, 2, 100, 2020, c(2,8), 1)
sim.data2 &lt;- sim.data(10, 2, 100, 2019, c(2,8), 1)
sites &lt;- list(site1 = list(data=sim.data1), site2 = list(data=sim.data2))
head(sim.data1)</code></pre>
<pre><code>## x y_true y
## 1 10.753944 29.50789 27.77910
## 2 10.603097 29.20619 28.21493
## 3 7.803954 23.60791 23.02240
## 4 7.739188 23.47838 23.86190
## 5 4.406931 16.81386 17.56053
## 6 11.441147 30.88229 29.95387</code></pre>
<pre class="r"><code>head(sim.data2)</code></pre>
<pre><code>## x y_true y
## 1 11.477045 30.95409 30.10904
## 2 8.970479 25.94096 26.79889
## 3 6.719637 21.43927 20.75567
## 4 11.832074 31.66415 31.65345
## 5 7.465036 22.93007 21.52591
## 6 11.476496 30.95299 32.34477</code></pre>
<p>Here we simulate two data sets with 100 observations, a mean of 10, and a standard deviation of 2. The true values for the linear model are a slope of 2 and an intercept of 8.</p>
<p>Then each site will calculate the residuals from its own data and send back the summary statistic - the sum of its squared residuals.</p>
<pre class="r"><code># define a residual sum of squares function to handle multiple sites
multi.min.RSS &lt;- function(sites, par){
  rs &lt;- 0
  # calculate the residuals from each data source
  for(site in sites){
    tmps &lt;- par[1] + par[2] * site$data$x - site$data$y
    rs &lt;- rs + sum(tmps^2)
  }
  # return the sum of the squared residuals
  return(rs)
}</code></pre>
<p>All that is left to do is minimize the combined RSS over the candidate parameters. We can use base R’s <code>optim</code> function to do this.</p>
<pre class="r"><code>param.fit &lt;- optim(par=c(0,1),
                   fn = multi.min.RSS,
                   hessian = TRUE,
                   sites=sites)
print(&quot;Distributed linear model results:&quot;)</code></pre>
<pre><code>## [1] &quot;Distributed linear model results:&quot;</code></pre>
<pre class="r"><code>print(paste(&quot;Intercept: &quot;, param.fit$par[1], &quot; Slope: &quot;, param.fit$par[2]))</code></pre>
<pre><code>## [1] &quot;Intercept: 7.77183635600249 Slope: 2.00840909246344&quot;</code></pre>
<p>We can compare the results to R’s built-in linear model function, <code>lm</code>, by stacking the data from the two “sites” and running a linear model on the full data set.</p>
<pre class="r"><code># stack the data frames vertically for later verification
sim.data3 &lt;- as.data.frame(rbind(sim.data1, sim.data2))
print(&quot;Base R linear model on the full data set:&quot;)</code></pre>
<pre><code>## [1] &quot;Base R linear model on the full data set:&quot;</code></pre>
<pre class="r"><code>lm(y~x, data=sim.data3)</code></pre>
<pre><code>##
## Call:
## lm(formula = y ~ x, data = sim.data3)
##
## Coefficients:
## (Intercept) x
## 7.776 2.008</code></pre>
<p>Pretty similar!</p>
<p>Click <a href="https://rextester.com/TDCUWC73705">here</a> to run the code.</p>
</div>
</description>
</item>
<item>
<title>Factors Associated With Acute Pain Estimation, Postoperative Pain Resolution, Opioid Cessation, and Recovery</title>
<link>http://emcramer.github.io/publication/hah-2019/</link>
<pubDate>Fri, 01 Mar 2019 00:00:00 +0000</pubDate>
<guid>http://emcramer.github.io/publication/hah-2019/</guid>
<description><script type="text/javascript" src="https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js"></script><div class="altmetric-embed" data-badge-type="donut" data-altmetric-id="56321663" />
</description>
</item>
<item>
<title>Predicting the Incidence of Pressure Ulcers in the Intensive Care Unit Using Machine Learning</title>
<link>http://emcramer.github.io/publication/cramer-2019/</link>
<pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
<guid>http://emcramer.github.io/publication/cramer-2019/</guid>
<description></description>
</item>
<item>
<title>The somatic distribution of chronic pain and emotional distress utilizing the collaborative health outcomes information registry (CHOIR) bodymap</title>
<link>http://emcramer.github.io/publication/cramer-2018/</link>
<pubDate>Thu, 01 Mar 2018 00:00:00 +0000</pubDate>
<guid>http://emcramer.github.io/publication/cramer-2018/</guid>
<description></description>
</item>
</channel>
</rss>