---
title: "Automation"
quote:
  text: "The three rules of the Librarians of Time and Space are: 1) Silence; 2) Books must be returned no later than the last date shown; and 3) Do not interfere with the nature of causality."
  author: "Terry Pratchett"
---
# What problems are we trying to solve?
- Reproducibility
  - "How did I produce Figure 3?"
- Transferability
  - "How did they produce Figure 3?"
- Efficiency
  - "…and then you type this, and then you type that…"
---
# Make and its children
- Make was created in 1976 by a summer intern to recompile C programs
  - "If these files are out of date, re-create them"
- Dozens of imitators, but none have achieved critical mass
  - Use drake for R
  - Use snakemake for Python
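
A minimal sketch of the "out of date" test, assuming file modification times are a good enough signal (this is what Make compares; other tools may look at file contents instead). The function name `is_out_of_date` is made up for illustration:

```python
import os


def is_out_of_date(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # Any source modified after the target means the target must be re-created.
    return any(os.path.getmtime(src) > target_time for src in sources)
```

A build manager runs a check like this for every target and only re-runs the commands whose answer is `True`.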
---
# Dependencies
- A depends on B
- B depends on C
- So if C changes:
  - Re-create B
  - Then re-create A
- Don't re-create D unless E changes
{% include fig img="dependencies.png" alt="Dependencies" %}
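
The reasoning on this slide can be sketched as a walk over a dependency table. This is an illustration only, assuming the graph has no cycles; `DEPENDS_ON` and `rebuild_order` are hypothetical names, not part of any build tool:

```python
# Each target maps to the things it is built from (names follow the slide).
DEPENDS_ON = {"A": ["B"], "B": ["C"], "D": ["E"]}


def rebuild_order(changed, depends_on):
    """List the targets to re-create after `changed` is edited, dependencies first."""
    stale = {}   # memo: target -> does it need re-creating?
    order = []

    def visit(target):
        if target in stale:
            return stale[target]
        sources = depends_on.get(target, [])
        # Check every source (no short-circuit) so stale sources are queued first.
        results = [src == changed or visit(src) for src in sources]
        stale[target] = any(results)
        if stale[target]:
            order.append(target)
        return stale[target]

    for target in depends_on:
        visit(target)
    return order


print(rebuild_order("C", DEPENDS_ON))  # ['B', 'A']
print(rebuild_order("E", DEPENDS_ON))  # ['D']
```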
---
# Rules
- A depends on all of the B<sub>i</sub>
- Each B<sub>i</sub> depends on a C<sub>i</sub>
- So if some C<sub>i</sub> change:
  - Re-create those B<sub>i</sub>
  - Then re-create A
- Write one rule for C<sub>i</sub> → B<sub>i</sub>
  - Just like you would write a function
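
To make the function analogy concrete, here is a sketch in which one rule is literally one function. The `.csv` to `.png` naming pattern and the `plot.py` script are assumptions chosen for the example:

```python
from pathlib import Path


def rule(source):
    """One rule for every C_i -> B_i: name the target and the command that makes it."""
    target = Path(source).with_suffix(".png")
    # plot.py is a hypothetical script that turns one data file into one figure.
    command = ["python", "plot.py", str(source), str(target)]
    return target, command


def plan(sources):
    """Apply the same rule to each C_i, like calling a function in a loop."""
    return [rule(src) for src in sources]
```

Adding a new C<sub>i</sub> then needs no new rule: `plan` picks it up as long as its name fits the pattern.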
---
# Automation and notebooks
- Notebooks designed for interactive operation
  - Linear sequence of steps, short execution times
- Build managers designed for batch operation
  - Complex interactions, possibly long execution times
- Use notebooks as steps in a build
  - Don't yet have good tools to express complex dependencies in notebooks
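
One way to treat a notebook as a build step is to execute it from the command line with `jupyter nbconvert`. The sketch below assumes Jupyter is installed; the function names and the notebook file names in the usage note are placeholders:

```python
import subprocess


def notebook_command(notebook, output):
    """Build the command line that runs a notebook top to bottom without a browser."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute",
            notebook, "--output", output]


def run_notebook(notebook, output):
    """Run the notebook as one batch step; raises CalledProcessError on a non-zero exit."""
    subprocess.run(notebook_command(notebook, output), check=True)
```

A build rule could then call `run_notebook("analysis.ipynb", "analysis-run.ipynb")` whenever the notebook's inputs are out of date.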
---
<h1 class="project-lead">As project lead</h1>
- Make sure all repeated actions are in the build file
- Make sure they are documented in the build file
  - Because otherwise the docs will fall out of date
- Teach newcomers how to use the build system
  - Probably new to most of them
- Decide what should be in notebooks vs. packages
---
# Continuous integration
1. Contributor pushes to repository
1. GitHub notices a CI configuration file
1. Creates a fresh virtual machine
1. Clones a fresh copy of repository
1. Runs tests, creates website, etc.
1. Reports results
1. …all while you're making tea
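
The fresh-clone, run, report steps above can be imitated locally. This is only a sketch of the idea, not how GitHub implements it: a temporary directory stands in for the fresh virtual machine, and `run_ci` is a hypothetical name. It assumes `git` is installed:

```python
import subprocess
import tempfile


def run_ci(repo_url, commands, run=subprocess.run):
    """Clone a fresh copy into a throwaway directory, run each command, report results."""
    results = []
    with tempfile.TemporaryDirectory() as workdir:
        # A shallow clone of the current state is enough to run the checks.
        run(["git", "clone", "--depth", "1", repo_url, workdir], check=True)
        for command in commands:
            finished = run(command, cwd=workdir)
            # Record whether each step passed instead of stopping at the first failure.
            results.append((command, finished.returncode == 0))
    return results
```

The `run` parameter exists so the function can be exercised with a stand-in instead of real subprocesses.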
---
# Because
- Never forgets to do it
- Never forgets to tell people what it did
- Never forgets what it did
- Can do things on operating systems or software versions that individual developers don't have
- Documents the workflow
  - A checklist that is constantly checked
---
<h1 class="project-lead">As project lead</h1>
- Same advice as the build system
- Manage infrastructure
- Teach people how to use it
- Make sure they do
- Recalibrate your productivity measures
---
<h1 class="exercise">What do you do now?</h1>
1. What tasks do you repeat most often on your project?
1. What does a newcomer have to set up in order to do these things?
1. Where is this documented?
---
<h1 class="exercise">What alternatives should you consider?</h1>
1. What is the biggest difference between GitHub Actions and a service like Travis CI?
1. Which would be a better choice for your project? Why?