Commit b558e7f

Parcoords (#60)
* version-zero
* notebook
* update
* black
* parcoords
* moar
* moar
* docs
* version-update
* black
1 parent 2fe1c82 commit b558e7f

21 files changed: +6036 -23 lines changed

docs/api/interactive-charts.md

+4
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,7 @@
11
# `InteractiveCharts`
22

33
::: hulearn.experimental.InteractiveCharts
4+
5+
# `parallel_coordinates`
6+
7+
::: hulearn.experimental.parallel_coordinates

docs/guide/function-classifier/function-classifier.md

+61-2
@@ -138,10 +138,69 @@ grid = GridSearchCV(mod,
138138
grid.fit(X, y)
139139
```
140140

141+
## Guidance
142+
143+
Human Learn doesn't just allow you to turn functions into classifiers. It also tries
144+
to help you find rules that could be useful. In particular, an interactive parallel
145+
coordinates chart could be very helpful here.
146+
147+
You can create a parallel coordinates chart directly inside a Jupyter notebook.
148+
149+
```python
150+
from hulearn.experimental.interactive import parallel_coordinates
151+
parallel_coordinates(df, label="survived", height=200)
152+
```
153+
154+
What follows next are some explorations of the dataset. They are based on the scene
155+
from the Titanic movie where they yell "Women and Children First!". So let's see if
156+
we can confirm whether this holds true.
157+
158+
### Explore
159+
160+
![](parcoords1.gif)
161+
162+
It indeed seems that women in 1st/2nd class have a high chance of surviving.
163+
164+
![](parcoords2.gif)
165+
166+
It also seems that male children have an increased chance of survival, but only
167+
if they were travelling 1st/2nd class.
168+
169+
### Grid
170+
171+
Here's a lovely observation. By doing exploratory analysis we not only understand the
172+
data better but we can now also turn the patterns that we've observed into a model!
173+
174+
```python
175+
def make_prediction(dataf, age=15):
176+
women_rule = (dataf['pclass'] < 3.0) & (dataf['sex'] == "female")
177+
children_rule = (dataf['pclass'] < 3.0) & (dataf['age'] <= age)
178+
return women_rule | children_rule
179+
180+
mod = FunctionClassifier(make_prediction)
181+
```
182+
183+
We're even able to use grid-search again to find the optimal threshold for `"age"`.
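
One way to picture this grid search is a plain loop over candidate thresholds. The sketch below is not the library's implementation and does not use `GridSearchCV`; it assumes a tiny hand-made list of passenger records (the rows, `accuracy` and `candidate_ages` are hypothetical) and applies the same rule as `make_prediction` to plain dictionaries instead of a dataframe.

```python
# Hypothetical toy records; these are NOT rows from the real Titanic dataset.
rows = [
    {"pclass": 1, "sex": "female", "age": 30, "survived": 1},
    {"pclass": 2, "sex": "male", "age": 8, "survived": 1},
    {"pclass": 2, "sex": "male", "age": 14, "survived": 0},
    {"pclass": 3, "sex": "female", "age": 22, "survived": 0},
    {"pclass": 1, "sex": "male", "age": 40, "survived": 0},
    {"pclass": 3, "sex": "male", "age": 5, "survived": 0},
]


def make_prediction(row, age=15):
    """Same rule as in the guide, applied to one dictionary instead of a dataframe."""
    women_rule = row["pclass"] < 3 and row["sex"] == "female"
    children_rule = row["pclass"] < 3 and row["age"] <= age
    return int(women_rule or children_rule)


def accuracy(rows, age):
    """Fraction of rows where the rule agrees with the label."""
    hits = sum(make_prediction(r, age=age) == r["survived"] for r in rows)
    return hits / len(rows)


# Try each candidate threshold and keep the one with the best accuracy.
candidate_ages = [5, 10, 15, 20]
scores = {age: accuracy(rows, age) for age in candidate_ages}
best_age = max(scores, key=scores.get)
print(best_age, scores[best_age])
```

On real data the same idea would run through cross-validation rather than a single pass over the training rows, which is what a grid-search utility adds on top of this loop.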
184+
185+
### Comparison
186+
187+
To compare our results we've also trained a `RandomForestClassifier`.
188+
Here's how the models compare:
189+
190+
| Model | accuracy | precision | recall |
191+
| --- | --- | --- | --- |
192+
| Women & Children Rule | 0.808157 | 0.952168 | 0.558621 |
193+
| RandomForestClassifier | 0.813869 | 0.785059 | 0.751724 |
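
The three columns can be read as ratios over the confusion counts. The sketch below uses made-up labels and predictions (not the values behind the table) and a hypothetical helper `classification_metrics` to show how a cautious rule can score high precision and low recall at the same time.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example: the model only predicts "survived" when it is very sure.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Every positive prediction here is correct, so precision is perfect, but half of the actual survivors are missed, so recall is low.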
194+
195+
It seems like our rule-based model is quite reasonable. A great follow-up exercise
196+
would be to try and understand when the random forest model disagrees with the
197+
rule-based system. This could lead us to understand more patterns in the data.
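
A minimal sketch of that follow-up exercise, assuming the two models' predictions are already available as plain lists. The rows, `rule_pred`, `forest_pred` and the `disagreements` helper are all hypothetical and only illustrate the idea of collecting the rows where the models differ.

```python
from collections import Counter


def disagreements(rows, pred_a, pred_b):
    """Return the rows on which two models give different predictions."""
    return [row for row, a, b in zip(rows, pred_a, pred_b) if a != b]


# Hypothetical passengers and predictions, for illustration only.
rows = [
    {"pclass": 1, "sex": "female"},
    {"pclass": 3, "sex": "female"},
    {"pclass": 3, "sex": "male"},
    {"pclass": 2, "sex": "male"},
]
rule_pred = [1, 0, 0, 0]
forest_pred = [1, 1, 0, 1]

differing = disagreements(rows, rule_pred, forest_pred)

# Count the disagreements per passenger class to see where they cluster.
by_class = Counter(row["pclass"] for row in differing)
print(differing, by_class)
```

Grouping the differing rows by a feature, as `by_class` does, is one simple way to spot a pattern that could become a new rule.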
198+
141199
## Conclusion
142200

143-
In this guide we've seen the `FunctionClassifier`. It is one of the many models in this
144-
library that will help you construct more "human" models.
201+
In this guide we've seen the `FunctionClassifier` in action. It is one of the many
202+
models in this library that will help you construct more "human" models. This component
203+
is very effective when it is combined with exploratory data analysis techniques.
145204

146205
### Notebook
147206

Binary file, 2.81 MB (not rendered)
Binary file, 3.07 MB (not rendered)

docs/guide/notebooks/01-function-classifier.ipynb

+3,021-8
Large diffs are not rendered by default.

docs/guide/notebooks/demo_data.json

+1-1
@@ -1 +1 @@
1-
[{"chart_id": "397ffa01-1", "x": "bill_length_mm", "y": "bill_depth_mm", "polygons": {"Adelie": {"bill_length_mm": [], "bill_depth_mm": []}, "Gentoo": {"bill_length_mm": [], "bill_depth_mm": []}, "Chinstrap": {"bill_length_mm": [], "bill_depth_mm": []}}}, {"chart_id": "71f2c74d-0", "x": "flipper_length_mm", "y": "body_mass_g", "polygons": {"Adelie": {"flipper_length_mm": [], "body_mass_g": []}, "Gentoo": {"flipper_length_mm": [], "body_mass_g": []}, "Chinstrap": {"flipper_length_mm": [], "body_mass_g": []}}}]
1+
[{"chart_id": "3fe365af-2", "x": "bill_length_mm", "y": "bill_depth_mm", "polygons": {"Adelie": {"bill_length_mm": [], "bill_depth_mm": []}, "Gentoo": {"bill_length_mm": [], "bill_depth_mm": []}, "Chinstrap": {"bill_length_mm": [], "bill_depth_mm": []}}}, {"chart_id": "d89db41b-0", "x": "flipper_length_mm", "y": "body_mass_g", "polygons": {"Adelie": {"flipper_length_mm": [], "body_mass_g": []}, "Gentoo": {"flipper_length_mm": [], "body_mass_g": []}, "Chinstrap": {"flipper_length_mm": [], "body_mass_g": []}}}]

hulearn/__init__.py

+1-1
@@ -1 +1 @@
1-
__version__ = "0.2.5"
1+
__version__ = "0.3.0"

hulearn/classification/functionclassifier.py

+2-2
@@ -66,11 +66,11 @@ def predict(self, X):
6666
return self.func(X, **self.kwargs)
6767

6868
def get_params(self, deep=True):
69-
""""""
69+
""" """
7070
return {**self.kwargs, "func": self.func}
7171

7272
def set_params(self, **params):
73-
""""""
73+
""" """
7474
for k, v in params.items():
7575
if k == "func":
7676
self.func = v
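
The `get_params` / `set_params` pair shown in this hunk is what lets a search utility read and change a function's keyword arguments. The stand-in below is only a sketch: `RuleModel` and `older_than` are hypothetical, and the `else` branch is an assumption, because the rest of `set_params` is not visible in this diff.

```python
class RuleModel:
    """Hypothetical stand-in that wraps a function and its keyword arguments."""

    def __init__(self, func, **kwargs):
        self.func = func
        self.kwargs = kwargs

    def predict(self, X):
        return self.func(X, **self.kwargs)

    def get_params(self, deep=True):
        # Keyword arguments are exposed next to the function itself.
        return {**self.kwargs, "func": self.func}

    def set_params(self, **params):
        for k, v in params.items():
            if k == "func":
                self.func = v
            else:
                # Assumption: any other key updates a keyword argument.
                self.kwargs[k] = v
        return self


def older_than(ages, threshold=18):
    """Toy rule: flag every age above the threshold."""
    return [a > threshold for a in ages]


model = RuleModel(older_than, threshold=18)
before = model.predict([10, 20, 30])
model.set_params(threshold=25)
after = model.predict([10, 20, 30])
print(before, after)
```

Because the threshold is reachable through `set_params`, a grid search can swap in each candidate value without knowing anything about the wrapped function.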

hulearn/experimental/__init__.py

+2-2
@@ -1,3 +1,3 @@
1-
from .interactive import InteractiveCharts
1+
from .interactive import InteractiveCharts, parallel_coordinates
22

3-
__all__ = ["InteractiveCharts"]
3+
__all__ = ["InteractiveCharts", "parallel_coordinates"]

hulearn/experimental/interactive.py

+60
@@ -1,7 +1,12 @@
1+
import os
12
import uuid
3+
import random
4+
import pathlib
5+
from string import Template
26
from pkg_resources import resource_filename
37

48
from clumper import Clumper
9+
from IPython.core.display import HTML
510
from bokeh.models import ColumnDataSource
611
from bokeh.plotting import figure, show
712
from bokeh.models import PolyDrawTool, PolyEditTool
@@ -217,3 +222,58 @@ def data(self):
217222
for k, v in self.poly_patches.items()
218223
},
219224
}
225+
226+
227+
def _random_string():
228+
"""Generates a random HTML id for d3 charts."""
229+
return "".join([random.choice("qwertyuiopasdfghjklzxcvbnm") for _ in range(6)])
230+
231+
232+
def parallel_coordinates(dataf, label, height=200):
233+
"""
234+
Creates an interactive parallel coordinates chart to help with classification tasks.
235+
236+
Arguments:
237+
dataf: the dataframe to render
238+
label: the column that represents the label, will be used for coloring
239+
height: the height of the chart, in pixels
240+
241+
Usage:
242+
243+
```python
244+
from hulearn.datasets import load_titanic
245+
from hulearn.experimental.interactive import parallel_coordinates
246+
247+
df = load_titanic(as_frame=True)
248+
parallel_coordinates(df, label="survived", height=200)
249+
```
250+
"""
251+
t = Template(
252+
pathlib.Path(
253+
resource_filename(
254+
"hulearn", os.path.join("static", "parcoords", "template.html")
255+
)
256+
).read_text()
257+
)
258+
d3_blob_path = resource_filename(
259+
"hulearn", os.path.join("static", "parcoords", "d3.min.js")
260+
)
261+
css_blob_path = resource_filename(
262+
"hulearn", os.path.join("static", "parcoords", "d3.parcoords.css")
263+
)
264+
js_blob_path = resource_filename(
265+
"hulearn", os.path.join("static", "parcoords", "d3.parcoords.js")
266+
)
267+
268+
json_data = dataf.rename(columns={label: "label"}).to_json(orient="records")
269+
rendered = t.substitute(
270+
{
271+
"data": json_data,
272+
"id": _random_string(),
273+
"style": pathlib.Path(css_blob_path).read_text(),
274+
"d3_blob": pathlib.Path(d3_blob_path).read_text(),
275+
"parcoords_stuff": pathlib.Path(js_blob_path).read_text(),
276+
"height": f"{height}px",
277+
}
278+
)
279+
return HTML(rendered)
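
The rendering step relies on `string.Template` from the standard library, which fills `$name` placeholders from a mapping. The sketch below uses a short inline template as a stand-in; the real `template.html` is not shown in this diff, so the markup, the `chart_abc` id and the sample records are all assumptions.

```python
import json
from string import Template

# Hypothetical inline template; the packaged template.html is not shown here.
template = Template(
    '<div id="$id" style="height:$height"></div>\n'
    "<script>var data = $data;</script>"
)

# Sample records in the shape produced by to_json(orient="records").
records = [{"age": 22, "label": 0}, {"age": 38, "label": 1}]

rendered = template.substitute(
    {
        "id": "chart_abc",  # stands in for the random HTML id
        "height": "200px",
        "data": json.dumps(records),
    }
)
print(rendered)
```

`substitute` raises a `KeyError` when a placeholder has no matching key, so a missing value is reported instead of being silently left in the page.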

hulearn/outlier/functionoutlier.py

+2-2
@@ -45,11 +45,11 @@ def predict(self, X):
4545
return self.func(X, **self.kwargs)
4646

4747
def get_params(self, deep=True):
48-
""""""
48+
""" """
4949
return {**self.kwargs, "func": self.func}
5050

5151
def set_params(self, **params):
52-
""""""
52+
""" """
5353
for k, v in params.items():
5454
if k == "func":
5555
self.func = v

hulearn/preprocessing/pipetransformer.py

+2-2
@@ -92,11 +92,11 @@ def transform(self, X):
9292
return self.func(X, **self.kwargs)
9393

9494
def get_params(self, deep=True):
95-
""""""
95+
""" """
9696
return {**self.kwargs, "func": self.func}
9797

9898
def set_params(self, **params):
99-
""""""
99+
""" """
100100
for k, v in params.items():
101101
if k == "func":
102102
self.func = v

hulearn/regression/functionregressor.py

+2-2
@@ -43,11 +43,11 @@ def predict(self, X):
4343
return self.func(X, **self.kwargs)
4444

4545
def get_params(self, deep=True):
46-
""""""
46+
""" """
4747
return {**self.kwargs, "func": self.func}
4848

4949
def set_params(self, **params):
50-
""""""
50+
""" """
5151
for k, v in params.items():
5252
if k == "func":
5353
self.func = v

hulearn/static/parcoords/LICENSE

+26
@@ -0,0 +1,26 @@
1+
Copyright (c) 2012, Kai Chang
2+
All rights reserved.
3+
4+
Redistribution and use in source and binary forms, with or without
5+
modification, are permitted provided that the following conditions are met:
6+
7+
* Redistributions of source code must retain the above copyright notice, this
8+
list of conditions and the following disclaimer.
9+
10+
* Redistributions in binary form must reproduce the above copyright notice,
11+
this list of conditions and the following disclaimer in the documentation
12+
and/or other materials provided with the distribution.
13+
14+
* The name Kai Chang may not be used to endorse or promote products
15+
derived from this software without specific prior written permission.
16+
17+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
18+
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
19+
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
20+
DISCLAIMED. IN NO EVENT SHALL MICHAEL BOSTOCK BE LIABLE FOR ANY DIRECT,
21+
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
22+
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
23+
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
24+
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
25+
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
26+
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

hulearn/static/parcoords/d3.min.js

+5
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

hulearn/static/parcoords/d3.parcoords.css

+32
@@ -0,0 +1,32 @@
1+
.parcoords > canvas {
2+
font: 14px sans-serif;
3+
position: absolute;
4+
}
5+
.parcoords > canvas {
6+
pointer-events: none;
7+
}
8+
.parcoords text.label {
9+
cursor: default;
10+
}
11+
.parcoords rect.background:hover {
12+
fill: rgba(120,120,120,0.2);
13+
}
14+
.parcoords canvas {
15+
opacity: 1;
16+
transition: opacity 0.3s;
17+
-moz-transition: opacity 0.3s;
18+
-webkit-transition: opacity 0.3s;
19+
-o-transition: opacity 0.3s;
20+
}
21+
.parcoords canvas.faded {
22+
opacity: 0.25;
23+
}
24+
.parcoords {
25+
-webkit-touch-callout: none;
26+
-webkit-user-select: none;
27+
-khtml-user-select: none;
28+
-moz-user-select: none;
29+
-ms-user-select: none;
30+
user-select: none;
31+
background-color: white;
32+
}
