* Before data, fix a distribution $$P\in M_1(\mathcal H)$$, the "prior"
* Based on data, learn a distribution $$Q\in M_1(\mathcal H)$$, the "posterior"
* Predictions:
** draw $$h\sim Q$$ and predict with the chosen $$h$$;
** each prediction uses a fresh random draw.

The @@color:red;risk measures@@ $$R_{in}(h)$$ and $$R_{out}(h)$$ are @@color:red;extended by averaging@@:

$$
R(Q) \equiv \int_{\mathcal H}R(h)\,dQ(h)
$$
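
A minimal sketch of the resulting Gibbs predictor, assuming a hypothetical toy setup (threshold classifiers on 1-d inputs with a uniform posterior; all names are illustrative): each call draws a fresh $$h\sim Q$$, and the risk of $$Q$$ is the $$Q$$-weighted average of the individual voters' risks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: H = 21 threshold classifiers on 1-d inputs,
# Q = uniform posterior over H (both chosen purely for illustration).
thresholds = np.linspace(-1.0, 1.0, 21)
Q = np.full(len(thresholds), 1.0 / len(thresholds))

def gibbs_predict(x):
    """Gibbs prediction: a fresh draw h ~ Q for every single prediction."""
    h = rng.choice(thresholds, p=Q)
    return 1 if x >= h else -1

def gibbs_risk(X, y):
    """R(Q) = integral of R(h) dQ(h): here a finite Q-weighted average
    of each voter's empirical 0-1 risk."""
    per_voter = [np.mean(np.where(X >= h, 1, -1) != y) for h in thresholds]
    return float(np.dot(Q, per_voter))
```
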
!!! PAC-Bayes vs Bayesian learning

* Prior
** PAC-Bayes: bounds hold for any prior distribution
** Bayes: the choice of prior impacts inference
* Posterior
** PAC-Bayes: bounds hold for any posterior distribution
** Bayes: the posterior is uniquely defined by the prior and the statistical model
* Data distribution
** PAC-Bayes: bounds hold for any data distribution
** Bayes: the randomness lies in the noise model generating the outputs

!!! A General PAC-Bayesian Theorem

$$\Delta$$-function: a "distance" between $$R_{in}(Q)$$ and $$R_{out}(Q)$$

Convex function $$\Delta: [0,1] \times [0,1]\rightarrow \mathbb R$$

For any distribution $$D$$ on $$\mathcal X\times \mathcal Y$$, for any set $$\mathcal H$$ of voters, for any distribution $$P$$ on $$\mathcal H$$, for any $$\delta\in(0, 1]$$, and for any $$\Delta$$-function, we have, with probability at least $$1-\delta$$ over the choice of $$S\sim D^m$$,

$$
\forall Q \text{ on } \mathcal H: \Delta(R_{in}(Q), R_{out}(Q))\le\frac1m\left[KL(Q\|P)+\ln\frac{\mathcal J_\Delta(m)}{\delta}\right]
$$

Proof idea: the change of measure inequality plus Markov's inequality.
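
One standard instantiation (a classical choice, not specific to these notes): take $$\Delta(q,p) = \mathrm{kl}(q\|p)$$, the KL divergence between Bernoulli distributions, for which $$\mathcal J_\Delta(m)\le 2\sqrt m$$; the bound can then be inverted numerically to upper-bound $$R_{out}(Q)$$. A hedged sketch, with purely illustrative numbers:

```python
import math

def kl_bernoulli(q, p):
    """kl(q||p): KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

def pac_bayes_kl_bound(r_in, kl_qp, m, delta):
    """Upper bound on R_out(Q) from kl(R_in(Q) || R_out(Q)) <= rhs,
    inverted by bisection (kl(q||p) is increasing in p for p >= q)."""
    rhs = (kl_qp + math.log(2.0 * math.sqrt(m) / delta)) / m
    lo, hi = r_in, 1.0
    for _ in range(60):  # 60 halvings: precision far below float noise
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(r_in, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi

# Illustrative numbers only: empirical Gibbs risk 0.05, KL(Q||P) = 5 nats,
# m = 10000 examples, confidence 1 - delta = 0.95.
print(pac_bayes_kl_bound(0.05, 5.0, 10000, 0.05))
```
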

* using part of the data to learn the prior for SVMs
* defining the prior in terms of the data-generating distribution (aka localised PAC-Bayes)

$$\eta$$-Prior SVM, for cases where the VC dimension approach does not work
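
For linear classifiers the prior-learning step has a simple effect on the bound (a standard Gaussian computation, under the assumption of an isotropic prior $$\mathcal N(\eta w_P, \sigma^2 I)$$ and posterior $$\mathcal N(w_Q, \sigma^2 I)$$ with equal variances): the KL term reduces to a squared distance between the centres, so a data-informed prior direction $$w_P$$ directly shrinks it,

$$
KL(Q\|P) = \frac{\|w_Q-\eta w_P\|^2}{2\sigma^2}
$$
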
* @@color:green;Bounds are tight@@
* @@color:green;Model selection from the bounds is as good as 10-fold cross-validation@@
* @@color:red;The better bounds do not appear to give better model selection@@

!! Performance of deep NNs

* For SVMs we can think of the margin as capturing the accuracy with which we need to estimate the weights
* If we have a deep network solution with a wide basin of good performance, we can take a similar approach using PAC-Bayes with a broad posterior around the solution (see the sketch after this list)
* Dziugaite and Roy, and Neyshabur et al., have derived some of the tightest deep learning bounds in this way
** by training to expand the basin of attraction
** hence not measuring the good generalisation of normal training
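
A minimal sketch of the broad-posterior idea, assuming a Gaussian posterior $$\mathcal N(w^*, \sigma^2 I)$$ around the trained weights; `perturbed_risk`, `flat`, and `sharp` are hypothetical names, and the quadratic "risks" are toy stand-ins for a network's loss surface:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbed_risk(w_star, sigma, risk_fn, n_draws=100):
    """Monte Carlo estimate of R_in(Q) for a broad posterior
    Q = N(w_star, sigma^2 I) centred on the trained weights."""
    total = 0.0
    for _ in range(n_draws):
        w = w_star + sigma * rng.standard_normal(w_star.shape)
        total += risk_fn(w)
    return total / n_draws

# Toy illustration: a flat basin tolerates a broad posterior, a sharp one does not.
w0 = np.zeros(50)
flat = lambda w: float(np.mean(0.1 * w ** 2))
sharp = lambda w: float(np.mean(10.0 * w ** 2))
print(perturbed_risk(w0, 0.3, flat), perturbed_risk(w0, 0.3, sharp))
```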