Shut down if all initial inputs crash #276

ThorbjoernSchulz · 2019-10-09T12:55:44Z

Regarding #262

The problem:
If a Fuzz function crashes on every initial input provided, the fuzzer
still executes and posts stats to stdout. However, since no inputs
are provided to the workers, nothing meaningful happens.
The behavior is confusing because of two points:

1. No proper logging is done. The coordinator continues to execute and
   posts stats.
2. Crashing inputs in general are allowed. The problem arises only if
   all initial inputs crash.

The problem in more detail:
The coordinator maps the initial corpus into memory. If no corpus is
provided, the coordinator creates a default input (the empty input).
Workers then access the corpus via the hub, not via the coordinator.
Before this happens, they are given the initial corpus files and
test them. If the Fuzz function crashes on these inputs, they are
never passed to the hub and are therefore not available to the
workers after this initial stage.

The proposed solution:
After the initial triage stage, we wait a short amount of time for
synchronization purposes. Then we check if the corpus is still empty.
If it is, we conclude that the target crashed on every input and
shut down the fuzzer with a corresponding error message.

Please comment if the solution is lacking in any way.
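
The idea above can be illustrated with a minimal, synchronous sketch. It is not the patch itself: go-fuzz runs the target in a separate process and goes through the hub, whereas this sketch catches panics in-process with `recover` and therefore needs no wait. The names `triage`, `crashes`, `checkCorpus` and `errAllInputsCrashed` are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errAllInputsCrashed is a hypothetical sentinel error; go-fuzz does not define it.
var errAllInputsCrashed = errors.New("fuzz function crashed on every initial input")

// crashes reports whether fuzz panics on data. Real crashes happen in a
// separate testee process; recover is a simplification for this sketch.
func crashes(fuzz func([]byte) int, data []byte) (crashed bool) {
	defer func() {
		if r := recover(); r != nil {
			crashed = true
		}
	}()
	fuzz(data)
	return false
}

// triage keeps only the seeds that do not crash, mimicking the fact that
// crashing inputs are never passed on to the hub.
func triage(seeds [][]byte, fuzz func([]byte) int) [][]byte {
	var corpus [][]byte
	for _, s := range seeds {
		if !crashes(fuzz, s) {
			corpus = append(corpus, s)
		}
	}
	return corpus
}

// checkCorpus returns an error when nothing survived the initial triage.
func checkCorpus(corpus [][]byte) error {
	if len(corpus) == 0 {
		return errAllInputsCrashed
	}
	return nil
}

func main() {
	alwaysPanics := func([]byte) int { panic("boom") }
	corpus := triage([][]byte{{}, []byte("a")}, alwaysPanics)
	if err := checkCorpus(corpus); err != nil {
		fmt.Println("shutting down:", err)
	}
}
```

In the real fuzzer the check would have to run only after every worker has finished its share of the initial triage, which is why the description mentions a short wait.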

klauspost · 2019-12-17T15:43:43Z

I tried out your patch, and things still stop processing, but quite randomly.

However, the more workers I run, the more likely a lockup seems to be. Weird.

Restarting sometimes allows it to progress. No critique, but I am not sure this fixes all cases of #262.

3 executions:

 go-fuzz -v=0 -minimize=5s -bin=fuzz-build.zip -workdir=corpus -procs=32
2019/12/17 16:24:17 workers: 32, corpus: 5203 (2s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2019/12/17 16:24:20 workers: 32, corpus: 5203 (5s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 2009, uptime: 6s
2019/12/17 16:24:23 workers: 32, corpus: 5203 (8s ago), crashers: 0, restarts: 1/6117, execs: 110121 (11814/sec), cover: 2009, uptime: 9s
2019/12/17 16:24:26 workers: 32, corpus: 5203 (11s ago), crashers: 0, restarts: 1/6117, execs: 110121 (8937/sec), cover: 2009, uptime: 12s
2019/12/17 16:24:29 workers: 32, corpus: 5203 (14s ago), crashers: 0, restarts: 1/6117, execs: 110121 (7187/sec), cover: 2009, uptime: 15s

 go-fuzz -v=0 -minimize=5s -bin=fuzz-build.zip -workdir=corpus -procs=32
2019/12/17 16:24:58 workers: 32, corpus: 5203 (3s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2019/12/17 16:25:01 workers: 32, corpus: 5203 (6s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 2009, uptime: 6s
2019/12/17 16:25:04 workers: 32, corpus: 5203 (9s ago), crashers: 0, restarts: 1/5274, execs: 105488 (11322/sec), cover: 2009, uptime: 9s
2019/12/17 16:25:07 workers: 32, corpus: 5203 (12s ago), crashers: 0, restarts: 1/5274, execs: 105488 (8565/sec), cover: 2009, uptime: 12s
2019/12/17 16:25:10 workers: 32, corpus: 5203 (15s ago), crashers: 0, restarts: 1/5274, execs: 105488 (6887/sec), cover: 2009, uptime: 15s
2019/12/17 16:25:13 workers: 32, corpus: 5203 (18s ago), crashers: 0, restarts: 1/5274, execs: 105488 (5759/sec), cover: 2009, uptime: 18s

 go-fuzz -v=0 -minimize=5s -bin=fuzz-build.zip -workdir=corpus -procs=32
2019/12/17 16:25:30 workers: 32, corpus: 5209 (2s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2019/12/17 16:25:33 workers: 32, corpus: 5209 (5s ago), crashers: 0, restarts: 1/0, execs: 0 (0/sec), cover: 2009, uptime: 6s
2019/12/17 16:25:36 workers: 32, corpus: 5209 (8s ago), crashers: 0, restarts: 1/3750, execs: 112522 (12078/sec), cover: 2009, uptime: 9s
2019/12/17 16:25:39 workers: 32, corpus: 5209 (11s ago), crashers: 1, restarts: 1/4893, execs: 342542 (27811/sec), cover: 2009, uptime: 12s
2019/12/17 16:25:42 workers: 32, corpus: 5209 (14s ago), crashers: 1, restarts: 1/5554, execs: 622050 (40613/sec), cover: 2009, uptime: 15s
2019/12/17 16:25:45 workers: 32, corpus: 5209 (17s ago), crashers: 1, restarts: 1/6328, execs: 873388 (47684/sec), cover: 2009, uptime: 18s
2019/12/17 16:25:48 workers: 32, corpus: 5209 (20s ago), crashers: 1, restarts: 1/6785, execs: 1092409 (51248/sec), cover: 2009, uptime: 21s
[... keeps running]

In some of the cases where execution stops, one worker is running at full speed, but nothing appears to happen.

ThorbjoernSchulz · 2019-12-17T17:28:07Z

Thank you for the feedback. Can you share the fuzz function you used? I will try to reproduce this tomorrow and look further into it.

klauspost · 2019-12-17T17:44:22Z

@ThorbjoernSchulz

go get -u github.com/klauspost/simdjson-fuzz
go get -u github.com/fwessels/simdjson-go
cd $GOPATH/src/github.com/klauspost/simdjson-fuzz
go-fuzz-build -o=fuzz-build.zip -func=Fuzz .
go-fuzz -minimize=5s -bin=fuzz-build.zip -workdir=corpus -procs=16

Using verbose output it ends up with something like this:

2019/12/17 18:36:47 workers: 16, corpus: 5286 (2s ago), crashers: 24, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2019/12/17 18:36:47 hub: corpus=358 bootstrap=358 fuzz=0 minimize=0 versifier=0 smash=0 sonar=0
2019/12/17 18:36:47 worker 0: triageq=33 execs=17269 mininp=17227 mincrash=0 triage=42 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:47 worker 1: triageq=6 execs=6463 mininp=6452 mincrash=0 triage=12 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:47 worker 3: triageq=1 execs=2467 mininp=2465 mincrash=0 triage=3 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:47 worker 4: triageq=0 execs=14400 mininp=14391 mincrash=0 triage=9 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:47 worker 2: triageq=9 execs=4801 mininp=4670 mincrash=0 triage=132 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:47 worker 5: triageq=5 execs=15234 mininp=15216 mincrash=0 triage=18 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:47 worker 9: triageq=3 execs=6651 mininp=6631 mincrash=0 triage=21 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:47 worker 8: triageq=0 execs=4350 mininp=4180 mincrash=0 triage=171 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:48 worker 6 triages coordinator input [24][56 4 134 248 50 207 33 67 208 107 249 95 120 27 102 230 89 83 77 157] minimized=true smashed=false
2019/12/17 18:36:48 worker 6: triageq=0 execs=3825 mininp=1320 mincrash=0 triage=2506 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:48 worker 10 triages coordinator input [7][131 169 22 178 212 36 154 81 152 182 248 40 29 202 223 43 254 252 190 50] minimized=true smashed=false
2019/12/17 18:36:48 worker 10: triageq=5 execs=3850 mininp=1420 mincrash=0 triage=2431 fuzz=0 versifier=0 smash=0 sonar=0 hint=0
2019/12/17 18:36:48 testee:
2019/12/17 18:36:48 testee:
2019/12/17 18:36:49 testee:
2019/12/17 18:36:49 testee:
2019/12/17 18:36:49 testee:
2019/12/17 18:36:49 testee:
2019/12/17 18:36:49 testee:
2019/12/17 18:36:50 workers: 16, corpus: 5286 (5s ago), crashers: 24, restarts: 1/0, execs: 0 (0/sec), cover: 2005, uptime: 6s
2019/12/17 18:36:50 hub: corpus=358 bootstrap=358 fuzz=0 minimize=0 versifier=0 smash=0 sonar=0
2019/12/17 18:36:53 workers: 16, corpus: 5286 (8s ago), crashers: 24, restarts: 1/6100, execs: 79310 (8500/sec), cover: 2005, uptime: 9s
2019/12/17 18:36:53 hub: corpus=358 bootstrap=358 fuzz=0 minimize=0 versifier=0 smash=0 sonar=0
2019/12/17 18:36:56 workers: 16, corpus: 5286 (11s ago), crashers: 24, restarts: 1/6100, execs: 79310 (6432/sec), cover: 2005, uptime: 12s
...

Adding more workers (I have 32 threads) seems to increase the chance of a deadlock.

And yes, the code does panic fairly often... The tested package is commit 56fc5c4ff6bb831811b4640aecdb998edebe545a (currently master)

go-fuzz/worker.go

The break statement accidentally slipped into the wrong block.

ThorbjoernSchulz · 2020-03-16T21:04:33Z

Looking into the CI results, the errors do not stem from these changes, do they?

dvyukov · 2020-03-18T09:00:46Z

go-fuzz/worker.go

+					// If the corpus is empty at this point, we have nothing to feed to our workers.
+					if x == 1 {
+						// Give the hub time to store the initial inputs
+						time.Sleep(100 * time.Millisecond)


If I am reading this correctly, we can fail spuriously if the hub is delayed a bit (overloaded machine, virtualization, etc.).
Also, we don't generally ask the user to add anything to the corpus; it is fine to start with an empty one. Also, if just part of the inputs crash, we will continue working; this radical difference in behavior between a partial crash and an all-inputs crash feels wrong. Also, the corpus may contain, say, just 1 input that happens to crash after a code update. Then the fuzzer won't start.
I agree the current behavior is bad. But I think a more useful way to fix this would be to let workers work even with an empty corpus. They could generate completely random inputs; it does not seem we can do anything better in the absence of any seeds.
Then, if just part of the inputs crash (and we were unlucky with the corpus; I think I saw cases where the initial empty input crashes, maybe due to a bug in the fuzz function itself), the fuzzer will gracefully recover. Otherwise, the crash count will go up indefinitely, which is, well, the expected behavior in such a case.
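
A rough sketch of that fallback, assuming a worker picks its next base input from a plain slice: when the slice is empty it generates random bytes instead of blocking. `nextInput`, `newRNG` and the `maxLen` cap are hypothetical names and not part of go-fuzz.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a seeded generator; a fixed seed keeps the sketch reproducible.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// nextInput returns a copy of a random corpus entry, or, when the corpus is
// empty, a buffer of random bytes of length 0..maxLen (maxLen must be >= 0).
func nextInput(rng *rand.Rand, corpus [][]byte, maxLen int) []byte {
	if len(corpus) > 0 {
		src := corpus[rng.Intn(len(corpus))]
		// Copy so later mutation does not modify the stored corpus entry.
		return append([]byte(nil), src...)
	}
	buf := make([]byte, rng.Intn(maxLen+1))
	rng.Read(buf) // (*rand.Rand).Read always fills buf and returns a nil error
	return buf
}

func main() {
	rng := newRNG(1)
	in := nextInput(rng, nil, 16)
	fmt.Printf("empty corpus: generated %d random bytes\n", len(in))

	in = nextInput(rng, [][]byte{[]byte("seed")}, 16)
	fmt.Printf("non-empty corpus: picked %q\n", in)
}
```

With a fallback like this, a fuzz function that crashes on everything simply drives the crasher count up, and one that crashes only on the seeds can still make progress.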

It may be a bit tricky to preserve this part then:

	if len(ro.corpus) == 0 {
		// Some other worker triages corpus inputs.
		time.Sleep(100 * time.Millisecond)
		continue
	}

because we want to wait for the hub to deliver the initial inputs, but not if there are really 0 inputs. I think this bit has to do with the corpus inflation problem mentioned above:

	// Other workers are still triaging initial inputs.
	// Wait until they finish, otherwise we can generate
	// as if new interesting inputs that are not actually new
	// and thus unnecessary inflate corpus on every run.
	time.Sleep(100 * time.Millisecond)

But I hope there is some way to preserve it.
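
One possible way to keep the wait while still handling a genuinely empty corpus is to track how many initial inputs are still being triaged, separately from the corpus length. The sketch below assumes such a counter exists. `pendingTriage`, `state` and `decide` are hypothetical, and nothing in this thread shows that `ro` carries a field like this.

```go
package main

import "fmt"

type action int

const (
	wait       action = iota // initial inputs are still being triaged
	fuzzCorpus               // triage finished, corpus has entries to mutate
	fuzzRandom               // triage finished and nothing survived
)

// state is a hypothetical snapshot of what a worker can observe.
type state struct {
	corpusLen     int // inputs currently available from the hub
	pendingTriage int // initial inputs not yet triaged by any worker
}

// decide keeps the existing "wait for triage" behavior (which avoids
// inflating the corpus) but no longer waits forever on an empty corpus.
func decide(s state) action {
	switch {
	case s.pendingTriage > 0:
		return wait
	case s.corpusLen == 0:
		return fuzzRandom
	default:
		return fuzzCorpus
	}
}

func main() {
	names := []string{"wait", "fuzzCorpus", "fuzzRandom"}
	for _, s := range []state{
		{corpusLen: 0, pendingTriage: 3},
		{corpusLen: 0, pendingTriage: 0},
		{corpusLen: 5, pendingTriage: 0},
	} {
		fmt.Printf("%+v -> %s\n", s, names[decide(s)])
	}
}
```

The point of the extra counter is that `len(ro.corpus) == 0` alone cannot tell "not delivered yet" apart from "nothing survived".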

dvyukov · 2020-03-18T09:01:42Z

Yes, the CI failure looks unrelated. There seems to be some change in darwin setup on travis.

dvyukov · 2020-03-18T09:16:31Z

Yes, the CI failure looks unrelated. There seems to be some change in darwin setup on travis.

This should be fixed now: be3528f

klauspost mentioned this pull request Feb 24, 2020

High process count+fast CPU -> deadlock #287

Closed

Break at the right spot

240e64a

The break statement accidentally slipped into the wrong block.

klauspost mentioned this pull request Jun 7, 2021

[dev.fuzz] cmd/go: hangs in "gathering baseline coverage" golang/go#46633

Closed
