This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

strongly connected components are not strongly connected? #348

Closed
arekfu opened this issue Apr 29, 2016 · 21 comments
Labels
bug (confirmed bug producing incorrect results)

Comments

@arekfu
Contributor

arekfu commented Apr 29, 2016

I generate strongly connected components of random graphs and subsequently check if they are strongly connected. The result is systematically false. Am I missing something?

import LightGraphs: strongly_connected_components, is_strongly_connected,
                    induced_subgraph, erdos_renyi

function scc_ok(graph)
  scc = strongly_connected_components(graph)
  scc_as_subgraphs = map(i -> induced_subgraph(graph, i), scc)
  return all(is_strongly_connected, scc_as_subgraphs)
end

println(all(scc_ok, [erdos_renyi(400, 0.01, is_directed=true) for i in 1:100]))
@sbromberger
Owner

Yup, this definitely smells like a bug, though it's not "systematically" false. I've got one graph that does it.

The graph: https://gist.github.com/sbromberger/af5912dd34190a8a2ed6e0ff9d0c0974

The subgraph that fails the check: https://gist.github.com/sbromberger/21bc112240f0308fc63de8e198fe2715

sbromberger added the bug label (confirmed bug producing incorrect results) on Apr 29, 2016
@CarloLucibello
Contributor

Could this be a bug introduced in #332 ?

Can someone try the same test on a previous commit? I don't have time at the moment, but I can get to it in the next few days.

@sbromberger
Owner

I can, but there are tons of warnings thrown on the new build: see https://gist.github.com/sbromberger/6a81432250b093b09d442a008177ab94. I'll file a new issue.

@sbromberger
Owner

@CarloLucibello looks like you may be off the hook - this problem existed in 0.5.2.

@sbromberger
Owner

sbromberger commented Apr 29, 2016

sccfail.jgz.zip

Unzip this, then load sccfail.jgz. The big graph is g, the one that fails the scc test is bad.

@arekfu
Contributor Author

arekfu commented Apr 30, 2016

Incidentally, I think it would be a good idea to make extensive use of the testing strategy in the example above (asserting invariants on several instances of random graphs) in the package tests. Do you currently do this?

@CarloLucibello
Contributor

A smaller graph that fails the test:

julia> println(scc_ok(erdos_renyi(20, 0.1, is_directed=true, seed=17)))
false

@CarloLucibello
Contributor

> Incidentally, I think it would be a good idea to make extensive use of the testing strategy in the example above (asserting invariants on several instances of random graphs) in the package tests. Do you currently do this?

We mostly do tests on handmade small graphs, and we certainly could check for more invariants. It would be nice to see more functions like scc_ok in the test suite.

@CarloLucibello
Contributor

Smaller still:

julia> ok = scc_ok(erdos_renyi(11, 0.12, is_directed=true, seed=17))
false

@sbromberger
Owner

> Incidentally, I think it would be a good idea to make extensive use of the testing strategy in the example above (asserting invariants on several instances of random graphs) in the package tests. Do you currently do this?

No, because by definition they are not deterministic tests and can yield varying results. See #323 for more discussion.

@arekfu
Contributor Author

arekfu commented Apr 30, 2016

They are deterministic if you specify the seeds, aren't they?
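A quick sketch of that claim in plain Julia (nothing LightGraphs-specific; within a single Julia version, the same seed reproduces the same stream, so any failing seed can be replayed):

```julia
using Random  # MersenneTwister lives in the Random stdlib on modern Julia

# Same seed ⇒ same stream, within a given Julia version.
# Cross-version or cross-platform stream stability is the separate
# question raised below.
a = rand(MersenneTwister(17), 5)
b = rand(MersenneTwister(17), 5)
println(a == b)
```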

@sbromberger
Owner

I'm not sure determinism crosses system bounds.

@arekfu
Contributor Author

arekfu commented Apr 30, 2016

Still (last comment about this side issue), testing on random graphs might be nondeterministic and you might run into spurious failures now and then, but at the same time it would enormously improve your coverage. If your code works on thousands of random graphs, the probability that it contains a bug becomes negligible.
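The proposed harness could look something like the sketch below. Here invariant_holds is a hypothetical placeholder for a real predicate such as scc_ok; the point is the seeded loop that makes every failure reproducible, not the graph code:

```julia
using Random

# Hypothetical invariant standing in for a predicate like scc_ok.
# Trivially true here so the harness itself is runnable.
invariant_holds(rng) = length(randperm(rng, 10)) == 10

# Run the invariant on many seeded instances and collect failing seeds,
# so any failure can be replayed by re-running with that seed.
failing = [seed for seed in 1:100 if !invariant_holds(MersenneTwister(seed))]
println(isempty(failing) ? "all seeds pass" : "failing seeds: $failing")
```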

@sbromberger
Owner

That's certainly true, but the set of functions we could run these on is limited to those that return a boolean (or can be coerced to one), since we'd use any/all. Centrality measures, for example, can't be tested this way. Balance this against the time it takes to run tests, and it's not a straightforward "let's definitely do this".

That said, I do like the idea as long as we're not imposing too much of a burden on the testing framework. It might be worthwhile to see what functions can benefit from tests of random graphs and then make sure they don't take forever to run.

(...and I hope this isn't really your last comment on this issue, since it's worthy of discussion - please reconsider.)

@sbromberger
Owner

This is still a bug.

@arekfu
Contributor Author

arekfu commented May 2, 2016

Even smaller:

erdos_renyi(4, 0.25, seed=436, is_directed=true)

@arekfu
Contributor Author

arekfu commented May 2, 2016

OK, so this is interesting:

julia> g=erdos_renyi(4, 0.25, seed=436, is_directed=true)
{4, 4} directed graph

julia> collect(edges(g))
4-element Array{Pair{Int64,Int64},1}:
 edge 1 - 3
 edge 1 - 4
 edge 2 - 3
 edge 4 - 2

julia> g2=DiGraph(4);  add_edge!(g2, 1, 2); add_edge!(g2, 2, 4); add_edge!(g2, 4, 3); add_edge!(g2, 1, 3);

julia> scc_ok(g)
false

julia> scc_ok(g2)
true

The interesting point is that g and g2 are isomorphic (g2 is g with vertices 2 and 4 exchanged). So I instrumented the calls to the TarjanVisitor methods (forgive my OO jargon) and I got:

julia> scc_ok(g)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0,0,0,0],Array{Int64,1}[]), 1)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1],[1],[1,0,0,0],Array{Int64,1}[]), 1, 3, 0, 0)
INFO: discover_vertex!(LightGraphs.TarjanVisitor([1],[1],[1,0,0,0],Array{Int64,1}[]), 3)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1,3],[1,2],[1,0,2,0],Array{Int64,1}[]), 3)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1],[1],[1,0,2,0],[[3]]), 1, 4, 0, 0)
INFO: discover_vertex!(LightGraphs.TarjanVisitor([1],[1],[1,0,2,0],[[3]]), 4)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1,4],[1,2],[1,0,2,2],[[3]]), 4, 2, 0, 0)
INFO: discover_vertex!(LightGraphs.TarjanVisitor([1,4],[1,2],[1,0,2,2],[[3]]), 2)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1,4,2],[1,2,3],[1,3,2,2],[[3]]), 2, 3, 2, 0)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1,4,2],[1,2],[1,3,2,2],[[3]]), 2)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1,4,2],[1,2],[1,3,2,2],[[3]]), 4)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1,3,2,2],[[3],[4,2]]), 1)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0],Array{Int64,1}[]), 1)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1],Array{Int64,1}[]), 1)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0,0],Array{Int64,1}[]), 1)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1],[1],[1,0],Array{Int64,1}[]), 1, 2, 0, 0)
INFO: discover_vertex!(LightGraphs.TarjanVisitor([1],[1],[1,0],Array{Int64,1}[]), 2)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1,2],[1,2],[1,2],Array{Int64,1}[]), 2)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1,2],[[2]]), 1)
false

julia> scc_ok(g2)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0,0,0,0],Array{Int64,1}[]), 1)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1],[1],[1,0,0,0],Array{Int64,1}[]), 1, 2, 0, 0)
INFO: discover_vertex!(LightGraphs.TarjanVisitor([1],[1],[1,0,0,0],Array{Int64,1}[]), 2)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1,2],[1,2],[1,2,0,0],Array{Int64,1}[]), 2, 4, 0, 0)
INFO: discover_vertex!(LightGraphs.TarjanVisitor([1,2],[1,2],[1,2,0,0],Array{Int64,1}[]), 4)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1,2,4],[1,2,3],[1,2,0,3],Array{Int64,1}[]), 4, 3, 0, 0)
INFO: discover_vertex!(LightGraphs.TarjanVisitor([1,2,4],[1,2,3],[1,2,0,3],Array{Int64,1}[]), 3)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1,2,4,3],[1,2,3,4],[1,2,4,3],Array{Int64,1}[]), 3)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1,2,4],[1,2,3],[1,2,4,3],[[3]]), 4)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1,2],[1,2],[1,2,4,3],[[3],[4]]), 2)
INFO: examine_neighbor!(LightGraphs.TarjanVisitor([1],[1],[1,2,4,3],[[3],[4],[2]]), 1, 3, 4, 0)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1,2,4,3],[[3],[4],[2]]), 1)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0],Array{Int64,1}[]), 1)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1],Array{Int64,1}[]), 1)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0],Array{Int64,1}[]), 1)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1],Array{Int64,1}[]), 1)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0],Array{Int64,1}[]), 1)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1],Array{Int64,1}[]), 1)
INFO: discover_vertex!(LightGraphs.TarjanVisitor(Int64[],Int64[],[0],Array{Int64,1}[]), 1)
INFO: close_vertex!(LightGraphs.TarjanVisitor([1],[1],[1],Array{Int64,1}[]), 1)
true

@arekfu
Contributor Author

arekfu commented May 2, 2016

Pull request submitted.

@arekfu
Contributor Author

arekfu commented May 2, 2016

@sbromberger concerning the general issue of testing on random inputs, I think the non-determinism is not really an issue if you can track which seeds caused the tests to fail. Also, in my domain (Monte Carlo simulations), a pseudo-random number generator that yields different sequences for the same seed on different systems would probably be considered buggy.

I have no idea how long your tests take right now, but you could amortize the time necessary to generate the test graphs by generating them once and reusing them for all your testable predicates.

Finally, I think this strategy could apply to non-boolean functions, too, provided that you can say something non-trivial about the probability distribution of your statistics. Take a centrality measure, for instance, and construct a confidence interval for the resulting value. Your test fails if the sample statistic falls outside the confidence interval. This requires you to set a significance level, i.e. the false-positive rate that you are ready to tolerate.
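That idea can be sketched without any graph machinery. The toy statistic and the values of μ, σ, and z below are illustrative assumptions, not real centrality numbers:

```julia
using Random, Statistics

# Check that a seeded sample mean of a statistic falls inside a
# z-interval around its known expectation; z sets the tolerated
# false-positive rate.
function ci_test(statistic, n, μ, σ; z = 6.0, seed = 1)
    rng = MersenneTwister(seed)
    xs = [statistic(rng) for _ in 1:n]
    abs(mean(xs) - μ) <= z * σ / sqrt(n)
end

# Toy statistic: Uniform(0,1) draws, so μ = 0.5 and σ = sqrt(1/12).
println(ci_test(rand, 10_000, 0.5, sqrt(1/12)))
```

With a fixed seed the test is fully deterministic; the significance level only matters when choosing how wide an interval to accept.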

@sbromberger
Owner

@arekfu I like the idea of MC simulation. If you have some time, would you mind mocking up a few example tests?

@sbromberger
Owner

Closed by #351
