Tests for Broadcast #511

Open
bmyerz opened this issue Apr 29, 2016 · 6 comments

@bmyerz
Member

bmyerz commented Apr 29, 2016

Now that we are adding broadcast support in the catalog, I suspect that many tests would "break" if they were performed on broadcasted input relations.

The reason is that broadcast is intended to be a physical property, not a logical one: a broadcast relation has a full copy on every worker but still represents a single logical relation. Raco doesn't yet reason robustly about relations that are broadcast physically but not logically.

@bmyerz bmyerz added the test label Apr 29, 2016
@bmyerz bmyerz self-assigned this Apr 29, 2016
@senderista
Contributor

Something @mbalazin suggested this morning: we should test a single select on a broadcast relation to verify it is only run on one worker (presumably chosen at random).

@bmyerz
Member Author

bmyerz commented May 2, 2016

@mbalazin @senderista I think we can implement that as
Select(predicate())[Select(WORKER_ID()==0)[Broadcast[...
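To check the single-worker behavior without a running Myria cluster, one could emulate that plan in plain Python. This is a minimal sketch, assuming a broadcast relation is a full copy on every worker and that WORKER_ID() evaluates to the executing worker's ID; `broadcast`, `run_on_worker`, `predicate`, and the sample data are hypothetical, not Raco or MyriaX code.

```python
from collections import Counter

NUM_WORKERS = 4
relation = [(1, "a"), (2, "b"), (3, "c")]


def broadcast(rel, num_workers):
    """Hypothetical Broadcast: every worker receives a full copy of rel."""
    return {w: list(rel) for w in range(num_workers)}


def predicate(t):
    """Stand-in for predicate(): keep tuples whose first attribute is odd."""
    return t[0] % 2 == 1


def run_on_worker(worker_id, tuples):
    """Evaluate Select(predicate())[Select(WORKER_ID()==0)[...]] on one worker.

    WORKER_ID() is modeled as a constant equal to the executing worker's ID.
    """
    only_worker_0 = [t for t in tuples if worker_id == 0]
    return [t for t in only_worker_0 if predicate(t)]


per_worker = {
    w: run_on_worker(w, tuples)
    for w, tuples in broadcast(relation, NUM_WORKERS).items()
}
# Union of all workers' outputs, compared as a multiset (bag semantics).
result = Counter(t for out in per_worker.values() for t in out)
producing_workers = [w for w, out in per_worker.items() if out]

# The suggested test: the logical answer, produced by exactly one worker.
assert result == Counter(t for t in relation if predicate(t))
assert producing_workers == [0]
```

Comparing as a `Counter` rather than a set matters here: a set comparison would hide exactly the duplicate results that a missing WORKER_ID() filter would produce.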

For logical operators that turn into communicating physical operators (groupby, join, ...), we can either compute locally and retain the broadcast property on the output, or precede the operator with a Select(WORKER_ID()==0).

If the final Store does not indicate broadcast, then we put a Select(WORKER_ID()==0) at the top, which can be pushed down if desired.
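A minimal sketch of that rule as a plan rewrite. It uses toy, hypothetical node classes (`Scan`, `Select`, `Store` here are not Raco's operator classes) and only unary operators, so the broadcast property propagates straight up from the Scan.

```python
from dataclasses import dataclass
from typing import Union

WORKER_FILTER = "WORKER_ID() == 0"


@dataclass
class Scan:
    name: str
    broadcast: bool  # physical property, as the catalog would record it


@dataclass
class Select:
    predicate: str
    input: "Plan"


@dataclass
class Store:
    name: str
    broadcast: bool  # whether the stored output should stay replicated
    input: "Plan"


Plan = Union[Scan, Select, Store]


def is_broadcast(plan):
    """True if every worker holds the same full copy of this subtree's output."""
    if isinstance(plan, Scan):
        return plan.broadcast
    if isinstance(plan, Select) and plan.predicate == WORKER_FILTER:
        return False  # only worker 0 keeps its copy
    return is_broadcast(plan.input)


def add_worker_filter(store):
    """Add Select(WORKER_ID() == 0) on top if replicated data would be stored unreplicated."""
    if not store.broadcast and is_broadcast(store.input):
        return Store(store.name, store.broadcast, Select(WORKER_FILTER, store.input))
    return store


def push_down(plan):
    """Move the worker filter below ordinary Selects, toward the Scan.

    Both are per-tuple filters evaluated locally, so swapping them
    does not change the result.
    """
    if isinstance(plan, Store):
        return Store(plan.name, plan.broadcast, push_down(plan.input))
    if (isinstance(plan, Select) and plan.predicate == WORKER_FILTER
            and isinstance(plan.input, Select)):
        inner = plan.input
        return Select(inner.predicate, push_down(Select(WORKER_FILTER, inner.input)))
    return plan


plan = Store("out", False, Select("x > 1", Scan("R", True)))
rewritten = push_down(add_worker_filter(plan))
```

Here `rewritten` is `Store("out", False, Select("x > 1", Select("WORKER_ID() == 0", Scan("R", True))))`. The push-down stops at the first operator that is not a Select, since this toy has no rules for moving the filter past joins or aggregates.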

@bmyerz
Member Author

bmyerz commented May 2, 2016

It does occur to me that we have no tool in Raco for evaluating queries with respect to physical properties, only logical ones. Our FakeDB has a lot of no-ops because it is logical, for example:

def myriabroadcastproducer(self, op):

For testing Shuffle, what I did was inspect whether the plan has the features we expect. I think Broadcast actually has more correctness pitfalls, so I'm less satisfied with this level of testing. It is conceivable to extend our FakeDB evaluator to emulate a parallel query engine for testing purposes.
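As a rough idea of what such an extension might look like, here is a minimal, hypothetical evaluator (not Raco's FakeDB API) that keeps one bag of tuples per worker and compares results as multisets. It illustrates one Broadcast pitfall: a join with one partitioned side and one broadcast side is correct when computed locally, but a local join of two broadcast inputs yields one full copy of the answer per worker.

```python
from collections import Counter


class ToyParallelEvaluator:
    """Hypothetical parallel evaluator: a distributed relation is a dict
    mapping worker ID -> list of tuples."""

    def __init__(self, num_workers):
        self.num_workers = num_workers

    def scatter(self, rel):
        """Round-robin partition: each tuple lives on exactly one worker."""
        return {w: rel[w::self.num_workers] for w in range(self.num_workers)}

    def broadcast(self, parts):
        """Collect every worker's tuples and replicate the full bag everywhere."""
        everything = [t for w in sorted(parts) for t in parts[w]]
        return {w: list(everything) for w in range(self.num_workers)}

    def local_join(self, left, right, lkey, rkey):
        """Join on each worker independently, with no communication."""
        return {
            w: [l + r for l in left[w] for r in right[w] if l[lkey] == r[rkey]]
            for w in range(self.num_workers)
        }

    def gather(self, parts):
        """Union all workers' outputs as a multiset, as a final Store sees them."""
        return Counter(t for w in parts for t in parts[w])


def logical_join(left, right, lkey, rkey):
    """Reference answer from a purely logical evaluation."""
    return Counter(l + r for l in left for r in right if l[lkey] == r[rkey])


R = [(1, "x"), (2, "y"), (3, "z")]
S = [(1, 10), (2, 20), (2, 21)]
ev = ToyParallelEvaluator(num_workers=3)
expected = logical_join(R, S, 0, 0)

# Partitioned R joined with broadcast S: each R tuple meets all of S once.
ok = ev.gather(ev.local_join(ev.scatter(R), ev.broadcast(ev.scatter(S)), 0, 0))

# Both inputs broadcast: every worker computes the full join.
dup = ev.gather(ev.local_join(ev.broadcast(ev.scatter(R)),
                              ev.broadcast(ev.scatter(S)), 0, 0))
```

A plan-inspection test would accept both plans; only the multiset comparison against `expected` shows that the second one triples every result tuple.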

@bmyerz
Member Author

bmyerz commented May 2, 2016

@senderista
Contributor

Can Raco treat WORKER_ID() specially, sending a LocalFragment whose predicate is an equality between WORKER_ID() and one or more constants only to the workers with those IDs? That would optimize queries over broadcast relations where we don't want duplicate results: automatically insert a WORKER_ID() == 0 predicate and push it all the way down to the workers.
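A minimal sketch of the placement analysis, assuming a hypothetical predicate form where a fragment's filter is a conjunction of `(lhs, op, rhs)` triples; `target_workers` and the `"IN"` operator spelling are illustrative, not Raco's expression API.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical predicate form: a conjunction of (lhs, op, rhs) triples.
Conjunct = Tuple[str, str, object]


def target_workers(conjuncts: List[Conjunct], all_workers: Iterable[int]) -> Set[int]:
    """Workers that can produce output for a fragment with these conjuncts.

    On any other worker, WORKER_ID() fails the equality test, so the
    fragment would return nothing there and need not be sent at all.
    """
    targets = set(all_workers)
    for lhs, op, rhs in conjuncts:
        if lhs != "WORKER_ID()":
            continue  # ordinary predicates don't constrain placement
        if op == "==":
            targets &= {rhs}
        elif op == "IN":
            targets &= set(rhs)
    return targets


# WORKER_ID() == 0 AND x > 5, on a 4-worker cluster: only worker 0 needs it.
print(target_workers([("WORKER_ID()", "==", 0), ("x", ">", 5)], range(4)))
```

Because conjuncts are ANDed, several WORKER_ID() equalities intersect; an empty result means the predicate is unsatisfiable and the fragment can be skipped entirely.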

@bmyerz
Member Author

bmyerz commented May 2, 2016

  • Optimization: if we push a Select on WORKER_ID() all the way down to the Scan, we can produce a JSON plan for Myria that names only a single worker from the catalog
  • List suggested tests to be added to MyriaX for broadcast
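For the first item, a minimal sketch of choosing that single worker, assuming a catalog that maps each broadcast relation to the IDs of workers holding a full replica. The dict it returns is a hypothetical placeholder, not Myria's actual JSON plan format; the random choice follows the earlier suggestion that the worker be picked at random.

```python
import random
from typing import Dict, List, Optional


def single_worker_scan_fragment(relation: str,
                                catalog: Dict[str, List[int]],
                                rng: Optional[random.Random] = None) -> dict:
    """Describe a scan of a broadcast relation that runs on one worker only.

    Picking replica w is equivalent to pushing Select(WORKER_ID() == w)
    onto the Scan; a random choice spreads load across workers.
    """
    replicas = catalog[relation]
    if not replicas:
        raise ValueError(f"no workers store {relation}")
    chooser = rng if rng is not None else random.Random()
    chosen = chooser.choice(replicas)
    return {
        "workers": [chosen],  # only this worker receives the fragment
        "operators": [{"op": "Scan", "relation": relation}],
    }
```

Passing a seeded `random.Random` keeps the choice reproducible in tests, while production code could omit it.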
