You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Timothy Chen <tn...@gmail.com> on 2013/11/15 19:04:40 UTC
Broadcast exchange
Sent from my iPhone
Begin forwarded message:
> From: Timothy Chen <tn...@gmail.com>
> Date: November 15, 2013 at 1:32:26 AM HST
> To: Jacques Nadeau <jn...@maprtech.com>
> Subject: Broadcast exchange
>
> Hey Jacques,
>
> I'm working on the broadcast sender, and I remember we had some chat about the scenario and you mentioned buffering.
>
> Right now I have the most simplest version which I have single broadcast exchange which a single broadcast sender sends every incoming batch to all the random receivers.
>
> Couple questions I have now:
>
> 1) the scenario where I thought broadcast is very useful is when we join two data sources and one of them is a small fact table, so we copy that table to all join nodes for fast joining. However, I don't remember seeing any test if we have multiple scans joining?
>
> The physical plan for that will be two scan nodes, one uses broadcast exchange another perhaps hash to random exchange, and join from the two exchanges that sends to screen.
>
> Does that plan work? I feel like I might be missing something.
>
> 2) my impl doesn't do any buffering and just creates a fragment writable batch for each destination from the incoming batch buffers and sends it in a for loop. I was wondering what is the buffering you thought necessary when you mentioned it earlier?
>
> Btw I'll be soon experience the hangout time pain as the china folks as I am going to be in Taiwan for 2 weeks from this Sunday :)
>
> Tim
>
> Sent from my iPhone
Re: Broadcast exchange
Posted by Jacques Nadeau <ja...@apache.org>.
> > 1) the scenario where I thought broadcast is very useful is when we join
> two data sources and one of them is a small fact table, so we copy that
> table to all join nodes for fast joining. However, I don't remember seeing
> any test if we have multiple scans joining?
>
I believe the join test tests this. Your understanding is correct.
> >
> > The physical plan for that will be two scan nodes, one uses broadcast
> exchange another perhaps hash to random exchange, and join from the two
> exchanges that sends to screen.
> >
> > Does that plan work? I feel like I might be missing something.
>
Yes, that plan is possible. However, a simpler plan would be:
store
|
join
/ \
| scan
|
random-receiver
|
broadcast-sender
|
scan
This is where we don't transmit the larger table across the wire
unnecessarily. (In this example, a more likely but more complex plan might
have an aggregation after the join.
> 2) my impl doesn't do any buffering and just creates a fragment writable
> batch for each destination from the incoming batch buffers and sends it in
> a for loop. I was wondering what is the buffering you thought necessary
> when you mentioned it earlier?
>
You don't need to do buffering. That is handled by the rest of the
execution engine and Steven's incoming buffer spooling feature.
> >
> > Btw I'll be soon experience the hangout time pain as the china folks as
> I am going to be in Taiwan for 2 weeks from this Sunday :)
>
Good times!