Posted to dev@pig.apache.org by Jeff Hammerbacher <je...@gmail.com> on 2008/10/27 21:16:41 UTC

Compile Pig and Hive queries to LINQ expression trees?

Hey,

There's been some discussion for a while about having a common logical
plan format for Pig and Hive to pass to their physical plan
generators. Erik Meijer gave a talk recently on LINQ Expression Trees
that made me think they would serve as an excellent intermediate data
structure. You can read more about them here:
http://msdn.microsoft.com/en-us/library/bb882636.aspx, and you can
read the proposal for Erik's presentation here [PDF]:
http://research.microsoft.com/~emeijer/Papers/Cloud%20computing%20workshop%20proposal%20Draft.pdf.

I was planning to save this discussion for a time when I understood
the plan structures in both Pig and Hive, but given the discussion
around Hive future plans going on right now, I figured now's as good a
time as any to get it started.

Later,
Jeff

Re: Compile Pig and Hive queries to LINQ expression trees?

Posted by pi song <pi...@gmail.com>.
LINQ is primarily based on the classic relational model, whereas Pig goes beyond
that. So it would be possible to convert LINQ queries to Pig logical plans,
but not the other way round.

Another way to go is to look at XLINQ, which seems to have borrowed concepts
from graph queries. It may better represent the Pig model in the LINQ world, but
it could be overly complex as well.

Pi
On Fri, Oct 31, 2008 at 3:14 PM, Jeff Hammerbacher <jeff.hammerbacher@gmail.com> wrote:

> Hey Alan,
>
> The sharing of logical plans seemed like the first place to start on
> the way to a shared execution environment. By sharing an API, the
> execution environments could be altered under the covers until they
> match. The LINQ data model works for Hive, from what I can tell.
> It's not clear to me that LINQ's expressions can't also handle the
> Pig data model.
>
> In general, merging execution environments seems fairly tedious, but
> sharing the logical plan seems much less difficult. I just wanted to
> hear others' opinions on the topic and their thoughts on
> implementation.
>
> Later,
> Jeff

Re: Compile Pig and Hive queries to LINQ expression trees?

Posted by Jeff Hammerbacher <je...@gmail.com>.
Hey Alan,

The sharing of logical plans seemed like the first place to start on
the way to a shared execution environment. By sharing an API, the
execution environments could be altered under the covers until they
match. The LINQ data model works for Hive, from what I can tell.
It's not clear to me that LINQ's expressions can't also handle the
Pig data model.

In general, merging execution environments seems fairly tedious, but
sharing the logical plan seems much less difficult. I just wanted to
hear others' opinions on the topic and their thoughts on
implementation.

Later,
Jeff

On Thu, Oct 30, 2008 at 10:05 AM, Alan Gates <ga...@yahoo-inc.com> wrote:
> Jeff,
>
> If I understand your proposal, Hive SQL and Pig Latin would both compile
> into LINQ Expression Trees as their logical plans, but continue to have
> separate backends for executing the queries.  Is that correct?
>
> I'm not seeing the benefit there.  I see the benefit of sharing logical
> plans and a merged backend that can execute both Pig Latin and Hive SQL.
> These benefits would include focusing more developers on what are probably
> very similar issues that we need to address, plus allowing both our user
> communities to choose which language to express their programs in without
> needing to maintain both systems.  I also see all of the challenges of
> merging two projects, the fact that we have differing data models, etc.
>
> What do you see as the benefits of sharing just the logical plans?
>
> Alan.

Re: Compile Pig and Hive queries to LINQ expression trees?

Posted by Alan Gates <ga...@yahoo-inc.com>.
Jeff,

If I understand your proposal, Hive SQL and Pig Latin would both
compile into LINQ Expression Trees as their logical plans, but
continue to have separate backends for executing the queries.  Is
that correct?

I'm not seeing the benefit there.  I see the benefit of sharing
logical plans and a merged backend that can execute both Pig Latin
and Hive SQL.  These benefits would include focusing more developers
on what are probably very similar issues that we need to address,
plus allowing both our user communities to choose which language to
express their programs in without needing to maintain both systems.
I also see all of the challenges of merging two projects, the fact
that we have differing data models, etc.

What do you see as the benefits of sharing just the logical plans?

Alan.

On Oct 27, 2008, at 1:16 PM, Jeff Hammerbacher wrote:

> Hey,
>
> There's been some discussion for a while about having a common logical
> plan format for Pig and Hive to pass to their physical plan
> generators. Erik Meijer gave a talk recently on LINQ Expression Trees
> that made me think they would serve as an excellent intermediate data
> structure. You can read more about them here:
> http://msdn.microsoft.com/en-us/library/bb882636.aspx, and you can
> read the proposal for Erik's presentation here [PDF]:
> http://research.microsoft.com/~emeijer/Papers/Cloud%20computing%20workshop%20proposal%20Draft.pdf.
>
> I was planning to save this discussion for a time when I understood
> the plan structures in both Pig and Hive, but given the discussion
> around Hive future plans going on right now, I figured now's as good a
> time as any to get it started.
>
> Later,
> Jeff