Posted to dev@pig.apache.org by jian yi <ey...@gmail.com> on 2010/02/07 03:16:33 UTC

Will Pig support SQL?

Hi,

SQL is very helpful for developing a data warehouse, but Hive doesn't
support procedures. If Pig supported SQL, it would be more powerful.

Re: Will Pig support SQL?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.

Also, I looked at the idea you posted, and it seems to me that your
balance step is in effect the sampling step that Pig's skewed-data
solution implements, except that your balance step needs 100% of the
data.

Consider how your balancing works when there are 1000 map tasks, each
of which produces output that will be fed into a total of 200 reducers.

-D
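
To make the comparison above concrete, here is a minimal Python sketch.
It is not Pig's implementation; the function names, the synthetic key
list, and the 10% sampling fraction are all assumptions made for
illustration. It shows two things: a small sample already gives a close
estimate of how much of the data a hot key holds, and with 1000 map
tasks and 200 reducers there can be up to 1000 * 200 map-output
partitions for a full-data balance step to account for, since each map
task writes one partition per reducer.

```python
import random
from collections import Counter


def shuffle_segments(map_tasks, reducers):
    """Upper bound on map-output partitions: one per (map task, reducer) pair."""
    return map_tasks * reducers


def key_shares(keys):
    """Exact share of records per key; needs a pass over all of the data."""
    total = len(keys)
    if total == 0:
        return {}
    return {key: count / total for key, count in Counter(keys).items()}


def sampled_key_shares(keys, fraction, seed=0):
    """Estimated share per key, computed from a random sample only."""
    rng = random.Random(seed)
    sample = [key for key in keys if rng.random() < fraction]
    return key_shares(sample)


# Synthetic skewed input (hypothetical): one hot key holds 90% of the records.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

exact = key_shares(keys)                          # reads 100% of the data
approx = sampled_key_shares(keys, fraction=0.1)   # reads roughly 10% of it
segments = shuffle_segments(1000, 200)            # the case described above
```

The sample identifies the hot key without reading everything; an exact
balance step pays for a full pass plus bookkeeping over every partition.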

Re: Will Pig support SQL?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.

Jian,
If what you are looking for is something that will let you deal with
skewed data and forget about how the underlying distributed system
works, both Pig and Hive will help you do that to some extent. If you
are looking for something that will let you exercise fine-grained
control over individual scheduling of tasks, which is what this sounds
like, neither project is for you -- in fact, this is more or less the
opposite of what they are trying to do, which is to take away the
complexities of partitioning large data sets, scheduling tasks, and
orchestrating data flows.

If you are looking to tweak the Hadoop internals to schedule things
differently, you may find that the pluggable scheduler interface is
useful. If you manage to achieve your goals by constructing a new
scheduler, Pig and Hive will both continue working as higher-level
abstractions, as long as you adhere to the provided interface for task
scheduling.


Re: Will Pig support SQL?

Posted by jian yi <ey...@gmail.com>.

We can regard a task as a sleep call whose parameter is the task's
duration:
sleep(N) - for Hive, N is not known in advance
sleep(M) - for MBR, M is known in advance
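
The sleep analogy can be sketched in Python as follows. This is only an
illustration under stated assumptions: the task durations, the chunk
size of 10, the four workers, and the greedy least-loaded assignment
are all hypothetical, and it assumes the work for a hot key can be
split into equal chunks, which is not true of every reduce operation.

```python
import heapq


def makespan(task_durations, workers):
    """Finish time when each task goes to the currently least-loaded worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for duration in task_durations:
        # Pop the least-loaded worker and give it the next task.
        heapq.heappush(loads, heapq.heappop(loads) + duration)
    return max(loads)


def split_evenly(total_work, chunk):
    """Cut the total work into tasks of a fixed size (the last may be smaller)."""
    full, rest = divmod(total_work, chunk)
    return [chunk] * full + ([rest] if rest else [])


# sleep(N): durations depend on the key distribution; one task dominates.
skewed = [90, 2, 2, 2, 2, 2]
# sleep(M): the same 100 units of work cut into fixed tasks of M = 10.
balanced = split_evenly(sum(skewed), 10)

skewed_finish = makespan(skewed, workers=4)      # bounded by the 90-unit task
balanced_finish = makespan(balanced, workers=4)
```

With the skewed list the job cannot finish before the single 90-unit
task does, whatever the number of workers; with fixed-size tasks the
same total work spreads across all four workers.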

Re: Will Pig support SQL?

Posted by jian yi <ey...@gmail.com>.

Hi Jeff,

Thank you, Jeff.
I know Hive can handle skewed joins, but I think that is not enough:
1. It needs a sampling pass, which has a cost
2. It can't control the size of a task
3. It is not exact
4. It requires using Hive or Pig

I think adding a balance step between map and reduce is a fundamental
solution to the skew problem. Maybe I need to describe it in more detail.

Regards
Jian YI

Re: Will Pig support SQL?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Hey Jian,

Hive supports arbitrary procedural languages through Hadoop Streaming; see
http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform for more.

Also, both Hive and Pig have support for handling skewed joins if you use
their higher-level interface. See
https://issues.apache.org/jira/browse/HIVE-562 and
http://wiki.apache.org/pig/PigSkewedJoinSpec.

Thanks,
Jeff
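
As a rough illustration of the TRANSFORM mechanism mentioned above: Hive
passes each input row to the script as a tab-separated line on stdin
and reads tab-separated rows back from stdout. The Python sketch below
is a hypothetical script; the file name classify.py, the table name
payments, the column names, and the 100.0 threshold are assumptions and
do not come from this thread. It could be invoked from Hive roughly as:

    ADD FILE classify.py;
    SELECT TRANSFORM (user_id, amount)
    USING 'python classify.py'
    AS (user_id, bucket)
    FROM payments;

```python
import io


def transform_line(line):
    """Map one tab-separated input row to one tab-separated output row."""
    user_id, amount = line.rstrip("\n").split("\t")
    # Hypothetical procedural logic: classify each row by its amount.
    bucket = "large" if float(amount) >= 100.0 else "small"
    return f"{user_id}\t{bucket}"


def run(stream_in, stream_out):
    """Apply transform_line to every non-blank row of the input stream."""
    for line in stream_in:
        if line.strip():
            stream_out.write(transform_line(line) + "\n")


# Local demonstration with in-memory streams. When used as a TRANSFORM
# script, import sys and call run(sys.stdin, sys.stdout) instead.
demo_out = io.StringIO()
run(io.StringIO("u1\t250.0\nu2\t3.5\n"), demo_out)
```

Keeping the per-row logic in its own function makes it easy to test the
script locally before running it on a cluster.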

Re: Will Pig support SQL?

Posted by jian yi <ey...@gmail.com>.

Hey Jeff,

Thank you, Jeff.
By "procedure" I mean a procedural language, like Oracle PL/SQL, which is very
helpful for migrating old services. We want to build a data warehouse based on
the MapReduce engine. I plan to optimize MapReduce to solve the skew problem by
adding a balance step between map and reduce. Please refer to
http://bbs.hadoopor.com/thread-521-1-1.html

Regards,
Jian

Re: Will Pig support SQL?

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Hey Jian,

I'm not sure what you mean by "Hive don't support procedure", but in any
case, the Pig team has stated that they will support SQL over the Pig
execution engine. See https://issues.apache.org/jira/browse/PIG-824.

Regards,
Jeff
