
Posted to user@pig.apache.org by 刘祥龙 <sa...@hotmail.com> on 2009/09/08 16:22:25 UTC

Newcomer questions about the differences between Pig, Hive, and SQL

Hi, everyone. I am trying to use Pig, but I am confused about the differences between Pig and Hive, and about which one is more powerful or more convenient for users. In addition, how do they differ from SQL? Could someone help me with these questions or give me some advice?

Thanks, everyone!


Best Wishes!
_____________________________________________________________
 
刘祥龙  Liu Xianglong

Re: Newcomer questions about the differences between Pig, Hive, and SQL

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Hey Alan,

> 3) Both Pig and Hive support calling external binaries.  Both support user
> defined functions to read and write data in HDFS in the case where your data
> is not a CSV or other simple text format.  Pig supports user defined
> functions on both simple and aggregated values.  Last I checked (9 months
> ago) Hive supported UDFs on simple values but not on aggregates.  They may
> have added user defined aggregates by now.
>

Was doing some digging into the Hive source today, and I think they may
have support for user-defined aggregates in the 0.4.0 release: see the
GenericUDAF* classes in
http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/.
Then again, it's been some time since I followed the codebase closely, so I
may be misreading!
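For readers new to user-defined aggregates: unlike a scalar UDF, which maps one value to one value, an aggregate has to be split into phases so that Hadoop can compute partial results on the map side and combine them on the reduce side. The Python sketch below is a hypothetical illustration, not Hive's Java API. Its method names loosely mirror the iterate / terminatePartial / merge / terminate methods of Hive's GenericUDAFEvaluator, and AverageUDAF, AvgBuffer and run_distributed are made-up names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class AvgBuffer:
    """Running state for one group: sum and count of non-null values."""
    total: float = 0.0
    count: int = 0


class AverageUDAF:
    """Hypothetical average aggregate. Method names mirror the phases of
    Hive's GenericUDAFEvaluator; this is an illustration, not Hive's API."""

    def new_buffer(self) -> AvgBuffer:
        return AvgBuffer()

    def iterate(self, buf: AvgBuffer, value: Optional[float]) -> None:
        # Map side: fold one raw input value into the buffer, skipping nulls.
        if value is not None:
            buf.total += value
            buf.count += 1

    def terminate_partial(self, buf: AvgBuffer) -> Tuple[float, int]:
        # Map side: emit a compact partial result to ship to the reducer.
        return (buf.total, buf.count)

    def merge(self, buf: AvgBuffer, partial: Tuple[float, int]) -> None:
        # Reduce side: combine one partial result into the buffer.
        buf.total += partial[0]
        buf.count += partial[1]

    def terminate(self, buf: AvgBuffer) -> Optional[float]:
        # Reduce side: produce the final value (None for an empty group).
        return buf.total / buf.count if buf.count else None


def run_distributed(udaf: AverageUDAF,
                    splits: Iterable[List[Optional[float]]]) -> Optional[float]:
    """Simulate one mapper per input split, then a single reducer."""
    partials = []
    for split in splits:
        buf = udaf.new_buffer()
        for value in split:
            udaf.iterate(buf, value)
        partials.append(udaf.terminate_partial(buf))
    final = udaf.new_buffer()
    for partial in partials:
        udaf.merge(final, partial)
    return udaf.terminate(final)


print(run_distributed(AverageUDAF(), [[1.0, 2.0], [3.0, None], [6.0]]))  # prints 3.0
```

Averaging the per-split averages (1.5, 3.0 and 6.0) would give 3.5, which is wrong; passing (sum, count) pairs through merge is what keeps the result exact. Pig's counterpart to this split is its Algebraic interface, which divides an aggregate into initial, intermediate and final functions.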

Regards,
Jeff

Re: Newcomer questions about the differences between Pig, Hive, and SQL

Posted by Alan Gates <ga...@yahoo-inc.com>.

There are a few differences:

1) Pig Latin is imperative, meaning the programmer controls the
program flow directly: a script is a sequence of steps, each one
transforming the output of an earlier step. In SQL (and thus Hive),
the programmer declares what result they want, but not how to compute
it. Pig Latin lets the user control how it is done as well.

2) Pig Latin does not impose a strong type system, so if your data is
unstructured or nested, it fits very naturally in Pig Latin. SQL
requires that the user first define a schema and ensure that the data
loaded into the system matches it. If your data is structured, Pig
Latin lets you declare that structure at run time or encode it in the
data itself (using formats like JSON).

3) Both Pig and Hive support calling external binaries (Pig through
its STREAM operator, Hive through TRANSFORM). Both support
user-defined functions to read and write data in HDFS when your data
is not CSV or another simple text format. Pig supports user-defined
functions (UDFs) on both simple and aggregated values. Last I checked
(nine months ago), Hive supported UDFs on simple values but not on
aggregates. They may have added user-defined aggregates by now.

4) The control Pig Latin gives the programmer makes some operations
straightforward that are difficult in SQL (such as data pipelines with
a large number of steps) or not yet implemented in Hive (such as
anti-join, which keeps the records of one input that have no match in
another).

5) Performance-wise it's a mixed bag: Hive performs better on some
queries and Pig on others.
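The anti-join mentioned in point 4 also shows the step-by-step style described in point 1. One common way to write it in Pig Latin is to COGROUP both relations on the key, FILTER the groups whose bag from the second relation is empty (IsEmpty), and then FLATTEN the first relation's bag back into records. The Python sketch below mimics those three steps on in-memory lists; the function names and sample data are hypothetical, and it ignores details such as how Pig treats null keys.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Tuple

Record = Dict[str, Any]
Bags = Tuple[List[Record], List[Record]]


def cogroup(left: List[Record], right: List[Record],
            key: Callable[[Record], Hashable]) -> Dict[Hashable, Bags]:
    """Step 1 (like COGROUP): group both inputs by key, one bag per input."""
    groups: Dict[Hashable, Bags] = defaultdict(lambda: ([], []))
    for rec in left:
        groups[key(rec)][0].append(rec)
    for rec in right:
        groups[key(rec)][1].append(rec)
    return groups


def anti_join(left: List[Record], right: List[Record],
              key: Callable[[Record], Hashable]) -> List[Record]:
    """Return the records of `left` whose key never appears in `right`."""
    grouped = cogroup(left, right, key)
    # Step 2 (like FILTER ... BY IsEmpty(...)): keep groups with no right-hand match.
    unmatched = [bags for bags in grouped.values() if not bags[1]]
    # Step 3 (like FLATTEN): turn the surviving left-hand bags back into records.
    return [rec for left_bag, _ in unmatched for rec in left_bag]


# Hypothetical sample data.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
orders = [{"id": 2, "item": "pen"}, {"id": 4, "item": "ink"}]
print([u["name"] for u in anti_join(users, orders, key=lambda r: r["id"])])  # ['ann', 'cy']
```

Because each step produces a named intermediate result, the pipeline can be inspected or extended one step at a time, which is the kind of control point 1 describes.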

Alan.
