Posted to user@pig.apache.org by Earl Cahill <ca...@yahoo.com> on 2009/04/01 04:49:03 UTC

Re: How are you using Pig?

I am using Pig to parse Apache logs and generate aggregation files, whose data I load into MySQL.  From MySQL I generate stats for the sites on my free hosting site.  I used to work for a web hosting company, and we spent quite a bit of effort to reach the level I am at now.  My Pig script is, I think, fewer than 50 lines, and it does the work of a whole processing / parsing framework we built.

Earl

 http://blog.spack.net
http://holaservers.com




________________________________
From: Olga Natkovich <ol...@yahoo-inc.com>
To: pig-user@hadoop.apache.org
Sent: Wednesday, March 11, 2009 7:01:05 PM
Subject: How are you using Pig?

Hi,

If you have done some interesting work using Pig, we would like to know!
Could you please send a brief description of the kind of work you are
doing with Pig and what you have learned from working with Pig: things
that worked well, and things that you would like to see improved.

Thanks,

Olga



      

RE: How are you using Pig?

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
Have you tried the latest code from svn? There have been some fixes for
UNION-related issues in the recent past. If you still see issues, you
should file a JIRA issue with details.

-Pradeep

-----Original Message-----
From: Dhruv Matani [mailto:dhruv.m@directi.com] 
Sent: Tuesday, March 31, 2009 11:45 PM
To: pig-user@hadoop.apache.org
Subject: Re: How are you using Pig?

Hello all,
  We are using Pig to perform transformations on a lot of data. This
includes joining some relations. However, we have been facing a lot of
problems with UNION, which is causing a lot of data loss, so we have
had to do certain steps manually. All in all, though, Pig has
considerably reduced the amount of work we would have had to put in to
perform ad-hoc queries and transformations on data.

Regards,
-Dhruv.




Re: How are you using Pig?

Posted by Chris Olston <ol...@yahoo-inc.com>.
> we have been facing a
> lot of problems with using UNION, and it is causing a lot of data loss,
> because of which we have had to do certain steps manually.

Sorry to hear that. Did you file a bug report? If not, it's unlikely to
get fixed :)

http://issues.apache.org/jira/browse/PIG

Cheers,

Chris

 

--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research





Re: How are you using Pig?

Posted by Dhruv Matani <dh...@directi.com>.
Hello all,
  We are using Pig to perform transformations on a lot of data. This
includes joining some relations. However, we have been facing a lot of
problems with UNION, which is causing a lot of data loss, so we have
had to do certain steps manually. All in all, though, Pig has
considerably reduced the amount of work we would have had to put in to
perform ad-hoc queries and transformations on data.

Regards,
-Dhruv.




Re: How are you using Pig?

Posted by zhang jianfeng <zj...@gmail.com>.
I am using Pig to parse site logs and generate stats such as unique users,
download counts, and so on.  We are currently in the initial stage, so all
the work is done locally. Later we will move Pig to EC2, and all the data
will be stored on S3.

My suggestion: the exception messages are sometimes not very informative.


