Posted to user@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2012/06/13 06:45:34 UTC

Pig Meetup Notes

Tuesday, Pig Meetup

Alan Gates - upcoming improvements to the operators and the backend
physical plan. De-spaghettification.
Reworking the UDF interface while keeping backward compatibility.
Hadoop 2 is coming; adoption will be slow.

Bill Graham, Julien & Twitter - Optimization-oriented. Their cluster is at
capacity. Detect skew, cost-based optimizers, dynamic tuning. Gathering
performance metrics, which will be in HCatalog. Look at previous executions
of the same job to optimize on the fly.
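
As an illustration of what "detect skew" can mean in practice, here is a
minimal Python sketch. It is not how Pig or Twitter's tooling does it; the
function name, the threshold factor, and the sample keys are all assumptions
made up for this example. It flags keys whose record count is far above the
median count.

```python
from collections import Counter
from statistics import median


def find_skewed_keys(keys, factor=10.0):
    """Return {key: count} for keys seen more than `factor` x the median count.

    `factor` is an arbitrary, hypothetical threshold chosen for illustration.
    """
    counts = Counter(keys)
    if not counts:
        return {}
    typical = median(counts.values())
    return {k: n for k, n in counts.items() if n > factor * typical}


# Hypothetical join keys: one "hot" user dominates the input.
sample_keys = ["u1"] * 500 + ["u2"] * 3 + ["u3"] * 4 + ["u4"] * 2 + ["u5"] * 3
skewed = find_skewed_keys(sample_keys)
print(skewed)
```

A key flagged this way would end up on a single reducer in a plain join,
which is the case Pig's skewed join (JOIN ... USING 'skewed') is meant for.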

Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
Zocalo Systems?, Trend Micro

Bill presented Ambrose. Motivation: Pig scripts with 40 MR jobs; added a DAG
view. Shows the progress of your script as a percentage and as a stepwise
view. Helps with debugging and optimization. Major progress.

Pig users talk - using Pig in local mode on a sample, then pushing to the
cluster. Using ILLUSTRATE to cut developer iterations. No counters in local
mode. Embedded Pig in loops for ML. Java embedding.
Java API PigServer to run scripts from apps. Macros are helping remove ugly
blocks of code, but UDFs are better solved by JRuby. Mortar Data fixed
Python UDFs.
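
The "local mode on a sample, then the cluster" workflow can be sketched as a
small Python helper that builds the pig command line. The -x and -param
options are real Pig CLI flags; the script name, the parameter name "input",
and both paths are hypothetical, and the script is assumed to read $input.

```python
def pig_command(script, input_path, mode="local"):
    """Build the argv list for running a Pig script in the given exec mode.

    Assumes the script loads its data from a parameter named `input`
    (a made-up convention for this example).
    """
    if mode not in ("local", "mapreduce"):
        raise ValueError("mode must be 'local' or 'mapreduce'")
    return ["pig", "-x", mode, "-param", f"input={input_path}", script]


# Iterate quickly on a small local sample first...
local_cmd = pig_command("wordcount.pig", "sample/part-00000", mode="local")
# ...then run the unchanged script against the full data on the cluster.
cluster_cmd = pig_command("wordcount.pig", "/data/full", mode="mapreduce")

print(" ".join(local_cmd))
print(" ".join(cluster_cmd))
```

Either list can be passed to subprocess.run on a machine where pig is on the
PATH; only the mode and the input path differ between the two runs.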

Reducing friction around using Pig with tools is important. The slowness of
batch is hard for new users. It is hard to prepare a sample that will still
produce matches in joins. Illustrate was invented for this purpose.
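
To show why join-friendly samples are hard, and one common workaround, here
is a minimal Python sketch. Sampling each input independently at random
tends to drop most matching keys; sampling by a hash of the join key keeps
the same keys on both sides. The relations, field layout, and the 1-in-10
rate are assumptions for illustration only, and this is not what ILLUSTRATE
does internally.

```python
import zlib


def keep_key(key, one_in=10):
    """Deterministically keep roughly 1/one_in of all keys via a hash of the key."""
    return zlib.crc32(str(key).encode("utf-8")) % one_in == 0


def sample_by_key(rows, key_index, one_in=10):
    """Keep only the rows whose join key falls in the sampled key set."""
    return [row for row in rows if keep_key(row[key_index], one_in)]


# Hypothetical relations that share a user-id join key in field 0.
users = [(uid, f"name{uid}") for uid in range(1000)]
clicks = [(uid, f"/page/{uid % 7}") for uid in range(1000)]

users_sample = sample_by_key(users, 0)
clicks_sample = sample_by_key(clicks, 0)

# Every sampled click still finds its user, because both sides kept the same keys.
user_ids = {u[0] for u in users_sample}
joined = [c for c in clicks_sample if c[0] in user_ids]
print(len(users_sample), len(clicks_sample), len(joined))
```

The trade-off is that the sample is no longer uniform over rows, so it is
suited to checking script logic rather than to estimating statistics.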

Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
Azkaban is inadequate for the enterprise. People hack things together. It
sucks.

HCatalog is maturing. REST API. Hive and Pig together. The REST interface
is for metadata only so far. People want to extend it to grab UDFs, etc.

Russell Jurney http://datasyndrome.com

Re: Pig Meetup Notes

Posted by Russell Jurney <ru...@gmail.com>.
Is that a PMC position? I also do AV and can bounce #credentials :D

Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com

On Jun 15, 2012, at 3:35 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> +1
>
> 2012/6/15 Alan Gates <ga...@hortonworks.com>
>
>> Thanks Russell.  I move we make you the official Apache Pig secretary. :)
>>
>> Alan.

Re: Pig Meetup Notes

Posted by Jonathan Coveney <jc...@gmail.com>.
+1

2012/6/15 Alan Gates <ga...@hortonworks.com>

> Thanks Russell.  I move we make you the official Apache Pig secretary. :)
>
> Alan.

Re: Pig Meetup Notes

Posted by Alan Gates <ga...@hortonworks.com>.
Thanks Russell.  I move we make you the official Apache Pig secretary. :)

Alan.
