Posted to user@pig.apache.org by Markus Resch <ma...@adtech.de> on 2012/07/17 13:26:09 UTC

using multiple avro schemas in globbed files (schema merging)

Hey everyone,

In the thread "Downgrade CDH4 to CDH3" on the Cloudera mailing list I
talked about issues we had with Pig while testing CDH4, and about the
trouble we had switching back to CDH3. After I figured out the reason for our
Pig issue, I tried to apply the patch
(https://issues.apache.org/jira/browse/PIG-2579) to the CDH4 version of
Pig. Sadly, this was much harder than applying the same patch to
the CDH3 version of Pig before. Does anyone have this or a similar patch
in a form that is suitable for the CDH4 version of Pig? I'm asking
because doing the work twice doesn't help anyone. If this work is already
done, could the patch be attached to the PIG-2579 ticket as well?

Thanks

Markus




Re: using multiple avro schemas in globbed files (schema merging)

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Nebo,

Thank you for your suggestion. I will keep it in mind.

Cheolsoo

On Wed, Jul 25, 2012 at 11:40 PM, Zebeljan, Nebojsa <nebojsa.zebeljan@adtech.com> wrote:

> Hi Cheolsoo, hi Philipp,
> we've already applied Stan's original patch to pig-0.9.2-cdh4.0.1 and
> adjusted it to our needs.
>
> In detail, we removed the schema name validation from the method
> "org.apache.pig.piggybank.storage.avro.AvroStorageUtils.union(Schema,
> Schema)", since in our case the schemas always have the same name.
>
> Code:
> //      if (x.getName().equals(y.getName())) {
> //              throw new RuntimeException("Union of two schemas of the same name is not supported");
> //      }
>
> @Cheolsoo: When applying the "merge code" to the piggybank codebase, please
> consider whether this check makes sense in general.
>
> By the way, the patch works pretty well for us. Thanks to Stan!
>
> Regards,
> Nebo

Re: using multiple avro schemas in globbed files (schema merging)

Posted by "Zebeljan, Nebojsa" <ne...@adtech.com>.
Hi Cheolsoo, hi Philipp,
we've already applied Stan's original patch to pig-0.9.2-cdh4.0.1 and adjusted it to our needs.

In detail, we removed the schema name validation from the method "org.apache.pig.piggybank.storage.avro.AvroStorageUtils.union(Schema, Schema)", since in our case the schemas always have the same name.

Code:
//	if (x.getName().equals(y.getName())) {
//		throw new RuntimeException("Union of two schemas of the same name is not supported");
//	}

@Cheolsoo: When applying the "merge code" to the piggybank codebase, please consider whether this check makes sense in general.
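
As a rough illustration of that question (this is not the piggybank code): the Avro specification only permits several record types in one union when their names differ, which is presumably why the check exists. If same-named record schemas should be accepted, one option is to merge them field by field instead of placing both in a union. The sketch below shows that idea on a simplified stand-in for a record schema. RecordSketch, SchemaMergeSketch and merge are hypothetical names, and the rule that a field present in both records must have the same type is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified stand-in for an Avro record schema: a name plus ordered fields (field name -> type name). */
final class RecordSketch {
    final String name;
    final Map<String, String> fields;

    RecordSketch(String name, Map<String, String> fields) {
        this.name = name;
        // Copy defensively and keep insertion order so merged output is deterministic.
        this.fields = new LinkedHashMap<>(fields);
    }
}

/** Hypothetical merge policy for same-named records; NOT the AvroStorageUtils implementation. */
final class SchemaMergeSketch {

    /**
     * Merges two records with the same name into one record containing the
     * fields of both. Assumption: a field that appears in both records must
     * have the same type, otherwise the merge is rejected.
     */
    static RecordSketch merge(RecordSketch x, RecordSketch y) {
        if (!x.name.equals(y.name)) {
            // Differently named records could go into a union instead; out of scope here.
            throw new IllegalArgumentException(
                "Cannot merge records with different names: " + x.name + " vs " + y.name);
        }
        Map<String, String> merged = new LinkedHashMap<>(x.fields);
        for (Map.Entry<String, String> field : y.fields.entrySet()) {
            String existingType = merged.get(field.getKey());
            if (existingType == null) {
                // Field only exists in y: add it to the merged record.
                merged.put(field.getKey(), field.getValue());
            } else if (!existingType.equals(field.getValue())) {
                throw new IllegalArgumentException(
                    "Conflicting types for field " + field.getKey()
                        + ": " + existingType + " vs " + field.getValue());
            }
        }
        return new RecordSketch(x.name, merged);
    }

    public static void main(String[] args) {
        Map<String, String> older = new LinkedHashMap<>();
        older.put("id", "long");
        older.put("url", "string");

        Map<String, String> newer = new LinkedHashMap<>();
        newer.put("id", "long");
        newer.put("referrer", "string");

        RecordSketch merged = merge(new RecordSketch("Event", older),
                                    new RecordSketch("Event", newer));
        // Prints: Event {id=long, url=string, referrer=string}
        System.out.println(merged.name + " " + merged.fields);
    }
}
```

With a policy like this, the name check would only need to reject same-named schemas that cannot be merged (for example conflicting field types), rather than every same-named pair.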

By the way, the patch works pretty well for us. Thanks to Stan!

Regards,
Nebo

-----Original Message-----
From: Cheolsoo Park [mailto:cheolsoo@cloudera.com]
Sent: Wednesday, July 25, 2012 23:18
To: user@pig.apache.org
Subject: Re: using multiple avro schemas in globbed files (schema merging)

Hi Philipp,

Sure, I put PIG-2579 into my queue. I will start working on it shortly.

Thanks,
Cheolsoo


Re: using multiple avro schemas in globbed files (schema merging)

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Philipp,

Sure, I put PIG-2579 into my queue. I will start working on it shortly.

Thanks,
Cheolsoo

On Wed, Jul 25, 2012 at 7:35 AM, Philipp Pahl <ph...@googlemail.com> wrote:

> Hi Cheolsoo,
>
> I saw that you integrated the "globs and commas" support into the pig
> code. I was wondering if you are also planning to integrate the multiple
> Avro schema support, which I would greatly appreciate.
>
> Thanks and regards
> Philipp

Re: using multiple avro schemas in globbed files (schema merging)

Posted by Philipp Pahl <ph...@googlemail.com>.
Hi Cheolsoo,

I saw that you integrated the "globs and commas" support into the pig 
code. I was wondering if you are also planning to integrate the multiple 
Avro schema support, which I would greatly appreciate.

Thanks and regards
Philipp

On 07/17/2012 07:03 PM, Cheolsoo Park wrote:
> Hi Markus,
>
> Thank you for sharing your problem.
>
> Looking at the PIG-2579
> <https://issues.apache.org/jira/browse/PIG-2579> patch, it seems to try
> to address two issues at the same time:
> 1) Globs support
> 2) Multiple Avro schemas support
>
> I think that it's better to solve one issue at a time. In fact, there is
> another jira PIG-2492 <https://issues.apache.org/jira/browse/PIG-2492> that
> tries to address #1 particularly. Once
> PIG-2492 <https://issues.apache.org/jira/browse/PIG-2492> is resolved, I
> think we can rebase/fix the
> PIG-2579 <https://issues.apache.org/jira/browse/PIG-2579> patch on top of
> that.
>
> I am happy to work on both jiras. Please let me know what you think.
>
> Thanks,
> Cheolsoo


Re: using multiple avro schemas in globbed files (schema merging)

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Markus,

Thank you for sharing your problem.

Looking at the PIG-2579
<https://issues.apache.org/jira/browse/PIG-2579> patch, it seems to try
to address two issues at the same time:
1) Globs support
2) Multiple Avro schemas support

I think that it's better to solve one issue at a time. In fact, there is
another jira PIG-2492 <https://issues.apache.org/jira/browse/PIG-2492> that
tries to address #1 particularly. Once
PIG-2492 <https://issues.apache.org/jira/browse/PIG-2492> is resolved, I
think we can rebase/fix the
PIG-2579 <https://issues.apache.org/jira/browse/PIG-2579> patch on top of
that.

I am happy to work on both jiras. Please let me know what you think.

Thanks,
Cheolsoo

On Tue, Jul 17, 2012 at 4:26 AM, Markus Resch <ma...@adtech.de> wrote:

> Hey everyone,
>
> In the thread "Downgrade CDH4 to CDH3" on the Cloudera mailing list I
> talked about issues we had with Pig while testing CDH4, and about the
> trouble we had switching back to CDH3. After I figured out the reason for our
> Pig issue, I tried to apply the patch
> (https://issues.apache.org/jira/browse/PIG-2579) to the CDH4 version of
> Pig. Sadly, this was much harder than applying the same patch to
> the CDH3 version of Pig before. Does anyone have this or a similar patch
> in a form that is suitable for the CDH4 version of Pig? I'm asking
> because doing the work twice doesn't help anyone. If this work is already
> done, could the patch be attached to the PIG-2579 ticket as well?
>
> Thanks
>
> Markus