Posted to user@pig.apache.org by Raghu Angadi <ra...@apache.org> on 2011/07/21 20:12:11 UTC

PigStorage's handling of InputFormat and OutputFormat

The expectation from PigStorage.getInputFormat() is that it returns an
InputFormat<Writable, Text>, and PigStorage handles converting Text to
Tuple.
This is very useful and makes it easy for users to plug in some other input
format.

But the same is not true for PigStorage.getOutputFormat(). Here it
expects an OutputFormat<Writable, Tuple>, so the output format needs to
convert Tuple to Text.

Not sure if this is intentional or not. I can submit a patch to move Tuple
handling into PigStorage. Then PigTextOutputFormat would be as thin as
PigTextInputFormat.
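
A minimal sketch of what "moving Tuple handling into the store function" could look like. It uses stand-in types rather than the real Hadoop and Pig classes: a "tuple" is a List<Object>, "text" is a String, and the names TextSink and DelimitedStore are hypothetical. Writing a null field as an empty string is also an assumption of the sketch, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types only: a "tuple" is a List<Object> and "text" is a String.
// None of these classes are the real Hadoop or Pig ones.
final class ThinOutputSketch {

    /** Stand-in for an output format that only understands text records. */
    interface TextSink {
        void write(String line);
    }

    /** Stand-in for a store function with the tuple handling moved into it. */
    static final class DelimitedStore {
        private final char delimiter;
        private final TextSink sink;

        DelimitedStore(char delimiter, TextSink sink) {
            this.delimiter = delimiter;
            this.sink = sink;
        }

        /** Converts the tuple to one delimited line, then hands text to the sink. */
        void putNext(List<Object> tuple) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < tuple.size(); i++) {
                if (i > 0) {
                    sb.append(delimiter);
                }
                Object field = tuple.get(i);
                // Assumption for this sketch: a null field is written as an empty string.
                if (field != null) {
                    sb.append(field);
                }
            }
            sink.write(sb.toString());
        }
    }

    /** Runs two tuples through the store and returns the lines the sink received. */
    static List<String> demo() {
        List<String> lines = new ArrayList<>();
        DelimitedStore store = new DelimitedStore('\t', lines::add);
        store.putNext(Arrays.<Object>asList("alice", 30, 1.5));
        store.putNext(Arrays.<Object>asList("bob", null, 2));
        return lines;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

With this split the sink never sees a tuple, so any text-only sink could be swapped in, which mirrors how the input side already works.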

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Raghu Angadi <an...@gmail.com>.
Makes sense. I will attach an updated patch that moves Tuple serialization to
StorageUtil.

Since we expect users to extend PigStorage, I would like to add a
getFieldDelimiter() method. Otherwise the extender has to parse and
remember the delimiter.

Raghu.
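
A rough sketch of why an accessor helps subclasses. BaseStorage and HeaderStorage are hypothetical stand-ins, not the real PigStorage, and the delimiter parsing rule shown here is an assumption made only for illustration.

```java
// Stand-in classes only; BaseStorage is not the real PigStorage.
final class DelimiterAccessorSketch {

    /** Stand-in base class that parses its delimiter once in the constructor. */
    static class BaseStorage {
        private final char fieldDelimiter;

        BaseStorage(String delimiterSpec) {
            this.fieldDelimiter = parseDelimiter(delimiterSpec);
        }

        /** Hypothetical rule for this sketch: the two characters \t mean tab, else first char. */
        private static char parseDelimiter(String spec) {
            if (spec.equals("\\t")) {
                return '\t';
            }
            return spec.charAt(0);
        }

        /** The kind of accessor proposed: subclasses ask instead of parsing again. */
        protected char getFieldDelimiter() {
            return fieldDelimiter;
        }
    }

    /** A subclass that needs the delimiter for its own formatting. */
    static final class HeaderStorage extends BaseStorage {
        HeaderStorage(String delimiterSpec) {
            super(delimiterSpec);
        }

        /** Joins column names with the delimiter the base class already parsed. */
        String header(String... columnNames) {
            return String.join(String.valueOf(getFieldDelimiter()), columnNames);
        }
    }

    public static void main(String[] args) {
        System.out.println(new HeaderStorage(",").header("name", "age"));
    }
}
```

Without the accessor, HeaderStorage would have to repeat the parsing in its own constructor and keep a second copy of the result.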

On Fri, Jul 22, 2011 at 3:10 PM, Alan Gates <ga...@hortonworks.com> wrote:

> "There are very few StoreFuncs that extend PigStorage" that we know of.  We
> don't know how our users are extending it for themselves.  And PigStorage is
> a public interface.  Breaking it is a non-starter.
>
> Alan.
>
> On Jul 22, 2011, at 2:57 PM, Raghu Angadi wrote:
>
> > Yes, I don't like the extra copies either.. thats why didn't mark the
> Jira
> > 'patch available'. A static helper method would also be useful.
> >
> > But I don't see how it breaks how it breaks existing StoreFuncs or output
> > formats.. is there an example? There are very few StoreFuncs that extend
> > PigStorage.
> >
> > Raghu.
> >
> > On Fri, Jul 22, 2011 at 1:37 PM, Alan Gates <ga...@hortonworks.com>
> wrote:
> >
> >> At this point I'm -1 on this.  I don't want to break existing output
> >> formats or store functions.  And I don't see that much value here.  You
> can
> >> accomplish the same thing by putting the logic in a static method of
> >> PigTextOutputFormat and letting other users use it.  Also, the cost of
> an
> >> extra copy of the output is bad.  We don't want to slow down storing
> data.
> >>
> >> Alan.
> >>
> >> On Jul 22, 2011, at 12:24 PM, Raghu Angadi wrote:
> >>
> >>> attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> >>>
> >>> Only drawback is extra copies required to make a Text().
> >>>
> >>>
> >>>
> >>> On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com>
> >> wrote:
> >>>
> >>>> I agree tuple -> text conversion better be in StoreFunc. User may have
> >>>> better chance to reuse OutputFormat.
> >>>>
> >>>> For backward compatibility, the signature of StoreFunc.getOutputFormat
> >>>> returns a generic OutputFormat object, this is fine. However, existing
> >>>> StoreFunc use PigOutputFormat need to change.
> >>>
> >>>
> >>> you mean existing classes that override PigStorage.getOutputFormat()
> and
> >> not
> >>> PigStorage.putNext()?
> >>> Yes, they would be affected.. but fixing them is very simple, they just
> >> need
> >>> to extend putNext().
> >>> As such there is no contract regd getOutputFormat() for us to break :)
> >>>
> >>> Raghu.
> >>>
> >>>> I don't know how much impact
> >>>> that will be, but need to be careful. We need to make clear
> announcement
> >>>> and
> >>>> document it as incompatible change if we do so.
> >>>>
> >>>> Daniel
> >>>>
> >>>> On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org>
> >> wrote:
> >>>>
> >>>>> expectation from PigStorage.getInputFormat()  is that it is a
> >>>>> InputFormat<Writable, Text>, and PigStorage handles converting Text
> to
> >>>>> Tuple.
> >>>>> This is very useful and easy for users to use some other input
> format.
> >>>>>
> >>>>> But the same is not true for PigStorage().getOutputFormat().. Here it
> >>>>> expects OutputFormat<Writable, Tuple>. So the output format needs to
> >>>>> convert
> >>>>> Tuple to Text().
> >>>>>
> >>>>> Not sure if this is intentional or not. I can submit a patch to move
> >>>> Tuple
> >>>>> handling into PigStorage. Then PigTextOutputFormat would be as thin
> as
> >>>>> PigTextInputFormat.
> >>>>>
> >>>>
> >>
> >>
>
>

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Alan Gates <ga...@hortonworks.com>.
"There are very few StoreFuncs that extend PigStorage" that we know of.  We don't know how our users are extending it for themselves.  And PigStorage is a public interface.  Breaking it is a non-starter.

Alan.

On Jul 22, 2011, at 2:57 PM, Raghu Angadi wrote:

> Yes, I don't like the extra copies either.. thats why didn't mark the Jira
> 'patch available'. A static helper method would also be useful.
> 
> But I don't see how it breaks how it breaks existing StoreFuncs or output
> formats.. is there an example? There are very few StoreFuncs that extend
> PigStorage.
> 
> Raghu.
> 
> On Fri, Jul 22, 2011 at 1:37 PM, Alan Gates <ga...@hortonworks.com> wrote:
> 
>> At this point I'm -1 on this.  I don't want to break existing output
>> formats or store functions.  And I don't see that much value here.  You can
>> accomplish the same thing by putting the logic in a static method of
>> PigTextOutputFormat and letting other users use it.  Also, the cost of an
>> extra copy of the output is bad.  We don't want to slow down storing data.
>> 
>> Alan.
>> 
>> On Jul 22, 2011, at 12:24 PM, Raghu Angadi wrote:
>> 
>>> attached a patch to https://issues.apache.org/jira/browse/PIG-2187
>>> 
>>> Only drawback is extra copies required to make a Text().
>>> 
>>> 
>>> 
>>> On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com>
>> wrote:
>>> 
>>>> I agree tuple -> text conversion better be in StoreFunc. User may have
>>>> better chance to reuse OutputFormat.
>>>> 
>>>> For backward compatibility, the signature of StoreFunc.getOutputFormat
>>>> returns a generic OutputFormat object, this is fine. However, existing
>>>> StoreFunc use PigOutputFormat need to change.
>>> 
>>> 
>>> you mean existing classes that override PigStorage.getOutputFormat() and
>> not
>>> PigStorage.putNext()?
>>> Yes, they would be affected.. but fixing them is very simple, they just
>> need
>>> to extend putNext().
>>> As such there is no contract regd getOutputFormat() for us to break :)
>>> 
>>> Raghu.
>>> 
>>>> I don't know how much impact
>>>> that will be, but need to be careful. We need to make clear announcement
>>>> and
>>>> document it as incompatible change if we do so.
>>>> 
>>>> Daniel
>>>> 
>>>> On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org>
>> wrote:
>>>> 
>>>>> expectation from PigStorage.getInputFormat()  is that it is a
>>>>> InputFormat<Writable, Text>, and PigStorage handles converting Text to
>>>>> Tuple.
>>>>> This is very useful and easy for users to use some other input format.
>>>>> 
>>>>> But the same is not true for PigStorage().getOutputFormat().. Here it
>>>>> expects OutputFormat<Writable, Tuple>. So the output format needs to
>>>>> convert
>>>>> Tuple to Text().
>>>>> 
>>>>> Not sure if this is intentional or not. I can submit a patch to move
>>>> Tuple
>>>>> handling into PigStorage. Then PigTextOutputFormat would be as thin as
>>>>> PigTextInputFormat.
>>>>> 
>>>> 
>> 
>> 


Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Raghu Angadi <an...@gmail.com>.
Yes, I don't like the extra copies either. That's why I didn't mark the Jira
'patch available'. A static helper method would also be useful.

But I don't see how it breaks existing StoreFuncs or output
formats. Is there an example? There are very few StoreFuncs that extend
PigStorage.

Raghu.

On Fri, Jul 22, 2011 at 1:37 PM, Alan Gates <ga...@hortonworks.com> wrote:

> At this point I'm -1 on this.  I don't want to break existing output
> formats or store functions.  And I don't see that much value here.  You can
> accomplish the same thing by putting the logic in a static method of
> PigTextOutputFormat and letting other users use it.  Also, the cost of an
> extra copy of the output is bad.  We don't want to slow down storing data.
>
> Alan.
>
> On Jul 22, 2011, at 12:24 PM, Raghu Angadi wrote:
>
> > attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> >
> > Only drawback is extra copies required to make a Text().
> >
> >
> >
> > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com>
> wrote:
> >
> >> I agree tuple -> text conversion better be in StoreFunc. User may have
> >> better chance to reuse OutputFormat.
> >>
> >> For backward compatibility, the signature of StoreFunc.getOutputFormat
> >> returns a generic OutputFormat object, this is fine. However, existing
> >> StoreFunc use PigOutputFormat need to change.
> >
> >
> > you mean existing classes that override PigStorage.getOutputFormat() and
> not
> > PigStorage.putNext()?
> > Yes, they would be affected.. but fixing them is very simple, they just
> need
> > to extend putNext().
> > As such there is no contract regd getOutputFormat() for us to break :)
> >
> > Raghu.
> >
> >> I don't know how much impact
> >> that will be, but need to be careful. We need to make clear announcement
> >> and
> >> document it as incompatible change if we do so.
> >>
> >> Daniel
> >>
> >> On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org>
> wrote:
> >>
> >>> expectation from PigStorage.getInputFormat()  is that it is a
> >>> InputFormat<Writable, Text>, and PigStorage handles converting Text to
> >>> Tuple.
> >>> This is very useful and easy for users to use some other input format.
> >>>
> >>> But the same is not true for PigStorage().getOutputFormat().. Here it
> >>> expects OutputFormat<Writable, Tuple>. So the output format needs to
> >>> convert
> >>> Tuple to Text().
> >>>
> >>> Not sure if this is intentional or not. I can submit a patch to move
> >> Tuple
> >>> handling into PigStorage. Then PigTextOutputFormat would be as thin as
> >>> PigTextInputFormat.
> >>>
> >>
>
>

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Alan Gates <ga...@hortonworks.com>.
At this point I'm -1 on this.  I don't want to break existing output formats or store functions.  And I don't see that much value here.  You can accomplish the same thing by putting the logic in a static method of PigTextOutputFormat and letting other users use it.  Also, the cost of an extra copy of the output is bad.  We don't want to slow down storing data.

Alan.
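
A minimal sketch of the static-helper alternative and of where the extra copy comes from. TupleTextUtil is a hypothetical name, a "tuple" is modelled as a List<Object>, and writing null as an empty field is an assumption. None of this is the real PigTextOutputFormat code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Stand-in code only; a "tuple" is a List<Object>, and TupleTextUtil is a hypothetical name.
final class TupleTextUtil {

    /** Shared static helper: streams the fields directly to the destination. */
    static void writeTuple(OutputStream out, List<Object> tuple, byte delimiter)
            throws IOException {
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                out.write(delimiter);
            }
            Object field = tuple.get(i);
            if (field != null) { // assumption: null becomes an empty field
                out.write(field.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
        out.write('\n');
    }

    /** Variant with an intermediate record: serialize to a buffer, then copy it out. */
    static void writeTupleViaBuffer(OutputStream out, List<Object> tuple, byte delimiter)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeTuple(buffer, tuple, delimiter);
        out.write(buffer.toByteArray()); // the whole record is copied once more here
    }

    public static void main(String[] args) throws IOException {
        List<Object> tuple = Arrays.<Object>asList("alice", 30);
        writeTuple(System.out, tuple, (byte) '\t');
        writeTupleViaBuffer(System.out, tuple, (byte) '\t');
    }
}
```

Both methods produce the same bytes. The second one materializes every record in a temporary buffer first, which is the per-record cost being objected to.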

On Jul 22, 2011, at 12:24 PM, Raghu Angadi wrote:

> attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> 
> Only drawback is extra copies required to make a Text().
> 
> 
> 
> On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com> wrote:
> 
>> I agree tuple -> text conversion better be in StoreFunc. User may have
>> better chance to reuse OutputFormat.
>> 
>> For backward compatibility, the signature of StoreFunc.getOutputFormat
>> returns a generic OutputFormat object, this is fine. However, existing
>> StoreFunc use PigOutputFormat need to change.
> 
> 
> you mean existing classes that override PigStorage.getOutputFormat() and not
> PigStorage.putNext()?
> Yes, they would be affected.. but fixing them is very simple, they just need
> to extend putNext().
> As such there is no contract regd getOutputFormat() for us to break :)
> 
> Raghu.
> 
>> I don't know how much impact
>> that will be, but need to be careful. We need to make clear announcement
>> and
>> document it as incompatible change if we do so.
>> 
>> Daniel
>> 
>> On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org> wrote:
>> 
>>> expectation from PigStorage.getInputFormat()  is that it is a
>>> InputFormat<Writable, Text>, and PigStorage handles converting Text to
>>> Tuple.
>>> This is very useful and easy for users to use some other input format.
>>> 
>>> But the same is not true for PigStorage().getOutputFormat().. Here it
>>> expects OutputFormat<Writable, Tuple>. So the output format needs to
>>> convert
>>> Tuple to Text().
>>> 
>>> Not sure if this is intentional or not. I can submit a patch to move
>> Tuple
>>> handling into PigStorage. Then PigTextOutputFormat would be as thin as
>>> PigTextInputFormat.
>>> 
>> 


Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Raghu Angadi <an...@gmail.com>.
Thanks guys. Updated PIG-2187 with a new patch.

On Fri, Jul 22, 2011 at 3:44 PM, Daniel Dai <da...@hortonworks.com> wrote:

> Yes, I am talking about PigTextOutputFormat.
>
> On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi <an...@gmail.com> wrote:
>
> > On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai <da...@hortonworks.com>
> wrote:
> >
> > > I mean StoreFunc that delegate outputformat to PigOutputFormat.
> >
> >
> >
> >
> > > Though
> > > PigOutputFormat is not in package org.apache.pig, it is the
> OutputFormat
> > of
> > > PigStorage,
> >
> >
> > There is no reference to PigOutputFormat in PigStorage. Did you mean
> > PigTextOutputFormat
> >
> > Raghu.
> >
> >
> > > which many users will use as reference implementation for a
> > > StoreFunc.
> > >
> > > Daniel
> > >
> > > On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi <ra...@apache.org>
> > wrote:
> > >
> > > > attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> > > >
> > > > Only drawback is extra copies required to make a Text().
> > > >
> > > >
> > > >
> > > > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com>
> > > wrote:
> > > >
> > > > > I agree tuple -> text conversion better be in StoreFunc. User may
> > have
> > > > > better chance to reuse OutputFormat.
> > > > >
> > > > > For backward compatibility, the signature of
> > StoreFunc.getOutputFormat
> > > > > returns a generic OutputFormat object, this is fine. However,
> > existing
> > > > > StoreFunc use PigOutputFormat need to change.
> > > >
> > > >
> > > > you mean existing classes that override PigStorage.getOutputFormat()
> > and
> > > > not
> > > > PigStorage.putNext()?
> > > > Yes, they would be affected.. but fixing them is very simple, they
> just
> > > > need
> > > > to extend putNext().
> > > > As such there is no contract regd getOutputFormat() for us to break
> :)
> > > >
> > > > Raghu.
> > > >
> > > > > I don't know how much impact
> > > > > that will be, but need to be careful. We need to make clear
> > > announcement
> > > > > and
> > > > > document it as incompatible change if we do so.
> > > > >
> > > > > Daniel
> > > > >
> > > > > On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <rangadi@apache.org
> >
> > > > wrote:
> > > > >
> > > > > > expectation from PigStorage.getInputFormat()  is that it is a
> > > > > > InputFormat<Writable, Text>, and PigStorage handles converting
> Text
> > > to
> > > > > > Tuple.
> > > > > > This is very useful and easy for users to use some other input
> > > format.
> > > > > >
> > > > > > But the same is not true for PigStorage().getOutputFormat()..
> Here
> > it
> > > > > > expects OutputFormat<Writable, Tuple>. So the output format needs
> > to
> > > > > > convert
> > > > > > Tuple to Text().
> > > > > >
> > > > > > Not sure if this is intentional or not. I can submit a patch to
> > move
> > > > > Tuple
> > > > > > handling into PigStorage. Then PigTextOutputFormat would be as
> thin
> > > as
> > > > > > PigTextInputFormat.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Daniel Dai <da...@hortonworks.com>.
Yes, I am talking about PigTextOutputFormat.

On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi <an...@gmail.com> wrote:

> On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai <da...@hortonworks.com> wrote:
>
> > I mean StoreFunc that delegate outputformat to PigOutputFormat.
>
>
>
>
> > Though
> > PigOutputFormat is not in package org.apache.pig, it is the OutputFormat
> of
> > PigStorage,
>
>
> There is no reference to PigOutputFormat in PigStorage. Did you mean
> PigTextOutputFormat
>
> Raghu.
>
>
> > which many users will use as reference implementation for a
> > StoreFunc.
> >
> > Daniel
> >
> > On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi <ra...@apache.org>
> wrote:
> >
> > > attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> > >
> > > Only drawback is extra copies required to make a Text().
> > >
> > >
> > >
> > > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com>
> > wrote:
> > >
> > > > I agree tuple -> text conversion better be in StoreFunc. User may
> have
> > > > better chance to reuse OutputFormat.
> > > >
> > > > For backward compatibility, the signature of
> StoreFunc.getOutputFormat
> > > > returns a generic OutputFormat object, this is fine. However,
> existing
> > > > StoreFunc use PigOutputFormat need to change.
> > >
> > >
> > > you mean existing classes that override PigStorage.getOutputFormat()
> and
> > > not
> > > PigStorage.putNext()?
> > > Yes, they would be affected.. but fixing them is very simple, they just
> > > need
> > > to extend putNext().
> > > As such there is no contract regd getOutputFormat() for us to break :)
> > >
> > > Raghu.
> > >
> > > > I don't know how much impact
> > > > that will be, but need to be careful. We need to make clear
> > announcement
> > > > and
> > > > document it as incompatible change if we do so.
> > > >
> > > > Daniel
> > > >
> > > > On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org>
> > > wrote:
> > > >
> > > > > expectation from PigStorage.getInputFormat()  is that it is a
> > > > > InputFormat<Writable, Text>, and PigStorage handles converting Text
> > to
> > > > > Tuple.
> > > > > This is very useful and easy for users to use some other input
> > format.
> > > > >
> > > > > But the same is not true for PigStorage().getOutputFormat().. Here
> it
> > > > > expects OutputFormat<Writable, Tuple>. So the output format needs
> to
> > > > > convert
> > > > > Tuple to Text().
> > > > >
> > > > > Not sure if this is intentional or not. I can submit a patch to
> move
> > > > Tuple
> > > > > handling into PigStorage. Then PigTextOutputFormat would be as thin
> > as
> > > > > PigTextInputFormat.
> > > > >
> > > >
> > >
> >
>

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Raghu Angadi <an...@gmail.com>.
On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai <da...@hortonworks.com> wrote:

> I mean StoreFunc that delegate outputformat to PigOutputFormat.




> Though
> PigOutputFormat is not in package org.apache.pig, it is the OutputFormat of
> PigStorage,


There is no reference to PigOutputFormat in PigStorage. Did you mean
PigTextOutputFormat?

Raghu.


> which many users will use as reference implementation for a
> StoreFunc.
>
> Daniel
>
> On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi <ra...@apache.org> wrote:
>
> > attached a patch to https://issues.apache.org/jira/browse/PIG-2187
> >
> > Only drawback is extra copies required to make a Text().
> >
> >
> >
> > On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com>
> wrote:
> >
> > > I agree tuple -> text conversion better be in StoreFunc. User may have
> > > better chance to reuse OutputFormat.
> > >
> > > For backward compatibility, the signature of StoreFunc.getOutputFormat
> > > returns a generic OutputFormat object, this is fine. However, existing
> > > StoreFunc use PigOutputFormat need to change.
> >
> >
> > you mean existing classes that override PigStorage.getOutputFormat() and
> > not
> > PigStorage.putNext()?
> > Yes, they would be affected.. but fixing them is very simple, they just
> > need
> > to extend putNext().
> > As such there is no contract regd getOutputFormat() for us to break :)
> >
> > Raghu.
> >
> > > I don't know how much impact
> > > that will be, but need to be careful. We need to make clear
> announcement
> > > and
> > > document it as incompatible change if we do so.
> > >
> > > Daniel
> > >
> > > On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org>
> > wrote:
> > >
> > > > expectation from PigStorage.getInputFormat()  is that it is a
> > > > InputFormat<Writable, Text>, and PigStorage handles converting Text
> to
> > > > Tuple.
> > > > This is very useful and easy for users to use some other input
> format.
> > > >
> > > > But the same is not true for PigStorage().getOutputFormat().. Here it
> > > > expects OutputFormat<Writable, Tuple>. So the output format needs to
> > > > convert
> > > > Tuple to Text().
> > > >
> > > > Not sure if this is intentional or not. I can submit a patch to move
> > > Tuple
> > > > handling into PigStorage. Then PigTextOutputFormat would be as thin
> as
> > > > PigTextInputFormat.
> > > >
> > >
> >
>

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Daniel Dai <da...@hortonworks.com>.
I mean StoreFuncs that delegate their OutputFormat to PigOutputFormat. Though
PigOutputFormat is not in package org.apache.pig, it is the OutputFormat of
PigStorage, which many users will use as a reference implementation for a
StoreFunc.

Daniel

On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi <ra...@apache.org> wrote:

> attached a patch to https://issues.apache.org/jira/browse/PIG-2187
>
> Only drawback is extra copies required to make a Text().
>
>
>
> On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com> wrote:
>
> > I agree tuple -> text conversion better be in StoreFunc. User may have
> > better chance to reuse OutputFormat.
> >
> > For backward compatibility, the signature of StoreFunc.getOutputFormat
> > returns a generic OutputFormat object, this is fine. However, existing
> > StoreFunc use PigOutputFormat need to change.
>
>
> you mean existing classes that override PigStorage.getOutputFormat() and
> not
> PigStorage.putNext()?
> Yes, they would be affected.. but fixing them is very simple, they just
> need
> to extend putNext().
> As such there is no contract regd getOutputFormat() for us to break :)
>
> Raghu.
>
> > I don't know how much impact
> > that will be, but need to be careful. We need to make clear announcement
> > and
> > document it as incompatible change if we do so.
> >
> > Daniel
> >
> > On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org>
> wrote:
> >
> > > expectation from PigStorage.getInputFormat()  is that it is a
> > > InputFormat<Writable, Text>, and PigStorage handles converting Text to
> > > Tuple.
> > > This is very useful and easy for users to use some other input format.
> > >
> > > But the same is not true for PigStorage().getOutputFormat().. Here it
> > > expects OutputFormat<Writable, Tuple>. So the output format needs to
> > > convert
> > > Tuple to Text().
> > >
> > > Not sure if this is intentional or not. I can submit a patch to move
> > Tuple
> > > handling into PigStorage. Then PigTextOutputFormat would be as thin as
> > > PigTextInputFormat.
> > >
> >
>

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Raghu Angadi <ra...@apache.org>.
Attached a patch to https://issues.apache.org/jira/browse/PIG-2187

The only drawback is the extra copies required to make a Text.



On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai <da...@hortonworks.com> wrote:

> I agree tuple -> text conversion better be in StoreFunc. User may have
> better chance to reuse OutputFormat.
>
> For backward compatibility, the signature of StoreFunc.getOutputFormat
> returns a generic OutputFormat object, this is fine. However, existing
> StoreFunc use PigOutputFormat need to change.


You mean existing classes that override PigStorage.getOutputFormat() but not
PigStorage.putNext()?
Yes, they would be affected, but fixing them is very simple: they just need
to extend putNext().
As such there is no contract regarding getOutputFormat() for us to break :)

Raghu.

> I don't know how much impact
> that will be, but need to be careful. We need to make clear announcement
> and
> document it as incompatible change if we do so.
>
> Daniel
>
> On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org> wrote:
>
> > expectation from PigStorage.getInputFormat()  is that it is a
> > InputFormat<Writable, Text>, and PigStorage handles converting Text to
> > Tuple.
> > This is very useful and easy for users to use some other input format.
> >
> > But the same is not true for PigStorage().getOutputFormat().. Here it
> > expects OutputFormat<Writable, Tuple>. So the output format needs to
> > convert
> > Tuple to Text().
> >
> > Not sure if this is intentional or not. I can submit a patch to move
> Tuple
> > handling into PigStorage. Then PigTextOutputFormat would be as thin as
> > PigTextInputFormat.
> >
>

Re: PigStorage's handling of InputFormat and OutputFormat

Posted by Daniel Dai <da...@hortonworks.com>.
I agree the tuple -> text conversion would be better in StoreFunc. Users may
have a better chance to reuse the OutputFormat.

For backward compatibility, the signature of StoreFunc.getOutputFormat
returns a generic OutputFormat object, so this is fine. However, existing
StoreFuncs that use PigOutputFormat would need to change. I don't know how
much impact that will have, but we need to be careful. We need to make a clear
announcement and document it as an incompatible change if we do so.

Daniel
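
A small sketch of the compatibility concern: the method signature can stay the same while the value type passed at runtime changes. RecordSink and TupleSink are hypothetical stand-ins, not the real OutputFormat or RecordWriter API.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in types only; nothing here is the real Hadoop or Pig API.
final class ValueTypeChangeSketch {

    /** Stand-in for an untyped writer: the signature says nothing about the value type. */
    interface RecordSink {
        void write(Object value);
    }

    /** An existing sink written against the old convention: values are tuples. */
    static final class TupleSink implements RecordSink {
        final StringBuilder out = new StringBuilder();

        @Override
        public void write(Object value) {
            List<?> tuple = (List<?>) value; // throws if the caller now sends text
            out.append(tuple.size()).append(" fields\n");
        }
    }

    /** Returns true if the sink accepted the value, false if it rejected the type. */
    static boolean accepts(RecordSink sink, Object value) {
        try {
            sink.write(value);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        RecordSink sink = new TupleSink();
        System.out.println(accepts(sink, Arrays.asList("a", 1))); // old caller: sends a tuple
        System.out.println(accepts(sink, "a\t1"));                // new caller: sends text
    }
}
```

Everything compiles in both cases, so a change like this would only surface at runtime, which is why it would need to be announced and documented as incompatible.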

On Thu, Jul 21, 2011 at 11:12 AM, Raghu Angadi <ra...@apache.org> wrote:

> expectation from PigStorage.getInputFormat()  is that it is a
> InputFormat<Writable, Text>, and PigStorage handles converting Text to
> Tuple.
> This is very useful and easy for users to use some other input format.
>
> But the same is not true for PigStorage().getOutputFormat().. Here it
> expects OutputFormat<Writable, Tuple>. So the output format needs to
> convert
> Tuple to Text().
>
> Not sure if this is intentional or not. I can submit a patch to move Tuple
> handling into PigStorage. Then PigTextOutputFormat would be as thin as
> PigTextInputFormat.
>