Posted to user@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2011/04/28 17:30:38 UTC
Confusion looking at source for PigStorage
I'm sure this is well known; I'm just curious why it is documented as
such. Perhaps I am missing something obvious, but I see:
/**
 * A load function that parses a line of input into fields using a delimiter to
 * set the fields. The delimiter is given as a regular expression. See
 * {@link java.lang.String#split(String)} and {@link java.util.regex.Pattern}
 * for more information.
 */
But then
/**
 * Constructs a Pig loader that uses specified regex as a field delimiter.
 *
 * @param delimiter
 *            the single byte character that is used to separate fields.
 *            ("\t" is the default.)
 */
public PigStorage(String delimiter) {
    this();
    fieldDel = StorageUtil.parseFieldDel(delimiter);
}
Nowhere does PigStorage use a regular expression; it only accepts a single
byte.
Just thought I'd ask. Thank you
Jon
Re: Confusion looking at source for PigStorage
Posted by Jonathan Coveney <jc...@gmail.com>.
Ah, ok. That's what I assumed. Just wanted to make sure I wasn't crazy...
2011/4/28 Alan Gates <ga...@yahoo-inc.com>
> Originally it used a regular expression. At some point we changed that to
> a single character because it was much faster than a regex. Apparently we
> missed a spot in the documentation when we made the change.
>
> Alan.
Re: Confusion looking at source for PigStorage
Posted by Alan Gates <ga...@yahoo-inc.com>.
Originally it used a regular expression. At some point we changed
that to a single character because it was much faster than a regex.
Apparently we missed a spot in the documentation when we made the
change.
Alan.
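
To make the trade-off described above concrete, here is a minimal sketch that contrasts splitting a line on one delimiter byte with splitting it on a regular expression. It is not Pig's actual code: the class DelimiterSplitSketch and its methods splitOnByte and splitOnRegex are hypothetical names made up for this illustration, and the sample line is invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not how PigStorage is implemented.
final class DelimiterSplitSketch {

    // Splits on one delimiter byte with a single linear scan; no regex engine is involved.
    static List<String> splitOnByte(byte[] line, byte delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length; i++) {
            if (line[i] == delimiter) {
                fields.add(new String(line, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // The final field runs from the last delimiter to the end of the line.
        fields.add(new String(line, start, line.length - start, StandardCharsets.UTF_8));
        return fields;
    }

    // Splits with a regular expression; the limit of -1 keeps trailing empty fields.
    static List<String> splitOnRegex(String line, String regex) {
        return Arrays.asList(Pattern.compile(regex).split(line, -1));
    }

    public static void main(String[] args) {
        String line = "alice|30||NYC";
        System.out.println(splitOnByte(line.getBytes(StandardCharsets.UTF_8), (byte) '|'));
        // "|" is a regex metacharacter, so it has to be quoted for the regex version.
        System.out.println(splitOnRegex(line, Pattern.quote("|")));
    }
}
```

Both calls return the same fields for this input. The byte version only compares each byte against the delimiter, which is the kind of work a single-character delimiter allows; the regex version has to compile and run a pattern, and callers must remember to escape metacharacters such as "|".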