Posted to user@pig.apache.org by Jonathan Coveney <jc...@gmail.com> on 2011/04/28 17:30:38 UTC

Confusion looking at source for PigStorage

I'm sure this is well known; I'm just curious why it is documented this way.
Perhaps I am missing something obvious, but I see:

/**
 * A load function that parses a line of input into fields using a delimiter to
 * set the fields. The delimiter is given as a regular expression. See
 * {@link java.lang.String#split(String)} and {@link java.util.regex.Pattern}
 * for more information.
 */
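For comparison, here is a minimal sketch of what the regular-expression contract in that class comment would mean in practice with String.split. The class and method names (RegexSplitDemo, splitRegex) are hypothetical. A pattern can match runs of mixed whitespace, and a metacharacter such as "|" must be wrapped in Pattern.quote to be treated literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RegexSplitDemo {
    // Hypothetical helper: splits a line the way the class comment describes,
    // treating the delimiter as a regular expression.
    static String[] splitRegex(String line, String regex) {
        return line.split(regex);
    }

    public static void main(String[] args) {
        // A regex can match a run of mixed spaces and tabs, which one byte cannot.
        System.out.println(Arrays.toString(splitRegex("x  y\tz", "[ \t]+"))); // [x, y, z]
        // "|" is a regex metacharacter; quote it to split on a literal pipe.
        System.out.println(Arrays.toString(splitRegex("a|b|c", Pattern.quote("|")))); // [a, b, c]
    }
}
```

Passing an unquoted "|" would instead split between every character, which is the kind of surprise a regex-based delimiter invites.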

But then

    /**
     * Constructs a Pig loader that uses specified regex as a field delimiter.
     *
     * @param delimiter
     *            the single byte character that is used to separate fields.
     *            ("\t" is the default.)
     */
    public PigStorage(String delimiter) {
        this();
        fieldDel = StorageUtil.parseFieldDel(delimiter);
    }
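By contrast, the @param comment describes a single-byte contract. The sketch below uses a hypothetical helper, parseSingleByteDelimiter; it is not Pig's StorageUtil.parseFieldDel, whose exact escape handling is not shown in this thread. It accepts one byte-sized character or the two-character escape "\t" and rejects anything longer, such as a regex.

```java
public class DelimiterParseSketch {
    // Hypothetical helper, NOT Pig's StorageUtil.parseFieldDel: it only
    // illustrates the single-byte contract described in the @param comment.
    static byte parseSingleByteDelimiter(String delimiter) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        // Accept the escape "\t" (backslash followed by t) as a tab.
        if (delimiter.equals("\\t")) {
            return (byte) '\t';
        }
        // Anything longer than one character, or wider than a byte, is rejected.
        if (delimiter.length() != 1 || delimiter.charAt(0) > 0xFF) {
            throw new IllegalArgumentException(
                "delimiter must be a single byte-sized character: " + delimiter);
        }
        return (byte) delimiter.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(parseSingleByteDelimiter(","));   // 44
        System.out.println(parseSingleByteDelimiter("\\t")); // 9
        try {
            parseSingleByteDelimiter("[ \t]+"); // a regex is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```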


Am I right that PigStorage never actually uses a regular expression? It only
seems to accept a single-byte delimiter...

Just thought I'd ask. Thank you
Jon

Re: Confusion looking at source for PigStorage

Posted by Jonathan Coveney <jc...@gmail.com>.
Ah, ok. That's what I assumed. Just wanted to make sure I wasn't crazy...

2011/4/28 Alan Gates <ga...@yahoo-inc.com>

> Originally it used a regular expression.  At some point we changed that to
> a single character because it was much faster than a regex.  Apparently we
> missed a spot in the documentation when we made the change.
>
> Alan.

Re: Confusion looking at source for PigStorage

Posted by Alan Gates <ga...@yahoo-inc.com>.
Originally it used a regular expression. At some point we changed
that to a single character because it was much faster than a regex.
Apparently we missed a spot in the documentation when we made the
change.

Alan.
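A minimal sketch of the kind of single-byte scan described above, assuming UTF-8 input and using hypothetical names (ByteSplitSketch, splitOnByte); it is not PigStorage's actual code. Each byte is compared against the delimiter in one linear pass, with no Pattern compilation or regex matcher state involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteSplitSketch {
    // Split a line of bytes on a single delimiter byte in one linear scan:
    // a plain byte comparison per position, no regex engine.
    static List<String> splitOnByte(byte[] line, byte delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length; i++) {
            if (line[i] == delim) {
                // Decode the bytes between the previous delimiter and this one.
                fields.add(new String(line, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // The last field runs to the end of the line (and may be empty).
        fields.add(new String(line, start, line.length - start, StandardCharsets.UTF_8));
        return fields;
    }

    public static void main(String[] args) {
        byte[] line = "alice\t42\t\tNY".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitOnByte(line, (byte) '\t')); // [alice, 42, , NY]
    }
}
```

Unlike String.split, this sketch keeps trailing empty fields ("a,b," yields three fields); how PigStorage itself treats them is not covered in this thread.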
