You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Taku Miyakawa <mi...@gmail.com> on 2013/02/28 23:53:52 UTC

Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/
-----------------------------------------------------------

Review request for pig.


Description
-------

This is a review board for  https://issues.apache.org/jira/browse/PIG-3215

The patch adds LTSVLoader function and its test class.


This addresses bug PIG-3215.
    https://issues.apache.org/jira/browse/PIG-3215


Diffs
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 

Diff: https://reviews.apache.org/r/9685/diff/


Testing
-------

ant compile-test
cd contrib/piggybank/java/
ant -Dtestcase=TestLTSVLoader test


Thanks,

Taku Miyakawa


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Taku Miyakawa <mi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/
-----------------------------------------------------------

(Updated March 9, 2013, 8:18 a.m.)


Review request for pig.


Changes
-------

#pushProjection now also performs projection when the schema is specified to the constructor.


Description
-------

This is a review board for  https://issues.apache.org/jira/browse/PIG-3215

The patch adds LTSVLoader function and its test class.


This addresses bug PIG-3215.
    https://issues.apache.org/jira/browse/PIG-3215


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 

Diff: https://reviews.apache.org/r/9685/diff/


Testing
-------

ant compile-test
cd contrib/piggybank/java/
ant -Dtestcase=TestLTSVLoader test


Thanks,

Taku Miyakawa


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Taku Miyakawa <mi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/
-----------------------------------------------------------

(Updated March 7, 2013, 11:31 p.m.)


Review request for pig.


Changes
-------

Outputs skipped labels to the task log at the first occurrence.


Description
-------

This is a review board for  https://issues.apache.org/jira/browse/PIG-3215

The patch adds LTSVLoader function and its test class.


This addresses bug PIG-3215.
    https://issues.apache.org/jira/browse/PIG-3215


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 

Diff: https://reviews.apache.org/r/9685/diff/


Testing
-------

ant compile-test
cd contrib/piggybank/java/
ant -Dtestcase=TestLTSVLoader test


Thanks,

Taku Miyakawa


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Taku Miyakawa <mi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/
-----------------------------------------------------------

(Updated March 7, 2013, 10:46 p.m.)


Review request for pig.


Changes
-------

Removes excess blank lines.


Description
-------

This is a review board for  https://issues.apache.org/jira/browse/PIG-3215

The patch adds LTSVLoader function and its test class.


This addresses bug PIG-3215.
    https://issues.apache.org/jira/browse/PIG-3215


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 

Diff: https://reviews.apache.org/r/9685/diff/


Testing
-------

ant compile-test
cd contrib/piggybank/java/
ant -Dtestcase=TestLTSVLoader test


Thanks,

Taku Miyakawa


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Taku Miyakawa <mi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/
-----------------------------------------------------------

(Updated March 5, 2013, 11:42 p.m.)


Review request for pig.


Changes
-------

Fixes minor issues such as ones about layout.


Description
-------

This is a review board for  https://issues.apache.org/jira/browse/PIG-3215

The patch adds LTSVLoader function and its test class.


This addresses bug PIG-3215.
    https://issues.apache.org/jira/browse/PIG-3215


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 

Diff: https://reviews.apache.org/r/9685/diff/


Testing
-------

ant compile-test
cd contrib/piggybank/java/
ant -Dtestcase=TestLTSVLoader test


Thanks,

Taku Miyakawa


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Taku Miyakawa <mi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/
-----------------------------------------------------------

(Updated March 5, 2013, 11:38 p.m.)


Review request for pig.


Description
-------

This is a review board for  https://issues.apache.org/jira/browse/PIG-3215

The patch adds LTSVLoader function and its test class.


This addresses bug PIG-3215.
    https://issues.apache.org/jira/browse/PIG-3215


Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 

Diff: https://reviews.apache.org/r/9685/diff/


Testing
-------

ant compile-test
cd contrib/piggybank/java/
ant -Dtestcase=TestLTSVLoader test


Thanks,

Taku Miyakawa


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Taku Miyakawa <mi...@gmail.com>.

> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
> >     
> >     Open to argument, though.
> 
> Taku Miyakawa wrote:
>     > In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map.
>     
>     Yes ":map[]" is redundant. I will remove it.
>     
>     > if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
>     
>     Does this code meet your point?
>     
>     <code>
>     access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, ua:chararray);
>     ua = FOREACH access GENERATE ua;
>     dump ua;
>     </code>
>     
>     It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
> 
> Cheolsoo Park wrote:
>     >> It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
>     
>     You don't need access to the schema given by the AS clause inside LoadFunc. You can just make LoadMetaData.getSchema() return null and load every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH operator after LOAD and does type conversion using the schema given by the AS clause.
>     
>     This is what happens when PigStorage is not given a JSON schema file, but the output schema is defined in the AS clause.
> 
> Jonathan Coveney wrote:
>     Cheolsoo is totally correct. That said, this should probably be documented better. Also, the fact that we rely on an implicit foreach is so gross... but this is how you do it!

I think the function is similar to JsonLoader, it also takes a schema as an argument.

An LTSV file itself contains field names in each row.  A column consists of a name and a value (<name>:<value>).  And users may want to select fields by their names.  That is why the loader function should know field names, and it takes a schema as an argument.

Let me explain in detail.

In an LTSV file, orders of columns may differ for each row (although it is not a common case).  And a row may lack some columns.  In this example, the second row lacks "ua" column, and the third row has a different order of columns, but it's OK.

{data}
host:pc.example.com	req:GET /index.html	ua:Opera/9.80
host:user.example.net	req:GET /favicon.ico
ua:Mozilla/5.0	req:GET /news.html	host:workstation.example.org
{/data}

To load "host" column and "ua" column into Pig fields, the loader function should know their names, because it cannot depend on positions of columns.

{code}
access = LOAD 'access.log' USING LTSVLoader('host, ua');
DUMP access;

-- Output:
-- (pc.example.com, Opera/9.80)
-- (user.example.net,)
-- (workstation.example.org, Mozilla/5.0)
{/code}


> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 576
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line576>
> >
> >     Why doesn't this support pushProjection as well?

It was my sloth. I will implement it.


> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 547
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line547>
> >
> >     It is probably useful to LOG.debug (or LOG.warn) once for each label that comes up that isn't used. Right now it would be silent, and people might want to know.

LOG.debug for the first occurrence of a unused column seems good. I will implement in that way.


- Taku


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------


On Feb. 28, 2013, 10:53 p.m., Taku Miyakawa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2013, 10:53 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
> 
> The patch adds LTSVLoader function and its test class.
> 
> 
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9685/diff/
> 
> 
> Testing
> -------
> 
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
> 
> 
> Thanks,
> 
> Taku Miyakawa
> 
>


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Jonathan Coveney <jc...@gmail.com>.

> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
> >     
> >     Open to argument, though.
> 
> Taku Miyakawa wrote:
>     > In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map.
>     
>     Yes ":map[]" is redundant. I will remove it.
>     
>     > if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
>     
>     Does this code meet your point?
>     
>     <code>
>     access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, ua:chararray);
>     ua = FOREACH access GENERATE ua;
>     dump ua;
>     </code>
>     
>     It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
> 
> Cheolsoo Park wrote:
>     >> It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
>     
>     You don't need access to the schema given by the AS clause inside LoadFunc. You can just make LoadMetaData.getSchema() return null and load every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH operator after LOAD and does type conversion using the schema given by the AS clause.
>     
>     This is what happens when PigStorage is not given a JSON schema file, but the output schema is defined in the AS clause.
> 
> Jonathan Coveney wrote:
>     Cheolsoo is totally correct. That said, this should probably be documented better. Also, the fact that we rely on an implicit foreach is so gross... but this is how you do it!
> 
> Taku Miyakawa wrote:
>     I think the function is similar to JsonLoader, it also takes a schema as an argument.
>     
>     An LTSV file itself contains field names in each row.  A column consists of a name and a value (<name>:<value>).  And users may want to select fields by their names.  That is why the loader function should know field names, and it takes a schema as an argument.
>     
>     Let me explain in detail.
>     
>     In an LTSV file, orders of columns may differ for each row (although it is not a common case).  And a row may lack some columns.  In this example, the second row lacks "ua" column, and the third row has a different order of columns, but it's OK.
>     
>     {data}
>     host:pc.example.com	req:GET /index.html	ua:Opera/9.80
>     host:user.example.net	req:GET /favicon.ico
>     ua:Mozilla/5.0	req:GET /news.html	host:workstation.example.org
>     {/data}
>     
>     To load "host" column and "ua" column into Pig fields, the loader function should know their names, because it cannot depend on positions of columns.
>     
>     {code}
>     access = LOAD 'access.log' USING LTSVLoader('host, ua');
>     DUMP access;
>     
>     -- Output:
>     -- (pc.example.com, Opera/9.80)
>     -- (user.example.net,)
>     -- (workstation.example.org, Mozilla/5.0)
>     {/code}
>

Hmm, ask on the dev@ mailing list about this. I understand your reasoning and it is reasonable, but that said, this syntax irks me. I think there has to be a way to do this, but I am not sure what it is. If there isn't let's add one :) But there should be. Once we nail that, I will give it another go but it looks close to a +1.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------


On March 9, 2013, 8:18 a.m., Taku Miyakawa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
> 
> (Updated March 9, 2013, 8:18 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
> 
> The patch adds LTSVLoader function and its test class.
> 
> 
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9685/diff/
> 
> 
> Testing
> -------
> 
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
> 
> 
> Thanks,
> 
> Taku Miyakawa
> 
>


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Taku Miyakawa <mi...@gmail.com>.

> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 265
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line265>
> >
> >     you don't reference this anywhere else, so I'd just inline the check as "if (line == null)" and so on. Java is verbose enough as it is :)

I will fix it and other similar kind of things. Actually I prefer a verbose style of coding, but I admit it is a bit excessive :)


> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
> >     
> >     Open to argument, though.

> In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map.

Yes ":map[]" is redundant. I will remove it.

> if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.

Does this code meet your point?

<code>
access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, ua:chararray);
ua = FOREACH access GENERATE ua;
dump ua;
</code>

It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.


- Taku


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------


On Feb. 28, 2013, 10:53 p.m., Taku Miyakawa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2013, 10:53 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
> 
> The patch adds LTSVLoader function and its test class.
> 
> 
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9685/diff/
> 
> 
> Testing
> -------
> 
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
> 
> 
> Thanks,
> 
> Taku Miyakawa
> 
>


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Cheolsoo Park <ch...@cloudera.com>.

> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
> >     
> >     Open to argument, though.
> 
> Taku Miyakawa wrote:
>     > In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map.
>     
>     Yes ":map[]" is redundant. I will remove it.
>     
>     > if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
>     
>     Does this code meet your point?
>     
>     <code>
>     access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, ua:chararray);
>     ua = FOREACH access GENERATE ua;
>     dump ua;
>     </code>
>     
>     It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.

>> It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.

You don't need access to the schema given by the AS clause inside LoadFunc. You can just make LoadMetaData.getSchema() return null and load every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH operator after LOAD and does type conversion using the schema given by the AS clause.

This is what happens when PigStorage is not given a JSON schema file, but the output schema is defined in the AS clause.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------


On Feb. 28, 2013, 10:53 p.m., Taku Miyakawa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2013, 10:53 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
> 
> The patch adds LTSVLoader function and its test class.
> 
> 
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9685/diff/
> 
> 
> Testing
> -------
> 
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
> 
> 
> Thanks,
> 
> Taku Miyakawa
> 
>


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Jonathan Coveney <jc...@gmail.com>.

> On March 1, 2013, 2:54 p.m., Jonathan Coveney wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java, line 157
> > <https://reviews.apache.org/r/9685/diff/1/?file=263710#file263710line157>
> >
> >     In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
> >     
> >     Open to argument, though.
> 
> Taku Miyakawa wrote:
>     > In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map.
>     
>     Yes ":map[]" is redundant. I will remove it.
>     
>     > if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
>     
>     Does this code meet your point?
>     
>     <code>
>     access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (host:chararray, ua:chararray);
>     ua = FOREACH access GENERATE ua;
>     dump ua;
>     </code>
>     
>     It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
> 
> Cheolsoo Park wrote:
>     >> It will be good, but I could not find a way to implement the function in that manner. LoadFunc seems to have no method which can get an input schema specified by users.
>     
>     You don't need access to the schema given by the AS clause inside LoadFunc. You can just make LoadMetaData.getSchema() return null and load every field as bytearray in LoadFunc.getNext(). Then, Pig inserts a FOREACH operator after LOAD and does type conversion using the schema given by the AS clause.
>     
>     This is what happens when PigStorage is not given a JSON schema file, but the output schema is defined in the AS clause.

Cheolsoo is totally correct. That said, this should probably be documented better. Also, the fact that we rely on an implicit foreach is so gross... but this is how you do it!


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------


On Feb. 28, 2013, 10:53 p.m., Taku Miyakawa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2013, 10:53 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
> 
> The patch adds LTSVLoader function and its test class.
> 
> 
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9685/diff/
> 
> 
> Testing
> -------
> 
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
> 
> 
> Thanks,
> 
> Taku Miyakawa
> 
>


Re: Review Request: PIG-3215 [piggybank] Add LTSVLoader to load LTSV files

Posted by Jonathan Coveney <jc...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9685/#review17239
-----------------------------------------------------------



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36628>

    In the case where they do not give it a schema, I think that they shouldn't have to actually specify that it is a map. It's implicit. Furthermore, I think that instead of passing the Schema as an argument, if they do an "as (schema)" THEN it should attempt to return a Schema in accordance with what you provide.
    
    Open to argument, though.



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36630>

    you don't reference this anywhere else, so I'd just inline the check as "if (line == null)" and so on. Java is verbose enough as it is :)



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36629>

    this isn't necessary -- it is impossible for this to be false



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36631>

    same comments as above...no use being more verbose than we need to be



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36632>

    I personally do not love the number of gratuitous newlines in the file, and would prefer it to be a bit more compact. I think it hinders readability and just spreads things out. This is a nitpick though :)



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36633>

    I prefer the pattern of having all of the members at the top of the file. Another style nitpick, but I think this class could use some unification in that respect (lord knows I've violated this in the past but I do think it makes things easier if there is a predictable place to look for things).



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36634>

    in Pig, all Maps are <String,Object>



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36635>

    This should probably be LOG.debug



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36637>

    LOG.debug



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36636>

    same comments as above, and let's just apply it to every assert that comes right after a check which makes it clear what is desired.



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36638>

    Move this up to be with the null check ie "if (fields == null || fields.isEmpty())". It makes more sense for them to be together



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36639>

    LOG.debug



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36640>

    It is probably useful to LOG.debug (or LOG.warn) once for each label that comes up that isn't used. Right now it would be silent, and people might want to know.



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36644>

    Why doesn't this support pushProjection as well?



contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36642>

    Spaces and tabs after the last character are discouraged



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java
<https://reviews.apache.org/r/9685/#comment36643>

    you should be able to just do new PigServer(ExecType.LOCAL)


- Jonathan Coveney


On Feb. 28, 2013, 10:53 p.m., Taku Miyakawa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9685/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2013, 10:53 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> -------
> 
> This is a review board for  https://issues.apache.org/jira/browse/PIG-3215
> 
> The patch adds LTSVLoader function and its test class.
> 
> 
> This addresses bug PIG-3215.
>     https://issues.apache.org/jira/browse/PIG-3215
> 
> 
> Diffs
> -----
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/LTSVLoader.java PRE-CREATION 
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestLTSVLoader.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9685/diff/
> 
> 
> Testing
> -------
> 
> ant compile-test
> cd contrib/piggybank/java/
> ant -Dtestcase=TestLTSVLoader test
> 
> 
> Thanks,
> 
> Taku Miyakawa
> 
>