You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Ken Goodhope (JIRA)" <ji...@apache.org> on 2011/07/01 23:03:28 UTC

[jira] [Created] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

POProject throws an error with tuples containing a single non-tuple field
-------------------------------------------------------------------------

                 Key: PIG-2153
                 URL: https://issues.apache.org/jira/browse/PIG-2153
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.1
            Reporter: Ken Goodhope


When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.

           if(columns.size() == 1) {
                try{
                    ret = inpValue.get(columns.get(0));
                ...
           res.result = (Tuple)ret;

I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  

({(3.3),(1.2),(5.6)})

This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.

(({(3.3),(1.2),(5.6)}))

In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.

This could be fixed by first checking that the tuple does actually wrap another tuple.

           if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...

I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2153.
-----------------------------

    Resolution: Invalid

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060840#comment-13060840 ] 

Pradeep Kamath commented on PIG-2153:
-------------------------------------

Also am wondering if changes (any fix) are needed in the appropriate LoadFunc rather than in POProject (if my initial hypothesis that the cast is valid is true)

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060251#comment-13060251 ] 

Ken Goodhope commented on PIG-2153:
-----------------------------------

I am the first to admit this is ugly, and if someone has a better idea I would be thrilled.  I am currently running unit tests with this possible fix.

if(columns.size() == 1 && ((!overloaded && inpValue.getType(0) == DataType.TUPLE) || (overloaded && inpValue.getType(0) == DataType.BAG))) {
...

My current thinking is the reason the previous fix broke so many unit tests is single element tuples containing a databag are acceptable if overloaded is set.  I will post the results of the tests when complete.

This might fix the issue in ElephantBird, but I haven't had time to investigate that.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060298#comment-13060298 ] 

Pradeep Kamath commented on PIG-2153:
-------------------------------------

I don't have full context and given that I have not actively looked at Pig code in quite a while, my comments should be taken with a grain of salt. I am assuming POProject.getNext(Tuple) is being called because the schema (of load?) says that a tuple field should be projected. If that is indeed the case, then shouldn't the LoadFunc be returning a Tuple (with the bag in it)? The outer tuple that the LoadFunc returns simply represents a record and does not count - the types of the fields inside the outer tuple are the ones that matter in the schema and if the schema says there is one field of type Tuple, then POProject would except a type Tuple - so am wondering if the cast is correct as it is.

Again, I have been out of touch with Pig for a good 8 months now - so my thinking above could be completely wrong :) - hopefully the more active Pig committers can confirm/refute my hypothesis.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062666#comment-13062666 ] 

Ken Goodhope commented on PIG-2153:
-----------------------------------

The behavior of POProject is correct.  LoadFuncs need make sure the pig schema they return does not include the implicit wrapping tuple.  The schema should only reflect the contents inside the wrapping tuple.  

I am not 100% sure how this relates to the issue with ElephantBird, but I am reasonably convinced the problem there would lie in either how the schema is built, or possibly how the logical plan is being executed.  Regardless I believe this jira can be closed, since POProject is no longer suspect.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060758#comment-13060758 ] 

Ken Goodhope commented on PIG-2153:
-----------------------------------

Thanks Pradeep, that is actually very helpful.  If I understand you correctly, the outer tuple isn't part of the schema returned by LoadFunc.getSchema().  Is it possible that the result of LoadFunc.getNext used to be wrapped in an implicit tuple, and that is no longer happening?  

The results of the unit tests with the fix I suggested in my last comment showed 11 tests now working that were broke before, and 11 tests now breaking that used to work.  This makes me wonder if some of the tests have been written with the expectation there is an implicit wrapping tuple, and some have been written with expectation that there is no implicit wrapper.  Am I missing something?

Here are the test results.

Test that were broke and now work.
>     [junit] Test org.apache.pig.test.TestBestFitCast
>     [junit] Test org.apache.pig.test.TestCounters
>     [junit] Test org.apache.pig.test.TestDataBagAccess
>     [junit] Test org.apache.pig.test.TestEmptyInputDir
>     [junit] Test org.apache.pig.test.TestImplicitSplit
>     [junit] Test org.apache.pig.test.TestInvoker
>     [junit] Test org.apache.pig.test.TestPigRunner
>     [junit] Test org.apache.pig.test.TestPigSplit
>     [junit] Test org.apache.pig.test.TestScriptLanguage
>     [junit] Test org.apache.pig.test.TestScriptUDF
>     [junit] Test org.apache.pig.test.TestSkewedJoin

Tests that used to work, but break with the fix I tried.
<     [junit] Test org.apache.pig.test.TestCombiner FAILED
<     [junit] Test org.apache.pig.test.TestCommit FAILED
<     [junit] Test org.apache.pig.test.TestEvalPipeline2 FAILED
<     [junit] Test org.apache.pig.test.TestEvalPipelineLocal FAILED
<     [junit] Test org.apache.pig.test.TestForEachNestedPlanLocal FAILED
<     [junit] Test org.apache.pig.test.TestLimitAdjuster FAILED
<     [junit] Test org.apache.pig.test.TestMergeJoinOuter FAILED
<     [junit] Test org.apache.pig.test.TestProject FAILED
<     [junit] Test org.apache.pig.test.TestProjectRange FAILED
<     [junit] Test org.apache.pig.test.TestPruneColumn FAILED
<     [junit] Test org.apache.pig.test.TestUnionOnSchema FAILED

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060034#comment-13060034 ] 

Ken Goodhope commented on PIG-2153:
-----------------------------------

I ran unit tests with the change I recommend in the description.  Good news is several tests that failed before now work and are listed below.
org.apache.pig.test.TestBestFitCast 
org.apache.pig.test.TestDataBagAccess 
org.apache.pig.test.TestGrunt 
org.apache.pig.test.TestImplicitSplit 
org.apache.pig.test.TestMapSideCogroup 
org.apache.pig.test.TestPigRunner 
org.apache.pig.test.TestPigSplit 
org.apache.pig.test.TestScriptUDF 

The bad news is several tests that were working now fail.
org.apache.pig.test.TestBuiltin 
org.apache.pig.test.TestCollectedGroup 
org.apache.pig.test.TestCombiner 
org.apache.pig.test.TestCommit 
org.apache.pig.test.TestEvalPipeline2 
org.apache.pig.test.TestEvalPipelineLocal 
org.apache.pig.test.TestFRJoin2 
org.apache.pig.test.TestFilter 
org.apache.pig.test.TestForEach 
org.apache.pig.test.TestForEachNestedPlanLocal 
org.apache.pig.test.TestJoin 
org.apache.pig.test.TestJoinSmoke 
org.apache.pig.test.TestLimitAdjuster 
org.apache.pig.test.TestLocalRearrange 
org.apache.pig.test.TestNativeMapReduce 
org.apache.pig.test.TestNewPlanImplicitSplit 
org.apache.pig.test.TestProject 
org.apache.pig.test.TestStore 
org.apache.pig.test.TestStoreInstances 
org.apache.pig.test.TestUnionOnSchema 

Obviously, there are more tests that break than get fixed.  

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060835#comment-13060835 ] 

Pradeep Kamath commented on PIG-2153:
-------------------------------------

Based on my (old) knowledge, the tuple returned by LoadFunc (LoadFunc always has to return a tuple) simply stands for the record and the schema deals with the types of the fields inside it. So if the schema is A: {t:tuple(i:int,c:char)} that means each record contains one field of type tuple which has an int and char). I would think this means the LoadFunc returns an outer tuple (for the record), with a tuple inside (standing for the field) which has int and char subfields. I will let the more active committers comment on whether anything with respect to LoadFunc tuple handling has changed. Hopefully I am not giving wrong information here based my old knowledge, apologies in advance if so.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061035#comment-13061035 ] 

Ken Goodhope commented on PIG-2153:
-----------------------------------

In my LoadFunc, I modified getSchema to check for a single element wrapping tuple and return the inner ResourceSchema when one is found.  This fixed the errors I was getting from POProject.java.  The unit tests for my LoadFunc are still breaking, because the output has changed.  However I suspect the new output is correct, so after some more investigation I will probably change the unit tests.  Why including the wrapping tuple in the schema used to work is still a mystery.  Maybe someone currently working on the project can answer that question.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060062#comment-13060062 ] 

Ken Goodhope commented on PIG-2153:
-----------------------------------

It looks like the last time this code was touched it was for PIG-1369 by Pradeep Kamath.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2153) POProject throws an error with tuples containing a single non-tuple field

Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060860#comment-13060860 ] 

Ken Goodhope commented on PIG-2153:
-----------------------------------

That makes sense, and if it is still the case it would mean the fix needs to occur in the LoadFunc and not POProject.  This is also consistent with the original comments by Daniel Dae for PIG-1890. AvroStorage has always included the wrapping tuple as part of the schema. In most cases the outer tuple isn't really a wrapper, but a record with multiple fields and that works fine.  Later tonight I will take a look and see what changes I need to make at the LoadFunc level.  I am still perplexed why the incorrect behavior used to work.  Thanks again Pradeep.

> POProject throws an error with tuples containing a single non-tuple field
> -------------------------------------------------------------------------
>
>                 Key: PIG-2153
>                 URL: https://issues.apache.org/jira/browse/PIG-2153
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Ken Goodhope
>
> When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out.  If that field is not a tuple, a cast exception is thrown.  This is happening in the folliwing block of code at line 401.
>            if(columns.size() == 1) {
>                 try{
>                     ret = inpValue.get(columns.get(0));
>                 ...
>            res.result = (Tuple)ret;
> I am seeing this error in a unit test that is loading an array of floats.  The LoadFunc is converting the array to bag, and wrapping the bag in a tuple.  
> ({(3.3),(1.2),(5.6)})
> This results on POProject attempting to cast the bag to a tuple.  Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
> (({(3.3),(1.2),(5.6)}))
> In this case it would work because POProject would extract the first inner tuple and return it.  But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
> This could be fixed by first checking that the tuple does actually wrap another tuple.
>            if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
> I don't know the original intent of this code well enough to say this is the appropriate fix or not.  Hoping someone with more Pig experience can help here.  Right now this is preventing the unit tests in AvroStorage from working.  I can change the unit test, but I think in this case the unit test is catching a real bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira