Posted to dev@pig.apache.org by "Ken Goodhope (JIRA)" <ji...@apache.org> on 2011/07/01 23:03:28 UTC
[jira] [Created] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
POProject throws an error with tuples containing a single non-tuple field
-------------------------------------------------------------------------
Key: PIG-2153
URL: https://issues.apache.org/jira/browse/PIG-2153
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Ken Goodhope
When POProject.getNext(tuple) processes a tuple with one field, the field is pulled out. If that field is not a tuple, a cast exception is thrown. This is happening in the following block of code at line 401.
if(columns.size() == 1) {
try{
ret = inpValue.get(columns.get(0));
...
res.result = (Tuple)ret;
I am seeing this error in a unit test that is loading an array of floats. The LoadFunc is converting the array to a bag, and wrapping the bag in a tuple.
({(3.3),(1.2),(5.6)})
This results in POProject attempting to cast the bag to a tuple. Looking at the code, it appears that if I wrapped the previous tuple in another tuple, then it would work.
(({(3.3),(1.2),(5.6)}))
In this case it would work because POProject would extract the first inner tuple and return it. But this would require the LoadFunc to check for tuples with a single non-tuple field and only wrap those.
This could be fixed by first checking that the tuple does actually wrap another tuple.
if(columns.size() == 1 && inpValue.getType(0) == DataType.TUPLE) {...
I don't know the original intent of this code well enough to say whether this is the appropriate fix. Hoping someone with more Pig experience can help here. Right now this is preventing the unit tests in AvroStorage from working. I can change the unit test, but I think in this case the unit test is catching a real bug.
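The failure described above can be reproduced in isolation. The following is a minimal sketch only: FakeTuple and FakeBag are hypothetical stand-ins, not Pig's Tuple and DataBag, and projectSingleColumn only imitates the shape of the quoted block (fetch the single column, then cast), not the real POProject code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the real Pig classes.
final class SingleFieldCastSketch {
    static final class FakeTuple {
        final List<Object> fields;
        FakeTuple(Object... fields) { this.fields = Arrays.asList(fields); }
        Object get(int i) { return fields.get(i); }
    }

    static final class FakeBag {
        final List<FakeTuple> tuples = new ArrayList<>();
    }

    // Imitates the quoted block: pull out the single column and cast it to a tuple.
    static FakeTuple projectSingleColumn(FakeTuple input, int column) {
        Object ret = input.get(column);
        return (FakeTuple) ret; // fails at runtime when ret is a FakeBag
    }

    public static void main(String[] args) {
        FakeBag bag = new FakeBag();
        bag.tuples.add(new FakeTuple(3.3f));
        bag.tuples.add(new FakeTuple(1.2f));
        bag.tuples.add(new FakeTuple(5.6f));

        FakeTuple record = new FakeTuple(bag);     // models ({(3.3),(1.2),(5.6)})
        FakeTuple wrapped = new FakeTuple(record); // models (({(3.3),(1.2),(5.6)}))

        try {
            projectSingleColumn(record, 0);
        } catch (ClassCastException e) {
            System.out.println("single bag field: ClassCastException");
        }
        boolean same = projectSingleColumn(wrapped, 0) == record;
        System.out.println("wrapped record projects to inner tuple: " + same);
    }
}
```

The sketch says nothing about which side should change; it only shows why a single bag field fails the cast while a doubly wrapped record does not.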
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai resolved PIG-2153.
-----------------------------
Resolution: Invalid
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060840#comment-13060840 ]
Pradeep Kamath commented on PIG-2153:
-------------------------------------
I am also wondering whether any fix is needed in the appropriate LoadFunc rather than in POProject (if my initial hypothesis that the cast is valid is true).
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060251#comment-13060251 ]
Ken Goodhope commented on PIG-2153:
-----------------------------------
I am the first to admit this is ugly, and if someone has a better idea I would be thrilled. I am currently running unit tests with this possible fix.
if(columns.size() == 1 && ((!overloaded && inpValue.getType(0) == DataType.TUPLE) || (overloaded && inpValue.getType(0) == DataType.BAG))) {
...
My current thinking is that the previous fix broke so many unit tests because single-element tuples containing a databag are acceptable if overloaded is set. I will post the results of the tests when complete.
This might fix the issue in ElephantBird, but I haven't had time to investigate that.
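For readability, the condition in this comment can be restated as a small predicate. This is a sketch under assumptions: FieldType is a hypothetical stand-in for the DataType constants, and takeSingleColumnPath is an illustrative name that does not exist in Pig. It restates only the guard, not what POProject does inside the branch.

```java
// Hypothetical restatement of the guard; names here are illustrative only.
final class ProjectGuardSketch {
    enum FieldType { TUPLE, BAG, OTHER } // stand-in for Pig's DataType constants

    // True when: exactly one column, and the first field is a TUPLE
    // (overloaded == false) or a BAG (overloaded == true).
    static boolean takeSingleColumnPath(int columnCount, boolean overloaded, FieldType first) {
        if (columnCount != 1) {
            return false;
        }
        return overloaded ? first == FieldType.BAG : first == FieldType.TUPLE;
    }

    public static void main(String[] args) {
        System.out.println(takeSingleColumnPath(1, false, FieldType.TUPLE));
        System.out.println(takeSingleColumnPath(1, true, FieldType.BAG));
        System.out.println(takeSingleColumnPath(1, false, FieldType.BAG));
    }
}
```

Written this way, it is easier to see that the guard accepts exactly two combinations and rejects every other field type.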
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060298#comment-13060298 ]
Pradeep Kamath commented on PIG-2153:
-------------------------------------
I don't have full context and given that I have not actively looked at Pig code in quite a while, my comments should be taken with a grain of salt. I am assuming POProject.getNext(Tuple) is being called because the schema (of load?) says that a tuple field should be projected. If that is indeed the case, then shouldn't the LoadFunc be returning a Tuple (with the bag in it)? The outer tuple that the LoadFunc returns simply represents a record and does not count - the types of the fields inside the outer tuple are the ones that matter in the schema and if the schema says there is one field of type Tuple, then POProject would expect a type Tuple - so am wondering if the cast is correct as it is.
Again, I have been out of touch with Pig for a good 8 months now - so my thinking above could be completely wrong :) - hopefully the more active Pig committers can confirm/refute my hypothesis.
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062666#comment-13062666 ]
Ken Goodhope commented on PIG-2153:
-----------------------------------
The behavior of POProject is correct. LoadFuncs need to make sure the Pig schema they return does not include the implicit wrapping tuple. The schema should only reflect the contents inside the wrapping tuple.
I am not 100% sure how this relates to the issue with ElephantBird, but I am reasonably convinced the problem there would lie in either how the schema is built, or possibly how the logical plan is being executed. Regardless I believe this jira can be closed, since POProject is no longer suspect.
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060758#comment-13060758 ]
Ken Goodhope commented on PIG-2153:
-----------------------------------
Thanks Pradeep, that is actually very helpful. If I understand you correctly, the outer tuple isn't part of the schema returned by LoadFunc.getSchema(). Is it possible that the result of LoadFunc.getNext used to be wrapped in an implicit tuple, and that is no longer happening?
The results of the unit tests with the fix I suggested in my last comment showed 11 tests now working that were broken before, and 11 tests now breaking that used to work. This makes me wonder if some of the tests have been written with the expectation there is an implicit wrapping tuple, and some have been written with the expectation that there is no implicit wrapper. Am I missing something?
Here are the test results.
Tests that were broken and now work.
> [junit] Test org.apache.pig.test.TestBestFitCast
> [junit] Test org.apache.pig.test.TestCounters
> [junit] Test org.apache.pig.test.TestDataBagAccess
> [junit] Test org.apache.pig.test.TestEmptyInputDir
> [junit] Test org.apache.pig.test.TestImplicitSplit
> [junit] Test org.apache.pig.test.TestInvoker
> [junit] Test org.apache.pig.test.TestPigRunner
> [junit] Test org.apache.pig.test.TestPigSplit
> [junit] Test org.apache.pig.test.TestScriptLanguage
> [junit] Test org.apache.pig.test.TestScriptUDF
> [junit] Test org.apache.pig.test.TestSkewedJoin
Tests that used to work, but break with the fix I tried.
< [junit] Test org.apache.pig.test.TestCombiner FAILED
< [junit] Test org.apache.pig.test.TestCommit FAILED
< [junit] Test org.apache.pig.test.TestEvalPipeline2 FAILED
< [junit] Test org.apache.pig.test.TestEvalPipelineLocal FAILED
< [junit] Test org.apache.pig.test.TestForEachNestedPlanLocal FAILED
< [junit] Test org.apache.pig.test.TestLimitAdjuster FAILED
< [junit] Test org.apache.pig.test.TestMergeJoinOuter FAILED
< [junit] Test org.apache.pig.test.TestProject FAILED
< [junit] Test org.apache.pig.test.TestProjectRange FAILED
< [junit] Test org.apache.pig.test.TestPruneColumn FAILED
< [junit] Test org.apache.pig.test.TestUnionOnSchema FAILED
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060034#comment-13060034 ]
Ken Goodhope commented on PIG-2153:
-----------------------------------
I ran unit tests with the change I recommended in the description. The good news is that several tests that failed before now work; they are listed below.
org.apache.pig.test.TestBestFitCast
org.apache.pig.test.TestDataBagAccess
org.apache.pig.test.TestGrunt
org.apache.pig.test.TestImplicitSplit
org.apache.pig.test.TestMapSideCogroup
org.apache.pig.test.TestPigRunner
org.apache.pig.test.TestPigSplit
org.apache.pig.test.TestScriptUDF
The bad news is several tests that were working now fail.
org.apache.pig.test.TestBuiltin
org.apache.pig.test.TestCollectedGroup
org.apache.pig.test.TestCombiner
org.apache.pig.test.TestCommit
org.apache.pig.test.TestEvalPipeline2
org.apache.pig.test.TestEvalPipelineLocal
org.apache.pig.test.TestFRJoin2
org.apache.pig.test.TestFilter
org.apache.pig.test.TestForEach
org.apache.pig.test.TestForEachNestedPlanLocal
org.apache.pig.test.TestJoin
org.apache.pig.test.TestJoinSmoke
org.apache.pig.test.TestLimitAdjuster
org.apache.pig.test.TestLocalRearrange
org.apache.pig.test.TestNativeMapReduce
org.apache.pig.test.TestNewPlanImplicitSplit
org.apache.pig.test.TestProject
org.apache.pig.test.TestStore
org.apache.pig.test.TestStoreInstances
org.apache.pig.test.TestUnionOnSchema
Obviously, there are more tests that break than get fixed.
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060835#comment-13060835 ]
Pradeep Kamath commented on PIG-2153:
-------------------------------------
Based on my (old) knowledge, the tuple returned by LoadFunc (LoadFunc always has to return a tuple) simply stands for the record and the schema deals with the types of the fields inside it. So if the schema is A: {t:tuple(i:int,c:char)}, that means each record contains one field of type tuple (which has an int and a char). I would think this means the LoadFunc returns an outer tuple (for the record), with a tuple inside (standing for the field) which has int and char subfields. I will let the more active committers comment on whether anything with respect to LoadFunc tuple handling has changed. Hopefully I am not giving wrong information here based on my old knowledge, apologies in advance if so.
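The record layout described here can be sketched with plain Java lists. This is an illustration under assumptions: java.util.List stands in for Pig's Tuple, String stands in for the char subfield, and buildRecord is a hypothetical helper rather than anything in the LoadFunc API.

```java
import java.util.Arrays;
import java.util.List;

// Plain lists stand in for Pig tuples; this is not Pig code.
final class RecordLayoutSketch {
    // Builds the record for a schema like {t:tuple(i:int,c:char)}:
    // the outer list is the record itself, its one field is the tuple t.
    static List<Object> buildRecord(int i, String c) {
        List<Object> t = Arrays.<Object>asList(i, c); // field t with subfields i and c
        return Arrays.<Object>asList(t);              // outer tuple: not part of the schema
    }

    public static void main(String[] args) {
        List<Object> record = buildRecord(7, "x");
        System.out.println("fields in record: " + record.size());
        System.out.println("subfields in t: " + ((List<?>) record.get(0)).size());
    }
}
```

Under this reading the schema describes one field, and the outer list exists only because a record always has to be returned as a tuple.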
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061035#comment-13061035 ]
Ken Goodhope commented on PIG-2153:
-----------------------------------
In my LoadFunc, I modified getSchema to check for a single element wrapping tuple and return the inner ResourceSchema when one is found. This fixed the errors I was getting from POProject.java. The unit tests for my LoadFunc are still breaking, because the output has changed. However I suspect the new output is correct, so after some more investigation I will probably change the unit tests. Why including the wrapping tuple in the schema used to work is still a mystery. Maybe someone currently working on the project can answer that question.
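A rough sketch of the kind of check described above, for readers following along. Everything here is hypothetical: Schema, Field and Type are simplified stand-ins rather than Pig's ResourceSchema classes, and unwrapSingleTuple is an illustrative name, not the actual AvroStorage change.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical schema model; not Pig's ResourceSchema.
final class UnwrapSchemaSketch {
    enum Type { TUPLE, BAG, FLOAT }

    static final class Field {
        final String name;
        final Type type;
        final Schema inner; // nested schema for TUPLE and BAG fields, else null
        Field(String name, Type type, Schema inner) {
            this.name = name;
            this.type = type;
            this.inner = inner;
        }
    }

    static final class Schema {
        final List<Field> fields;
        Schema(Field... fields) { this.fields = Arrays.asList(fields); }
    }

    // If the schema is a single tuple field with a nested schema, return the
    // nested schema; otherwise return the schema unchanged.
    static Schema unwrapSingleTuple(Schema schema) {
        if (schema.fields.size() == 1) {
            Field only = schema.fields.get(0);
            if (only.type == Type.TUPLE && only.inner != null) {
                return only.inner;
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        Schema element = new Schema(new Field("value", Type.FLOAT, null));
        Schema inner = new Schema(new Field("values", Type.BAG, element));
        Schema wrapper = new Schema(new Field("record", Type.TUPLE, inner));
        System.out.println("unwrapped to inner: " + (unwrapSingleTuple(wrapper) == inner));
    }
}
```

A schema whose single field is a bag, or one with several fields, passes through untouched, which matches the case where the outer tuple is a real multi-field record.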
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060062#comment-13060062 ]
Ken Goodhope commented on PIG-2153:
-----------------------------------
It looks like the last time this code was touched it was for PIG-1369 by Pradeep Kamath.
[jira] [Commented] (PIG-2153) POProject throws an error with tuples
containing a single non-tuple field
Posted by "Ken Goodhope (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060860#comment-13060860 ]
Ken Goodhope commented on PIG-2153:
-----------------------------------
That makes sense, and if it is still the case it would mean the fix needs to occur in the LoadFunc and not POProject. This is also consistent with the original comments by Daniel Dai for PIG-1890. AvroStorage has always included the wrapping tuple as part of the schema. In most cases the outer tuple isn't really a wrapper, but a record with multiple fields and that works fine. Later tonight I will take a look and see what changes I need to make at the LoadFunc level. I am still perplexed why the incorrect behavior used to work. Thanks again Pradeep.