Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/06/15 20:40:48 UTC
[jira] [Created] (PIG-2127) PigStorageSchema need to deal with
missing field
PigStorageSchema need to deal with missing field
------------------------------------------------
Key: PIG-2127
URL: https://issues.apache.org/jira/browse/PIG-2127
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.10
Reporter: Daniel Dai
Fix For: 0.10
Currently, if the data contains fewer columns than the schema, PigStorageSchema throws an IndexOutOfBoundsException (PigStorageSchema:97). We should pad with nulls in this case, as we do in PigStorage.
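As a rough illustration of the padding behaviour requested above, here is a minimal Java sketch. It is not Pig's actual code: the class `NullPaddingSketch` and the helper `padToSchema` are hypothetical names, and a plain `List<Object>` stands in for a Pig tuple. The assumption is simply that a row with fewer fields than the schema gets trailing nulls appended before any field is read by index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullPaddingSketch {

    // Hypothetical helper (not part of Pig): returns a copy of the row,
    // extended with nulls until it has as many fields as the schema.
    // Rows that are already long enough are copied unchanged.
    public static List<Object> padToSchema(List<Object> fields, int schemaSize) {
        List<Object> padded = new ArrayList<>(fields);
        while (padded.size() < schemaSize) {
            padded.add(null);
        }
        return padded;
    }

    public static void main(String[] args) {
        // A short row such as "c,b" read against a four-column schema.
        List<Object> row = new ArrayList<>(Arrays.asList("c", "b"));

        // Without padding, row.get(3) would throw IndexOutOfBoundsException.
        List<Object> padded = padToSchema(row, 4);

        System.out.println(padded); // prints [c, b, null, null]
    }
}
```

With the row padded first, reading every schema position by index is safe, and the missing columns surface as nulls rather than as an exception.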
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Andrew Perepelytsya (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474535#comment-13474535 ]
Andrew Perepelytsya commented on PIG-2127:
------------------------------------------
The reason Prashant couldn't reproduce the issue is that it doesn't happen when the schema is declared inline. However, if the schema is loaded from the .pig_schema file (e.g. with PigStorage left at its defaults, which load the file if it is available), things break.
[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Andrew Perepelytsya (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474529#comment-13474529 ]
Andrew Perepelytsya commented on PIG-2127:
------------------------------------------
Hi, I've come across this regression in 0.10.0. After building Pig from branch-0.10 (the upcoming 0.10.1, as of Oct 11 2012), the issue is still there; nothing was fixed. This is a major problem, as it breaks the very basic promise of Pig being resilient to an evolving schema. Please fix it and don't let it slip into 0.10.1.
[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290371#comment-13290371 ]
Thejas M Nair commented on PIG-2127:
------------------------------------
Vivek,
Please open a new jira linked to this one.
There does not seem to be an option of reopening this one.
[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Vivek Padmanabhan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290233#comment-13290233 ]
Vivek Padmanabhan commented on PIG-2127:
----------------------------------------
I think this issue is still present with the PigStorage -schema option:
{code}
a = load '2127_withschema' using PigStorage(',','-schema');
b = foreach a generate f1,f2,f3,f4;
dump b;
{code}
Input:
{code}
d,e,4,1
a,b,1,2
c,b
d,e,4,1
{code}
The script and input above produce the following exception:
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:156)
at org.apache.pig.builtin.PigStorage.applySchema(PigStorage.java:282)
at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:246)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
[jira] [Resolved] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Olga Natkovich (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-2127.
---------------------------------
Resolution: Fixed
It looks like PigStorageSchema has been converted to just use the PigStorage implementation, so it has exactly the same semantics.
[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Vivek Padmanabhan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290241#comment-13290241 ]
Vivek Padmanabhan commented on PIG-2127:
----------------------------------------
I am seeing the same issue with plain PigStorage as well in Pig 0.10.
Input:
d,e,4,1
a,b,1,2
c,b
d,e,4,1
Script:
a = load '2127_withschema' using PigStorage(',') as (f1,f2,f3,f4);
b = foreach a generate f1,f2,f3,f4;
dump b;
The above script also results in the same IndexOutOfBoundsException in Pig 0.10 (it works fine with Pig 0.9).
[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500468#comment-13500468 ]
Koji Noguchi commented on PIG-2127:
-----------------------------------
bq. Vivek,
bq. Please open a new jira linked to this one.
bq. There does not seem to be an option of reopening this one.
Cloned as PIG-3056.
[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with
missing field
Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290533#comment-13290533 ]
Prashant Kommireddi commented on PIG-2127:
------------------------------------------
Is this happening with trunk? I can't reproduce this issue.
{code}
cat data
1 3 5
4 123 b
5 12
10
A = LOAD 'data' as (a:int, b:chararray, c:chararray);
Store A INTO 'out' using PigStorage(',', '-schema');
B = load 'out' using PigStorage(',', '-schema');
describe B;
B: {a: int,b: chararray,c: chararray}
dump B;
(1,3,5)
(4,123,b)
(5,12,)
(10,,)
{code}