Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/06/15 20:40:48 UTC

[jira] [Created] (PIG-2127) PigStorageSchema need to deal with missing field

PigStorageSchema need to deal with missing field
------------------------------------------------

                 Key: PIG-2127
                 URL: https://issues.apache.org/jira/browse/PIG-2127
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.10
            Reporter: Daniel Dai
             Fix For: 0.10


Currently, if the data contains fewer columns than the schema, PigStorageSchema throws an IndexOutOfBoundsException (PigStorageSchema:97). We should pad the missing fields with nulls in this case, as we do in PigStorage.
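
A minimal Java sketch of the null padding requested here. It is not Pig's actual code: the class NullPadding and the method padToSchema are hypothetical names used only for illustration, and a row is modelled as a plain java.util.List rather than a Pig Tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: pads a parsed row with nulls so that
// it has at least as many fields as the schema declares.
final class NullPadding {

    static List<Object> padToSchema(List<?> fields, int schemaSize) {
        List<Object> padded = new ArrayList<>(fields);
        // Append nulls until every schema column has a slot.
        while (padded.size() < schemaSize) {
            padded.add(null);
        }
        return padded;
    }

    public static void main(String[] args) {
        // A row with two fields read against a four-column schema.
        List<Object> row = padToSchema(Arrays.asList("c", "b"), 4);
        System.out.println(row); // prints [c, b, null, null]
    }
}
```

Rows that already have schemaSize or more fields are returned unchanged, so the sketch only affects short rows.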

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Andrew Perepelytsya (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474535#comment-13474535 ] 

Andrew Perepelytsya commented on PIG-2127:
------------------------------------------

The reason Prashant couldn't reproduce the issue is that it doesn't happen when the schema is declared inline. However, if the schema is loaded from the .pig_schema file (e.g. with PigStorage left at its defaults, it loads the file if available), things break.
                

[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Andrew Perepelytsya (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474529#comment-13474529 ] 

Andrew Perepelytsya commented on PIG-2127:
------------------------------------------

Hi, I've come across this regression in 0.10.0. I then built Pig from branch-0.10 (the upcoming 0.10.1, as of Oct 11 2012), and the issue is still there; nothing was fixed. This is a major problem, as it breaks the very basic promise of Pig being resilient to an evolving schema. Please fix it rather than letting it slip into 0.10.1.
                

[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290371#comment-13290371 ] 

Thejas M Nair commented on PIG-2127:
------------------------------------

Vivek,
Please open a new jira linked to this one.
There does not seem to be an option of reopening this one.
                

        

[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Vivek Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290233#comment-13290233 ] 

Vivek Padmanabhan commented on PIG-2127:
----------------------------------------

I think this issue is still present with the PigStorage -schema option:


{code}
a = load '2127_withschema' using PigStorage(',','-schema');
b = foreach a generate f1,f2,f3,f4;
dump b;
{code}

input
{code}
d,e,4,1
a,b,1,2
c,b
d,e,4,1
{code}

The script and input above produce the following exception:
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:156)
	at org.apache.pig.builtin.PigStorage.applySchema(PigStorage.java:282)
	at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:246)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
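
The trace above ends in an unchecked list access reached from PigStorage.applySchema. The following Java sketch reproduces that access pattern on the input from this comment. It is an illustration only, not Pig's code: ShortRowDemo, readUnguarded and readGuarded are hypothetical names, and rows are modelled as plain lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: not Pig's applySchema, just the same kind of indexed read.
final class ShortRowDemo {

    static final int SCHEMA_SIZE = 4; // f1, f2, f3, f4

    // Reads one value per schema column without checking the row length.
    static List<Object> readUnguarded(List<String> row) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < SCHEMA_SIZE; i++) {
            out.add(row.get(i)); // throws IndexOutOfBoundsException on a short row
        }
        return out;
    }

    // Same loop, but a missing column yields null instead of an exception.
    static List<Object> readGuarded(List<String> row) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < SCHEMA_SIZE; i++) {
            out.add(i < row.size() ? row.get(i) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {"d,e,4,1", "a,b,1,2", "c,b", "d,e,4,1"};
        for (String line : input) {
            List<String> row = Arrays.asList(line.split(","));
            try {
                System.out.println(readUnguarded(row));
            } catch (IndexOutOfBoundsException e) {
                System.out.println("short row, guarded read gives " + readGuarded(row));
            }
        }
    }
}
```

Only the two-field row c,b takes the exception path; the guarded read fills its missing columns with null.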
	
                

        

[jira] [Resolved] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Olga Natkovich (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-2127.
---------------------------------

    Resolution: Fixed

It looks like PigStorageSchema has been converted to simply use the PigStorage implementation, so it has exactly the same semantics.
                

        

[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Vivek Padmanabhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290241#comment-13290241 ] 

Vivek Padmanabhan commented on PIG-2127:
----------------------------------------

I am also seeing the same issue with plain PigStorage in Pig 0.10.
Input:
d,e,4,1
a,b,1,2
c,b
d,e,4,1

Script:
a = load '2127_withschema' using PigStorage(',') as (f1,f2,f3,f4);
b = foreach a generate f1,f2,f3,f4;
dump b;

The above script also results in the same IndexOutOfBoundsException in Pig 0.10 (it works fine with Pig 0.9).


                

        

[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500468#comment-13500468 ] 

Koji Noguchi commented on PIG-2127:
-----------------------------------

> Vivek,
> Please open a new jira linked to this one.
> There does not seem to be an option of reopening this one.

Cloned as PIG-3056.

                

[jira] [Commented] (PIG-2127) PigStorageSchema need to deal with missing field

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290533#comment-13290533 ] 

Prashant Kommireddi commented on PIG-2127:
------------------------------------------

Is this happening with trunk? I can't reproduce this issue.

{code}
cat data
1	3	5
4	123	b
5	12	
10

A = LOAD 'data' as (a:int, b:chararray, c:chararray);
Store A INTO 'out' using PigStorage(',', '-schema');

B = load 'out' using PigStorage(',', '-schema');
describe B;
B: {a: int,b: chararray,c: chararray}

dump B;
(1,3,5)
(4,123,b)
(5,12,)
(10,,)
{code}
                