You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Viraj Bhat (JIRA)" <ji...@apache.org> on 2009/05/02 06:03:30 UTC

[jira] Created: (PIG-798) Schema errors when using PigStorage and none when using BinStorage??

Schema errors when using PigStorage and none when using BinStorage??
--------------------------------------------------------------------

                 Key: PIG-798
                 URL: https://issues.apache.org/jira/browse/PIG-798
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.2.0
            Reporter: Viraj Bhat
             Fix For: 0.2.0


In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
{code}
A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);

B = group A by name;

store B into '/user/viraj/binstoragecreateop' using BinStorage();

dump B;
{code}

I later load file 'binstoragecreateop' in the following way.
{code}

A = load '/user/viraj/binstoragecreateop' using BinStorage();

B = foreach A generate $0 as name:chararray;

dump B;
{code}
Result
=======================================================================
(Amy)
(Fred)
=======================================================================
The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
{code}
A = load '/user/viraj/visits.txt' using PigStorage();

B = foreach A generate $0 as name:chararray;

dump B;

{code}
=======================================================================
2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
=======================================================================
So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-798:
---------------------------

    Affects Version/s: 0.6.0
                       0.5.0
                       0.4.0
                       0.3.0
                       0.7.0
                       0.8.0

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860452#action_12860452 ] 

Viraj Bhat commented on PIG-798:
--------------------------------

Hi Ashutosh,
 The problem here is not about using the data interchangeably between BinStorage() and PigStorage(), it is about the consistency issues in schema. Sorry if the description was unclear.

I can see that it is possible to write statements such as this using BinStorage() 

{code}
A = load 'somedata' using BinStorage();
B = foreach A generate $0 as name:chararray;
dump B;
{code}

and not write it using PigStorage().

Should we not support the following statement, as a user I am interested in projecting the first column and casting it to a chararray. I am not interested in knowing what the schemas are of other columns!!

Fails when I do the following:
{code}
A = load 'somedata' using PigStorage();
B = foreach A generate $0 as name:chararray;
dump B;
{code}

Can you tell me why the schema specification in FOREACH GENERATE works with BinStorage and not in PigStorage? 

Viraj

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861122#action_12861122 ] 

Ashutosh Chauhan commented on PIG-798:
--------------------------------------

1.
{noformat}
 b = foreach a generate (chararray) $0 as name; 
{noformat}

2. {noformat}
B = foreach A generate $0 as name:chararray;
{noformat}

@Viraj,

Discussed with Alan and Daniel. Language semantics for achieving this functionality with whatever loader is 1. The fact that 2 works for BinStorage is unfortunate and is bug. It is something which is currently there for backward compatibility and will eventually be removed. 


> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861097#action_12861097 ] 

Viraj Bhat commented on PIG-798:
--------------------------------

Hi Ashutosh,
 Yes that is possible, I know that we can do that in PigStorage() but why can we not do this in PigStorage? What do I need to cast as (chararray) ?
{code}
A = load 'somedata' using PigStorage();
B = foreach A generate $0 as name:chararray;
dump B;
{code}

But this is possible in BinStorage(), why is this not consistent?

Is it that BinStorage() has schemas embedded while PigStorage() does not? 

Should this not be fixed to make it consistent across storage formats?

Viraj

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-798:
-------------------------------

    Fix Version/s: 0.9.0

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>             Fix For: 0.9.0
>
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859159#action_12859159 ] 

Ashutosh Chauhan commented on PIG-798:
--------------------------------------

Viraj,

I am confused with this description. It seems to me that you are first storing some data using BinStorage and then loading it using PigStorage. If that is so, obviously it will not work. PigStorage and BinStorage aren't interoperable in this way. Specifically, data stored using BinStorage, can only be loaded using BinStorage.

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-798:
---------------------------

    Summary: Schema errors when using PigStorage and none when using BinStorage in FOREACH??  (was: Schema errors when using PigStorage and none when using BinStorage??)

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>             Fix For: 0.2.0
>
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-798:
------------------------------

    Assignee: Alan Gates

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>            Assignee: Alan Gates
>             Fix For: 0.9.0
>
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage??

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-798:
---------------------------

    Description: 
In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
{code}
A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);

B = group A by name;

store B into '/user/viraj/binstoragecreateop' using BinStorage();

dump B;
{code}

I later load file 'binstoragecreateop' in the following way.
{code}

A = load '/user/viraj/binstoragecreateop' using BinStorage();

B = foreach A generate $0 as name:chararray;

dump B;
{code}
Result
=======================================================================
(Amy)
(Fred)
=======================================================================
The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
{code}
A = load '/user/viraj/visits.txt' using PigStorage();

B = foreach A generate $0 as name:chararray;

dump B;

{code}
=======================================================================
{code}
2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
{code}
=======================================================================
So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

  was:
In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
{code}
A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);

B = group A by name;

store B into '/user/viraj/binstoragecreateop' using BinStorage();

dump B;
{code}

I later load file 'binstoragecreateop' in the following way.
{code}

A = load '/user/viraj/binstoragecreateop' using BinStorage();

B = foreach A generate $0 as name:chararray;

dump B;
{code}
Result
=======================================================================
(Amy)
(Fred)
=======================================================================
The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
{code}
A = load '/user/viraj/visits.txt' using PigStorage();

B = foreach A generate $0 as name:chararray;

dump B;

{code}
=======================================================================
2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
=======================================================================
So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.


> Schema errors when using PigStorage and none when using BinStorage??
> --------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>             Fix For: 0.2.0
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage??

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-798:
---------------------------

    Attachment: binstoragecreateop
                visits.txt
                schemaerr.pig

Pig script, Input File and Bin Storage() input file

> Schema errors when using PigStorage and none when using BinStorage??
> --------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>             Fix For: 0.2.0
>
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860598#action_12860598 ] 

Ashutosh Chauhan commented on PIG-798:
--------------------------------------

You can specify schema in FOREACH GENERATE with PigStorage loader as follows:
{code}
grunt> a = load 'data' using PigStorage();
grunt> b = foreach a generate (chararray) $0 as name; 
grunt> describe b;
b: {name: chararray}
grunt> dump b;
{code}

I get the expected result.

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

Posted by "Viraj Bhat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861134#action_12861134 ] 

Viraj Bhat commented on PIG-798:
--------------------------------

Ashutosh thanks for clarifying, we will wait till that bug is fixed in BinStorage

Viraj

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.