Posted to dev@pig.apache.org by "Graham Lea (JIRA)" <ji...@apache.org> on 2011/09/26 03:43:26 UTC

[jira] [Created] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Documentation says the MAX function can be used on chararray, but it can't
--------------------------------------------------------------------------

                 Key: PIG-2303
                 URL: https://issues.apache.org/jira/browse/PIG-2303
             Project: Pig
          Issue Type: Bug
          Components: documentation
            Reporter: Graham Lea
            Priority: Trivial


Here: http://pig.apache.org/docs/r0.9.0/func.html#max
It says MIN/MAX can be used on chararray, but the result of those functions is always a double.
Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2303) FOREACH doesn't allow explicit schema for MAX() result to be chararray

Posted by "Graham Lea (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Graham Lea updated PIG-2303:
----------------------------

    Affects Version/s:     (was: 0.9.0)
                       0.8.1
        Fix Version/s: 0.9.0

Oh dear.
I'm using the latest Cloudera distribution, which is based on 0.8.1.
Sorry!
                
> FOREACH doesn't allow explicit schema for MAX() result to be chararray
> ----------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8.1
>            Reporter: Graham Lea
>            Priority: Trivial
>             Fix For: 0.9.0
>
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Resolved] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Posted by "Daniel Dai (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2303.
-----------------------------

    Resolution: Invalid
    
> Documentation says the MAX function can be used on chararray, but it can't
> --------------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Graham Lea
>            Priority: Trivial
>              Labels: documentation
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Commented] (PIG-2303) FOREACH doesn't allow explicit schema for MAX() result to be chararray

Posted by "Daniel Dai (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115098#comment-13115098 ] 

Daniel Dai commented on PIG-2303:
---------------------------------

I still don't see the problem. I ran this successfully:

{code}
A = LOAD 'student.txt' USING PigStorage(',') AS (name:chararray, age:int, gpa:chararray);
B = group A by age;
C = foreach B generate MAX(A.name) as maxName: chararray;
explain C;
{code}

describe C gives me:
C: {maxName: chararray}

I'm using Pig 0.9.0.
                
> FOREACH doesn't allow explicit schema for MAX() result to be chararray
> ----------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>            Reporter: Graham Lea
>            Priority: Trivial
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Commented] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Posted by "Graham Lea (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114431#comment-13114431 ] 

Graham Lea commented on PIG-2303:
---------------------------------

Note: The error I was getting on entering the line using MIN + MAX into Grunt was:
{code}
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: double. Other Field Schema: fieldName: chararray
{code}
(Just adding this to help people searching for this error message to find a solution.)

> Documentation says the MAX function can be used on chararray, but it can't
> --------------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Graham Lea
>            Priority: Trivial
>              Labels: documentation
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Updated] (PIG-2303) FOREACH doesn't allow explicit schema for MAX() result to be chararray

Posted by "Graham Lea (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Graham Lea updated PIG-2303:
----------------------------

          Component/s:     (was: documentation)
                       parser
    Affects Version/s: 0.9.0
               Labels:   (was: documentation)
              Summary: FOREACH doesn't allow explicit schema for MAX() result to be chararray  (was: Documentation says the MAX function can be used on chararray, but it can't)
    
> FOREACH doesn't allow explicit schema for MAX() result to be chararray
> ----------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>            Reporter: Graham Lea
>            Priority: Trivial
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Commented] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Posted by "Daniel Dai (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115042#comment-13115042 ] 

Daniel Dai commented on PIG-2303:
---------------------------------

Hi, Graham, I ran the following script with success:

{code}
A = LOAD 'student.txt' AS (name:chararray, age:int, gpa:chararray);
B = group A by age;
C = foreach B generate MAX(A.name);
dump C;
{code}

I believe your script fails for a different reason. Can you post your script?
                
> Documentation says the MAX function can be used on chararray, but it can't
> --------------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Graham Lea
>            Priority: Trivial
>              Labels: documentation
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Resolved] (PIG-2303) FOREACH doesn't allow explicit schema for MAX() result to be chararray

Posted by "Graham Lea (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Graham Lea resolved PIG-2303.
-----------------------------

    Resolution: Cannot Reproduce
    
> FOREACH doesn't allow explicit schema for MAX() result to be chararray
> ----------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8.1
>            Reporter: Graham Lea
>            Priority: Trivial
>             Fix For: 0.9.0
>
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Commented] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Posted by "Graham Lea (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115014#comment-13115014 ] 

Graham Lea commented on PIG-2303:
---------------------------------

From the log file:
{noformat}
Pig Stack Trace
---------------
ERROR 1022: Type mismatch merging schema prefix. Field Schema: double. Other Field Schema: firstMonth: chararray

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Type mismatch merging schema prefix. Field Schema: double. Other Field Schema: firstMonth: chararray
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1618)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1562)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:534)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:871)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
	at org.apache.pig.Main.run(Main.java:455)
	at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problems in merging user defined schema
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:853)
	at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1612)
	... 9 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1016: Problems in merging user defined schema
	at org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:337)
	at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:851)
	... 11 more
Caused by: org.apache.pig.impl.logicalLayer.schema.SchemaMergeException: ERROR 1022: Type mismatch merging schema prefix. Field Schema: double. Other Field Schema: firstMonth: chararray
	at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.mergePrefixFieldSchema(Schema.java:550)
	at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.mergePrefixFieldSchema(Schema.java:474)
	at org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:332)
	... 12 more
================================================================================
{noformat}
                
> Documentation says the MAX function can be used on chararray, but it can't
> --------------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Graham Lea
>            Priority: Trivial
>              Labels: documentation
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Commented] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Posted by "Daniel Dai (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114912#comment-13114912 ] 

Daniel Dai commented on PIG-2303:
---------------------------------

It's not about MAX/MIN. "+" does not concatenate strings. You need to use the CONCAT UDF.
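
A minimal sketch of what that looks like, for illustration only. The file name and field names are made up; CONCAT is the builtin that joins two chararray values.

{code}
-- Hypothetical input: one first name and one last name per line, tab-separated.
A = LOAD 'names.txt' AS (first:chararray, last:chararray);
-- CONCAT joins the two chararray fields; "+" is the arithmetic addition operator.
B = FOREACH A GENERATE CONCAT(first, last) AS fullName;
DUMP B;
{code}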
                
> Documentation says the MAX function can be used on chararray, but it can't
> --------------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Graham Lea
>            Priority: Trivial
>              Labels: documentation
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Reopened] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Posted by "Graham Lea (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Graham Lea reopened PIG-2303:
-----------------------------


I don't believe I'm using any string concatenation, unless this is happening under the hood without my knowledge.

Here's a simple script showing what I described:
{code}
$ bin/pig 
2011-09-27 09:01:26,411 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/development/pig-0.8.1-cdh3u1/pig_1317078086408.log
2011-09-27 09:01:26,592 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2011-09-27 09:01:26,751 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt> A = LOAD 'clientInvoiceIds' AS (clientId:int, invoiceMonth:chararray);
grunt> B = GROUP A BY clientId;
grunt> describe B
B: {group: int,A: {clientId: int,invoiceMonth: chararray}}
grunt> C = FOREACH B GENERATE group, MIN(A.invoiceMonth) as firstMonth:chararray, MAX(A.invoiceMonth) as lastMonth:chararray;
2011-09-27 09:01:45,992 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: double. Other Field Schema: firstMonth: chararray
Details at logfile: /home/development/pig-0.8.1-cdh3u1/pig_1317078086408.log
{code}
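
For anyone landing here from the error message: a minimal sketch of the workaround mentioned in the issue description, calling StringMin/StringMax directly instead of MIN/MAX. It reuses the relation and field names from the session above. I'm assuming the fully qualified class names org.apache.pig.builtin.StringMin and org.apache.pig.builtin.StringMax; treat this as a sketch, not something verified against a particular Pig release.

{code}
-- Same LOAD and GROUP as in the failing session above.
A = LOAD 'clientInvoiceIds' AS (clientId:int, invoiceMonth:chararray);
B = GROUP A BY clientId;
-- Assumption: the chararray-specific builtins are called directly, so the
-- declared chararray schema is expected to agree with the function's own output type.
C = FOREACH B GENERATE group,
        org.apache.pig.builtin.StringMin(A.invoiceMonth) AS firstMonth:chararray,
        org.apache.pig.builtin.StringMax(A.invoiceMonth) AS lastMonth:chararray;
DESCRIBE C;
{code}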
                
> Documentation says the MAX function can be used on chararray, but it can't
> --------------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Graham Lea
>            Priority: Trivial
>              Labels: documentation
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.


        

[jira] [Commented] (PIG-2303) Documentation says the MAX function can be used on chararray, but it can't

Posted by "Graham Lea (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115082#comment-13115082 ] 

Graham Lea commented on PIG-2303:
---------------------------------

Hi Daniel.

If you just add a schema to the definition of 'C' in your own script, you will see the problem:
{code}
C = foreach B generate MAX(A.name) as maxName: chararray; 
{code}

Result:
{noformat}
ERROR 1022: Type mismatch merging schema prefix. Field Schema: double. Other Field Schema: maxName: chararray
{noformat}

Interestingly, without the schema a describe of C produces "C: {chararray}".
So it seems that the type of the function is correct (chararray in, chararray out) but perhaps the schema processing of FOREACH is incorrectly inferring the result type to be double?
                
> Documentation says the MAX function can be used on chararray, but it can't
> --------------------------------------------------------------------------
>
>                 Key: PIG-2303
>                 URL: https://issues.apache.org/jira/browse/PIG-2303
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>            Reporter: Graham Lea
>            Priority: Trivial
>
> Here: http://pig.apache.org/docs/r0.9.0/func.html#max
> It says MIN/Max can be used on chararray, but the result of those functions is always a double.
> Had to search through the Pig Javadoc to find that I should use StringMin/StringMax instead.
