You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Steve Corona (JIRA)" <ji...@apache.org> on 2009/03/26 16:42:15 UTC

[jira] Created: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Nested UDFs cause _very_ high memory usage when processing query
----------------------------------------------------------------

                 Key: HIVE-372
                 URL: https://issues.apache.org/jira/browse/HIVE-372
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
            Reporter: Steve Corona


When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:

select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;

This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.

Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned HIVE-372:
-------------------------------

    Assignee: Zheng Shao

> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>            Assignee: Zheng Shao
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch, HIVE-372.2.trunk.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697865#action_12697865 ] 

Ashish Thusoo commented on HIVE-372:
------------------------------------

should we first figure out why create table is misreporting things? I fear that we may be swapping one set of bad error messages with another.

> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698507#action_12698507 ] 

Ashish Thusoo commented on HIVE-372:
------------------------------------

+1.

Looks good to me.


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697701#action_12697701 ] 

Zheng Shao commented on HIVE-372:
---------------------------------

The reason is that "backtrack" is enabled in Hive.g. However in order to disable "backtrack" we need to do some other cleanup of the grammar to make sure it's LL(k).


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698817#action_12698817 ] 

Namit Jain commented on HIVE-372:
---------------------------------

committed in branch 3. 
However, I am getting the following error in trunk:

    [junit] diff -a -I \(file:\)\|\(/tmp/.*\) /data/users/njain/hive_commit/tru\
nk/build/ql/test/logs/clientpositive/udf_10_trims.q.out /data/users/njain/hive_\
commit/trunk/ql/src/test/results/clientpositive/udf_10_trims.q.out
    [junit] 30c30
    [junit] <                         output format: org.apache.hadoop.hive.ql.\
io.HiveIgnoreKeyTextOutputFormat
    [junit] ---
    [junit] >                         output format: org.apache.hadoop.hive.ql.\
io.IgnoreKeyTextOutputFormat
    [junit] 40c40
    [junit] <                 output format: org.apache.hadoop.hive.ql.io.HiveI\
gnoreKeyTextOutputFormat
    [junit] ---
    [junit] >                 output format: org.apache.hadoop.hive.ql.io.Ignor\
eKeyTextOutputFormat



Zheng, can you create a new patch for trunk ?


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698611#action_12698611 ] 

Namit Jain commented on HIVE-372:
---------------------------------

the .2 patch also failed for sort.q

    [junit] 09/04/13 18:10:01 ERROR exec.ExecDriver: Job Submission failed with exception 'java.io.IOExcep\
tion(Failed to get the current user's information.)'
    [junit] java.io.IOException: Failed to get the current user's information.
    [junit]   at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
    [junit]   at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
    [junit]   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:774)
    [junit]   at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:393)
    [junit]   at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:556)
    [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit]   at java.lang.reflect.Method.invoke(Method.java:597)
    [junit]   at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    [junit]   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    [junit]   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    [junit]   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    [junit]   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
    [junit] Caused by: javax.security.auth.login.LoginException: Login failed: whoami: cannot find usernam\
e for UID 1688



I dont think this has anything to do with your patch, but will run the tests again and see


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697705#action_12697705 ] 

Zheng Shao commented on HIVE-372:
---------------------------------

By just modifying "k=1" to "k=3", the compilation was much faster. We were also able to identify the correct error like the following:

{code}
hive> create table test (a int b);
FAILED: Parse Error: line 2:25 mismatched input 'b' expecting ) in create statement
{code}

Originally it was:
{code}
hive> create table test (a int b);
FAILED: Parse Error: line 2:7 mismatched input 'table' expecting TEMPORARY in create function statement
{code}


I guess the reason is that k=3 makes sure in most cases, we don't need to backtrack. (We still need backtracking in some cases, since when I switch backtracking I saw some warnings from antlr).

I will make a patch to change k=1 to k=3, and add a test case for the above, then I will leave the backtracking problem a bit later.


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-372:
----------------------------

    Attachment: HIVE-372.2.trunk.patch

Adapted to trunk.

> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch, HIVE-372.2.trunk.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-372:
----------------------------

    Attachment: HIVE-372.1.patch

Changed k=1 to k=3. Added 3 test cases. Fix an existing test case.


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-372:
----------------------------

    Attachment: HIVE-372.2.patch

This attached patch contains changed output files.

There is one regression - "CREATE TABL test ..." is reporting error in "CREATE" now.  I tried to fix that but I didn't succeed. It probably needs much more effort.  Basically, we may see this kind of problems for any consecutive keywords.

However this patch does detect a lot of the syntax errors much better than before (See other new/changed test cases), so I still think we should still include this in 0.3.


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698566#action_12698566 ] 

Namit Jain commented on HIVE-372:
---------------------------------

While committing, I got the following errors:

    [junit] diff -a -I \(file:\)\|\(/tmp/.*\) /data/users/njain/hive_br3_commit/hive_br3_commit/build/ql/test/logs/clientnegative/invalid_create_tbl2.q.out /data/users/njain/hive_br3_commit/hive_br3_commit/ql/src/test/results/clientnegative/invalid_create_tbl2.q.out
    [junit] 1c1
    [junit] < FAILED: Parse Error: line 1:0 cannot recognize input 'create' in ddl statement
    [junit] ---
    [junit] > FAILED: Parse Error: line 1:7 mismatched input 'tabl' expecting TEMPORARY in create function statement
    [junit] Exception: Client execution results failed with error code = 1
    [junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
    [junit] 	at junit.framework.Assert.fail(Assert.java:47)
    [junit] 	at org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_create_tbl2(TestNegativeCliDriver.java:591)
    [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit] 	at java.lang.reflect.Method.invoke(Method.java:597)
    [junit] 	at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit] 	at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit] 	at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit] 	at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit] 	at junit.framework.TestResult.run(TestResult.java:109)
    [junit] 	at junit.framework.TestCase.run(TestCase.java:118)
    [junit] 	at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] 	at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
    [junit] Begin query: invalid_select_expression.q
    [junit] Hive history file=/data/users/njain/hive_br3_commit/hive_br3_commit/ql/../build/ql/tmp/hive_job_log_njain_200904131444_1922313556.txt
    [junit] diff -a -I \(file:\)\|\(/tmp/.*\) /data/users/njain/hive_br3_commit/hive_br3_commit/build/ql/test/logs/clientnegative/invalid_select_expression.q.out /data/users/njain/hive_br3_commit/hive_br3_commit/ql/src/test/results/clientnegative/invalid_select_expression.q.out
    [junit] 1c1
    [junit] < FAILED: Parse Error: line 1:32 cannot recognize input '.' in expression specification
    [junit] ---
    [junit] > FAILED: Parse Error: line 1:32 cannot recognize input '.' in table column identifier
    [junit] Exception: Client execution results failed with error code = 1
    [junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
    [junit] 	at junit.framework.Assert.fail(Assert.java:47)
    [junit] 	at org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_select_expression(TestNegativeCliDriver.java:616)
    [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    [junit] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    [junit] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    [junit] 	at java.lang.reflect.Method.invoke(Method.java:597)
    [junit] 	at junit.framework.TestCase.runTest(TestCase.java:154)
    [junit] 	at junit.framework.TestCase.runBare(TestCase.java:127)
    [junit] 	at junit.framework.TestResult$1.protect(TestResult.java:106)
    [junit] 	at junit.framework.TestResult.runProtected(TestResult.java:124)
    [junit] 	at junit.framework.TestResult.run(TestResult.java:109)
    [junit] 	at junit.framework.TestCase.run(TestCase.java:118)
    [junit] 	at junit.framework.TestSuite.runTest(TestSuite.java:208)
    [junit] 	at junit.framework.TestSuite.run(TestSuite.java:203)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:297)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:672)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:567)
    [junit] Begin query: invalid_tbl_name.q
    [junit] Hive history file=/data/users/njain/hive_br3_commit/hive_br3_commit/ql/../build/ql/tmp/hive_job_log_njain_200904131444_1797516854.txt
    [junit] diff -a -I \(file:\)\|\(/tmp/.*\) /data/users/njain/hive_br3_commit/hive_br3_commit/build/ql/test/logs/clientnegative/invalid_tbl_name.q.out /data/users/njain/hive_br3_commit/hive_br3_commit/ql/src/test/results/clientnegative/invalid_tbl_name.q.out
    [junit] 1c1
    [junit] < FAILED: Parse Error: line 1:20 mismatched input '-' expecting EOF
    [junit] ---
    [junit] > FAILED: Parse Error: line 1:20 mismatched input '-' expecting EOF in statement

can you fix and regenerate the patch ?



> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Richard Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689578#action_12689578 ] 

Richard Lee commented on HIVE-372:
----------------------------------

I've also noticed this problem.   based on the thread dump i saw... it looks like there's some bigtime indirect recursion... i'm guessing the parser is implemented recursive descent and each time we recurse into the next level of function some sizable amount of memory is used up.

Here's a stacktrace of a select with 8 nested trims:
       at org.antlr.runtime.CommonTokenStream.LB(CommonTokenStream.java:289)
        at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:244)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14723)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.function(HiveParser.java:12888)
        at org.apache.hadoop.hive.ql.parse.HiveParser.atomExpression(HiveParser.java:13624)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceFieldExpression(HiveParser.java:13728)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163_fragment(HiveParser.java:16523)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163(HiveParser.java:16734)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceUnaryExpression(HiveParser.java:14059)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseXorExpression(HiveParser.java:14394)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceStarExpression(HiveParser.java:14538)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14682)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.function(HiveParser.java:12888)
        at org.apache.hadoop.hive.ql.parse.HiveParser.atomExpression(HiveParser.java:13624)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceFieldExpression(HiveParser.java:13728)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceUnaryExpression(HiveParser.java:14289)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseXorExpression(HiveParser.java:14394)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceStarExpression(HiveParser.java:14538)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14682)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.function(HiveParser.java:12888)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred155_fragment(HiveParser.java:16479)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred155(HiveParser.java:16804)
        at org.apache.hadoop.hive.ql.parse.HiveParser.atomExpression(HiveParser.java:13537)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceFieldExpression(HiveParser.java:13728)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceUnaryExpression(HiveParser.java:14289)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseXorExpression(HiveParser.java:14394)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceStarExpression(HiveParser.java:14538)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14682)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.function(HiveParser.java:12888)
        at org.apache.hadoop.hive.ql.parse.HiveParser.atomExpression(HiveParser.java:13624)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceFieldExpression(HiveParser.java:13728)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163_fragment(HiveParser.java:16523)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163(HiveParser.java:16734)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceUnaryExpression(HiveParser.java:14059)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseXorExpression(HiveParser.java:14394)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceStarExpression(HiveParser.java:14538)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14682)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.function(HiveParser.java:12888)
        at org.apache.hadoop.hive.ql.parse.HiveParser.atomExpression(HiveParser.java:13624)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceFieldExpression(HiveParser.java:13728)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163_fragment(HiveParser.java:16523)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163(HiveParser.java:16734)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceUnaryExpression(HiveParser.java:14059)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseXorExpression(HiveParser.java:14394)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceStarExpression(HiveParser.java:14538)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14682)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.function(HiveParser.java:12888)
        at org.apache.hadoop.hive.ql.parse.HiveParser.atomExpression(HiveParser.java:13624)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceFieldExpression(HiveParser.java:13728)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163_fragment(HiveParser.java:16523)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163(HiveParser.java:16734)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceUnaryExpression(HiveParser.java:14059)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseXorExpression(HiveParser.java:14394)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceStarExpression(HiveParser.java:14538)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14682)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.function(HiveParser.java:12888)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred155_fragment(HiveParser.java:16479)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred155(HiveParser.java:16804)
        at org.apache.hadoop.hive.ql.parse.HiveParser.atomExpression(HiveParser.java:13537)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceFieldExpression(HiveParser.java:13728)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163_fragment(HiveParser.java:16523)
        at org.apache.hadoop.hive.ql.parse.HiveParser.synpred163(HiveParser.java:16734)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceUnaryExpression(HiveParser.java:14059)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseXorExpression(HiveParser.java:14394)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceStarExpression(HiveParser.java:14538)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedencePlusExpression(HiveParser.java:14682)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAmpersandExpression(HiveParser.java:14819)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceBitwiseOrExpression(HiveParser.java:14956)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceEqualExpression(HiveParser.java:15100)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceNotExpression(HiveParser.java:15264)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceAndExpression(HiveParser.java:15367)
        at org.apache.hadoop.hive.ql.parse.HiveParser.precedenceOrExpression(HiveParser.java:15504)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:13455)
        at org.apache.hadoop.hive.ql.parse.HiveParser.selectExpression(HiveParser.java:10075)
        at org.apache.hadoop.hive.ql.parse.HiveParser.selectItem(HiveParser.java:9655)
        at org.apache.hadoop.hive.ql.parse.HiveParser.selectList(HiveParser.java:9536)
        at org.apache.hadoop.hive.ql.parse.HiveParser.selectClause(HiveParser.java:9284)
        at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:7893)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:7527)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:7332)
        at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:583)
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:369)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:355)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:184)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:306)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-372:
--------------------------------

    Fix Version/s: 0.3.0
      Component/s: UDF

> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor, UDF
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>            Assignee: Zheng Shao
>             Fix For: 0.3.0
>
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch, HIVE-372.2.trunk.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697999#action_12697999 ] 

Zheng Shao commented on HIVE-372:
---------------------------------

The patch is mainly for fixing the long-query problem - like 80-union query and this long-udf query.

The error message is kind of a side effect - and overall we are seeing more error messages (Just see all the changed error messages in the diff)

We could also do a more thorough fix of the grammar, but that will potentially take much longer, and it may not be that prioritized compared wtih other to-do items.


> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HIVE-372) Nested UDFs cause _very_ high memory usage when processing query

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-372.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

committed. Thanks Zheng

> Nested UDFs cause _very_ high memory usage when processing query
> ----------------------------------------------------------------
>
>                 Key: HIVE-372
>                 URL: https://issues.apache.org/jira/browse/HIVE-372
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Fedora Linux, 10x Amazon EC2 (Large Instance w/ 8GB Ram)
>            Reporter: Steve Corona
>            Assignee: Zheng Shao
>         Attachments: HIVE-372.1.patch, HIVE-372.2.patch, HIVE-372.2.trunk.patch
>
>
> When nesting UDFs, the Hive Query processor takes a large amount of time+memory to process the query. For example, I ran something along the lines of:
> select trim( trim( trim(trim( trim( trim( trim( trim( trim(column))))))))) from test_table;
> This query needs 10GB+ of memory to process before it'll launch the job. The amount of memory increases exponentially with each nested UDF.
> Obviously, I am using trim() in this case as a simple example that causes the same problem to occur. In my actual use-case I had a bunch of nested regexp_replaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.