Posted to user@hive.apache.org by "Brotanek, Jan" <Ja...@adastragrp.com> on 2017/01/12 15:54:30 UTC

tez session timesout?

Hello, I am running insert statements via the CLI under Hive on Tez on HDP 2.4.0:

hive -hiveconf hive.cli.errors.ignore=true -v -f hive_pl_new7.sql

hive_pl_new7.sql consists of a couple of insert-into-partition statements, each of which takes quite a long time, about 1200 s.

    insert into table table (part_col = '2015-12')
    select col1, col2
    from table
    where col2 >= '2015-12-01 00:00:00'
    and col2 <= '2015-12-31 23:59:59';

    insert into table table (part_col = '2016-01')
    select col1, col2
    from table
    where col2 >= '2016-01-01 00:00:00'
    and col2 <= '2016-01-31 23:59:59';

    insert into table table (part_col = '2016-02')
    select col1, col2
    from table
    where col2 >= '2016-02-01 00:00:00'
    and col2 <= '2016-02-31 23:59:59';

The first two statements run just fine. When the third is launched, I get the following error. There are no syntax or semantic errors in the statements; I tested that. With MR as the execution engine, everything runs fine. This is a serious issue for running automated batch jobs. Can anyone explain?

Versions:
Hive 1.2.1000.2.4.0.0-169
HDP: 2.4.0
Hadoop 2.7.1
Hcatalog: 1.2.1
Hbase: 1.1.2

    Exception in thread "main" java.lang.RuntimeException: Unable to determine our local host!
       at org.apache.hadoop.hive.metastore.LockRequestBuilder.build(LockRequestBuilder.java:56)
       at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:227)
       at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:92)
       at org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:1047)
       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1244)
       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
       at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
       at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:314)
       at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:412)
       at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:428)
       at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:717)
       at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
       at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:497)
       at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


-----Original Message-----
From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
Sent: Thursday, 12 January 2017 0:20
To: user@hive.apache.org
Subject: Re: Vectorised Queries in Hive



> I have also noticed that this execution mode is only applicable to single-predicate searches. It does not work with multiple-predicate searches. Can someone confirm this, please?

Can you explain what you mean?

Vectorization supports multiple & nested AND+OR predicates - with some extra SIMD efficiencies in place for constants or repeated values.

Cheers,
Gopal

RE: tez session timesout?

Posted by "Brotanek, Jan" <Ja...@adastragrp.com>.

It seems Tez is spawning many processes and using all file descriptors, causing Unix to temporarily run out of resources.

I suppose this may be the problem, but I don't know why it doesn't happen when the 2nd query is invoked. It always fails on the 3rd query.

Are there any settings that can prevent this behaviour?

Any help is much appreciated!

-bash: fork: retry: Resource temporarily unavailable

-bash: ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127808
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
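
The `fork: retry: Resource temporarily unavailable` message is commonly seen when the per-user process limit is reached, and the listing above shows `max user processes (-u) 1024` alongside `open files (-n) 1024`, so it may be worth checking that limit as well as file descriptors. Below is a minimal sketch for doing so, assuming a Linux host with bash and procps `ps`; the helper name `nproc_headroom` is made up for illustration and is not part of Hive or Tez.

```shell
#!/usr/bin/env bash
# Sketch only: nproc_headroom is a hypothetical helper, not a Hive/Tez tool.
# Prints how many more processes/threads fit under a per-user limit.
nproc_headroom() {
    local limit=$1 used=$2
    if [ "$limit" = "unlimited" ]; then
        echo "headroom=unlimited"
    else
        echo "headroom=$((limit - used))"
    fi
}

# On Linux the -u limit counts threads as well as processes, so list one
# row per thread (-L) for the current user and count the rows.
used=$(ps -L -u "$(id -u)" -o lwp= 2>/dev/null | wc -l)
nproc_headroom "$(ulimit -u)" "$used"
```

Running this in the same shell session while the second and third statements execute would show whether the headroom shrinks towards zero before the failure. It only measures; it does not say which process is creating the threads.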


-----Original Message-----
From: Sergey Shelukhin [mailto:sergey@hortonworks.com] 
Sent: Thursday, 12 January 2017 18:12
To: user@hive.apache.org
Subject: Re: tez session timesout?

That should only happen when InetAddress.getLocalHost().getHostName()
throws UnknownHostException… do you have any other suspicious logs or activity around that time?


Re: tez session timesout?

Posted by Sergey Shelukhin <se...@hortonworks.com>.

That should only happen when InetAddress.getLocalHost().getHostName()
throws UnknownHostException… do you have any other suspicious logs or
activity around that time?
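
One way to approximate that lookup from the shell is to ask the system resolver whether the machine's own hostname resolves. This is a rough sketch, assuming a glibc-based Linux host where `getent` is available; the function name `check_local_host` is hypothetical, and the result only approximates what the JVM does inside `InetAddress.getLocalHost()`.

```shell
#!/usr/bin/env bash
# Sketch only: check_local_host is a hypothetical helper name.
# Reports whether a hostname (default: this machine's node name)
# resolves through the system's name service (files, DNS, ...).
check_local_host() {
    local name=${1:-$(uname -n)}
    if getent hosts "$name" >/dev/null 2>&1; then
        echo "resolves: $name"
    else
        echo "does not resolve: $name"
        return 1
    fi
}

check_local_host || echo "hint: check /etc/hosts and DNS for this hostname"
```

If the name resolves reliably when the machine is idle but the check fails while the inserts are running, that would point towards a transient resource problem rather than a permanent naming misconfiguration.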
