Posted to user@ignite.apache.org by theena <th...@gmail.com> on 2018/06/30 06:36:14 UTC
Hive (IGFS + IgniteMR) vs Hive (Tez)
Hi,
I am doing a POC on an HDP 2.5 cluster with Ignite as the Hadoop Accelerator.
We have a 3-node cluster, each node with 8 cores and 60 GB RAM.
I was able to run a Hive-on-Tez query on a sample data set; it finished in 32
seconds. The same query took 94 seconds with Hive + IGFS + Ignite-MR.
I followed most of the instructions from this forum and the Ignite website. I
just want to check whether I am missing any important parameters that could
improve the performance.
Below are more details:
core-site.xml properties:
<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>igfs://igfs@</value>
  <final>true</final>
</property>
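For reference, the igfs endpoint named in fs.defaultFS is defined on the Ignite side in default-config.xml. Below is a minimal sketch of the secondary-file-system wiring, assuming the stock Ignite Hadoop Accelerator classes; the NameNode URI is a placeholder, and the attached default-config.xml is the authoritative version:

```xml
<!-- Fragment of IgniteConfiguration in default-config.xml (sketch). -->
<property name="fileSystemConfiguration">
  <list>
    <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
      <!-- Must match the authority in igfs://igfs@/ -->
      <property name="name" value="igfs"/>
      <!-- HDFS as the secondary (pass-through/backing) file system -->
      <property name="secondaryFileSystem">
        <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
          <property name="fileSystemFactory">
            <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
              <!-- Placeholder: replace with your NameNode address -->
              <property name="uri" value="hdfs://namenode-host:8020/"/>
            </bean>
          </property>
        </bean>
      </property>
    </bean>
  </list>
</property>
```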
IG_MR_JOB_TRACKER=ip-10-0-0-200:11211
1. I can run IGFS and use HDFS as the secondary file system:
hadoop fs -ls igfs://igfs@//tmp/orders/
2. I can run Ignite-MR:
hadoop --config /usr/etc/ignite_conf jar \
  /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar \
  wordcount \
  -Dignite.job.shared.classloader=false \
  -Dmapreduce.jobtracker.address=$IG_MR_JOB_TRACKER \
  -Dmapreduce.framework.name=ignite \
  igfs://igfs@//tmp/orders/ igfs://igfs@//tmp/6
3. I can run a Hive query on IGFS + Ignite-MR and see the console log:
beeline -n hdfs -u "$HIVE_JDBC" \
  --hiveconf fs.defaultFS=igfs://igfs@/ \
  --hiveconf mapreduce.framework.name=ignite \
  --hiveconf mapreduce.jobtracker.address=$IG_MR_JOB_TRACKER
set ignite.job.shared.classloader=false;
set hive.rpc.query.plan=true;
set hive.execution.engine=mr;
set hive.auto.convert.join=false; -- Added to avoid MapReduce task failures while running the query.
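The per-session set commands above can also be passed at connect time via --hiveconf, so they are not forgotten in new sessions. A sketch using only the flags and variables already shown in this post (requires the live cluster, so it is illustrative only):

```shell
beeline -n hdfs -u "$HIVE_JDBC" \
  --hiveconf fs.defaultFS=igfs://igfs@/ \
  --hiveconf mapreduce.framework.name=ignite \
  --hiveconf mapreduce.jobtracker.address=$IG_MR_JOB_TRACKER \
  --hiveconf hive.execution.engine=mr \
  --hiveconf hive.rpc.query.plan=true \
  --hiveconf hive.auto.convert.join=false \
  --hiveconf ignite.job.shared.classloader=false
```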
4. I could not run Hive + Tez + IGFS:
beeline -n hdfs -u "$HIVE_JDBC"
set fs.defaultFS=igfs://igfs@/ ;
set hive.execution.engine = tez;
set tez.use.cluster.hadoop-libs = true;
set ignite.job.shared.classloader=false ;
set hive.rpc.query.plan = true;
INFO : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1530330378726_0007 failed 2 times due to AM Container for appattempt_1530330378726_0007_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page:
http://ip-10.ec2.internal:8088/cluster/app/application_1530330378726_0007
Then click on links to logs of each attempt.
Diagnostics: java.io.IOException: Failed to parse endpoint: null
Failing this attempt. Failing the application.
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:779)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:279)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:159)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1745)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1491)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1151)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
    at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:253)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
    at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:264)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Error: Error while processing statement: FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
I have attached the log files and the default config.
default-config.xml
<http://apache-ignite-users.70518.x6.nabble.com/file/t1884/default-config.xml>
master.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t1884/master.log>
worker.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t1884/worker.log>
compute.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t1884/compute.log>
SQL:
select
  l_shipmode,
  sum(case
        when o_orderpriority = '1-URGENT'
          or o_orderpriority = '2-HIGH'
        then 1
        else 0
      end) as high_line_count,
  sum(case
        when o_orderpriority <> '1-URGENT'
          and o_orderpriority <> '2-HIGH'
        then 1
        else 0
      end) as low_line_count
from
  orders o join lineitem l
    on o.o_orderkey = l.l_orderkey
   and l.l_commitdate < l.l_receiptdate
   and l.l_shipdate < l.l_commitdate
   and l.l_receiptdate >= '1994-01-01'
   and l.l_receiptdate < '1995-01-01'
where
  l.l_shipmode = 'MAIL' or l.l_shipmode = 'SHIP'
group by l_shipmode
order by l_shipmode;
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: Hive (IGFS + IgniteMR) vs Hive (Tez)
Posted by ezhuravlev <e....@gmail.com>.
Hi,
1. Have you tried running it without -Dignite.job.shared.classloader=false?
It definitely has a performance impact.
2. Are the Ignite nodes placed on the same machines as Hadoop? If not, it
will add a lot of network overhead.
3. How much data do you have in HDFS? If it does not fit in IGFS (as I see,
you have something like 90 GB in IGFS), there will be a lot of data moving
between memory and HDFS: whenever there is not enough memory, old data is
evicted to make room for newly read data.
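For point 3, a quick way to size the working set is to measure the input data directly in HDFS and compare it against cluster memory. A sketch, assuming the table data lives under /tmp/orders as in the commands earlier in this thread (needs the live cluster to run):

```shell
# Total size of the input data in HDFS, summarized and human-readable
hadoop fs -du -s -h /tmp/orders

# Rough upper bound on what can fit in IGFS here:
# 3 nodes x 60 GB RAM = 180 GB total, minus JVM/OS overhead and
# Ignite's own metadata caches. If the data set exceeds that,
# expect constant eviction between IGFS and HDFS.
```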
Evgenii