Posted to user@hive.apache.org by sagar nikam <sa...@gmail.com> on 2012/10/30 04:10:55 UTC

Hive performance: how to increase it?

Respected sir,

     I am working with a database (2.5 GB) whose tables range from only 40 rows
to 9 million rows.
Queries on the large tables take a long time to run.
I would like to get results in less time.

small query-->
=========================================================================
hive> select count(*) from cidade;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201210300724_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210300724_0003
Kill Command = /home/trendwise/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201210300724_0003
2012-10-30 07:37:41,588 Stage-1 map = 0%,  reduce = 0%
2012-10-30 07:37:57,493 Stage-1 map = 100%,  reduce = 0%
2012-10-30 07:38:17,905 Stage-1 map = 100%,  reduce = 33%
2012-10-30 07:38:20,965 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201210300724_0003
OK
5566
Time taken: 50.172 seconds
=================================================================================================================
hdfs-site.xml

<configuration>
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is
created.
  The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>dfs.block.size</name>
  <value>131072</value>
  <description>The default block size for new files, in bytes.
  </description>
</property>
</configuration>


Do these settings affect the performance of Hive?
dfs.replication=3
dfs.block.size=131072

Can I set it from the Hive prompt, like this?
hive> set dfs.replication=5;
Does this value apply only to that particular session?
Or is it better to change it in the .xml file?
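
To make the question concrete, this is the kind of session I have in mind. It is only a sketch: I am assuming that `set <name>;` with no value prints the setting currently in effect, and that `set <name>=<value>;` overrides it for the current session only, which is exactly what I would like confirmed. The value 5 is just an example.

```sql
-- Print the value currently in effect (assumed to come from hdfs-site.xml).
set dfs.replication;

-- Override it for this session only (5 is an arbitrary example value).
set dfs.replication=5;

-- Print it again to check whether the override took effect.
set dfs.replication;
```

If the override does not carry over to a new session, I assume hdfs-site.xml is the place for a permanent change.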



Which other settings should I change to increase performance?



Sagar Nikam
Trendwise Analytics
Bangalore,INDIA