Posted to user@hive.apache.org by abhishek pathak <fo...@yahoo.co.in> on 2011/03/07 10:19:15 UTC

Hive too slow?

Hi,

I am a Hive newbie. I just finished setting up Hive on a cluster of two servers
for my organisation. As a test drill, we ran some simple queries. It took the
standard map-reduce job around 4 minutes just to execute this query:

select count(1) from tablename;

The answer returned was around 2200. Clearly, this is not a big number by Hadoop
standards. My question is whether this is standard performance, or is there some
configuration that is not optimised? Will scaling the data up by, say, 50 times
produce any drastic slowness? I tried reading the documentation but was not clear
on these issues, and I would like to have an idea before this setup starts
working in a production environment.

Thanks in advance,
Regards,
Abhishek Pathak


Re: Hive too slow?

Posted by Edward Capriolo <ed...@gmail.com>.
For the basics, you should also look at your JobTracker web
interface. One of your nodes may be misconfigured. At times a job may
sit on a node for several minutes before it fails and moves to the
other. You also want to make sure none of the Hadoop components are
having memory-related JVM pauses.




Re: Hive too slow?

Posted by Ajo Fod <aj...@gmail.com>.
I'd say start with something simpler ... say, how about converting all the files
to tab-delimited text files in uncompressed format and running the same query on
the new table? If that works, you know the problem is with the .seq files
... if not, there is something funky about the configuration or the machine.

Also, you might want to check the CPU usage ... are all cores being used? If
not, perhaps you want to see if increasing the number of reduce tasks helps:
set mapred.reduce.tasks=<number>
Also, if your file is small, you could split it into as many pieces as you have
cores ... and
set mapred.map.tasks=<number>
... but I suspect this is not the bottleneck, given the time the query takes.

Cheers,
-Ajo.
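
A minimal sketch of this conversion, assuming the searchlogs table that appears
later in this thread; the searchlogs_txt name is made up for illustration:

```sql
-- Sketch only: searchlogs_txt is a hypothetical table name.
-- Make sure the copy itself is written uncompressed:
set hive.exec.compress.output=false;

-- Create a tab-delimited, uncompressed text copy of the table:
CREATE TABLE searchlogs_txt
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
AS SELECT * FROM searchlogs;

-- Re-run the slow query against the copy and compare timings:
SELECT count(1) FROM searchlogs_txt;
```

If the text copy counts quickly, the SequenceFile compression (or the way the
.seq files were written) is the likely culprit.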


Re: Hive too slow?

Posted by abhishek pathak <fo...@yahoo.co.in>.
Hi all,

I looked into the various configurations and have come up with the following
information:

1. The underlying files are compressed .seq files, but I guess that is a pretty
standard format for HDFS.
2. The files are located on HDFS, spread across the 2 servers on top of which
Hive runs.
3. I am not too familiar with map-reduce; however, to the best of my knowledge
all the configurations in the jobtrackers, as well as core utilization, were
the default ones.
4. No swapping occurs (~200 MB remains free all the time).

For more information, I give below the output of a sample query:

hive> select count (1) from searchlogs where days=20110311;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201010220228_0022, Tracking URL = 
http://serverName:50030/jobdetails.jsp?jobid=job_201010220228_0022
Kill Command = /home/userName/hadoop/bin/hadoop job 
 -Dmapred.job.tracker=serverName:9001 -kill job_201010220228_0022
2011-03-11 16:49:15,457 Stage-1 map = 0%,  reduce = 0%
2011-03-11 16:49:26,677 Stage-1 map = 1%,  reduce = 0%
2011-03-11 16:49:32,926 Stage-1 map = 2%,  reduce = 0%
2011-03-11 16:49:45,350 Stage-1 map = 3%,  reduce = 0%
2011-03-11 16:49:51,988 Stage-1 map = 4%,  reduce = 0%
2011-03-11 16:49:57,424 Stage-1 map = 5%,  reduce = 0%
2011-03-11 16:50:09,872 Stage-1 map = 6%,  reduce = 0%
2011-03-11 16:50:16,056 Stage-1 map = 7%,  reduce = 2%
2011-03-11 16:50:28,403 Stage-1 map = 8%,  reduce = 2%
.......................................................
.......................................................
.......................................................
2011-03-11 17:02:25,947 Stage-1 map = 96%,  reduce = 32%
2011-03-11 17:02:28,026 Stage-1 map = 97%,  reduce = 32%
2011-03-11 17:02:37,483 Stage-1 map = 98%,  reduce = 32%
2011-03-11 17:02:43,920 Stage-1 map = 99%,  reduce = 32%
2011-03-11 17:02:47,036 Stage-1 map = 99%,  reduce = 33%
2011-03-11 17:02:50,135 Stage-1 map = 100%,  reduce = 33%
2011-03-11 17:03:04,455 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201010220228_0022
OK
6768
Time taken: 835.495 seconds
hive>

As you can see, it took nearly 14 minutes to execute this query.
The query:

hive> select count(1) from searchlogs;

fired immediately after the one above, takes about 25 minutes and gives the
answer as 15118.

As you all pointed out, this is slow even by Hive standards. How do I proceed
further to solve this problem?

P.S.: In this setup, data is being continuously added to HDFS at an approximate
rate of 1 MB/sec through Flume (https://github.com/cloudera/flume). Hive runs
its queries on top of that data. Could this, in any way, affect performance?
If so, what can be the solution?

Regards,
Abhishek Pathak
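
One common cause of this pattern (continuous Flume ingest, yet very slow counts
over a few thousand rows) is a large number of tiny HDFS files, each of which
carries per-map-task overhead. A hedged sketch of two things worth trying,
assuming days is a partition column (which the WHERE clause above suggests);
the settings are standard Hive/Hadoop options, and the column placeholder must
be filled in:

```sql
-- Read side: let Hive combine many small files into each map task's split.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size=134217728;  -- aim for roughly 128 MB per split

-- Write side: periodically compact one day's small Flume files into fewer,
-- larger ones (replace <non-partition columns> with the table's real columns):
INSERT OVERWRITE TABLE searchlogs PARTITION (days=20110311)
SELECT <non-partition columns> FROM searchlogs WHERE days=20110311;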





Re: Hive too slow?

Posted by abhishek pathak <fo...@yahoo.co.in>.
Thank you all for the tips. I'll dig into all these and let you people know :)






Re: Hive too slow?

Posted by Igor Tatarinov <ig...@decide.com>.
Most likely, Hadoop's memory settings are too high and Linux starts
swapping. You should be able to detect that too using vmstat.
Just a guess.
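
To make the vmstat check concrete, here is a small sketch in plain POSIX shell
and awk; the helper name is made up. It reads vmstat output and reports whether
any sample shows swap activity in the si/so columns:

```shell
# Hypothetical helper: reads `vmstat <interval> <count>` output on stdin and
# reports whether any sample showed swap-in (si, field 7) or swap-out (so,
# field 8) activity. vmstat prints two header lines before the data rows.
check_swap() {
  awk 'NR > 2 && ($7 + $8) > 0 { found = 1 }
       END { print (found ? "swapping" : "no swapping") }'
}

# On a DataNode/TaskTracker node you would run something like:
#   vmstat 5 6 | check_swap
```

Nonzero si/so while a Hive job is running is the swapping described above; the
usual fix is lowering the JVM heap sizes in mapred.child.java.opts.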


Re: Hive too slow?

Posted by Ajo Fod <aj...@gmail.com>.
hmm, I don't know of such a place ... but if I had to debug, I'd try to
understand the following:
1) are the underlying files zipped/compressed? ... that usually makes it
slower.
2) are the files located on the local hard drive or on HDFS?
3) are all the cores being used? ... check the number of reduce and map tasks.

-Ajo
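
Point 1 on this checklist can be answered from the Hive CLI itself. A small
sketch, using the searchlogs table name that appears later in the thread:

```sql
-- Show the table's storage details:
DESCRIBE EXTENDED searchlogs;
-- In the "Detailed Table Information" output, look at the inputFormat:
-- SequenceFileInputFormat means .seq files. SequenceFile compression is
-- recorded per file, so you can also inspect a file header directly:
--   hadoop fs -cat /path/to/table/file | head -c 100
-- (a SequenceFile starts with "SEQ" and names its compression codec).
```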


Re: Hive too slow?

Posted by Vijay <te...@gmail.com>.
If you go to the jobtracker's web UI, it provides plenty of details
about each job. Even with all the default settings of a typical
hadoop/hive installation, 4 minutes for 2200 rows is extremely slow.
It feels like there is some kind of problem but it is hard to guess
what that could be. Digging through the web UI can tell you how much
time is spent on map vs reduce. It can also provide some insight into
the I/O operations.

Good luck!
-Vijay


Re: Hive too slow?

Posted by abhishek pathak <fo...@yahoo.co.in>.
I suspected as much. My system is a Core2Duo, 1.86 GHz. I understand that
map-reduce is not instantaneous; I just wanted to confirm that 2200 rows in 4
minutes is indeed not normal behaviour. Could you point me at some places
where I can get some info on how to tune this up?

Regards,
Abhishek





Re: Hive too slow?

Posted by Ajo Fod <aj...@gmail.com>.
In my experience, Hive is not instantaneous like other DBs, but 4 minutes to
count 2200 rows seems unreasonable.

For comparison, my query over 169k rows on one computer with 4 cores running
at 1 GHz (approx) took 20 seconds.

Cheers,
Ajo.
