Posted to user@hive.apache.org by Philippe Girolami <ph...@girolami.org> on 2011/03/09 00:00:50 UTC
Is it possible to run a query over multiple cores for a (small)
dataset in local mode?
Hi,
I am testing Hive 0.6 on part of my data set. It's only a couple of GB of
log files that I am reading through a custom SerDe. The table is
partitioned. I am using Hadoop local mode for testing.
When I run simple Group By queries (4 MR jobs), I am getting logs such as
- map : 100%
- reduce : 0%
- map : 85%
- reduce : 0%
- map : 86%
- reduce : 0%
all the while only using one core on an 8-core server. Kind of a waste...
I have activated the parallel option but it still won't parallelize. I have
also set the number of reduce tasks to 8.
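For reference, the session setup described above would look roughly like the sketch below. It assumes that "the parallel option" means hive.exec.parallel and that the reducer count is set through mapred.reduce.tasks. The table name (logs) and partition column (dt) are hypothetical placeholders, not the real schema.

```sql
-- Assumption: "the parallel option" is hive.exec.parallel, which lets
-- independent stages of one query run concurrently.
SET hive.exec.parallel=true;

-- Assumption: the reducer count is requested through mapred.reduce.tasks.
SET mapred.reduce.tasks=8;

-- Hypothetical Group By over a partitioned table; `logs` and `dt` are
-- placeholder names used only for illustration.
SELECT dt, COUNT(*) AS hits
FROM logs
GROUP BY dt;
```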
My expectation is that since my data set is partitioned (=> different
files), at least some of the map-reduce phases could be run in parallel on
those files.
Is my understanding wrong? Is there a specific way to write the queries?
Thanks
Philippe