Posted to user@hive.apache.org by 村下瑛 <ak...@gmail.com> on 2014/12/24 10:27:50 UTC

Number of mappers is always 1 for external Parquet tables.

Hi, all

I am trying to load Pig output into Hive as an external table,
and I am currently stuck: Hive always sets the number of mappers to 1,
even though the data has more than 10 million records and is spread across
multiple files.
Does anyone have any idea?

To be more specific, the output is in Parquet format, generated by a Pig
script without any compression.

STORE rows INTO '/table-data/test' USING parquet.pig.ParquetStorer;

The directory does contain 16 part-m-00xx.parquet files plus a _metadata file,
and the external table points to that directory.

Here is the CREATE TABLE statement I've used.

CREATE EXTERNAL TABLE `t_main_wop`(
  `id` string,
  `f1` string,
  ...
 )
STORED AS PARQUET
LOCATION
  '/table-data/test';

Hive seems to read the Parquet files themselves properly, since
SELECT * FROM t_main_wop;
returns the correct result.

However, every time I run a query that requires a MapReduce job,
it uses only a single mapper and takes forever.

hive> select count(*) from t_main_wop;
Query ID = xxx
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_yyy, Tracking URL = zzz
Kill Command = hadoop_job  -kill job_yyy
Hadoop job information for Stage-1: number of mappers: 1; number of
reducers: 1
2014-12-24 02:49:46,912 Stage-1 map = 0%,  reduce = 0%
2014-12-24 02:50:45,847 Stage-1 map = 0%,  reduce = 0%


Why is this?
I've set mapred.map.tasks=100, but to no avail.
Again, the directory contains 16 part files, so I think Hive should be able
to use at least 16 mappers.
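
In case the exact invocation matters, this is roughly what I tried (the
comment reflects my understanding of why it had no effect):

-- as far as I understand, mapred.map.tasks is only a hint to the input
-- format; the actual mapper count comes from the computed input splits
set mapred.map.tasks=100;
select count(*) from t_main_wop;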

I would really appreciate any suggestions.
Thanks,

Akira

Re: Number of mappers is always 1 for external Parquet tables.

Posted by Navis류승우 <na...@nexr.com>.
Try with "set
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat"
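
For concreteness, a minimal session sketch (table name taken from your post;
the comments give my understanding of why this helps, not something I have
re-verified against your data):

-- the default, org.apache.hadoop.hive.ql.io.CombineHiveInputFormat,
-- can combine all 16 part-m-00xx.parquet files into a single split
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-- plain HiveInputFormat creates at least one split, and hence one
-- mapper, per file
select count(*) from t_main_wop;

With that set, the job should launch with one mapper per part file (16 in
your case), subject to the usual split-size limits.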

Thanks,
Navis
