Posted to user@hive.apache.org by Roberto Congiu <ro...@openx.org> on 2009/09/29 03:57:59 UTC

HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

Hi guys,
I've been working on integrating Hive with a legacy file format we use
here. I wrote the appropriate InputFormat and SerDe, and everything
works, but it's painfully slow.
The reason is that I am reading many files and Hive uses one
mapper per file.
I saw the HIVE-74 patches, but those use CombineFileInputFormat, which
is available on Hadoop 0.20, and we use 0.19. Is there any reason the
same goal could not be achieved using the deprecated MultiFileInputFormat,
which is present in versions before 0.20?

Thanks,
Roberto
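
To make the "one mapper for every file" problem above concrete, here is a
minimal sketch of the general idea behind a combining input format: pack many
small files into a bounded number of groups, so that each mapper reads a group
rather than a single file. This is an illustration only, not the code of
Hadoop's MultiFileInputFormat or CombineFileInputFormat and not the HIVE-74
patch. The names SplitPacker, FileEntry and pack are hypothetical, and the
packing rule (contiguous groups of roughly equal total size) is an assumption
chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: groups many small files into a bounded number of splits. */
public class SplitPacker {

    /** Hypothetical stand-in for one input file: a path and a length in bytes. */
    public static final class FileEntry {
        public final String path;
        public final long length;

        public FileEntry(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /**
     * Packs files, in order, into at most numSplits groups whose total sizes
     * are close to (total bytes / numSplits). Each group would feed one mapper.
     */
    public static List<List<FileEntry>> pack(List<FileEntry> files, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        long total = 0;
        for (FileEntry f : files) {
            total += f.length;
        }
        // Target bytes per group, rounded up; at least 1 to handle empty input.
        long target = Math.max(1, (total + numSplits - 1) / numSplits);

        List<List<FileEntry>> splits = new ArrayList<>();
        List<FileEntry> current = new ArrayList<>();
        long currentSize = 0;
        for (FileEntry f : files) {
            // Close the current group if adding this file would exceed the
            // target, unless we are already filling the last permitted group.
            boolean wouldOverflow = currentSize + f.length > target;
            boolean mayOpenAnother = splits.size() < numSplits - 1;
            if (!current.isEmpty() && wouldOverflow && mayOpenAnother) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(f);
            currentSize += f.length;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<FileEntry> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add(new FileEntry("part-" + i, 1024));
        }
        List<List<FileEntry>> splits = pack(files, 10);
        System.out.println(files.size() + " files -> " + splits.size() + " splits");
    }
}
```

With one mapper per file, the 1000 small files in main would need 1000
mappers; packed this way they need 10. A real input format would also have to
supply a record reader that can walk every file in a group, which this sketch
leaves out.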

RE: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

Posted by Namit Jain <nj...@facebook.com>.
Just checked: CombineFileInputFormat and a lot of other related code went into Hadoop 0.20,
so it would be very difficult to add this for 0.19.


From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, September 28, 2009 10:30 PM
To: hive-user@hadoop.apache.org; roberto.congiu@openx.org
Subject: Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

I am not sure whether CombineFileInputFormat (in hadoop) is available in 0.19 -
If it is, we can add it, otherwise it will be very difficult.



On 9/28/09 7:06 PM, "Raghu Murthy" <rm...@facebook.com> wrote:
Can we add MultiFileInputFormat as the CombineFileInputFormatShim for
hadoop-0.19?

On 9/28/09 6:57 PM, "Roberto Congiu" <ro...@openx.org> wrote:

> Hi guys,
> I've been working on integrating hive with a legacy file format we use
> here. I wrote the appropriate InputFormat and SerDe and everything
> works, but it's painfully slow.
> The reason is that the files I am reading are many and hive uses one
> mapper for every file.
> I saw the HIVE-74 patches but those use CombineFileInputFormat which
> is available on hadoop 0.20...but we use 0.19. Is there any reason the
> same goal could not be achieved using the deprecated (but present  <
> 0.20) MultiFileInputFormat ?
>
> Thanks,
> Roberto


Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

Posted by Namit Jain <nj...@facebook.com>.
I am not sure whether CombineFileInputFormat (in Hadoop) is available in 0.19.
If it is, we can add it; otherwise it will be very difficult.



On 9/28/09 7:06 PM, "Raghu Murthy" <rm...@facebook.com> wrote:

Can we add MultiFileInputFormat as the CombineFileInputFormatShim for
hadoop-0.19?




Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

Posted by Raghu Murthy <rm...@facebook.com>.
Can we add MultiFileInputFormat as the CombineFileInputFormatShim for
hadoop-0.19? 
