Posted to user@hbase.apache.org by Ophir Cohen <op...@gmail.com> on 2011/08/11 10:08:47 UTC
Bulk upload
Hi,
I started to use bulk upload and encountered a strange problem.
I'm using Cloudera cdh3-u1.
I'm using HFileOutputFormat.configureIncrementalLoad() to configure my job.
This method creates a partition file for the TotalOrderPartitioner and saves it
to HDFS.
When the TotalOrderPartitioner is initialized, it tries to find the path of the
file in the configuration:

public static String getPartitionFile(Configuration conf) {
  return conf.get(PARTITIONER_PATH, DEFAULT_PATH);
}

The strange thing is that this parameter is never assigned!
It looks to me like it should have been configured
in HFileOutputFormat.configureIncrementalLoad(), but it is not!
It then falls back to the default ("_part" or something similar) and (of course)
does not find the file...
BTW, when I manually add this parameter it works great.
Is this a bug, or am I missing something?
Thanks,
Ophir
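PS
For anyone hitting the same thing, the manual workaround I mean is roughly the sketch below. Treat the path and the config key as assumptions from my setup; the TotalOrderPartitioner class also moved packages between Hadoop releases, so the static helper is the safer way to set it.

```java
// Sketch of the manual workaround (names are assumptions for my setup):
// point TotalOrderPartitioner at the partition file that
// HFileOutputFormat.configureIncrementalLoad() wrote to HDFS.
Path partitionFile = new Path("/user/me/partitions");  // hypothetical path

// The static helper writes the path into the job configuration for us.
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);

// Roughly equivalent to setting the config key directly, e.g. (key name
// varies by version, so this is an assumption):
// conf.set("total.order.partitioner.path", partitionFile.toString());
```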
Re: Bulk upload
Posted by Ophir Cohen <op...@gmail.com>.
Thanks for the answer - it's exactly what I encountered...
It looks that it still exists in cdh3-u1...
Ophir
On Tue, Aug 16, 2011 at 02:07, Jean-Daniel Cryans <jd...@apache.org> wrote:
> From this jira it was fixed in 0.21.0:
> https://issues.apache.org/jira/browse/MAPREDUCE-476
>
> I know CDH has it patched in, not sure about the others.
>
> J-D
Re: Bulk upload
Posted by Jean-Daniel Cryans <jd...@apache.org>.
From this JIRA it was fixed in 0.21.0:
https://issues.apache.org/jira/browse/MAPREDUCE-476
I know CDH has it patched in, not sure about the others.
J-D
Re: Bulk upload
Posted by Ophir Cohen <op...@gmail.com>.
I did some more tests and found the problem: on a local run the distributed
cache does not work.
On a full cluster it works.
Sorry for your time...
Ophir
PS
Is there any way to use the distributed cache locally as well (i.e. when I'm
running MR from IntelliJ IDEA)?
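FWIW, the kind of guard I was thinking of is sketched below. It assumes the old LocalJobRunner, which identifies itself by setting mapred.job.tracker to "local"; newer versions may use different keys, so treat this as a sketch rather than a general solution.

```java
// Hypothetical local-mode guard: when the job runs under the LocalJobRunner
// (mapred.job.tracker == "local"), skip the distributed cache and point the
// partitioner directly at the partition file instead.
Configuration conf = job.getConfiguration();
if ("local".equals(conf.get("mapred.job.tracker"))) {
  // partitionFile is the path configureIncrementalLoad() wrote to.
  TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
}
```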
Re: Bulk upload
Posted by Ophir Cohen <op...@gmail.com>.
Now I see that it uses the distributed cache - but for some reason
the TotalOrderPartitioner does not grab it.
Ophir