Posted to user@hbase.apache.org by Ioakim Perros <im...@gmail.com> on 2012/07/23 02:45:45 UTC

Efficient read/write - Iterative M/R jobs

Hi,

Is there any efficient way (beyond the trivial approach of using
TableMapReduceUtil / TableOutputFormat) to perform faster read and write
operations on tables? Could anyone provide some example code for it?

As for faster importing into a table, I am aware of tools such as
completebulkload, but I would prefer to trigger such a process through
M/R code, as I would like a whole table to be read and updated through
iterations of M/R jobs.
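
For reference, the kind of setup I am calling trivial looks roughly like
the sketch below (table name, column family and the mapper logic are just
placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class IterativeTableJob {

  // Reads each row, computes an updated value, emits a Put for the same row.
  static class IterationMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result columns, Context ctx)
        throws IOException, InterruptedException {
      byte[] old = columns.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
      if (old == null) return;               // nothing to update for this row
      Put put = new Put(row.get());
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), old);  // placeholder "update"
      ctx.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "iterative-table-update");
    job.setJarByClass(IterativeTableJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch more rows per RPC during the scan
    scan.setCacheBlocks(false);  // don't churn the block cache from an M/R scan

    TableMapReduceUtil.initTableMapperJob("mytable", scan,
        IterationMapper.class, ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("mytable",
        IdentityTableReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}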

Thanks in advance!
IP

Re: Efficient read/write - Iterative M/R jobs

Posted by Ioakim Perros <im...@gmail.com>.
Update (for anyone ending up here after a Google search on the issue):

In the end, running an M/R job to bulk-import data in pseudo-distributed
mode is feasible (for testing purposes).

The error concerning TotalOrderPartitioner turned out to be a trivial bug
in the keys I passed from the mappers.

The thing is that you need to add "guava-r09.jar" (or, I suppose, any
recent Guava version - it is located under the lib folder of the HBase
setup path) to the lib folder of the Hadoop setup path. I suppose that
for the same job to run in a truly distributed environment, one has to
pass -libjars /path/to/guava.jar to the hadoop jar command.
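
(A possible alternative I have not tried: TableMapReduceUtil.addDependencyJars
appears to ship HBase's own dependency jars - Guava included, as far as I can
tell - with the job through the distributed cache, which would avoid copying
anything into the Hadoop lib folder. Roughly, in the job driver:)

import java.io.IOException;

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public final class DependencyJars {
  // Adds the jars HBase itself depends on (guava, zookeeper, ...) plus the
  // job's own classes to the distributed cache, so task JVMs can load them
  // without anything being copied into Hadoop's lib folder.
  public static void ship(Job job) throws IOException {
    TableMapReduceUtil.addDependencyJars(job);
  }
}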

On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:
>> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
>> Status : FAILED
>> java.lang.IllegalArgumentException: *Can't read partitions file*
>>      at
>> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
>> ...
>>
>> I followed this link, while googling for the solution :
>> http://hbase.apache.org/book/trouble.mapreduce.html
>> and it implies a misconfiguration concerning a fully distributed
>> environment.
>>
>> I would like, therefore, to ask if it is even possible to bulk import data
>> in a pseudo-distributed mode and if this is the case, does anyone have a
>> guess about this error?
> AFAIK you just can't use the local job tracker for this, so you do
> need to start one.
>
> J-D





Re: Efficient read/write - Iterative M/R jobs

Posted by Ioakim Perros <im...@gmail.com>.
Thank you very much for your instant response :-)

Hope Amazon Web Services will help me with this one.
IP


On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:
>> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
>> Status : FAILED
>> java.lang.IllegalArgumentException: *Can't read partitions file*
>>      at
>> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
>> ...
>>
>> I followed this link, while googling for the solution :
>> http://hbase.apache.org/book/trouble.mapreduce.html
>> and it implies a misconfiguration concerning a fully distributed
>> environment.
>>
>> I would like, therefore, to ask if it is even possible to bulk import data
>> in a pseudo-distributed mode and if this is the case, does anyone have a
>> guess about this error?
> AFAIK you just can't use the local job tracker for this, so you do
> need to start one.
>
> J-D


Re: Efficient read/write - Iterative M/R jobs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
> Status : FAILED
> java.lang.IllegalArgumentException: *Can't read partitions file*
>     at
> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
> ...
>
> I followed this link, while googling for the solution :
> http://hbase.apache.org/book/trouble.mapreduce.html
> and it implies a misconfiguration concerning a fully distributed
> environment.
>
> I would like, therefore, to ask if it is even possible to bulk import data
> in a pseudo-distributed mode and if this is the case, does anyone have a
> guess about this error?

AFAIK you just can't use the local job tracker for this, so you do
need to start one.
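
For the usual Hadoop 1.x single-node setup, that roughly means pointing
mapred-site.xml at a real JobTracker and starting it, e.g.:

<!-- conf/mapred-site.xml (standard single-node values; adjust host/port) -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

Then bin/start-mapred.sh brings up the JobTracker and a TaskTracker, and
the job no longer runs under the local runner.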

J-D

Re: Efficient read/write - Iterative M/R jobs

Posted by Ioakim Perros <im...@gmail.com>.
Thank you very much for responding :-)

I also found this one: http://www.deerwalk.com/bulk_importing_data ,
which seems very informative.

The thing is that when I create and run a simple (custom) bulk loading
job locally (in pseudo-distributed mode), the following error occurs:

... INFO mapred.JobClient: Task Id : 
attempt_201207232344_0001_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException: *Can't read partitions file*
     at 
org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111) 
...

While googling for a solution I followed this link:
http://hbase.apache.org/book/trouble.mapreduce.html
and it implies a misconfiguration concerning a fully distributed
environment.

I would therefore like to ask whether it is even possible to bulk import
data in pseudo-distributed mode and, if so, does anyone have a guess
about this error?
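
For reference, the kind of driver I mean looks roughly like the sketch
below (table name, column family and the CSV input format are just
placeholders) - it goes through HFileOutputFormat.configureIncrementalLoad,
which is what plugs in the backported TotalOrderPartitioner and writes the
partitions file from the table's region boundaries:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomBulkLoad {

  // Turns one CSV line ("rowkey,value" - made-up format) into a KeyValue.
  static class BulkLoadMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split(",");
      if (fields.length < 2) return;         // skip malformed lines
      byte[] row = Bytes.toBytes(fields[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "custom-bulk-load");
    job.setJarByClass(CustomBulkLoad.class);
    job.setMapperClass(BulkLoadMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    job.setInputFormatClass(TextInputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HFile output dir

    // Sets the reducer, sorter and TotalOrderPartitioner, and writes the
    // partitions file based on the table's current region boundaries.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    // the HFiles written to args[1] are then handed to completebulkload
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}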

Thanks in advance!
IP


On 07/23/2012 07:40 AM, Sonal Goyal wrote:
> Hi,
>
> You can check the bulk loading section at
>
> http://hbase.apache.org/book/arch.bulk.load.html
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
>
> On Mon, Jul 23, 2012 at 6:15 AM, Ioakim Perros <im...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there any efficient way (beyond the trivial using TableMapReduceUtil /
>> TableOutputFormat) to perform faster read and write operations to tables ?
>> Could anyone provide some example code of it ?
>>
>> As of faster importing to table, I am aware of tools such as
>> completebulkload, but I would prefer triggering such a process through M/R
>> code, as I would like a whole table to be read and updated through
>> iterations of M/R jobs.
>>
>> Thanks in advance!
>> IP
>>


Re: Efficient read/write - Iterative M/R jobs

Posted by Sonal Goyal <so...@gmail.com>.
Hi,

You can check the bulk loading section at

http://hbase.apache.org/book/arch.bulk.load.html

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>





On Mon, Jul 23, 2012 at 6:15 AM, Ioakim Perros <im...@gmail.com> wrote:

> Hi,
>
> Is there any efficient way (beyond the trivial using TableMapReduceUtil /
> TableOutputFormat) to perform faster read and write operations to tables ?
> Could anyone provide some example code of it ?
>
> As of faster importing to table, I am aware of tools such as
> completebulkload, but I would prefer triggering such a process through M/R
> code, as I would like a whole table to be read and updated through
> iterations of M/R jobs.
>
> Thanks in advance!
> IP
>