Posted to user@hbase.apache.org by Ioakim Perros <im...@gmail.com> on 2012/07/23 02:45:45 UTC
Efficient read/write - Iterative M/R jobs
Hi,
Is there any efficient way (beyond the trivial one of using
TableMapReduceUtil / TableOutputFormat) to perform faster read and write
operations on tables? Could anyone provide some example code for it?
As for faster importing into a table, I am aware of tools such as
completebulkload, but I would prefer to trigger such a process through
M/R code, since I would like a whole table to be read and updated across
iterations of M/R jobs.
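[Editor's note: for readers landing here, the usual knobs for speeding up
TableInputFormat-based reads are the scan's RPC caching and block-cache
settings. A driver sketch against the 0.9x-era HBase API follows; the table
names and MyMapper/MyReducer classes are placeholders, not from this thread,
and it needs an HBase cluster and the HBase jars on the classpath to run.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class IterationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "iteration");
        job.setJarByClass(IterationDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // rows fetched per RPC (default is 1)
        scan.setCacheBlocks(false);  // don't churn the block cache on full scans

        // MyMapper / MyReducer and both table names are placeholders.
        TableMapReduceUtil.initTableMapperJob("source_table", scan,
            MyMapper.class, ImmutableBytesWritable.class, Put.class, job);
        TableMapReduceUtil.initTableReducerJob("target_table",
            MyReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

One driver like this per iteration, with source and target tables swapped
(or renamed) between runs, gives the read-update loop described above.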
Thanks in advance!
IP
Re: Efficient read/write - Iterative M/R jobs
Posted by Ioakim Perros <im...@gmail.com>.
Update (for anyone ending up here after a possible Google search on the
issue):
Running an M/R job in order to bulk import data in pseudo-distributed
mode is indeed feasible (for testing purposes).
The error concerning TotalOrderPartitioner came down to a trivial bug
in the keys I emitted from the mappers.
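[Editor's note: to see why bad mapper keys break this, recall what a
total-order partitioner does: it routes each key to a reducer by searching a
sorted list of split points read from the partitions file. The following is
a simplified, in-memory illustration of that lookup (not the real
TotalOrderPartitioner, which works on serialized keys); if the mapper's
output keys don't match the type and ordering the partitions file was built
from, setup fails with errors like the one quoted below.]

```java
import java.util.Arrays;

public class TotalOrderSketch {
    // splitPoints must be sorted; there is one more partition than split
    // points: keys below splitPoints[0] go to partition 0, and so on.
    static int partitionFor(String key, String[] splitPoints) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // Found: key equals a split point, which starts the next range.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        String[] splits = {"g", "p"};  // 3 partitions: [..g), [g..p), [p..)
        System.out.println(partitionFor("a", splits)); // 0
        System.out.println(partitionFor("h", splits)); // 1
        System.out.println(partitionFor("z", splits)); // 2
    }
}
```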
The thing is that you need to add "guava-r09.jar" (or any recent Guava
version, I suppose; it is located under the lib folder of the HBase
setup path) to the lib folder of the Hadoop setup path. I suppose that
for the same job to run in a truly distributed environment, one has to
add -libjars /path/to/guava.jar to the options of the hadoop jar command.
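[Editor's note: spelled out as commands, under the assumption of a standard
HBASE_HOME / HADOOP_HOME layout; the jar version, driver class, and paths
are examples, use whatever ships with your HBase.]

```shell
# Pseudo-distributed / testing: put Guava on Hadoop's classpath directly
# (requires restarting the Hadoop daemons so they pick it up)
cp "$HBASE_HOME"/lib/guava-r09.jar "$HADOOP_HOME"/lib/

# Fully distributed: ship the jar with the job instead of copying it
# to every node (MyDriver and the args are placeholders)
hadoop jar myjob.jar MyDriver \
    -libjars "$HBASE_HOME"/lib/guava-r09.jar \
    input_path output_path
```

Note that -libjars is handled by GenericOptionsParser, so the driver must
use it (e.g. by extending Configured and implementing Tool) for the option
to take effect.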
On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:
>> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
>> Status : FAILED
>> java.lang.IllegalArgumentException: *Can't read partitions file*
>> at
>> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
>> ...
>>
>> I followed this link, while googling for the solution :
>> http://hbase.apache.org/book/trouble.mapreduce.html
>> and it implies a misconfiguration concerning a fully distributed
>> environment.
>>
>> I would like, therefore, to ask if it is even possible to bulk import data
>> in a pseudo-distributed mode and if this is the case, does anyone have a
>> guess about this error?
> AFAIK you just can't use the local job tracker for this, so you do
> need to start one.
>
> J-D
Re: Efficient read/write - Iterative M/R jobs
Posted by Ioakim Perros <im...@gmail.com>.
Thank you very much for your instant response :-)
Hope Amazon Web Services will help me with this one.
IP
On 07/24/2012 02:06 AM, Jean-Daniel Cryans wrote:
>> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
>> Status : FAILED
>> java.lang.IllegalArgumentException: *Can't read partitions file*
>> at
>> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
>> ...
>>
>> I followed this link, while googling for the solution :
>> http://hbase.apache.org/book/trouble.mapreduce.html
>> and it implies a misconfiguration concerning a fully distributed
>> environment.
>>
>> I would like, therefore, to ask if it is even possible to bulk import data
>> in a pseudo-distributed mode and if this is the case, does anyone have a
>> guess about this error?
> AFAIK you just can't use the local job tracker for this, so you do
> need to start one.
>
> J-D
Re: Efficient read/write - Iterative M/R jobs
Posted by Jean-Daniel Cryans <jd...@apache.org>.
> ... INFO mapred.JobClient: Task Id : attempt_201207232344_0001_m_000000_0,
> Status : FAILED
> java.lang.IllegalArgumentException: *Can't read partitions file*
> at
> org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
> ...
>
> I followed this link, while googling for the solution :
> http://hbase.apache.org/book/trouble.mapreduce.html
> and it implies a misconfiguration concerning a fully distributed
> environment.
>
> I would like, therefore, to ask if it is even possible to bulk import data
> in a pseudo-distributed mode and if this is the case, does anyone have a
> guess about this error?
AFAIK you just can't use the local job tracker for this, so you do
need to start one.
J-D
Re: Efficient read/write - Iterative M/R jobs
Posted by Ioakim Perros <im...@gmail.com>.
Thank you very much for responding :-)
I also found this one: http://www.deerwalk.com/bulk_importing_data ,
which seems very informative.
The thing is that I tried to create and run a simple (custom) bulk
loading job locally (in pseudo-distributed mode), and the following
error occurs:
... INFO mapred.JobClient: Task Id :
attempt_201207232344_0001_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException: *Can't read partitions file*
at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
...
While googling for a solution I came across this link:
http://hbase.apache.org/book/trouble.mapreduce.html
which implies a misconfiguration in a fully distributed environment.
I would therefore like to ask whether it is even possible to bulk
import data in pseudo-distributed mode, and if so, whether anyone has a
guess about this error.
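[Editor's note: for readers reproducing this, the skeleton of such a custom
bulk-load driver looks roughly as follows, against the 0.9x-era HBase API.
The MyBulkMapper class, table name, and paths are placeholders of mine, not
from the thread, and the code needs a running cluster and the HBase jars.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        // Placeholder mapper: must emit ImmutableBytesWritable row keys
        // and Put values.
        job.setMapperClass(MyBulkMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/hfiles"));

        HTable table = new HTable(conf, "target_table");
        // Wires up HFileOutputFormat and TotalOrderPartitioner, and writes
        // the partitions file from the table's current region boundaries:
        HFileOutputFormat.configureIncrementalLoad(job, table);

        if (job.waitForCompletion(true)) {
            // Equivalent to running the completebulkload tool afterwards:
            new LoadIncrementalHFiles(conf)
                .doBulkLoad(new Path("/hfiles"), table);
        }
    }
}
```

The "Can't read partitions file" error below comes from the
TotalOrderPartitioner that configureIncrementalLoad() sets up; as the
replies note, it needs a real (non-local) job tracker to distribute that
file.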
Thanks in advance!
IP
On 07/23/2012 07:40 AM, Sonal Goyal wrote:
> Hi,
>
> You can check the bulk loading section at
>
> http://hbase.apache.org/book/arch.bulk.load.html
>
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Mon, Jul 23, 2012 at 6:15 AM, Ioakim Perros <im...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there any efficient way (beyond the trivial using TableMapReduceUtil /
>> TableOutputFormat) to perform faster read and write operations to tables ?
>> Could anyone provide some example code of it ?
>>
>> As of faster importing to table, I am aware of tools such as
>> completebulkload, but I would prefer triggering such a process through M/R
>> code, as I would like a whole table to be read and updated through
>> iterations of M/R jobs.
>>
>> Thanks in advance!
>> IP
>>
Re: Efficient read/write - Iterative M/R jobs
Posted by Sonal Goyal <so...@gmail.com>.
Hi,
You can check the bulk loading section at
http://hbase.apache.org/book/arch.bulk.load.html
Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>
On Mon, Jul 23, 2012 at 6:15 AM, Ioakim Perros <im...@gmail.com> wrote:
> Hi,
>
> Is there any efficient way (beyond the trivial using TableMapReduceUtil /
> TableOutputFormat) to perform faster read and write operations to tables ?
> Could anyone provide some example code of it ?
>
> As of faster importing to table, I am aware of tools such as
> completebulkload, but I would prefer triggering such a process through M/R
> code, as I would like a whole table to be read and updated through
> iterations of M/R jobs.
>
> Thanks in advance!
> IP
>