Posted to user@hbase.apache.org by Stuti Awasthi <st...@hcl.com> on 2011/11/10 08:31:50 UTC

MR - Input from Hbase output to HDFS

Hi
Currently I am understanding HBase MapReduce support. I followed http://hbase.apache.org/book/mapreduce.example.html and executed it successfully.
But I am not sure what changes need to be made to an MR job so that it takes its input from an HBase table and writes its output to HDFS.

How do I set the output directory? I tried to set it with JobConf, but it gives me an error that the output directory is not set.
Please suggest.

Regards,
Stuti Awasthi
HCL Comnet Systems and Services Ltd
F-8/9 Basement, Sec-3,Noida.


________________________________
::DISCLAIMER::
-----------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
It shall not attach any liability on the originator or HCL or its affiliates. Any views or opinions presented in
this email are solely those of the author and may not necessarily reflect the opinions of HCL or its affiliates.
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of
this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have
received this email in error please delete it and notify the sender immediately. Before opening any mail and
attachments please check them for viruses and defect.

-----------------------------------------------------------------------------------------------------------------------

Re: MR - Input from Hbase output to HDFS

Posted by Harsh J <ha...@cloudera.com>.
When using HBase, consider using the new API primarily.

The mapred.* package upstream in Hadoop is not deprecated anymore, however.
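For reference, the two API generations Harsh mentions live in different packages, and mixing them is what produces the compile errors discussed in this thread. A quick side-by-side (package names only; both are real Hadoop classes):

```java
// Old ("mapred") API - driven by JobConf/JobClient:
//   org.apache.hadoop.mapred.FileInputFormat
//   org.apache.hadoop.mapred.FileOutputFormat
//
// New ("mapreduce") API - driven by Job, and the one that HBase's
// TableMapReduceUtil expects:
//   org.apache.hadoop.mapreduce.lib.input.FileInputFormat
//   org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
```

The classes share names, so which API you get is determined entirely by which import you pick.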

On 22-Nov-2011, at 1:21 AM, Denis Kreis wrote:

> Hi
> 
> Is org.apache.hadoop.mapred.FileInputFormat to be considered
> as obsolete/deprecated?
> 
> Thanks!
> 
> 2011/11/15 Stuti Awasthi <st...@hcl.com>
> 
>> Sure Doug,
>> Thanks
>> 
>> -----Original Message-----
>> From: Doug Meil [mailto:doug.meil@explorysmedical.com]
>> Sent: Monday, November 14, 2011 9:08 PM
>> To: user@hbase.apache.org
>> Subject: Re: MR - Input from Hbase output to HDFS
>> 
>> 
>> Glad you worked through that and everything is working.  I will add an
>> example of HBase-MapReduce-to-HDFS in the book.
>> 
>> 
>> 
>> 
>> 
>> On 11/14/11 1:24 AM, "Stuti Awasthi" <st...@hcl.com> wrote:
>> 
>>> Hi,
>>> I think the issue is with the filesystem configuration: the config is
>>> picking up the HBaseConfiguration. When I changed my output directory
>>> path to an absolute HDFS path:
>>> FileOutputFormat.setOutputPath(job, new
>>> Path("hdfs://master:54310/MR/stuti3"));
>>>
>>> the MR job runs successfully and I can see the stuti3 directory in
>>> HDFS at the desired path.
>>> 
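An alternative to hard-coding the `hdfs://` prefix into every path is to make HDFS the job's default filesystem. A sketch, assuming the NameNode URI `hdfs://master:54310` taken from the message above (adjust for your cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: make HDFS the default filesystem for the job, so that a bare
// Path("/stuti2") resolves against HDFS rather than the local
// filesystem of the machine submitting the job.
// "fs.default.name" is the 0.20.x-era property name; the NameNode URI
// is an assumption based on the hdfs://master:54310 path in this thread.
Configuration config = HBaseConfiguration.create();
config.set("fs.default.name", "hdfs://master:54310");
```

This is a configuration fragment, not a complete program; the rest of the job setup is unchanged.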
>>> 
>>> -----Original Message-----
>>> From: Stuti Awasthi
>>> Sent: Monday, November 14, 2011 11:40 AM
>>> To: user@hbase.apache.org
>>> Subject: RE: MR - Input from Hbase output to HDFS
>>> 
>>> Hi Joey,
>>> Thanks for pointing this out. After importing "FileOutputFormat" as you
>>> suggested, I am able to run the MR job from Eclipse (Windows). The only
>>> problem is that I cannot see the output directory this code creates.
>>> HDFS and HBase are on a Linux machine.
>>> 
>>> Code:
>>>     Configuration config = HBaseConfiguration.create();
>>>     config.set("hbase.zookeeper.quorum", "master");
>>>     config.set("hbase.zookeeper.property.clientPort", "2181");
>>>
>>>     Job job = new Job(config, "Hbase_Read_Write");
>>>     job.setJarByClass(ReadWriteDriver.class);
>>>     Scan scan = new Scan();
>>>     scan.setCaching(500);
>>>     scan.setCacheBlocks(false);
>>>     TableMapReduceUtil.initTableMapperJob("users", scan,
>>>         ReadWriteMapper.class, Text.class, IntWritable.class, job);
>>>     job.setOutputFormatClass(TextOutputFormat.class);
>>>     FileOutputFormat.setOutputPath(job, new Path("/stuti2"));
>>> 
>>> After executing this code, the MR job runs successfully, but when I
>>> look in HDFS no "/stuti2" directory has been created. I also looked in
>>> the local filesystem of both the Linux machine and the Windows machine,
>>> but cannot find the output folder anywhere.
>>> 
>>> Eclipse console Output :
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.version=1.6.0_27
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.vendor=Sun Microsystems Inc.
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\
>>> wor
>>> kspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbas
>>> e\M
>>> RHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRH
>>> bas
>>> eReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseRead
>>> Wri
>>> te\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\h
>>> bas
>>> e-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D
>>> :\w orkspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.library.path=C:\Program
>>> Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\syste
>>> m32 ;C:\Windows;C:/Program Files/Java/jre6/bin/client;C:/Program
>>> Files/Java/jre6/bin;C:/Program
>>> Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\Syst
>>> em3 2\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
>>> Files\Java\jdk1.6.0_27;C:\Program
>>> Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclips
>>> e;;
>>> .
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:java.compiler=<NA>
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:os.name=Windows 7
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:os.arch=x86
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:os.version=6.1
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:user.name=stutiawasthi
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:user.home=C:\Users\stutiawasthi
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>>> environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
>>> 11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client
>>> connection,
>>> connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>>> 11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection
>>> to server master/10.33.64.235:2181
>>> 11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection
>>> established to master/10.33.64.235:2181, initiating session
>>> 11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment
>>> complete on server master/10.33.64.235:2181, sessionid =
>>> 0x33879243de00ec, negotiated timeout = 180000
>>> 11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
>>> 11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client
>>> connection,
>>> connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection
>>> to server master/10.33.64.235:2181
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection
>>> established to master/10.33.64.235:2181, initiating session
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment
>>> complete on server master/10.33.64.235:2181, sessionid =
>>> 0x33879243de00ed, negotiated timeout = 180000
>>> 11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client
>>> connection,
>>> connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection
>>> to server master/10.33.64.235:2181
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection
>>> established to master/10.33.64.235:2181, initiating session
>>> 11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment
>>> complete on server master/10.33.64.235:2181, sessionid =
>>> 0x33879243de00ee, negotiated timeout = 180000
>>> 11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
>>> 11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
>>> 11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680
>>> ...............................................
>>> 11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner:
>>> Task:attempt_local_0001_m_000000_0 is done. And is in the process of
>>> commiting
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task
>>> 'attempt_local_0001_m_000000_0' done.
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
>>> 11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with
>>> 1 segments left of total size: 103 bytes
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner:
>>> Task:attempt_local_0001_r_000000_0 is done. And is in the process of
>>> commiting
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task
>>> attempt_local_0001_r_000000_0 is allowed to commit now
>>> 11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task
>>> 'attempt_local_0001_r_000000_0' to /stuti2
>>> 11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
>>> 11/11/14 11:21:46 INFO mapred.TaskRunner: Task
>>> 'attempt_local_0001_r_000000_0' done.
>>> 11/11/14 11:21:47 INFO mapred.JobClient:  map 100% reduce 100%
>>> 11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
>>> 11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
>>> 11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
>>> 11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
>>> 11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5
>>> 
>>> 
>>> Please Suggest
>>> 
>>> -----Original Message-----
>>> From: Joey Echeverria [mailto:joey@cloudera.com]
>>> Sent: Friday, November 11, 2011 10:38 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: MR - Input from Hbase output to HDFS
>>> 
>>> There are two APIs (old and new), and you appear to be mixing them.
>>> TableMapReduceUtil only works with the new API. The solution is to
>>> import the new version of FileOutputFormat which takes a Job:
>>> 
>>> 
>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>> 
>>> -Joey
>>> 
>>> On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <st...@hcl.com>
>>> wrote:
>>>> The method "setOutputPath(JobConf, Path)" takes a JobConf as a
>>>> parameter, not the Job object.
>>>> At least that is the error I'm getting while compiling against the
>>>> Hadoop 0.20.2 jar in Eclipse.
>>>> 
>>>> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>>>> 
>>>> -----Original Message-----
>>>> From: Prashant Sharma [mailto:prashant.iiith@gmail.com]
>>>> Sent: Friday, November 11, 2011 11:20 AM
>>>> To: user@hbase.apache.org
>>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>> 
>>>> Hi stuti,
>>>> I was wondering why you are not using the job object to set the
>>>> output path, like this:
>>>> 
>>>> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );
>>>> 
>>>> 
>>>> thanks
>>>> 
>>>> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi
>>>> <st...@hcl.com>wrote:
>>>> 
>>>>> Hi Andrei,
>>>>> Well, I am a bit confused. When I use JobConf and associate it with
>>>>> JobClient to run the job, I get the error that the "Input directory
>>>>> is not set".
>>>>> Since I want my input to come from an HBase table, which I already
>>>>> configured with "TableMapReduceUtil.initTableMapperJob", I don't want
>>>>> to set an input directory via JobConf.
>>>>> How do I combine the two so that I can read input from HBase and
>>>>> write output to HDFS?
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Andrei Cojocaru [mailto:majormax@gmail.com]
>>>>> Sent: Thursday, November 10, 2011 7:09 PM
>>>>> To: user@hbase.apache.org
>>>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>>> 
>>>>> Stuti,
>>>>> 
>>>>> I don't see you associating JobConf with Job anywhere.
>>>>> -Andrei
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>>> 
>> 
>> 
>> 
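Putting the thread's resolution together, here is a driver sketch that uses only the new (`org.apache.hadoop.mapreduce`) API: the new-API `FileOutputFormat` that Joey points to, plus the absolute HDFS output path that made Stuti's job work. The class and mapper names (`ReadWriteDriver`, `ReadWriteMapper`) and the connection settings are taken from the code in this thread; `ReadWriteMapper` itself is not shown here, and the whole thing should be treated as an untested sketch against Hadoop 0.20.x / HBase 0.90.x:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; // new API: takes a Job
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadWriteDriver {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "master");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        Job job = new Job(config, "Hbase_Read_Write");
        job.setJarByClass(ReadWriteDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching for MR scans
        scan.setCacheBlocks(false);  // don't pollute the region server block cache

        // HBase is the input side: no input directory is set at all.
        TableMapReduceUtil.initTableMapperJob("users", scan,
                ReadWriteMapper.class, Text.class, IntWritable.class, job);

        // HDFS is the output side: an absolute hdfs:// path ensures the
        // output cannot silently land on the local filesystem.
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://master:54310/MR/stuti3"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note there is no JobConf or JobClient anywhere: the "Input/Output directory is not set" errors in this thread came from mixing the old-API `org.apache.hadoop.mapred` classes into a job configured through `Job`.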


Re: MR - Input from Hbase output to HDFS

Posted by Denis Kreis <de...@gmail.com>.
Hi

Is org.apache.hadoop.mapred.FileInputFormat to be considered
as obsolete/deprecated?

Thanks!


RE: MR - Input from Hbase output to HDFS

Posted by Stuti Awasthi <st...@hcl.com>.
Sure Doug,
Thanks

>11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
>11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with 
>1 segments left of total size: 103 bytes
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner:
>Task:attempt_local_0001_r_000000_0 is done. And is in the process of 
>commiting
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task
>attempt_local_0001_r_000000_0 is allowed to commit now
>11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task 
>'attempt_local_0001_r_000000_0' to /stuti2
>11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task 
>'attempt_local_0001_r_000000_0' done.
>11/11/14 11:21:47 INFO mapred.JobClient:  map 100% reduce 100%
>11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
>11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
>11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
>11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5
>
>
>Please Suggest
>
>-----Original Message-----
>From: Joey Echeverria [mailto:joey@cloudera.com]
>Sent: Friday, November 11, 2011 10:38 PM
>To: user@hbase.apache.org
>Subject: Re: MR - Input from Hbase output to HDFS
>
>There are two APIs (old and new), and you appear to be mixing them.
>TableMapReduceUtil only works with the new API. The solution is to 
>import the new version of FileOutputFormat which takes a Job:
>
>
>import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
>-Joey
>
>On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <st...@hcl.com>
>wrote:
>> The method "setOutputPath(JobConf, Path)" takes a JobConf as a
>> parameter, not the Job object.
>> At least this is the error I'm getting while compiling against the
>> Hadoop 0.20.2 jar in Eclipse.
>>
>> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>>
>> -----Original Message-----
>> From: Prashant Sharma [mailto:prashant.iiith@gmail.com]
>> Sent: Friday, November 11, 2011 11:20 AM
>> To: user@hbase.apache.org
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Hi stuti,
>> I was wondering why you are not using the job object to set the output
>> path like this.
>>
>> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );
>>
>>
>> thanks
>>
>> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi
>><st...@hcl.com>wrote:
>>
>>> Hi Andrei,
>>> Well, I am a bit confused. When I use JobConf and associate it with
>>> JobClient to run the job, I get the error that "Input directory is
>>> not set".
>>> I want my input to come from the HBase table, which I already
>>> configured with "TableMapReduceUtil.initTableMapperJob"; I don't want
>>> to set an input directory via JobConf.
>>> How do I mix these two so that I can get input from HBase and write
>>> output to HDFS?
>>>
>>> Thanks
>>>
>>> -----Original Message-----
>>> From: Andrei Cojocaru [mailto:majormax@gmail.com]
>>> Sent: Thursday, November 10, 2011 7:09 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>
>>> Stuti,
>>>
>>> I don't see you associating JobConf with Job anywhere.
>>> -Andrei
>>>
>>> ::DISCLAIMER::
>>>
>>> -----------------------------------------------------------------------------------------------------------------------
>>>
>>> The contents of this e-mail and any attachment(s) are confidential 
>>> and intended for the named recipient(s) only.
>>> It shall not attach any liability on the originator or HCL or its 
>>> affiliates. Any views or opinions presented in this email are solely 
>>> those of the author and may not necessarily reflect the opinions of 
>>> HCL or its affiliates.
>>> Any form of reproduction, dissemination, copying, disclosure, 
>>> modification, distribution and / or publication of this message 
>>> without the prior written consent of the author of this e-mail is 
>>> strictly prohibited. If you have received this email in error please 
>>> delete it and notify the sender immediately. Before opening any mail 
>>> and attachments please check them for viruses and defect.
>>>
>>>
>>> -----------------------------------------------------------------------------------------------------------------------
>>>
>>
>
>
>
>--
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
>



Re: MR - Input from Hbase output to HDFS

Posted by Doug Meil <do...@explorysmedical.com>.
Glad you worked through that and everything is working. I will add an
Hbase-to-HDFS MR example to the book.





On 11/14/11 1:24 AM, "Stuti Awasthi" <st...@hcl.com> wrote:

>Hi,
>I think the issue was with the filesystem configuration: the config object
>is an HBaseConfiguration, so the job was not picking up the HDFS settings.
>When I changed my output directory to an absolute HDFS path:
>FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));
>
>the MR job runs successfully and I am able to see the stuti3 directory
>inside HDFS at the desired path.
>



RE: MR - Input from Hbase output to HDFS

Posted by Stuti Awasthi <st...@hcl.com>.
Hi,
I think the issue was with the filesystem configuration: the config object is an HBaseConfiguration, so the job was not picking up the HDFS settings. When I changed my output directory to an absolute HDFS path:
FileOutputFormat.setOutputPath(job, new Path("hdfs://master:54310/MR/stuti3"));

The MR job runs successfully and I am able to see the stuti3 directory inside HDFS at the desired path.
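For anyone landing on this thread later, the pieces discussed above combine into a driver roughly like the following. This is a sketch assembled from the code posted in this thread: the table name "users", the mapper class ReadWriteMapper (not shown here), the "master" quorum host, and the hdfs://master:54310 URI are this thread's examples, not fixed values.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;  // new API: takes a Job
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadWriteDriver {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "master");
        config.set("hbase.zookeeper.property.clientPort", "2181");

        Job job = new Job(config, "Hbase_Read_Write");
        job.setJarByClass(ReadWriteDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scan batches for MR
        scan.setCacheBlocks(false);  // don't pollute the region server block cache

        // Input: the "users" HBase table -- no input directory is needed.
        TableMapReduceUtil.initTableMapperJob("users", scan,
                ReadWriteMapper.class, Text.class, IntWritable.class, job);

        // Output: a fully qualified HDFS path, so the job does not fall
        // back to the client machine's local default filesystem.
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://master:54310/MR/stuti3"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running this requires the Hadoop and HBase jars on the classpath and a reachable cluster, so treat it as a template rather than a standalone program.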



RE: MR - Input from Hbase output to HDFS

Posted by Stuti Awasthi <st...@hcl.com>.
Hi Joey,
Thanks for pointing this out. After importing "FileOutputFormat" as you suggested, I am able to run the MR job from Eclipse (Windows); the only problem is that I am not able to find the output directory this code creates. HDFS and HBase are on a Linux machine.

Code :
		Configuration config = HBaseConfiguration.create();
		config.set("hbase.zookeeper.quorum", "master");
		config.set("hbase.zookeeper.property.clientPort", "2181");
			
		Job job = new Job(config, "Hbase_Read_Write");
		job.setJarByClass(ReadWriteDriver.class);
		Scan scan = new Scan();
		scan.setCaching(500);
		scan.setCacheBlocks(false);
		TableMapReduceUtil.initTableMapperJob("users", scan,ReadWriteMapper.class, Text.class, IntWritable.class, job);
		job.setOutputFormatClass(TextOutputFormat.class);
		FileOutputFormat.setOutputPath(job, new Path("/stuti2"));

After executing this code, the MR job runs successfully, but when I look in HDFS no "/stuti2" directory has been created. I also looked in the local filesystem of both the Linux machine and the Windows machine, but was not able to find the output folder anywhere.
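One way to see why the job can "succeed" while nothing appears in HDFS: a Path with no scheme is resolved against the job's default filesystem, and an HBaseConfiguration alone does not point fs.default.name at the cluster, so "/stuti2" resolves against the client machine's local filesystem. The resolution behavior can be illustrated with plain java.net.URI (a standalone sketch; the hostnames are the ones from this thread):

```java
import java.net.URI;

public class WhereDidMyOutputGo {
    public static void main(String[] args) {
        // Fallback default filesystem when no core-site.xml is on the classpath.
        URI localDefault = URI.create("file:///");
        // The intended cluster filesystem (from this thread's fix).
        URI clusterFs = URI.create("hdfs://master:54310/");

        // A scheme-less path inherits whatever the default filesystem is:
        System.out.println(localDefault.resolve("/stuti2"));
        // A fully qualified path names its filesystem explicitly:
        System.out.println(clusterFs.resolve("/MR/stuti3"));
    }
}
```

The second form is why "hdfs://master:54310/MR/stuti3" lands in HDFS regardless of what the client's default filesystem happens to be.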
	
Eclipse console Output :
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_27
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hbase-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=C:\Program Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:/Program Files/Java/jre6/bin/client;C:/Program Files/Java/jre6/bin;C:/Program Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Java\jdk1.6.0_27;C:\Program Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclipse;;.
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Windows 7
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=x86
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:os.version=6.1
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.name=stutiawasthi
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.home=C:\Users\stutiawasthi
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ec, negotiated timeout = 180000
11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ed, negotiated timeout = 180000
11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection to server master/10.33.64.235:2181
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection established to master/10.33.64.235:2181, initiating session
11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment complete on server master/10.33.64.235:2181, sessionid = 0x33879243de00ee, negotiated timeout = 180000
11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680
...............................................
11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 103 bytes
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/11/14 11:21:46 INFO mapred.LocalJobRunner: 
11/11/14 11:21:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /stuti2
11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
11/11/14 11:21:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/11/14 11:21:47 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5


Please suggest.
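For reference, a complete driver written purely against the new org.apache.hadoop.mapreduce API might look like the sketch below. The class, table, and path names are taken from this thread; note that it imports FileOutputFormat from org.apache.hadoop.mapreduce.lib.output and avoids JobConf entirely. This is an untested sketch against Hadoop 0.20.2 / HBase 0.90.3, not a verified fix:

```java
// Sketch of a new-API-only driver; class names follow the thread.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
// The NEW FileOutputFormat, not org.apache.hadoop.mapred.FileOutputFormat:
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadWriteDriver {
	public static void main(String[] args) throws Exception {
		Configuration config = HBaseConfiguration.create();
		Job job = new Job(config, "Hbase_Read_Write");
		job.setJarByClass(ReadWriteDriver.class);

		Scan scan = new Scan();
		scan.setCaching(500);        // rows fetched per RPC during the table scan
		scan.setCacheBlocks(false);  // don't pollute the block cache from MR scans

		// The HBase table "users" is the input; no FileInputFormat path is needed.
		TableMapReduceUtil.initTableMapperJob("users", scan,
				ReadWriteMapper.class, Text.class, IntWritable.class, job);

		// Plain-text output on whatever filesystem fs.default.name points at.
		job.setOutputFormatClass(TextOutputFormat.class);
		FileOutputFormat.setOutputPath(job, new Path("/stuti2"));

		if (!job.waitForCompletion(true)) {
			throw new IOException("error with job!");
		}
	}
}
```

One thing worth checking: the console above shows job_local_0001, i.e. the job ran in the LocalJobRunner with the client-side configuration, in which case "/stuti2" resolves against whatever filesystem the client's fs.default.name names rather than the cluster's HDFS. Making sure the cluster's core-site.xml and hbase-site.xml are on the Eclipse classpath would rule that out.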

-----Original Message-----
From: Joey Echeverria [mailto:joey@cloudera.com] 
Sent: Friday, November 11, 2011 10:38 PM
To: user@hbase.apache.org
Subject: Re: MR - Input from Hbase output to HDFS

There are two APIs (old and new), and you appear to be mixing them.
TableMapReduceUtil only works with the new API. The solution is to import the new version of FileOutputFormat which takes a Job:


import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

-Joey

On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <st...@hcl.com> wrote:
> The method " setOutputPath (JobConf,Path)" take JobConf as a parameter not the Job object.
> At least this is the error Im getting while compiling with Hadoop 0.20.2 jar with eclipse.
>
> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>
> -----Original Message-----
> From: Prashant Sharma [mailto:prashant.iiith@gmail.com]
> Sent: Friday, November 11, 2011 11:20 AM
> To: user@hbase.apache.org
> Subject: Re: MR - Input from Hbase output to HDFS
>
> Hi stuti,
> I was wondering why  you are not using job object to set output path like this.
>
> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );
>
>
> thanks
>
> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <st...@hcl.com>wrote:
>
>> Hi Andrie,
>> Well I am bit confused. When I use Jobconf , and associate with 
>> JobClient to run the job then I get the error that "Input directory is not set".
>> Since I want my input to be taken by Hbase table which I already 
>> configured with "TableMapReduceUtil.initTableMapperJob". I don't want 
>> to set input directory via jobconf.
>> How to mix these 2 so that I can get input from Hbase and write ouput 
>> to HDFS.
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Andrei Cojocaru [mailto:majormax@gmail.com]
>> Sent: Thursday, November 10, 2011 7:09 PM
>> To: user@hbase.apache.org
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Stuti,
>>
>> I don't see you associating JobConf with Job anywhere.
>> -Andrei
>>
>>
>



--
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: MR - Input from Hbase output to HDFS

Posted by Joey Echeverria <jo...@cloudera.com>.
There are two APIs (old and new), and you appear to be mixing them.
TableMapReduceUtil only works with the new API. The solution is to
import the new version of FileOutputFormat which takes a Job:


import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

-Joey
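To make the distinction concrete: the two FileOutputFormat classes live in different packages and take different first arguments, so mixing them fails to compile or silently configures the wrong job. A minimal sketch of the contrast (signatures paraphrased, not copied from the javadoc):

```java
// Old API ("mapred"): configures a JobConf, submitted via JobClient.runJob(conf)
//   org.apache.hadoop.mapred.FileOutputFormat.setOutputPath(JobConf conf, Path out)
//
// New API ("mapreduce"): configures a Job, submitted via job.waitForCompletion(true)
//   org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Job job, Path out)
//
// TableMapReduceUtil.initTableMapperJob(...) configures a Job, so the new-API
// class is the one to import alongside it:
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
```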

On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <st...@hcl.com> wrote:
> The method " setOutputPath (JobConf,Path)" take JobConf as a parameter not the Job object.
> At least this is the error Im getting while compiling with Hadoop 0.20.2 jar with eclipse.
>
> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>
> -----Original Message-----
> From: Prashant Sharma [mailto:prashant.iiith@gmail.com]
> Sent: Friday, November 11, 2011 11:20 AM
> To: user@hbase.apache.org
> Subject: Re: MR - Input from Hbase output to HDFS
>
> Hi stuti,
> I was wondering why  you are not using job object to set output path like this.
>
> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );
>
>
> thanks
>
> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <st...@hcl.com>wrote:
>
>> Hi Andrie,
>> Well I am bit confused. When I use Jobconf , and associate with
>> JobClient to run the job then I get the error that "Input directory is not set".
>> Since I want my input to be taken by Hbase table which I already
>> configured with "TableMapReduceUtil.initTableMapperJob". I don't want
>> to set input directory via jobconf.
>> How to mix these 2 so that I can get input from Hbase and write ouput
>> to HDFS.
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Andrei Cojocaru [mailto:majormax@gmail.com]
>> Sent: Thursday, November 10, 2011 7:09 PM
>> To: user@hbase.apache.org
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Stuti,
>>
>> I don't see you associating JobConf with Job anywhere.
>> -Andrei
>>
>>
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

RE: MR - Input from Hbase output to HDFS

Posted by Stuti Awasthi <st...@hcl.com>.
The method setOutputPath(JobConf, Path) takes a JobConf as a parameter, not a Job object.
At least this is the error I'm getting while compiling against the Hadoop 0.20.2 jar in Eclipse.

FileOutputFormat.setOutputPath(conf, new Path("/output"));

-----Original Message-----
From: Prashant Sharma [mailto:prashant.iiith@gmail.com] 
Sent: Friday, November 11, 2011 11:20 AM
To: user@hbase.apache.org
Subject: Re: MR - Input from Hbase output to HDFS

Hi stuti,
I was wondering why  you are not using job object to set output path like this.

FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );


thanks

On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <st...@hcl.com>wrote:

> Hi Andrie,
> Well I am bit confused. When I use Jobconf , and associate with 
> JobClient to run the job then I get the error that "Input directory is not set".
> Since I want my input to be taken by Hbase table which I already 
> configured with "TableMapReduceUtil.initTableMapperJob". I don't want 
> to set input directory via jobconf.
> How to mix these 2 so that I can get input from Hbase and write ouput 
> to HDFS.
>
> Thanks
>
> -----Original Message-----
> From: Andrei Cojocaru [mailto:majormax@gmail.com]
> Sent: Thursday, November 10, 2011 7:09 PM
> To: user@hbase.apache.org
> Subject: Re: MR - Input from Hbase output to HDFS
>
> Stuti,
>
> I don't see you associating JobConf with Job anywhere.
> -Andrei
>
>

Re: MR - Input from Hbase output to HDFS

Posted by Prashant Sharma <pr...@gmail.com>.
Hi Stuti,
I was wondering why you are not using the Job object to set the output path, like
this.

FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );


thanks

On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi <st...@hcl.com>wrote:

> Hi Andrie,
> Well I am bit confused. When I use Jobconf , and associate with JobClient
> to run the job then I get the error that "Input directory is not set".
> Since I want my input to be taken by Hbase table which I already configured
> with "TableMapReduceUtil.initTableMapperJob". I don't want to set input
> directory via jobconf.
> How to mix these 2 so that I can get input from Hbase and write ouput to
> HDFS.
>
> Thanks
>
> -----Original Message-----
> From: Andrei Cojocaru [mailto:majormax@gmail.com]
> Sent: Thursday, November 10, 2011 7:09 PM
> To: user@hbase.apache.org
> Subject: Re: MR - Input from Hbase output to HDFS
>
> Stuti,
>
> I don't see you associating JobConf with Job anywhere.
> -Andrei
>
>

RE: MR - Input from Hbase output to HDFS

Posted by Stuti Awasthi <st...@hcl.com>.
Hi Andrei,
Well, I am a bit confused. When I use JobConf and associate it with JobClient to run the job, I get the error "Input directory is not set". Since I want my input to be taken from an HBase table, which I have already configured with "TableMapReduceUtil.initTableMapperJob", I don't want to set the input directory via JobConf.
How do I mix these two so that I can get input from HBase and write output to HDFS?

Thanks

-----Original Message-----
From: Andrei Cojocaru [mailto:majormax@gmail.com]
Sent: Thursday, November 10, 2011 7:09 PM
To: user@hbase.apache.org
Subject: Re: MR - Input from Hbase output to HDFS

Stuti,

I don't see you associating JobConf with Job anywhere.
-Andrei


Re: MR - Input from Hbase output to HDFS

Posted by Andrei Cojocaru <ma...@gmail.com>.
Stuti,

I don't see you associating JobConf with Job anywhere.
-Andrei

RE: MR - Input from Hbase output to HDFS

Posted by Stuti Awasthi <st...@hcl.com>.
Hi Prashant,
Yes, I am running as root, and new Path("/outputReadWrite") represents the path in HDFS where I want my output directory to be created.

-----Original Message-----
From: Prashant Sharma [mailto:prashant.iiith@gmail.com] 
Sent: Thursday, November 10, 2011 1:55 PM
To: user@hbase.apache.org
Subject: Re: MR - Input from Hbase output to HDFS

>
> new Path("/outputReadWrite"))

I am afraid are you running as root ?

On Thu, Nov 10, 2011 at 1:31 PM, Stuti Awasthi <st...@hcl.com> wrote:

> Hi Tim,
>
> My Job driver class looks like this :
>
> Job job = new Job(config, "Hbase_Read_Write");
>                job.setJarByClass(ReadWriteDriver.class);
>                JobConf conf = new JobConf(ReadWriteDriver.class);
>
>                Scan scan = new Scan();
>                scan.setCaching(500);
>                scan.setCacheBlocks(false);
>
>                TableMapReduceUtil.initTableMapperJob("users", scan,
>                                ReadWriteMapper.class, Text.class, 
> IntWritable.class, job);
>
>                job.setOutputFormatClass(TextOutputFormat.class);
>                FileOutputFormat.setOutputPath(conf, new 
> Path("/outputReadWrite"));
>
>                boolean b;
>                try {
>                        b = job.waitForCompletion(true);
>                        if (!b) {
>                                throw new IOException("error with job!");
>                        }
>                } catch (InterruptedException e) {
>                        e.printStackTrace();
>                } catch (ClassNotFoundException e) {
>                        e.printStackTrace();
>                }
>
> But getting error :
>
> Exception in thread "main"
> org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
>        at
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
>        at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
>        at readwrite.ReadWriteDriver.main(ReadWriteDriver.java:46)
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson100@gmail.com]
> Sent: Thursday, November 10, 2011 1:15 PM
> To: user@hbase.apache.org
> Subject: Re: MR - Input from Hbase output to HDFS
>
> Hi Stuti,
>
> I would have thought it was something like:
>  conf.setOutputFormat(TextOutputFormat.class);
>  FileOutputFormat.setOutputPath(conf, new Path(<YOUR_LOCATION>));
>
> Cheers,
> Tim
>
>
>
>
> On Thu, Nov 10, 2011 at 8:31 AM, Stuti Awasthi <st...@hcl.com>
> wrote:
> > Hi
> > Currently I am understading Hbase MapReduce support. I followed
> http://hbase.apache.org/book/mapreduce.example.html and executed it 
> successfully.
> > But I am not sure what changes to be done to  MR which takes input 
> > from
> Hbase table and put output to HDFS.
> >
> > How to set output dir . I tried to set with JobConf but it gives me
> error that output directory is not set.
> > Please Suggest.
> >
> > Regards,
> > Stuti Awasthi
> > HCL Comnet Systems and Services Ltd
> > F-8/9 Basement, Sec-3,Noida.
> >
> >
> >
>

Re: MR - Input from Hbase output to HDFS

Posted by Prashant Sharma <pr...@gmail.com>.
>
> new Path("/outputReadWrite"))

I am afraid are you running as root ?

On Thu, Nov 10, 2011 at 1:31 PM, Stuti Awasthi <st...@hcl.com> wrote:

> Hi Tim,
>
> My Job driver class looks like this :
>
> Job job = new Job(config, "Hbase_Read_Write");
>                job.setJarByClass(ReadWriteDriver.class);
>                JobConf conf = new JobConf(ReadWriteDriver.class);
>
>                Scan scan = new Scan();
>                scan.setCaching(500);
>                scan.setCacheBlocks(false);
>
>                TableMapReduceUtil.initTableMapperJob("users", scan,
>                                ReadWriteMapper.class, Text.class,
> IntWritable.class, job);
>
>                job.setOutputFormatClass(TextOutputFormat.class);
>                FileOutputFormat.setOutputPath(conf, new
> Path("/outputReadWrite"));
>
>                boolean b;
>                try {
>                        b = job.waitForCompletion(true);
>                        if (!b) {
>                                throw new IOException("error with job!");
>                        }
>                } catch (InterruptedException e) {
>                        e.printStackTrace();
>                } catch (ClassNotFoundException e) {
>                        e.printStackTrace();
>                }
>
> But getting error :
>
> Exception in thread "main"
> org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
>        at
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
>        at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
>        at readwrite.ReadWriteDriver.main(ReadWriteDriver.java:46)
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson100@gmail.com]
> Sent: Thursday, November 10, 2011 1:15 PM
> To: user@hbase.apache.org
> Subject: Re: MR - Input from Hbase output to HDFS
>
> Hi Stuti,
>
> I would have thought it was something like:
>  conf.setOutputFormat(TextOutputFormat.class);
>  FileOutputFormat.setOutputPath(conf, new Path(<YOUR_LOCATION>));
>
> Cheers,
> Tim
>
>
>
>
> On Thu, Nov 10, 2011 at 8:31 AM, Stuti Awasthi <st...@hcl.com>
> wrote:
> > Hi
> > Currently I am understading Hbase MapReduce support. I followed
> http://hbase.apache.org/book/mapreduce.example.html and executed it
> successfully.
> > But I am not sure what changes to be done to  MR which takes input from
> Hbase table and put output to HDFS.
> >
> > How to set output dir . I tried to set with JobConf but it gives me
> error that output directory is not set.
> > Please Suggest.
> >
> > Regards,
> > Stuti Awasthi
> > HCL Comnet Systems and Services Ltd
> > F-8/9 Basement, Sec-3,Noida.
> >
> >
> >
>

RE: MR - Input from Hbase output to HDFS

Posted by Stuti Awasthi <st...@hcl.com>.
Hi Tim,

My Job driver class looks like this :

Job job = new Job(config, "Hbase_Read_Write");
		job.setJarByClass(ReadWriteDriver.class);
		JobConf conf = new JobConf(ReadWriteDriver.class);
		
		Scan scan = new Scan();
		scan.setCaching(500);
		scan.setCacheBlocks(false);

		TableMapReduceUtil.initTableMapperJob("users", scan,
				ReadWriteMapper.class, Text.class, IntWritable.class, job);
		
		job.setOutputFormatClass(TextOutputFormat.class);
		FileOutputFormat.setOutputPath(conf, new Path("/outputReadWrite"));
		
		boolean b;
		try {
			b = job.waitForCompletion(true);
			if (!b) {
				throw new IOException("error with job!");
			}
		} catch (InterruptedException e) {
			e.printStackTrace();
		} catch (ClassNotFoundException e) {
			e.printStackTrace();
		}

But getting error :

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:120)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
	at readwrite.ReadWriteDriver.main(ReadWriteDriver.java:46)

-----Original Message-----
From: Tim Robertson [mailto:timrobertson100@gmail.com] 
Sent: Thursday, November 10, 2011 1:15 PM
To: user@hbase.apache.org
Subject: Re: MR - Input from Hbase output to HDFS

Hi Stuti,

I would have thought it was something like:
  conf.setOutputFormat(TextOutputFormat.class);
  FileOutputFormat.setOutputPath(conf, new Path(<YOUR_LOCATION>));

Cheers,
Tim




On Thu, Nov 10, 2011 at 8:31 AM, Stuti Awasthi <st...@hcl.com> wrote:
> Hi
> Currently I am understading Hbase MapReduce support. I followed http://hbase.apache.org/book/mapreduce.example.html and executed it successfully.
> But I am not sure what changes to be done to  MR which takes input from Hbase table and put output to HDFS.
>
> How to set output dir . I tried to set with JobConf but it gives me error that output directory is not set.
> Please Suggest.
>
> Regards,
> Stuti Awasthi
> HCL Comnet Systems and Services Ltd
> F-8/9 Basement, Sec-3,Noida.
>
>
>

Re: MR - Input from Hbase output to HDFS

Posted by Tim Robertson <ti...@gmail.com>.
Hi Stuti,

I would have thought it was something like:
  conf.setOutputFormat(TextOutputFormat.class);
  FileOutputFormat.setOutputPath(conf, new Path(<YOUR_LOCATION>));

Cheers,
Tim
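Tim's two lines belong to the old mapred API. For completeness, a driver built entirely on that API would look roughly like the sketch below; MyMapper and MyReducer are hypothetical placeholders, and this style cannot be combined with the new-API TableMapReduceUtil.initTableMapperJob, which expects a Job:

```java
// Hypothetical old-API (org.apache.hadoop.mapred) driver skeleton showing
// where the conf.setOutputFormat / FileOutputFormat.setOutputPath lines fit.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

public class OldApiDriver {
	public static void main(String[] args) throws Exception {
		JobConf conf = new JobConf(OldApiDriver.class);
		conf.setJobName("old_api_example");
		conf.setMapperClass(MyMapper.class);    // placeholder mapper
		conf.setReducerClass(MyReducer.class);  // placeholder reducer
		conf.setOutputFormat(TextOutputFormat.class);
		// The old API also needs an input path, which is what produces the
		// "Input directory is not set" error when omitted:
		FileInputFormat.setInputPaths(conf, new Path("/input"));
		FileOutputFormat.setOutputPath(conf, new Path("/output"));
		JobClient.runJob(conf);                 // old-API job submission
	}
}
```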




On Thu, Nov 10, 2011 at 8:31 AM, Stuti Awasthi <st...@hcl.com> wrote:
> Hi
> Currently I am understading Hbase MapReduce support. I followed http://hbase.apache.org/book/mapreduce.example.html and executed it successfully.
> But I am not sure what changes to be done to  MR which takes input from Hbase table and put output to HDFS.
>
> How to set output dir . I tried to set with JobConf but it gives me error that output directory is not set.
> Please Suggest.
>
> Regards,
> Stuti Awasthi
> HCL Comnet Systems and Services Ltd
> F-8/9 Basement, Sec-3,Noida.
>
>
>