Posted to user@hbase.apache.org by Tom Nichols <tm...@gmail.com> on 2009/02/10 17:17:38 UTC
help w/ table mapreduce job
Hi,
I'm trying to write a M/R job to do the following:
- Scan a given table and collect unique column names
- Write those column names to another table with the source table name
as the row key, and columns in the format (Cols:<sourceColName>,
<sourceColName>)
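(For reference, the dedup step described above can be sketched independently of the HBase API. Class and data names below are hypothetical, not from the attached code: the idea is that the mapper emits (tableName, columnName) pairs for every cell it scans, and the reducer collapses them to the unique set that gets written to the metadata table.)

```java
import java.util.*;

// HBase-independent sketch of the unique-column-name collection step.
// Input cells are hypothetical {rowKey, columnName} pairs as a mapper
// might see them while scanning the source table.
public class UniqueColumns {
    static Set<String> collectUnique(List<String[]> cells) {
        // TreeSet keeps names unique and sorted, mirroring what a
        // reducer sees after the shuffle groups duplicate keys.
        Set<String> names = new TreeSet<>();
        for (String[] cell : cells) {
            names.add(cell[1]); // cell = {rowKey, columnName}
        }
        return names;
    }

    public static void main(String[] args) {
        List<String[]> cells = Arrays.asList(
            new String[] {"row1", "LMP:price"},
            new String[] {"row1", "LMP:node"},
            new String[] {"row2", "LMP:price"});
        System.out.println(collectUnique(cells)); // [LMP:node, LMP:price]
    }
}
```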
I'm not sure what I'm doing wrong. It appears that the mapper is
configured and then immediately closed w/o scanning any rows. I've
verified there is data in my source table. I don't see the reduce
task run at all. Attached is the code and below is the log output
from running the task. Thanks in advance for any pointers.
2009-02-10 11:14:55,087 INFO (main) JvmMetrics - Initializing JVM
Metrics with processName=JobTracker, sessionId=
2009-02-10 11:14:55,151 WARN (main) JobClient - No job jar file set.
User classes may not be found. See JobConf(Class) or
JobConf#setJar(String).
2009-02-10 11:14:55,610 INFO (main) TableInputFormatBase - split:
0->localhost:,
2009-02-10 11:14:55,840 INFO (main) JobClient - Running job: job_local_0001
2009-02-10 11:14:55,890 INFO (Thread-11) TableInputFormatBase -
split: 0->localhost:,
2009-02-10 11:14:55,931 INFO (Thread-11) MapTask - numReduceTasks: 1
2009-02-10 11:14:55,957 INFO (Thread-11) MapTask - io.sort.mb = 100
2009-02-10 11:14:56,239 INFO (Thread-11) MapTask - data buffer =
79691776/99614720
2009-02-10 11:14:56,239 INFO (Thread-11) MapTask - record buffer =
262144/327680
2009-02-10 11:14:56,863 INFO (main) JobClient - map 0% reduce 0%
2009-02-10 11:14:59,558 INFO (Thread-11)
MetadataMapReduceJob$MetadatMapper - Mapper for table
DayAheadHourlyLMP configured
2009-02-10 11:14:59,561 INFO (Thread-11)
MetadataMapReduceJob$MetadatMapper - Mapper for table
DayAheadHourlyLMP closed
2009-02-10 11:14:59,561 INFO (Thread-11) MapTask - Starting flush of
map output
2009-02-10 11:14:59,612 INFO (Thread-11) MapTask - Index: (0, 2, 6)
2009-02-10 11:14:59,613 INFO (Thread-11) TaskRunner -
Task:attempt_local_0001_m_000000_0 is done. And is in the process of
commiting
2009-02-10 11:14:59,616 INFO (Thread-11) LocalJobRunner -
2009-02-10 11:14:59,616 INFO (Thread-11) TaskRunner - Task
'attempt_local_0001_m_000000_0' done.
2009-02-10 11:14:59,648 INFO (Thread-11)
MetadataMapReduceJob$MetadataReducer - Metadata Reducer is configured
2009-02-10 11:14:59,663 INFO (Thread-11) Merger - Merging 1 sorted segments
2009-02-10 11:14:59,666 INFO (Thread-11) Merger - Down to the last
merge-pass, with 0 segments left of total size: 0 bytes
2009-02-10 11:14:59,708 INFO (Thread-11) TaskRunner -
Task:attempt_local_0001_r_000000_0 is done. And is in the process of
commiting
2009-02-10 11:14:59,709 INFO (Thread-11) LocalJobRunner - reduce > reduce
2009-02-10 11:14:59,710 INFO (Thread-11) TaskRunner - Task
'attempt_local_0001_r_000000_0' done.
2009-02-10 11:14:59,866 INFO (main) JobClient - Job complete: job_local_0001
2009-02-10 11:14:59,867 INFO (main) JobClient - Counters: 11
2009-02-10 11:14:59,867 INFO (main) JobClient - File Systems
2009-02-10 11:14:59,868 INFO (main) JobClient - Local bytes read=38422
2009-02-10 11:14:59,868 INFO (main) JobClient - Local bytes written=77300
2009-02-10 11:14:59,868 INFO (main) JobClient - Map-Reduce Framework
2009-02-10 11:14:59,868 INFO (main) JobClient - Reduce input groups=0
2009-02-10 11:14:59,868 INFO (main) JobClient - Combine output records=0
2009-02-10 11:14:59,868 INFO (main) JobClient - Map input records=0
2009-02-10 11:14:59,868 INFO (main) JobClient - Reduce output records=0
2009-02-10 11:14:59,869 INFO (main) JobClient - Map output bytes=0
2009-02-10 11:14:59,869 INFO (main) JobClient - Map input bytes=0
2009-02-10 11:14:59,869 INFO (main) JobClient - Combine input records=0
2009-02-10 11:14:59,869 INFO (main) JobClient - Map output records=0
2009-02-10 11:14:59,869 INFO (main) JobClient - Reduce input records=0
Re: help w/ table mapreduce job
Posted by Tom Nichols <tm...@gmail.com>.
I think I figured it out. It was actually the column filter I was
using ("LMP:*" instead of "LMP:"), so it was passing 0 rows to the
Mapper. There were a couple of other errors as well, but I worked through
them fairly quickly after that. Thanks!
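(For anyone hitting the same thing: a column specification ending in a bare ":" selects every column in that family, while anything after the colon is taken as the qualifier itself, so "LMP:*" ends up looking for a column literally named "*" and matches nothing. The sketch below is my rough approximation of that matching rule, consistent with the zero-row behavior in this thread; it is illustrative, not the actual HBase source.)

```java
// Rough approximation of HBase-era column-spec matching that would
// explain the observed behavior. Not the real HBase implementation.
public class ColumnSpec {
    // "LMP:" matches the whole family; "LMP:price" matches only that
    // exact column; "LMP:*" would match a column literally named "*",
    // which normally does not exist -- hence zero rows reached the map.
    static boolean matches(String spec, String column) {
        int i = spec.indexOf(':');
        String family = spec.substring(0, i + 1);  // e.g. "LMP:"
        String qualifier = spec.substring(i + 1);  // "", "price", or "*"
        if (!column.startsWith(family)) return false;
        return qualifier.isEmpty() || column.equals(family + qualifier);
    }

    public static void main(String[] args) {
        System.out.println(matches("LMP:", "LMP:price"));  // true
        System.out.println(matches("LMP:*", "LMP:price")); // false
    }
}
```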
2009/2/11 stack <st...@duboce.net>:
> There are a lot of zeros in your job report, Tom, and your log.trace doesn't
> seem to be emitting records that made it into the map. Why do you use
> log.trace? Is that available in log4j -- or are you using a different
> logger? Could you add logging to TableInputFormat? To the split function and
> to its gets from the table?
>
> St.Ack
>
> On Tue, Feb 10, 2009 at 8:17 AM, Tom Nichols <tm...@gmail.com> wrote:
>
>> [original message and log output quoted in full; trimmed -- see the first post above]
>
Re: help w/ table mapreduce job
Posted by stack <st...@duboce.net>.
There are a lot of zeros in your job report, Tom, and your log.trace doesn't
seem to be emitting records that made it into the map. Why do you use
log.trace? Is that available in log4j -- or are you using a different
logger? Could you add logging to TableInputFormat? To the split function and
to its gets from the table?
St.Ack
On Tue, Feb 10, 2009 at 8:17 AM, Tom Nichols <tm...@gmail.com> wrote:
> [original message and log output quoted in full; trimmed -- see the first post above]