Posted to dev@hive.apache.org by John Zeng <jo...@dataguise.com> on 2014/08/06 19:17:25 UTC

Key is null in map when OrcNewInputFormat is used as Input Format Class

Dear OrcNewInputFormat owner,

When I use OrcNewInputFormat as the input format class for my MapReduce job, the key passed to my map method is always null, so I have no way to get the row number there. By comparison, with RCFileInputFormat (for RC files) the key in the map method is the row number, so I know which row I am processing.

Is there any workaround for getting the row number in my map method? Of course, I can count the rows myself, but that has two problems: #1 I have to assume the rows arrive in order; #2 I will get duplicated (and wrong) row numbers if a big input file is divided into multiple file splits (each split runs my map method in a separate map task, possibly on a different data node). At this point, I am really looking for a better way to get the row number for each row processed in the map method.
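Problem #2 can be sketched without Hadoop at all. The following is a minimal, hypothetical simulation, not code from my job: each entry in `splitStarts` stands in for one map task, and the names `RowIdSketch`, `naiveIds` and `compositeIds` are made up for illustration. It shows that a counter restarted in every map task produces colliding row numbers, while pairing the counter with the split's start offset gives a unique identifier. In a real mapper the start offset would presumably come from `context.getInputSplit()`, assuming the split handed out by OrcNewInputFormat is a `FileSplit`; I have not verified that. Note that this yields a unique id per row only, not a global row number, and it does nothing about problem #1.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical, Hadoop-free simulation: each "split" stands in for one map task.
class RowIdSketch {

    // Naive approach: every map task restarts its own row counter at 0.
    static List<String> naiveIds(long[] splitStarts, int[] rowsPerSplit) {
        List<String> ids = new ArrayList<>();
        for (int s = 0; s < splitStarts.length; s++) {
            for (int row = 0; row < rowsPerSplit[s]; row++) {
                ids.add(Integer.toString(row));
            }
        }
        return ids;
    }

    // Composite approach: prefix the local counter with the split's start offset.
    static List<String> compositeIds(long[] splitStarts, int[] rowsPerSplit) {
        List<String> ids = new ArrayList<>();
        for (int s = 0; s < splitStarts.length; s++) {
            for (int row = 0; row < rowsPerSplit[s]; row++) {
                ids.add(splitStarts[s] + ":" + row);
            }
        }
        return ids;
    }

    // Number of distinct identifiers in the list.
    static int distinct(List<String> ids) {
        return new HashSet<>(ids).size();
    }

    public static void main(String[] args) {
        long[] starts = {0L, 134217728L}; // assumed byte offsets of two splits
        int[] rows = {3, 2};              // assumed row counts per split
        System.out.println(naiveIds(starts, rows));     // [0, 1, 2, 0, 1]
        System.out.println(compositeIds(starts, rows)); // [0:0, 0:1, 0:2, 134217728:0, 134217728:1]
    }
}
```

With two splits of 3 and 2 rows, the naive counter yields only 3 distinct values for 5 rows, while the composite form yields 5.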

Here is what I have in my map logs:

	[2014-08-06 09:39:25 DEBUG com.xxxx.hadoop.orcfile.OrcFileMap]: Mapper Input Key: (null)
	[2014-08-06 09:39:25 DEBUG com.xxxx.hadoop.orcfile.OrcFileMap]: Mapper Input Value: {Q81510000, T99760000, 699760000, 81567560000, 9667981610000, 978989898980000, Laura, Lauraxxx@gmail.com}

My map method is:

	protected void map(Object key, Writable value, Context context)
			throws IOException, InterruptedException {
		logger.debug("Mapper Input Key: " + key);
		logger.debug("Mapper Input Value: " + value.toString());
		.....
	}

Thanks

John

RE: Key is null in map when OrcNewInputFormat is used as Input Format Class

Posted by John Zeng <jo...@dataguise.com>.
FYI: I have created the following JIRA task:

https://issues.apache.org/jira/browse/HIVE-7853

-----Original Message-----
From: John Zeng [mailto:john.zeng@dataguise.com] 
Sent: Friday, August 8, 2014 10:33 AM
To: dev@hive.apache.org
Subject: RE: Key is null in map when OrcNewInputFormat is used as Input Format Class

Any update from anybody?  Should I file a bug?

Thanks

-----Original Message-----
From: John Zeng [mailto:john.zeng@dataguise.com] 
Sent: Wednesday, August 6, 2014 10:17 AM
To: dev@hive.apache.org
Subject: Key is null in map when OrcNewInputFormat is used as Input Format Class

RE: Key is null in map when OrcNewInputFormat is used as Input Format Class

Posted by John Zeng <jo...@dataguise.com>.
Any update from anybody?  Should I file a bug?

Thanks
