Posted to dev@hive.apache.org by "Mattar, Marwan" <mm...@ea.com> on 2014/07/31 16:59:21 UTC

column-based mapreduce

Hi,

I'm new to Hive and was wondering if anyone knows why nextColumnsBatch() in RCFile.Reader has been deprecated. I'd like to write a MapReduce job in which the mapper receives an entire column at a time (more specifically, the key is the column ID and the value is the set of values of that column for the current row batch). Writing a column-reader class (instead of using the existing record reader, RCFileMapReduceRecordReader) that uses getColumn() and nextColumnsBatch() seems like the natural choice. Any pointers are appreciated, particularly if there is an existing class or utility that achieves this.
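
To make the key/value shape concrete, here is a minimal sketch in plain Java of what the mapper would ideally receive for one row batch. It deliberately uses no Hive or Hadoop classes: ColumnBatchSketch and toColumnPairs are hypothetical names made up for illustration, and the row batch is modelled as a simple list of string rows rather than whatever RCFile.Reader actually returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: models the desired mapper input, not the RCFile API.
public class ColumnBatchSketch {

    // Hypothetical helper: pivots one row batch into (column ID -> column values).
    // Each entry of the returned map corresponds to one (key, value) pair
    // that the mapper would be handed.
    public static Map<Integer, List<String>> toColumnPairs(List<List<String>> rowBatch) {
        Map<Integer, List<String>> columns = new LinkedHashMap<>();
        for (List<String> row : rowBatch) {
            for (int columnId = 0; columnId < row.size(); columnId++) {
                columns.computeIfAbsent(columnId, id -> new ArrayList<>())
                       .add(row.get(columnId));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // A made-up row batch with two rows and three columns.
        List<List<String>> rowBatch = Arrays.asList(
            Arrays.asList("1", "alice", "3.5"),
            Arrays.asList("2", "bob", "4.0"));

        // Print one line per column: key = column ID, value = that column's values.
        toColumnPairs(rowBatch).forEach((columnId, values) ->
            System.out.println(columnId + " -> " + values));
    }
}
```

In a real column reader the pivot would not be needed, since RCFile already stores each row group column by column; the sketch only shows the pairs I would like the mapper to see.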

Thanks,
Marwan