Posted to common-user@hadoop.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/08/05 18:36:36 UTC

centralized record reader in new API

Hi all,
To create a RecordReader in the new API, we need a TaskAttemptContext object,
which suggests to me that a RecordReader should only be created for a split
that has already been assigned a task ID. However, I want to do centralized
sampling and create record readers on some splits before the job is submitted.
What I am doing now is creating a dummy TaskAttemptContext and using it to
create the record reader, but I am not sure whether this has side effects. Is
there a better way to do this? And why does the new API discourage creating
record readers centrally?
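
For reference, here is a minimal sketch of the pattern I mean, in Java. To keep
it runnable without a Hadoop installation, the nested interfaces are simplified
stand-ins written for illustration, not the real org.apache.hadoop.mapreduce
classes (the real ones also carry keys, take a JobContext in getSplits, and
declare checked exceptions). ListSplit, ListInputFormat, the sample() helper and
the placeholder attempt ID string are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CentralSampling {
    // Simplified stand-ins that mirror the shape of the new API (NOT the real Hadoop types).
    interface InputSplit {}
    interface TaskAttemptContext { String getTaskAttemptId(); }
    interface RecordReader<V> extends AutoCloseable {
        void initialize(InputSplit split, TaskAttemptContext ctx);
        boolean nextKeyValue();
        V getCurrentValue();
        @Override void close();
    }
    interface InputFormat<V> {
        List<InputSplit> getSplits();
        RecordReader<V> createRecordReader(InputSplit split, TaskAttemptContext ctx);
    }

    // Hypothetical in-memory split and input format, used only to exercise the sketch.
    static final class ListSplit implements InputSplit {
        final List<String> records;
        ListSplit(String... records) { this.records = Arrays.asList(records); }
    }

    static final class ListInputFormat implements InputFormat<String> {
        private final List<InputSplit> splits;
        ListInputFormat(List<InputSplit> splits) { this.splits = splits; }
        public List<InputSplit> getSplits() { return splits; }
        public RecordReader<String> createRecordReader(InputSplit split, TaskAttemptContext ctx) {
            return new RecordReader<String>() {
                private Iterator<String> it;
                private String current;
                public void initialize(InputSplit s, TaskAttemptContext c) {
                    it = ((ListSplit) s).records.iterator();
                }
                public boolean nextKeyValue() {
                    if (!it.hasNext()) return false;
                    current = it.next();
                    return true;
                }
                public String getCurrentValue() { return current; }
                public void close() {}
            };
        }
    }

    // Centralized sampling: read up to recordsPerSplit values from the first few splits,
    // on the client side, before any task exists.
    static <V> List<V> sample(InputFormat<V> format, int splitsToSample, int recordsPerSplit) {
        // Dummy context: no task has been scheduled, so the attempt ID is a placeholder.
        TaskAttemptContext dummy = () -> "attempt_sampling_0000_m_000000_0";
        List<InputSplit> splits = format.getSplits();
        List<V> samples = new ArrayList<>();
        for (int i = 0; i < Math.min(splitsToSample, splits.size()); i++) {
            InputSplit split = splits.get(i);
            try (RecordReader<V> reader = format.createRecordReader(split, dummy)) {
                reader.initialize(split, dummy);
                for (int n = 0; n < recordsPerSplit && reader.nextKeyValue(); n++) {
                    samples.add(reader.getCurrentValue());
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<InputSplit> splits = Arrays.asList(
            new ListSplit("a", "b", "c"), new ListSplit("d", "e"), new ListSplit("f"));
        System.out.println(sample(new ListInputFormat(splits), 2, 2)); // prints [a, b, d, e]
    }
}
```

The point of the sketch is only that the same dummy context is passed to both
createRecordReader and initialize. Whether a real RecordReader reads anything
task-specific from that context is exactly the side effect I am unsure about.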

Thanks,
-Gang