Posted to common-user@hadoop.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/08/05 18:36:36 UTC
centralized record reader in new API
Hi all,
To create a RecordReader in the new API, we need a TaskAttemptContext object,
which suggests to me that a RecordReader should only be created for a split that
has already been assigned a task ID. However, I want to do centralized sampling
and create record readers on some splits before the job is submitted. What I am
doing now is creating a dummy TaskAttemptContext and using it to create the
record reader, but I am not sure whether this has any side effects. Is there a
better way to do this? And why does the new API seem to indicate that record
readers should not be created centrally?
Thanks,
-Gang