Posted to common-user@hadoop.apache.org by Saptarshi Guha <sa...@gmail.com> on 2009/03/02 05:07:09 UTC

RecordReader and non thread safe JNI libraries

Hello,
My RecordReader subclass reads from an object X. To parse this object and
emit records, I need to use a C library through a JNI wrapper.

	public boolean next(LongWritable key, BytesWritable value) throws IOException {
	    if (leftover == 0) return false;
	    long wi = pos + split.getStart();
	    key.set(wi);
	    value.readFields(X.at(wi));
	    pos++; leftover--;
	    return true;
	}

X.at uses the JNI library to read record number wi.

My question is: who runs this?
1) For a given job, does one instance of this run on each
tasktracker, reading records and feeding them to the mappers on that
machine?
Or,
2) as I have mapred.tasktracker.map.tasks.maximum == 7, does each JVM
that is launched have one RecordReader feeding records to the maps that
JVM is running?

If it's either (1) or (2), I guess I'm safe from threading issues.

Please correct me if I'm totally wrong.
Regards

Saptarshi Guha

Re: RecordReader and non thread safe JNI libraries

Posted by Aaron Kimball <aa...@cloudera.com>.
It's situation (2). Each map task gets its own JVM instance, which has its
own RecordReader and its own Mapper implementation. There's basically a loop
in each task JVM that says:

// next() fills k and v and returns false once the split is exhausted
while (recordReader.next(k, v)) {
  myMapper.map(k, v, output, reporter);
}
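
The loop above can be mimicked without Hadoop to check the ordering claim. This is only a sketch: SimpleReader, SimpleMapper and runTask are hypothetical stand-ins, not Hadoop's real RecordReader, Mapper or task runner. It logs which thread performs each read and each map call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for Hadoop's reader/mapper pair; not the real interfaces. */
final class TaskLoopSketch {

    /** Simplified reader: fills the holders and returns false when the split is exhausted. */
    interface SimpleReader {
        boolean next(long[] key, StringBuilder value);
    }

    /** Simplified mapper. */
    interface SimpleMapper {
        void map(long key, String value);
    }

    /** Drives reader and mapper from the calling thread, strictly alternating. */
    static void runTask(SimpleReader reader, SimpleMapper mapper) {
        long[] key = new long[1];
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) {
            mapper.map(key[0], value.toString());
        }
    }

    /** Runs a three-record task and returns a log of who ran on which thread. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        final int[] leftover = {3};   // records remaining in the pretend split
        final long[] pos = {0};       // index of the next record
        SimpleReader reader = (k, v) -> {
            if (leftover[0] == 0) return false;
            k[0] = pos[0]++;
            v.setLength(0);
            v.append("record-").append(k[0]);
            leftover[0]--;
            log.add("read@" + Thread.currentThread().getName());
            return true;
        };
        SimpleMapper mapper =
            (k, v) -> log.add("map@" + Thread.currentThread().getName());
        runTask(reader, mapper);
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Every entry in the log carries the same thread name, and reads and maps alternate one for one. Under this simplified model the reader and the mapper never execute at the same moment.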

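Since the reader and the mapper take turns in that one loop, they never call
into the library at the same time. But if your mapper and the RR use the same
library and tread on one another's state, you're going to have undefined
results.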
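
If threads ever do enter the picture (for example, a multithreaded map runner), one defensive option is to funnel every call into the library through a single lock. The sketch below assumes a hypothetical NativeParser interface standing in for the JNI wrapper; the real wrapper's methods and signatures are not known from this thread. Note that a lock only serializes calls. It does not stop two callers from overwriting state that the library keeps between calls.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class SerializedNativeAccess {

    /** Hypothetical stand-in for the JNI wrapper; the real one would declare native methods. */
    interface NativeParser {
        byte[] at(long recordIndex);
    }

    /** Wraps a parser so that at most one thread is inside the library at a time. */
    static final class LockedParser implements NativeParser {
        // One lock shared by all wrappers loaded in this JVM.
        private static final ReentrantLock LIBRARY_LOCK = new ReentrantLock();
        private final NativeParser delegate;

        LockedParser(NativeParser delegate) {
            this.delegate = delegate;
        }

        @Override
        public byte[] at(long recordIndex) {
            LIBRARY_LOCK.lock();
            try {
                return delegate.at(recordIndex);
            } finally {
                LIBRARY_LOCK.unlock();
            }
        }
    }

    /** Hammers a fake parser from several threads; returns the peak number of concurrent callers. */
    static int maxConcurrentCallers(int threads, int callsPerThread) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        NativeParser fake = idx -> {
            int now = inside.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            Thread.yield();                 // widen the window for overlap
            inside.decrementAndGet();
            return new byte[] {(byte) idx};
        };
        NativeParser safe = new LockedParser(fake);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) safe.at(i);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent callers: " + maxConcurrentCallers(4, 200));
    }
}
```

With the lock in place, the fake parser never sees more than one caller at once. In the single-threaded task loop described above the lock is uncontended, so its cost should be small.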

- Aaron


On Sun, Mar 1, 2009 at 8:33 PM, Saptarshi Guha <sa...@gmail.com> wrote:

> Hello,
> I am quite confused, and my email seems to prove it. My question is
> essentially this: I need to use this non-thread-safe library in the Mapper,
> Reducer and RecordReader. Assume I do not create threads.
> Will I run into any thread safety issues?
>
> In a given JVM, the maps will run sequentially, and so will the reduces,
> but will the maps run alongside the RecordReader?
>
> Hope this is clearer.
> Regards
>
>
> Saptarshi Guha

Re: RecordReader and non thread safe JNI libraries

Posted by Saptarshi Guha <sa...@gmail.com>.
Hello,
I am quite confused, and my email seems to prove it. My question is
essentially this: I need to use this non-thread-safe library in the Mapper,
Reducer and RecordReader. Assume I do not create threads.
Will I run into any thread safety issues?

In a given JVM, the maps will run sequentially, and so will the reduces,
but will the maps run alongside the RecordReader?

Hope this is clearer.
Regards


Saptarshi Guha


