Posted to java-user@lucene.apache.org by Tom Hirschfeld <to...@gmail.com> on 2018/05/11 02:19:00 UTC

EOF exception from ramDirectory search in spark

Hey All,
I have a fun issue I'm dealing with at the junction of Lucene and Spark.

I have an RDD[(key, iterator1, iterator2)]

I run mapPartitions on the RDD, and for each partition I create a
RAMDirectory, index all of the elements in iterator1, and then search
the index for each element in iterator2. The issue I am having is that
all of my searches on the RAMDirectory fail with an EOF exception. Here
is an example of one of the EOF exceptions:

java.lang.RuntimeException: java.io.EOFException: seek beyond EOF:
pos=69377 vs length=53924:
RAMInputStream(name=RAMInputStream(name=_1bjl_Lucene54_0.dvd)
[slice=randomaccess]), java.lang.RuntimeException: java.io.EOFException:
seek beyond EOF: pos=98833 vs length=48835:



To recap: each executor loops through its partition, creates a RAM
directory, writes to it, and then reads from it, roughly as in the
sketch below.
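
Here is a simplified sketch of the per-partition logic (Lucene 5.x API;
the String element types, the "body" field, and the query handling are
illustrative stand-ins, not my actual code):

import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.{Document, Field, TextField}
import org.apache.lucene.index.{DirectoryReader, IndexWriter, IndexWriterConfig}
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.store.RAMDirectory
import org.apache.spark.rdd.RDD

// Seq[String] stands in for iterator1/iterator2 to keep the sketch simple.
def evaluate(rdd: RDD[(String, Seq[String], Seq[String])]): RDD[(String, Long)] =
  rdd.mapPartitions { partition =>
    partition.flatMap { case (key, toIndex, toSearch) =>
      val dir = new RAMDirectory()
      val analyzer = new StandardAnalyzer()

      // Index every element of the first collection into the in-memory directory.
      val writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))
      toIndex.foreach { elem =>
        val doc = new Document()
        doc.add(new TextField("body", elem, Field.Store.YES))
        writer.addDocument(doc)
      }
      writer.close() // commit and release the writer before opening a reader

      // Search the freshly built index for every element of the second collection.
      val reader = DirectoryReader.open(dir)
      val searcher = new IndexSearcher(reader)
      val parser = new QueryParser("body", analyzer)
      val counts = toSearch.map { q =>
        (q, searcher.search(parser.parse(QueryParser.escape(q)), 10).totalHits.toLong)
      }
      reader.close()
      counts
    }
  }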


I have been trying for the past few days to address this issue, but I
have been unable to figure out what's going on. Any hint as to what
might be happening here?

Best,
Tom Hirschfeld

Re: EOF exception from ramDirectory search in spark

Posted by Tom Hirschfeld <to...@gmail.com>.
Here is an example of a full stack trace:

java.lang.RuntimeException: java.io.EOFException: seek beyond EOF:
pos=3080604 vs length=533151:
RAMInputStream(name=RAMInputStream(name=_cw4e_Lucene54_0.dvd)
[slice=randomaccess])
	at org.apache.lucene.util.packed.DirectReader$DirectPackedReader48.get(DirectReader.java:307)
	at org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$2.get(Lucene54DocValuesProducer.java:501)
	at org.apache.lucene.util.LongValues.get(LongValues.java:45)
	at com.uber.pindrop.lib.lucene.index.IndexSearcher.LuceneIndex.getNumericDocValues(LuceneIndex.java:318)
	at com.uber.pindrop.lib.lucene.index.IndexSearcher.LuceneIndex.extractPlacesFromDocs(LuceneIndex.java:403)
	at com.uber.pindrop.lib.lucene.index.IndexSearcher.LuceneIndex.search(LuceneIndex.java:185)
	at com.uber.pindrop.spark.SessionEvaluation.SessionEvaluation$$anonfun$9$$anonfun$11.apply(SessionEvaluation.scala:334)
	at com.uber.pindrop.spark.SessionEvaluation.SessionEvaluation$$anonfun$9$$anonfun$11.apply(SessionEvaluation.scala:324)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at com.uber.pindrop.spark.SessionEvaluation.SessionEvaluation$$anonfun$9.apply(SessionEvaluation.scala:324)
	at com.uber.pindrop.spark.SessionEvaluation.SessionEvaluation$$anonfun$9.apply(SessionEvaluation.scala:306)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:789)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:789)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:958)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:949)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:889)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:949)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException: seek beyond EOF: pos=3080604 vs
length=533151: RAMInputStream(name=RAMInputStream(name=_cw4e_Lucene54_0.dvd)
[slice=randomaccess])
	at org.apache.lucene.store.RAMInputStream.seek(RAMInputStream.java:109)
	at org.apache.lucene.store.RAMInputStream$1.seek(RAMInputStream.java:150)
	at org.apache.lucene.store.IndexInput$1.readLong(IndexInput.java:149)
	at org.apache.lucene.util.packed.DirectReader$DirectPackedReader48.get(DirectReader.java:305)
	... 35 more
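
The frames above correspond to reading a numeric doc value for each
search hit. In isolation, that read path looks roughly like the sketch
below (raw Lucene 5.x calls rather than our LuceneIndex wrapper; the
"placeId" field name is hypothetical, standing in for whatever
extractPlacesFromDocs actually reads):

import org.apache.lucene.index.{DirectoryReader, MultiDocValues}
import org.apache.lucene.search.{IndexSearcher, MatchAllDocsQuery}
import org.apache.lucene.store.RAMDirectory

// dir is the per-partition RAMDirectory that was just written to.
def readPlaceIds(dir: RAMDirectory): Unit = {
  val reader = DirectoryReader.open(dir)
  val searcher = new IndexSearcher(reader)
  val hits = searcher.search(new MatchAllDocsQuery(), 10).scoreDocs
  // Lucene 5.x: may return null if no document in the index has the field.
  val dv = MultiDocValues.getNumericValues(reader, "placeId")
  hits.foreach { h =>
    val value = dv.get(h.doc) // this get() is where "seek beyond EOF" is thrown
    println(s"doc=${h.doc} placeId=$value")
  }
  reader.close()
}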



On Fri, May 11, 2018 at 1:15 AM, Adrien Grand <jp...@gmail.com> wrote:

> Can you share the full stack trace?
>
> On Fri, May 11, 2018 at 04:19, Tom Hirschfeld <to...@gmail.com> wrote:
>
> > Hey All,
> > I have a fun issue I'm dealing with at the junction of Lucene and Spark.
> >
> > I have an RDD[(key, iterator1, iterator2)]
> >
> > I run mapPartitions on the RDD, and for each partition I create a
> > RAMDirectory, index all of the elements in iterator1, and then search
> > the index for each element in iterator2. The issue I am having is that
> > all of my searches on the RAMDirectory fail with an EOF exception. Here
> > is an example of one of the EOF exceptions:
> >
> > java.lang.RuntimeException: java.io.EOFException: seek beyond EOF:
> > pos=69377 vs length=53924:
> > RAMInputStream(name=RAMInputStream(name=_1bjl_Lucene54_0.dvd)
> > [slice=randomaccess]), java.lang.RuntimeException: java.io.EOFException:
> > seek beyond EOF: pos=98833 vs length=48835:
> >
> >
> >
> > To recap: each executor loops through its partition, creates a RAM
> > directory, writes to it, and then reads from it.
> >
> >
> > I have been trying for the past few days to address this issue, but I
> > have been unable to figure out what's going on. Any hint as to what
> > might be happening here?
> >
> > Best,
> > Tom Hirschfeld
> >
>

Re: EOF exception from ramDirectory search in spark

Posted by Adrien Grand <jp...@gmail.com>.
Can you share the full stack trace?

On Fri, May 11, 2018 at 04:19, Tom Hirschfeld <to...@gmail.com> wrote:

> Hey All,
> I have a fun issue I'm dealing with at the junction of Lucene and Spark.
>
> I have an RDD[(key, iterator1, iterator2)]
>
> I run mapPartitions on the RDD, and for each partition I create a
> RAMDirectory, index all of the elements in iterator1, and then search
> the index for each element in iterator2. The issue I am having is that
> all of my searches on the RAMDirectory fail with an EOF exception. Here
> is an example of one of the EOF exceptions:
>
> java.lang.RuntimeException: java.io.EOFException: seek beyond EOF:
> pos=69377 vs length=53924:
> RAMInputStream(name=RAMInputStream(name=_1bjl_Lucene54_0.dvd)
> [slice=randomaccess]), java.lang.RuntimeException: java.io.EOFException:
> seek beyond EOF: pos=98833 vs length=48835:
>
>
>
> To recap: each executor loops through its partition, creates a RAM
> directory, writes to it, and then reads from it.
>
>
> I have been trying for the past few days to address this issue, but I
> have been unable to figure out what's going on. Any hint as to what
> might be happening here?
>
> Best,
> Tom Hirschfeld
>