Posted to solr-user@lucene.apache.org by ecos <ec...@gmail.com> on 2017/05/02 04:48:23 UTC

IndexFormatTooNewException - MapReduceIndexerTool for PDF files

Hi, I'm getting the following error when trying to index PDF documents using
the MapReduceIndexerTool in Cloudera:

<http://lucene.472066.n3.nabble.com/file/n4332881/Screenshot_from_2017-04-28_21-39-13.png> 

The cause of the error is:
org.apache.lucene.index.IndexFormatTooNewException: Format version is not
supported (resource: BufferedChecksumIndexInput (segments_1)): 4 (needs to
be between 0 and 3).

Reading around, I found that this exception is thrown when Lucene detects an
index that is newer than the Lucene version reading it.

My configuration is:
Solr: 4.10.3
Cloudera: 5.8.0
Hadoop: 2.6.0



To index, I'm following this tutorial:
<https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_batch_index_use_mapreduce.html> 

Using the following hadoop command:
hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-D dfs.replication=1 \
--morphline-file /root/$COLLECTION/conf/pdf_morphlines.conf \
--output-dir hdfs://localhost:8020/user/$USER/outdir --verbose \
--solr-home-dir $HOME/$COLLECTION --shards 1 \
hdfs://localhost:8020/user/$USER/indir

The morphlines file:
pdf_morphlines.conf
<http://lucene.472066.n3.nabble.com/file/n4332881/pdf_morphlines.conf>  

And the schema file:
schema.xml <http://lucene.472066.n3.nabble.com/file/n4332881/schema.xml>  

Thank you.





--
View this message in context: http://lucene.472066.n3.nabble.com/IndexFormatTooNewException-MapReduceIndexerTool-for-PDF-files-tp4332881.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: IndexFormatTooNewException - MapReduceIndexerTool for PDF files

Posted by ravi432 <a....@gmail.com>.
Hi ecos,

Does the MapReduceIndexerTool produce Solr documents when you run it in
debug mode? If not, can you run it in debug mode and send any errors it
reports?
 



--
View this message in context: http://lucene.472066.n3.nabble.com/IndexFormatTooNewException-MapReduceIndexerTool-for-PDF-files-tp4332881p4332935.html

Re: IndexFormatTooNewException - MapReduceIndexerTool for PDF files

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/1/2017 10:48 PM, ecos wrote:
> The cause of the error is:
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not
> supported (resource: BufferedChecksumIndexInput (segments_1)): 4 (needs to
> be between 0 and 3).
>
> Reading out there I found the exception is thrown when Lucene detects an
> index that is newer than the Lucene version.
<snip>
> In order to index I'm following the tutorial:
> <https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_batch_index_use_mapreduce.html> 

Normally my guess would be that the Lucene/Solr version in the mapreduce
tool is at least 5.0, which would produce an index that Solr 4.10.3
cannot read.

I have located information saying that the Cloudera 5.8 version includes
Solr 4.10.3, which makes me wonder whether the Solr version that is
trying to read the index is older than 4.10.3.
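
One way to test the version-mismatch theory is to list which lucene-core
jars are actually installed on the machine. The following is a minimal
sketch, not a Cloudera-provided tool: the helper name lucene_core_versions
is hypothetical, and the directories in the example call are assumptions
about where a package or parcel install keeps its jars. It relies only on
the common convention that the version appears in the jar file name, so it
will not see Lucene classes bundled inside another jar such as the
search-mr job jar.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct lucene-core versions found
# under the given directories, based on jar file names only.
# Usage: lucene_core_versions DIR...
lucene_core_versions() {
    find "$@" -name 'lucene-core-*.jar' 2>/dev/null \
        | sed 's|.*/lucene-core-\(.*\)\.jar$|\1|' \
        | sort -u
}

# Example call; both paths are assumptions, adjust them to your install:
# lucene_core_versions /usr/lib/solr /opt/cloudera/parcels
```

If more than one version is printed, the indexing job and the Solr that
reads the index may be using different Lucene releases.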

You might need to go to Cloudera for support on this problem, because
you are running their software.

Thanks,
Shawn