Posted to solr-user@lucene.apache.org by "Shah, Nirmal" <ns...@columnit.com> on 2010/03/03 05:36:09 UTC

DIH onError question

Hi all,

I am using Solr 1.5 from trunk. I am getting the error below on a full
import, and it causes the import to fail and roll back. I am not
concerned about the error itself; what concerns me is that I cannot
seem to tell the indexing to continue. I have two entities, and I have
tried all four combinations of "skip" and "continue" for their onError
attributes.
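To make the difference between the onError values concrete, here is a simplified Java model. It is a hypothetical sketch, not DIH's DocBuilder code: the class OnErrorModel, the RowSource interface and the "id:text" document strings are made up for illustration. It assumes the documented meanings of the attribute (abort fails the whole import, skip drops the current document, continue carries on as if the error had not happened). Like onError, it only intercepts Exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of DIH's onError="abort|skip|continue"
// policies, for illustration only; this is NOT Solr's DocBuilder code.
public class OnErrorModel {
    enum OnError { ABORT, SKIP, CONTINUE }

    // Stand-in for an entity processor: returns the extracted text for row i.
    interface RowSource { String textFor(int i) throws Exception; }

    // Returns one "id:text" entry per document that would be indexed.
    static List<String> importRows(int n, RowSource src, OnError policy) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String text;
            try {
                text = src.textFor(i);
            } catch (Exception e) {  // only Exception: an Error skips this block entirely
                if (policy == OnError.ABORT) {
                    throw new RuntimeException("import aborted at row " + i, e);
                }
                if (policy == OnError.SKIP) {
                    continue;        // drop this document
                }
                text = "";           // CONTINUE: index the document without this field
            }
            docs.add("doc" + i + ":" + text);
        }
        return docs;
    }
}
```

With a RowSource that throws an IOException on row 1, SKIP yields [doc0:t0, doc2:t2] and CONTINUE yields [doc0:t0, doc1:, doc2:t2]. A NoClassDefFoundError thrown from the source passes straight through under every policy, which matches the behaviour described in this thread.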

SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:652)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:606)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
	at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
	at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
	at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
	at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:124)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:580)
	... 6 more
Mar 2, 2010 10:21:05 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:652)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:606)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
	at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
	at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
	at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
	at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:124)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:580)
	... 6 more
Mar 2, 2010 10:21:05 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
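The "Caused by" frames show PDFBox's PDDocument.decrypt calling openProtection, which needs org.bouncycastle.jce.provider.BouncyCastleProvider. That suggests the PDF being parsed is encrypted and the BouncyCastle provider jar is not on Solr's classpath. A minimal probe, assuming you can run it with the same classpath Solr uses (ClasspathProbe is a hypothetical helper name, not part of Solr):

```java
// Hypothetical diagnostic: checks whether a class can be loaded from the
// current classpath. Run it with the same jars Solr is started with.
public class ClasspathProbe {
    // Returns true if the named class is loadable; 'false' skips static init.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String bc = "org.bouncycastle.jce.provider.BouncyCastleProvider";
        System.out.println(bc + (isPresent(bc) ? " found" : " MISSING"));
    }
}
```

If it prints MISSING, adding a BouncyCastle provider jar compatible with your PDFBox version to Solr's classpath is the likely fix for the NoClassDefFoundError.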


My data-config file:
<dataConfig>
  <dataSource name="binaryFile" type="BinFileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            transformer="RegexTransformer,TemplateTransformer"
            baseDir="C:\Docs" fileName=".*pdf" recursive="true"
            rootEntity="false" pk="id" dataSource="binaryFile"
            onError="skip">
      <field column="id" sourceColName="fileAbsolutePath" regex="\\"
             replaceWith="/" />
      <entity dataSource="binaryFile" name="x"
              processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
              onError="continue">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
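For reference, the id field's regex="\\" replaceWith="/" turns Windows paths into forward-slash ids. Inside an XML attribute the backslashes are not escape characters, so the regex is the two characters \\, which matches one literal backslash. The sketch below assumes RegexTransformer behaves like a Java replaceAll with that pattern; IdFromPath and the sample path are hypothetical.

```java
import java.util.regex.Pattern;

// Rough equivalent of regex="\\" replaceWith="/" applied to fileAbsolutePath,
// assuming a plain Java regex replace-all.
public class IdFromPath {
    // The regex \\ (one escaped backslash) is written "\\\\" in a Java literal.
    private static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static String toId(String fileAbsolutePath) {
        return BACKSLASH.matcher(fileAbsolutePath).replaceAll("/");
    }

    public static void main(String[] args) {
        // Hypothetical sample path under baseDir="C:\Docs"
        System.out.println(toId("C:\\Docs\\reports\\q1.pdf")); // prints C:/Docs/reports/q1.pdf
    }
}
```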


Thanks,
Nirmal

RE: DIH onError question

Posted by "Shah, Nirmal" <ns...@columnit.com>.
Thanks for your prompt reply. I resolved the Error (the NoClassDefFoundError) and used onError="continue" to bypass any Exceptions.

Nirmal Shah
Remedy Consultant|Column Technologies|Cell: (630) 244-1648


-----Original Message-----
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com] 
Sent: Tuesday, March 02, 2010 11:13 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH onError question

onError only handles Exception (not Error or Throwable). In your case
it is a NoClassDefFoundError. If it is an Error or Throwable, it is a
symptom of a larger problem. If you fix the NoClassDefFoundError, it
should be OK.

On Wed, Mar 3, 2010 at 10:06 AM, Shah, Nirmal <ns...@columnit.com> wrote:



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com

Re: DIH onError question

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
onError only handles Exception (not Error or Throwable). In your case
it is a NoClassDefFoundError. If it is an Error or Throwable, it is a
symptom of a larger problem. If you fix the NoClassDefFoundError, it
should be OK.
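The distinction comes from Java's class hierarchy: NoClassDefFoundError extends LinkageError, which extends Error, so a handler written as catch (Exception e) never sees it. A standalone sketch (ErrorVsException is a made-up name, not DIH code):

```java
import java.io.IOException;

// Shows which catch clause a given Throwable lands in.
public class ErrorVsException {
    static String classify(Throwable t) {
        try {
            throw t;
        } catch (Exception e) {        // the kind of handler onError applies to
            return "caught as Exception";
        } catch (Error e) {            // NoClassDefFoundError ends up here
            return "caught as Error";
        } catch (Throwable e) {        // only custom Throwable subclasses reach this
            return "caught as Throwable";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new IOException("unreadable file")));      // caught as Exception
        System.out.println(classify(new NoClassDefFoundError(
                "org/bouncycastle/jce/provider/BouncyCastleProvider")));        // caught as Error
        System.out.println(NoClassDefFoundError.class.getSuperclass().getName()); // java.lang.LinkageError
        System.out.println(Exception.class.isAssignableFrom(NoClassDefFoundError.class)); // false
    }
}
```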
