Posted to solr-user@lucene.apache.org by "Shah, Nirmal" <ns...@columnit.com> on 2010/03/03 05:36:09 UTC
DIH onError question
Hi all,
I am using Solr 1.5 from trunk. I am getting the error below on a full
load, and it is causing the import to fail and roll back. I am not
concerned about the error itself, but rather that I cannot seem to tell
the indexing to continue. I have two entities, and I have tried all four
combinations of "skip" and "continue" for their onError attributes.
SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:652)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:606)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:580)
    ... 6 more
Mar 2, 2010 10:21:05 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:652)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:606)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:261)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
    at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1108)
    at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
    at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:124)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:580)
    ... 6 more
Mar 2, 2010 10:21:05 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
My data-config file:
<dataConfig>
  <dataSource name="binaryFile" type="BinFileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            transformer="RegexTransformer,TemplateTransformer" baseDir="C:\Docs"
            fileName=".*pdf" recursive="true" rootEntity="false" pk="id"
            dataSource="binaryFile" onError="skip">
      <field column="id" sourceColName="fileAbsolutePath" regex="\\"
             replaceWith="/" />
      <entity dataSource="binaryFile" name="x"
              processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
              onError="continue">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
Thanks,
Nirmal
RE: DIH onError question
Posted by "Shah, Nirmal" <ns...@columnit.com>.
Thanks for your prompt reply. I resolved the ERROR, and used "continue" to bypass any EXCEPTIONS.
Nirmal Shah
Remedy Consultant|Column Technologies|Cell: (630) 244-1648
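The thread does not say how the error was resolved. A NoClassDefFoundError for org/bouncycastle/jce/provider/BouncyCastleProvider usually means the Bouncy Castle provider jar is not visible on the classpath at runtime. As a rough sketch only, a small check like the one below can confirm whether the class is loadable before re-running the import. The class name ClasspathCheck and the helper isClassAvailable are hypothetical names made up for this illustration, and the check is only meaningful when run with the same classpath that Solr uses.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded ClasspathCheck.
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or present but not linkable (e.g. a missing dependency).
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace above.
        String name = "org.bouncycastle.jce.provider.BouncyCastleProvider";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

If this prints "available: false" when run against Solr's classpath, adding the missing jar is the likely fix; the exact jar location depends on the installation and is not stated in this thread.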
-----Original Message-----
From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.paul@gmail.com]
Sent: Tuesday, March 02, 2010 11:13 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH onError question
onError only handles Exception (not Error or Throwable). In your case
it is a NoClassDefFoundError. If it is an Error or Throwable, it is a
symptom of a larger problem. If you fix the NoClassDefFoundError, it
should be OK.
--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com
Re: DIH onError question
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
onError only handles Exception (not Error or Throwable). In your case
it is a NoClassDefFoundError. If it is an Error or Throwable, it is a
symptom of a larger problem. If you fix the NoClassDefFoundError, it
should be OK.
--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com
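
The distinction drawn in the reply above (onError reacts to Exception, not to Error) follows from the Java class hierarchy: NoClassDefFoundError extends Error, so a catch (Exception e) block never sees it. The sketch below illustrates that in plain Java. It is not the DataImportHandler source; OnErrorDemo, Row, and runWithExceptionHandler are hypothetical names invented for this example.

```java
public class OnErrorDemo {

    // Hypothetical stand-in for a unit of work that may fail, such as reading one row.
    interface Row {
        void process() throws Exception;
    }

    // Mimics a handler that, like onError as described above, reacts only to Exception.
    static String runWithExceptionHandler(Row row) {
        try {
            row.process();
            return "ok";
        } catch (Exception e) {
            // Recoverable failure: a skip/continue policy could be applied here.
            return "handled: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // An ordinary Exception is caught by the handler.
        System.out.println(runWithExceptionHandler(() -> {
            throw new java.io.IOException("bad file");
        }));

        // NoClassDefFoundError is an Error, so it passes straight through catch (Exception).
        try {
            runWithExceptionHandler(() -> {
                throw new NoClassDefFoundError("org/bouncycastle/jce/provider/BouncyCastleProvider");
            });
        } catch (Error e) {
            System.out.println("escaped: " + e.getClass().getSimpleName());
        }
    }
}
```

Running it prints "handled: IOException" followed by "escaped: NoClassDefFoundError", which matches the behaviour reported in the original question: no onError setting changes the outcome while the underlying Error remains.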