Posted to solr-user@lucene.apache.org by dboychuck <db...@build.com> on 2015/01/21 00:19:11 UTC
Solr DIH using JDBC with TIKA
I'm trying to index rows from a database table together with documents
stored on disk, using DIH with JDBC and Tika. The file locations can be
derived from the table, and I want to use them to import the documents into
Solr as well. However, I'm having trouble with my configuration:
<dataConfig>
  <dataSource type="JdbcDataSource"
              name="db"
              jndiName="java:comp/env/jdbc/BuildDB"
  />
  <dataSource name="data" type="BinURLDataSource" />
  <dataSource name="dataUrl" type="BinURLDataSource"/>
  <document>
    <entity
      name="productDocument"
      onError="skip"
      dataSource="db"
      query="SELECT pa.prdAttachmentID id, pa.productId, pa.manufacturer,
             pa.fileName, pa.attachmentType, pa.displayName,
             lower('/mnt/shares/nasdev/mediabase/specifications/' + pa.manufacturer +
             '/' + CAST(pm.productid_manufacturer_id AS VARCHAR(50))) basePath,
             pa.fileName
             FROM mmc.dbo.product_attachments pa WITH (NOLOCK)
             INNER JOIN mmc.dbo.productid_manufacturer pm WITH (NOLOCK) ON
             pa.productId = pm.productid and pa.manufacturer = pm.manufacturer
             WHERE pa.productid = '3551LF'"
    >
      <field column="id" name="id"/>
      <field column="productCompositeid" name="productCompositeid"/>
      <field column="productid" name="productid"/>
      <field column="manufacturer" name="manufacturer"/>
      <field column="filename" name="filename"/>
      <field column="displayname" name="displayname"/>
      <field column="attachmentType" type="text" indexed="true"
             stored="true" />
      <entity name="f" processor="FileListEntityProcessor"
              baseDir="${productDocument.basePath}" fileName="${productDocument.filename}"
              dataSource="data" onError="skip">
        <entity name="extract" processor="TikaEntityProcessor"
                url="${f.fileAbsolutePath}" >
          <field column="title" meta="true" name="author"/>
          <field column="text" name="text"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
The error is as follows:
32367126 [Thread-1180] ERROR org.apache.solr.handler.dataimport.DocBuilder – Exception while processing: productDocument document : SolrInputDocument(fields: [id=395623, manufacturer=Delta, filename=delta_3551lf_parts_1027.pdf, displayname=Parts Breakdown, attachmentType=ExplodedParts, productid=3551LF]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: /mnt/shares/nasdev/mediabase/specifications/delta/181075/delta_3551lf_spec_1027.pdf Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:281)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:238)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:42)
    at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:112)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:477)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '/'.
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:792)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:689)
    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:662)
    at org.apache.tomcat.dbcp.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
    at org.apache.tomcat.dbcp.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:274)
    ... 13 more
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-DIH-using-JDBC-with-TIKA-tp4180737.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr DIH using JDBC with TIKA
Posted by ANNAMANENI RAVEENDRA <a....@gmail.com>.
Yes, it can be a local directory:
file:///full path
On Tue, 4 Jul 2017 at 10:25 PM, d0ct0r4r6a <ar...@gmail.com> wrote:
> For the URL param in the "extract" entity, can it be a local directory? If
> yes, how do you specify the path?
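To make the file:/// form concrete, here is a minimal sketch of a Tika entity reading a local file through a URL-based binary data source. The data source name "binUrl" and the path /path/to/document.pdf are placeholders, not taken from this thread. Note that the url has to resolve to a single file; walking a whole directory would need an outer entity such as FileListEntityProcessor.

```xml
<dataConfig>
  <!-- BinURLDataSource opens the URL given in the entity's url attribute;
       the name "binUrl" is a placeholder chosen for this sketch -->
  <dataSource name="binUrl" type="BinURLDataSource"/>
  <document>
    <!-- Hypothetical path: replace with the absolute path of a real file -->
    <entity name="extract" dataSource="binUrl"
            processor="TikaEntityProcessor"
            url="file:///path/to/document.pdf" format="text">
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```

The working config elsewhere in this thread takes the other route: a BinFileDataSource with a plain filesystem path in url, without the file:// scheme.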
Re: Solr DIH using JDBC with TIKA
Posted by d0ct0r4r6a <ar...@gmail.com>.
For the URL param in the "extract" entity, can it be a local directory? If
yes, how do you specify the path?
Re: Solr DIH using JDBC with TIKA
Posted by dboychuck <db...@build.com>.
Got it working with the updated config:
<dataConfig>
  <!-- JDBC source for the product attachment rows -->
  <dataSource type="JdbcDataSource"
              name="db"
              jndiName="java:comp/env/jdbc/BuildDB"
  />
  <!-- Binary file source used by the Tika entity to read files from disk -->
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <entity
      name="productDocument"
      onError="skip"
      dataSource="db"
      query="SELECT pa.prdAttachmentID id, pa.productId, pa.manufacturer,
             pa.fileName, pa.attachmentType, pa.displayName,
             lower('/mnt/shares/nasdev/mediabase/specifications/' + pa.manufacturer +
             '/' + CAST(pm.productid_manufacturer_id AS VARCHAR(50)) + '/' + pa.fileName)
             URL
             FROM mmc.dbo.product_attachments pa WITH (NOLOCK)
             INNER JOIN mmc.dbo.productid_manufacturer pm WITH (NOLOCK) ON
             pa.productId = pm.productid and pa.manufacturer = pm.manufacturer
             WHERE pa.productid = '3551LF'"
    >
      <field column="id" name="id"/>
      <field column="productCompositeid" name="productCompositeid"/>
      <field column="productid" name="productid"/>
      <field column="manufacturer" name="manufacturer"/>
      <field column="filename" name="filename"/>
      <field column="displayname" name="displayname"/>
      <field column="attachmentType" type="text" indexed="true"
             stored="true" />
      <!-- The full file path now comes from the URL column of the parent query -->
      <entity name="extract" dataSource="bin"
              processor="TikaEntityProcessor" url="${productDocument.URL}" format="text">
        <field column="title" meta="true" name="title"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
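Comparing the two configs, the change that appears to matter is the explicit dataSource on the Tika entity. In the stack trace from the first post, TikaEntityProcessor.nextRow calls JdbcDataSource.getData, so the file path was apparently sent to SQL Server as a query, which would explain the "Incorrect syntax near '/'" error. A sketch of just that part, as a fragment rather than a complete config, using the names from the config above:

```xml
<!-- Fragment only: the dataSource element belongs directly under dataConfig,
     and the entity is nested inside the productDocument entity. -->

<!-- Reads local files by path; this is what the Tika entity should use -->
<dataSource name="bin" type="BinFileDataSource" />

<!-- Naming the data source explicitly keeps the entity from falling back
     to the JDBC source (the fallback is inferred from the stack trace) -->
<entity name="extract" dataSource="bin"
        processor="TikaEntityProcessor"
        url="${productDocument.URL}" format="text">
  <field column="text" name="text"/>
</entity>
```

Building the full path in SQL and exposing it as the URL column also makes the intermediate FileListEntityProcessor entity unnecessary.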