Posted to solr-user@lucene.apache.org by dboychuck <db...@build.com> on 2015/01/21 00:19:11 UTC

Solr DIH using JDBC with TIKA

I'm trying to index rows from a database table together with documents
located on disk, using JDBC and Tika in one DIH configuration. I can derive
each file's location from the table, and I want to use that path to import
the document into Solr as well. However, I'm having trouble with my
configuration:

<dataConfig>
  <dataSource type="JdbcDataSource"
    name="db"
    jndiName="java:comp/env/jdbc/BuildDB"
  />

  <dataSource name="data" type="BinURLDataSource" />
  <dataSource name="dataUrl" type="BinURLDataSource"/>
  <document>
    <entity
      name="productDocument"
      onError="skip"
      dataSource="db"
      query="SELECT pa.prdAttachmentID id, pa.productId, pa.manufacturer, pa.fileName, pa.attachmentType, pa.displayName,
               lower('/mnt/shares/nasdev/mediabase/specifications/' + pa.manufacturer + '/' + CAST(pm.productid_manufacturer_id AS VARCHAR(50))) basePath,
               pa.fileName
             FROM mmc.dbo.product_attachments pa WITH (NOLOCK)
               INNER JOIN mmc.dbo.productid_manufacturer pm WITH (NOLOCK)
                 ON pa.productId = pm.productid and pa.manufacturer = pm.manufacturer
             WHERE pa.productid = '3551LF'"
    >
      <field column="id" name="id"/>
      <field column="productCompositeid" name="productCompositeid"/>
      <field column="productid" name="productid"/>
      <field column="manufacturer" name="manufacturer"/>
      <field column="filename" name="filename"/>
      <field column="displayname" name="displayname"/>
      <field column="attachmentType" type="text" indexed="true" stored="true" />

      <entity name="f" processor="FileListEntityProcessor"
              baseDir="${productDocument.basePath}" fileName="${productDocument.filename}"
              dataSource="data" onError="skip">

        <entity name="extract" processor="TikaEntityProcessor"
                url="${f.fileAbsolutePath}" >
          <field column="title" meta="true" name="author"/>
          <field column="text" name="text"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>


The error is as follows:
32367126 [Thread-1180] ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: productDocument document : SolrInputDocument(fields: [id=395623, manufacturer=Delta, filename=delta_3551lf_parts_1027.pdf, displayname=Parts Breakdown, attachmentType=ExplodedParts, productid=3551LF]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: /mnt/shares/nasdev/mediabase/specifications/delta/181075/delta_3551lf_spec_1027.pdf Processing Document # 1
	at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
	at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:281)
	at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:238)
	at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:42)
	at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:112)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:477)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '/'.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:216)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1515)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:792)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:689)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:662)
	at org.apache.tomcat.dbcp.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.tomcat.dbcp.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:274)
	... 13 more



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-DIH-using-JDBC-with-TIKA-tp4180737.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr DIH using JDBC with TIKA

Posted by ANNAMANENI RAVEENDRA <a....@gmail.com>.
Yes, it can be a local path. Give it as a file URL:
file:///full path
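
As a rough sketch of how that might look in a DIH config: BinURLDataSource
opens the url value through java.net.URL, so a file: URL should resolve to a
file on the local disk. The data source name "binUrl" and the path
/path/to/document.pdf below are placeholders, not values from this thread.

```xml
<dataConfig>
  <!-- Placeholder name; BinURLDataSource hands Tika a binary stream for the URL -->
  <dataSource name="binUrl" type="BinURLDataSource" />
  <document>
    <!-- Placeholder path; replace with the full path to a real file -->
    <entity name="extract" dataSource="binUrl"
            processor="TikaEntityProcessor"
            url="file:///path/to/document.pdf" format="text">
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```

The url has to name a single file rather than a directory. To walk a
directory, a parent FileListEntityProcessor entity, like the one in the
original post, would be needed to list the files first.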


On Tue, 4 Jul 2017 at 10:25 PM, d0ct0r4r6a <ar...@gmail.com> wrote:

> For the URL param in the "extract" entity, can it be a local directory? If
> yes, how do you specify the path?

Re: Solr DIH using JDBC with TIKA

Posted by d0ct0r4r6a <ar...@gmail.com>.
For the URL param in the "extract" entity, can it be a local directory? If
yes, how do you specify the path?




Re: Solr DIH using JDBC with TIKA

Posted by dboychuck <db...@build.com>.
Got it working with the updated config:

<dataConfig>
  <dataSource type="JdbcDataSource"
    name="db"
    jndiName="java:comp/env/jdbc/BuildDB"
  />

  <!-- BinFileDataSource opens local file paths as binary streams for Tika -->
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <entity
      name="productDocument"
      onError="skip"
      dataSource="db"
      query="SELECT pa.prdAttachmentID id, pa.productId, pa.manufacturer, pa.fileName, pa.attachmentType, pa.displayName,
               lower('/mnt/shares/nasdev/mediabase/specifications/' + pa.manufacturer + '/' + CAST(pm.productid_manufacturer_id AS VARCHAR(50)) + '/' + pa.fileName) URL
             FROM mmc.dbo.product_attachments pa WITH (NOLOCK)
               INNER JOIN mmc.dbo.productid_manufacturer pm WITH (NOLOCK)
                 ON pa.productId = pm.productid and pa.manufacturer = pm.manufacturer
             WHERE pa.productid = '3551LF'"
    >
      <field column="id" name="id"/>
      <field column="productCompositeid" name="productCompositeid"/>
      <field column="productid" name="productid"/>
      <field column="manufacturer" name="manufacturer"/>
      <field column="filename" name="filename"/>
      <field column="displayname" name="displayname"/>
      <field column="attachmentType" type="text" indexed="true" stored="true" />

      <!-- The full file path now comes straight from the URL column of the SQL query -->
      <entity name="extract" dataSource="bin"
              processor="TikaEntityProcessor" url="${productDocument.URL}" format="text">
        <field column="title" meta="true" name="title"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
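
For anyone comparing the two configs: the stack trace in the first message
shows TikaEntityProcessor.nextRow calling JdbcDataSource.getData, which
suggests the nested Tika entity, having no dataSource attribute of its own,
ended up on the JDBC source and had its file path executed as SQL. Below is a
stripped-down sketch of the working arrangement. The entity name "doc", the
table "attachments" and the column "file_path" are made up for illustration.

```xml
<dataConfig>
  <dataSource name="db" type="JdbcDataSource" jndiName="java:comp/env/jdbc/BuildDB" />
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <!-- Hypothetical table and columns; the query only has to return a full file path -->
    <entity name="doc" dataSource="db"
            query="SELECT id, file_path FROM attachments">
      <field column="id" name="id"/>
      <!-- Naming dataSource here keeps url from being sent to the JDBC source as SQL -->
      <entity name="extract" dataSource="bin"
              processor="TikaEntityProcessor"
              url="${doc.file_path}" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Building the complete path in SQL, as the updated config does, also removes
the need for the intermediate FileListEntityProcessor entity.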


