Posted to solr-user@lucene.apache.org by Jorg Heymans <jo...@gmail.com> on 2010/02/04 16:57:35 UTC

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

Hi,

I'm having some trouble getting this to work on a snapshot from 3 Feb. My
config looks as follows:

    <dataSource name="ora" driver="oracle.jdbc.OracleDriver" url="...." />
    <datasource name="orablob" type="FieldStreamDataSource" />
    <document name="mydoc">
        <entity dataSource="ora" name="meta"
                query="select id, filename, bytes from documents">
            <field column="ID" name="id" />
            <field column="FILENAME" name="filename" />
            <entity dataSource="orablob" processor="TikaEntityProcessor"
                    url="bytes" dataField="meta.BYTES">
                <field column="text" name="mainDocument" />
            </entity>
        </entity>
    </document>

and I get this stack trace:

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: bytes Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)

It seems that whatever is in the url attribute is being executed as a
query. So I tried url="select bytes from documents where id =
${meta.ID}", but then I get a ClassCastException:

Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1
        at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)

Any ideas what is wrong with the config?

Thanks
Jorg

2010/1/27 Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>

> There is no corresponding DataSource that can be used with
> TikaEntityProcessor to read from a BLOB.
> I have opened an issue: https://issues.apache.org/jira/browse/SOLR-1737
>
> On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal <ns...@columnit.com> wrote:
> > Hi,
> >
> > I am fairly new to Solr and would like to use the DIH to pull rich text
> > files (PDFs, etc.) from BLOB fields in my database.
> >
> > There was a suggestion made to use the FieldReaderDataSource with the
> > recently committed TikaEntityProcessor.  Has anyone accomplished this?
> >
> > This is my configuration, and the resulting error - I'm not sure if I'm
> > using the FieldReaderDataSource correctly.  If anyone could shed light
> > on whether I am going in the right direction or not, it would be
> > appreciated.
> >
> > ---------------Data-config.xml:
> >
> > <dataConfig>
> >   <datasource name="f1" type="FieldReaderDataSource" />
> >   <dataSource name="orcle" driver="oracle.jdbc.driver.OracleDriver"
> >               url="jdbc:oracle:thin:un/pw@host:1521:sid" />
> >   <document>
> >     <entity dataSource="orcle" name="attach"
> >             query="select id as name, attachment from testtable2">
> >       <entity dataSource="f1" processor="TikaEntityProcessor"
> >               dataField="attach.attachment" format="text">
> >         <field column="text" name="NAME" />
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > -------------Debug error:
> >
> > <response>
> > <lst name="responseHeader">
> > <int name="status">0</int>
> > <int name="QTime">203</int>
> > </lst>
> > <lst name="initArgs">
> > <lst name="defaults">
> > <str name="config">testdb-data-config.xml</str>
> > </lst>
> > </lst>
> > <str name="command">full-import</str>
> > <str name="mode">debug</str>
> > <null name="documents"/>
> > <lst name="verbose-output">
> > <lst name="entity:attach">
> > <lst name="document#1">
> > <str name="query">select id as name, attachment from testtable2</str>
> > <str name="time-taken">0:0:0.32</str>
> > <str>----------- row #1-------------</str>
> > <str name="NAME">java.math.BigDecimal:2</str>
> > <str name="ATTACHMENT">oracle.sql.BLOB:oracle.sql.BLOB@1c8e807</str>
> > <str>---------------------------------------------</str>
> > <lst name="entity:253433571801723">
> > <str name="EXCEPTION">
> > org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1
> >         at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:279)
> >         at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:93)
> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:97)
> >         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
> >         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
> >         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
> >         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
> >         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
> >         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
> >         at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
> >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> >         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> >         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> >         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> >         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> >         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> >         at org.mortbay.jetty.Server.handle(Server.java:285)
> >         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> >         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
> >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
> >         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> >         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> >         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> >         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> >
> > Thanks,
> >
> > Nirmal
> >
> >
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Systems Architect| AOL | http://aol.com
>

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

Posted by Jorg Heymans <jo...@gmail.com>.
there is one now :)

https://issues.apache.org/jira/browse/SOLR-1758

Cheers,
Jorg

2010/2/5 Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>

> unfortunately, no

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
unfortunately, no

On Fri, Feb 5, 2010 at 2:23 PM, Jorg Heymans <jo...@gmail.com> wrote:
> dow, thanks for that Paul :-|
>
> I suppose schema validation for data-config.xml is already in Jira somewhere?
>
> Jorg



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

Posted by Jorg Heymans <jo...@gmail.com>.
dow, thanks for that Paul :-|

I suppose schema validation for data-config.xml is already in Jira somewhere?

Jorg

2010/2/5 Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>

> wrong   <datasource name="orablob" type="FieldStreamDataSource" />
> right     <dataSource name="orablob" type="FieldStreamDataSource" />
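
For reference, here is a sketch of the config from the original post with only the element name corrected. XML element names are case-sensitive, and going by the fix above DIH expects `dataSource`. This is untested. The `<dataConfig>` wrapper is an assumption (the posted snippet did not show it), and the elided JDBC url and the `url="bytes"` attribute are carried over exactly as posted.

```xml
<dataConfig>
    <dataSource name="ora" driver="oracle.jdbc.OracleDriver" url="...." />
    <!-- Was <datasource ...> in the original post; only the capital S changed. -->
    <dataSource name="orablob" type="FieldStreamDataSource" />
    <document name="mydoc">
        <entity dataSource="ora" name="meta"
                query="select id, filename, bytes from documents">
            <field column="ID" name="id" />
            <field column="FILENAME" name="filename" />
            <!-- dataField points at the BLOB column of the parent entity row. -->
            <!-- url is kept from the post; this thread does not confirm it is needed. -->
            <entity dataSource="orablob" processor="TikaEntityProcessor"
                    url="bytes" dataField="meta.BYTES">
                <field column="text" name="mainDocument" />
            </entity>
        </entity>
    </document>
</dataConfig>
```

The `<datasource name="f1" ...>` line in the first config posted in this thread has the same lowercase spelling.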
>
> On Thu, Feb 4, 2010 at 9:27 PM, Jorg Heymans <jo...@gmail.com>
> wrote:
> > Hi,
> > I'm having some troubles getting this to work on a snapshot from 3rd feb
>  My
> > config looks as follows
> >     <dataSource name="ora" driver="oracle.jdbc.OracleDriver" url="...."
> />
> >     <datasource name="orablob" type="FieldStreamDataSource" />
> >     <document name="mydoc">
> >         <entity dataSource="ora" name="meta" query="select id, filename,
> > bytes from documents" >
> >             <field column="ID" name="id" />
> >             <field column="FILENAME" name="filename" />
> >             <entity dataSource="orablob" processor="TikaEntityProcessor"
> > url="bytes" dataField="meta.BYTES">
> >               <field column="text" name="mainDocument"/>
> >             </entity>
> >          </entity>
> >      </document>
> > and i get this stacktrace
> > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> > execute query: bytes Processing Document # 1
> >         at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
> >         at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
> >         at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
> >         at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
> > It seems that whatever is in the url attribute it is trying to execute as a
> > query. So i thought i put url="select bytes from documents where id =
> > ${meta.ID}" but then i get a classcastexception.
> > Caused by: java.lang.ClassCastException:
> > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1
> >         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98)
> >         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233)
> > Any ideas what is wrong with the config ?
> > Thanks
> > Jorg
> > 2010/1/27 Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>
> >>
> >> There is no corresponding DataSource which can be used with
> >> TikaEntityProcessor which reads from BLOB.
> >> I have opened an issue: https://issues.apache.org/jira/browse/SOLR-1737
> >>
> >> On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal <ns...@columnit.com> wrote:
> >> > Hi,
> >> >
> >> >
> >> >
> >> > I am fairly new to Solr and would like to use the DIH to pull rich text
> >> > files (pdfs, etc) from BLOB fields in my database.
> >> >
> >> >
> >> >
> >> > There was a suggestion made to use the FieldReaderDataSource with the
> >> > recently committed TikaEntityProcessor.  Has anyone accomplished this?
> >> >
> >> > This is my configuration, and the resulting error - I'm not sure if I'm
> >> > using the FieldReaderDataSource correctly.  If anyone could shed light
> >> > on whether I am going in the right direction or not, it would be
> >> > appreciated.
> >> >
> >> >
> >> >
> >> > ---------------Data-config.xml:
> >> >
> >> > <dataConfig>
> >> >
> >> >   <datasource name="f1" type="FieldReaderDataSource" />
> >> >
> >> >   <dataSource name="orcle" driver="oracle.jdbc.driver.OracleDriver"
> >> > url="jdbc:oracle:thin:un/pw@host:1521:sid" />
> >> >
> >> >      <document>
> >> >
> >> >      <entity dataSource="orcle" name="attach" query="select id as name,
> >> > attachment from testtable2">
> >> >
> >> >         <entity dataSource="f1" processor="TikaEntityProcessor"
> >> > dataField="attach.attachment" format="text">
> >> >
> >> >            <field column="text" name="NAME" />
> >> >
> >> >         </entity>
> >> >
> >> >      </entity>
> >> >
> >> >   </document>
> >> >
> >> > </dataConfig>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > -------------Debug error:
> >> >
> >> > <response>
> >> >
> >> > <lst name="responseHeader">
> >> >
> >> > <int name="status">0</int>
> >> >
> >> > <int name="QTime">203</int>
> >> >
> >> > </lst>
> >> >
> >> > <lst name="initArgs">
> >> >
> >> > <lst name="defaults">
> >> >
> >> > <str name="config">testdb-data-config.xml</str>
> >> >
> >> > </lst>
> >> >
> >> > </lst>
> >> >
> >> > <str name="command">full-import</str>
> >> >
> >> > <str name="mode">debug</str>
> >> >
> >> > <null name="documents"/>
> >> >
> >> > <lst name="verbose-output">
> >> >
> >> > <lst name="entity:attach">
> >> >
> >> > <lst name="document#1">
> >> >
> >> > <str name="query">select id as name, attachment from testtable2</str>
> >> >
> >> > <str name="time-taken">0:0:0.32</str>
> >> >
> >> > <str>----------- row #1-------------</str>
> >> >
> >> > <str name="NAME">java.math.BigDecimal:2</str>
> >> >
> >> > <str name="ATTACHMENT">oracle.sql.BLOB:oracle.sql.BLOB@1c8e807</str>
> >> >
> >> > <str>---------------------------------------------</str>
> >> >
> >> > <lst name="entity:253433571801723">
> >> >
> >> > <str name="EXCEPTION">
> >> >
> >> > org.apache.solr.handler.dataimport.DataImportHandlerException: No
> >> > dataSource :f1 available for entity :253433571801723 Processing Document
> >> > # 1
> >> >
> >> >                at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:279)
> >> >                at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:93)
> >> >                at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:97)
> >> >                at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
> >> >                at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
> >> >                at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
> >> >                at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
> >> >                at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
> >> >                at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
> >> >                at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
> >> >                at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
> >> >                at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >> >                at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >> >                at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >> >                at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> >> >                at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> >> >                at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> >> >                at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >> >                at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >> >                at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> >> >                at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> >> >                at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> >> >                at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >> >                at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> >> >                at org.mortbay.jetty.Server.handle(Server.java:285)
> >> >                at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> >> >                at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
> >> >                at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
> >> >                at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> >> >                at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> >> >                at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> >> >                at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> >> >
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Nirmal
> >> >
> >> >