Posted to solr-user@lucene.apache.org by Nathan Adams <na...@umich.edu> on 2009/01/29 00:34:19 UTC

DIH handling of missing files

I am constructing documents from a JDBC datasource and an HTTP datasource
(see the data-config file below). My problem is that I cannot know whether a
particular HTTP URL will be available at index time, so I need DIH to
continue processing even if the HTTP location returns a 404.
onError="continue" does not appear to help in this case. Should it?

<dataConfig>

    <dataSource type="JdbcDataSource" name="db"
                driver="oracle.jdbc.driver.OracleDriver"
                url="jdbc:oracle:thin:@?????"
                user="???" password="???"/>

    <dataSource type="HttpDataSource" name="http"/>

    <document name="resources">

        <entity name="metadata" dataSource="db" pk="RESOURCEID"
                query="select * from ????" onError="continue">

            <entity name="xmltext"
                    url="http://???.com/${metadata.RESOURCEID}.xml"
                    forEach="/content" dataSource="http"
                    processor="XPathEntityProcessor" onError="continue">

                <field column="FULLTEXT" xpath="/content"/>

            </entity>

        </entity>

    </document>

</dataConfig>

Thanks,
Nathan

RE: DIH handling of missing files

Posted by Nathan Adams <na...@umich.edu>.
The example I'm running (from the DIH wiki page) appears to be v1.3, which explains the problem.  Thanks!

RE: DIH handling of missing files

Posted by Nathan Adams <na...@umich.edu>.
I'm running the example from the DIH wiki page:
 
http://wiki.apache.org/solr-data/attachments/DataImportHandler/attachments/example-solr-home.jar
 
-Nathan

 
Re: DIH handling of missing files

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
onError="continue" should help.

Which version of DIH are you using? onError is a Solr 1.4 feature.
--Noble
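
For readers on Solr 1.4 or later, here is a minimal sketch of where the attribute goes, reusing the inner entity from the original post. The "???" placeholders are kept as posted. The three values described in the comment are the ones the DIH documentation lists for onError; whether a 404 from the HTTP source is handled this way in your build is an assumption worth testing. As the thread concludes, the 1.3 example build does not honor the attribute.

```xml
<!-- onError (Solr 1.4 and later) accepts:
       abort    : stop the import (the default)
       skip     : drop the current document and move on
       continue : carry on as if the error had not happened -->
<entity name="xmltext"
        dataSource="http"
        processor="XPathEntityProcessor"
        url="http://???.com/${metadata.RESOURCEID}.xml"
        forEach="/content"
        onError="continue">
    <field column="FULLTEXT" xpath="/content"/>
</entity>
```

With "continue" on the inner entity, a parent row whose XML file is missing should still be indexed, just without the FULLTEXT field. "skip" would drop that document entirely.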
