Posted to solr-user@lucene.apache.org by "O. Klein" <kl...@octoweb.nl> on 2011/09/17 17:01:45 UTC

Is it possible to use different types of datasource in DIH?

I want to combine data in XML on disk and XML online. 

<dataSource type="FileDataSource" encoding="UTF-8" />

is needed to read all the XML-files on disk and

<dataSource type="URLDataSource" name="url" encoding="UTF-8"
connectionTimeout="30000" readTimeout="30000"/>

is needed to get the content from XML online.

Using them both causes problems, as the FileDataSource is being used
even though the entity specifically calls for datasource="url".

Is there a way to fix this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-different-types-of-datasource-in-DIH-tp3344380p3344380.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is it possible to use different types of datasource in DIH?

Posted by "O. Klein" <kl...@octoweb.nl>.
Yeah, naming datasources maybe only works when they are of the same type.

I got this to work by using URLDataSource for the local files as well, with
url="file:///${crawl.fileAbsolutePath}" (two forward slashes don't work).
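
A minimal sketch of this workaround, assuming the local files are listed by a
parent FileListEntityProcessor entity named "crawl" (implied by the
${crawl.fileAbsolutePath} variable). The baseDir and fileName values, the inner
entity name "localxml", and the forEach/xpath expressions are placeholders for
illustration, not taken from the actual setup:

<dataConfig>
  <!-- A single URLDataSource serves both http:// and file:/// URLs -->
  <dataSource type="URLDataSource" name="url" encoding="UTF-8"
              connectionTimeout="30000" readTimeout="30000"/>
  <document>
    <!-- Lists files on disk; it reads no data itself, hence dataSource="null" -->
    <entity name="crawl" processor="FileListEntityProcessor"
            baseDir="/path/to/xml" fileName=".*\.xml"
            rootEntity="false" dataSource="null">
      <!-- Each listed file is read through the URLDataSource via a file URL -->
      <entity name="localxml" processor="XPathEntityProcessor" dataSource="url"
              url="file:///${crawl.fileAbsolutePath}"
              forEach="/rss/channel/item">
        <field column="link" xpath="/rss/channel/item/link"/>
      </entity>
    </entity>
  </document>
</dataConfig>

With this layout no FileDataSource is declared at all, so there is no second
data source for DIH to pick by mistake.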






Re: Is it possible to use different types of datasource in DIH?

Posted by Ahmet Arslan <io...@yahoo.com>.
> I did some more testing and it seems that as soon as you use FileDataSource
> it overrides any other dataSource.
>
> <dataConfig>
> <dataSource type="HttpDataSource" name="url" encoding="UTF-8"
> connectionTimeout="30000" readTimeout="30000"/>
> <dataSource type="FileDataSource" encoding="UTF-8" />
> <document>
>
> <entity name="xmlroot" datasource="url" rootEntity="false"
> url="http://www.server.com/rss.xml" processor="XPathEntityProcessor"
> forEach="/rss/channel/item" >
>     <field column="link" xpath="/rss/channel/item/link"/>
>  </entity>
>
> </document>
> </dataConfig>
>
> will not work, unless you remove FileDataSource. Anyone know a way to fix
> this (except removing FileDataSource) ?

Did you try giving the FileDataSource a name? For example:

<dataSource type="FileDataSource" encoding="UTF-8" name="fileData"/>
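
A possible explanation, offered as an assumption since nothing in this thread
confirms it: XML attribute names are case-sensitive, and the DIH wiki's
Multiple DataSources section writes the entity attribute as dataSource, with a
capital S. An entity that says datasource="url" may therefore be treated as
having no data source name and fall back to the unnamed default, which here is
the FileDataSource. A minimal sketch with both sources named and the entity
attribute in camel case (the name "fileData" comes from the suggestion above;
URLDataSource is the type from the original question; everything else is
copied from the quoted config):

<dataConfig>
  <!-- Both data sources carry a name, so neither is the implicit default -->
  <dataSource type="URLDataSource" name="url" encoding="UTF-8"
              connectionTimeout="30000" readTimeout="30000"/>
  <dataSource type="FileDataSource" name="fileData" encoding="UTF-8"/>
  <document>
    <!-- dataSource (capital S) selects the data source by its name -->
    <entity name="xmlroot" dataSource="url" rootEntity="false"
            url="http://www.server.com/rss.xml"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item">
      <field column="link" xpath="/rss/channel/item/link"/>
    </entity>
  </document>
</dataConfig>

Entities that read the XML files on disk would reference dataSource="fileData"
in the same way.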

Re: Is it possible to use different types of datasource in DIH?

Posted by "O. Klein" <kl...@octoweb.nl>.
I did some more testing and it seems that as soon as you use FileDataSource
it overrides any other dataSource.

<dataConfig>
<dataSource type="HttpDataSource" name="url" encoding="UTF-8"
connectionTimeout="30000" readTimeout="30000"/>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>

<entity name="xmlroot" datasource="url" rootEntity="false"
url="http://www.server.com/rss.xml" processor="XPathEntityProcessor"
forEach="/rss/channel/item" >
    <field column="link" xpath="/rss/channel/item/link"/>
 </entity>
     
</document>
</dataConfig>

will not work unless you remove FileDataSource. Does anyone know a way to fix
this (other than removing FileDataSource)?


Re: Is it possible to use different types of datasource in DIH?

Posted by "O. Klein" <kl...@octoweb.nl>.
That doesn't really help.

Using multiple datasources of the same type, or a combination of e.g.
FileDataSource and BinURLDataSource, is no problem.

Using FileDataSource and URLDataSource together doesn't work, because
FileDataSource is always used, even if the entity specifies a URLDataSource
as its datasource.


Re: Is it possible to use different types of datasource in DIH?

Posted by Ahmet Arslan <io...@yahoo.com>.
> I want to combine data in XML on disk and XML online.
>
> <dataSource type="FileDataSource" encoding="UTF-8" />
>
> is needed to read all the XML-files on disk and
>
> <dataSource type="URLDataSource" name="url" encoding="UTF-8"
> connectionTimeout="30000" readTimeout="30000"/>
>
> is needed to get the content from XML online.
>
> Using them both causes problems, as the FileDataSource is being used
> even though the entity specifically calls for datasource="url".
>
> Is there a way to fix this?

Have you looked at the Multiple DataSources section of the DIH wiki?
http://wiki.apache.org/solr/DataImportHandler#Multiple_DataSources