Posted to dev@lucene.apache.org by "Amit Nithian (JIRA)" <ji...@apache.org> on 2010/08/31 08:53:54 UTC

[jira] Updated: (SOLR-2096) DIH should be able to read data directly from HDFS for indexing

     [ https://issues.apache.org/jira/browse/SOLR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Nithian updated SOLR-2096:
-------------------------------

    Attachment: hdfs_reader.tar

> DIH should be able to read data directly from HDFS for indexing
> ---------------------------------------------------------------
>
>                 Key: SOLR-2096
>                 URL: https://issues.apache.org/jira/browse/SOLR-2096
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4.1
>            Reporter: Amit Nithian
>             Fix For: 1.4.2
>
>         Attachments: hdfs_reader.tar
>
>
> DIH doesn't support reading from the hdfs:// protocol, which makes it hard to index data generated by an M/R job. The attached tarball contains a subclass of URLDataSource, along with an HDFSReader, that adds this support. The data is assumed to be in text format and processable by the LineEntityProcessor.
> Here is an example DIH-Config snippet:
> <dataConfig>
>   <dataSource name="queryData" type="org.apache.solr.handler.dataimport.hdfs.HDFSDataSource"
>     baseUrl="hdfs://<YOURSERVER>:9000/" encoding="UTF-8"
>     connectionTimeout="5000" readTimeout="10000"/>
>   <document name="autoSuggester">
>     <entity name="jc" processor="LineEntityProcessor"
>       url="<YOUR FOLDER>/part*" dataSource="queryData">
>       <!-- Field mappings here if necessary -->
>     </entity>
>   </document>
> </dataConfig>
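>
> As a rough sketch of what the field mappings might look like: LineEntityProcessor returns each line in a column named rawLine, and a RegexTransformer can split that into fields. This assumes the M/R output is tab-separated key/value text; the field names query and count are hypothetical and would need to match your schema.
>   <entity name="jc" processor="LineEntityProcessor"
>     url="<YOUR FOLDER>/part*" dataSource="queryData"
>     transformer="RegexTransformer">
>     <!-- Assumption: each line is "key<TAB>value"; group 1 -> query, group 2 -> count -->
>     <field column="rawLine" regex="^([^\t]*)\t(.*)$" groupNames="query,count"/>
>   </entity>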


