Posted to dev@lucene.apache.org by "Amit Nithian (JIRA)" <ji...@apache.org> on 2010/08/31 08:53:54 UTC
[jira] Updated: (SOLR-2096) DIH should be able to read data directly
from HDFS for indexing
[ https://issues.apache.org/jira/browse/SOLR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amit Nithian updated SOLR-2096:
-------------------------------
Attachment: hdfs_reader.tar
> DIH should be able to read data directly from HDFS for indexing
> ---------------------------------------------------------------
>
> Key: SOLR-2096
> URL: https://issues.apache.org/jira/browse/SOLR-2096
> Project: Solr
> Issue Type: New Feature
> Components: contrib - DataImportHandler
> Affects Versions: 1.4.1
> Reporter: Amit Nithian
> Fix For: 1.4.2
>
> Attachments: hdfs_reader.tar
>
>
> DIH doesn't support reading from the hdfs:// protocol, which makes it hard to index data generated by an M/R job. The attached tarball contains a subclass of URLDataSource, along with an HDFSReader, that adds this support. The data is assumed to be plain text that the LineEntityProcessor can process one line at a time.
> Here is an example DIH-Config snippet:
> <dataConfig>
>   <dataSource name="queryData" type="org.apache.solr.handler.dataimport.hdfs.HDFSDataSource"
>               baseUrl="hdfs://<YOURSERVER>:9000/" encoding="UTF-8"
>               connectionTimeout="5000" readTimeout="10000"/>
>   <document name="autoSuggester">
>     <entity name="jc" processor="LineEntityProcessor"
>             url="<YOUR FOLDER>/part*" dataSource="queryData">
>       <!-- Field mappings here if necessary -->
>     </entity>
>   </document>
> </dataConfig>
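
The "Field mappings here if necessary" comment in the snippet above is left open. As a rough sketch of what could go there: LineEntityProcessor emits each input line as a single field named rawLine, and DIH's RegexTransformer can split that field into separate columns. The line layout (a query, a tab, then a count) and the column names "query" and "count" are assumptions made up for illustration; they are not taken from the attached tarball, and the regexes would need adjusting to the actual M/R output.

```xml
<!-- Hypothetical field mappings, assuming each line looks like: some query<TAB>42 -->
<!-- url keeps the <YOUR FOLDER> placeholder from the original snippet; substitute a real path -->
<entity name="jc" processor="LineEntityProcessor"
        url="<YOUR FOLDER>/part*" dataSource="queryData"
        transformer="RegexTransformer">
  <!-- rawLine is the field LineEntityProcessor produces for every line read -->
  <!-- "query" (assumed name): everything before the first tab -->
  <field column="query" sourceColName="rawLine" regex="^([^\t]+)\t.*$"/>
  <!-- "count" (assumed name): the digits after the tab -->
  <field column="count" sourceColName="rawLine" regex="^[^\t]+\t(\d+)$"/>
</entity>
```

Each field uses a regex with a single capture group, so RegexTransformer stores just that group in the named column. Both columns would also need matching fields in the Solr schema.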