Posted to solr-dev@lucene.apache.org by "Nathan Adams (JIRA)" <ji...@apache.org> on 2009/01/26 18:13:59 UTC

[jira] Created: (SOLR-987) Add a new DataImportHandler EntityProcessor to handle non-XML files

Add a new DataImportHandler EntityProcessor to handle non-XML files
-------------------------------------------------------------------

                 Key: SOLR-987
                 URL: https://issues.apache.org/jira/browse/SOLR-987
             Project: Solr
          Issue Type: New Feature
          Components: contrib - DataImportHandler
            Reporter: Nathan Adams


We need a way to use the DataImportHandler to index non-XML (i.e. simple text) files, read either over HTTP or from the file system.  This would make it possible to put the entire contents of a text file into a single field of a document whose other fields are pulled from another DataSource.  An EntityProcessor looks like the right place for this, since it would let us add more attributes later if needed.  We could also consider support for other file formats (PDF, office, etc.), which may overlap with some of the Extraction/Tika work.
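
To make the request concrete, here is a rough sketch in plain Java of the behaviour described above: drain a character stream (which could come from a file or an HTTP response) into one field, then combine that field with fields obtained elsewhere.  It deliberately does not use the DataImportHandler API.  The class name, the method names and the "text" column name are all hypothetical and only illustrate the idea; this is not a proposed implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Solr or DataImportHandler.
class PlainTextRowSketch {

    /** Hypothetical name of the field that receives the whole text. */
    static final String TEXT_COLUMN = "text";

    /** Reads everything from the reader and returns one row with a single field. */
    static Map<String, Object> rowFromReader(Reader reader) {
        StringBuilder contents = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader in = reader) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                contents.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        row.put(TEXT_COLUMN, contents.toString());
        return row;
    }

    /** Combines fields pulled from another source with the text row. */
    static Map<String, Object> mergeIntoDocument(Map<String, Object> otherFields,
                                                 Map<String, Object> textRow) {
        Map<String, Object> doc = new LinkedHashMap<>(otherFields);
        doc.putAll(textRow);
        return doc;
    }

    public static void main(String[] args) {
        // Fields that would normally come from another DataSource (made-up values).
        Map<String, Object> otherFields = new LinkedHashMap<>();
        otherFields.put("id", "doc-1");
        otherFields.put("title", "Example");

        // A StringReader stands in for a file or HTTP stream.
        Map<String, Object> textRow = rowFromReader(new StringReader("line one\nline two"));
        System.out.println(mergeIntoDocument(otherFields, textRow));
    }
}
```

Taking a Reader rather than a file path keeps the sketch independent of where the text comes from, which matches the "either via HTTP or FileSystem" part of the request.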


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-987) Add a new DataImportHandler EntityProcessor to handle non-XML files

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667579#action_12667579 ] 

Noble Paul commented on SOLR-987:
---------------------------------

Isn't this the same as SOLR-980?


[jira] Commented: (SOLR-987) Add a new DataImportHandler EntityProcessor to handle non-XML files

Posted by "Nathan Adams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667672#action_12667672 ] 

Nathan Adams commented on SOLR-987:
-----------------------------------

Yes, it is. I didn't realize you had already created an issue for this.


[jira] Resolved: (SOLR-987) Add a new DataImportHandler EntityProcessor to handle non-XML files

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-987.
----------------------------------------

    Resolution: Duplicate

Duplicate of SOLR-980
