Posted to solr-user@lucene.apache.org by vasu p <ac...@gmail.com> on 2010/12/20 02:57:57 UTC

Custom transformer to get file content from file path

Hi,
I have a custom library that takes a file path as input and returns the
file content as a string.
My DB stores a file path in one of its tables, and I am using a DIH
configuration in Solr to do the indexing. I couldn't use TikaEntityProcessor
to index a file located on the file system. I thought of using a custom
Transformer to transform file_path into a file_content field in the row.
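
The transformer idea above could be sketched roughly as follows. This is only
an illustration under stated assumptions: the class name, the MAX_BYTES cap,
and the plain UTF-8 read (standing in for the custom library mentioned above)
are all hypothetical. It relies on DIH being able to call a
transformRow(Map) method by reflection, so the class does not extend any Solr
type, and it uses java.nio.file, which requires Java 7 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical transformer: DIH looks up transformRow(Map) by reflection,
// so this sketch has no compile-time dependency on Solr.
public class FilePathToContentTransformer {

    // Assumed size cap (10 MB) so one huge file cannot exhaust the heap.
    static final long MAX_BYTES = 10L * 1024 * 1024;

    public Object transformRow(Map<String, Object> row) {
        Object pathValue = row.get("file_path");
        if (pathValue == null) {
            return row; // nothing to resolve for this row
        }
        Path path = Paths.get(pathValue.toString());
        try {
            if (Files.size(path) > MAX_BYTES) {
                return row; // skip oversized files instead of loading them
            }
            // Stand-in for the custom library call: read the file as UTF-8 text.
            byte[] bytes = Files.readAllBytes(path);
            row.put("file_content", new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Unreadable or missing file: leave the row without file_content.
        }
        return row;
    }
}
```

Such a class would be referenced from the entity's transformer attribute by
its fully qualified name (for example a hypothetical
com.example.FilePathToContentTransformer). Note that this still builds the
whole content as one in-memory string, so the cap only limits the memory
problem and does not remove it.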

I would like to know the following details:
1. Setting the file content as a string on a custom file_content field might
cause memory issues: a very big file, hundreds of megabytes in size, could
consume the available RAM. Is it possible to send a stream as input to Solr?
What field type should be configured in schema.xml?
2. Is there a better approach than a custom transformer?
3. Is there any other recommended approach to indexing based on a file path?
Thanks a lot.

Re: Custom transformer to get file content from file path

Posted by Ahmet Arslan <io...@yahoo.com>.
> I have a custom library that takes a file path as input and returns the
> file content as a string.
> My DB stores a file path in one of its tables, and I am using a DIH
> configuration in Solr to do the indexing. I couldn't use TikaEntityProcessor
> to index a file located on the file system. I thought of using a custom
> Transformer to transform file_path into a file_content field in the row.
> 
> I would like to know the following details:
> 1. Setting the file content as a string on a custom file_content field might
> cause memory issues: a very big file, hundreds of megabytes in size, could
> consume the available RAM. Is it possible to send a stream as input to Solr?
> What field type should be configured in schema.xml?
> 2. Is there a better approach than a custom transformer?
> 3. Is there any other recommended approach to indexing based on a file path?

http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor should do the trick.