Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2013/04/13 23:14:15 UTC

[jira] [Resolved] (LUCENE-1757) Support adding a "stored" field via a Reader

     [ https://issues.apache.org/jira/browse/LUCENE-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved LUCENE-1757.
------------------------------------

    Resolution: Won't Fix

SPRING_CLEANING_2013 JIRAS. I think this has been long since changed.
                
> Support adding a "stored" field via a Reader
> --------------------------------------------
>
>                 Key: LUCENE-1757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1757
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/index
>            Reporter: Tim Smith
>
> All current constructors for Field() that take a Reader explicitly say the field will not be stored.
> It would be highly desirable to support adding a stored field to a Document using a Reader (or some special interface that can go directly to the source data).
> This could greatly reduce the memory required for adding very large stored fields (if used efficiently by IndexWriter).
> This would support two primary use cases:
> 1. Creating a stored field from an arbitrary CharSequence
> I may internally use a MutableString-type class during document processing to conserve memory; however, I currently have to convert it to a String before adding it as a stored field. If I could just pass a Reader for this mutable string/char sequence, indexing could be smart enough not to allocate double the space.
> 2. Creating a stored field from a file on disk
> When adding large stored fields, the actual value may be kept on disk to reduce memory use during indexing. To use it as a stored field, it currently has to be loaded entirely into memory as a String/byte[] before it can be added to a Field() (this could be quite large and provoke an OutOfMemoryError).
> Document retrieval considerations:
> It would also be ideal if, when fetching a Document from the index, you could specify a "max string size" for the returned stored field.
> If the field were larger than this cutoff, a Reader going directly to disk would be returned instead of a String/byte[]. This would again allow smart applications to save memory during document retrieval (this would be especially nice for highlighting, as the source data could be streamed right into the highlighter).
> It would also be acceptable if some new interface were accepted instead of Reader.
> This could be some form of "sized" input stream that reports the total number of bytes/chars it will produce.
> For example:
> {code}
> import java.io.InputStream;
> import java.io.Reader;
>
> public interface FieldSource {
>   /** Size of the stored field value (in bytes if isBinary() is true, in chars if isBinary() is false) */
>   public int size();
>   /** If true, use getInputStream(); if false, use getReader() */
>   public boolean isBinary();
>   /** Get the input stream for pulling this from its source (null if isBinary() is false) */
>   public InputStream getInputStream();
>   /** Get the reader for reading character data (null if isBinary() is true) */
>   public Reader getReader();
> }
> {code}
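
To make the two use cases above concrete, here is a minimal sketch of how the proposed FieldSource interface might be implemented. Everything in it is an assumption for illustration: FieldSource exists only in this issue, not in Lucene, and the class names CharSequenceReader, CharSequenceFieldSource, and BinaryFileFieldSource are hypothetical. The sketch shows only the source side; nothing here implies that IndexWriter can consume such an object.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// The interface as proposed in the issue; it is NOT part of Lucene.
interface FieldSource {
  int size();
  boolean isBinary();
  InputStream getInputStream();
  Reader getReader();
}

// Hypothetical helper: reads straight from a CharSequence without first
// copying it into a String.
final class CharSequenceReader extends Reader {
  private final CharSequence seq;
  private int pos = 0;

  CharSequenceReader(CharSequence seq) { this.seq = seq; }

  @Override public int read(char[] buf, int off, int len) {
    if (len == 0) return 0;
    if (pos >= seq.length()) return -1; // end of data
    int n = Math.min(len, seq.length() - pos);
    for (int i = 0; i < n; i++) buf[off + i] = seq.charAt(pos++);
    return n;
  }

  @Override public void close() { /* nothing to release */ }
}

// Use case 1 (hypothetical): character data held in any CharSequence.
final class CharSequenceFieldSource implements FieldSource {
  private final CharSequence value;

  CharSequenceFieldSource(CharSequence value) { this.value = value; }

  @Override public int size() { return value.length(); } // size in chars
  @Override public boolean isBinary() { return false; }
  @Override public InputStream getInputStream() { return null; }
  @Override public Reader getReader() { return new CharSequenceReader(value); }
}

// Use case 2 (hypothetical): binary data that stays on disk until streamed.
final class BinaryFileFieldSource implements FieldSource {
  private final Path path;

  BinaryFileFieldSource(Path path) { this.path = path; }

  @Override public int size() {
    try {
      // size() is an int in the proposal, so files over 2 GB fail here.
      return Math.toIntExact(Files.size(path));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override public boolean isBinary() { return true; }

  @Override public InputStream getInputStream() {
    try {
      return Files.newInputStream(path);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override public Reader getReader() { return null; }
}
```

Because the proposed methods declare no checked exceptions, the file-backed sketch wraps IOException in UncheckedIOException. A real design would probably declare IOException and return a long from size(), but that would depart from the interface as written in the issue.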

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org