
Posted to java-user@lucene.apache.org by Naveen Kumar <id...@gmail.com> on 2010/06/29 13:40:08 UTC

Adding a new field to existing Index

Hey,

I need to add a new field (a stored, not indexed field) to all documents
present in an existing large index. Reindexing the whole index would be very
costly. Is there a way to do this, or any workaround?

I would also like to know whether the data or term vector of a field that
was indexed without being stored can somehow be retrieved. This would enable
a workaround for my problem.

Thank you
Naveen Kumar

Re: Adding a new field to existing Index

Posted by Naveen Kumar <id...@gmail.com>.

Yes, with Lucene's current APIs it does not seem possible.
But since this is a problem many others might be facing, I was hoping someone
might have figured out a solution.


On Tue, Jun 29, 2010 at 5:29 PM, Mango <bg...@gmail.com> wrote:

> Unfortunately, I don't think it is possible to add new field without
> re-indexing.
>
> As for extracting content from the field, it should be possible to
> retrieve data if the term vectors
> were stored with positions offset
> (Field.TermVector.WITH_POSITIONS_OFFSETS). If not, I don't
> think it's possible.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Adding a new field to existing Index

Posted by Mango <bg...@gmail.com>.

Unfortunately, I don't think it is possible to add a new field without
re-indexing.

As for extracting content from the field, it should be possible to
retrieve the field's (analyzed) terms in order if the term vectors were
stored with positions and offsets (Field.TermVector.WITH_POSITIONS_OFFSETS).
If not, I don't think it's possible.
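A minimal sketch of the idea, in plain Java so it runs without Lucene: given a field's terms and the positions recorded for each term, the token sequence can be laid back out in order. In Lucene 3.x these two inputs would presumably come from a TermPositionVector (its getTerms() and getTermPositions(int) methods); here they are ordinary arrays, and the class name, the `_` gap marker and the sample data are made up for illustration.

```java
import java.util.Arrays;

/**
 * Plain-Java model of rebuilding a field's token sequence from a term vector
 * that recorded positions. The inputs mirror what a term vector with
 * positions holds, but this class has no Lucene dependency.
 */
final class TermVectorRebuild {

    /** Marker for positions no stored term occupies (e.g. a removed stop word). */
    static final String GAP = "_";

    /** terms[i] occurs at every position listed in positions[i]. */
    static String rebuild(String[] terms, int[][] positions) {
        if (terms.length != positions.length) {
            throw new IllegalArgumentException("terms and positions must be parallel arrays");
        }
        // Find the highest position to size the token sequence.
        int maxPos = -1;
        for (int[] ps : positions) {
            for (int p : ps) {
                maxPos = Math.max(maxPos, p);
            }
        }
        String[] slots = new String[maxPos + 1];
        Arrays.fill(slots, GAP);
        // Place every occurrence of every term at its recorded position.
        // If two terms share a position (e.g. injected synonyms), the later one wins.
        for (int i = 0; i < terms.length; i++) {
            for (int p : positions[i]) {
                slots[p] = terms[i];
            }
        }
        return String.join(" ", slots);
    }

    public static void main(String[] args) {
        // Hypothetical term vector for "The Quick Brown Fox" after lowercasing
        // and stop-word removal; terms are sorted, as in a term vector.
        String[] terms = {"brown", "fox", "quick"};
        int[][] positions = {{2}, {3}, {1}};
        System.out.println(rebuild(terms, positions)); // prints "_ quick brown fox"
    }
}
```

What comes back is the analyzed token stream, not the original text: case, punctuation and removed stop words are gone, so at best this gives a partial workaround.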

On Tue, Jun 29, 2010 at 1:40 PM, Naveen Kumar <id...@gmail.com> wrote:
> Hey,
>
> I need to add a new field (a stored , not indexed field) for all documents
> present in an existing large index. Reindexing the whole index will be very
> costly. Is there a way to do this or any work around?
>
> I would also like to know, if data or term vector, of a field indexed
> without storing, can somehow be retrieved. This would enable a work around
> solution to my problem.
>
> Thank you
> Naveen Kumar
>


Re: Adding a new field to existing Index

Posted by Naveen Kumar <id...@gmail.com>.

Thanks for the quick reply!
I will go ahead with reindexing of all the data.

On Wed, Jul 7, 2010 at 6:27 PM, Andrzej Bialecki <ab...@getopt.org> wrote:

> Unfortunately no - my previous advice still applies:
>
>
>>>> I would also like to know, if data or term vector, of a field
>>>> indexed without storing, can somehow be retrieved. This would enable
>>>> a work around solution to my problem.
>>>>
>>>
>>> Not really, and the re-construction is very costly. Indexing is a lossy
>>> process, so not all content can be recovered. See the "Reconstruct &
>>> Edit" functionality in Luke (http://www.getopt.org/luke).
>>>
>>
> At this point it will be less costly to reindex.
>

Re: Adding a new field to existing Index

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 2010-07-07 14:49, Naveen Kumar wrote:
> Hi Andrzej Bialecki
>
> When you suggested -
>     "There are some other low-level ways to do this, but the easiest is to
>      use a FilterIndexReader, especially since you just want to add a stored
>      field - implement a subclass of FilterIndexReader that adds a new field
>      in getFieldNames() and document(int). Then use
>      IndexWriter.addIndexes(IndexReader[]) to create the output index."
> I believe you assumed that all the existing fields are stored. I have a few
> fields which are only indexed, not stored. Is there a way to add a new
> Field(stored, not indexed) to document in such an index, without reindexing
> the whole index.
> Any suggestions will be very helpful!

Unfortunately no - my previous advice still applies:

>>> I would also like to know, if data or term vector, of a field
>>> indexed without storing, can somehow be retrieved. This would enable
>>> a work around solution to my problem.
>>
>> Not really, and the re-construction is very costly. Indexing is a lossy
>> process, so not all content can be recovered. See the "Reconstruct &
>> Edit" functionality in Luke (http://www.getopt.org/luke).

At this point it will be less costly to reindex.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Adding a new field to existing Index

Posted by Naveen Kumar <id...@gmail.com>.

Hi Andrzej Bialecki

When you suggested -
    "There are some other low-level ways to do this, but the easiest is to
     use a FilterIndexReader, especially since you just want to add a stored
     field - implement a subclass of FilterIndexReader that adds a new field
     in getFieldNames() and document(int). Then use
     IndexWriter.addIndexes(IndexReader[]) to create the output index."
I believe you assumed that all the existing fields are stored. I have a few
fields which are only indexed, not stored. Is there a way to add a new
Field(stored, not indexed) to the documents in such an index, without
reindexing the whole index?
Any suggestions will be very helpful!

Thank you
Naveen Kumar

On Wed, Jun 30, 2010 at 12:34 PM, Andrzej Bialecki <ab...@getopt.org> wrote:

> On 2010-06-29 13:40, Naveen Kumar wrote:
> > Hey,
> >
> > I need to add a new field (a stored , not indexed field) for all
> > documents present in an existing large index. Reindexing the whole
> > index will be very costly. Is there a way to do this or any work
> > around?
>
> There are some other low-level ways to do this, but the easiest is to
> use a FilterIndexReader, especially since you just want to add a stored
> field - implement a subclass of FilterIndexReader that adds a new field
> in getFieldNames() and document(int). Then use
> IndexWriter.addIndexes(IndexReader[]) to create the output index.
>
> >
> > I would also like to know, if data or term vector, of a field
> > indexed without storing, can somehow be retrieved. This would enable
> > a work around solution to my problem.
>
> Not really, and the re-construction is very costly. Indexing is a lossy
> process, so not all content can be recovered. See the "Reconstruct &
> Edit" functionality in Luke (http://www.getopt.org/luke).
>

Re: Adding a new field to existing Index

Posted by Andrzej Bialecki <ab...@getopt.org>.

On 2010-06-29 13:40, Naveen Kumar wrote:
> Hey,
> 
> I need to add a new field (a stored , not indexed field) for all
> documents present in an existing large index. Reindexing the whole
> index will be very costly. Is there a way to do this or any work
> around?

There are some other low-level ways to do this, but the easiest is to
use a FilterIndexReader, especially since you just want to add a stored
field - implement a subclass of FilterIndexReader that adds a new field
in getFieldNames() and document(int). Then use
IndexWriter.addIndexes(IndexReader[]) to create the output index.
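The suggestion above is a decorator: wrap the existing reader, pass everything through, add the new stored field on the way out, and let the writer copy the wrapped reader into a fresh index. The sketch below models only that shape in plain Java so it runs without Lucene; StoredDocReader, ListReader, AddStoredFieldReader and copyAll are hypothetical names, not Lucene classes. In a real Lucene 3.x FilterIndexReader subclass the counterparts would be overriding getFieldNames(IndexReader.FieldOption) and, probably, the document(int, FieldSelector) overload (which appears to be what the merge path calls); verify both against the Lucene version in use. copyAll stands in for IndexWriter.addIndexes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical minimal reader over stored fields; NOT Lucene's IndexReader. */
interface StoredDocReader {
    int maxDoc();
    Set<String> fieldNames();
    Map<String, String> document(int n);
}

/** In-memory reader over a fixed list of stored documents. */
final class ListReader implements StoredDocReader {
    private final List<Map<String, String>> docs;

    ListReader(List<Map<String, String>> docs) { this.docs = docs; }

    public int maxDoc() { return docs.size(); }

    public Set<String> fieldNames() {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, String> d : docs) names.addAll(d.keySet());
        return names;
    }

    public Map<String, String> document(int n) { return docs.get(n); }
}

/**
 * Decorator in the spirit of a FilterIndexReader subclass: everything is
 * delegated to the wrapped reader, except that one extra stored field is
 * advertised in fieldNames() and appended in document(int).
 */
final class AddStoredFieldReader implements StoredDocReader {
    private final StoredDocReader in;
    private final String newField;
    // Derives the new value from the document's stored fields only.
    private final Function<Map<String, String>, String> valueOf;

    AddStoredFieldReader(StoredDocReader in, String newField,
                         Function<Map<String, String>, String> valueOf) {
        this.in = in;
        this.newField = newField;
        this.valueOf = valueOf;
    }

    public int maxDoc() { return in.maxDoc(); }

    public Set<String> fieldNames() {
        Set<String> names = new LinkedHashSet<>(in.fieldNames());
        names.add(newField); // counterpart of overriding getFieldNames()
        return names;
    }

    public Map<String, String> document(int n) {
        // Copy so the wrapped reader's document is never modified.
        Map<String, String> doc = new LinkedHashMap<>(in.document(n));
        doc.put(newField, valueOf.apply(doc)); // counterpart of overriding document(...)
        return doc;
    }
}

final class AddFieldDemo {
    /** Stand-in for IndexWriter.addIndexes(...): copies every document into a new "index". */
    static List<Map<String, String>> copyAll(StoredDocReader reader) {
        List<Map<String, String>> out = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) out.add(reader.document(i));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> d0 = new LinkedHashMap<>();
        d0.put("id", "a1");
        Map<String, String> d1 = new LinkedHashMap<>();
        d1.put("id", "b2");
        // Hypothetical external data keyed by a stored "id" field.
        Map<String, String> categories = Map.of("a1", "books", "b2", "music");

        StoredDocReader filtered = new AddStoredFieldReader(
                new ListReader(List.of(d0, d1)), "category",
                doc -> categories.getOrDefault(doc.get("id"), "unknown"));

        System.out.println(filtered.fieldNames()); // [id, category]
        System.out.println(copyAll(filtered));     // [{id=a1, category=books}, {id=b2, category=music}]
    }
}
```

The new value is computed only from what document(int) returns, i.e. stored fields (here a made-up id-to-category lookup). Content that was indexed but never stored is not visible through that call, which is the limitation raised later in this thread.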

> 
> I would also like to know, if data or term vector, of a field
> indexed without storing, can somehow be retrieved. This would enable
> a work around solution to my problem.

Not really, and the re-construction is very costly. Indexing is a lossy
process, so not all content can be recovered. See the "Reconstruct &
Edit" functionality in Luke (http://www.getopt.org/luke).
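To make "indexing is a lossy process" concrete, here is a toy analyzer in plain Java (lowercasing, splitting on non-alphanumeric characters, dropping a few stop words). It is a hypothetical stand-in, far simpler than any real Lucene Analyzer, but it shows how two different source strings can end up with identical indexed terms, so nothing in the index can tell them apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer (hypothetical; much simpler than any Lucene Analyzer). */
final class LossyAnalysis {
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or digit: punctuation is discarded.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) continue;                 // e.g. input starting with a delimiter
            String t = raw.toLowerCase(Locale.ROOT);     // case information is lost
            if (!STOP_WORDS.contains(t)) tokens.add(t);  // stop words are lost
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> a = analyze("The Quick, Brown FOX!");
        List<String> b = analyze("quick brown fox");
        System.out.println(a);           // [quick, brown, fox]
        System.out.println(a.equals(b)); // true: the two originals are indistinguishable
    }
}
```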

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
