Posted to java-user@lucene.apache.org by spinergywmy <sp...@gmail.com> on 2006/12/12 09:34:54 UTC

How to delete partial index

Hi,

   I have asked this question before, but maybe the question wasn't clear.

   How can I delete a particular index that I want to, and keep the rest? For
instance, I have indexed the document Id, date, user Id and contents. My
question is: will those particular contents be deleted if I specify only
the document Id and call reader.deleteDocument(document Id)?

   My other question: if I do a normal cut and paste of the document,
how can I delete the index content for one destination and restore it for
another destination? The index files must be merged.

   Thanks


regards,
Wooi Meng
-- 
View this message in context: http://www.nabble.com/How-to-delete-partial-index-tf2806204.html#a7829277
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to delete partial index

Posted by spinergywmy <sp...@gmail.com>.
Hi,

   I'm just wondering, is there any unique key that I can use to delete a
particular document? How can I check the position of a particular document
inside the index file? Is there any example that I can refer to on how to
delete documents by a term?

   For the second scenario, the reason I'm doing this is that when I move
the file from one destination to another, the file location recorded in the
index might no longer be correct. For example, I index the file on the C
drive and then cut and paste it, so the file is now on the D drive. I'm
wondering whether, when I search the index and get a result back, I can
still view the file after it has been moved.

   I hope you can give me some ideas for the above scenario. Thanks.


regards,
Wooi Meng
-- 
View this message in context: http://www.nabble.com/How-to-delete-partial-index-tf2806204.html#a7844899
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: How to delete partial index

Posted by spinergywmy <sp...@gmail.com>.
Hi,

   I managed to delete the document based on a term, but that is just one
part. I wonder whether Lucene supports pulling out the info that I have
already indexed and placing it into another index file. Is the only way to
use IndexWriter to index again with all the necessary fields, or does Lucene
provide a shortcut for doing this? The above scenario is the cut-and-paste
action that I need to perform.

   I hope you can give me some ideas on how to do this. Thanks


regards,
Wooi Meng
-- 
View this message in context: http://www.nabble.com/How-to-delete-partial-index-tf2806204.html#a7846137
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: How to delete partial index

Posted by Erick Erickson <er...@gmail.com>.
You have to search against something known. You simply (as has been
mentioned many times) cannot rely on the document IDs.

So, I'd store the full path (untokenized) of the file. When you move a file,
search the appropriate field of your index for the path the file was
originally stored in. Then delete. Or just delete it by that term.

Erick
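
The advice above can be sketched with a toy model. This is not Lucene code: ToyPathIndex, its doc() helper and the "path" and "contents" field names are all hypothetical, and the index is just an in-memory list. It only illustrates the idea of keying each document on its untokenized full path, deleting by that key when the file moves, and adding the document back under the new path. With Lucene itself the delete step would go through IndexReader.deleteDocuments(Term), and the re-add step assumes the needed fields are stored or can be re-read from the file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index; this is NOT Lucene code.
class ToyPathIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Hypothetical helper: builds a document with a "path" key and "contents".
    static Map<String, String> doc(String path, String contents) {
        Map<String, String> fields = new HashMap<>();
        fields.put("path", path);
        fields.put("contents", contents);
        return fields;
    }

    void addDocument(Map<String, String> fields) {
        docs.add(new HashMap<>(fields));
    }

    // Exact match on one field, like a search on an untokenized field.
    List<Map<String, String>> search(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Deletes every document whose field equals value; returns how many.
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // File moved on disk: delete by the old path, re-add under the new path.
    boolean moveFile(String oldPath, String newPath) {
        List<Map<String, String>> hits = search("path", oldPath);
        if (hits.isEmpty()) {
            return false; // nothing indexed under the old path
        }
        deleteDocuments("path", oldPath);
        for (Map<String, String> hit : hits) {
            Map<String, String> copy = new HashMap<>(hit);
            copy.put("path", newPath);
            addDocument(copy);
        }
        return true;
    }

    public static void main(String[] args) {
        ToyPathIndex index = new ToyPathIndex();
        index.addDocument(doc("C:\\docs\\report.txt", "quarterly report"));
        index.addDocument(doc("C:\\docs\\notes.txt", "meeting notes"));
        index.moveFile("C:\\docs\\report.txt", "D:\\archive\\report.txt");
        System.out.println(index.search("path", "D:\\archive\\report.txt"));
    }
}
```

No internal document number appears anywhere in the sketch; the path is the only key, which is the point of the suggestion.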

On 12/12/06, spinergywmy <sp...@gmail.com> wrote:
>
>
> Hi,
>
>    When I perform delete document and delete document based on the Id,
> does
> the Id is the unique key and by deleting based on the Id, all the related
> info will be deleted as well? If so, how can I know the document Id?
> Thanks.
>
>
> regards,
> Wooi Meng
>

Re: How to delete partial index

Posted by spinergywmy <sp...@gmail.com>.
Hi,

   When I delete a document based on its Id, is the Id a unique key, and
will deleting by the Id delete all the related info as well? If so, how can
I find out the document Id? Thanks.


regards,
Wooi Meng
-- 
View this message in context: http://www.nabble.com/How-to-delete-partial-index-tf2806204.html#a7844980
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: How to delete partial index

Posted by Doron Cohen <DO...@il.ibm.com>.
spinergywmy <sp...@gmail.com> wrote:
>
> Hi,
>
>    I have ask this question before but may be the question wasn't clear.
>
>    How can I delete particular index that I want to and keep the rest? For
> instance, I have been indexed document Id, date, user Id and contents, my
> question is does that particular contents will be deleted if I just
> specified the document Id, and I used reader.deleteDocument(document Id).

I am not 100% sure what you mean by "delete particular index".
Here "index" is mostly used as in "The result of indexing documents with
Lucene is an __INDEX__ that can then be searched, updated, etc."
But perhaps by index you mean "an __ID__ of a certain document"?

Anyhow, assume you created a Lucene index and added some 1000
documents to it. You can now delete documents from that index. The remaining
index would still be valid and useful, but would have fewer documents that
are valid as search results. For instance, if you make 500 calls:
reader.deleteDocument(0), reader.deleteDocument(2), ..., reader.deleteDocument(998);
your index would now have only 500 remaining documents that are valid as
search results. Their (internal) docids would (temporarily) be:
1,3,5,...,999. A few things to notice about this:
1) The fact that these documents were deleted would be:
- reflected immediately in searches that use a Searcher opened against the
same IndexReader used for these deletions.
- reflected in the index Directory only once the deleting IndexReader is
closed.
- reflected in searches through other IndexSearchers only if they are
opened after the deleting IndexReader is closed.
2) The "deleted" documents are still in the index for some time. They are
excluded from search results, since they are marked deleted. When an index
segment is merged (either explicitly as a result of a call to optimize() or
implicitly as a result of adding a document or closing an index writer), the
segment's deleted documents are actually removed, and the docids are
renumbered so as to close the (internal) docid gaps. In the example above,
after optimize(), you would have (internal) docids: 0,1,2,...,499.
3) As a consequence of all this, it is usually not the best thing to count
on (internal) docids, and so deleting documents by a term would usually be
safer.
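
The mark-then-renumber behaviour described in points 2 and 3 can be illustrated with a small toy model. This is a sketch only, not Lucene's implementation: ToyDocIdIndex is a hypothetical class holding documents in a list whose positions play the role of internal docids, with a bit set for the deletion marks. It replays the 1000-document example and shows that the same docid points at a different document after the compaction step.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only; this is NOT how Lucene is implemented internally.
class ToyDocIdIndex {
    private List<String> docs = new ArrayList<>(); // list position == internal docid
    private BitSet deleted = new BitSet();         // deletion marks

    // Adds a document and returns the internal docid it received.
    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    // Marks the document as deleted; its slot stays occupied for now.
    void deleteDocument(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such docid: " + docId);
        }
        deleted.set(docId);
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    // Documents still valid as search results.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    // One more than the largest docid in use, deleted slots included.
    int maxDoc() {
        return docs.size();
    }

    String document(int docId) {
        return docs.get(docId);
    }

    // Compaction: drop marked documents and close the docid gaps.
    void optimize() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs = kept;
        deleted = new BitSet();
    }

    public static void main(String[] args) {
        ToyDocIdIndex index = new ToyDocIdIndex();
        for (int i = 0; i < 1000; i++) {
            index.add("doc-" + i);
        }
        for (int id = 0; id < 1000; id += 2) {
            index.deleteDocument(id); // 0, 2, ..., 998
        }
        System.out.println(index.numDocs() + " live of " + index.maxDoc());
        System.out.println("docid 1 before optimize: " + index.document(1));
        index.optimize();
        System.out.println("docid 1 after optimize: " + index.document(1));
    }
}
```

An application that remembered "docid 1" across the optimize() call would silently be pointing at a different document afterwards, which is why a term-based key is the safer handle.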

>
>    And I have another question is if I do normal cut and paste the document,
> how can I delete the index content from one destination and restore to
> another destination and the index file must merge.

Again I am not sure what you mean here.
Is the scenario that you have, say, two Lucene indexes, I1 and I2, where I1
has 100 documents (0..99) and I2 has 100 documents (0..99), and you want to
"cut and paste" some documents (say, all documents containing the
term "MoveMeToTheOtherIndex") from index I1 to I2? Assuming there are 50
documents like this in I1, afterwards there would be 50 (undeleted) docs
left in I1 and 150 docs in I2. If this is the case, Lucene does not
support this and an application would need to implement it by itself,
adding the "moved" documents to index I2. Why would you want to do this?
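
As a rough sketch of what "implement it by itself" could look like at the application level, the following toy code replays the I1/I2 scenario above. It is not Lucene code: each index is modelled as a plain list of document strings, and ToyIndexMover and moveMatching are hypothetical names. Each matching document is added to the destination first and only then removed from the source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: each "index" is a plain list of document texts.
class ToyIndexMover {
    // Moves every document containing term from source to target.
    // Returns the number of documents moved.
    static int moveMatching(List<String> source, List<String> target, String term) {
        int moved = 0;
        for (Iterator<String> it = source.iterator(); it.hasNext();) {
            String doc = it.next();
            if (doc.contains(term)) {
                target.add(doc); // add to the destination first
                it.remove();     // then delete from the source
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> i1 = new ArrayList<>();
        List<String> i2 = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            // Half of I1's documents carry the marker term.
            i1.add(i < 50 ? "doc " + i + " MoveMeToTheOtherIndex" : "doc " + i);
            i2.add("other doc " + i);
        }
        int moved = moveMatching(i1, i2, "MoveMeToTheOtherIndex");
        System.out.println(moved + " moved, I1=" + i1.size() + ", I2=" + i2.size());
    }
}
```

With a real index the "add" step would mean re-indexing the document into I2 with all the necessary fields, since only stored fields can be read back out of an index.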

>
>    Thanks
>
>
> regards,
> Wooi Meng
> --
> View this message in context:
> http://www.nabble.com/How-to-delete-partial-index-tf2806204.html#a7829277

