Posted to dev@lucene.apache.org by Stefano Fornari <st...@gmail.com> on 2007/05/17 00:58:30 UTC

Recreating a document from its index

Hi All,
I have a question that I could not answer by reading the
documentation and searching the mailing list archive: is it possible
to recreate a document (or a good approximation of it) from its index?

If I understood the documentation correctly, the index stores each term
and, for each term, the positions where it occurs in the document. Does
that mean that by enumerating all terms I can recreate the document from
their positions? If so, is there a way to disable the storage of term
positions? Are the positions used for searching too?
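
To make the question concrete, here is a minimal sketch of a positional
inverted index in plain Python. It does not use Lucene or reflect its
on-disk format; the function names are hypothetical and chosen only for
illustration. It shows that term-to-position lists are enough to rebuild
the token sequence, that without positions only the set of terms remains,
and that a phrase match is one kind of search that relies on positions.

```python
from collections import defaultdict


def build_positional_index(tokens):
    """Map each term to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, term in enumerate(tokens):
        index[term].append(position)
    return dict(index)


def reconstruct(index):
    """Rebuild the token sequence by putting each term back at its positions."""
    if not index:
        return []
    length = 1 + max(p for positions in index.values() for p in positions)
    slots = [None] * length
    for term, positions in index.items():
        for p in positions:
            slots[p] = term
    return slots


def phrase_match(index, phrase):
    """True if the terms of a non-empty phrase occur at consecutive positions."""
    first, *rest = phrase
    return any(
        all(start + offset in index.get(term, [])
            for offset, term in enumerate(rest, 1))
        for start in index.get(first, [])
    )


tokens = "the quick fox jumps over the lazy dog".split()
index = build_positional_index(tokens)
rebuilt = reconstruct(index)   # same token order as the input
terms_only = sorted(index)     # all that is left if positions are not stored
```

In this toy model, dropping the position lists prevents reconstruction of
the word order but also makes phrase_match impossible, which is the
trade-off the question is asking about.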

Thanks in advance to anyone taking the time to answer.

Stefano

-- 
Stefano Fornari - Funambol Chief Architect / Funambol CTO
=======================================================
Home:
http://www.funambol.org

Documents:
http://www.funambol.org/documentation/documents.html

FAQ:
http://www.funambol.org/support/faq.html

WIKI:
https://wiki.objectweb.org/sync4j/

Mailinglist archives:
http://groups.yahoo.com/group/Sync4j (login required)
http://sourceforge.net/mailarchive/forum.php?forum_id=215 (sync4j-users)
http://sourceforge.net/mailarchive/forum.php?forum_id=48877
(funambol-dev)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Recreating a document from its index

Posted by Stefano Fornari <st...@gmail.com>.
Thanks a lot. Sorry for using the wrong list.

Stefano





Re: Recreating a document from its index

Posted by Chris Hostetter <ho...@fucit.org>.
: thanks a lot. I see that part of my email was better suited to the user list.
: However, assuming this is not what I want (I do not want to be able to
: reconstruct a document or a good approximation of it), would it be
: very disruptive to the Lucene architecture to avoid storing the positions?
: Would such a change touch the core of the library or the
: indexing engine?

See this thread, starting with this particular message:

http://www.nabble.com/Index-a-source%2C-but-not-store-it...-can-it-be-done--tf3369910.html#a9387997

And for the record: the dev list is geared more toward questions/discussions
about changing/improving the APIs/internals once you already have a
specific idea in mind about changing the internals. If the nature of
your question is "is this possible" or "what would it take to make this
possible", you should start with the user list.



-Hoss




Re: Recreating a document from its index

Posted by Stefano Fornari <st...@gmail.com>.
Hi Daniel,
thanks a lot. I see that part of my email was better suited to the user list.
However, assuming this is not what I want (I do not want to be able to
reconstruct a document or a good approximation of it), would it be
very disruptive to the Lucene architecture to avoid storing the positions?
Would such a change touch the core of the library or the
indexing engine?

I will investigate the code myself, but maybe the experts here
already have part of the answer.

Thanks again.

Stefano





Re: Recreating a document from its index

Posted by Daniel Naber <lu...@danielnaber.de>.
On Thursday 17 May 2007 00:58, Stefano Fornari wrote:

> I have a question that I could not answer by reading the
> documentation and searching the mailing list archive:

This actually belongs on the user list. Try Luke: click the
"Reconstruct & Edit" button, then the "Tokenized" tab. This will show
you what can be recreated, which depends on the stopwords and the other
normalizations applied by the Analyzer.
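
As a rough illustration of why the result is only an approximation, the
sketch below runs a toy analysis step before indexing. It is plain Python,
not Lucene's Analyzer API; the stop list and the function names are
hypothetical, and it assumes that removed stopwords leave gaps in the
position numbering, which in a real index depends on how the analyzer is
configured.

```python
import re

# Hypothetical stop list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "is"}


def analyze(text):
    """Lowercase, split on non-letters, and drop stopwords.

    Each surviving term keeps the position it had before stopword removal.
    """
    words = re.findall(r"[A-Za-z]+", text.lower())
    return [(pos, w) for pos, w in enumerate(words) if w not in STOPWORDS]


def reconstruct_from_index(text):
    """Index the analyzed terms with positions, then rebuild what is recoverable."""
    index = {}
    for pos, term in analyze(text):
        index.setdefault(term, []).append(pos)
    if not index:
        return ""
    length = 1 + max(p for positions in index.values() for p in positions)
    slots = ["_"] * length  # "_" marks a position whose term was never indexed
    for term, positions in index.items():
        for p in positions:
            slots[p] = term
    return " ".join(slots)


original = "The Index of a Document is NOT the Document."
approx = reconstruct_from_index(original)
print(approx)
```

Case, punctuation, and every stopword are gone from the output; only the
normalized terms and their relative order survive.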

Regards
 Daniel

-- 
http://www.danielnaber.de
