Posted to java-user@lucene.apache.org by Dan Luria <da...@dotdan.com> on 2007/09/12 23:06:59 UTC

Tokenization question

If I have a tokenized, unstored field in a document, and I want to
transfer the document to another index, is it possible to carry over the
tokenization with the terms?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Tokenization question

Posted by Mike Klaas <mi...@gmail.com>.
On 13-Sep-07, at 12:37 PM, Dan Luria wrote:

> What I do is
>
> Doc1 = source_doc
> Doc2 = new Document()
> foreach (Field f in Doc1.getFields()) {
>     Doc2.Add(new Field(f.name, f.value));
> }
>
> but when I pull the fields from Doc1, I never get the tokenized
> field. It just doesn't appear.
>
> So my question is: I can see that field in the index, and search
> against it, but how do I transfer it to a different index?
>
> (PS: The above is pseudo-code... not syntax)

Fields that are indexed but not stored are not kept anywhere as a
whole. Bits and pieces of the document are spread across the term
postings (this is the nature of an inverted index).

You can reconstruct the field, though it is quite time-consuming, by
iterating over the whole index. I think Luke can do this.

-Mike
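
To illustrate why the field cannot simply be read back, and what
"iterating over the whole index" amounts to, here is a minimal sketch of
a toy inverted index in plain Java. It is not the Lucene API: the class
ToyInvertedIndex and its methods addUnstored and reconstruct are
hypothetical names made up for this example, and the whitespace
tokenizer is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index; illustrative only, not the Lucene API. */
final class ToyInvertedIndex {
    // term -> (docId -> positions at which the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Index a field value without storing it: only term postings are kept. */
    void addUnstored(int docId, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Rebuild one document's token stream by scanning every term in the index. */
    String reconstruct(int docId) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> entry : postings.entrySet()) {
            List<Integer> positions = entry.getValue().get(docId);
            if (positions == null) {
                continue; // this term does not occur in the requested document
            }
            for (int pos : positions) {
                byPosition.put(pos, entry.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addUnstored(0, "Lucene builds an inverted index");
        index.addUnstored(1, "Luke inspects the index");
        System.out.println(index.reconstruct(1)); // prints "luke inspects the index"
    }
}
```

Note that reconstruct has to visit every term in the index to rebuild a
single document, which is why this approach is slow on a large index.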



Re: Tokenization question

Posted by Dan Luria <da...@dotdan.com>.
What I do is

Doc1 = source_doc
Doc2 = new Document()
foreach (Field f in Doc1.getFields()) {
    Doc2.Add(new Field(f.name, f.value));
}

but when I pull the fields from Doc1, I never get the tokenized field.
It just doesn't appear.

So my question is: I can see that field in the index, and search
against it, but how do I transfer it to a different index?

(PS: The above is pseudo-code... not syntax)


On Thu, 13 Sep 2007 09:55:31 -0400, "Erick Erickson"
<er...@gmail.com> said:
> If I'm understanding you correctly, the answer is... maybe, kinda.
> Take a look at some of the Luke code. That tries to reconstruct
> document fields from the index, but it's lossy. So it depends
> upon what kind of fidelity you need.
> 
> Erick
> 
> On 9/12/07, Dan Luria <da...@dotdan.com> wrote:
> >
> > If I have a tokenized, unstored field in a document, and I want to
> > transfer the document to another index, is it possible to carry over the
> > tokenization with the terms?



Re: Tokenization question

Posted by Erick Erickson <er...@gmail.com>.
If I'm understanding you correctly, the answer is... maybe, kinda.
Take a look at some of the Luke code. That tries to reconstruct
document fields from the index, but it's lossy. So it depends
upon what kind of fidelity you need.

Erick
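
As a rough illustration of why reconstruction is lossy: whatever the
analyzer throws away at index time cannot come back. The sketch below
uses a toy analyzer in plain Java, not a Lucene Analyzer; the class name
LossyAnalysisDemo, the analyze method, and the stop-word list are all
assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer; illustrative only, not a Lucene Analyzer. */
final class LossyAnalysisDemo {
    // Hypothetical stop-word list chosen for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "to"));

    /** Lowercase, split on non-alphanumeric characters, and drop stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String original = "The Tokenization of a Field, explained!";
        // Only these tokens would reach the index, so only they can be recovered.
        String recovered = String.join(" ", analyze(original));
        System.out.println(recovered); // prints "tokenization field explained"
    }
}
```

Case, punctuation, and the dropped words are gone for good, so a field
rebuilt from the index matches the original only as closely as the
analysis chain allows.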

On 9/12/07, Dan Luria <da...@dotdan.com> wrote:
>
> If I have a tokenized, unstored field in a document, and I want to
> transfer the document to another index, is it possible to carry over the
> tokenization with the terms?