Posted to java-user@lucene.apache.org by Kevin Burton <bu...@newsmonster.org> on 2004/05/13 08:19:09 UTC

Possible to fetch a document without all fields for performance?

Say I have a query result for the term Linux... now I just want the 
TITLE of these documents not the BODY.

To further this scenario, imagine the TITLE is 500 bytes but the BODY is 
50 MB. 

The current impl of fetching a document will pull in ALL 50,000,500 
bytes not just the 500 that I need. 

Obviously if I could just get the TITLE field this would be a HUGE speedup.

Is there a somewhat simple and efficient way to get a document with a 
restricted set of fields?  Digging through the API, it didn't seem obvious.

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


Re: Possible to fetch a document without all fields for performance?

Posted by Holger Klawitter <li...@klawitter.de>.

> Is there a somewhat simple and efficient way to get a document with a
> restricted set of fields?  Digging through the API, it didn't seem obvious.

You should probably store the "body" of the document outside Lucene. You can 
add a field instead which contains some kind of pointer to the body itself.
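
A minimal sketch of this suggestion in plain Java, not Lucene API code. It assumes the large body is written to a file of its own, while a small in-memory record (standing in for the stored fields of an index document) keeps only the title and a path that points to the body. The names ExternalBodyStore, Entry, add, title, and body are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class ExternalBodyStore {
    // Stand-in for the small stored fields of one indexed document.
    static final class Entry {
        final String title;
        final Path bodyPath; // the "pointer" to the body kept outside the index

        Entry(String title, Path bodyPath) {
            this.title = title;
            this.bodyPath = bodyPath;
        }
    }

    private final Path dir;
    private final Map<Integer, Entry> entries = new HashMap<>();
    private int nextId = 0;

    public ExternalBodyStore(Path dir) {
        this.dir = dir;
    }

    // Writes the body to its own file and records only the title and the path.
    public int add(String title, String body) throws IOException {
        int id = nextId++;
        Path bodyPath = dir.resolve("body-" + id + ".txt");
        Files.write(bodyPath, body.getBytes(StandardCharsets.UTF_8));
        entries.put(id, new Entry(title, bodyPath));
        return id;
    }

    // Cheap: never touches the body file.
    public String title(int id) {
        return entries.get(id).title;
    }

    // Expensive: reads the whole body, only when the caller asks for it.
    public String body(int id) throws IOException {
        byte[] bytes = Files.readAllBytes(entries.get(id).bodyPath);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ExternalBodyStore store = new ExternalBodyStore(Files.createTempDirectory("bodies"));
        int id = store.add("Linux kernel notes", "imagine 50 MB of text here");
        System.out.println(store.title(id)); // the body file is not read for this call
    }
}
```

With this split, listing titles for a result set costs only the small records, and the body is read from its file only for the documents a user actually opens.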

Mit freundlichem Gruß / With kind regards
	Holger Klawitter
- --
lists <at> klawitter <dot> de


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Possible to fetch a document without all fields for performance?

Posted by Kevin Burton <bu...@newsmonster.org>.
Morus Walter wrote:

>I don't understand that.
>You get the document object which does not contain the document's field
>contents. It just provides access to this data.
>It's up to you which fields you access.
>And remember that you don't have to store fields at all, if you don't need
>to retrieve them (e.g. because the original documents are somewhere else).

Nope... When you get the Document, the fields have already been parsed from 
disk. Even if you don't call ANY methods to get fields, it still has to read 
all the fields off disk.
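
To illustrate why a restricted read would help, here is a hedged sketch in plain Java. It assumes a hypothetical record layout (a field count, then a name, a byte length, and the raw bytes for each field), which is not Lucene's actual stored-fields format. The names SelectiveFieldReader, write, and readField are made up for this example. The reader skips over the bytes of any field it was not asked for instead of loading them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class SelectiveFieldReader {
    // Hypothetical layout: int fieldCount, then per field: UTF name, int length, raw bytes.
    public static byte[] write(Map<String, String> fields) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(fields.size());
        for (Map.Entry<String, String> field : fields.entrySet()) {
            byte[] value = field.getValue().getBytes(StandardCharsets.UTF_8);
            out.writeUTF(field.getKey());
            out.writeInt(value.length);
            out.write(value);
        }
        out.flush();
        return buffer.toByteArray();
    }

    // Returns the value of the wanted field, or null if the record has no such field.
    public static String readField(DataInputStream in, String wanted) throws IOException {
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String name = in.readUTF();
            int length = in.readInt();
            if (name.equals(wanted)) {
                byte[] value = new byte[length];
                in.readFully(value);
                return new String(value, StandardCharsets.UTF_8);
            }
            // Not the field we want: skip its bytes without decoding them.
            // skipBytes may skip fewer bytes than requested, so loop until done.
            int remaining = length;
            while (remaining > 0) {
                int skipped = in.skipBytes(remaining);
                if (skipped <= 0) {
                    throw new EOFException("record truncated inside field " + name);
                }
                remaining -= skipped;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("body", "imagine 50 MB of text here");
        fields.put("title", "Linux HOWTO");
        byte[] record = write(fields);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        System.out.println(readField(in, "title"));
    }
}
```

Because each field carries its length, the reader can step past a large BODY in one jump. On a seekable file that jump would be a seek rather than a read, which is where the saving would come from.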

Kevin



RE: Possible to fetch a document without all fields for performance?

Posted by Karthik N S <ka...@controlnet.co.in>.
Hey,
Have a look at "org.apache.lucene.demo.html.Test.java" in Lucene 1.3 final.
You can strip out the title or summary from the HTML page and add it to the
relevant index field.

Karthik



Re: Possible to fetch a document without all fields for performance?

Posted by Morus Walter <mo...@tanto.de>.
Kevin Burton writes:
> The current impl of fetching a document will pull in ALL 50,000,500 
> bytes not just the 500 that I need. 
I don't understand that.
You get the document object, which does not contain the document's field
contents. It just provides access to this data.
It's up to you which fields you access.
And remember that you don't have to store fields at all if you don't need 
to retrieve them (e.g. because the original documents are somewhere else). 

Morus





Re: Possible to fetch a document without all fields for performance?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Kevin,

There is no API for this, and I agree it would be handy.

Otis
