Posted to java-user@lucene.apache.org by "Kainth, Sachin" <Sa...@atkinsglobal.com> on 2007/03/13 18:24:07 UTC

IndexReader.GetTermFreqVectors

Hi all,

The documentation for the above method mentions something called a
vectorized field.  Does anyone know what a vectorized field is?




This email and any attached files are confidential and copyright protected. If you are not the addressee, any dissemination of this communication is strictly prohibited. Unless otherwise expressly agreed in writing, nothing stated in this communication shall be legally binding.

The ultimate parent company of the Atkins Group is WS Atkins plc.  Registered in England No. 1885586.  Registered Office Woodcote Grove, Ashley Road, Epsom, Surrey KT18 5BW.

Consider the environment. Please don't print this e-mail unless you really need to. 

Re: Wildcard searches with * or ? as the first character - Thanks

Posted by Oystein Reigem <oy...@aksis.uib.no>.
Thanks Steven and Antony.

I read the FAQ not long ago, but that escaped my attention. Or
perhaps it's a recent change.

- Øystein -

-- 
Øystein Reigem, The department of culture, language and information technology (Aksis), Allegt 27, N-5007 Bergen, Norway. Tel: +47 55 58 32 42. Fax: +47 55 58 94 70. E-mail: <oy...@aksis.uib.no>. Home tel: +47 56 14 06 11. Mobile: +47 97 16 96 64. Home e-mail: <or...@broadpark.no>. Aksis home page: <www.aksis.uib.no>.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Wildcard searches with * or ? as the first character

Posted by Steven Parkes <st...@esseff.org>.
It's possible to do leading wildcard searches in Lucene as of 2.1. See 
http://wiki.apache.org/lucene-java/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695
(http://tinyurl.com/366suf)



Re: Wildcard searches with * or ? as the first character

Posted by Antony Bowesman <ad...@teamware.com>.
> I have read that with Lucene it is not possible to do wildcard searches 
> with * or ? as the first character. Wildcard searches with * as the 

Lucene supports it.  If you are using QueryParser to parse your queries, see

http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#setAllowLeadingWildcard(boolean)

Antony






Wildcard searches with * or ? as the first character

Posted by Oystein Reigem <oy...@aksis.uib.no>.
Hi,

I have read that with Lucene it is not possible to do wildcard searches 
with * or ? as the first character. Wildcard searches with * as the 
first character (or both first and last character) are useful for text 
in languages that have a lot of compound words, like German and the 
Scandinavian languages.

Some systems do offer such searches, but at a penalty. I assume such 
systems sometimes do a sequential search of the text, which is slow, and 
sometimes a sequential search of an index, which might be a bit faster, 
but still quite slow.
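
To make the cost difference concrete, here is a minimal plain-Java sketch. It is
not Lucene code: TermDictionarySketch and its methods are hypothetical names, and
a TreeSet stands in for an index's sorted term list. A trailing wildcard can seek
to the prefix and stop early, while a leading wildcard has to look at every term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an index's sorted term dictionary; not Lucene code.
final class TermDictionarySketch {
    private final TreeSet<String> terms = new TreeSet<>();

    TermDictionarySketch(List<String> indexedTerms) {
        terms.addAll(indexedTerms);
    }

    // Trailing wildcard ("haus*"): jump to the first term >= prefix and stop
    // as soon as a term no longer starts with it.
    List<String> prefixMatches(String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // Leading wildcard ("*haus"): the sort order does not help, so every
    // term in the dictionary is inspected.
    List<String> suffixMatches(String suffix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TermDictionarySketch dict = new TermDictionarySketch(
                Arrays.asList("haus", "hausboot", "krankenhaus", "rathaus", "schule"));
        System.out.println(dict.prefixMatches("haus")); // [haus, hausboot]
        System.out.println(dict.suffixMatches("haus")); // [haus, krankenhaus, rathaus]
    }
}
```

Scanning the term list is still far cheaper than scanning the text itself, since
each distinct term is examined only once however often it occurs.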

But a slow search might be better than no search, as long as the user is 
aware of the consequences of doing wildcard searches starting with a 
wildcard character.

Any comments?

Cheers,

- Øystein -



Re: IndexReader.GetTermFreqVectors

Posted by Ian Lea <ia...@gmail.com>.
From http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.TermVector.html:

"A term vector is a list of the document's terms and their number of
occurrences in that document."
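
As a rough illustration of that definition, the following plain-Java sketch counts
how often each term occurs in a piece of text. It is not how Lucene builds term
vectors: TermVectorSketch is a hypothetical name, and the lowercase-and-split
tokenization is an assumption standing in for a real analyzer.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the quoted definition; not Lucene's implementation.
final class TermVectorSketch {
    // Lowercases and splits on non-letter characters. A real analyzer is more
    // elaborate, so treat this tokenization as an assumption.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        // prints {fox=2, jumps=1, lazy=1, over=1, quick=1, the=2}
        System.out.println(termFrequencies("the quick fox jumps over the lazy fox"));
    }
}
```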


--
Ian.


On 3/14/07, Kainth, Sachin <Sa...@atkinsglobal.com> wrote:
> Yes but what is a term vector?


RE: IndexReader.GetTermFreqVectors

Posted by "Kainth, Sachin" <Sa...@atkinsglobal.com>.
Yes, but what is a term vector?



Re: IndexReader.GetTermFreqVectors

Posted by Grant Ingersoll <gr...@gmail.com>.
It means it returns the term vectors for all the fields on that
document where you have enabled TermVector when creating the Document.

i.e. new Field(...., TermVector.YES) (see
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.TermVector.html
for the full array of options)
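
A small plain-Java model may help show what "vectorized" means here. This is only
a sketch under assumptions: VectorizedFieldSketch, SketchField and termVectors are
hypothetical names, the boolean flag plays the role of the TermVector option, and
whitespace splitting stands in for an analyzer. Only fields created with the flag
set come back from the lookup.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "vectorized" fields; names are illustrative, not Lucene's API.
final class VectorizedFieldSketch {
    static final class SketchField {
        final String name;
        final String text;
        final boolean storeTermVector; // stands in for the TermVector option

        SketchField(String name, String text, boolean storeTermVector) {
            this.name = name;
            this.text = text;
            this.storeTermVector = storeTermVector;
        }
    }

    // Returns one term-frequency map per field that was flagged as vectorized,
    // keyed by field name in document order. Unflagged fields are skipped.
    static Map<String, Map<String, Integer>> termVectors(List<SketchField> document) {
        Map<String, Map<String, Integer>> vectors = new LinkedHashMap<>();
        for (SketchField field : document) {
            if (!field.storeTermVector) {
                continue; // not vectorized: nothing is returned for this field
            }
            Map<String, Integer> freqs = new TreeMap<>();
            for (String token : field.text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    freqs.merge(token, 1, Integer::sum);
                }
            }
            vectors.put(field.name, freqs);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<SketchField> doc = Arrays.asList(
                new SketchField("title", "lucene in action", false),
                new SketchField("body", "term vectors store term counts", true));
        System.out.println(termVectors(doc));
    }
}
```

In this model the "title" field is still part of the document; it simply has no
term vector to return.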

-Grant


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


