Posted to java-user@lucene.apache.org by "Kainth, Sachin" <Sa...@atkinsglobal.com> on 2007/03/13 18:24:07 UTC
IndexReader.GetTermFreqVectors
Hi all,
The documentation for the above method mentions something called a
vectorized field. Does anyone know what a vectorized field is?
This email and any attached files are confidential and copyright protected. If you are not the addressee, any dissemination of this communication is strictly prohibited. Unless otherwise expressly agreed in writing, nothing stated in this communication shall be legally binding.
The ultimate parent company of the Atkins Group is WS Atkins plc. Registered in England No. 1885586. Registered Office Woodcote Grove, Ashley Road, Epsom, Surrey KT18 5BW.
Consider the environment. Please don't print this e-mail unless you really need to.
Re: Wildcard searches with * or ? as the first character - Thanks
Posted by Oystein Reigem <oy...@aksis.uib.no>.
Thanks Steven and Antony.
I read the FAQ not very long ago, but that slipped my attention. Or
perhaps it's a recent change.
- Øystein -
--
Øystein Reigem, The department of culture, language and information technology (Aksis), Allegt 27, N-5007 Bergen, Norway. Tel: +47 55 58 32 42. Fax: +47 55 58 94 70. E-mail: <oy...@aksis.uib.no>. Home tel: +47 56 14 06 11. Mobile: +47 97 16 96 64. Home e-mail: <or...@broadpark.no>. Aksis home page: <www.aksis.uib.no>.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: Wildcard searches with * or ? as the first character
Posted by Steven Parkes <st...@esseff.org>.
It's possible to do leading wildcard searches in Lucene as of 2.1. See
http://wiki.apache.org/lucene-java/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695
(http://tinyurl.com/366suf)
Re: Wildcard searches with * or ? as the first character
Posted by Antony Bowesman <ad...@teamware.com>.
> I have read that with Lucene it is not possible to do wildcard searches
> with * or ? as the first character. Wildcard searches with * as the
Lucene supports it. If you are using QueryParser to parse your queries, see
http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#setAllowLeadingWildcard(boolean)
Antony
Wildcard searches with * or ? as the first character
Posted by Oystein Reigem <oy...@aksis.uib.no>.
Hi,
I have read that with Lucene it is not possible to do wildcard searches
with * or ? as the first character. Wildcard searches with * as the
first character (or both first and last character) are useful for text
in languages that have a lot of compound words, like German and the
Scandinavian languages.
Some systems do offer such searches, but at a penalty. I assume such
systems sometimes do a sequential search of the text, which is slow, and
sometimes a sequential search of an index, which might be a bit faster,
but still quite slow.
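
The cost of a sequential search of an index can be sketched in plain Java. This is a toy, assuming the index keeps its terms in sorted order; WildcardScanSketch, prefixMatches and suffixMatches are hypothetical names, and nothing here is Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy term dictionary showing why a leading wildcard costs more than a trailing one. */
final class WildcardScanSketch {

    /** Trailing wildcard ("haus*"): jump to the prefix in the sorted set, stop when it ends. */
    static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can start with the prefix
            }
            hits.add(term);
        }
        return hits;
    }

    /** Leading wildcard ("*haus"): sort order is no help, so every term is examined. */
    static List<String> suffixMatches(TreeSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With the terms haus, hausboot, krankenhaus, rathaus and schule, prefixMatches(terms, "haus") stops after looking at three terms, while suffixMatches(terms, "haus") has to look at all five. On a real index the second case grows with the number of distinct terms, which is usually still far smaller than the text itself.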
But a slow search might be better than no search, as long as the user is
aware of the consequences of doing wildcard searches starting with a
wildcard character.
Any comments?
Cheers,
- Øystein -
Re: IndexReader.GetTermFreqVectors
Posted by Ian Lea <ia...@gmail.com>.
From http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.TermVector.html:
"A term vector is a list of the document's terms and their number of
occurrences in that document."
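
As a rough illustration of that definition, here is a minimal sketch in plain Java, not Lucene code. TermVectorSketch and termFrequencies are hypothetical names, and the lowercase-and-split step is only an assumed stand-in for an analyzer; a real Lucene analyzer may tokenize differently.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy term vector: each distinct term of one document's text, with its count. */
final class TermVectorSketch {

    /** Lowercases the text, splits on anything that is not a letter or digit, and counts terms. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>(); // keeps terms in sorted order
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // prints {fox=2, jumps=1, lazy=1, over=1, quick=1, the=2}
        System.out.println(termFrequencies("The quick fox jumps over the lazy fox"));
    }
}
```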
--
Ian.
On 3/14/07, Kainth, Sachin <Sa...@atkinsglobal.com> wrote:
> Yes but what is a term vector?
RE: IndexReader.GetTermFreqVectors
Posted by "Kainth, Sachin" <Sa...@atkinsglobal.com>.
Yes but what is a term vector?
Re: IndexReader.GetTermFreqVectors
Posted by Grant Ingersoll <gr...@gmail.com>.
It means it returns the term vectors for all the fields on that
document where you enabled TermVector when creating the Document,
i.e. new Field(...., TermVector.YES). See
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.TermVector.html
for the full array of options.
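
The opt-in behaviour can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions: ToyField, its storeTermVector flag and termVectorsFor are hypothetical names that stand in for a field created with or without a term vector option, and none of it is Lucene's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of "vectorized" fields: only fields flagged at creation time get a term vector. */
final class VectorizedFieldSketch {

    /** Hypothetical stand-in for a document field; not Lucene's Field class. */
    static final class ToyField {
        final String name;
        final String text;
        final boolean storeTermVector; // assumed analogue of choosing TermVector.YES

        ToyField(String name, String text, boolean storeTermVector) {
            this.name = name;
            this.text = text;
            this.storeTermVector = storeTermVector;
        }
    }

    /** Returns field name -> (term -> frequency), skipping fields that did not opt in. */
    static Map<String, Map<String, Integer>> termVectorsFor(List<ToyField> document) {
        Map<String, Map<String, Integer>> vectors = new LinkedHashMap<>();
        for (ToyField field : document) {
            if (!field.storeTermVector) {
                continue; // a non-vectorized field contributes nothing
            }
            Map<String, Integer> freqs = new TreeMap<>();
            for (String token : field.text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    freqs.merge(token, 1, Integer::sum);
                }
            }
            vectors.put(field.name, freqs);
        }
        return vectors;
    }
}
```

For a document with a "title" field created without the flag and a "body" field created with it, termVectorsFor returns a single entry, for "body" only.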
-Grant
------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/