Posted to java-user@lucene.apache.org by Praveen Peddi <pp...@contextmedia.com> on 2004/07/01 16:15:45 UTC

Sorting and tokenization

Hello all,
Now that Lucene 1.4 rc3 has sorting built in, I am adding sorting to our search. Before posting to this list, I went through most of the earlier responses here on sorting, and I found that I cannot tokenize the fields that I want to sort on.

Let's take my example.
I use Lucene 1.3 final for searching. Sorting is a very important feature in our application. But we found that Lucene does not support it out of the box, so we had to implement sorting by score and doc id programmatically, which is not much use to us. So I thought Lucene's new sorting feature would suit us best now. Unfortunately, the field called "title" is currently tokenized. This is deliberate, because users want to search for partial matches (or rather, search on multiple words of the title). So if we make it untokenized, we may lose important functionality.

My question is: is there any way to sort the objects by title while keeping title tokenized?

Thanks in advance.

Praveen


************************************************************** 
Praveen Peddi
Sr Software Engg, Context Media, Inc. 
email:ppeddi@contextmedia.com 
Tel:  401.854.3475 
Fax:  401.861.3596 
web: http://www.contextmedia.com 
************************************************************** 
Context Media- "The Leader in Enterprise Content Integration" 

Re: Sorting and tokenization

Posted by Praveen Peddi <pp...@contextmedia.com>.
The solution you suggested is exactly what I expected, and I had already
thought about implementing it. But the problem is memory inefficiency.
Sometimes titles are huge. And with i18n, a title can be in Japanese, Chinese
or any other language, which takes more memory than English.

OK, how about taking the first token of the title and using it just for
sorting? Does anyone see any problem with it? This solution saves at least
some memory compared to the other solution.
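
One possible problem with a first-token sort key, sketched in plain Java
rather than Lucene code: every title that starts with the same word gets the
same key, so those titles are not ordered relative to each other. The class
and method names below are hypothetical, and the fixed-length prefix key is
only an assumed alternative for comparison, not something proposed in this
thread. Note also that a title written without spaces (common in Japanese or
Chinese) is a single whitespace-delimited token, so this key would save
nothing there; what an actual analyzer would emit as the first token may
differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustration only: compares two cheap sort keys. Not Lucene API. */
public class SortKeySketch {

    /** Sort key made of the first whitespace-delimited token only. */
    static String firstToken(String title) {
        String trimmed = title.trim();
        int space = trimmed.indexOf(' ');
        return space < 0 ? trimmed : trimmed.substring(0, space);
    }

    /** Hypothetical alternative: at most maxChars leading UTF-16 chars. */
    static String prefix(String title, int maxChars) {
        String trimmed = title.trim();
        return trimmed.length() <= maxChars ? trimmed : trimmed.substring(0, maxChars);
    }

    /** Returns a sorted copy; List.sort is stable, so ties keep input order. */
    static List<String> sortBy(List<String> titles, Comparator<String> order) {
        List<String> copy = new ArrayList<>(titles);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("The Zebra", "The Apple", "An Orange");
        // Both "The ..." titles share the key "The", so they stay in input order.
        System.out.println(sortBy(titles, Comparator.comparing(SortKeySketch::firstToken)));
        // An 8-character prefix is still small but separates the two titles.
        System.out.println(sortBy(titles, Comparator.comparing(t -> prefix(t, 8))));
    }
}
```

With the first-token key, "The Zebra" stays ahead of "The Apple" because
their keys are equal. Whether that is acceptable depends on how many titles
share a leading word.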

Praveen

----- Original Message ----- 
From: "John Moylan" <jo...@rte.ie>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, July 01, 2004 10:24 AM
Subject: Re: Sorting and tokenization


> Hi,
>
> You just need to have another title field that is not tokenized - for
> sorting purposes.
>
> Best,
> John


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Sorting and tokenization

Posted by John Moylan <jo...@rte.ie>.
Hi,

You just need to have another title field that is not tokenized - for
sorting purposes.
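
A minimal, library-independent sketch of that idea, in plain Java: each
document keeps the title twice, once broken into terms for matching and once
as a single untokenized value used only for ordering. This is a toy model and
not Lucene API; the names Doc, titleTerms, titleSort and searchSortedByTitle
are made up for illustration. In Lucene the same effect would presumably come
from adding the title under two field names, one tokenized and one not, and
sorting on the untokenized one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of the two-field idea; this is NOT Lucene API. */
public class TwoFieldSketch {

    /** One indexed document: title terms for matching, whole title for ordering. */
    static final class Doc {
        final Set<String> titleTerms; // plays the role of the tokenized "title" field
        final String titleSort;       // plays the role of the untokenized sort field

        Doc(String title) {
            // Naive whitespace tokenization, lowercased for case-insensitive matching.
            this.titleTerms = new HashSet<>(Arrays.asList(
                    title.toLowerCase(Locale.ROOT).split("\\s+")));
            this.titleSort = title;
        }
    }

    /** Match on a single title word, then order the hits by the whole title. */
    static List<String> searchSortedByTitle(List<Doc> index, String word) {
        String term = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : index) {
            if (d.titleTerms.contains(term)) {
                hits.add(d.titleSort);
            }
        }
        hits.sort(Comparator.naturalOrder());
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> index = Arrays.asList(
                new Doc("Lucene in Action"),
                new Doc("Action Plans for Search"),
                new Doc("Sorting Basics"));
        // Partial (single-word) match still works; hits come back ordered by full title.
        System.out.println(searchSortedByTitle(index, "action"));
    }
}
```

The cost is the one raised in the reply: the whole title is held a second
time for sorting.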

Best,
John

-- 
John Moylan
----------------------
ePublishing
Radio Telefis Eireann,
Montrose House,
Donnybrook,
Dublin 4,
Eire
t:+353 1 2083564
e:john.moylan@rte.ie


******************************************************************************
The information in this e-mail is confidential and may be legally privileged.
It is intended solely for the addressee. Access to this e-mail by anyone else
is unauthorised. If you are not the intended recipient, any disclosure,
copying, distribution, or any action taken or omitted to be taken in reliance
on it, is prohibited and may be unlawful.
Please note that emails to, from and within RTÉ may be subject to the Freedom
of Information Act 1997 and may be liable to disclosure.
******************************************************************************
