Posted to users@tomcat.apache.org by Marco Tedone <mt...@jemos.org> on 2003/09/05 00:32:14 UTC

[OT] Realizing a search functionality

Hi, I must admit that I don't know anything about how to implement search
functionality. The only thing I know is that most sites have a search
feature which, when you search for something, returns a list of links
more or less related to the search string.

The only things I know are:

1) An index of the web site contents should be created somehow
2) The search 'action' (I'm talking in Struts terms, but I think it could be
anything) should interact with this index to match the required string
3) A list (in what form?) containing all the links related to
the query string should be created, then read and displayed to the
client
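Marco's three steps map onto a very small program. The Java sketch below is a dependency-free toy, not Lucene: an in-memory inverted index from lowercase words to page URLs (step 1), a search method that keeps only the pages containing every query word (step 2), and a sorted list of matching links (step 3). The class name `TinySiteIndex` and the sample URLs are hypothetical, and Java 9+ is assumed for `List.of`.

```java
import java.util.*;

/** Hypothetical toy index: maps each lowercase word to the set of page URLs containing it. */
class TinySiteIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Step 1: add one page's text to the index under its URL. */
    void addPage(String url, String text) {
        for (String word : tokenize(text)) {
            postings.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
        }
    }

    /** Steps 2 and 3: return the URLs of pages containing every query word, sorted. */
    List<String> search(String query) {
        Set<String> result = null;
        for (String word : tokenize(query)) {
            Set<String> urls = postings.getOrDefault(word, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(urls);   // first word: start with its pages
            } else {
                result.retainAll(urls);         // later words: keep only pages in common
            }
        }
        return result == null ? List.of() : new ArrayList<>(result);
    }

    /** Splits text into lowercase words on any run of non-letter, non-digit characters. */
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        TinySiteIndex index = new TinySiteIndex();
        index.addPage("/tomcat.html", "Installing Tomcat on Linux");
        index.addPage("/struts.html", "Struts actions running inside Tomcat");
        System.out.println(index.search("tomcat"));         // [/struts.html, /tomcat.html]
        System.out.println(index.search("struts tomcat"));  // [/struts.html]
    }
}
```

A real engine would add ranking, text analysis such as stemming, and an on-disk index that survives restarts, which is the kind of work a library like Lucene takes on.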

Has any of you successfully implemented search functionality on your site?
Could you please point me towards some good software (preferably
open-source, preferably Jakarta, preferably Java-oriented) and patterns to use
to implement search functionality?

Many thanks,

Marco




Re: [OT] Realizing a search functionality

Posted by Marco Tedone <mt...@jemos.org>.
Sorry....I found Jakarta Lucene....I'll work on it :)

Marco
----- Original Message ----- 
From: "Marco Tedone" <mt...@jemos.org>
To: "Tomcat Users List" <to...@jakarta.apache.org>
Sent: Thursday, September 04, 2003 11:32 PM
Subject: [OT] Realizing a search functionality


> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
>




Re: [OT] Realizing a search functionality

Posted by John Turner <to...@johnturner.com>.
Thanks for the clarification.

John

Tim Funk wrote:

> Lucene indexes "documents". A document is composed of fields and does 
> not need to be (and actually is not) a physical file.



Re: [OT] Realizing a search functionality

Posted by Tim Funk <fu...@joedog.org>.
Lucene indexes "documents". A document is composed of fields and does not 
need to be (and actually is not) a physical file.

Take the simple example of a site consisting of a single dynamic web page 
backed by a database. You would create "documents" from the database data, 
putting the db data into named fields. When you then run a query, it 
returns a list of documents. As you iterate through each document, you pull 
the appropriate field out of it to reconstruct the corresponding URL.
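Tim's description can be made concrete with a stdlib-only Java sketch that mirrors the shape of the idea without using the Lucene API. Each row from a hypothetical table (`Row`) becomes a `Doc` whose named fields live in a map; the query only looks at the `body` field, and the stored `id` field is used to rebuild a URL like `/content.jsp?id=512`. The record names, field names and URL pattern are all assumptions, and Java 16+ is needed for records.

```java
import java.util.*;

/** Hypothetical database row: a primary key plus the text shown on the page. */
record Row(int id, String title, String text) {}

/** A toy "document": a set of named fields, not a physical file. */
record Doc(Map<String, String> fields) {
    String get(String name) { return fields.get(name); }
}

class RowIndexer {
    private final List<Doc> docs = new ArrayList<>();

    /** Turns each row into a document whose fields hold the row's data (Map.of rejects nulls). */
    void indexRows(List<Row> rows) {
        for (Row row : rows) {
            docs.add(new Doc(Map.of(
                    "id", String.valueOf(row.id()),
                    "title", row.title(),
                    "body", (row.title() + " " + row.text()).toLowerCase(Locale.ROOT))));
        }
    }

    /** Naive query: documents whose body field contains the word as a substring. */
    List<Doc> query(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.get("body").contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Rebuilds the page URL from the stored id field (the URL pattern is an assumption). */
    static String urlFor(Doc doc) {
        return "/content.jsp?id=" + doc.get("id");
    }

    public static void main(String[] args) {
        RowIndexer indexer = new RowIndexer();
        indexer.indexRows(List.of(
                new Row(512, "Lucene basics", "Documents are made of fields."),
                new Row(513, "Tomcat tuning", "Thread pools and connectors.")));
        for (Doc hit : indexer.query("fields")) {
            System.out.println(urlFor(hit) + "  " + hit.get("title")); // /content.jsp?id=512  Lucene basics
        }
    }
}
```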

In a nutshell, it can do what you want, but there is a lot of setup work to 
construct the documents, and a lot of work to display the results of queries 
from those documents.
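The "display results" part of the work largely comes down to turning each hit into an escaped HTML link, since titles and URLs can contain `&`, `<` or quotes. A minimal plain-Java sketch, assuming each hit already carries a reconstructed URL and a title; the `Hit` record and `ResultRenderer` class are hypothetical, the escaping is hand-rolled only to stay dependency-free, and Java 16+ is assumed.

```java
import java.util.List;

/** Hypothetical search hit: the reconstructed URL and a title to show. */
record Hit(String url, String title) {}

class ResultRenderer {
    /** Escapes the characters that are special in HTML text and attribute values. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&#39;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /** Renders hits as an ordered HTML list of links, or a short message when nothing matched. */
    static String renderResults(String query, List<Hit> hits) {
        if (hits.isEmpty()) {
            return "<p>No pages matched " + escape(query) + ".</p>";
        }
        StringBuilder html = new StringBuilder("<ol>\n");
        for (Hit hit : hits) {
            html.append("  <li><a href=\"").append(escape(hit.url())).append("\">")
                .append(escape(hit.title())).append("</a></li>\n");
        }
        return html.append("</ol>").toString();
    }

    public static void main(String[] args) {
        System.out.println(renderResults("lucene",
                List.of(new Hit("/content.jsp?id=512&view=full", "Lucene <intro>"))));
    }
}
```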

-Tim

John Turner wrote:

> 
> AFAIK, Lucene indexes files.  How then, do you index a dynamic site? The 
> only files that exist on a dynamic site are source code files. Servlets 
> would never be indexed...how then do you index the content returned from 
> the servlet?  Can Lucene do this?
> 
> The Lucene site is pretty sparse in information.  Not having worked with 
> it, and not knowing every option available when using it, I think there 
> might be some other alternatives.  
> 
> John
> 



Re: [OT] Realizing a search functionality

Posted by Marco Tedone <mt...@jemos.org>.
Thank you. I think I'll go for Lucene.

Marco
----- Original Message ----- 
From: "John Turner" <to...@johnturner.com>
To: "Tomcat Users List" <to...@jakarta.apache.org>
Sent: Friday, September 05, 2003 1:20 PM
Subject: Re: [OT] Realizing a search functionality


>
> AFAIK, Lucene indexes files.  How then, do you index a dynamic site?
> The only files that exist on a dynamic site are source code files.
> Servlets would never be indexed...how then do you index the content
> returned from the servlet?  Can Lucene do this?
>
> The Lucene site is pretty sparse in information.  Not having worked with
> it, and not knowing every option available when using it, I think there
> might be some other alternatives.  I've used Verity in the past, but
> that is a commercial product.  The other tool I've used in the past to
> great success is Atomz (http://www.atomz.com).  The "trial" is
> never-ending, so an index of up to 500 "pages" is free.  Pages also =
> URL.  The nice thing about Atomz is that it will spider your site and
> index the content returned, thus it works quite well for dynamic sites.
>
> In other words, it will take a URL like
> "http://your.domain.com/content.jsp?id=512&view=full" and index the
> content returned from that, not the actual text string of the URL.
>
> The only requirement is that you display the Atomz logo on the search
> results page.  You can pay a small annual fee to have that removed.  All
> indexes and collections are kept on the Atomz site, not yours, and you
> can define the stylesheet and template that is used to display the
> search results, as well as define the frequency of indexing.
>
> John
>
> Schalk wrote:
> > Marco
> >
> > You may to have a look at Lucene (OpenSource Jakarata project) at:
> > http://jakarta.apache.org/lucene/docs/index.html
> >
> > Kind Regards
> > Schalk Neethling
> > Volume4.Development.Multimedia.Branding
> > emotionalize.conceptualize.visualize.realize
> > Tel: +27125468436
> > Fax: +27125468436
> > email:schalk@volume4.co.za
> > web: www.volume4.co.za
> >
> >
> > :: -----Original Message-----
> > :: From: Marco Tedone [mailto:mtedone@jemos.org]
> > :: Sent: Friday, September 05, 2003 12:32 AM
> > :: To: Tomcat Users List
> > :: Subject: [OT] Realizing a search functionality
> > ::
> > :: Hi, I must admit that I don't know anything about how to realize a
search
> > :: functionality. The only thing that I know is that most sites have a
> > search
> > :: functionality which, when searching for something, return a list of
links
> > :: more or less involved in the search string.
> > ::
> > :: The only things I know are:
> > ::
> > :: 1) An index of the web site contents should be created somehow
> > :: 2) The search 'action' (I'm talking in Struts terms, but I think it
could
> > be
> > :: anything) should interact with this index to match the required
string
> > :: 3) A list (which form does it assume) containing all the links
related to
> > :: the query string should be created, eventually read and displayed to
the
> > :: client
> > ::
> > :: Did anyone of you realized succesfully a search functionality in its
> > site?
> > :: Could you please address me towards some good software (possibly
> > :: open-source, possibly Jakarta, possibly java-oriented) and  patterns
to
> > use
> > :: to realize  a search functionality?
> > ::
> > :: Many thanks,
> > ::
> > :: Marco
> > ::
> > ::
> > ::
> > ::
> > :: ---------------------------------------------------------------------
> > :: To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
> > :: For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
> >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: tomcat-user-help@jakarta.apache.org
>
>




Re: [OT] Realizing a search functionality

Posted by John Turner <to...@johnturner.com>.
Ulrich Mayring wrote:

> John Turner wrote:
> 
>> Ulrich Mayring wrote:
>>
>>> I can only recommend Lucene, it is vastly superior to any 
>>> pre-packaged search engine, because you do not depend on specific 
>>> features or behavior, but can customize everything to your needs.
>>
>>
>> Assuming you have time, money, skills, etc. to do so, which is not 
>> always the case.
> 
> 
> Skills is the key issue. It took me all of one week to write our own 
> custom search engine and I doubt that anyone would be able to install 
> and configure a third-party product any faster than that. I had no prior 
> exposure to Lucene, but of course knew my way around Java.

Hmmm...I had Atomz working for several clients by lunch one day. ;) I'm 
not arguing, just emphasizing that some of us are not Java developers. 
Granted, the question was somewhat in the context of "using Java" and not 
"using Tomcat", but not every Tomcat user is a developer.

John




Re: [OT] Realizing a search functionality

Posted by Ulrich Mayring <ul...@denic.de>.
John Turner wrote:
> Ulrich Mayring wrote:
> 
>> I can only recommend Lucene, it is vastly superior to any pre-packaged 
>> search engine, because you do not depend on specific features or 
>> behavior, but can customize everything to your needs.
> 
> Assuming you have time, money, skills, etc. to do so, which is not 
> always the case.

Skills is the key issue. It took me all of one week to write our own 
custom search engine and I doubt that anyone would be able to install 
and configure a third-party product any faster than that. I had no prior 
exposure to Lucene, but of course knew my way around Java.

So, I don't think time and money are factors here at all. BTW, the guy 
who originally wrote Lucene is now developing an OpenSource version of 
Google with major financial backing. So you can see that there is some 
serious technology behind Lucene and IMHO it's worth learning.

Ulrich



Re: [OT] Realizing a search functionality

Posted by John Turner <to...@johnturner.com>.
Ulrich Mayring wrote:

> 
> Lucene is not a search engine, but an API for writing a search engine, 
> so it can do everything that you can write in Java. By itself it does 
> nothing, like the JDK.

Thanks for the clarification.

> 
> I can only recommend Lucene, it is vastly superior to any pre-packaged 
> search engine, because you do not depend on specific features or 
> behavior, but can customize everything to your needs.

Assuming you have time, money, skills, etc. to do so, which is not 
always the case.

John




Re: [OT] Realizing a search functionality

Posted by Ulrich Mayring <ul...@denic.de>.
John Turner wrote:
> 
> AFAIK, Lucene indexes files.  How then, do you index a dynamic site? The 
> only files that exist on a dynamic site are source code files. Servlets 
> would never be indexed...how then do you index the content returned from 
> the servlet?  Can Lucene do this?

Lucene is not a search engine, but an API for writing a search engine, 
so it can do everything that you can write in Java. By itself it does 
nothing, like the JDK.

In my case I've implemented a search engine that reads local files and 
hands them to the Lucene indexer, but it could just as well be implemented 
to retrieve the pages via HTTP.
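Since the content source is up to the calling code, one way to structure it is a small interface with interchangeable implementations: one walks a local directory with `java.nio.file.Files.walk`, the other fetches pages over HTTP with `java.net.http.HttpClient`, so dynamic pages are seen as rendered. Either yields (location, text) pairs to pass on to the indexer. `PageSource`, `LocalFileSource`, `HttpSource` and `PageSourceDemo` are hypothetical names, this is not necessarily how Ulrich's engine is built, and Java 16+ is assumed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical abstraction: something that yields (location, text) pairs for an indexer. */
interface PageSource {
    Map<String, String> fetchAll() throws IOException, InterruptedException;
}

/** Reads every .html file under a directory (assumed UTF-8), keyed by path relative to the root. */
class LocalFileSource implements PageSource {
    private final Path root;

    LocalFileSource(Path root) { this.root = root; }

    @Override
    public Map<String, String> fetchAll() throws IOException {
        Map<String, String> pages = new LinkedHashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> files = paths.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".html"))
                    .sorted()
                    .toList();
            for (Path file : files) {
                String key = root.relativize(file).toString().replace('\\', '/');
                pages.put(key, Files.readString(file));
            }
        }
        return pages;
    }
}

/** Fetches a fixed list of URLs over HTTP, so dynamic pages are indexed as rendered. */
class HttpSource implements PageSource {
    private final HttpClient client = HttpClient.newHttpClient();
    private final List<String> urls;

    HttpSource(List<String> urls) { this.urls = urls; }

    @Override
    public Map<String, String> fetchAll() throws IOException, InterruptedException {
        Map<String, String> pages = new LinkedHashMap<>();
        for (String url : urls) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {   // this sketch simply skips errors and redirects
                pages.put(url, response.body());
            }
        }
        return pages;
    }
}

class PageSourceDemo {
    public static void main(String[] args) throws Exception {
        PageSource source = new LocalFileSource(Path.of(args.length > 0 ? args[0] : "."));
        source.fetchAll().keySet().forEach(System.out::println);
    }
}
```

Swapping one implementation for the other leaves the indexing code untouched, which is the flexibility Ulrich is describing.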

I can only recommend Lucene, it is vastly superior to any pre-packaged 
search engine, because you do not depend on specific features or 
behavior, but can customize everything to your needs.

Ulrich



Re: [OT] Realizing a search functionality

Posted by Louise Pryor <li...@louisepryor.com>.
On Friday, September 5, 2003 at 1:20:00 PM, John Turner wrote:

<snip>
JT>   The other tool I've used in the past to
JT> great success is Atomz (http://www.atomz.com).  The "trial" is 
JT> never-ending, so an index of up to 500 "pages" is free.  Pages also = 
JT> URL.  The nice thing about Atomz is that it will spider your site and 
JT> index the content returned, thus it works quite well for dynamic sites.

JT> In other words, it will take a URL like 
JT> "http://your.domain.com/content.jsp?id=512&view=full" and index the 
JT> content returned from that, not the actual text string of the URL.

<snip>

I use atomz, because it's free. There are a couple of issues with it:

- the template for the search results is pretty hard to get right.
- because of the spidering, session tracking through the URL is not a
good idea. It gets up to the limit of 500 *very* quickly, as the
session id part of the URL makes it think that it's a whole new page.
Luckily my web site isn't really dependent on sessions, so I was able
to get round that (but it does mean that I can't use the struts
rewriting tags...).
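If you run your own spider (which is not an option with a hosted service like Atomz), the duplicate-page problem can also be handled on the crawler side by canonicalizing URLs before counting them. When cookies are not available, servlet containers such as Tomcat encode the session as a `;jsessionid=...` path parameter, so a minimal Java sketch simply strips that parameter; `UrlNormalizer` and its methods are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class UrlNormalizer {
    // Matches a ";jsessionid=..." path parameter up to the next '?', '#' or ';' (case-insensitive).
    private static final Pattern SESSION_ID =
            Pattern.compile(";jsessionid=[^?#;]*", Pattern.CASE_INSENSITIVE);

    /** Returns the URL with any jsessionid path parameter removed. */
    static String stripSessionId(String url) {
        return SESSION_ID.matcher(url).replaceAll("");
    }

    /** De-duplicates crawled URLs so each page is counted once, keeping first-seen order. */
    static Set<String> uniquePages(List<String> crawled) {
        Set<String> pages = new LinkedHashSet<>();
        for (String url : crawled) {
            pages.add(stripSessionId(url));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> crawled = List.of(
                "http://your.domain.com/content.jsp;jsessionid=A1B2?id=512",
                "http://your.domain.com/content.jsp;jsessionid=C3D4?id=512",
                "http://your.domain.com/content.jsp?id=513");
        System.out.println(uniquePages(crawled));
        // [http://your.domain.com/content.jsp?id=512, http://your.domain.com/content.jsp?id=513]
    }
}
```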

Otherwise I'm very happy with atomz.

-- 
Louise Pryor
http://www.louisepryor.com



Re: [OT] Realizing a search functionality

Posted by John Turner <to...@johnturner.com>.
AFAIK, Lucene indexes files.  How, then, do you index a dynamic site? 
The only files that exist on a dynamic site are source code files. 
Servlets would never be indexed...how then do you index the content 
returned from the servlet?  Can Lucene do this?

The Lucene site is pretty sparse on information.  Not having worked with 
it, and not knowing every option available when using it, I think there 
might be some other alternatives.  I've used Verity in the past, but 
that is a commercial product.  The other tool I've used in the past to 
great success is Atomz (http://www.atomz.com).  The "trial" is 
never-ending, so an index of up to 500 "pages" is free, where each URL 
counts as a page.  The nice thing about Atomz is that it will spider your 
site and index the content returned, thus it works quite well for dynamic sites.

In other words, it will take a URL like 
"http://your.domain.com/content.jsp?id=512&view=full" and index the 
content returned from that, not the actual text string of the URL.
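A home-grown spider can follow the same "index what the URL returns" approach. The Java sketch below covers one step of it under simplifying assumptions: it extracts crude visible text with a regex (a real spider would use an HTML parser) and collects the double-quoted `href` links that resolve to the same host, which would be queued for the next fetch. `SpiderStep` is a hypothetical name, and malformed `href` values would make `URI.resolve` throw `IllegalArgumentException` in this sketch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for one spider step: what to index and where to go next. */
class SpiderStep {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Crude text extraction: replace tags with spaces, then collapse whitespace. */
    static String visibleText(String html) {
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    /** Resolves each double-quoted href against the page URL and keeps links on the same host. */
    static List<String> sameSiteLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Undo the most common entity found in attribute values before resolving.
            URI target = base.resolve(m.group(1).replace("&amp;", "&"));
            if (base.getHost().equalsIgnoreCase(target.getHost())) {  // false for mailto:, other hosts
                links.add(target.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p>Hello <a href=\"content.jsp?id=513\">next page</a></p>";
        System.out.println(visibleText(html));  // Hello next page
        System.out.println(sameSiteLinks("http://your.domain.com/content.jsp?id=512", html));
        // [http://your.domain.com/content.jsp?id=513]
    }
}
```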

The only requirement is that you display the Atomz logo on the search 
results page.  You can pay a small annual fee to have that removed.  All 
indexes and collections are kept on the Atomz site, not yours, and you 
can define the stylesheet and template that is used to display the 
search results, as well as define the frequency of indexing.

John

Schalk wrote:
> Marco
> 
> You may want to have a look at Lucene (OpenSource Jakarta project) at:
> http://jakarta.apache.org/lucene/docs/index.html



RE: [OT] Realizing a search functionality

Posted by Schalk <sc...@volume4.co.za>.
Marco

You may want to have a look at Lucene (OpenSource Jakarta project) at:
http://jakarta.apache.org/lucene/docs/index.html

Kind Regards
Schalk Neethling
Volume4.Development.Multimedia.Branding
emotionalize.conceptualize.visualize.realize
Tel: +27125468436
Fax: +27125468436
email:schalk@volume4.co.za
web: www.volume4.co.za
 

:: -----Original Message-----
:: From: Marco Tedone [mailto:mtedone@jemos.org]
:: Sent: Friday, September 05, 2003 12:32 AM
:: To: Tomcat Users List
:: Subject: [OT] Realizing a search functionality