Posted to modperl@perl.apache.org by Mark Maunder <ma...@swiftcamel.com> on 2001/10/12 17:23:08 UTC

search engine module?

I've written a search engine that searches for jobs in a database based
on keywords. I'm assembling a string of sql and then submitting it to
the database based on the user's search criteria. It's working but is
really simple right now - it just does a logical AND with all the
keywords the user submits. I'd like to include features like the ability
to submit a query like:
(perl AND apache) OR java NOT microsoft

I don't want to reinvent the wheel and I'm sure this has been done a
zillion times, so does anyone know of a module in CPAN that I can use
for this? I'm using MySQL on the back end and DBI under mod_perl, which
runs as a handler.
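
To make the request concrete, here is a minimal sketch of turning such a query into a parameterized WHERE clause. It is written in Python purely for illustration and is not taken from any CPAN module. The column name `description`, the LIKE-based matching, and the precedence rules (NOT binds tightest, then AND, then OR, with adjacent terms implicitly ANDed) are all assumptions, so "java NOT microsoft" is read as "java AND NOT microsoft".

```python
import re

# Tokens are parentheses or runs of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def to_where(query, column="description"):
    """Return (sql_fragment, params) for a boolean keyword query.

    `column` is a hypothetical column name. Operators must be upper-case
    AND / OR / NOT; everything else is treated as a keyword.
    """
    tokens = TOKEN_RE.findall(query)
    params = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        parts = [parse_and()]
        while peek() == "OR":
            take()
            parts.append(parse_and())
        return parts[0] if len(parts) == 1 else "(" + " OR ".join(parts) + ")"

    def parse_and():
        parts = [parse_factor()]
        # Keep going until the group ends; an explicit AND is optional.
        while peek() not in (None, ")", "OR"):
            if peek() == "AND":
                take()
            parts.append(parse_factor())
        return parts[0] if len(parts) == 1 else "(" + " AND ".join(parts) + ")"

    def parse_factor():
        tok = peek()
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError("unexpected %r" % (tok,))
        take()
        if tok == "NOT":
            return "NOT (" + parse_factor() + ")"
        if tok == "(":
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return inner
        # Keywords travel as bound parameters, never inside the SQL text.
        params.append("%" + tok + "%")
        return column + " LIKE ?"

    sql = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected %r" % (peek(),))
    return sql, params
```

Calling `to_where("(perl AND apache) OR java NOT microsoft")` yields a fragment with four `?` placeholders plus the list `['%perl%', '%apache%', '%java%', '%microsoft%']`, which can be handed to a prepared statement. The sketch does not escape `%` or `_` inside keywords, and LIKE scans will be slow on large tables.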




Re: [OT] search engine module?

Posted by Bill Moseley <mo...@hank.org>.
At 02:04 PM 10/16/2001 +0100, Ged Haywood wrote:
>> > Plus lots of other stuff like Glimpse and Swish which interface to
>> > C-based engines.
>> 
>> I've had good luck with http://swish-e.org/2.2/
>
>Please make sure that it's possible to do a plain ordinary literal
>text string search.  Nothing fancy, no case-folding, no automatic
>removal of punctuation, nothing like that.  Just a literal string.
>
>Last night I tried to find "perl -V" on all the search engines
>mentioned on the mod_perl home page and they all failed in various
>interesting ways.

I assume it depends on how the search engine is configured.  In Swish, for
example, you can define which characters make up a word.  Not sure what you
mean by literal
string.  For performance reasons you can't just grep words (or parts of
words), so you have to extract out words from the text during indexing.
You might define that a dash is ok at the start of a word, but not at the
end and to ignore trailing dots, so you could find -V and -V. (at the end
of a sentence).
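
As a rough illustration of that word-extraction rule (a dash allowed at the start of a word but not at the end, trailing dots ignored), here is a small Python sketch. The regular expression is an assumption made up for this example; it does not reproduce Swish's actual configuration or behaviour.

```python
import re

# Hypothetical word definition: an optional leading dash, then word
# characters, with dots or dashes allowed only between word characters.
# A trailing dot or dash can therefore never be part of a word.
WORD_RE = re.compile(r"-?[A-Za-z0-9_]+(?:[-.][A-Za-z0-9_]+)*")


def index_words(text):
    """Extract the words an indexer would store, preserving case."""
    return WORD_RE.findall(text)
```

With this definition, `index_words("Try perl -V.")` returns `['Try', 'perl', '-V']`, so a search for `-V` can match at the end of a sentence. Case is preserved, which keeps `-V` distinct from `-v`.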

Some search engines let you define a set of buzzwords that should be
indexed as-is, but that's more helpful for technical writing instead of
indexing code.

Finally, in swish, if you put something like "perl -V" in quotes to use a
phrase search it will find what you are looking for most likely, even if
the dash is not indexed.



Bill Moseley
mailto:moseley@hank.org

Re: search engine module?

Posted by Perrin Harkins <pe...@elem.com>.
> Please make sure that it's possible to do a plain ordinary literal
> text string search.  Nothing fancy, no case-folding, no automatic
> removal of punctuation, nothing like that.  Just a literal string.
>
> Last night I tried to find "perl -V" on all the search engines
> mentioned on the mod_perl home page and they all failed in various
> interesting ways.

The amazingly fast ht://Dig (http://www.htdig.org/) engine can do phrase
searching, but I'm not certain how well it does with punctuation.
- Perrin


Re: search engine module?

Posted by Kee Hinckley <na...@somewhere.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 11:36 AM +0800 10/19/01, Stas Bekman wrote:
>Right, my point is that WWW::Search namespace is taken :)

Ah.  Sorry, my miscommunication.  When I said that I "ended up making 
a WWW::Search" I should have put an "an instance of" in there instead 
of "a".  Basically WWW::Search provided a good interface, but 
everything was remote, so I wrote this.  If you stick to the 
conventions provided here, it should be easy to make other variations 
using other local search engines.  I was just surprised that nobody 
seemed to have done it before.

Grep(3)        User Contributed Perl Documentation        Grep(3)


NAME
        WWW::Search::Grep - class for searching a local web site
        using grep

SYNOPSIS
            require WWW::Search;
            $search = new WWW::Search('Grep');


DESCRIPTION
        This is a grep specialization of WWW::Search.

        This class exports no public interface; all interaction
        should be done through WWW::Search objects.

OPTIONS
        The default query syntax is:

            word word OR word "quoted phrase"

        Blank-separated words are implicitly separated by AND.
        OR refers only to the words or phrases directly to
        either side.  The model is the same as that used by
        Google (http://www.google.com/).

        search_url
            Specifies the directory to search.  All .html and .htm
            files in the specified directory and any
            subdirectories will be searched.  This is an absolute
            pathname and is required.  E.g.
            /home/httpd/html/foo/searchdir/

        base_path
            This is the part of that pathname that should
            be stripped off before prefixing the base_url.  This
            is required.  E.g. /home/httpd/html/

        base_url
            This is prepended to the pathname after stripping the
            base_path.  This is optional, the default is none.
            E.g. http://www.somewhere.com/ or /

        search_debug,search_parse_debug
            See WWW::Search

        grep
            Pathname to grep, default is /bin/egrep.

AUTHOR
        Kee Hinckley, nazgul@somewhere.com
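
The query syntax and the base_path/base_url options documented above can be sketched roughly as follows. This is Python for illustration only and is not the module's code: the function names `parse_query` and `path_to_url` are hypothetical, and the representation of a query as a list of OR-groups that are ANDed together is an assumption about one reasonable way to model it.

```python
import re


def parse_query(query):
    """Split a query into OR-groups; the groups are implicitly ANDed.

    A bare OR joins the term after it to the group before it. A quoted
    phrase is kept as a single term, and a quoted "OR" is an ordinary term.
    """
    groups = []
    join_next = False
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if word == "OR":
            join_next = bool(groups)  # a leading OR has nothing to join to
            continue
        term = phrase or word
        if not term:
            continue  # skip an empty "" phrase
        if join_next:
            groups[-1].append(term)
        else:
            groups.append([term])
        join_next = False
    return groups


def path_to_url(path, base_path, base_url=""):
    """Strip base_path from a matched file's path and prefix base_url."""
    if not path.startswith(base_path):
        raise ValueError("path is not under base_path")
    return base_url + path[len(base_path):]
```

For example, `parse_query('perl apache OR java "mod perl"')` gives `[['perl'], ['apache', 'java'], ['mod perl']]`, and `path_to_url("/home/httpd/html/foo/searchdir/a.html", "/home/httpd/html/", "/")` gives `/foo/searchdir/a.html`.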


- -- 

Kee Hinckley - Somewhere.Com, LLC
http://consulting.somewhere.com/
nazgul@somewhere.com (or ...!alice!nazgul for time travelers :-)

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3

iQA/AwUBO8+mtSZsPfdw+r2CEQI1+wCeI3s9JcPuXvaexrriahCWnjtTS/kAnjl3
v7uvLYWz4xxxc2weT/qU0f2n
=MXIA
-----END PGP SIGNATURE-----

Re: search engine module?

Posted by Stas Bekman <st...@stason.org>.
Kee Hinckley wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> At 12:56 AM +0800 10/19/01, Stas Bekman wrote:
> 
>>Kee Hinckley wrote:
>>
>>
>>>-----BEGIN PGP SIGNED MESSAGE-----
>>>Hash: SHA1
>>>
>>>People have been talking about backend search engines, but when I 
>>>saw the subject I was thinking more about front end classes.  In 
>>>particular, last time I looked there wasn't a standard class for 
>>>integrating local search engines into your code.  I ended up making 
>>>a WWW::Search, but you kind of have to tweak the meaning of some 
>>>values.  If anyone is interested I ought to release it.  It's a 
>>>trivial example for very small web sites (it provides google-like 
>>>search syntax, and backends it with grep).
>>>
>>
>>You should have checked CPAN first: There is a load of WWW::Search:: 
>>modules there.
>>
> 
> Yes.  But my point is that they are all *offsite* searches as far as 
> I can tell.  What I wanted was a standard interface to a local search 
> engine.

Right, my point is that WWW::Search namespace is taken :)

-- 


_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:stas@stason.org  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/


Re: search engine module?

Posted by Kee Hinckley <na...@somewhere.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 12:56 AM +0800 10/19/01, Stas Bekman wrote:
>Kee Hinckley wrote:
>
>>-----BEGIN PGP SIGNED MESSAGE-----
>>Hash: SHA1
>>
>>People have been talking about backend search engines, but when I 
>>saw the subject I was thinking more about front end classes.  In 
>>particular, last time I looked there wasn't a standard class for 
>>integrating local search engines into your code.  I ended up making 
>>a WWW::Search, but you kind of have to tweak the meaning of some 
>>values.  If anyone is interested I ought to release it.  It's a 
>>trivial example for very small web sites (it provides google-like 
>>search syntax, and backends it with grep).
>
>
>You should have checked CPAN first: There is a load of WWW::Search:: 
>modules there.

Yes.  But my point is that they are all *offsite* searches as far as 
I can tell.  What I wanted was a standard interface to a local search 
engine.
- -- 

Kee Hinckley - Somewhere.Com, LLC
http://consulting.somewhere.com/
nazgul@somewhere.com (or ...!alice!nazgul for time travelers :-)

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3

iQA/AwUBO88W3yZsPfdw+r2CEQLQ8wCgrokvPCmktlUCSLPulsZsVwrBMdwAoMMQ
V1vsViU2nutZioKmgwVnqV22
=03cp
-----END PGP SIGNATURE-----

Re: search engine module?

Posted by Stas Bekman <st...@stason.org>.
Kee Hinckley wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> People have been talking about backend search engines, but when I saw 
> the subject I was thinking more about front end classes.  In 
> particular, last time I looked there wasn't a standard class for 
> integrating local search engines into your code.  I ended up making a 
> WWW::Search, but you kind of have to tweak the meaning of some 
> values.  If anyone is interested I ought to release it.  It's a 
> trivial example for very small web sites (it provides google-like 
> search syntax, and backends it with grep).


You should have checked CPAN first: There is a load of WWW::Search:: 
modules there.







Re: search engine module?

Posted by Kee Hinckley <na...@somewhere.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

People have been talking about backend search engines, but when I saw 
the subject I was thinking more about front end classes.  In 
particular, last time I looked there wasn't a standard class for 
integrating local search engines into your code.  I ended up making a 
WWW::Search, but you kind of have to tweak the meaning of some 
values.  If anyone is interested I ought to release it.  It's a 
trivial example for very small web sites (it provides google-like 
search syntax, and backends it with grep).
- -- 

Kee Hinckley - Somewhere.Com, LLC
http://consulting.somewhere.com/
nazgul@somewhere.com (or ...!alice!nazgul for time travelers :-)

I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3

iQA/AwUBO88CGCZsPfdw+r2CEQLj9ACfSqjkFgwvFR0iXWRRS9B2oM6EcZ8AoNSd
6jkha/LM8cS1ia4mYti8tiGW
=yXL9
-----END PGP SIGNATURE-----

Re: search engine module?

Posted by Stas Bekman <st...@stason.org>.
Daniel Sully wrote:

> Is the engine used at the math forum publicly available?


I don't know. Why don't you ask them :)


> Once upon a time Stas Bekman shaped the electrons to say...
> 
> 
>>the engine at mathforum does a great job, it's the best mailing list 
>>archive search engine that I've ever seen, in regards to searching Perl 
>>strings and code in general. Just make sure to use the right options at:
>>http://mathforum.org/discussions/epi-search/modperl.html
>>
>  
> -D
> --
> <Zim> I am the neighbourhood baby inspector. I have come to inspect the baby.
> <Mother> Oh, goodness! Inspect him for what?
> <Zim> YOUR RESISTANCE WILL BE NOTED!
> 





Re: search engine module?

Posted by Daniel Sully <da...@electricrain.com>.
Is the engine used at the math forum publicly available?

Once upon a time Stas Bekman shaped the electrons to say...

> the engine at mathforum does a great job, it's the best mailing list 
> archive search engine that I've ever seen, in regards to searching Perl 
> strings and code in general. Just make sure to use the right options at:
> http://mathforum.org/discussions/epi-search/modperl.html
 
-D
--
<Zim> I am the neighbourhood baby inspector. I have come to inspect the baby.
<Mother> Oh, goodness! Inspect him for what?
<Zim> YOUR RESISTANCE WILL BE NOTED!

Re: search engine module?

Posted by Oleg Bartunov <ol...@sai.msu.su>.
We use OpenFTS (http://openfts.sourceforge.net) at
postgresql mailing list archive ( http://fts.postgresql.org).

	Regards,
		Oleg
On Wed, 17 Oct 2001, Stas Bekman wrote:

> Ged Haywood wrote:
>
> > Hi all,
> >
> > On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:
> >
> >
> >>On Fri, 12 Oct 2001, Perrin Harkins wrote:
> >>
> >>[...]
> >>
> >>>Plus lots of other stuff like Glimpse and Swish which interface to C-based
> >>>engines.
> >>>
> >>I've had good luck with http://swish-e.org/2.2/
> >>
> >
> > Please make sure that it's possible to do a plain ordinary literal
> > text string search.  Nothing fancy, no case-folding, no automatic
> > removal of punctuation, nothing like that.  Just a literal string.
> >
> > Last night I tried to find "perl -V" on all the search engines
> > mentioned on the mod_perl home page and they all failed in various
> > interesting ways.
> >
> > If somebody knows what I'm doing wrong, please post.
>
> the engine at mathforum does a great job, it's the best mailing list
> archive search engine that I've ever seen, in regards to searching Perl
> strings and code in general. Just make sure to use the right options at:
> http://mathforum.org/discussions/epi-search/modperl.html
>
>
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: search engine module?

Posted by Stas Bekman <st...@stason.org>.
Ged Haywood wrote:

> Hi all,
> 
> On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:
> 
> 
>>On Fri, 12 Oct 2001, Perrin Harkins wrote:
>>
>>[...]
>>
>>>Plus lots of other stuff like Glimpse and Swish which interface to C-based
>>>engines.
>>>
>>I've had good luck with http://swish-e.org/2.2/
>>
> 
> Please make sure that it's possible to do a plain ordinary literal
> text string search.  Nothing fancy, no case-folding, no automatic
> removal of punctuation, nothing like that.  Just a literal string.
> 
> Last night I tried to find "perl -V" on all the search engines
> mentioned on the mod_perl home page and they all failed in various
> interesting ways.
> 
> If somebody knows what I'm doing wrong, please post.

the engine at mathforum does a great job, it's the best mailing list 
archive search engine that I've ever seen, in regards to searching Perl 
strings and code in general. Just make sure to use the right options at:
http://mathforum.org/discussions/epi-search/modperl.html




Re: search engine module?

Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi all,

On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:

> On Fri, 12 Oct 2001, Perrin Harkins wrote:
> 
> [...]
> > Plus lots of other stuff like Glimpse and Swish which interface to C-based
> > engines.
> 
> I've had good luck with http://swish-e.org/2.2/

Please make sure that it's possible to do a plain ordinary literal
text string search.  Nothing fancy, no case-folding, no automatic
removal of punctuation, nothing like that.  Just a literal string.

Last night I tried to find "perl -V" on all the search engines
mentioned on the mod_perl home page and they all failed in various
interesting ways.

If somebody knows what I'm doing wrong, please post.

73,
Ged.


Re: search engine module?

Posted by Ask Bjoern Hansen <as...@valueclick.com>.
On Fri, 12 Oct 2001, Perrin Harkins wrote:

[...]
> Plus lots of other stuff like Glimpse and Swish which interface to C-based
> engines.

I've had good luck with http://swish-e.org/2.2/


 - ask

-- 
ask bjoern hansen, http://ask.netcetera.dk/         !try; do();
more than a billion impressions per week, http://valueclick.com


Re: search engine module?

Posted by Perrin Harkins <pe...@elem.com>.
> I don't want to reinvent the wheel and I'm sure this has been done a
> zillion times, so does anyone know of a module in CPAN that I can use
> for this?

Have you tried searching on http://search.cpan.org/?

DBIx::FullTextSearch
DBIx::TextIndex
Search::InvertedIndex

Plus lots of other stuff like Glimpse and Swish which interface to C-based
engines.

- Perrin


Re: search engine module? [drifting OT DBI related]

Posted by Mark Maunder <ma...@swiftcamel.com>.
Mark Maunder wrote:

>  I've started using
> MySQL's MATCH/AGAINST with fulltext indexes instead, and it is extremely
> fast (!!), but am waiting for a feature that's available in mysql 4.0 (due
> end of this month) that allows you to use +word and -word syntax to specify
> required or unwanted keywords. Also just as an aside, match/against only
> works with MyISAM tables so I've had to convert some of mine from InnoDB at
> the cost of losing transactions.

er - lo and behold, mysql 4.0 alpha has been released a few minutes ago by
Monty.
http://www.mysql.com/downloads/mysql-4.0.html







Re: search engine module? [drifting OT DBI related]

Posted by Mark Maunder <ma...@swiftcamel.com>.
"Matt J. Avitable" wrote:

> Hi,
>
> > I've written a search engine that searches for jobs in a database based
> > on keywords. I'm assembling a string of sql and then submitting it to
> > the database based on the user's search criteria. It's working but is
>
> It sounds like you are writing a web front end for mysql.  I'm not
> sure about modules on cpan about that specifically.  If you wanted to get
> a bit more fancy, you might try DBIx::FullTextSearch.

Thanks. I Checked out FullTextSearch on some earlier advice and it's not
exactly what I'm after, but quite useful none the less. I've started using
MySQL's MATCH/AGAINST with fulltext indexes instead, and it is extremely
fast (!!), but am waiting for a feature that's available in mysql 4.0 (due
end of this month) that allows you to use +word and -word syntax to specify
required or unwanted keywords. Also just as an aside, match/against only
works with MyISAM tables so I've had to convert some of mine from InnoDB at
the cost of losing transactions.
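
For reference, a hedged sketch of how the +word / -word form might be assembled once boolean mode is available. It is in Python for illustration; the table name `jobs`, the columns `title` and `description`, and the helper name are hypothetical, and the `%s` placeholder assumes a driver that uses that parameter style. The `MATCH ... AGAINST (... IN BOOLEAN MODE)` syntax itself is MySQL's.

```python
import re


def boolean_fulltext(required, excluded, table="jobs",
                     columns=("title", "description")):
    """Build (sql, params) for a MySQL boolean-mode full-text query.

    Table and column names are placeholders for illustration and must
    match an existing FULLTEXT index in a real schema.
    """
    def clean(word):
        # Drop anything that could be read as a boolean-mode operator.
        return re.sub(r"[^A-Za-z0-9_]", "", word)

    terms = ["+" + clean(w) for w in required if clean(w)]
    terms += ["-" + clean(w) for w in excluded if clean(w)]
    sql = "SELECT * FROM {t} WHERE MATCH ({c}) AGAINST (%s IN BOOLEAN MODE)".format(
        t=table, c=", ".join(columns))
    # The search expression is passed as a bound parameter.
    return sql, (" ".join(terms),)
```

`boolean_fulltext(["perl", "apache"], ["microsoft"])` returns the statement together with the parameter `'+perl +apache -microsoft'`, meaning rows must contain perl and apache and must not contain microsoft.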



Re: search engine module?

Posted by "Matt J. Avitable" <mj...@escapement.net>.
Hi,

> I've written a search engine that searches for jobs in a database based
> on keywords. I'm assembling a string of sql and then submitting it to
> the database based on the user's search criteria. It's working but is

It sounds like you are writing a web front end for mysql.  I'm not
sure about modules on cpan about that specifically.  If you wanted to get
a bit more fancy, you might try DBIx::FullTextSearch.

This module is nice, though mysql specific.  It creates an index 
of your content (even rows in a db), and allows the user to perform
boolean searches on that index.  Word stemming is also available by
installing a separate module which FullTextSearch uses. 

It's a tad sluggish when the number of rows gets to be above 40,000, but
certainly not unusable.
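
The general idea of indexing rows and running boolean searches against the index can be sketched as below. This is a toy in-memory inverted index in Python, meant only to show the technique; it is not how DBIx::FullTextSearch is implemented, and the function names and the required/excluded interface are assumptions.

```python
import re
from collections import defaultdict


def build_index(rows):
    """Map each lower-cased word to the set of row ids containing it.

    `rows` is a dict of {row_id: text}, standing in for database rows.
    """
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(row_id)
    return index


def search(index, required=(), excluded=()):
    """Return ids of rows containing every required word and no excluded one."""
    if not required:
        return set()
    # .get() avoids creating empty entries in the defaultdict.
    result = set.intersection(*(index.get(w.lower(), set()) for w in required))
    for w in excluded:
        result = result - index.get(w.lower(), set())
    return result
```

Lookups touch only the postings for the words in the query rather than scanning every row, which is the point of building the index up front. Stemming would be an extra normalization step applied to each word before it is stored or looked up.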

hth, matt


>I don't want to reinvent the wheel and I'm sure this has been done a
>zillion times, so does anyone know of a module in CPAN that I can use
>for this? I'm using MySQL on the back end and DBI under mod_perl, which
>runs as a handler.


-- 
## Matt J. Avitable (mja@escapement.net)
## General Partner / Programmer
## Escapement Arts And Media 

## http://www.escapement.net/
## Phone: (804) 400-0605