Posted to modperl@perl.apache.org by Mark Maunder <ma...@swiftcamel.com> on 2001/10/12 17:23:08 UTC
search engine module?
I've written a search engine that searches for jobs in a database based
on keywords. I'm assembling a string of sql and then submitting it to
the database based on the user's search criteria. It's working but is
really simple right now - it just does a logical AND with all the
keywords the user submits. I'd like to include features like the ability
to submit a query like:
(perl AND apache) OR java NOT microsoft
I don't want to reinvent the wheel and I'm sure this has been done a
zillion times, so does anyone know of a module in CPAN that I can use
for this? I'm using MySQL on the back end and DBI under mod_perl, which
runs as a handler.
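To make the request concrete, here is a minimal sketch of a recursive-descent parser that turns a query like the one above into a parameterized WHERE clause. It is written in Python for brevity; the same structure should carry over to Perl and DBI. Several things are assumptions rather than anything stated in the post: the column name `description` is hypothetical, NOT is taken to bind tighter than AND, AND tighter than OR, and two adjacent terms (as in `java NOT microsoft`) are treated as an implicit AND. The `?` placeholders follow the DBI convention.

```python
import re

# Parentheses are tokens of their own; anything else non-blank is a word.
TOKEN_RE = re.compile(r'\(|\)|[^\s()]+')


class QueryParser:
    """Parse 'a AND (b OR c) NOT d' into SQL text plus bind values."""

    def __init__(self, query, column="description"):  # column is hypothetical
        self.tokens = TOKEN_RE.findall(query)
        self.pos = 0
        self.column = column
        self.params = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        sql = self.expr()
        if self.peek() is not None:
            raise ValueError("unexpected token: %r" % self.peek())
        return sql, self.params

    def expr(self):
        # expr := term (OR term)*
        parts = [self.term()]
        while self.peek() == "OR":
            self.take()
            parts.append(self.term())
        return parts[0] if len(parts) == 1 else "(" + " OR ".join(parts) + ")"

    def term(self):
        # term := factor ((AND)? factor)*   (adjacency is an implicit AND)
        parts = [self.factor()]
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()
            parts.append(self.factor())
        return parts[0] if len(parts) == 1 else "(" + " AND ".join(parts) + ")"

    def factor(self):
        # factor := NOT factor | '(' expr ')' | WORD
        tok = self.take()
        if tok is None:
            raise ValueError("unexpected end of query")
        if tok == "NOT":
            return "NOT (" + self.factor() + ")"
        if tok == "(":
            sql = self.expr()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return sql
        if tok in (")", "AND", "OR"):
            raise ValueError("unexpected token: %r" % tok)
        # The keyword travels as a bind value, never as SQL text.
        self.params.append("%" + tok + "%")
        return self.column + " LIKE ?"


sql, params = QueryParser("(perl AND apache) OR java NOT microsoft").parse()
```

The keywords never end up in the SQL string itself, which avoids injection through the search box. LIKE wildcards (`%`, `_`) typed by the user are not escaped in this sketch.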
Re: [OT] search engine module?
Posted by Bill Moseley <mo...@hank.org>.
At 02:04 PM 10/16/2001 +0100, Ged Haywood wrote:
>> > Plus lots of other stuff like Glimpse and Swish which interface to
>> > C-based engines.
>>
>> I've had good luck with http://swish-e.org/2.2/
>
>Please make sure that it's possible to do a plain ordinary literal
>text string search. Nothing fancy, no case-folding, no automatic
>removal of punctuation, nothing like that. Just a literal string.
>
>Last night I tried to find "perl -V" on all the search engines
>mentioned on the mod_perl home page and they all failed in various
>interesting ways.
I assume it depends on how the search engine is configured. With Swish, for
example, you can define which characters make up a word. I'm not sure what you
mean by a literal string. For performance reasons you can't just grep for words
(or parts of words), so words have to be extracted from the text during indexing.
You might define that a dash is OK at the start of a word but not at the
end, and that trailing dots are ignored, so you could find both -V and -V. (at
the end of a sentence).
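As a rough illustration of that kind of word-extraction rule (not Swish's actual configuration or code), the sketch below uses a single regular expression, in Python for brevity. The exact rule set is an assumption: a word may start with a dash, dashes and dots are allowed only between word characters, and no case-folding is applied.

```python
import re

# Assumed rules, loosely following the description above:
#   -?            a word may begin with a single dash
#   \w+           followed by letters, digits or underscores
#   (?:[.-]\w+)*  dots and dashes count only when more word characters follow,
#                 so a trailing dot or dash is never part of the word
WORD_RE = re.compile(r'-?\w+(?:[.-]\w+)*')


def extract_words(text):
    """Return the indexable words of text, in order, without case-folding."""
    return WORD_RE.findall(text)


words = extract_words("Run perl -V. Then check mod_perl-1.26.")
# words == ['Run', 'perl', '-V', 'Then', 'check', 'mod_perl-1.26']
```

With rules like these, "-V" is indexed the same way whether or not it ends a sentence, so a search for it can match both.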
Some search engines let you define a set of buzzwords that should be
indexed as-is, but that's more helpful for technical writing instead of
indexing code.
Finally, in swish, if you put something like "perl -V" in quotes to use a
phrase search it will find what you are looking for most likely, even if
the dash is not indexed.
Bill Moseley
mailto:moseley@hank.org
Re: search engine module?
Posted by Perrin Harkins <pe...@elem.com>.
> Please make sure that it's possible to do a plain ordinary literal
> text string search. Nothing fancy, no case-folding, no automatic
> removal of punctuation, nothing like that. Just a literal string.
>
> Last night I tried to find "perl -V" on all the search engines
> mentioned on the mod_perl home page and they all failed in various
> interesting ways.
The amazingly fast ht://Dig (http://www.htdig.org/) engine can do phrase
searching, but I'm not certain how well it does with punctuation.
- Perrin
Re: search engine module?
Posted by Kee Hinckley <na...@somewhere.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
At 11:36 AM +0800 10/19/01, Stas Bekman wrote:
>Right, my point is that WWW::Search namespace is taken :)
Ah. Sorry, my miscommunication. When I said that I "ended up making
a WWW::Search" I should have put an "an instance of" in there instead
of "a". Basically WWW::Search provided a good interface, but
everything was remote, so I wrote this. If you stick to the
conventions provided here, it should be easy to make other variations
using other local search engines. I was just surprised that nobody
seemed to have done it before.
Grep(3)        User Contributed Perl Documentation        Grep(3)

NAME
       WWW::Search::Grep - class for searching a local web site
       using grep

SYNOPSIS
           require WWW::Search;
           $search = new WWW::Search('Grep');

DESCRIPTION
       This is a grep specialization of WWW::Search.

       This class exports no public interface; all interaction
       should be done through WWW::Search objects.

OPTIONS
       The default query syntax is:

           word word OR word "quoted phrase"

       Blank-separated words are implicitly joined by AND. OR
       applies only to the words or phrases directly on either
       side of it. The model is the same as that used by Google
       (http://www.google.com/).

       search_url
           Specifies the directory to search. All .html and .htm
           files in the specified directory and any
           subdirectories will be searched. This is an absolute
           pathname and is required. E.g.
           /home/httpd/html/foo/searchdir/

       base_path
           This is the part of that pathname that should be
           stripped off before prefixing the base_url. This is
           required. E.g. /home/httpd/html/

       base_url
           This is prepended to the pathname after stripping the
           base_path. This is optional; the default is none.
           E.g. http://www.somewhere.com/ or /

       search_debug, search_parse_debug
           See WWW::Search

       grep
           Pathname to grep; the default is /bin/egrep.

AUTHOR
       Kee Hinckley, nazgul@somewhere.com
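How the three path options fit together can be sketched as follows. This is an illustration in Python of the behaviour the documentation describes, not the module's own code; the function names `find_pages` and `path_to_url` are hypothetical.

```python
import os


def find_pages(search_dir):
    """Yield every .html/.htm file under search_dir, including subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(search_dir):
        for name in sorted(filenames):
            if name.endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)


def path_to_url(path, base_path, base_url=""):
    """Strip base_path from the front of path, then prefix base_url."""
    if not path.startswith(base_path):
        raise ValueError("%r is not under base_path %r" % (path, base_path))
    return base_url + path[len(base_path):]


hit = "/home/httpd/html/foo/searchdir/index.html"
absolute = path_to_url(hit, "/home/httpd/html/", "http://www.somewhere.com/")
relative = path_to_url(hit, "/home/httpd/html/", "/")
```

Here `absolute` is `http://www.somewhere.com/foo/searchdir/index.html` and `relative` is `/foo/searchdir/index.html`, matching the two base_url examples given in the documentation.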
- --
Kee Hinckley - Somewhere.Com, LLC
http://consulting.somewhere.com/
nazgul@somewhere.com (or ...!alice!nazgul for time travelers :-)
I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3
iQA/AwUBO8+mtSZsPfdw+r2CEQI1+wCeI3s9JcPuXvaexrriahCWnjtTS/kAnjl3
v7uvLYWz4xxxc2weT/qU0f2n
=MXIA
-----END PGP SIGNATURE-----
Re: search engine module?
Posted by Stas Bekman <st...@stason.org>.
Kee Hinckley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> At 12:56 AM +0800 10/19/01, Stas Bekman wrote:
>
>>Kee Hinckley wrote:
>>
>>
>>>-----BEGIN PGP SIGNED MESSAGE-----
>>>Hash: SHA1
>>>
>>>People have been talking about backend search engines, but when I
>>>saw the subject I was thinking more about front end classes. In
>>>particular, last time I looked there wasn't a standard class for
>>>integrating local search engines into your code. I ended up making
>>>a WWW::Search, but you kind of have to tweak the meaning of some
>>>values. If anyone is interested I ought to release it. It's a
>>>trivial example for very small web sites (it provides google-like
>>>search syntax, and backends it with grep).
>>>
>>
>>You should have checked CPAN first: There is a load of WWW::Search::
>>modules there.
>>
>
> Yes. But my point is that they are all *offsite* searches as far as
> I can tell. What I wanted was a standard interface to a local search
> engine.
Right, my point is that WWW::Search namespace is taken :)
--
_____________________________________________________________________
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:stas@stason.org http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
Re: search engine module?
Posted by Kee Hinckley <na...@somewhere.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
At 12:56 AM +0800 10/19/01, Stas Bekman wrote:
>Kee Hinckley wrote:
>
>>-----BEGIN PGP SIGNED MESSAGE-----
>>Hash: SHA1
>>
>>People have been talking about backend search engines, but when I
>>saw the subject I was thinking more about front end classes. In
>>particular, last time I looked there wasn't a standard class for
>>integrating local search engines into your code. I ended up making
>>a WWW::Search, but you kind of have to tweak the meaning of some
>>values. If anyone is interested I ought to release it. It's a
>>trivial example for very small web sites (it provides google-like
>>search syntax, and backends it with grep).
>
>
>You should have checked CPAN first: There is a load of WWW::Search::
>modules there.
Yes. But my point is that they are all *offsite* searches as far as
I can tell. What I wanted was a standard interface to a local search
engine.
- --
Kee Hinckley - Somewhere.Com, LLC
http://consulting.somewhere.com/
nazgul@somewhere.com (or ...!alice!nazgul for time travelers :-)
I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3
iQA/AwUBO88W3yZsPfdw+r2CEQLQ8wCgrokvPCmktlUCSLPulsZsVwrBMdwAoMMQ
V1vsViU2nutZioKmgwVnqV22
=03cp
-----END PGP SIGNATURE-----
Re: search engine module?
Posted by Stas Bekman <st...@stason.org>.
Kee Hinckley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> People have been talking about backend search engines, but when I saw
> the subject I was thinking more about front end classes. In
> particular, last time I looked there wasn't a standard class for
> integrating local search engines into your code. I ended up making a
> WWW::Search, but you kind of have to tweak the meaning of some
> values. If anyone is interested I ought to release it. It's a
> trivial example for very small web sites (it provides google-like
> search syntax, and backends it with grep).
You should have checked CPAN first: There is a load of WWW::Search::
modules there.
_____________________________________________________________________
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:stas@stason.org http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
Re: search engine module?
Posted by Kee Hinckley <na...@somewhere.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
People have been talking about backend search engines, but when I saw
the subject I was thinking more about front end classes. In
particular, last time I looked there wasn't a standard class for
integrating local search engines into your code. I ended up making a
WWW::Search, but you kind of have to tweak the meaning of some
values. If anyone is interested I ought to release it. It's a
trivial example for very small web sites (it provides google-like
search syntax, and backends it with grep).
- --
Kee Hinckley - Somewhere.Com, LLC
http://consulting.somewhere.com/
nazgul@somewhere.com (or ...!alice!nazgul for time travelers :-)
I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.
-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3
iQA/AwUBO88CGCZsPfdw+r2CEQLj9ACfSqjkFgwvFR0iXWRRS9B2oM6EcZ8AoNSd
6jkha/LM8cS1ia4mYti8tiGW
=yXL9
-----END PGP SIGNATURE-----
Re: search engine module?
Posted by Stas Bekman <st...@stason.org>.
Daniel Sully wrote:
> Is the engine used at the math forum publicly available?
I don't know. Why don't you ask them :)
> Once upon a time Stas Bekman shaped the electrons to say...
>
>
>>the engine at mathforum does a great job, it's the best mailing list
>>archive search engine that I've ever seen, in regards to searching Perl
>>strings and code in general. Just make sure to use the right options at:
>>http://mathforum.org/discussions/epi-search/modperl.html
>>
>
> -D
> --
> <Zim> I am the neighbourhood baby inspector. I have come to inspect the baby.
> <Mother> Oh, goodness! Inspect him for what?
> <Zim> YOUR RESISTANCE WILL BE NOTED!
>
--
_____________________________________________________________________
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:stas@stason.org http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
Re: search engine module?
Posted by Daniel Sully <da...@electricrain.com>.
Is the engine used at the math forum publicly available?
Once upon a time Stas Bekman shaped the electrons to say...
> the engine at mathforum does a great job, it's the best mailing list
> archive search engine that I've ever seen, in regards to searching Perl
> strings and code in general. Just make sure to use the right options at:
> http://mathforum.org/discussions/epi-search/modperl.html
-D
--
<Zim> I am the neighbourhood baby inspector. I have come to inspect the baby.
<Mother> Oh, goodness! Inspect him for what?
<Zim> YOUR RESISTANCE WILL BE NOTED!
Re: search engine module?
Posted by Oleg Bartunov <ol...@sai.msu.su>.
We use OpenFTS (http://openfts.sourceforge.net) at
postgresql mailing list archive ( http://fts.postgresql.org).
Regards,
Oleg
On Wed, 17 Oct 2001, Stas Bekman wrote:
> Ged Haywood wrote:
>
> > Hi all,
> >
> > On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:
> >
> >
> >>On Fri, 12 Oct 2001, Perrin Harkins wrote:
> >>
> >>[...]
> >>
> >>>Plus lots of other stuff like Glimpse and Swish which interface to C-based
> >>>engines.
> >>>
> >>I've had good luck with http://swish-e.org/2.2/
> >>
> >
> > Please make sure that it's possible to do a plain ordinary literal
> > text string search. Nothing fancy, no case-folding, no automatic
> > removal of punctuation, nothing like that. Just a literal string.
> >
> > Last night I tried to find "perl -V" on all the search engines
> > mentioned on the mod_perl home page and they all failed in various
> > interesting ways.
> >
> > If somebody knows what I'm doing wrong, please post.
>
> the engine at mathforum does a great job, it's the best mailing list
> archive search engine that I've ever seen, in regards to searching Perl
> strings and code in general. Just make sure to use the right options at:
> http://mathforum.org/discussions/epi-search/modperl.html
>
>
> _____________________________________________________________________
> Stas Bekman JAm_pH -- Just Another mod_perl Hacker
> http://stason.org/ mod_perl Guide http://perl.apache.org/guide
> mailto:stas@stason.org http://ticketmaster.com http://apacheweek.com
> http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Re: search engine module?
Posted by Stas Bekman <st...@stason.org>.
Ged Haywood wrote:
> Hi all,
>
> On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:
>
>
>>On Fri, 12 Oct 2001, Perrin Harkins wrote:
>>
>>[...]
>>
>>>Plus lots of other stuff like Glimpse and Swish which interface to C-based
>>>engines.
>>>
>>I've had good luck with http://swish-e.org/2.2/
>>
>
> Please make sure that it's possible to do a plain ordinary literal
> text string search. Nothing fancy, no case-folding, no automatic
> removal of punctuation, nothing like that. Just a literal string.
>
> Last night I tried to find "perl -V" on all the search engines
> mentioned on the mod_perl home page and they all failed in various
> interesting ways.
>
> If somebody knows what I'm doing wrong, please post.
the engine at mathforum does a great job, it's the best mailing list
archive search engine that I've ever seen, in regards to searching Perl
strings and code in general. Just make sure to use the right options at:
http://mathforum.org/discussions/epi-search/modperl.html
_____________________________________________________________________
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:stas@stason.org http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
Re: search engine module?
Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi all,
On Mon, 15 Oct 2001, Ask Bjoern Hansen wrote:
> On Fri, 12 Oct 2001, Perrin Harkins wrote:
>
> [...]
> > Plus lots of other stuff like Glimpse and Swish which interface to C-based
> > engines.
>
> I've had good luck with http://swish-e.org/2.2/
Please make sure that it's possible to do a plain ordinary literal
text string search. Nothing fancy, no case-folding, no automatic
removal of punctuation, nothing like that. Just a literal string.
Last night I tried to find "perl -V" on all the search engines
mentioned on the mod_perl home page and they all failed in various
interesting ways.
If somebody knows what I'm doing wrong, please post.
73,
Ged.
Re: search engine module?
Posted by Ask Bjoern Hansen <as...@valueclick.com>.
On Fri, 12 Oct 2001, Perrin Harkins wrote:
[...]
> Plus lots of other stuff like Glimpse and Swish which interface to C-based
> engines.
I've had good luck with http://swish-e.org/2.2/
- ask
--
ask bjoern hansen, http://ask.netcetera.dk/ !try; do();
more than a billion impressions per week, http://valueclick.com
Re: search engine module?
Posted by Perrin Harkins <pe...@elem.com>.
> I don't want to reinvent the wheel and I'm sure this has been done a
> zillion times, so does anyone know of a module in CPAN that I can use
> for this?
Have you tried searching on http://search.cpan.org/?
DBIx::FullTextSearch
DBIx::TextIndex
Search::InvertedIndex
Plus lots of other stuff like Glimpse and Swish which interface to C-based
engines.
- Perrin
Re: search engine module? [drifting OT DBI related]
Posted by Mark Maunder <ma...@swiftcamel.com>.
Mark Maunder wrote:
> I've started using
> MySQL's MATCH/AGAINST with fulltext indexes instead, and it is extremely
> fast (!!), but am waiting for a feature that's available in mysql 4.0 (due
> end of this month) that allows you to use +word and -word syntax to specify
> required or unwanted keywords. Also, just as an aside, MATCH/AGAINST only
> works with MyISAM tables, so I've had to convert some of mine from InnoDB at
> the cost of losing transactions.
er - lo and behold, mysql 4.0 alpha was released a few minutes ago by
Monty.
http://www.mysql.com/downloads/mysql-4.0.html
Re: search engine module? [drifting OT DBI related]
Posted by Mark Maunder <ma...@swiftcamel.com>.
"Matt J. Avitable" wrote:
> Hi,
>
> > I've written a search engine that searches for jobs in a database based
> > on keywords. I'm assembling a string of sql and then submitting it to
> > the database based on the user's search criteria. It's working but is
>
> It sounds like you are writing a web front end for mysql. I'm not
> sure about modules on cpan about that specifically. If you wanted to get
> a bit more fancy, you might try DBIx::FullTextSearch.
Thanks. I checked out FullTextSearch on some earlier advice and it's not
exactly what I'm after, but quite useful nonetheless. I've started using
MySQL's MATCH/AGAINST with fulltext indexes instead, and it is extremely
fast (!!), but am waiting for a feature that's available in mysql 4.0 (due
end of this month) that allows you to use +word and -word syntax to specify
required or unwanted keywords. Also, just as an aside, MATCH/AGAINST only
works with MyISAM tables, so I've had to convert some of mine from InnoDB at
the cost of losing transactions.
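For reference, the +word / -word feature is MySQL's boolean full-text mode, written as `AGAINST (... IN BOOLEAN MODE)`. The sketch below, in Python for brevity, builds such a search string from separate keyword lists. The table `jobs`, its columns `title` and `description`, and the helper name are hypothetical; the `?` placeholder follows the DBI convention, and a FULLTEXT index covering the same columns is assumed.

```python
import re

# Characters with special meaning in boolean mode. They are stripped from
# user input here, which is crude ("C++" becomes "C"); an assumption made
# for the sake of a short example.
_OPERATOR_CHARS = re.compile(r'[+\-<>()~*"@]')

# Hypothetical table and columns.
SEARCH_SQL = (
    "SELECT id, title FROM jobs "
    "WHERE MATCH (title, description) AGAINST (? IN BOOLEAN MODE)"
)


def boolean_mode_query(required=(), optional=(), excluded=()):
    """Build a string such as '+perl +apache java -microsoft'."""
    parts = []
    for prefix, words in (("+", required), ("", optional), ("-", excluded)):
        for word in words:
            word = _OPERATOR_CHARS.sub("", word)
            if word:  # skip words that consisted only of operator characters
                parts.append(prefix + word)
    return " ".join(parts)


against = boolean_mode_query(required=["perl", "apache"],
                             excluded=["microsoft"])
# against == '+perl +apache -microsoft'
```

The resulting string is passed as the single bind value for `SEARCH_SQL`, so the user's keywords stay out of the SQL text.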
Re: search engine module?
Posted by "Matt J. Avitable" <mj...@escapement.net>.
Hi,
> I've written a search engine that searches for jobs in a database based
> on keywords. I'm assembling a string of sql and then submitting it to
> the database based on the user's search criteria. It's working but is
It sounds like you are writing a web front end for mysql. I'm not
sure about modules on cpan about that specifically. If you wanted to get
a bit more fancy, you might try DBIx::FullTextSearch.
This module is nice, though mysql specific. It creates an index
of your content (even rows in a db), and allows the user to perform
boolean searches on that index. Word stemming is also available by
installing a separate module which FullTextSearch uses.
It's a tad sluggish when the number of rows gets to be above 40,000, but
certainly not unusable.
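The general idea behind such an index can be sketched in a few lines. This is a generic in-memory inverted index in Python, not DBIx::FullTextSearch's implementation or API; the function names and sample rows are made up for illustration.

```python
import re
from collections import defaultdict


def build_index(rows):
    """Map each lower-cased word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r'\w+', text.lower()):
            index[word].add(row_id)
    return index


def search(index, all_of, none_of=()):
    """Return ids of rows containing every word in all_of and none in none_of."""
    if not all_of:
        return set()
    # Intersecting builds a new set, so the index itself is never modified.
    result = set.intersection(*(index.get(w.lower(), set()) for w in all_of))
    for word in none_of:
        result -= index.get(word.lower(), set())
    return result


rows = {
    1: "Perl and Apache developer",
    2: "Java developer at Microsoft",
    3: "Perl developer at Microsoft",
}
index = build_index(rows)
matches = search(index, all_of=["perl"], none_of=["microsoft"])
# matches == {1}
```

Boolean searches reduce to set operations on the per-word id sets, which is why lookups stay cheap compared with scanning every row.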
hth, matt
>I don't want to reinvent the wheel and I'm sure this has been done a
>zillion times, so does anyone know of a module in CPAN that I can use
>for this? I'm using MySQL on the back end and DBI under mod_perl, which
>runs as a handler.
--
## Matt J. Avitable (mja@escapement.net)
## General Partner / Programmer
## Escapement Arts And Media
## http://www.escapement.net/
## Phone: (804) 400-0605