Posted to dev@httpd.apache.org by Brian Behlendorf <br...@hyperreal.org> on 1998/05/31 05:16:32 UTC

Re: Search service done-ish

(a group of us were talking about implementing a better search engine onto
www.apache.org, and decided to continue it here)

At 12:10 AM 5/30/98 +0200, Dirk-Willem van Gulik wrote:
>Right, after 1 hour of fiddling and 4 hours wasted on a bug in swish... I
>am kind of ready to proceed. Some quick questions. 
>
>1. I'd like to move this discussion on to the new-httpd 
>   list from here on. And take it from there.

Sure.  :)

>2. I assume Brian wants to set up a virtual host for the
>   search service ? search.apache.org ?

Yup, I'd be happy to do this.  Ideally, as with bugs.apache.org, the
index.cgi at the root level is the search engine.

>3. Meanwhile, the current service ignores mail, but it has
>   special hacks for the bugs database and a simple
>   REFERER-based guess for the mirror sites.

Woohoo!  This'll be interesting to see how well it works.
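
For illustration only, here is one way a REFERER-based guess could look.
This is a sketch under assumptions, not the code the service runs: the
helper name mirror_base_from_referer is hypothetical, and it assumes the
guess amounts to reusing the "http://host" part of the referring page so
that result links point back at the mirror the visitor came from.

```c
#include <string.h>

/* Hypothetical helper (not from the actual search service): guess the
 * mirror base URL ("http://host") from a Referer header value.
 * Writes the result into base, which holds base_size bytes.
 * Returns 1 on success, or 0 if the referer is missing, is not an
 * http:// URL, has no host, or does not fit; the caller would then
 * fall back to a default site. */
static int mirror_base_from_referer(const char *referer,
                                    char *base, size_t base_size)
{
    static const char scheme[] = "http://";
    const size_t scheme_len = sizeof scheme - 1;
    size_t host_len, total;

    if (referer == NULL || strncmp(referer, scheme, scheme_len) != 0)
        return 0;

    /* The host part ends at the first path, query, or fragment marker. */
    host_len = strcspn(referer + scheme_len, "/?#");
    if (host_len == 0)
        return 0;

    total = scheme_len + host_len;
    if (total + 1 > base_size)      /* leave room for the terminator */
        return 0;

    memcpy(base, referer, total);
    base[total] = '\0';
    return 1;
}
```

With a referer of "http://mirror.example.org/search.html" (a made-up
host), the helper would fill base with "http://mirror.example.org".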

>4. It currently also indexes the .c and .h files. Dunno if
>   this is useful.

You mean through the /dist/apache_1.3b7/ directory?  Hmm.  I know we added
that there so we could link directly to files in /src/, but it has a
drawback in that it duplicates the bundled documentation and other things.
So we get duplicates.  Maybe we should change it so that we have a
http://www.apache.org/src/ directory, which is just a CVS checkout of the
current /src/ subdirectory, the same way /docs/ is a checkout of
/htdocs/manual?  Is there a reason to be able to have the trees expanded in
/dist/?  Also recall we can get any CVS version of any file through
http://www.apache.org/websrc/cvsweb.cgi.

>5. you can play at
>
>	http://mda04.jrc.it/search/search.html
>
>   This interface is intentionally simplified. You
>   can get the full whack back; but HotBot, Infoseek
>   and AltaVista will always be better :-)

It looks like a good start to me... when it's indexing things locally on
www.apache.org it should look a little better.

>6. you will notice that it finds quite a few 'old' leaf
>   directories and old files which were kind of hidden
>   but now suddenly show up. These should be added to
>   a small ignore file.

Or we should decide if we want them around...

>7. strcpy calls without proper boundary checking suck. They
>   sucked up about 4 wasted hours. :-((

Hmm, maybe they could use the ap_ string library?  :)
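
As a rough sketch of what a bounds-checked copy looks like (this is a
hypothetical helper named bounded_copy, written for illustration; it is
not the ap_ library's own routine and not swish code), the idea is to
never write past the destination buffer and to always terminate it:

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: copy at most
 * dst_size - 1 bytes from src into dst and always NUL-terminate.
 * Returns 1 if the whole string fit, 0 if it was truncated
 * (or if dst_size is 0 and nothing could be written). */
static int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t i;

    if (dst_size == 0)
        return 0;               /* no room even for the terminator */

    for (i = 0; i + 1 < dst_size && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    /* If we stopped before the end of src, the copy was truncated. */
    return src[i] == '\0';
}
```

Unlike a bare strcpy, the caller passes the buffer size and can check
the return value to notice truncation instead of overrunning the buffer.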

>Bear in mind that some links might not work as they assume 
>that you pull the page from a valid mirror.
>
>If I do not get flamed instantly; I suggest I put this
>into CVS.

Putting the config files and interfaces into a CVS module makes some sense,
sure.  For now I'll create a good directory for it on taz and let you get
started.

	Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
pure chewing satisfaction                                  brian@apache.org
                                                        brian@hyperreal.org

Re: Search service done-ish

Posted by Dirk-Willem van Gulik <di...@jrc.it>.
On Sat, 30 May 1998, Brian Behlendorf wrote:

> 
> (a group of us were talking about implementing a better search engine onto
> www.apache.org, and decided to continue it here)
 
> Yup, I'd be happy to do this.  Ideally, as with bugs.apache.org, the
> index.cgi at the root level is the search engine.

Consider that done.
 
> Woohoo!  This'll be interesting to see how well it works.

It should. If you try one of the proxying sites it seems fine. The mirror
sites which use mirror.pl have not picked it up yet.
 
> >4. It currently also indexes the .c and .h files. Dunno if
> >   this is useful.
 
> You mean through the /dist/apache_1.3b7/ directory?  Hmm.  I know we added

Yup.

> that there so we could link directly to files in /src/, but it has a
> drawback in that it duplicates the bundled documentation and other things.
> So we get duplicates.  Maybe we should change it so that we have a
> http://www.apache.org/src/ directory, which is just a CVS checkout of the
> current /src/ subdirectory, the same way /docs/ is a checkout of
> /htdocs/manual?  Is there a reason to be able to have the trees expanded in
> /dist/?  Also recall we can get any CVS version of any file through
> http://www.apache.org/websrc/cvsweb.cgi.

Yup. Trying various sources suggests it is _not_ that useful. Perhaps we
need an extra category; 'search source' for exactly this.
 
> >	http://mda04.jrc.it/search/search.html
 
> It looks like a good start to me... when it's indexing things locally on
> www.apache.org it should look a little better.

It should now be at www.apache.org/search.html
 
Dw.
-- 
Why drink and drive when you can smoke and fly ?