Posted to docs@httpd.apache.org by Joshua Slive <jo...@slive.ca> on 2007/06/13 17:52:11 UTC

search engine in apache docs

It would be nice to replace the current hacked-up google search of our
docs with a proper "custom search engine" from google. The problem is
that custom search engines are tied to specific google accounts, and I
don't think it is a good idea for any single person to have control
over the config.

What if we created a google account for the docs project and stored
its userid and password in the private pmc svn and used that to create
the search engine? Does this violate google's terms of service?
Opinions?

Joshua.

---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org


Re: search engine in apache docs

Posted by Joshua Slive <jo...@slive.ca>.
On 6/14/07, Tony Stevenson <to...@pc-tony.com> wrote:

> Ok, once it has come through I'll pester you so we can get started.
> Have we reached the decision that we should be using solr, over Google?

Yes. If we have people interested in putting the time in to give it a
go (that would be you and Sander), I think we should try solr.

Joshua.



Re: search engine in apache docs

Posted by Tony Stevenson <to...@pc-tony.com>.
>> > If you want to take that on Tony, I'm fine with that too. But you need
>> > to realize that a lot of the work is going to be figuring out how to
>> > get this to interact well with apache infrastructure. (What server
>> > should it run on? Does it need a custom account for the necessary cron
>> > jobs? How will this interact with our mirrors? etc.) So my opinion is
>> > that we need another volunteer who knows the apache ins and outs
>> > better to help you along. I'm not all that interested in that
>> > particular job myself.
>>
>> I can help out from the infra point of view.

Great stuff, thanks Sander.  An insight into the infra will be very 
beneficial.
I will enjoy getting to grips with the infra side of things, unless it 
is anything like the majority of my clients  :-)


>
> Great. Then if Tony wants the job we'll just need to wait for his
> account to come through and then he can have at it with your help.
> I'll put off the idea of switching to a google CSE while we give this
> a try.

Ok, once it has come through I'll pester you so we can get started.
Have we reached the decision that we should be using solr, over Google?




Re: search engine in apache docs

Posted by Joshua Slive <jo...@slive.ca>.
On 6/13/07, Sander Temme <sc...@apache.org> wrote:
>
> On Jun 13, 2007, at 5:35 PM, Joshua Slive wrote:
>
> > If you want to take that on Tony, I'm fine with that too. But you need
> > to realize that a lot of the work is going to be figuring out how to
> > get this to interact well with apache infrastructure. (What server
> > should it run on? Does it need a custom account for the necessary cron
> > jobs? How will this interact with our mirrors? etc.) So my opinion is
> > that we need another volunteer who knows the apache ins and outs
> > better to help you along. I'm not all that interested in that
> > particular job myself.
>
> I can help out from the infra point of view.

Great. Then if Tony wants the job we'll just need to wait for his
account to come through and then he can have at it with your help.
I'll put off the idea of switching to a google CSE while we give this
a try.

Joshua.



Re: search engine in apache docs

Posted by Sander Temme <sc...@apache.org>.
On Jun 13, 2007, at 5:35 PM, Joshua Slive wrote:

> If you want to take that on Tony, I'm fine with that too. But you need
> to realize that a lot of the work is going to be figuring out how to
> get this to interact well with apache infrastructure. (What server
> should it run on? Does it need a custom account for the necessary cron
> jobs? How will this interact with our mirrors? etc.) So my opinion is
> that we need another volunteer who knows the apache ins and outs
> better to help you along. I'm not all that interested in that
> particular job myself.

I can help out from the infra point of view.

S.

-- 
Sander Temme
sctemme@apache.org
PGP FP: 51B4 8727 466A 0BC3 69F4  B7B8 B2BE BC40 1529 24AF




Re: search engine in apache docs

Posted by Joshua Slive <jo...@slive.ca>.
On 6/13/07, Tony Stevenson <to...@pc-tony.com> wrote:
> I tell you what, once a decision has been made I would be very happy to
> take this on as my first task. (Above and beyond patching, of course).
> Is there a case for trialling so-called 'rival' products in a bid to
> ascertain their suitability for the task in hand?

I think that is overkill.

I was suggesting a very slight change to make our docs search a little
easier and more professional looking.

Others have suggested changing to a home-grown search engine: solr.
I'm fine with that as long as we have a volunteer to ride it through
to completion.

If you want to take that on Tony, I'm fine with that too. But you need
to realize that a lot of the work is going to be figuring out how to
get this to interact well with apache infrastructure. (What server
should it run on? Does it need a custom account for the necessary cron
jobs? How will this interact with our mirrors? etc.) So my opinion is
that we need another volunteer who knows the apache ins and outs
better to help you along. I'm not all that interested in that
particular job myself.

Joshua.



Re: search engine in apache docs

Posted by Tony Stevenson <to...@pc-tony.com>.
...

Joshua Slive wrote:
>
> I have no problem at all with that, providing someone is willing to
> take charge, drive it through to completion, and assure it is
> maintained for some period of time. My primary concern is with giving
> our users the best possible search. Our experience with running our
> own search engines on *.apache.org has been mixed at best.

I tell you what, once a decision has been made I would be very happy to 
take this on as my first task. (Above and beyond patching, of course).
Is there a case for trialling so-called 'rival' products in a bid to
ascertain their suitability for the task in hand?


...

> With regards to who could access the account, it would actually be all
> apache committers I believe. We could choose different access control
> if we wanted, but I don't think it is a big deal. (We just don't want
> anyone switching it to point to playboy.com... Or maybe we do...)
>
> With regards to 1.3 having too much weight, I would set up a separate
> search engine for each version. (I would probably also include the
> wiki in all the versions.)

This should be a relatively straightforward process once we have a
front runner.  Enabling search for a specific version should be
rather simple.

>
> Regarding google or some other public search engine, we already use
> google and they have a relatively easy system that doesn't show ads
> (and therefore earns them no direct revenue), so I don't really see
> any conflict.
>
>

...

--
Tony




Re: search engine in apache docs

Posted by Joshua Slive <jo...@slive.ca>.
On 6/13/07, Joshua Slive <jo...@slive.ca> wrote:

> With our current search box, all the revenue is going to google.

I take that back. Google isn't currently running ads on our searches,
it appears. But I believe they have the right to do so.

Anyway, with CSE they explicitly allow non-profits to opt out of ads.

Joshua.



Re: search engine in apache docs

Posted by Joshua Slive <jo...@slive.ca>.
On 6/13/07, Sander Temme <sa...@temme.net> wrote:
> Rich Bowen wrote:
>
> > I wonder, also, whether it would be possible to run solr, since it's
> > actually an Apache project. However, having never had any success
> > getting it working on my own servers, that might be a can of worms.
>
> Firmly +1 on dogfooding.  If *we* can't get Solr to work, who can?  I
> mean it's our project, right?  Even if it's from the Java Weenie side?

I have no problem at all with that, providing someone is willing to
take charge, drive it through to completion, and assure it is
maintained for some period of time. My primary concern is with giving
our users the best possible search. Our experience with running our
own search engines on *.apache.org has been mixed at best.

>
> Regarding Google, who's making the money from the searches?  I came
> across a $100M figure that Firefox is making from the search box in the
> browser.  While we'd obviously not be flying that high, would the ASF be
> entitled to any such revenue from our own search box, or the individual
> who happens to own the account?

With our current search box, all the revenue is going to google. With
CSE, since we qualify as a non-profit, we can eliminate ads entirely.
(Or we can probably participate in their revenue sharing thing if we
want; but again, that adds complication that I'm not looking for at
the moment.)

With regards to who could access the account, it would actually be all
apache committers I believe. We could choose different access control
if we wanted, but I don't think it is a big deal. (We just don't want
anyone switching it to point to playboy.com... Or maybe we do...)

With regards to 1.3 having too much weight, I would set up a separate
search engine for each version. (I would probably also include the
wiki in all the versions.)

Regarding google or some other public search engine, we already use
google and they have a relatively easy system that doesn't show ads
(and therefore earns them no direct revenue), so I don't really see
any conflict.

(Boy, I didn't think this was going to be that complicated an issue ;-)

Joshua.



Re: search engine in apache docs

Posted by Sander Temme <sc...@apache.org>.
On Jun 15, 2007, at 9:53 PM, Justin Erenkrantz wrote:

> On 6/13/07, Sander Temme <sa...@temme.net> wrote:
>> browser.  While we'd obviously not be flying that high, would the  
>> ASF be
>> entitled to any such revenue from our own search box, or the  
>> individual
>> who happens to own the account?
>
> FWIW, we don't want to be involved with any such ad-sharing from
> Google (or anyone else for that matter) - there are deep tax
> implications involved for our non-profit status.  So, turning off ads
> is just fine...  -- justin

That's why I brought it up.  I didn't even realize that this  
particular search doesn't show ads.

S.

-- 
Sander Temme
sctemme@apache.org
PGP FP: 51B4 8727 466A 0BC3 69F4  B7B8 B2BE BC40 1529 24AF




Re: search engine in apache docs

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On 6/13/07, Sander Temme <sa...@temme.net> wrote:
> browser.  While we'd obviously not be flying that high, would the ASF be
> entitled to any such revenue from our own search box, or the individual
> who happens to own the account?

FWIW, we don't want to be involved with any such ad-sharing from
Google (or anyone else for that matter) - there are deep tax
implications involved for our non-profit status.  So, turning off ads
is just fine...  -- justin



Re: search engine in apache docs

Posted by Mads Toftum <ma...@toftum.dk>.
On Wed, Jun 13, 2007 at 01:33:23PM -0400, Sander Temme wrote:
> Rich Bowen wrote:
> 
> > I wonder, also, whether it would be possible to run solr, since it's
> > actually an Apache project. However, having never had any success
> > getting it working on my own servers, that might be a can of worms.
> 
> Firmly +1 on dogfooding.  If *we* can't get Solr to work, who can?  I
> mean it's our project, right?  Even if it's from the Java Weenie side?
> 
Also +1 here as well. In fact I expect that something like Solr could
work much more efficiently with our structured documentation.

> Regarding Google, who's making the money from the searches?  I came

I'm not too worried about the $$, but given the amount of competition
among the search engines out there, choosing sides doesn't seem quite
right.

vh

Mads Toftum
-- 
http://soulfood.dk



Re: search engine in apache docs

Posted by Sander Temme <sa...@temme.net>.
Rich Bowen wrote:

> I wonder, also, whether it would be possible to run solr, since it's
> actually an Apache project. However, having never had any success
> getting it working on my own servers, that might be a can of worms.

Firmly +1 on dogfooding.  If *we* can't get Solr to work, who can?  I
mean it's our project, right?  Even if it's from the Java Weenie side?

Regarding Google, who's making the money from the searches?  I came
across a $100M figure that Firefox is making from the search box in the
browser.  While we'd obviously not be flying that high, would the ASF be
entitled to any such revenue from our own search box, or the individual
who happens to own the account?

More than nit-picking over the money (I know Apache doesn't monetize
terribly well anyway), I'm interested in the construction an
organization like Mozilla is using vis-a-vis Google.

S.

Re: search engine in apache docs

Posted by Rich Bowen <rb...@rcbowen.com>.
On Jun 13, 2007, at 11:52, Joshua Slive wrote:

> It would be nice to replace the current hacked-up google search of our
> docs with a proper "custom search engine" from google. The problem is
> that custom search engines are tied to specific google accounts, and I
> don't think it is a good idea for any single person to have control
> over the config.
>
> What if we created a google account for the docs project and stored
> its userid and password in the private pmc svn and used that to create
> the search engine? Does this violate google's terms of service?
> Opinions?

This sounds like a good idea.

I wonder, also, whether it would be possible to run solr, since it's  
actually an Apache project. However, having never had any success  
getting it working on my own servers, that might be a can of worms.

--
If we only live,
We too will go to sea in a Sieve,---
   To the hills of the Chankly Bore!





Re: search engine in apache docs

Posted by Tony Stevenson <to...@pc-tony.com>.

Joshua Slive wrote:
> It would be nice to replace the current hacked-up google search of our
> docs with a proper "custom search engine" from google. The problem is
> that custom search engines are tied to specific google accounts, and I
> don't think it is a good idea for any single person to have control
> over the config.
Odd, I was just thinking about this a few days ago.  But I wanted to get 
to grips with the docs-project as a whole before putting my size 12's 
firmly down my throat.
:-)

Does it have to be a google based SE? Are there any historical or
political reasons for this?

>
> What if we created a google account for the docs project and stored
> its userid and password in the private pmc svn and used that to create
> the search engine? Does this violate google's terms of service?

Sounds like a good place to store the details.  Who would have access to
this? Just the PMC? People in the docs-project? Or is it more granular
than that?

> Opinions?
>
> Joshua.
>



Re: search engine in apache docs

Posted by Vincent Bray <no...@gmail.com>.
On 13/06/07, Joshua Slive <jo...@slive.ca> wrote:
> What if we created a google account for the docs project and stored
> its userid and password in the private pmc svn and used that to create
> the search engine? Does this violate google's terms of service?
> Opinions?

Given the long history of 1.3 and the nature of google's pagerank, I'd
fear that any search results would be weighted too much towards the
older versions. Of course a home-grown solution could fix that.

-- 
noodl



Re: search engine in apache docs

Posted by Vincent Bray <no...@gmail.com>.
> It would be nice to replace the current hacked-up google search of our
> docs with a proper "custom search engine" from google. The problem is
> that custom search engines are tied to specific google accounts, and I
> don't think it is a good idea for any single person to have control
> over the config.
>
> What if we created a google account for the docs project and stored
> its userid and password in the private pmc svn and used that to create
> the search engine? Does this violate google's terms of service?
> Opinions?

Are you referring to this? http://google.com/coop/cse/

Looking at the examples I don't see why your suggestion would be a
problem, though it does violate the dogfood mantra.

-- 
noodl
