Posted to general@gump.apache.org by Davanum Srinivas <di...@yahoo.com> on 2003/12/03 14:55:46 UTC

Re: Let Googlebot crawl our cvs?

Nicola,
Let's move the discussion to gump mailing list. 

Gump Team,
Here's a heads up. We want to generate browsable (fresh from CVS) source code that can be
searched/indexed by, say, Googlebot. The initial discussion happened on the infrastructure@ mailing list,
where Nicola came up with a solution: integrate the javasrc (Java-to-HTML) tool from alexandria
with Gump and post the results at a location such as http://cvs.apache.org/code. What
do you guys think?

Thanks,
dims

--- Nicola Ken Barozzi <ni...@apache.org> wrote:
> 
> Davanum Srinivas wrote:
> 
> > Nicola,
> > 
> > ACK. Can you please do a dry run and put up the generated html's on
> > http://nagoya.apache.org/code/?  
> 
> Look here [1] and click on the "javasrc" tab. The CSS coloring doesn't 
> work on *nix browsers on that run, as the style tag references the stylesheet 
> with ..\..\.. instead of ../../..
> 
> The thing I have to do is to add source directories to the Gump 
> descriptors, so that Javasrc can use those and know where to search.
> 
> [1] http://cvs.apache.org/~nicolaken/whiteboard/alexandria-docs/xml-cocoon/
> 
> -- 
> Nicola Ken Barozzi                   nicolaken@apache.org
>              - verba volant, scripta manent -
>     (discussions get forgotten, just code remains)
> ---------------------------------------------------------------------
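For the record, the ..\..\.. problem described above is just a platform path separator leaking into the
generated HTML. A minimal sketch of the kind of fix involved, in plain Java (the class and names below
are invented for illustration, not actual javasrc internals):

    // Minimal sketch: always emit URL-style separators in generated hrefs,
    // regardless of the File.separator on the box running the generator.
    public final class HrefUtil {
        private HrefUtil() {}

        /** Turns a platform-relative path such as "..\..\.." into "../../..". */
        public static String toHref(String relativePath) {
            return relativePath.replace('\\', '/');
        }

        public static void main(String[] args) {
            // prints ../../../javasrc.css
            System.out.println(toHref("..\\..\\..") + "/javasrc.css");
        }
    }

Whatever the exact fix, the point is that the href written into the style/link tag has to use forward
slashes for the pages to style correctly in non-Windows browsers.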


=====
Davanum Srinivas - http://webservices.apache.org/~dims/
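To make the descriptor side of the proposal concrete, here is a purely hypothetical fragment of what
declaring source directories for Javasrc in a Gump project descriptor could look like. The <javasrc>,
<dir> and <output> elements are invented for this sketch; nothing like them exists in the Gump
descriptors yet, which is exactly the gap Nicola mentions above.

    <project name="commons-lang">
      <!-- existing build/home/jar declarations unchanged -->

      <!-- hypothetical: tell the Javasrc step where the Java sources live
           and where the generated HTML should be published -->
      <javasrc>
        <dir nested="src/java"/>
        <output href="http://cvs.apache.org/code/commons-lang/"/>
      </javasrc>
    </project>

A nightly Gump run could then walk these entries after the normal build and drop the rendered pages
under http://cvs.apache.org/code/.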

Re: Let Googlebot crawl our cvs?

Posted by Daniel Rall <dl...@finemaltcoding.com>.
I seem to recall that use of ViewCVS's "*checkout*"-style URLs is fairly 
expensive on the server-side.  viewcvs.py uses bincvs.rcslib -- which in turn 
uses the rcsfile binary -- to parse the ,v files in the CVS repository itself. 
This operation scales linearly with the size of the ,v file (a function of 
number of changes and size of each delta).  This is likely part of the reason we 
block robots from browsing it with a robots.txt of:

User-agent: *
Disallow: /

Reconstituting the trunk trades a little disk space for better performance and 
scalability. Reconstituting other branches might be useful as well, but would 
end up using a lot more disk space.
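If the statically generated /code tree went live, the crawl policy for ViewCVS would not need to change
at all; only the pre-rendered pages would be left open. A sketch, assuming the HTML is published under
/code/ as proposed:

    # sketch only -- keep the expensive ViewCVS CGI off limits;
    # everything not listed (including a static /code/ tree) stays crawlable
    User-agent: *
    Disallow: /viewcvs/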


Martin van den Bemt wrote:
> Hmm, I think I get too much mail to remember everything :) 
> Sorry :)
> But maybe the cvsweb URL is a nice idea anyway :)
> 
> Mvgr,
> Martin
> 
> On Wed, 2003-12-03 at 15:13, Davanum Srinivas wrote:
> 
>>Martin,
>>
>>Here's my original email.....
>>
>>"I was looking for a replacement for JDK1.4 String.split that would work in 1.3 environment and
>>found that turbine had one (http://www.google.com/search?q=stringutils++split+site%3Aapache.org)
>>and then i was trying to find where in our cvs if the latest code and took a while to finally
>>found it in Jakarta commons' lang project. 
>>
>>To cut a long story short....Should we make finding existing code easier by allowing google's
>>crawler to crawl http://cvs.apache.org/viewcvs/? (currently there is a
>>http://cvs.apache.org/robots.txt that prevents this from happening)."


Re: Let Googlebot crawl our cvs?

Posted by Martin van den Bemt <ml...@mvdb.net>.
Hmm, I think I get too much mail to remember everything :) 
Sorry :)
But maybe the cvsweb URL is a nice idea anyway :)

Mvgr,
Martin

On Wed, 2003-12-03 at 15:13, Davanum Srinivas wrote:
> Martin,
> 
> Here's my original email.....
> 
> "I was looking for a replacement for JDK1.4 String.split that would work in 1.3 environment and
> found that turbine had one (http://www.google.com/search?q=stringutils++split+site%3Aapache.org)
> and then i was trying to find where in our cvs if the latest code and took a while to finally
> found it in Jakarta commons' lang project. 
> 
> To cut a long story short....Should we make finding existing code easier by allowing google's
> crawler to crawl http://cvs.apache.org/viewcvs/? (currently there is a
> http://cvs.apache.org/robots.txt that prevents this from happening)."
> 
> -- dims



Re: Let Googlebot crawl our cvs?

Posted by Davanum Srinivas <di...@yahoo.com>.
Martin,

Here's my original email.....

"I was looking for a replacement for JDK1.4 String.split that would work in 1.3 environment and
found that turbine had one (http://www.google.com/search?q=stringutils++split+site%3Aapache.org)
and then i was trying to find where in our cvs if the latest code and took a while to finally
found it in Jakarta commons' lang project. 

To cut a long story short....Should we make finding existing code easier by allowing google's
crawler to crawl http://cvs.apache.org/viewcvs/? (currently there is a
http://cvs.apache.org/robots.txt that prevents this from happening)."

-- dims
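For reference, the commons-lang replacement mentioned above amounts to something like the following on
JDK 1.3 (class name invented for the example; note that StringUtils.split treats its second argument as
a set of separator characters, not a regex as JDK 1.4's String.split does, and it skips empty tokens):

    import org.apache.commons.lang.StringUtils;

    public class SplitExample {
        public static void main(String[] args) {
            // JDK 1.4:          "a,b,,c".split(",")  -> [a, b, , c]
            // JDK 1.3 friendly: commons-lang split   -> [a, b, c]
            String[] parts = StringUtils.split("a,b,,c", ",");
            for (int i = 0; i < parts.length; i++) {
                System.out.println(parts[i]);
            }
        }
    }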


--- Martin van den Bemt <ml...@mvdb.net> wrote:
> (bollocks, I just keep forgetting to hit reply-all..)
> 
> Why not just add the correct link to viewcvs in the commit mails,
> since commit mails are normally found in mail archives through e.g.
> Google?
> 
> Mvgr,
> Martin
> On Wed, 2003-12-03 at 14:55, Davanum Srinivas wrote:
> > Nicola,
> > Let's move the discussion to gump mailing list. 
> > 
> > Gump Team,
> > Here's a heads up. We want to generate browsable (fresh from CVS) source code that can be
> > searched/indexed by, say, Googlebot. The initial discussion happened on the infrastructure@ mailing list,
> > where Nicola came up with a solution: integrate the javasrc (Java-to-HTML) tool from alexandria
> > with Gump and post the results at a location such as http://cvs.apache.org/code. What
> > do you guys think?
> > 
> > Thanks,
> > dims
> > 
> > --- Nicola Ken Barozzi <ni...@apache.org> wrote:
> > > 
> > > Davanum Srinivas wrote:
> > > 
> > > > Nicola,
> > > > 
> > > > ACK. Can you please do a dry run and put up the generated html's on
> > > > http://nagoya.apache.org/code/?  
> > > 
> > > Look here [1] and click on the "javasrc" tab. The CSS coloring doesn't 
> > > work on *nix browsers on that run, as the style tag references the stylesheet 
> > > with ..\..\.. instead of ../../..
> > > 
> > > The thing I have to do is to add source directories to the Gump 
> > > descriptors, so that Javasrc can use those and know where to search.
> > > 
> > > [1] http://cvs.apache.org/~nicolaken/whiteboard/alexandria-docs/xml-cocoon/
> > > 
> > > -- 
> > > Nicola Ken Barozzi                   nicolaken@apache.org
> > >              - verba volant, scripta manent -
> > >     (discussions get forgotten, just code remains)
> > > ---------------------------------------------------------------------
> > 
> > 
> > =====
> > Davanum Srinivas - http://webservices.apache.org/~dims/
> -- 
> Martin van den Bemt <ml...@mvdb.net>
> mvdb.com
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: gump-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: gump-help@jakarta.apache.org
> 


=====
Davanum Srinivas - http://webservices.apache.org/~dims/

Re: Let Googlebot crawl our cvs?

Posted by Martin van den Bemt <ml...@mvdb.net>.
(bollocks, I just keep forgetting to hit reply-all..)

Why not just add the correct link to viewcvs in the commit mails,
since commit mails are normally found in mail archives through e.g.
Google?

Mvgr,
Martin
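Roughly what this suggestion amounts to: each file listed in a commit mail would carry its ViewCVS URL,
so search engines that already index the mail archives get a direct route into the source. For example
(hypothetical file and revisions; exact query parameters depend on the ViewCVS installation):

    Modified: jakarta-commons/lang/src/java/org/apache/commons/lang/StringUtils.java
      view: http://cvs.apache.org/viewcvs/jakarta-commons/lang/src/java/org/apache/commons/lang/StringUtils.java
      diff: http://cvs.apache.org/viewcvs/jakarta-commons/lang/src/java/org/apache/commons/lang/StringUtils.java.diff?r1=1.1&r2=1.2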
On Wed, 2003-12-03 at 14:55, Davanum Srinivas wrote:
> Nicola,
> Let's move the discussion to gump mailing list. 
> 
> Gump Team,
> Here's a heads up. We want to generate browsable (fresh from CVS) source code that can be
> searched/indexed by, say, Googlebot. The initial discussion happened on the infrastructure@ mailing list,
> where Nicola came up with a solution: integrate the javasrc (Java-to-HTML) tool from alexandria
> with Gump and post the results at a location such as http://cvs.apache.org/code. What
> do you guys think?
> 
> Thanks,
> dims
> 
> --- Nicola Ken Barozzi <ni...@apache.org> wrote:
> > 
> > Davanum Srinivas wrote:
> > 
> > > Nicola,
> > > 
> > > ACK. Can you please do a dry run and put up the generated html's on
> > > http://nagoya.apache.org/code/?  
> > 
> > Look here [1] and click on the "javasrc" tab. The CSS coloring doesn't 
> > work on *nix browsers on that run, as the style tag references the stylesheet 
> > with ..\..\.. instead of ../../..
> > 
> > The thing I have to do is to add source directories to the Gump 
> > descriptors, so that Javasrc can use those and know where to search.
> > 
> > [1] http://cvs.apache.org/~nicolaken/whiteboard/alexandria-docs/xml-cocoon/
> > 
> > -- 
> > Nicola Ken Barozzi                   nicolaken@apache.org
> >              - verba volant, scripta manent -
> >     (discussions get forgotten, just code remains)
> > ---------------------------------------------------------------------
> 
> 
> =====
> Davanum Srinivas - http://webservices.apache.org/~dims/
-- 
Martin van den Bemt <ml...@mvdb.net>
mvdb.com