Posted to users@subversion.apache.org by Shawn Talbert <st...@exploreconsulting.com> on 2008/03/02 16:13:32 UTC

searching repository code

What's the best tool for searching (both code and comments) a subversion
repository?

 

It'd be nice if there were something svn-aware (i.e. able to search only the
head revision, or a range of revisions, or revisions after date X, etc.).

 

I've considered periodically exporting the entire repo and using a generic
search engine on it, but that seems less than ideal.
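
For reference, the export-and-search fallback mentioned above might look roughly like the sketch below. It is only an illustration: the function names (export_command, search_tree, export_and_search) are made up for this example, and it assumes the svn command-line client is on the PATH. The revision argument is passed straight to "svn export -r", so it can be a number, HEAD, or a {DATE} specifier.

```python
import re
import subprocess
from pathlib import Path


def export_command(repo_url, dest, revision="HEAD"):
    """Build the argv for `svn export` at one revision (number, HEAD, or {DATE})."""
    return ["svn", "export", "--quiet", "--force",
            "-r", str(revision), repo_url, str(dest)]


def search_tree(root, pattern):
    """Yield (relative path, line number, stripped line) for each matching line."""
    regex = re.compile(pattern)
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are replaced, so binary files do not abort the scan.
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield str(path.relative_to(root)), lineno, line.strip()


def export_and_search(repo_url, dest, pattern, revision="HEAD"):
    """Export one revision into dest, then search the exported files."""
    subprocess.run(export_command(repo_url, dest, revision), check=True)
    return list(search_tree(dest, pattern))
```

This only sees one exported revision at a time, which is the main reason the approach is less than ideal: searching a range of revisions would mean one export per revision.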


Re: searching repository code

Posted by Toby Thain <to...@telegraphics.com.au>.
On 3-Mar-08, at 6:19 AM, david.x.grierson@jpmorgan.com wrote:

> ...
>
> W.R.T. Fisheye - it has the following problems (disclaimer - we run a
> pretty large subversion set up here - 900+ repositories consuming
> 700GB+ of data with 140+ of those SVN repositories configured in 3
> fisheye instances

Not everyone is working at this scale. :-)

Are you working with Cenqua on your issues?

--Toby



Re: searching repository code

Posted by da...@jpmorgan.com.
You might want to look at Krugle - http://www.krugle.com/ - who provide a 
1U search appliance which can scan Subversion and (IIRC) ClearCase 
repositories.

I haven't evaluated Krugle yet, though I'm pretty interested. You can 
imagine the kind of pain involved in trying to get new hardware into my 
kind of place.

W.R.T. Fisheye - it has the following problems (disclaimer - we run a 
pretty large subversion set up here - 900+ repositories consuming 700GB+ 
of data with 140+ of those SVN repositories configured in 3 fisheye 
instances - the reason for 3 fisheye instances is as a workaround to the 
first issue described below).

The following is presented to this list purely as an FYI - I'm not 
necessarily looking for assistance with these issues since Atlassian have 
essentially admitted that they are known problems with no immediate 
workaround.

If anyone has any suggestions for these issues then they would be warmly 
received.

Serial repository scanning
==========================
Under normal working conditions Fisheye scans repositories linearly 
(repository1 -> repository2 -> repository3 -> ... -> repository1) - in the 
event of scanning blocking on one repository - for example a large scale 
addition to a repository - all other repositories are not updated until 
the scan of that repository is completed.

While the scan is blocked, other commits may be taking place in the other 
repositories (and also in the repository being scanned), so there is more 
work involved in bringing these repositories up to date.

Additionally any one of these repositories may also have a large scale 
update applied to it which could cause further delay on later 
repositories.

Initial indexing is performed in a different thread from normal scanning - 
therefore the addition of new repositories does not cause blocking.
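
A toy model of the serial behaviour described above, for illustration only. It is not Fisheye's actual implementation; the function name and the scan costs are invented, and the costs are in arbitrary time units. It shows how long each repository waits before its turn in one pass when one repository has a large backlog.

```python
def serial_scan_wait(scan_costs):
    """Return, per repository, how long it waits before its scan starts in one pass."""
    waits = {}
    elapsed = 0
    for name, cost in scan_costs.items():
        waits[name] = elapsed   # everything scanned earlier in the pass delays this one
        elapsed += cost
    return waits


# Hypothetical costs: repository2 has just received a large-scale addition.
costs = {"repository1": 1, "repository2": 500, "repository3": 1}
print(serial_scan_wait(costs))
# {'repository1': 0, 'repository2': 1, 'repository3': 501}
```

In this model repository3 has almost nothing to scan itself, yet it waits for the whole of repository2's scan, which is the blocking described above.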

Service cannot be restarted while any repository is receiving initial scanning
==============================================================================
Initial scanning takes place when a repository is either re-indexed (e.g. 
after any configuration change to the repository structure) or has just 
been added to the configuration.

This is a large volume update and, especially in the case of re-indexing, 
can potentially take a long time to complete (days or even weeks). If the 
fisheye service is restarted during the period when re-indexing is taking 
place the new repository moves from being scanned by the initial parallel 
scanning to the serial scanning method described above regardless of where 
in the revisions the initial scanning has reached.

For example a Subversion repository is added to fisheye for the first time 
- the repository has 28,000 revisions. Fisheye will catalogue these 28,000 
revisions using the parallel initial thread.

If the fisheye server is restarted when only 12,000 revisions have been 
catalogued in this repository then upon restart 16,000 revisions will have 
to be indexed by the serial scanning thread. All other repositories will 
be blocked from updating until this initial scanning has been completed.
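
The arithmetic in this example, as a small sketch. The function names are made up, and the indexing rate is a purely hypothetical figure chosen to give a sense of scale; it is not a measured Fisheye number.

```python
def backlog_after_restart(total_revisions, catalogued):
    """Revisions still unindexed when the service restarts during initial scanning."""
    if not 0 <= catalogued <= total_revisions:
        raise ValueError("catalogued must be between 0 and total_revisions")
    return total_revisions - catalogued


def blocked_hours(backlog, revisions_per_hour):
    """Rough time other repositories wait, given an assumed indexing rate."""
    return backlog / revisions_per_hour


backlog = backlog_after_restart(28_000, 12_000)
print(backlog)                      # 16000
print(blocked_hours(backlog, 200))  # 80.0, at a made-up rate of 200 revisions/hour
```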

To compound this problem, there is only a single thread to perform initial 
scanning - consequently other repositories are queued to receive initial 
index scanning behind the currently active one.

The same applies when the service restarts because of a crash.

Does not cope well with branch/tag deletions
============================================
One of the operations which regularly causes blocking of repository 
scanning is the removal of branches or tags within a Subversion 
repository.

Does not cope well with unusual repository structures
=====================================================
Fisheye requires a consistent structure in order to index the content 
correctly. If a structure changes within a repository then Fisheye needs 
to have that structural change applied to the repository - this then means 
that the repository needs to be fully re-indexed (see points 1 & 2 above 
concerning this).

Dg.
--
David Grierson
JPMorgan - IB Architecture - Source Code Management Consultant
GDP 228-5574 / DDI +44 141 228 5574 / Email david.x.grierson@jpmorgan.com
Alhambra House 6th floor, 45 Waterloo Street, Glasgow G2 6HS
 






Re: searching repository code

Posted by Toby Thain <to...@telegraphics.com.au>.
On 2-Mar-08, at 10:13 AM, Shawn Talbert wrote:

> What’s the best tool for searching (both code and comments) a  
> subversion repository?
>
> It’d be nice if there were something svn-aware (i.e. able to search  
> only the head revision, or a range of revisions, or revisions after  
> date X, etc.).

Try FishEye - play with my installation here:
https://www.telegraphics.com.au/fisheye/search/psdparse/

Main page:
https://www.telegraphics.com.au/fisheye

FishEye product:
http://www.atlassian.com/software/fisheye/

--Toby

>
> I’ve considered periodically exporting the entire repo and using a  
> generic search engine on it, but that seems less than ideal..