Posted to dev@subversion.apache.org by Alan Langford <ja...@ambitonline.com> on 2002/05/03 13:13:20 UTC

Document Management

One of the guys on the XWT discussion list (see www.xwt.org, I won't rave 
about how cool this is over here) wants an open source document 
management system that's not web based (read that as "not HTML based"). He 
had the idea of using XWT to build a GUI for his own document management 
system. So of course I pointed him over here to Subversion. I figure 'tis 
better to add to this project than spend a dozen or so person-years 
reinventing it.

But he's come back with a question that I haven't seen and I thought I'd 
ask it here. Clearly the ability to have decent support for binary format 
files allows the repository to store and retrieve images, pdf documents, 
etc. The question is: are there facilities or hooks for doing things like 
document profiling and indexing (or, if not, how difficult would it be to 
implement this functionality)?

It would be nice, for example, if Subversion could trigger an indexing 
process on the type of file being checked in (documents get indexed by a 
scan of the file, images accept a keyword list). Presumably this leads to 
search-based retrieval capabilities...
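
A minimal sketch of the per-type dispatch described above, assuming the file type can be guessed from the file name. The function name `choose_indexer` and the labels "fulltext", "keywords" and "skip" are made up for illustration; nothing like this ships with Subversion.

```python
import mimetypes


def choose_indexer(path):
    """Pick an indexing strategy for a checked-in file (hypothetical labels)."""
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        return "skip"       # unknown type: leave it out of the index
    if mime.startswith("image/"):
        return "keywords"   # images accept a keyword list
    if mime.startswith("text/") or mime == "application/pdf":
        return "fulltext"   # documents get indexed by a scan of the file
    return "skip"
```

For example, `choose_indexer("docs/report.pdf")` selects the full-text path and `choose_indexer("images/logo.png")` selects the keyword path.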

It would be *very* nice if Subversion could be the engine for a document 
management system like this.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Document Management

Posted by "Glenn A. Thompson" <gt...@cdr.net>.

Greg Stein wrote:

> On Fri, May 03, 2002 at 09:02:56AM -0500, Ben Collins-Sussman wrote:
> > Alan Langford <ja...@ambitonline.com> writes:
> > > The question is are there facilities or hooks for doing things like
> > > document profiling and indexing (or how difficult is it to implement
> > > this functionality).
> > >
> > > It would be nice, for example, if Subversion could trigger an indexing
> > > process on the type of file being checked in (documents get indexed by
> > > a scan of the file, images accept a keyword list). Presumably this
> > > leads to search-based retrieval capabilities...
> >
> > Sure, just like CVS, the SVN repository has pre- and post-commit hooks
> > that you can attach scripts to.  That's where you'd do your indexing.
>
> Yup. I would think that you would have a post-commit script (not pre -- you
> want to wait for the thing to actually be committed). That script would send
> the documents to index over to the indexer daemon. That would add the thing
> onto a queue (if it isn't already there), and start/continue processing the
> queue.

Duuuuuh, I like this better than what I was thinking of doing.  Come to think of
it, I think IFS ConText indexing works exactly this way.

>
>
> That script can process the triggers, or the daemon could do it. Note that
> the daemon could also talk to the repository through the libraries, for easy
> access to the content, revisions, changes, etc.
>
> Cheers,
> -g
>
> --
> Greg Stein, http://www.lyra.org/



Re: Document Management

Posted by Greg Stein <gs...@lyra.org>.
On Fri, May 03, 2002 at 09:02:56AM -0500, Ben Collins-Sussman wrote:
> Alan Langford <ja...@ambitonline.com> writes:
> > The question is are there facilities or hooks for doing things like
> > document profiling and indexing (or how difficult is it to implement
> > this functionality).
> > 
> > It would be nice, for example, if Subversion could trigger an indexing
> > process on the type of file being checked in (documents get indexed by
> > a scan of the file, images accept a keyword list). Presumably this
> > leads to search-based retrieval capabilities...
> 
> Sure, just like CVS, the SVN repository has pre- and post-commit hooks
> that you can attach scripts to.  That's where you'd do your indexing.

Yup. I would think that you would have a post-commit script (not pre -- you
want to wait for the thing to actually be committed). That script would send
the documents to index over to the indexer daemon. That would add the thing
onto a queue (if it isn't already there), and start/continue processing the
queue.

That script can process the triggers, or the daemon could do it. Note that
the daemon could also talk to the repository through the libraries, for easy
access to the content, revisions, changes, etc.
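
A rough sketch of such a post-commit script, under some stated assumptions: it relies on the `svnlook changed` command and on the repository path and revision number being passed as the first two hook arguments, as in current Subversion releases. The queue file location and the helper names (`parse_changed`, `paths_to_index`, `enqueue`) are hypothetical, and a flat file stands in for whatever queue the indexer daemon really uses.

```python
#!/usr/bin/env python3
"""Hypothetical post-commit hook: queue committed files for an indexer."""
import subprocess
import sys
from pathlib import Path

QUEUE_FILE = Path("/var/spool/doc-indexer/queue.txt")  # assumed location


def parse_changed(output):
    """Turn `svnlook changed` output into (status, path) pairs."""
    entries = []
    for line in output.splitlines():
        if len(line) < 5:
            continue  # too short to hold a status column and a path
        # Status flags sit in the first columns; the path starts at column 5.
        entries.append((line[:4].strip(), line[4:]))
    return entries


def paths_to_index(entries):
    """Keep added or changed files; skip deletions and directories."""
    return [path for status, path in entries
            if not status.startswith("D") and not path.endswith("/")]


def enqueue(queue_file, revision, paths):
    """Append 'revision<TAB>path' lines, skipping items already queued."""
    queue_file = Path(queue_file)
    existing = set()
    if queue_file.exists():
        existing = set(queue_file.read_text().splitlines())
    new_items = [f"{revision}\t{p}" for p in paths]
    new_items = [item for item in new_items if item not in existing]
    with queue_file.open("a") as fh:
        for item in new_items:
            fh.write(item + "\n")
    return new_items


def main(argv):
    repos, revision = argv[1], argv[2]  # passed in by Subversion
    output = subprocess.run(
        ["svnlook", "changed", "-r", revision, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    enqueue(QUEUE_FILE, revision, paths_to_index(parse_changed(output)))


# Only run as a hook when Subversion supplied the expected arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

The script only records what changed; the daemon would read the queue and fetch the content itself, which keeps the commit from waiting on the indexer.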

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: Document Management

Posted by Ben Collins-Sussman <su...@collab.net>.
Alan Langford <ja...@ambitonline.com> writes:

> The question is are there facilities or hooks for doing things like
> document profiling and indexing (or how difficult is it to implement
> this functionality).
> 
> It would be nice, for example, if Subversion could trigger an indexing
> process on the type of file being checked in (documents get indexed by
> a scan of the file, images accept a keyword list). Presumably this
> leads to search-based retrieval capabilities...

Sure, just like CVS, the SVN repository has pre- and post-commit hooks
that you can attach scripts to.  That's where you'd do your indexing.


Re: Document Management

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
Hey:

Funny you should mention this.
My primary interest in Subversion is for document management.  Isn't that what
version control essentially is?
I will of course be switching to it for my code as well.

Alan Langford wrote:

> One of the guys on the XWT discussion list (see www.xwt.org, I won't rave
> about how cool this is over here) wants an open source document
> management system that's not web based (read that as "not HTML based"). He
> had the idea of using XWT to build a GUI for his own document management
> system. So of course I pointed him over here to Subversion. I figure 'tis
> better to add to this project than spend a dozen or so person-years
> reinventing it.
>
> But he's come back with a question that I haven't seen and I thought I'd
> ask it here. Clearly the ability to have decent support for binary format
> files allows the repository to store and retrieve images, pdf documents,
> etc. The question is are there facilities or hooks for doing things like
> document profiling and indexing (or how difficult is it to implement this
> functionality).
>

I'm going to be doing my indexing prior to putting it in Subversion.  The
reason is that indexing (in the OCR sense) is not a 100% foolproof process.
Most OCR packages have correction flows that can be used to resolve errors
beforehand.  I think commits could be held up for a review/fix process, but
boy, that seems to put a bit of a burden on Subversion if someone isn't
there to release them.

As for profiling, it could be done with hooks, I would think.  If you haven't
checked out Oracle IFS, you might want to.  It does all this and more.  However,
it has issues that caused me to get involved here instead of using it.  On the
plus side:  it's versioned (using a locking method by default, eeeh) and it
provides boatloads of protocols/interfaces, including one that is similar to
TortoiseCVS.
On the negative side:  it is a P I G pig.  No problem, hardware is cheap?  Well,
not so fast.  Oracle wants a piece of you for every processor involved.  Let's
stop the madness.  Larry doesn't need another airplane.

>
> It would be nice, for example, if Subversion could trigger an indexing
> process on the type of file being checked in (documents get indexed by a
> scan of the file, images accept a keyword list). Presumably this leads to
> search-based retrieval capabilities...

Oracle does this via "ConText" and custom parsers.  They include an XML-based
parser with IFS, which appeared to be rather flexible.  It worked very well.

I'm going to ease myself into indexing using Subversion properties (no more
than a dozen properties per document).  If another "better" approach comes
along, I can move the metadata out of the properties and then delete them.
For my purposes it will work fine.  After all, the current document management
system we use is called Samba :-)
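
To make the properties idea concrete, here is a small sketch that builds `svn propset` command lines from a metadata dictionary. The `doc:` prefix, the helper names and the example metadata are assumptions for illustration; the twelve-property limit is just the "no more than a dozen" rule of thumb above, not anything Subversion enforces.

```python
import subprocess

MAX_PROPERTIES = 12  # "no more than a dozen properties per document"


def propset_commands(path, metadata, prefix="doc:"):
    """Build one `svn propset NAME VALUE PATH` command per metadata entry."""
    if len(metadata) > MAX_PROPERTIES:
        raise ValueError(f"{path}: more than {MAX_PROPERTIES} properties")
    return [["svn", "propset", prefix + name, str(value), path]
            for name, value in sorted(metadata.items())]


def apply_metadata(path, metadata):
    """Run the commands against a working copy (requires the svn client)."""
    for command in propset_commands(path, metadata):
        subprocess.run(command, check=True)
```

Calling `propset_commands("scans/invoice.pdf", {"author": "gat", "keywords": "invoice, 2002"})` yields two commands, one for `doc:author` and one for `doc:keywords`. Keeping the metadata under one prefix would also make it easy to find and delete later if it moves somewhere else.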

>
>
> It would be *very* nice if Subversion could be the engine for a document
> management system like this.

I'm betting on it.
I think this is another way a SQL backend becomes quite useful.

Later,

gat


