Posted to dev@lucene.apache.org by cm...@apache.org on 2002/05/13 23:26:09 UTC

cvs commit: jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc webcrawler_tech_overview.doc webcrawler_tech_overview.pdf

cmarschner    02/05/13 14:26:09

  Modified:    contributions/webcrawler-LARM README.txt
  Added:       contributions/webcrawler-LARM/doc
                        webcrawler_tech_overview.doc
                        webcrawler_tech_overview.pdf
  Log:
  added documentation
  
  Revision  Changes    Path
  1.2       +21 -12    jakarta-lucene-sandbox/contributions/webcrawler-LARM/README.txt
  
  Index: README.txt
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/webcrawler-LARM/README.txt,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- README.txt	4 May 2002 14:32:24 -0000	1.1
  +++ README.txt	13 May 2002 21:26:09 -0000	1.2
  @@ -1,24 +1,33 @@
  -$Id: README.txt,v 1.1 2002/05/04 14:32:24 otis Exp $
  +$Id: README.txt,v 1.2 2002/05/13 21:26:09 cmarschner Exp $
   
   This is the README file for webcrawler-LARM contribution to Lucene Sandbox.
   
  +This contribution requires:
   
  -- This contribution requires:
  -  a) HTTPClient (not Jakarta's, but this one:
  +a) HTTPClient.jar (not Jakarta's, but this one:
       http://www.innovation.ch/java/HTTPClient/
   b) Jakarta ORO package for regular expressions
   
  -- The original archive file that I got from Clemens had ORO and
  -HTTPClient in lib directory.  I don't think we should include those
  -there, so I took them out.
  +Put the .jars into the lib directory. 
   
  -- This contribution also uses 3rd party (X?)HTML parser, which is
  +Some of the HTTPClient source files will be replaced during the build, so they 
  +will be needed during the build. Sorry, I remember I couldn't do that with
  +inheritance.
  +
  +- This contribution also uses portions of the HeX HTML parser, which is
   included.
  -  I am not sure if Clemens' modified this parser in any way.  If not,
  -maybe we don't have to include it and can instead just add it to the
  -list of required packages.
   
  -- This code requires(?) JDK 1.4, as it uses assert keyword.
  +OG>  I am not sure if Clemens' modified this parser in any way.  If not,
  +OG>  maybe we don't have to include it and can instead just add it to the
  +OG>  list of required packages.
  +
  +The parser was put upside down. Although it apparently still needs some 
  +of the original interfaces, most of them can probably be removed. I will check
  +that out.
  +
  +OG>  This code requires(?) JDK 1.4, as it uses assert keyword.
   
  +No. It still contains a method called assert() for testing. I will probably 
  +rename this sometime (e.g. when changing the tests to JUnit).
   
  -$Id: README.txt,v 1.1 2002/05/04 14:32:24 otis Exp $
  \ No newline at end of file
  +$Id: README.txt,v 1.2 2002/05/13 21:26:09 cmarschner Exp $
  \ No newline at end of file
  
  
  
  1.1                  jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc/webcrawler_tech_overview.doc
  
  	<<Binary file>>
  
  
  1.1                  jakarta-lucene-sandbox/contributions/webcrawler-LARM/doc/webcrawler_tech_overview.pdf
  
  	<<Binary file>>
  
  
