Posted to user@nutch.apache.org by Alexander E Genaud <lx...@pobox.com> on 2006/03/06 19:15:17 UTC

Re: nutch-user Digest 6 Mar 2006 17:20:57 -0000 Issue 238

Thanks Stefan,

I'll look into separating the indices. It looks like I can cut the
data in half. Do you have any idea whether the indices can be
updated? If a page is modified, is the existing index entry modified
or is a new entry added? Otherwise, I do not understand why the
segment directories are dated.

Cheers,
Alex




From: Stefan Groschupf <sg...@media-style.com>
To: nutch-user@lucene.apache.org
Date: Mon, 6 Mar 2006 13:32:46 +0100
Subject: Re: Offline search (Vicaya 0.1)
Hi,
Storing the index on the hard disk would be a good idea.
Take a look at the NutchBean init method to get an idea of what you
need to change.
It should be simple: just allow a location to be provided for the
index that is different from the segments folder.

Stefan

On 06.03.2006 at 12:53, Alexander E Genaud wrote:

> Hello,
>
> I've just released a modified version of Nutch 0.7.1 and Tomcat 5.0
> that runs cross-platform from a CD-ROM or local hard drive:
>
> http://sf.net/projects/vicaya
>
> My ambitions are not 'the whole web' but a small and static collection
> of pages. I intend to allow users to use nutch offline with the
> occasional online content and index update (RSS, webstart, and/or
> Subversion). Please let me know if such questions are out of scope.
>
> I have found that reading the segments on CDROM is the biggest
> performance bottleneck. However, I do not want to require that the
> user copies the entire segments directory to disk. Is it possible to
> separate some data - such as the reverse index from the other fields?
> Would this require a change to Lucene or Nutch's source code?
>
> I am considering importing content and index segments into an SVN
> repository so that users may receive periodic updates. Will the
> segments directory lend itself well to SVN patches? I have
> experimented mostly with intranet search, but I've noticed that whole
> web search creates dated indices. Might it be a matter of adding new
> crawl segments since the last update?
>

Alex
--
Those who can make you believe absurdities can make you commit atrocities
-- François Marie Arouet (Voltaire)
http://cph.blogsome.com
http://genaud.org/alex/key.asc
--
CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1

Re: nutch-user Digest 6 Mar 2006 17:20:57 -0000 Issue 238

Posted by Alexander E Genaud <lx...@pobox.com>.
Hello,

I am attempting to precompile the nutch JSPs on Tomcat-5.5
but have been unsuccessful. I am referencing:

http://tomcat.apache.org/tomcat-5.5-doc/jasper-howto.html
http://tomcat.apache.org/tomcat-5.5-doc/jasper-howto.html#Web%20Application%20Compilation

I have checked out the 0.7.1 source:

svn co http://svn.apache.org/repos/asf/lucene/nutch/tags/release-0.7.1/

And have added three targets (cleanunwar, jspc, compilejsp)
to build.xml based on the tc5.5 Jasper HOWTO.

cleanunwar extracts nutch-0.7.1.war

  <target name="cleanunwar">
      <delete dir="${build.dir}/nutchout"/>
      <unwar src="${build.dir}/nutch-0.7.1.war"
             dest="${build.dir}/nutchout" overwrite="false" />
  </target>

The jspc and compilejsp targets are identical to those in the Jasper
HOWTO, with the following additions to the classpath:

    <pathelement location="${build.classes}"/>
    <fileset dir="${lib.dir}">
      <include name="*.jar" />
    </fileset>
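
Because the same additions go into both targets, they could also be
declared once and referenced by id. A minimal sketch, assuming a
hypothetical path id nutch.extra.classpath that is not part of the
Nutch build file:

    <!-- hypothetical id, declared once at project level -->
    <path id="nutch.extra.classpath">
      <pathelement location="${build.classes}"/>
      <fileset dir="${lib.dir}">
        <include name="*.jar"/>
      </fileset>
    </path>

    <!-- inside each <classpath> of jspc and compilejsp -->
    <path refid="nutch.extra.classpath"/>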

This runs successfully:

ant clean compile war

This also runs successfully:

ant cleanunwar jspc
    -Dtomcat.home=<MY_CATALINA_HOME>
    -Dwebapp.path=build/nutchout

But the final compilejsp target fails:

ant compilejsp
    -Dtomcat.home=<MY_CATALINA_HOME>
    -Dwebapp.path=build/nutchout
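
For convenience, the steps could be chained behind one target so that
a single invocation runs them in order. A sketch, assuming a
hypothetical target name precompilejsp; it only sequences the targets
above and does not change what compilejsp does:

    <!-- hypothetical wrapper; tomcat.home and webapp.path are still
         passed on the command line -->
    <target name="precompilejsp"
            depends="war, cleanunwar, jspc, compilejsp"/>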

===========
log snippet
===========

C:\...\release-0.7.1>ant cleanunwar jspc compilejsp -Dtomcat.home=../../target/apache-tomcat-5.5.15 -Dwebapp.path=build/nutchout
Buildfile: build.xml

cleanunwar:
   [delete] Deleting directory C:\...\release-0.7.1\build\nutchout
    [unwar] Expanding: C:\...\release-0.7.1\build\nutch-0.7.1.war into
            C:\...\release-0.7.1\build\nutchout

jspc:

compilejsp:
    [javac] Compiling 10 source files to C:\...\release-0.7.1\build\nutchout\WEB-INF\classes
    [javac] C:\...\release-0.7.1\build\nutchout\WEB-INF\src\org\apache\jsp\cluster_jsp.java:63: cannot resolve symbol
    [javac] symbol  : class HitsCluster
    [javac] location: class org.apache.jsp.cluster_jsp
    [javac] HitsCluster [] clusters = null;
    [javac] ^
    [javac] C:\...\release-0.7.1\build\nutchout\WEB-INF\src\org\apache\jsp\cluster_jsp.java:64: cannot resolve symbol
    [javac] symbol  : variable clusterer
    [javac] location: class org.apache.jsp.cluster_jsp
    [javac] if (clusterer != null) {
    [javac]     ^

... etc ...

BUILD FAILED
C:\...\release-0.7.1\build.xml:83: Compile failed;
see the compiler error output for details.

Total time: 7 seconds

===========
end snippet
===========

Does anyone know how I might resolve this?

Thanks,
Alex

Attached are the three targets:

===========
ant snippet
===========

<!-- begin alex -->
  <target name="cleanunwar">
      <delete dir="${build.dir}/nutchout"/>
      <unwar src="${build.dir}/nutch-0.7.1.war"
             dest="${build.dir}/nutchout" overwrite="false" />
  </target>
<!-- end alex -->

  <target name="jspc">

    <taskdef classname="org.apache.jasper.JspC" name="jasper2" >
      <classpath id="jspc.classpath">
        <pathelement location="${java.home}/../lib/tools.jar"/>
        <fileset dir="${tomcat.home}/bin">
          <include name="*.jar"/>
        </fileset>
        <fileset dir="${tomcat.home}/server/lib">
          <include name="*.jar"/>
        </fileset>
        <fileset dir="${tomcat.home}/common/lib">
          <include name="*.jar"/>
        </fileset>
        <fileset dir="${webapp.path}/WEB-INF/lib">
          <include name="*.jar"/>
        </fileset>
        <!-- begin alex -->
        <pathelement location="${build.classes}"/>
        <fileset dir="${lib.dir}">
          <include name="*.jar" />
        </fileset>
        <!-- end alex -->
      </classpath>
    </taskdef>
    <jasper2
             validateXml="false"
             uriroot="${webapp.path}"
             webXmlFragment="${webapp.path}/WEB-INF/generated_web.xml"
             outputDir="${webapp.path}/WEB-INF/src" />
  </target>

  <target name="compilejsp">
    <mkdir dir="${webapp.path}/WEB-INF/classes"/>
    <mkdir dir="${webapp.path}/WEB-INF/lib"/>

    <javac destdir="${webapp.path}/WEB-INF/classes"
           optimize="off"
           debug="on" failonerror="true"
           srcdir="${webapp.path}/WEB-INF/src"
           excludes="**/*.smap">
      <classpath>
        <pathelement location="${webapp.path}/WEB-INF/classes"/>
        <fileset dir="${webapp.path}/WEB-INF/lib">
          <include name="*.jar"/>
        </fileset>
        <!-- begin alex -->
        <pathelement location="${build.classes}"/>
        <fileset dir="${lib.dir}">
          <include name="*.jar" />
        </fileset>
        <!-- end alex -->
        <pathelement location="${tomcat.home}/common/classes"/>
        <fileset dir="${tomcat.home}/common/lib">
          <include name="*.jar"/>
        </fileset>
        <pathelement location="${tomcat.home}/shared/classes"/>
        <fileset dir="${tomcat.home}/shared/lib">
          <include name="*.jar"/>
        </fileset>
        <fileset dir="${tomcat.home}/bin">
          <include name="*.jar"/>
        </fileset>
      </classpath>
      <include name="**" />
      <exclude name="tags/**" />
    </javac>

  </target>