Posted to dev@nutch.apache.org by Daniel Russo <ru...@gmail.com> on 2005/04/07 22:41:00 UTC

Appending with SegmentWriter

I'm trying to use the SegmentWriter class to append data to an
existing segment, but I can't seem to construct an instance of it with
an existing segment directory.  I tried setting the "force" argument
to true, but the constructor still bombs out when it hits the
MapFile.Writer constructors for writing to the data files in the
fetcher/, content/, etc. directories.  I checked the source code for
MapFile.Writer, where I found the following code:

      File dir = new File(dirName);
      if (nfs.exists(dir)) {
          throw new IOException("already exists: " + dir);
      }
      nfs.mkdirs(dir);

Thus, MapFile.Writer can NEVER write to an existing directory.  The
SequenceFile.Writer instances created in the MapFile.Writer
constructor throw the same exception in a couple more places.  Is
there any way to work around this without rewriting all these writer
classes?  If not, then the "force" option is effectively useless, and
a segment can never be modified after it is created.

                            -Dan
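
The check quoted above can be reproduced outside Nutch. The following is a minimal sketch that assumes plain java.nio.file in place of the Nutch file system abstraction; the class CreateOnlyWriterSketch and the method openForWrite are hypothetical names used only for illustration and are not part of MapFile.Writer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a create-only writer; this is NOT the Nutch MapFile.Writer.
public class CreateOnlyWriterSketch {

    // Mirrors the quoted check: refuse to open a directory that already exists,
    // otherwise create it.
    static Path openForWrite(Path dir) throws IOException {
        if (Files.exists(dir)) {
            throw new IOException("already exists: " + dir);
        }
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("segment-demo");
        Path fetcher = root.resolve("fetcher");

        // The first open succeeds and creates the directory.
        openForWrite(fetcher);

        // The second open on the same directory hits the exists() check.
        try {
            openForWrite(fetcher);
        } catch (IOException e) {
            System.out.println("second open failed: " + e.getMessage());
        }
    }
}
```

With a check of this shape, any second attempt to open the same directory fails, whatever flags the caller passed further up.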

Re: Appending with SegmentWriter

Posted by Andrzej Bialecki <ab...@getopt.org>.
Daniel Russo wrote:
> I'm trying to use the SegmentWriter class to append data to an
> existing segment, but I can't seem to construct an instance of it with
> an existing segment directory.  I tried setting the "force" argument

Segments are not meant to be appended to. Once created and closed, they 
are immutable. You can, however, create a new segment by copying the data 
from the old segments into it and appending new data while the new segment 
is not yet closed.
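
The copy-then-append idea can be sketched roughly as follows. This is an illustration only: it assumes records are plain text lines in ordinary files rather than real segment data, and it uses java.nio.file instead of the Nutch API. The class CopyThenAppendSketch and the method copyAndAppend are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: records are text lines, NOT Nutch segment data.
public class CopyThenAppendSketch {

    // Writes a brand-new record file holding the old records followed by the
    // extra ones. The old file is only read, so it stays unchanged.
    static void copyAndAppend(Path oldData, Path newData, List<String> extra)
            throws IOException {
        if (Files.exists(newData)) {
            throw new IOException("already exists: " + newData);
        }
        try (BufferedWriter out = Files.newBufferedWriter(newData, StandardCharsets.UTF_8)) {
            for (String record : Files.readAllLines(oldData, StandardCharsets.UTF_8)) {
                out.write(record);
                out.newLine();
            }
            for (String record : extra) {
                out.write(record);
                out.newLine();
            }
        } // once the writer is closed, the new file is treated as immutable
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segments");
        Path oldData = dir.resolve("old-segment.txt");
        Files.write(oldData, Arrays.asList("page-1", "page-2"), StandardCharsets.UTF_8);

        Path newData = dir.resolve("new-segment.txt");
        copyAndAppend(oldData, newData, Arrays.asList("page-3"));

        System.out.println(Files.readAllLines(newData, StandardCharsets.UTF_8));
    }
}
```

The old data is never opened for writing; all new records go into a file that did not exist before.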

> to true, but the constructor still bombs out when it hits the
> MapFile.Writer constructors for writing to the data files in the
> fetcher/, content/, etc. directories.  I checked the source code for
> MapFile.Writer, where I found the following code:
> 
>       File dir = new File(dirName);
>       if (nfs.exists(dir)) {
>           throw new IOException("already exists: " + dir);
>       }
>       nfs.mkdirs(dir);
> 
> Thus, MapFile.Writer can NEVER write to an existing directory.  The
> SequenceFile.Writer instances created in the MapFile.Writer
> constructor throw the same exception in a couple more places.  Is

The meaning of the "force" flag in SegmentWriter constructors is that if 
it's true, then the previously existing segment data will be DELETED 
first. Apparently, this does not happen, so the current behaviour must 
be fixed. However, this was never supposed to mean that you could append 
to an already existing segment.
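
For illustration, the intended "force" semantics (delete whatever is there, then create from scratch) might look like the sketch below. It assumes plain java.nio.file directories rather than the Nutch file system layer, and the class ForceCreateSketch with its methods create and deleteRecursively are hypothetical names, not the SegmentWriter code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of "force means delete first"; NOT the SegmentWriter code.
public class ForceCreateSketch {

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts deeper paths ahead of their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Without force, an existing directory is an error. With force, the old
    // contents are deleted and an empty directory is created in their place.
    static Path create(Path dir, boolean force) throws IOException {
        if (Files.exists(dir)) {
            if (!force) {
                throw new IOException("already exists: " + dir);
            }
            deleteRecursively(dir);
        }
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempDirectory("segment-demo").resolve("segment");
        create(segment, false);
        Files.createFile(segment.resolve("old-data"));

        create(segment, true); // old-data is removed, the directory starts empty
        System.out.println("old data still present: "
                + Files.exists(segment.resolve("old-data")));
    }
}
```

Note that even with force set, nothing is appended: the previous contents are gone before any new data is written.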

You may be interested to look at the SegmentSlicer tool for rearranging 
segment data.

> there any way to work around this without rewriting all these writer
> classes?  If not, then the "force" option is effectively useless, and
> a segment can never be modified after it is created.

As I said, segments - once created - are immutable, so it's not possible 
to fix SegmentWriter to do that. However, the behaviour of SegmentWriter 
that you described, related to the original meaning of the "force" flag, 
should be fixed anyway...

PS. Having said the above, this is just a computer program, so of course 
if you hack your way around, there is always a way to append new records 
to the segment data... ;-) But the current API doesn't allow this, 
because there is no use case for this.

-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com