Posted to user@nutch.apache.org by chethan <ch...@gmail.com> on 2014/05/02 13:46:15 UTC

Nutch 1.7 - deleting segments

Hi,

I have a Nutch crawl with 4 segments which are fully indexed using the
bin/nutch solrindex command. Now I'm all out of storage on the box, so can I
delete the 4 segments, retain only the crawldb, and continue crawling from
where I left off?

Since all the segments are merged and indexed to Solr I don't see a problem
in deleting the segments, or am I wrong there?

Regards,

--
Chethan Prasad

Re: Nutch 1.7 - deleting segments

Posted by remi tassing <ta...@gmail.com>.
I usually keep the segments for as long as I can and delete them
periodically. It basically depends on your own needs.
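
One way to sketch the "keep for a while, then delete periodically" approach is to pick out segments by age. This is only an illustration, not something taken from Nutch itself: the function name `segments_older_than` and the `max_age_days` parameter are made up here, and it assumes the segment directories carry timestamp names such as 20140502134615. It only lists candidates and deletes nothing.

```python
from datetime import datetime, timedelta
from pathlib import Path


def segments_older_than(segments_dir, max_age_days, now=None):
    """Return segment directories whose timestamp name is older than the cutoff.

    Assumption: directory names look like 20140502134615 (yyyyMMddHHmmss).
    Names that do not parse are skipped rather than guessed at.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    old = []
    for entry in sorted(Path(segments_dir).iterdir()):
        if not entry.is_dir():
            continue
        try:
            created = datetime.strptime(entry.name, "%Y%m%d%H%M%S")
        except ValueError:
            continue  # not a timestamp-named directory, leave it alone
        if created < cutoff:
            old.append(entry)
    return old
```

The returned paths can be reviewed first and then removed (for example with `shutil.rmtree`) once you are sure they have been indexed.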


On Sun, May 4, 2014 at 1:41 PM, chethan <ch...@gmail.com> wrote:

> Well, the only case where you would want to retain them is if you ever need
> to index the same data again to Solr without having to crawl them. So
> automating the deletion part is risky unless you're sure the indexing has
> gone right.
>
> Regards,
>
> --
> Chethan Prasad
>
>
> On Sun, May 4, 2014 at 12:13 AM, John Lafitte <jlafitte@brandextract.com>
> wrote:
>
> > What would be the case where you would want to keep the segments?  I'm
> > considering automatically deleting them after sending the data to solr
> >
> > On May 3, 2014 2:29 AM, "chethan" <ch...@gmail.com> wrote:
> >
> > > Thanks for your reply!
> > >
> > > Regards,
> > >
> > > --
> > > Chethan Prasad
> > >
> > >
> > > On Sat, May 3, 2014 at 12:22 PM, remi tassing <ta...@gmail.com>
> > > wrote:
> > >
> > > > you are correct
> > > >
> > > >
> > > > On Fri, May 2, 2014 at 7:46 PM, chethan <ch...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a Nutch crawl with 4 segments which are fully indexed using
> > > > > the bin/nutch solrindex command. Now I'm all out of storage on the
> > > > > box, so can I delete the 4 segments and retain only the crawldb and
> > > > > continue crawling from where I left it?
> > > > >
> > > > > Since all the segments are merged and indexed to Solr I don't see a
> > > > > problem in deleting the segments, or am I wrong there?
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > > Chethan Prasad
> > > > >
> > > >
> > >
> >
>

Re: Nutch 1.7 - deleting segments

Posted by chethan <ch...@gmail.com>.
Well, the only case where you would want to retain them is if you ever need
to index the same data into Solr again without having to re-crawl the pages.
So automating the deletion is risky unless you're sure the indexing has
gone right.
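
A minimal sketch of that caution: delete a segment only after its indexing step reports success. The names `index_then_delete` and `index_cmd` are hypothetical, and `index_cmd` is assumed to return whatever indexing command line you already run for a given segment. An exit code of 0 is treated as "indexing went right", which is itself an assumption; it does not prove the documents in Solr are what you expect.

```python
import shutil
import subprocess
from pathlib import Path


def index_then_delete(segments_dir, index_cmd):
    """Index each segment and delete it only if the indexing command exits 0.

    index_cmd is a hypothetical callable: it receives a segment path and
    returns the argument list to execute for that segment.
    Returns the names of the segments that were deleted.
    """
    deleted = []
    for segment in sorted(Path(segments_dir).iterdir()):
        if not segment.is_dir():
            continue
        result = subprocess.run(index_cmd(segment))
        if result.returncode == 0:
            shutil.rmtree(segment)  # safe to drop only after a clean exit
            deleted.append(segment.name)
        else:
            print(f"indexing failed for {segment.name}; keeping it")
    return deleted
```

Segments whose indexing run fails are left on disk, so they can be indexed again later without re-crawling.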

Regards,

--
Chethan Prasad



Re: Nutch 1.7 - deleting segments

Posted by John Lafitte <jl...@brandextract.com>.
What would be the case where you would want to keep the segments? I'm
considering automatically deleting them after sending the data to Solr.

Re: Nutch 1.7 - deleting segments

Posted by chethan <ch...@gmail.com>.
Thanks for your reply!

Regards,

--
Chethan Prasad



Re: Nutch 1.7 - deleting segments

Posted by remi tassing <ta...@gmail.com>.
You are correct.

