Posted to user@nutch.apache.org by Tizy Ninan <ti...@gmail.com> on 2015/03/24 07:12:16 UTC

Crawl images and store locally

Hi,

Does Nutch support crawling images from webpages? If so, what are the
steps to retrieve the images and store them locally?

Thanks and Regards,
Tizy

Re: Crawl images and store locally

Posted by Tizy Ninan <ti...@gmail.com>.
Hi Chris,

Thanks for the reply.
I dumped the segment folder, and the output contains the image content as
raw bytes.
Thanks a lot.

Regards,
Tizy

On Tue, Mar 24, 2015 at 7:26 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Tizy,
>
> After you crawl the images, take a look at ./bin/nutch dump to
> get the images out. ./bin/nutch commoncrawldumper will also
> dump into the Common Crawl format.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: Tizy Ninan <ti...@gmail.com>
> Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
> Date: Monday, March 23, 2015 at 11:12 PM
> To: "dev@nutch.apache.org" <de...@nutch.apache.org>, "user@nutch.apache.org"
> <us...@nutch.apache.org>
> Subject: Crawl images and store locally
>
> >Hi,
> >
> >
> >Does Nutch support crawling images from webpages? If so, what are the
> >steps to retrieve the images and store them locally?
> >
> >
> >Thanks and Regards,
> >Tizy
> >
> >
> >
> >
> >
> >
> >
> >
>
>


-- 
Thanks and Regards,
Tizy

Re: Crawl images and store locally

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Tizy,

After you crawl the images, take a look at ./bin/nutch dump to
get the images out. ./bin/nutch commoncrawldumper will also
dump into the Common Crawl format.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Tizy Ninan <ti...@gmail.com>
Reply-To: "dev@nutch.apache.org" <de...@nutch.apache.org>
Date: Monday, March 23, 2015 at 11:12 PM
To: "dev@nutch.apache.org" <de...@nutch.apache.org>, "user@nutch.apache.org"
<us...@nutch.apache.org>
Subject: Crawl images and store locally

>Hi,
>
>
>Does Nutch support crawling images from webpages? If so, what are the
>steps to retrieve the images and store them locally?
>
>
>Thanks and Regards,
>Tizy
>
>
>
>
>
>
>
>
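
The dump step Chris describes can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not a verified recipe: the paths (crawl/segments, image_dump), the MIME type list, and the DRY_RUN switch are hypothetical, and the -segment, -outputDir, and -mimetype options should be checked against the usage message that ./bin/nutch dump prints for your Nutch version.

```shell
#!/bin/sh
# Sketch only: every path and the MIME type list are assumptions,
# not values taken from the thread.
NUTCH_HOME="${NUTCH_HOME:-.}"                   # assumed root of the Nutch runtime
SEGMENTS_DIR="${SEGMENTS_DIR:-crawl/segments}"  # hypothetical segments directory
OUT_DIR="${OUT_DIR:-image_dump}"                # hypothetical local output directory

# Print the command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Dump only image content from the segments in $1 into directory $2.
dump_images() {
  run "$NUTCH_HOME/bin/nutch" dump \
    -segment "$1" \
    -outputDir "$2" \
    -mimetype image/jpeg image/png image/gif
}

dump_images "$SEGMENTS_DIR" "$OUT_DIR"
```

With DRY_RUN left at its default the script only prints the command; set DRY_RUN=0 to actually run it. If the dump contains no images, it is worth checking conf/regex-urlfilter.txt, since the default rules typically skip URLs with image suffixes such as gif, jpg, and png, so images may never have been fetched.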

