Posted to dev@nutch.apache.org by Pranshu Kumar <pr...@usc.edu> on 2015/02/21 05:45:10 UTC

Nutchpy crawled statistics

I just wanted to know how we can get the crawl statistics. Is it enough to
use the Nutch command-line options, or do we need to write a script to
generate the stats using nutchpy?

Re: Nutchpy crawled statistics

Posted by Pranshu Kumar <pr...@usc.edu>.
Hi Mohsin,

Thanks for the reply. That is exactly what I was asking. Thanks for
clarifying.

We were also using the bin/nutch stats command, but I just wanted to be sure
whether we have to add more details to the statistics.

And sorry, Professor, about the out-of-context mail. I will be more specific
with my queries from now on.

On Fri, Feb 20, 2015 at 9:24 PM, Mohammad Al-Mohsin <me...@mem9.net> wrote:

> Hi Pranshu,
>
> I assume you're talking about CS-572
> <http://sunset.usc.edu/classes/cs572_2015/> class assignment at USC.
>
> I think the stats provided by bin/nutch for the crawldb are sufficient
> (Dr. Mattmann correct me if I'm wrong, please).
>
> However, you need to write a script/program to extract the MIME types you
> encountered. You can do this natively with Java or if you prefer Python ~
> like me, you can use nutchpy <https://github.com/ContinuumIO/nutchpy>.
>
>
> Best regards,
> Mohammad Al-Mohsin


-- 


Regards,
Pranshu Kumar
Computer Science Grad Student
University of Southern California

Re: Nutchpy crawled statistics

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Exactly, Mohammad, thank you.

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++








Re: Nutchpy crawled statistics

Posted by Mohammad Al-Mohsin <me...@mem9.net>.
Hi Pranshu,

I assume you're talking about CS-572
<http://sunset.usc.edu/classes/cs572_2015/> class assignment at USC.

I think the stats provided by bin/nutch for the crawldb are sufficient (Dr.
Mattmann, please correct me if I'm wrong).
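
As a rough illustration of working with those stats, here is a minimal
sketch that turns the text printed by the crawldb stats command into a
dictionary. It assumes the stats come from something like
"bin/nutch readdb <crawldb> -stats" and that the output contains lines such
as "TOTAL urls:" and "status 2 (db_fetched):" followed by a count; the exact
layout may differ between Nutch versions, and the function name and sample
text are made up for this example.

```python
import re

# Assumption: status lines look like "status 2 (db_fetched):<TAB>20"
# and the total is reported as "TOTAL urls:<TAB>100".
STATUS_LINE = re.compile(r"status\s+(\d+)\s+\((\w+)\):\s*(\d+)")
TOTAL_LINE = re.compile(r"TOTAL urls:\s*(\d+)")


def parse_crawldb_stats(text):
    """Collect the total URL count and per-status counts from stats text."""
    stats = {"total": None, "by_status": {}}
    for line in text.splitlines():
        total = TOTAL_LINE.search(line)
        if total:
            stats["total"] = int(total.group(1))
            continue
        status = STATUS_LINE.search(line)
        if status:
            # Key by the status name, e.g. "db_fetched".
            stats["by_status"][status.group(2)] = int(status.group(3))
    return stats


# Hypothetical sample output, used only to exercise the parser.
sample_output = """TOTAL urls:\t100
retry 0:\t100
status 1 (db_unfetched):\t80
status 2 (db_fetched):\t20
"""

if __name__ == "__main__":
    print(parse_crawldb_stats(sample_output))
```

The regular expressions use search() rather than match(), so the parser
still works if each line carries a logger prefix.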

However, you need to write a script/program to extract the MIME types you
encountered. You can do this natively in Java or, if you prefer Python as
I do, you can use nutchpy <https://github.com/ContinuumIO/nutchpy>.
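
A minimal sketch of such a script, under stated assumptions: it does not
call nutchpy itself, but tallies MIME types from any iterable of text lines,
for example a text dump of the crawldb or records you have read with
nutchpy and converted to strings. It assumes each record's metadata carries
an entry of the form "Content-Type=<type>/<subtype>", which may not hold for
every Nutch version or configuration. The function name and sample lines
are hypothetical.

```python
import re
from collections import Counter

# Assumption: the MIME type appears as "Content-Type=<type>/<subtype>"
# somewhere in each record's metadata text.
CONTENT_TYPE = re.compile(r"Content-Type=([\w.+-]+/[\w.+-]+)")


def count_mime_types(lines):
    """Return a Counter mapping each MIME type to its number of occurrences."""
    counts = Counter()
    for line in lines:
        for mime in CONTENT_TYPE.findall(line):
            # Normalise case so "Text/HTML" and "text/html" are counted together.
            counts[mime.lower()] += 1
    return counts


# Hypothetical sample records, used only to exercise the function.
sample_lines = [
    "http://example.com/\tVersion: 7",
    "Metadata: Content-Type=text/html",
    "http://example.com/a.pdf\tVersion: 7",
    "Metadata: Content-Type=application/pdf",
    "http://example.com/b\tVersion: 7",
    "Metadata: Content-Type=Text/HTML; charset=UTF-8",
]

if __name__ == "__main__":
    for mime, n in count_mime_types(sample_lines).most_common():
        print(mime, n)
```

Because the function accepts plain lines, the same code works on a file
object opened over a dump, so the whole crawldb never has to be held in
memory.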


Best regards,
Mohammad Al-Mohsin
