Posted to user@nutch.apache.org by Michael Chen <yi...@u.northwestern.edu> on 2017/08/02 00:21:32 UTC

parse-zip Nutch 2.x compatibility?

Dear all,

I was trying to parse .xml.gz sitemaps with Nutch 2.x, but couldn't 
build the parse-zip plugin. parse-ext, parse-swf and feed also failed to 
build. It seems to be a known issue (NUTCH-874) and is marked for 
version 2.5.

Is there a workaround for parsing gzipped files? Is the porting of these 
plugins under active development?

Thank you!

Michael


Re: parse-zip Nutch 2.x compatibility?

Posted by Michael Chen <yi...@u.northwestern.edu>.
Could processGzippedXML() from Crawler-Commons be used for this?
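
For what it's worth, here is a minimal sketch of the decompress-then-parse step using only the Java standard library, independent of both Nutch and Crawler-Commons. The class name GzipSitemapSketch and the methods extractLocs and gzip are hypothetical names made up for illustration. This is not how processGzippedXML() is implemented, only the general idea of gunzipping the fetched bytes and reading the <loc> entries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of Nutch or Crawler-Commons.
public class GzipSitemapSketch {

    /** Decompresses gzipped sitemap bytes and returns the text of every <loc> element. */
    public static List<String> extractLocs(byte[] gzipped) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations, since sitemaps come from untrusted hosts.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            Document doc = factory.newDocumentBuilder().parse(in);
            // "*" matches <loc> regardless of which namespace the sitemap declares.
            NodeList locs = doc.getElementsByTagNameNS("*", "loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        }
    }

    /** Gzips a string; only here so the example can build its own test input. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
            + "<url><loc>https://example.com/a</loc></url>"
            + "<url><loc>https://example.com/b</loc></url>"
            + "</urlset>";
        System.out.println(extractLocs(gzip(xml)));
    }
}
```

In a real crawl the byte array would be the raw fetched content of the .xml.gz URL rather than the output of gzip(); wiring that into a Nutch 2.x parser plugin is not shown here.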

Thanks,

Michael


On 08/01/2017 05:21 PM, Michael Chen wrote:
> Dear all,
>
> I was trying to parse .xml.gz sitemaps with Nutch 2.x, but couldn't 
> build the parse-zip plugin. parse-ext, parse-swf and feed also failed 
> to build. It seems to be a known issue (NUTCH-874) and is marked for 
> version 2.5.
>
> Is there a workaround for parsing gzipped files? Is the porting of 
> these plugins under active development?
>
> Thank you!
>
> Michael
>

