Posted to user@nutch.apache.org by Michael Chen <yi...@u.northwestern.edu> on 2017/08/02 00:21:32 UTC
parse-zip Nutch 2.x compatibility?
Dear all,
I was trying to parse .xml.gz sitemaps with Nutch 2.x, but couldn't
build the parse-zip plugin. parse-ext, parse-swf and feed also failed to
build. It seems to be a known issue (NUTCH-874) and is marked for
version 2.5.
Is there a workaround to parse gzipped files? Is the porting of these
plugins under active development?
Thank you!
Michael
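A minimal sketch of the kind of workaround asked about above, assuming the .xml.gz sitemap bytes have already been fetched. It uses only the JDK: it detects gzip by its magic bytes (0x1f 0x8b), decompresses with java.util.zip.GZIPInputStream, and pulls the <loc> URLs out with the DOM parser. This is not the parse-zip plugin, the Nutch 2.x parser API, or the crawler-commons sitemap API. The class and method names (GzipSitemapSketch, isGzip, extractLocs, gzip) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or crawler-commons.
public class GzipSitemapSketch {

    // True if the payload starts with the gzip magic bytes 0x1f 0x8b.
    public static boolean isGzip(byte[] data) {
        return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    // Decompresses the payload if it is gzipped, then returns the text of every <loc> element.
    public static List<String> extractLocs(byte[] raw) throws Exception {
        InputStream in = new ByteArrayInputStream(raw);
        if (isGzip(raw)) {
            in = new GZIPInputStream(in);
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Crawled content is untrusted: refuse DOCTYPEs to avoid XXE.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc;
        try (InputStream xml = in) {
            doc = builder.parse(xml);
        }
        // "*" matches any namespace, including the sitemaps.org 0.9 namespace.
        NodeList nodes = doc.getElementsByTagNameNS("*", "loc");
        List<String> locs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            locs.add(nodes.item(i).getTextContent().trim());
        }
        return locs;
    }

    // Demo helper: gzip a string in memory to simulate a fetched .xml.gz file.
    public static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
            + "<url><loc>http://example.com/a</loc></url>"
            + "<url><loc>http://example.com/b</loc></url>"
            + "</urlset>";
        System.out.println(extractLocs(gzip(xml)));
    }
}
```

Sniffing the magic bytes rather than trusting the ".gz" extension or the Content-Type header means the same path also handles servers that send already-decompressed XML under a .gz URL.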
Re: parse-zip Nutch 2.x compatibility?
Posted by Michael Chen <yi...@u.northwestern.edu>.
Maybe with processGzippedXML() from Crawler-Commons? Is this possible?
Thanks,
Michael
On 08/01/2017 05:21 PM, Michael Chen wrote:
> Dear all,
>
> I was trying to parse .xml.gz sitemaps with Nutch 2.x, but couldn't
> build the parse-zip plugin. parse-ext, parse-swf and feed also failed
> to build. It seems to be a known issue (NUTCH-874) and is marked for
> version 2.5.
>
> Is there a workaround to parse gzipped files? Is the porting of
> these plugins under active development?
>
> Thank you!
>
> Michael
>