Posted to user@nutch.apache.org by Mohamed Parvez <pa...@gmail.com> on 2009/09/11 20:14:53 UTC

Error Parsing JavaScript

I am getting this error:
--------------------------------
fetching http://business.verizon.net/SMBPortalWeb/resources/js/helpSupport.js
Error parsing: http://business.verizon.net/SMBPortalWeb/resources/js/helpSupport.js: UNKNOWN!(-53,0): Content not JavaScript: 'application/javascript'


I have this in the file parse-plugins.xml:
---------------------------------------------------------
    <mimeType name="application/x-javascript">
        <plugin id="parse-js" />
    </mimeType>

    <mimeType name="application/javascript">
        <plugin id="parse-js" />
    </mimeType>
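
One way to confirm which MIME types are routed to parse-js is to read the mapping back out of the file. This is a minimal sketch using only Python's standard library. It assumes the two entries above sit inside a root element (named `parse-plugins` here as an assumption), and the helper name `mime_types_for_plugin` is hypothetical:

```python
import xml.etree.ElementTree as ET

# Fragment copied from the post, wrapped in an assumed root element.
FRAGMENT = """
<parse-plugins>
    <mimeType name="application/x-javascript">
        <plugin id="parse-js" />
    </mimeType>
    <mimeType name="application/javascript">
        <plugin id="parse-js" />
    </mimeType>
</parse-plugins>
"""


def mime_types_for_plugin(xml_text, plugin_id):
    """Return the mimeType names whose entry lists the given plugin id."""
    root = ET.fromstring(xml_text.strip())
    return [
        mime.get("name")
        for mime in root.iter("mimeType")
        if any(p.get("id") == plugin_id for p in mime.iter("plugin"))
    ]


print(mime_types_for_plugin(FRAGMENT, "parse-js"))
```

If both types are listed, the mapping in parse-plugins.xml is probably not where the failure comes from.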


I have this in nutch-site.xml:
------------------------------------------------
<property>
<name>plugin.includes</name>
<value>field-add|protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|parse-js|suffix-urlfilter</value>
</property>
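
The plugin.includes value is a regular expression, so it can be checked against individual plugin ids. This is a minimal sketch in Python, assuming an id is included only when the whole id matches the pattern. The helper `is_included` and the constant names are hypothetical:

```python
import re

# Value copied from the plugin.includes property in the post.
PLUGIN_INCLUDES = (
    "field-add|protocol-http|urlfilter-regex|parse-(text|html|js)|"
    "index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|"
    "summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|"
    "parse-js|suffix-urlfilter"
)

# The same value with the trailing "parse-js" alternative dropped.
WITHOUT_DUPLICATE = PLUGIN_INCLUDES.replace("|parse-js|", "|")


def is_included(plugin_id, pattern=PLUGIN_INCLUDES):
    # Assumption: the entire plugin id must match the pattern.
    return re.fullmatch(pattern, plugin_id) is not None


print(is_included("parse-js"))                     # matched by the value as posted
print(is_included("parse-js", WITHOUT_DUPLICATE))  # still matched via parse-(text|html|js)
print(is_included("parse-pdf"))                    # not listed, so not matched
```

Under that assumption, the separate `parse-js` entry is redundant but harmless, because `parse-(text|html|js)` already covers it.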

I am using this command:
-------------------------------------
bin/nutch crawl urls -depth 10 >crawl.log


I have this in urls/seed.txt:
---------------------------------------------------
http://business.verizon.net/SMBPortalWeb/appmanager/SMBPortal/smb?_nfpb=true&_pageLabel=SMBPortal_page_main_support

Thanks/Regards,
Parvez

Re: Error Parsing JavaScript

Posted by Mohamed Parvez <pa...@gmail.com>.
This error comes up when the JavaScript is minified. Minification is a
simple process in which whitespace is removed to make the JS file smaller.

The parse-js plugin has no trouble parsing ordinary JavaScript, but when the
same JavaScript has its whitespace removed, Nutch fails with the error above.

Looks like it should be a simple fix.
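
The message quotes the content type ('application/javascript') rather than anything from the script body, so a content-type check may also be worth ruling out alongside minification. This is a minimal sketch of how such a check could produce the same message. It assumes a hypothetical parser that accepts only types starting with application/x-javascript, and it is an illustration, not confirmed parse-js source:

```python
# Hypothetical accepted prefix; an assumption, not taken from the parse-js source.
ACCEPTED_PREFIX = "application/x-javascript"


def check_content_type(content_type):
    """Return None when the type is accepted, else an error string in the log's format."""
    normalized = (content_type or "").strip().lower()
    if normalized and not normalized.startswith(ACCEPTED_PREFIX):
        return "Content not JavaScript: '%s'" % content_type
    return None


print(check_content_type("application/x-javascript"))
print(check_content_type("application/javascript"))
```

If the real plugin behaves like this, comparing the Content-Type header served for a file that parses with the one served for helpSupport.js would separate the two explanations.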

Thanks/Regards,
Parvez


