Posted to user@nutch.apache.org by webdev1977 <we...@gmail.com> on 2012/02/13 16:48:41 UTC

Stylesheet in plugin not found when run in distributed mode

Hello All:

I am running into an interesting issue that I first thought was related to
MAPREDUCE-967 (https://issues.apache.org/jira/browse/MAPREDUCE-967), where the
crawl could not find the plugin directory because it was not unpacked
properly. I tried the suggestions listed there, and for some phases of the
crawl it seems to find the plugin folder in the proper place. But in other
places it is looking for my stylesheet (in a subfolder of a custom plugin) in:

 {job.local.dir}\attempt_xxxxx\work\plugins\<plugin folder>\resources\xsl\mystylesheet.xsl

I have been watching the "attempt_xxxxx" folder during job execution, and
the work folder gets created, but no "plugins" folder gets unpacked.  

Is there any way to override this setting, as can be done in nutch-site.xml
or hadoop-site.xml with the mapreduce.job.jar.unpack.pattern property?
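To illustrate why a plugins/ folder might never appear under work/, here is a minimal Java sketch of how a regex unpack pattern could decide which job-jar entries get unpacked. It assumes that the default for mapreduce.job.jar.unpack.pattern is `(?:classes/|lib/).*` and that each jar entry name must match the pattern in full. The class name, the entry names, the "myplugin" folder and the extended pattern with `plugins/` are all hypothetical, and this is not the code Hadoop or Nutch actually runs.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: models how an unpack regex could select job-jar entries.
// ASSUMPTION: the default value of mapreduce.job.jar.unpack.pattern is
// "(?:classes/|lib/).*"; check the JobConf of your Hadoop version.
public class UnpackPatternCheck {

    static final String DEFAULT_PATTERN = "(?:classes/|lib/).*";

    // Hypothetical override that would also unpack the plugins/ directory.
    static final String WITH_PLUGINS_PATTERN = "(?:classes/|lib/|plugins/).*";

    /** True if the whole jar entry name matches the unpack pattern. */
    static boolean wouldUnpack(Pattern unpackPattern, String entryName) {
        return unpackPattern.matcher(entryName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical entry names; "myplugin" stands in for <plugin folder>.
        List<String> entries = List.of(
                "lib/some-dependency.jar",
                "classes/org/example/Foo.class",
                "plugins/myplugin/resources/xsl/mystylesheet.xsl");

        for (String p : List.of(DEFAULT_PATTERN, WITH_PLUGINS_PATTERN)) {
            Pattern pattern = Pattern.compile(p);
            System.out.println("pattern " + p);
            for (String entry : entries) {
                String action = wouldUnpack(pattern, entry) ? "unpack " : "skip   ";
                System.out.println("  " + action + entry);
            }
        }
    }
}
```

Under the assumed default pattern the plugins/ entry is skipped, which would be consistent with the missing plugins folder under work/. Whether setting the property in the job configuration actually helps depends on the Hadoop version honouring it.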

--
View this message in context: http://lucene.472066.n3.nabble.com/Stylesheet-in-plugin-not-found-when-run-in-distributed-mode-tp3740629p3740629.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Stylesheet in plugin not found when run in distributed mode

Posted by webdev1977 <we...@gmail.com>.
As I suspected, based on the code changes I combed through from 1.3 to 1.4,
upgrading to 1.4 did not fix the issue. I still cannot complete solrindex.
All other phases work fine. It is still trying to find my stylesheet in a
place that does not exist (see original post).

Any other ideas?


Re: Stylesheet in plugin not found when run in distributed mode

Posted by webdev1977 <we...@gmail.com>.
I am wondering if this is actually a bug that has not been discovered or
fixed yet. The problem only occurs in the solrindex phase of the crawl. All
other phases (inject, generate, fetch, parse, invertlinks and updatedb) work
fine.


Re: Stylesheet in plugin not found when run in distributed mode

Posted by webdev1977 <we...@gmail.com>.
What if I told you that it isn't easy in any way, shape or form to update my
Nutch version? Is there a patch I could apply?


Re: Stylesheet in plugin not found when run in distributed mode

Posted by Julien Nioche <li...@gmail.com>.
This has been fixed in 1.4.

On 13 February 2012 16:08, webdev1977 <we...@gmail.com> wrote:

> 1.3



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Stylesheet in plugin not found when run in distributed mode

Posted by webdev1977 <we...@gmail.com>.
1.3


Re: Stylesheet in plugin not found when run in distributed mode

Posted by Julien Nioche <li...@gmail.com>.
Which version of Nutch are you using?
