Posted to user@nutch.apache.org by 391772322 <39...@qq.com> on 2017/03/02 00:35:17 UTC
How to avoid repeatedly uploading job jars
The archived Nutch job jar is about 400 MB in size, and every step uploads this archive and distributes it to every worker node. Is there a way to upload only the Nutch jar itself, but keep the dependency libs on every worker node?
Re: Re: How to avoid repeatedly uploading job jars
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
you have to subscribe to the list by sending a mail to
user-subscribe@nutch.apache.org
for further information, see
http://nutch.apache.org/mailing_lists.html
Best,
Sebastian
Re: How to avoid repeatedly uploading job jars
Posted by 391772322 <39...@qq.com>.
Sebastian:
I'm sorry, this is the first time I have used a mailing list. Would you be so kind as to tell me how to start a new thread?
Below is all I know about using the mailing list:
send a mail to "user@nutch.apache.org".
------------------ Original Message ------------------
From: "Sebastian Nagel" <wa...@googlemail.com>
Sent: Friday, March 3, 2017, 1:33 AM
To: "user" <us...@nutch.apache.org>
Subject: Re: How to avoid repeatedly uploading job jars
Re: How to avoid repeatedly uploading job jars
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
Please start a new thread for a new topic or question.
That will help others find the right answer to their problem
when searching the mailing list archive.
Thanks,
Sebastian
Re: How to avoid repeatedly uploading job jars
Posted by katta surendra babu <ka...@gmail.com>.
Hi Sebastian,
I am looking to crawl a JSON-based website using Nutch 2.3.1, HBase 0.98, and Solr 5.6.
Here the problem is:
for the 1st round I get the JSON data into HBase, but for the second round I
am not getting the metadata and the HTML links in Nutch.
So, please help me out if you can ... to crawl the JSON website completely.
--
Thanks & Regards
Surendra Babu Katta
8886747555
Re: How to avoid repeatedly uploading job jars
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
maybe the Hadoop Distributed Cache is what you are looking for?
Best,
Sebastian
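For context, a minimal sketch of how the distributed cache could address this, assuming a Hadoop 2.x cluster whose `-libjars` handling accepts HDFS paths. All paths, jar names, and the job class below are illustrative, not taken from the thread:

```shell
# One-time setup: copy the dependency jars out of the fat job archive
# onto HDFS, so they no longer need to be shipped with every job.
# (Paths and jar names are illustrative.)
hadoop fs -mkdir -p /apps/nutch/lib
hadoop fs -put runtime/local/lib/*.jar /apps/nutch/lib/

# Per job: submit only the small Nutch jar and reference the shared
# libs via -libjars. Hadoop adds them to the distributed cache, so
# each worker node localizes them once and reuses the cached copies
# until the files on HDFS change.
hadoop jar apache-nutch.jar org.apache.nutch.crawl.InjectorJob \
    -libjars hdfs:///apps/nutch/lib/gora-core.jar,hdfs:///apps/nutch/lib/guava.jar \
    urls/
```

Alternatively, if the dependency jars are pre-installed at the same local path on every worker node, they can be put on the task classpath through the `mapreduce.application.classpath` property, avoiding any per-job upload at all.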