Posted to user@nutch.apache.org by 391772322 <39...@qq.com> on 2017/03/03 08:03:48 UTC

Re: How to avoid repeatedly upload job jars

Sebastian:


I'm sorry. This is the first time I have used a mailing list. Would you be so kind as to tell me how to start a new thread?


Below is all I know about how to use a mailing list:


send a mail to "user@nutch.apache.org".


------------------ Original Message ------------------
From: "Sebastian Nagel";<wa...@googlemail.com>;
Sent: Friday, March 3, 2017, 1:33 AM
To: "user"<us...@nutch.apache.org>;

Subject: Re: How to avoid repeatedly upload job jars



Hi,

please, start a new thread for a new topic or question.
That will help others to find the right answer for their problem
when searching in the mailing list archive.

Thanks,
Sebastian

On 03/02/2017 11:01 AM, katta surendra babu wrote:
> Hi Sebastian,
> 
> 
> I want to crawl the data of a JSON-based website using Nutch 2.3.1,
> HBase 0.98, and Solr 5.6.
> 
> 
> 
> The problem is:
> 
> In the first round I get the JSON data into HBase, but in the second round I
> am not getting the metadata or the HTML links in Nutch.
> 
> 
> So please help me out, if you can, to crawl the JSON website completely.
> 
> 
> 
> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <wa...@googlemail.com>
> wrote:
> 
>> Hi,
>>
>> maybe the Hadoop Distributed Cache is what you are looking for?
>>
>> Best,
>> Sebastian
>>
>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>> The archived Nutch job jar is about 400 MB. Every step uploads this
>>> archive and distributes it to every worker node. Is there a way to upload
>>> only the Nutch jar and leave the dependent libraries on every worker node?
>>>
>>
>>
> 
>

Re: Re: How to avoid repeatedly upload job jars

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

you have to subscribe to the list by sending a mail to
   user-subscribe@nutch.apache.org
for further information, see
   http://nutch.apache.org/mailing_lists.html

Best,
Sebastian

On 03/03/2017 09:03 AM, 391772322 wrote:
> Sebastian:
> 
> 
> I'm sorry. This is the first time I have used a mailing list. Would you be so kind as to tell me how to start a new thread?
> 
> 
> Below is all I know about how to use a mailing list:
> 
> 
> send a mail to "user@nutch.apache.org".
> 
> 
> ------------------ Original Message ------------------
> From: "Sebastian Nagel";<wa...@googlemail.com>;
> Sent: Friday, March 3, 2017, 1:33 AM
> To: "user"<us...@nutch.apache.org>;
> 
> Subject: Re: How to avoid repeatedly upload job jars
> 
> 
> 
> Hi,
> 
> please, start a new thread for a new topic or question.
> That will help others to find the right answer for their problem
> when searching in the mailing list archive.
> 
> Thanks,
> Sebastian
> 
> On 03/02/2017 11:01 AM, katta surendra babu wrote:
>> Hi Sebastian,
>>
>>
>> I want to crawl the data of a JSON-based website using Nutch 2.3.1,
>> HBase 0.98, and Solr 5.6.
>>
>>
>>
>> The problem is:
>>
>> In the first round I get the JSON data into HBase, but in the second round I
>> am not getting the metadata or the HTML links in Nutch.
>>
>>
>> So please help me out, if you can, to crawl the JSON website completely.
>>
>>
>>
>> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <wa...@googlemail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> maybe the Hadoop Distributed Cache is what you are looking for?
>>>
>>> Best,
>>> Sebastian
>>>
>>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>>> The archived Nutch job jar is about 400 MB. Every step uploads this
>>>> archive and distributes it to every worker node. Is there a way to upload
>>>> only the Nutch jar and leave the dependent libraries on every worker node?
>>>>
>>>
>>>
>>
>>
>