Posted to user@nutch.apache.org by 391772322 <39...@qq.com> on 2017/03/03 08:03:48 UTC
Re: How to avoid repeatedly upload job jars
Sebastian:
I'm sorry. This is my first time using a mailing list. Would you be kind enough to tell me how to start a new thread?
Below is all I know about how a mailing list works:
send a mail to "user@nutch.apache.org".
------------------ Original Message ------------------
From: "Sebastian Nagel";<wa...@googlemail.com>;
Sent: Friday, March 3, 2017, 1:33 AM
To: "user"<us...@nutch.apache.org>;
Subject: Re: How to avoid repeatedly upload job jars
Hi,
please, start a new thread for a new topic or question.
That will help others find the right answer for their problem
when searching in the mailing list archive.
Thanks,
Sebastian
On 03/02/2017 11:01 AM, katta surendra babu wrote:
> Hi Sebastian,
>
>
> I am trying to crawl the data of a JSON-related website
> using Nutch 2.3.1, HBase 0.98, and Solr 5.6.
>
>
>
> Here is the problem:
>
> In the first round I get the JSON data into HBase, but in the second round I
> am not getting the metadata and the HTML links in Nutch.
>
>
> So please help me out, if you can, to crawl the JSON website completely.
>
>
>
> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <wa...@googlemail.com>
> wrote:
>
>> Hi,
>>
>> maybe the Hadoop Distributed Cache is what you are looking for?
>>
>> Best,
>> Sebastian
>>
>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>> The archived Nutch job jar is about 400 MB, and every step uploads this
>>> archive and distributes it to every worker node. Is there a way to upload
>>> only the Nutch jar and leave the dependent libs on every worker node?
>>>
>>
>>
>
>
Re: Re: How to avoid repeatedly upload job jars
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
you have to subscribe to the list by sending a mail to
user-subscribe@nutch.apache.org
for further information, see
http://nutch.apache.org/mailing_lists.html
Best,
Sebastian
On 03/03/2017 09:03 AM, 391772322 wrote:
> Sebastian:
>
>
> I'm sorry. This is my first time using a mailing list. Would you be kind enough to tell me how to start a new thread?
>
>
> Below is all I know about how a mailing list works:
>
>
> send a mail to "user@nutch.apache.org".
>
>
> ------------------ Original Message ------------------
> From: "Sebastian Nagel";<wa...@googlemail.com>;
> Sent: Friday, March 3, 2017, 1:33 AM
> To: "user"<us...@nutch.apache.org>;
>
> Subject: Re: How to avoid repeatedly upload job jars
>
>
>
> Hi,
>
> please, start a new thread for a new topic or question.
> That will help others find the right answer for their problem
> when searching in the mailing list archive.
>
> Thanks,
> Sebastian
>
> On 03/02/2017 11:01 AM, katta surendra babu wrote:
>> Hi Sebastian,
>>
>>
>> I am trying to crawl the data of a JSON-related website
>> using Nutch 2.3.1, HBase 0.98, and Solr 5.6.
>>
>>
>>
>> Here is the problem:
>>
>> In the first round I get the JSON data into HBase, but in the second round I
>> am not getting the metadata and the HTML links in Nutch.
>>
>>
>> So please help me out, if you can, to crawl the JSON website completely.
>>
>>
>>
>> On Thu, Mar 2, 2017 at 3:21 PM, Sebastian Nagel <wa...@googlemail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> maybe the Hadoop Distributed Cache is what you are looking for?
>>>
>>> Best,
>>> Sebastian
>>>
>>> On 03/02/2017 01:35 AM, 391772322 wrote:
>>>> The archived Nutch job jar is about 400 MB, and every step uploads this
>>>> archive and distributes it to every worker node. Is there a way to upload
>>>> only the Nutch jar and leave the dependent libs on every worker node?
>>>>
>>>
>>>
>>
>>
>