Posted to user@nutch.apache.org by 391772322 <39...@qq.com> on 2017/03/02 00:35:17 UTC
How to avoid repeatedly uploading job jars
The archived Nutch job jar is about 400 MB in size, and every step uploads this archive and distributes it to every worker node. Is there a way to upload only the Nutch jar itself, but keep the dependency libs on every worker node?
Re: Re: How to avoid repeatedly uploading job jars
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
you have to subscribe to the list by sending a mail to
user-subscribe@nutch.apache.org
for further information, see
http://nutch.apache.org/mailing_lists.html
Best,
Sebastian
Re: How to avoid repeatedly uploading job jars
Posted by 391772322 <39...@qq.com>.
Sebastian:
I'm sorry, this is the first time I have used a mailing list. Would you be so kind as to tell me how to start a new thread?
Below is all I know about using the mailing list:
send a mail to "user@nutch.apache.org".
------------------ Original Message ------------------
From: "Sebastian Nagel" <wa...@googlemail.com>
Sent: Friday, March 3, 2017, 1:33 AM
To: "user" <us...@nutch.apache.org>
Subject: Re: How to avoid repeatedly uploading job jars
Re: How to avoid repeatedly uploading job jars
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
Please start a new thread for a new topic or question.
That will help others find the right answer to their problem
when searching the mailing list archive.
Thanks,
Sebastian
Re: How to avoid repeatedly uploading job jars
Posted by katta surendra babu <ka...@gmail.com>.
Hi Sebastian,
I am looking to crawl a JSON-based website using Nutch 2.3.1, HBase 0.98, and Solr 5.6.
Here the problem is:
for the 1st round I get the JSON data into HBase, but for the second round I
am not getting the metadata and the HTML links in Nutch.
So, please help me out if you can ... to crawl the JSON website completely.
--
Thanks & Regards
Surendra Babu Katta
8886747555
Re: How to avoid repeatedly uploading job jars
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
maybe the Hadoop Distributed Cache is what you are looking for?
Best,
Sebastian
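For context, a minimal sketch of how the distributed cache could address this, assuming a Hadoop 2.x cluster whose `-libjars` handling accepts HDFS paths. All paths, jar names, and the job class below are illustrative, not taken from the thread:

```shell
# One-time setup: copy the dependency jars out of the fat job archive
# onto HDFS, so they no longer need to be shipped with every job.
# (Paths and jar names are illustrative.)
hadoop fs -mkdir -p /apps/nutch/lib
hadoop fs -put runtime/local/lib/*.jar /apps/nutch/lib/

# Per job: submit only the small Nutch jar and reference the shared
# libs via -libjars. Hadoop adds them to the distributed cache, so
# each worker node localizes them once and reuses the cached copies
# until the files on HDFS change.
hadoop jar apache-nutch.jar org.apache.nutch.crawl.InjectorJob \
    -libjars hdfs:///apps/nutch/lib/gora-core.jar,hdfs:///apps/nutch/lib/guava.jar \
    urls/
```

Alternatively, if the dependency jars are pre-installed at the same local path on every worker node, they can be put on the task classpath through the `mapreduce.application.classpath` property, avoiding any per-job upload at all.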