Posted to user@nutch.apache.org by "Drulea, Sherban" <sd...@rand.org> on 2015/10/01 02:23:54 UTC

Re: Nutch with MongoDB

Nice job Muhamad. I'm trying to do the same thing.

What versions of Solr and Nutch are you using? Did you use the Nutch
schema.xml in Solr?

What plugins did you enable in nutch-site.xml?


On 9/30/15, 6:46 AM, "Muhamad Muchlis" <tr...@gmail.com> wrote:

>Good news. I did it.
>On Sep 30, 2015 16:02, "Muhamad Muchlis" <tr...@gmail.com> wrote:
>
>> And this is another error when I try to run it again:
>>
>> 2015-09-30 15:59:48,061 INFO  crawl.InjectorJob - InjectorJob: starting at 2015-09-30 15:59:48
>> 2015-09-30 15:59:48,061 INFO  crawl.InjectorJob - InjectorJob: Injecting urlDir: urls
>> 2015-09-30 15:59:49,601 INFO  crawl.InjectorJob - InjectorJob: Using class org.apache.gora.mongodb.store.MongoStore as the Gora storage class.
>> 2015-09-30 15:59:49,681 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 2015-09-30 15:59:49,682 ERROR security.UserGroupInformation - PriviledgedActionException as:creoactive cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-creoactive\mapred\staging\user795133740\.staging to 0700
>> 2015-09-30 15:59:49,696 ERROR crawl.InjectorJob - InjectorJob: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-user\mapred\staging\user795133740\.staging to 0700
>>     at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
>>     at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
>>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
>>     at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
>>     at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
>>     at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
>>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
>>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
>>     at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
>>     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
>>
>>
>>
>>
>>
>> On Wed, Sep 30, 2015 at 3:42 PM, Muhamad Muchlis <tr...@gmail.com>
>> wrote:
>>
>>> Thank You Alexis,
>>>
>>> I'm trying the tutorial on this blog:
>>>
>>> http://www.aossama.com/search-engine-with-apache-nutch-mongodb-and-elasticsearch/
>>>
>>> But there is an error
>>>
>>> 2015-09-30 15:30:52.231 FATAL conf.Configuration - error parsing conf file: org.xml.sax.SAXParseException; systemId: file:/C:/cygwin64/home/user/apache-nutch-2.3/runtime/local/conf/nutch-site.xml; lineNumber: 1; columnNumber: 2; The markup in the document preceding the root element must be well-formed.
>>>
>>>
>>>
>>>
>>> On Wed, Sep 30, 2015 at 2:51 PM, Alexis Hope <ba...@gmail.com>
>>> wrote:
>>>
>>>> I don't have any tutorials written up, but I use Nutch with Mongo.
>>>> Have you used Mongo or Nutch before?
>>>> I can send you my Gora config and nutch-site config and you should be
>>>> good to go.
>>>>
>>>> On Wed, Sep 30, 2015 at 9:02 AM, Muhamad Muchlis <tr...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > Does anyone have a tutorial for Nutch with MongoDB?
>>>> >
>>>> > Thanks.
>>>> >
>>>>
>>>
>>>
>>




Re: Nutch with MongoDB

Posted by Muhamad Muchlis <tr...@gmail.com>.
Hi Drulea,

I use Elasticsearch and Kibana:

https://www.elastic.co/products/elasticsearch
https://www.elastic.co/products/kibana
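
For anyone reading this thread later: the log quoted in it reports org.apache.gora.mongodb.store.MongoStore as the Gora storage class, which in Nutch 2.x is normally selected in conf/nutch-site.xml. The fragment below is only a minimal sketch under assumptions, not the configuration actually used in this thread. The agent name is a placeholder and the plugin list is an illustrative guess at a setup that indexes into Elasticsearch. The XML declaration has to start at line 1, column 1; stray or malformed text before the root element is a common cause of a SAXParseException like the one quoted in this thread.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Gora backend; class name taken from the log output quoted in this thread -->
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.mongodb.store.MongoStore</value>
  </property>
  <!-- Assumption: placeholder crawler name; Nutch will not fetch without one -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
  <!-- Assumption: illustrative plugin list, adjust to the plugins you actually need -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-elastic</value>
  </property>
</configuration>
```

The MongoDB connection details themselves are not shown here; they are typically set in the Gora properties file rather than in nutch-site.xml.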


