Posted to user@nutch.apache.org by Renato Marroquín Mogrovejo <re...@gmail.com> on 2014/05/13 09:54:09 UTC

Re: Nutch 2.x- Hbase - Solr Configuration

Hi David,

Sorry for taking so long to get back to you. Have you tried this [1] by any
chance? It may give you a better idea of how the pieces fit together, and
then you can move on to setting all of this up in Eclipse.
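
If the exception still names org.apache.gora.sql.store.SqlStore after your
edits, it may help to check which gora.properties the JVM actually loads from
the classpath. Below is a minimal sketch, not something taken from Nutch or
Gora: the class name WhichGoraProperties and the describe helper are made up
for illustration, and it assumes the file is looked up as a classpath resource
named gora.properties. It uses only the standard Java library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Properties;

// Hypothetical diagnostic helper; not part of Nutch or Gora.
final class WhichGoraProperties {

    // Reports where `resource` is found on the classpath and the value of `key` in it.
    static String describe(ClassLoader loader, String resource, String key) {
        URL url = loader.getResource(resource);
        if (url == null) {
            return resource + " not found on classpath";
        }
        Properties props = new Properties();
        try (InputStream in = url.openStream()) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // getProperty returns null when the key is absent; that prints as "null".
        return url + " -> " + key + "=" + props.getProperty(key);
    }

    public static void main(String[] args) {
        System.out.println(describe(
                WhichGoraProperties.class.getClassLoader(),
                "gora.properties",
                "gora.datastore.default"));
    }
}
```

Run it with the same Eclipse launch configuration (same classpath) as
InjectorJob. If the printed location is not the conf directory you edited, or
the value is not the HBaseStore class, the classpath is the first place to
look.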


Renato M.

[1] http://wiki.apache.org/nutch/Nutch2Tutorial


2014-04-23 14:35 GMT+02:00 David Philip <da...@gmail.com>:

> Hi,
>
> I did a thorough web search but hardly found any relevant suggestions to
> resolve this issue and get started. I am stuck setting up Nutch 2.2 with
> any database and integrating it with Apache Gora.
>
> Line failed:
>  DataStore<String, WebPage> store =
> StorageUtils.createWebStore(currentJob.getConfiguration(),
>       String.class, WebPage.class);
>
> Error:
> InjectorJob: java.lang.ClassNotFoundException:
> org.apache.gora.sql.store.SqlStore
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>
> Configuration done: in the properties file used in Eclipse,
> gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
> The necessary changes in ivy.xml are also done.
>
> Where else do changes need to be made or considered, given that it is still
> picking up the SQL store?
>
>
>
> I have used Apache Nutch 1.5 and Solr 4 before. That was pretty
> straightforward for me. I did an svn checkout of the source in Eclipse,
> created a Java project, made the necessary settings in nutch-site.xml, and
> configured Solr with Tomcat. Finally I ran it and it was successful.
>
>
> However, with Nutch 2.2 I am unable to move forward. I am trying to
> set up and run the source in Eclipse.
>
> I did the svn checkout of the source and the required configuration of the
> Apache Gora properties file and nutch-site. I think I am missing something
> in the configuration, which is why it is failing.
> One thing: should I install HBase by any chance? Do I need to
> have Hadoop running for this? [Can you please point me to a link on how
> this should be done with Apache Nutch's Hadoop and HBase built on it? I
> am not clear.]
> Should I download Apache Gora separately and follow any specific
> installation steps other than setting the properties that are
> mentioned?
>
> Thanks - David
>
>
>
>
>
>
>
> On Tue, Apr 22, 2014 at 6:51 PM, David Philip
> <da...@gmail.com> wrote:
>
> > Hi Renato,
> >
> > Yes, I am running from Eclipse. This is the path of the file in the
> > Eclipse workspace:
> > home/David/Nutch2.2_WorkSpace/Nutch/conf/gora.properties
> >
> > Here is what I modified, or rather the line I added, in gora.properties:
> >
> > gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
> >
> > Thank you.
> >
> > David.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Apr 22, 2014 at 5:50 PM, Renato Marroquín Mogrovejo <
> > renatoj.marroquin@gmail.com> wrote:
> >
> >> Hi David,
> >>
> >> So where are you running this from? The command line or Eclipse? I think
> >> your classpath is missing the necessary files.
> >> Are you still getting the same exception as before, as if the changes
> >> you made had no effect? This is probably because the gora.properties file
> >> being picked up inside Eclipse is not the same one you modified.
> >>
> >>
> >> Renato M.
> >>
> >>
> >>
> >> 2014-04-22 14:16 GMT+02:00 David Philip <da...@gmail.com>:
> >>
> >> > Hi Alparslan,
> >> >
> >> > Thank you for the links. I am browsing through them to see what
> >> > configuration I missed that is causing this exception.
> >> >
> >> >
> >> > As for the likely causes of the exception that you mentioned, I have
> >> > already done everything, i.e.:
> >> > 1. You should uncomment the suitable Gora artifact lines at the end of
> >> > [NUTCH_HOME]/conf/ivy.xml file.
> >> > 2. Update the "gora.datastore.default" property in
> >> > [NUTCH_HOME]/conf/gora.properties
> >> >
> >> >
> >> > Since these steps are clearly mentioned in the wiki page I was
> >> > referring to [1], they were done.
> >> > So, as I said, I followed every configuration step mentioned in
> >> > this link one by one, and the error occurs after that.
> >> >
> >> > Thanks - David
> >> > [1] https://wiki.apache.org/nutch/RunNutchInEclipse
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Apr 22, 2014 at 4:21 PM, Alparslan Avcı
> >> > <al...@agmlab.com> wrote:
> >> >
> >> > > Hi David,
> >> > >
> >> > > Welcome to Apache Nutch Community :)
> >> > >
> >> > >
> >> > > You can use other wiki pages [0] for detailed information on Nutch 2.x
> >> > > crawling. For sample configuration files, you can use this
> >> > > link [1].
> >> > >
> >> > > As for the exception, it is probably caused by the Gora
> >> > > configuration. You should uncomment the suitable Gora artifact lines
> >> > > at the end of the [NUTCH_HOME]/conf/ivy.xml file. For example, if you
> >> > > want to use HBase as your database, you should uncomment the lines
> >> > > below:
> >> > >
> >> > > <dependency org="org.apache.gora" name="gora-core" rev="0.3"
> >> > > conf="*->default"/>
> >> > > <dependency org="org.apache.gora" name="gora-hbase" rev="0.3"
> >> > > conf="*->default"/>
> >> > >
> >> > >
> >> > > Moreover, you should also update the "gora.datastore.default"
> >> > > property in the [NUTCH_HOME]/conf/gora.properties file according to
> >> > > your database. For instance, if you use HBase, then you should add
> >> > > this line:
> >> > >
> >> > > gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
> >> > >
> >> > >
> >> > > Please feel free to bring any future problems to this mailing list.
> >> > > We will be glad to help if we can.
> >> > >
> >> > > Thanks,
> >> > > Alparslan
> >> > >
> >> > >
> >> > >
> >> > > [0] https://wiki.apache.org/nutch/Nutch2Crawling
> >> > > [1] https://wiki.apache.org/nutch/NutchConfigurationFiles-2.x
> >> > >
> >> > >
> >> > >
> >> > > On 22-04-2014 13:09, David Philip wrote:
> >> > >
> >> > >> Hi,
> >> > >>
> >> > >> Can you please point me to a well-documented blog that explains how to
> >> > >> set up Apache Nutch 2.2 end to end: crawling, moving data to any
> >> > >> database, and finally searching in Solr? [Configuration is a pain.]
> >> > >>
> >> > >> This link [1], documented by Thejas, is good and well explained. [Thank
> >> > >> you.]
> >> > >> However, even after following the steps mentioned there one by one,
> >> > >> there is an error while running the first "nutch injector job". The error
> >> > >> is shown below. I see some discussion about this error on the mailing
> >> > >> list, but none explains the fix. I am simply trying to get the default
> >> > >> setup, with no specific database. [So HBase and Gora are OK.] But should
> >> > >> I do any specific configuration for it outside Eclipse other than what is
> >> > >> mentioned on the link? I don't see that I have missed any steps. Please
> >> > >> correct me. Also, I am new to all the technologies here, so if I need to
> >> > >> configure anything, please point me to that.
> >> > >>
> >> > >>
> >> > >> I was looking for any blog that explains [or otherwise points to]
> >> > >> setting up the default database, maybe HBase with Gora, and the changes
> >> > >> that need to be made to Solr so that the index job does not fail.
> >> > >>
> >> > >>
> >> > >> Thanks - David
> >> > >>
> >> > >> [1] https://wiki.apache.org/nutch/RunNutchInEclipse
> >> > >>
> >> > >>
> >> > >> 2014-04-22 15:29:39,797 INFO  crawl.InjectorJob (InjectorJob.java:inject(249)) - InjectorJob: starting at 2014-04-22 15:29:39
> >> > >> 2014-04-22 15:29:39,799 INFO  crawl.InjectorJob (InjectorJob.java:inject(250)) - InjectorJob: Injecting urlDir: /home/David/ApacheNutch/apache-nutch-1.8/URLS
> >> > >> 2014-04-22 15:29:40,162 ERROR crawl.InjectorJob (InjectorJob.java:run(276)) - InjectorJob: java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore
> >> > >>      at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
> >> > >>      at java.security.AccessController.doPrivileged(Native Method)
> >> > >>      at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
> >> > >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
> >> > >>      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
> >> > >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
> >> > >>      at java.lang.Class.forName0(Native Method)
> >> > >>      at java.lang.Class.forName(Class.java:190)
> >> > >>      at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)
> >> > >>      at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)
> >> > >>      at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
> >> > >>      at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
> >> > >>      at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
> >> > >>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> > >>      at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)
> >> > >>
> >> > >>