Posted to user@nutch.apache.org by Tolga <to...@ozses.net> on 2012/09/04 15:27:25 UTC

Crawl errors

Hi,

I set up Solr, and when I tried to crawl my website, I got this:

[mtozses@atlas bin]$ time ./nutch crawl urls -dir crawl-$(date +%FT%H-%M-%S) -solr http://localhost:8983/solr/ -depth 5 -topN 5
Exception in thread "main" org.apache.gora.util.GoraException: java.io.IOException: java.sql.SQLTransientConnectionException: java.net.ConnectException: Connection refused
     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
     at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:69)
     at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:243)
     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.io.IOException: java.sql.SQLTransientConnectionException: java.net.ConnectException: Connection refused
     at org.apache.gora.sql.store.SqlStore.getConnection(SqlStore.java:747)
     at org.apache.gora.sql.store.SqlStore.initialize(SqlStore.java:160)
     at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
     ... 8 more
Caused by: java.sql.SQLTransientConnectionException: java.net.ConnectException: Connection refused
     at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
     at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
     at org.hsqldb.jdbc.JDBCConnection.<init>(Unknown Source)
     at org.hsqldb.jdbc.JDBCDriver.getConnection(Unknown Source)
     at org.hsqldb.jdbc.JDBCDriver.connect(Unknown Source)
     at java.sql.DriverManager.getConnection(DriverManager.java:582)
     at java.sql.DriverManager.getConnection(DriverManager.java:185)
     at org.apache.gora.sql.store.SqlStore.getConnection(SqlStore.java:739)
     ... 11 more
Caused by: org.hsqldb.HsqlException: java.net.ConnectException: Connection refused
     at org.hsqldb.ClientConnection.openConnection(Unknown Source)
     at org.hsqldb.ClientConnection.initConnection(Unknown Source)
     at org.hsqldb.ClientConnection.<init>(Unknown Source)
     ... 17 more
Caused by: java.net.ConnectException: Connection refused
     at java.net.PlainSocketImpl.socketConnect(Native Method)
     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
     at java.net.Socket.connect(Socket.java:529)
     at java.net.Socket.connect(Socket.java:478)
     at java.net.Socket.<init>(Socket.java:375)
     at java.net.Socket.<init>(Socket.java:189)
     at org.hsqldb.server.HsqlSocketFactory.createSocket(Unknown Source)
     ... 20 more

Is it something with HSQLDB? I thought it was optional.

Regards,

Re: Crawl errors

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,

On Wed, Sep 5, 2012 at 2:48 PM, Tolga <to...@ozses.net> wrote:

> Caused by: org.apache.solr.common.SolrException: ERROR:
> [doc=http://www.sabanciuniv.edu/] multiple values encountered for non
> multiValued copy field text: Sabancı Üniversitesi
>     at
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:297)
>     ... 26 more
>

This type of question would typically be sent to the Solr list. I see you
have been over there and already had a problem with your schema.xml, so
maybe it's just a case of setting the schema config properly.
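
As a rough illustration of the schema issue named in the quoted error ("multiple values encountered for non multiValued copy field text"), here is a minimal Python sketch that lists copyField destinations which receive more than one source but are not declared multiValued. The embedded schema is a made-up sample, not the poster's actual schema.xml, and the function name is hypothetical; a real check would read the file from the Solr conf directory instead.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, stripped-down schema for illustration only.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="false"/>
  </fields>
  <copyField source="title" dest="text"/>
  <copyField source="content" dest="text"/>
</schema>"""


def risky_copy_destinations(schema_xml):
    """Return {dest: [sources]} for copyField targets that are fed by
    several sources but are not declared multiValued="true"."""
    root = ET.fromstring(schema_xml)

    # Map each declared field name to its multiValued flag (default false).
    multivalued = {
        field.get("name"): field.get("multiValued", "false").lower() == "true"
        for field in root.iter("field")
    }

    # Group copyField sources by destination.
    sources_by_dest = defaultdict(list)
    for copy_field in root.iter("copyField"):
        sources_by_dest[copy_field.get("dest")].append(copy_field.get("source"))

    return {
        dest: sources
        for dest, sources in sources_by_dest.items()
        if len(sources) > 1 and not multivalued.get(dest, False)
    }


print(risky_copy_destinations(SAMPLE_SCHEMA))
# prints {'text': ['title', 'content']}
```

If a destination shows up here, the usual remedy is to declare it with multiValued="true" (or to reduce the copyField rules feeding it) and restart Solr. The sketch ignores dynamic fields and multiValued sources, so treat an empty result as a hint, not proof.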

However...

Please also see below

http://www.mail-archive.com/user@nutch.apache.org/msg04462.html

The mailing list archives are a great resource, BTW.

hth

Lewis

Re: Crawl errors

Posted by Tolga <to...@ozses.net>.
Oh, and by the way, I have the 'title' field.

On 09/05/2012 04:48 PM, Tolga wrote:
> I changed the encoding to ISO-8859-9 and restarted Solr, it didn't 
> work :S Below is the full error:
>
> SEVERE: org.apache.solr.common.SolrException: ERROR: 
> [doc=http://www.sabanciuniv.edu/] Error adding field 'title'='Sabancı 
> Üniversitesi'
> Caused by: org.apache.solr.common.SolrException: ERROR: 
> [doc=http://www.sabanciuniv.edu/] multiple values encountered for non 
> multiValued copy field text: Sabancı Üniversitesi
>     at 
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:297)
>     ... 26 more
>
> Regards,


Re: Crawl errors

Posted by Tolga <to...@ozses.net>.
I changed the encoding to ISO-8859-9 and restarted Solr, but it didn't
work :S Below is the full error:

SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=http://www.sabanciuniv.edu/] Error adding field 'title'='Sabancı Üniversitesi'
     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
     at org.mortbay.jetty.Server.handle(Server.java:326)
     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:843)
     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.solr.common.SolrException: ERROR: [doc=http://www.sabanciuniv.edu/] multiple values encountered for non multiValued copy field text: Sabancı Üniversitesi
     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:297)
     ... 26 more
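
The "Caused by" line in this trace complains about a copy field, not about character decoding. As a quick sanity check on the encoding theory, the Python sketch below (an illustration only, not something run against this Solr instance) shows that the title survives a UTF-8 round trip unchanged, while reading UTF-8 bytes as ISO-8859-9 would garble it.

```python
# The page title from the error message.
title = "Sabancı Üniversitesi"

# UTF-8 can represent every character in the title; the two non-ASCII
# letters (ı and Ü) take two bytes each.
utf8_bytes = title.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
print(round_tripped == title)

# ISO-8859-9 (Latin-5) can also encode the title, one byte per character,
# but the byte values differ from UTF-8.
latin5_bytes = title.encode("iso-8859-9")
print(len(utf8_bytes), len(latin5_bytes))

# Interpreting UTF-8 bytes as ISO-8859-9 yields mojibake, which is what a
# mismatched encoding setting would tend to produce.
mojibake = utf8_bytes.decode("iso-8859-9")
print(mojibake == title)
```

Under these assumptions, switching the declared encoding away from UTF-8 would be more likely to corrupt the text than to fix the indexing error.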

Regards,
On 09/05/2012 02:14 PM, Lewis John Mcgibbney wrote:
> Most likely


Re: Crawl errors

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Most likely

On Wed, Sep 5, 2012 at 12:12 PM, Tolga <to...@ozses.net> wrote:
> Sorry for replying to this, I can start a new thread if asked.
>
> I am crawling our website, when suddenly I got:
>
> SEVERE: org.apache.solr.common.SolrException: ERROR:
> [doc=http://www.sabanciuniv.edu/] Error adding field 'title'='Sabancı
> Üniversitesi'
>
> Is it because of 'Sabancı Üniversitesi'? SOLR/example/solr/conf/schema.xml
> specifies UTF-8
>
> Regards,



-- 
Lewis

Re: Crawl errors

Posted by Tolga <to...@ozses.net>.
Sorry for replying to this thread; I can start a new one if asked.

I was crawling our website when I suddenly got:

SEVERE: org.apache.solr.common.SolrException: ERROR: 
[doc=http://www.sabanciuniv.edu/] Error adding field 'title'='Sabancı 
Üniversitesi'

Is it because of 'Sabancı Üniversitesi'?
SOLR/example/solr/conf/schema.xml specifies UTF-8.

Regards,

On 09/04/2012 05:04 PM, Lewis John Mcgibbney wrote:
> I don't think you have your HSQLDB server running, this is essential
> requirement to store the crawldb, WebPage and Host data etc.
>
> You can follow the various tutorials here to get you going
>
> http://wiki.apache.org/nutch/#Nutch_2.X_tutorial.28s.29
>
> hth
>
> Lewis


Re: Crawl errors

Posted by Lewis John Mcgibbney <le...@gmail.com>.
I don't think you have your HSQLDB server running. It is an essential
requirement for storing the crawldb, WebPage and Host data, etc.

You can follow the various tutorials here to get you going:

http://wiki.apache.org/nutch/#Nutch_2.X_tutorial.28s.29
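
A "Connection refused" at the bottom of the trace usually means nothing is listening on the target port. Before rerunning the crawl, a small Python sketch like the one below can confirm whether the database port is reachable. The host and port are assumptions: 9001 is HSQLDB's default server port, but the value that matters is whatever the JDBC URL in your Gora configuration points to, and port_is_open is a helper name made up for this example.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed values; adjust to match the JDBC URL used by the crawl.
    host, port = "localhost", 9001
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"HSQLDB server at {host}:{port} is {state}")
```

If the port is not reachable, start the HSQLDB server (or switch to another storage backend) before running the crawl again.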

hth

Lewis



-- 
Lewis