Posted to user@nutch.apache.org by Žygimantas <zi...@yahoo.com> on 2013/02/13 15:30:01 UTC

Slow parse on hadoop

Hi,

I have Nutch running on a Hadoop cluster. Inject, generate, and fetch are working fine and are executed on multiple nodes. However, we seem to get only one mapper for the parse job, so the parse step runs on only one node and takes a minute or so to parse one page. Please see the log below (1 min 41 s to parse thetimes.co.uk).

2013-02-13 13:46:02,658 INFO org.apache.nutch.parse.ParserJob: Parsing http://www.thetimes.co.uk/tto/news/
2013-02-13 13:47:43,415 INFO org.apache.nutch.parse.ParserJob: Parsing http://online.wsj.com/home-page
I am using the parse-html plugin to do the job, with Cassandra as the data store. When running locally, all is fine.
Running the parse step with:
hadoop jar apache-nutch-2.1-SNAPSHOT.job org.apache.nutch.parse.ParserJob $id


Also including the log from the jobtracker:
Hadoop job_201302131311_0006
Job Name: parse
Job-ACLs: All users are allowed
Status: Succeeded
Started at: Wed Feb 13 13:44:06 GMT 2013
Finished at: Wed Feb 13 14:06:30 GMT 2013
Finished in: 22mins, 23sec



Counter                                                               Map         Reduce          Total
ParserStatus
  success                                                              13              0             13
  notparsed                                                             1              0              1
Job Counters
  SLOTS_MILLIS_MAPS                                                     0              0      1,335,834
  Total time spent by all reduces waiting after reserving slots (ms)    0              0              0
  Total time spent by all maps waiting after reserving slots (ms)       0              0              0
  Launched map tasks                                                    0              0              1
  SLOTS_MILLIS_REDUCES                                                  0              0              0
File Output Format Counters
  Bytes Written                                                         0              0              0
File Input Format Counters
  Bytes Read                                                            0              0              0
FileSystemCounters
  HDFS_BYTES_READ                                                     689              0            689
  FILE_BYTES_WRITTEN                                               32,142              0         32,142
Map-Reduce Framework
  Map input records                                                   138              0            138
  Physical memory (bytes) snapshot                            417,538,048              0    417,538,048
  Spilled Records                                                       0              0              0
  Total committed heap usage (bytes)                          186,449,920              0    186,449,920
  CPU time spent (ms)                                           1,379,340              0      1,379,340
  Virtual memory (bytes) snapshot                           1,163,165,696              0  1,163,165,696
  SPLIT_RAW_BYTES                                                     689              0            689
  Map output records                                                   14              0             14
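
For what it's worth, these counters are consistent with one busy mapper doing
all the work: SLOTS_MILLIS_MAPS is 1,335,834 ms (about 22.3 minutes), which
matches the 22 min 23 s wall time, and 1,379,340 ms of CPU time over 14 map
output records works out to roughly 98 seconds per page, in line with the
1 min 41 s gap between the two log lines above.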

Re: Slow parse on hadoop

Posted by t_gra <al...@gmail.com>.
Hi Tejas,

Thanks, I know about this setting and have already increased it to 1 minute,
which gets a higher percentage of pages parsed successfully.

But more than 1 minute for every second page is still tremendously slow, and
it looks like the page sizes themselves are not the issue: with HBase the
same pages parse an order of magnitude faster. The root cause is somewhere
else :)

What looks suspicious to me is that the map task was started on only one
node (actually, there were several attempts on different nodes, but always
on only one node at a time).




Re: Slow parse on hadoop

Posted by Tejas Patil <te...@gmail.com>.
The warning indicates that the parser exceeded the timeout set for parsing
the document. By default the timeout is 30 seconds. You might try
increasing the timeout value in conf/nutch-site.xml:

<property>
  <name>parser.timeout</name>
  <value>30</value>
  <description>Timeout in seconds for the parsing of a document, otherwise
  treats it as an exception and moves on to the following documents. This
  parameter is applied to any Parser implementation. Set to -1 to deactivate,
  bearing in mind that this could cause the parsing to crash because of a
  very long or corrupted document.
  </description>
</property>

While setting this timeout value, pick a sensible value based on the size of
the documents you are dealing with. As you observe this warning message a
lot, my guess is that you have large files. If not, the content must contain
something that makes the parser spend a lot of time.

After a few trials you will end up with a value good enough that the crawl
rate isn't much affected and the percentage of warnings stays low.
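
For context, the timeout is enforced by running the parser on a separate
thread and bounding the Future.get() call, which is exactly the frame you
see in the TimeoutException stack traces on this thread. A simplified
sketch of the mechanism (not the exact ParseUtil source):

import java.util.concurrent.*;

public class TimedParse {
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  // Bound a parse by timeoutSeconds, the way parser.timeout is applied.
  static String parseWithTimeout(final String url, int timeoutSeconds)
      throws Exception {
    Future<String> task = POOL.submit(new Callable<String>() {
      public String call() {
        return expensiveParse(url); // stands in for Parser.getParse()
      }
    });
    try {
      // This get() is where the logged TimeoutException originates.
      return task.get(timeoutSeconds, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      task.cancel(true); // give up on this document and move on
      throw e;
    }
  }

  static String expensiveParse(String url) {
    return "parsed:" + url; // placeholder for the real parsing work
  }
}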

Thanks,
Tejas Patil


On Sat, Feb 16, 2013 at 1:16 PM, t_gra <al...@gmail.com> wrote:

> Hi All,
>
> Experiencing the same problem as Žygimantas with Nutch 2.1 and Cassandra (with
> HBase everything works OK).
>
> Here are some details of my setup:
>
> Node1 - NameNode, SecondaryNameNode, JobTracker
> Node2..Node4 - TaskTracker, DataNode, Cassandra
>
> All these are virtual machines.
> CPU is reported as "Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz", 4 GB RAM.
>
> Running Nutch using
> hadoop jar $JAR org.apache.nutch.crawl.Crawler /seeds -threads 10 -numTasks
> 3 -depth 2 -topN 10000
>
> Getting one mapper for the parse job and very slow parsing of individual pages.
>
> Getting lots of errors like this:
>
> 2013-02-16 01:26:04,217 WARN org.apache.nutch.parse.ParseUtil: Error
> parsing
> http://someurl.net/ with org.apache.nutch.parse.html.HtmlParser@63a1bc40
> java.util.concurrent.TimeoutException
>         [...]
>
> Any suggestions on how to diagnose why it is behaving this way?
>
> Thanks!
>
>
>
>

Re: Slow parse on hadoop

Posted by Alexey Tigarev <al...@gmail.com>.
On Wed, Feb 20, 2013 at 1:10 AM, Lewis John Mcgibbney
<le...@gmail.com> wrote:
> Hi,
> NUTCH-1420 is now committed, so you can update your local copy of Nutch 2.x
> if you are working from HEAD source.

> So there was another issue here where the parse was only running on one
> node in the cluster. Is this also the case with you?

Yes, this happens for me also. Maybe this behavior is somehow related to
having the same content (a concatenation of multiple pages) for each URL,
instead of the individual pages, as I mentioned before.

Alexey.

Re: Slow parse on hadoop

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,
NUTCH-1420 is now committed, so you can update your local copy of Nutch 2.x
if you are working from HEAD source.
So there was another issue here where the parse was only running on one
node in the cluster. Is this also the case with you?

On Tue, Feb 19, 2013 at 2:48 PM, t_gra <al...@gmail.com> wrote:

> Another workaround that can improve the situation a bit (but not solve
> all problems) would be ignoring pages with content larger than some
> given size. I will try whether that helps parse at least some pages :)
>
Yeah, for this exact reason this is activated by default.
Lewis

Re: Slow parse on hadoop

Posted by t_gra <al...@gmail.com>.
I think that just cleaning up the �'s will not remove the root cause
(though that is a good workaround if it improves the situation in many
cases).
In my situation, just removing the �'s will not help, as I have a
concatenation of multiple pages, separated by runs of �'s and other
junk, where a single page's content is expected.
In a dump generated using WebTableReader I also saw a similar case for
another field, "metadata ___rdrdsc__".

Another workaround that can improve the situation a bit (but not solve
all problems) would be ignoring pages with content larger than some
given size. I will try whether that helps parse at least some pages :)
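
Concretely, the guard I have in mind is something like this at the top of
the parse mapper (a hypothetical sketch, not stock Nutch; the 1 MB cap is
made up):

import java.nio.ByteBuffer;

public class SizeGuard {
  static final int MAX_CONTENT_BYTES = 1 << 20; // made-up 1 MB cap

  // True when a page's raw content is too big to be worth parsing.
  static boolean tooLarge(ByteBuffer content) {
    return content != null && content.remaining() > MAX_CONTENT_BYTES;
  }
}

In the mapper one would check tooLarge(page.getContent()) and return early
instead of handing the page to ParseUtil.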

Regards, Alexey Tigarev
<ti...@nlp.od.ua> Jabber: tigra@jabber.od.ua Skype: t__gra

On Tue, Feb 19, 2013 at 12:50 AM, lewis john mcgibbney [via Lucene]
<ml...@n3.nabble.com> wrote:
> I actually see something similar in my Cassandra keyspace (and subsequently
> when I do a webpage dump) when using gora-cassandra. This was why I asked
> for a dump of the HBase webtable.
> Renato and I were wondering if there is a problem when the field data is
> written to the data store, and it now seems that there is a problem indeed!
> Markus has a patch for trunk [0] which removes the dreaded �; we should
> commit this for trunk and 2.x to remove these characters from the parse
> data.
> Lewis
>
>
> [0] https://issues.apache.org/jira/browse/NUTCH-1420
> On Mon, Feb 18, 2013 at 2:38 PM, t_gra <[hidden email]> wrote:
>
>> [...]
>
> --
> *Lewis*





Re: Slow parse on hadoop

Posted by Alexey Tigarev <al...@gmail.com>.
On Wed, Feb 20, 2013 at 2:08 AM, Lewis John Mcgibbney
<le...@gmail.com> wrote:
> If you are working with the Nutch 2.1 release [0], you will be working with
> gora-cassandra 0.2 by default. Do not upgrade to gora-cassandra 0.2.1 as it
> is buggy.
> gora-cassandra 0.2 is built to work with Cassandra version 1.0.2 over
> Hector client 1.0-1.

Thanks, will check if I am using these versions :)

Alexey.

Re: Slow parse on hadoop

Posted by Alexey Tigarev <al...@gmail.com>.
This is wrong behavior. As I said, the content is a concatenation of
multiple pages instead of the one page identified by a given key.

Regards, Alexey Tigarev
<ti...@nlp.od.ua> Jabber: tigra@jabber.od.ua Skype: t__gra


On Wed, Feb 20, 2013 at 2:08 AM, Lewis John Mcgibbney
<le...@gmail.com> wrote:
> On Tue, Feb 19, 2013 at 3:40 PM, t_gra <al...@gmail.com> wrote:
>
>> I tried skipping pages with large content size, and it figured that
>> ALL my pages have content 125981292 bytes long (and probably the same
>> contents).
>>
>
> And this is okay? I don't really understand.
>
>
>>
>> BTW, which version of gora-cassandra do I have to use with
>> Nutch 2.1 and Cassandra API version 19.33.0?
>>
> If you are working with the Nutch 2.1 release [0], you will be working with
> gora-cassandra 0.2 by default. Do not upgrade to gora-cassandra 0.2.1 as it
> is buggy.
> gora-cassandra 0.2 is built to work with Cassandra version 1.0.2 over
> Hector client 1.0-1.
>
> Lewis
>
> [0] http://svn.apache.org/repos/asf/nutch/tags/release-2.1/ivy/ivy.xml
> [1] http://svn.apache.org/repos/asf/gora/tags/gora-0.2/pom.xml

Re: Slow parse on hadoop

Posted by Lewis John Mcgibbney <le...@gmail.com>.
On Tue, Feb 19, 2013 at 3:40 PM, t_gra <al...@gmail.com> wrote:

> I tried skipping pages with large content size, and it figured that
> ALL my pages have content 125981292 bytes long (and probably the same
> contents).
>

And this is okay? I don't really understand.


>
> BTW, which version of gora-cassandra do I have to use with
> Nutch 2.1 and Cassandra API version 19.33.0?
>
If you are working with the Nutch 2.1 release [0], you will be working with
gora-cassandra 0.2 by default. Do not upgrade to gora-cassandra 0.2.1 as it
is buggy.
gora-cassandra 0.2 is built to work with Cassandra version 1.0.2 over
Hector client 1.0-1.

Lewis

[0] http://svn.apache.org/repos/asf/nutch/tags/release-2.1/ivy/ivy.xml
[1] http://svn.apache.org/repos/asf/gora/tags/gora-0.2/pom.xml

Re: Slow parse on hadoop

Posted by t_gra <al...@gmail.com>.
I tried skipping pages with large content size, and it turned out that
ALL my pages have content 125981292 bytes long (and probably the same
contents).

BTW, which version of gora-cassandra do I have to use with
Nutch 2.1 and Cassandra API version 19.33.0?

Regards, Alexey Tigarev

On Wed, Feb 20, 2013 at 12:47 AM, Alexey Tigarev
<al...@gmail.com> wrote:
> [...]





Re: Slow parse on hadoop

Posted by Lewis John Mcgibbney <le...@gmail.com>.
I actually see something similar in my Cassandra keyspace (and subsequently
when I do a webpage dump) when using gora-cassandra. This was why I asked
for a dump of the HBase webtable.
Renato and I were wondering if there is a problem when the field data is
written to the data store, and it now seems that there is a problem indeed!
Markus has a patch for trunk [0] which removes the dreaded �; we should
commit this for trunk and 2.x to remove these characters from the parse
data.
Lewis


[0] https://issues.apache.org/jira/browse/NUTCH-1420
On Mon, Feb 18, 2013 at 2:38 PM, t_gra <al...@gmail.com> wrote:

> Hi Lewis and All,
>
> [...]



-- 
*Lewis*

Re: Slow parse on hadoop

Posted by t_gra <al...@gmail.com>.
Hi Lewis and All,

I added debug output to the parse-html plugin, printing page contents. It
looks really strange (sorry for the long copy-paste from the logs); see below.
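
The debug patch itself is tiny, essentially this at the top of
HtmlParser.getParse() (a reconstructed sketch, not the exact diff I applied):

import java.nio.ByteBuffer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ContentDumpDebug {
  private static final Logger LOG =
      LoggerFactory.getLogger("org.apache.nutch.parse.html");

  // Log the URL, the raw content size, and the raw bytes as text.
  static void logContent(String url, ByteBuffer raw) {
    int size = (raw == null) ? 0 : raw.remaining();
    LOG.info(">>>Parsing " + url + " size=" + size);
    if (raw != null && raw.hasArray()) {
      LOG.info(new String(raw.array(), raw.arrayOffset() + raw.position(), size));
    }
  }
}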

First, the size. I've got a very large size for the very first document:
125 megabytes.
Second, the contents. They actually follow a pattern: some crap, then the
contents of the page of interest, then some more crap, then the contents of
another page, etc.

No wonder this takes a lot of time to parse :)

Logs (only part of one "page" printout: the whole contents of a real page
plus the start of the next page):
=================================================================
2013-02-19 00:24:42,842 INFO org.apache.nutch.parse.html: >>>Parsing
http://jordan-11.net/ size=125981292
2013-02-19 00:24:46,237 INFO org.apache.nutch.parse.html: 
[�   get_range_slices  '    com.amman-dg:http/       st   "
  ��^���   com.panarabiaenquirer:http/     net.jordan-11:http/       bas
  http://jordan-11.net/
  ��� ��    cnt   �<!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0
Transitional//EN&quot;
&quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Justhost.com</title>

</head>

<body><center>

<img src="http://www.justhost.com/media/shared/general/_jh/logo.gif"
alt="JustHost.com" border="0"/> <http://www.justhost.com/>  



<div align="center" class="style1"><br>



<div align="center">
                  
There is no website configured at this address.

                  <p class="style2">
You are seeing this page because there is nothing configured for the site
you have requested.<br>
If you think you are seeing this page in error, please contact the site
administrator or datacenter
responsible for this site.<br>
<BR></p>



  <div style="background-color: #888; width: 60%; height: 1px;" /><br><br>


 
  

   	 cPanel Login <http://www.justhost.com/cgi-bin/cplogin>  
   
   	 Domain Manager Login <https://www.justhost.com/dm>  
   
  
  

   	 Support Center <http://helpdesk.justhost.com>  
   
   	 Hosting Features <http://www.justhost.com/cgi/info/web_hosting>  
   
  
 

</div>
<div style="visibility: hidden; height: 6px; width: 1px;" alt="domain
hosting" /><br><br>


&copy; 2003-2012 JustHost.Com. Toll Free 1-888-755-7585 

</center>





</body>
</html>

  ��� �@    st
  ��� ��    typ   application/xhtml+xml
  ��� �(   kw.edu.aca:http/       bas   http://aca.edu.kw/
  ��շ(    cnt  \�
        <BASE href="http://aca.edu.kw/common/roar/landing/rpos/">
        <html>
        <head>
=================================================================





Re: Slow parse on hadoop

Posted by t_gra <al...@gmail.com>.
Thanks Lewis,
Doing the dump; I will let you know if I notice something interesting.




Re: Slow parse on hadoop

Posted by t_gra <al...@gmail.com>.
Hi Renato,

Regarding places in the Nutch code to look:

You can look at HtmlParser.getParse() (it resides at plugin/parse-html in the
Nutch source distribution).

ParserJob$ParserMapper.map() invokes ParseUtil.process(), which calls
ParseUtil.parse(), which in turn calls Parser.getParse() (HtmlParser.getParse()
here).

Regards,
Alexey




Re: Slow parse on hadoop

Posted by Tejas Patil <te...@gmail.com>.
Hi Renato,

On Sat, Feb 16, 2013 at 2:01 PM, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Hi Tejas,
>
>
> 2013/2/16 Tejas Patil <te...@gmail.com>:
> > Hey Lewis,
> >
> > I am not knowledgeable about the Gora thingy but am curious to know how
> > parsing perf. might be affected if one uses different storage. With HBase
> > it worked fine for the OP but Cassandra gave this problem. Is the parsing
> > code separate for
>
> This is one thing we (Lewis and I) were just discussing. Well, you
> set up Nutch to use Gora to persist data in whichever data store you
> want, so all writes and reads are handled separately by each different
> module. HBase relies on Avro for persisting data but Cassandra does
> not. Cassandra has its own series of serializers to write everything
> into bytes so that operations perform better. We believe
> there is something going on with the Cassandra serializers and the way
> Gora uses them which is making this specific job not work as
> desired.


So this issue occurs while writing parsed content to Cassandra. As
serialization is performed for every URL, the problem should have been seen
for all URLs. But that's not the case. Maybe it has something to do with the
content.

>
> > these two? Or is it while writing parse output that the problem occurs? I
> > think it might be the latter one causing this, but I am not sure.
>
> Could you point me to the parser job code so I can take a look at it?
> I am a foreigner @ Nutchland, so I would appreciate your help in sorting
> this out.
>
The core parsing classes are present at [0]. The parser job is implemented
at [1]. Depending on the content, specific parsing is required. This is
done by the parse plugins present at [2].

[0] :
http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/parse/
[1] :
http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/parse/ParserJob.java?view=markup
[2] : http://svn.apache.org/viewvc/nutch/branches/2.x/src/plugin/


> Renato M.
>
> > Thanks,
> > Tejas Patil
> >
> >
> > On Sat, Feb 16, 2013 at 1:36 PM, Lewis John Mcgibbney <
> > lewis.mcgibbney@gmail.com> wrote:
> >
> >> Can you dump your webdb and check what the various fields are like?
> >> Can you read these in an editor?
> >> I think there may be some problems with the serializers in gora-cassandra
> >> but I am not sure yet.
> >> Lewis
> >>
> >> On Saturday, February 16, 2013, t_gra <al...@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > Experiencing same problem as Žygimantas with Nutch 2.1 and Cassandra
> >> (with
> >> > HBase everything works OK).
> >> >
> >> > Here are some details of my setup:
> >> >
> >> > Node1 - NameNode, SecondraryNameNode, JobTracker
> >> > Node2..Node4 - TaskTraker, DataNode, Cassandra
> >> >
> >> > All these are virtual machines.
> >> > CPU is reported as "Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz", 4Gb
> RAM.
> >> >
> >> > Running Nutch using
> >> > hadoop jar $JAR org.apache.nutch.crawl.Crawler /seeds -threads 10
> >> -numTasks
> >> > 3 -depth 2 -topN 10000
> >> >
> >> > Getting one mapper for parse job and very slow parsing of individual
> >> pages.
> >> >
> >> > Getting lots of errors like this:
> >> >
> >> > 2013-02-16 01:26:04,217 WARN org.apache.nutch.parse.ParseUtil: Error
> >> parsing
> >> > http://someurl.net/ with
> org.apache.nutch.parse.html.HtmlParser@63a1bc40
> >> > java.util.concurrent.TimeoutException
> >> >         at
> >> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
> >> >         at java.util.concurrent.FutureTask.get(FutureTask.java:119)
> >> >         at
> org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:148)
> >> >         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:129)
> >> >         at
> org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:176)
> >> >         at
> >> org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:129)
> >> >         at
> >> org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:78)
> >> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >> >         at
> >> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >> >         at java.security.AccessController.doPrivileged(Native Method)
> >> >         at javax.security.auth.Subject.doAs(Subject.java:416)
> >> >         at
> >> >
> >>
> >>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
> >> >         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >> >
> >> > Any suggestions how to diagnose why it is behaving this way?
> >> >
> >> > Thanks!
> >> >
> >> >
> >> >
> >> > --
> >> > View this message in context:
> >>
> >>
> http://lucene.472066.n3.nabble.com/Slow-parse-on-hadoop-tp4040215p4040897.html
> >> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >> >
> >>
> >> --
> >> *Lewis*
> >>
>

Re: Slow parse on hadoop

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hi Tejas,


2013/2/16 Tejas Patil <te...@gmail.com>:
> Hey Lewis,
>
> I am not knowledgeable about the Gora thingy but am curious to know how
> parsing perf. might be affected if one uses different storage. With HBase it
> worked fine for the OP but Cassandra gave this problem. Is the parsing code
> separate for

This is one thing we (Lewis and I) were just discussing. Well, you
set up Nutch to use Gora to persist data in whichever data store you
want, so all writes and reads are handled separately by each different
module. HBase relies on Avro for persisting data but Cassandra does
not. Cassandra has its own series of serializers to write everything
into bytes so that operations perform better. We believe
there is something going on with the Cassandra serializers and the way
Gora uses them which is making this specific job not work as
desired.
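
To illustrate the kind of serializer bug we suspect (purely hypothetical
code, not the actual gora-cassandra source): if a ByteBuffer is turned into
bytes by grabbing its whole backing array, every value that shares that
array leaks its neighbours into the stored cell, which would look exactly
like one page's content concatenated with other pages plus binary noise.

import java.nio.ByteBuffer;

public class ByteBufferPitfall {
  // Wrong: returns the entire backing array, ignoring position/limit,
  // so bytes belonging to other values come along for the ride.
  static byte[] broken(ByteBuffer buf) {
    return buf.array();
  }

  // Right: copy only the position..limit window of this one value.
  static byte[] correct(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out);
    return out;
  }
}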

> these two? Or is it while writing parse output that the problem occurs? I
> think it might be the latter one causing this, but I am not sure.

Could you point me to the parser job code so I can take a look at it?
I am a foreigner @ Nutchland, so I would appreciate your help in sorting
this out.


Renato M.

> Thanks,
> Tejas Patil
>
>
> On Sat, Feb 16, 2013 at 1:36 PM, Lewis John Mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
>> Can you dump your webdb and check what the various fields are like?
>> Can you read these in an editor?
>> I think there may be some problems with the serializers in gora-cassandra
>> but I am not sure yet.
>> Lewis
>>
>> On Saturday, February 16, 2013, t_gra <al...@gmail.com> wrote:
>> > [...]
>>
>> --
>> *Lewis*
>>

Re: Slow parse on hadoop

Posted by Tejas Patil <te...@gmail.com>.
Hey Lewis,

I am not knowledgeable about the Gora thingy but am curious to know how
parsing perf. might be affected if one uses different storage. With HBase it
worked fine for the OP but Cassandra gave this problem. Is the parsing code
separate for these two? Or is it while writing parse output that the problem
occurs? I think it might be the latter one causing this, but I am not sure.

Thanks,
Tejas Patil


On Sat, Feb 16, 2013 at 1:36 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Can you dump your webdb and check what the various fields are like?
> Can you read these in an editor?
> I think there may be some problems with the serializers in gora-cassandra
> but I am not sure yet.
> Lewis
>
> On Saturday, February 16, 2013, t_gra <al...@gmail.com> wrote:
> > [...]
>
> --
> *Lewis*
>

Re: Slow parse on hadoop

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Can you dump your webdb and check what the various fields are like?
Can you read these in an editor?
I think there may be some problems with the serializers in gora-cassandra
but I am not sure yet.
Lewis

On Saturday, February 16, 2013, t_gra <al...@gmail.com> wrote:
> Hi All,
>
> [...]

-- 
*Lewis*

Re: Slow parse on hadoop

Posted by t_gra <al...@gmail.com>.
Hi All,

Experiencing the same problem as Žygimantas with Nutch 2.1 and Cassandra (with
HBase everything works OK).

Here are some details of my setup:

Node1 - NameNode, SecondaryNameNode, JobTracker
Node2..Node4 - TaskTracker, DataNode, Cassandra

All these are virtual machines.
CPU is reported as "Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz", 4 GB RAM.

Running Nutch using 
hadoop jar $JAR org.apache.nutch.crawl.Crawler /seeds -threads 10 -numTasks
3 -depth 2 -topN 10000

Getting one mapper for the parse job and very slow parsing of individual pages.

Getting lots of errors like this:

2013-02-16 01:26:04,217 WARN org.apache.nutch.parse.ParseUtil: Error parsing
http://someurl.net/ with org.apache.nutch.parse.html.HtmlParser@63a1bc40
java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
	at java.util.concurrent.FutureTask.get(FutureTask.java:119)
	at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:148)
	at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:129)
	at org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:176)
	at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:129)
	at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:78)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

Any suggestions on how to diagnose why it is behaving this way?

Thanks!


