Posted to dev@nutch.apache.org by "Marek Bachmann (JIRA)" <ji...@apache.org> on 2011/08/24 16:10:29 UTC
[jira] [Created] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
LinkDb (invertlinks) should inform the user when it ignores internal links
--------------------------------------------------------------------------
Key: NUTCH-1090
URL: https://issues.apache.org/jira/browse/NUTCH-1090
Project: Nutch
Issue Type: Improvement
Components: linkdb
Affects Versions: 1.3
Reporter: Marek Bachmann
Priority: Trivial
Fix For: 1.3
I used Nutch to crawl sites on a single domain. After the crawl completed, I tried to build a LinkDb, but the LinkDb was empty.
It turns out that this happens because the invertlinks command ignores internal links (links within the same domain) by default.
Unfortunately, the LinkDb class does not report this, so it was hard to find out why the LinkDb was empty.
I suggest adding an informational message for the user when the invertlinks command is ignoring internal links.
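
To illustrate why a single-site crawl can end up with an empty LinkDb, here is a minimal, self-contained sketch. It is not the Nutch implementation: it assumes that "internal" means the source and target URLs share the same host, and the class and method names (InternalLinkFilterSketch, hostOf, keptTargets) are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InternalLinkFilterSketch {

    /** Returns the lower-cased host of a URL, or null if it cannot be determined. */
    public static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? null : host.toLowerCase();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /**
     * Returns the link targets that survive filtering for one source page.
     * Assumption: a link is "internal" when both URLs have the same host.
     */
    public static List<String> keptTargets(String fromUrl, List<String> toUrls,
                                           boolean ignoreInternalLinks) {
        String fromHost = hostOf(fromUrl);
        List<String> kept = new ArrayList<>();
        for (String toUrl : toUrls) {
            if (ignoreInternalLinks) {
                String toHost = hostOf(toUrl);
                if (toHost == null || toHost.equals(fromHost)) {
                    continue; // internal or unparseable link: silently dropped
                }
            }
            kept.add(toUrl);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Every outlink stays on the crawled site, as in a single-domain crawl.
        List<String> outlinks = Arrays.asList("http://example.org/a", "http://example.org/b");
        System.out.println(keptTargets("http://example.org/", outlinks, true).size());
        System.out.println(keptTargets("http://example.org/", outlinks, false).size());
    }
}
```

With the filter enabled, nothing is kept for a site that only links to itself, which matches the empty result described above. Setting db.ignore.internal.links to false in the local configuration should keep such links.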
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090257#comment-13090257 ]
Markus Jelsma commented on NUTCH-1090:
--------------------------------------
You can patch o.a.n.crawl.LinkDb.configure() to log this information.
> LinkDb (invertlinks) should inform the user when it ignores internal links
> --------------------------------------------------------------------------
>
> Key: NUTCH-1090
> URL: https://issues.apache.org/jira/browse/NUTCH-1090
> Project: Nutch
> Issue Type: Improvement
> Components: linkdb
> Affects Versions: 1.3
> Reporter: Marek Bachmann
> Priority: Trivial
> Labels: configuration, information, log
> Fix For: 1.3
>
> Attachments: LinkDb.patch
[jira] [Updated] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Marek Bachmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marek Bachmann updated NUTCH-1090:
----------------------------------
Attachment: LinkDb.patch
Inserted a {{LOG.info}} statement in the {{invert}} method that fires when db.ignore.internal.links is set to true.
Added a constant, {{IGNORE_INTERNAL_LINKS}}, for the {{"db.ignore.internal.links"}} string.
Moved the creation of the {{JobConf}} object to the top of the {{invert}} method.
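
The changes described above can be sketched roughly as follows. This is not the attached patch: to stay runnable without Hadoop or Nutch on the classpath, a plain Map stands in for the JobConf, java.util.logging stands in for the project's logger, and the log wording is only illustrative. The default of true follows the issue description, which says internal links are ignored by default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

public class InvertLoggingSketch {

    private static final Logger LOG = Logger.getLogger(InvertLoggingSketch.class.getName());

    /** Constant for the property name, as described in the patch comment. */
    public static final String IGNORE_INTERNAL_LINKS = "db.ignore.internal.links";

    /** Hypothetical stand-in for JobConf.getBoolean(name, defaultValue). */
    public static boolean getBoolean(Map<String, String> conf, String name, boolean defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    /**
     * Sketch of the start of an invert method: the configuration is available
     * first, so the setting can be checked and reported before any job runs.
     * Returns the logged message, or null if nothing was logged.
     */
    public static String invert(Map<String, String> conf) {
        if (getBoolean(conf, IGNORE_INTERNAL_LINKS, true)) {
            String message = "LinkDb: internal links will be ignored."; // illustrative wording
            LOG.info(message);
            return message;
        }
        // ... the inversion job would be set up and run here ...
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        invert(conf);                               // property unset: message is logged
        conf.put(IGNORE_INTERNAL_LINKS, "false");
        invert(conf);                               // filter disabled: nothing is logged
    }
}
```

Logging at the top of invert means the user sees the message once, on the machine where the command is launched.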
[jira] [Assigned] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Markus Jelsma (Assigned) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma reassigned NUTCH-1090:
------------------------------------
Assignee: Markus Jelsma
[jira] [Updated] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Julien Nioche (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-1090:
---------------------------------
Fix Version/s: 1.5 (was: 1.3)
[jira] [Issue Comment Edited] (NUTCH-1090) LinkDb (invertlinks)
should inform the user when it ignores internal links
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090267#comment-13090267 ]
Markus Jelsma edited comment on NUTCH-1090 at 8/24/11 2:48 PM:
---------------------------------------------------------------
Yes, the job object is created there. The configuration can then be read as in the
configure method.
was (Author: markus17):
Yes, the job object is created there. The configuration can then be read as in the
configure method.
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
[jira] [Updated] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Marek Bachmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marek Bachmann updated NUTCH-1090:
----------------------------------
Attachment: (was: LinkDb.patch)
[jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Marek Bachmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090260#comment-13090260 ]
Marek Bachmann commented on NUTCH-1090:
---------------------------------------
Then I did it right. Thanks
[jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150414#comment-13150414 ]
Hudson commented on NUTCH-1090:
-------------------------------
Integrated in nutch-trunk-maven #26 (See [https://builds.apache.org/job/nutch-trunk-maven/26/])
NUTCH-1090 InvertLinks should inform when ignoring internal links
markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1202143
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java
[jira] [Updated] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Marek Bachmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marek Bachmann updated NUTCH-1090:
----------------------------------
Attachment: LinkDb.patch
I inserted a {{LOG.info}} message in the {{configure}} method.
I don't think this is the best place, but the {{ignoreInternalLinks}} variable isn't set before this method is called.
I hope the format of the patch file is correct. I have never posted one before :)
[jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090267#comment-13090267 ]
Markus Jelsma commented on NUTCH-1090:
--------------------------------------
Yes, the job object is created there. The configuration can then be read as in the
configure method.
[jira] [Resolved] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Markus Jelsma (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1090.
----------------------------------
Resolution: Fixed
Committed for 1.4 in rev. 1202143.
Thanks!
[jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151040#comment-13151040 ]
Hudson commented on NUTCH-1090:
-------------------------------
Integrated in Nutch-trunk #1665 (See [https://builds.apache.org/job/Nutch-trunk/1665/])
NUTCH-1090 InvertLinks should inform when ignoring internal links
markus : http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=&revision=1202143
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java
Re: [jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform the user when it ignores internal links
Posted by Markus Jelsma <ma...@openindex.io>.
Yes, the job object is created there. The configuration can then be read as in the
configure method.
On Wednesday 24 August 2011 16:40:29 Marek Bachmann (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090264#comment-13090264 ]
>
> Marek Bachmann commented on NUTCH-1090:
> ---------------------------------------
>
> OK, I thought so too, but I was unsure whether it is possible and how to read
> the conf from there. I will have a look at it.
[jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Marek Bachmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090264#comment-13090264 ]
Marek Bachmann commented on NUTCH-1090:
---------------------------------------
OK, I thought so too, but I was unsure whether it is possible and how to read the conf from there. I will have a look at it.
[jira] [Commented] (NUTCH-1090) LinkDb (invertlinks) should inform
the user when it ignores internal links
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090261#comment-13090261 ]
Markus Jelsma commented on NUTCH-1090:
--------------------------------------
Looking at it, I feel that logging in the invert method is cleaner. You can read the configuration setting there as well.