Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/08/31 14:43:11 UTC
[jira] [Created] (NUTCH-1102) Fetcher -noParse | -parse switches
Fetcher -noParse | -parse switches
----------------------------------
Key: NUTCH-1102
URL: https://issues.apache.org/jira/browse/NUTCH-1102
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.3
Reporter: Markus Jelsma
Priority: Minor
Fix For: 1.4
The fetcher in 1.3 still has the -noParse option, but it does not do anything. A -parse switch (NUTCH-872) is also ignored, so it seems my build wasn't messed up after all. The fetcher.parse configuration directive is ignored as well. In short, Nutch 1.3 cannot parse fetched data immediately, regardless of configuration and options.
How to proceed? It makes little sense to have both the command option and the configuration directive: it raises the question of which one has authority and adds unnecessary confusion.
I propose to get rid of the command option and rely on the configuration directive alone.
Please comment.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142051#comment-13142051 ]
Hudson commented on NUTCH-1102:
-------------------------------
Integrated in Nutch-nutchgora #55 (See [https://builds.apache.org/job/Nutch-nutchgora/55/])
NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse
ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196516
Files :
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/Crawler.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/metadata/Nutch.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/tools/Benchmark.java
> Fetcher, rely on fetcher.parse directive only
> ---------------------------------------------
>
> Key: NUTCH-1102
> URL: https://issues.apache.org/jira/browse/NUTCH-1102
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.3
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Fix For: 1.4
>
> Attachments: NUTCH-1102-1.4-1.patch
>
>
> The fetcher in 1.3 still has the -noParse option, but it does not do anything. A -parse switch (NUTCH-872) is also ignored, so it seems my build wasn't messed up after all. The fetcher.parse configuration directive is ignored as well. In short, Nutch 1.3 cannot parse fetched data immediately, regardless of configuration and options.
> How to proceed? It makes little sense to have both the command option and the configuration directive: it raises the question of which one has authority and adds unnecessary confusion.
> I propose to get rid of the command option and rely on the configuration directive alone.
> Please comment.
[jira] [Resolved] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1102.
----------------------------------
Resolution: Fixed
Committed for 1.4 in rev. 1170526.
[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097948#comment-13097948 ]
Julien Nioche commented on NUTCH-1102:
--------------------------------------
@Markus: in future, maybe try to keep each patch limited to a single issue? Otherwise reviewing is harder, and we might get confused as to which patch is the correct one: this one or the one in NUTCH-1067.
Apart from that, I don't mind if the command line option is removed, as it can be specified with '-D fetcher.parse=true' anyway, but it should not be too difficult to fix it instead so that we can use either. This would have the advantage of not forcing incompatible changes on users.
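To illustrate the '-D fetcher.parse=true' point, here is a rough sketch of how a command-line '-D key=value' pair can be layered over a configured value, so the configuration directive stays the single setting and the command line merely overrides it. This is not Nutch or Hadoop code: java.util.Properties stands in for the real configuration object, and the class and method names are made up for the example.

```java
import java.util.Properties;

public class ParseOverrideSketch {
    // Hypothetical helper: copy "-D key=value" pairs from the argument list
    // into a copy of the configuration, so a command-line value wins over
    // the configured one. The base configuration is left untouched.
    public static Properties applyOverrides(Properties base, String[] args) {
        Properties conf = new Properties();
        conf.putAll(base);
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-D".equals(args[i])) {
                String[] kv = args[i + 1].split("=", 2);
                if (kv.length == 2) {
                    conf.setProperty(kv[0], kv[1]);
                }
                i++; // skip the key=value token just consumed
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Stand-in for the value that would come from nutch-site.
        Properties site = new Properties();
        site.setProperty("fetcher.parse", "false");

        Properties effective =
            applyOverrides(site, new String[] {"-D", "fetcher.parse=true"});
        System.out.println(effective.getProperty("fetcher.parse"));
    }
}
```

With this kind of layering there is no separate -parse or -noParse switch to keep in sync, which avoids the question of which setting has authority.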
[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142058#comment-13142058 ]
Hudson commented on NUTCH-1102:
-------------------------------
Integrated in Nutch-nutchgora-ant #9 (See [https://builds.apache.org/job/Nutch-nutchgora-ant/9/])
NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse
ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196516
Files :
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/Crawler.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/metadata/Nutch.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/tools/Benchmark.java
[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142818#comment-13142818 ]
Hudson commented on NUTCH-1102:
-------------------------------
Integrated in Nutch-nutchgora-ant #10 (See [https://builds.apache.org/job/Nutch-nutchgora-ant/10/])
NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse (forget to commit test)
ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196551
Files :
* /nutch/branches/nutchgora/src/test/org/apache/nutch/fetcher/TestFetcher.java
[jira] [Closed] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Markus Jelsma (Closed) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-1102.
--------------------------------
Bulk close of resolved issues of 1.4. bulkclose-1.4-20111220
[jira] [Updated] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1102:
---------------------------------
Patch Info: [Patch Available]
Assignee: Markus Jelsma
Summary: Fetcher, rely on fetcher.parse directive only (was: Fetcher -noParse | -parse switches)
I have a patch, but it also includes code from NUTCH-1067. I'd prefer to include both patches at once.
Julien, any insight into progress on NUTCH-1067?
Cheers
[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142813#comment-13142813 ]
Hudson commented on NUTCH-1102:
-------------------------------
Integrated in Nutch-nutchgora #56 (See [https://builds.apache.org/job/Nutch-nutchgora/56/])
NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse (forget to commit test)
ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196551
Files :
* /nutch/branches/nutchgora/src/test/org/apache/nutch/fetcher/TestFetcher.java
[jira] [Updated] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1102:
---------------------------------
Attachment: NUTCH-1102-1.4-1.patch
Here's the patch. Minor changes are made in Fetcher.fetch() and Fetcher.run(): it is simply a case of removing all parsing method arguments and no longer setting the fetcher.parse directive in the jobconf. We now rely only on nutch-site for this directive.
The original code could never set parsing to true.
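For readers without the patch at hand, relying on the directive alone amounts to something like the sketch below. It is only an illustration and not the code from NUTCH-1102-1.4-1.patch: java.util.Properties stands in for the job configuration, the class and method names are hypothetical, and the default of false is an assumption made for the example.

```java
import java.util.Properties;

public class FetcherParseSketch {
    // Property name taken from the issue; everything else is hypothetical.
    static final String FETCHER_PARSE = "fetcher.parse";

    // Decide whether to parse while fetching from configuration alone.
    // No method argument or command-line flag is consulted, so the
    // configuration is the single source of truth.
    public static boolean isParsing(Properties conf) {
        // Assumed default: do not parse during fetch unless configured.
        return Boolean.parseBoolean(conf.getProperty(FETCHER_PARSE, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isParsing(conf));

        conf.setProperty(FETCHER_PARSE, "true");
        System.out.println(isParsing(conf));
    }
}
```

Because nothing overwrites the value after it is read from the configuration, setting it to true can actually take effect, which the original code could not achieve.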
[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095431#comment-13095431 ]
Lewis John McGibbney commented on NUTCH-1102:
---------------------------------------------
This issue is quite a peculiar one. You can't help but wonder how the code ended up offering arguments for the fetch command which do not work.
Nonetheless, I will get this tested and comment in due course. Thank you.
[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097953#comment-13097953 ]
Markus Jelsma commented on NUTCH-1102:
--------------------------------------
Yes, I know. The problem was that there were so many patches for the fetcher at the same time. Very tricky.
Well, I can, of course, change the code again to rely on the config option only. The question is what to do: right now nothing works anyway, since the change in NUTCH-872 broke it, so it is not compatible with anything right now.
Of course, I prefer not having to rewrite again to support the config option only ;)