Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/08/31 14:43:11 UTC

[jira] [Created] (NUTCH-1102) Fetcher -noParse | -parse switches

Fetcher -noParse | -parse switches
----------------------------------

                 Key: NUTCH-1102
                 URL: https://issues.apache.org/jira/browse/NUTCH-1102
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.3
            Reporter: Markus Jelsma
            Priority: Minor
             Fix For: 1.4


The fetcher in 1.3 still has the -noParse option, but it does not do anything. The -parse switch (NUTCH-872) is ignored as well, so it seems my build wasn't messed up after all. The fetcher.parse configuration directive is also ignored. In short, Nutch 1.3 cannot parse fetched data immediately, regardless of configuration and options.

How to proceed? It makes little sense to have both the command option and the configuration directive: it raises the question of which one has authority and adds unnecessary confusion.

I propose to get rid of the command option and rely on the configuration directive alone.

Please comment.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142051#comment-13142051 ] 

Hudson commented on NUTCH-1102:
-------------------------------

Integrated in Nutch-nutchgora #55 (See [https://builds.apache.org/job/Nutch-nutchgora/55/])
    NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse

ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196516
Files : 
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/Crawler.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/metadata/Nutch.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/tools/Benchmark.java

                
> Fetcher, rely on fetcher.parse directive only
> ---------------------------------------------
>
>                 Key: NUTCH-1102
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1102
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: NUTCH-1102-1.4-1.patch
>
>
> The fetcher in 1.3 still has the -noParse option, but it does not do anything. The -parse switch (NUTCH-872) is ignored as well, so it seems my build wasn't messed up after all. The fetcher.parse configuration directive is also ignored. In short, Nutch 1.3 cannot parse fetched data immediately, regardless of configuration and options.
> How to proceed? It makes little sense to have both the command option and the configuration directive: it raises the question of which one has authority and adds unnecessary confusion.
> I propose to get rid of the command option and rely on the configuration directive alone.
> Please comment.


        

[jira] [Resolved] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-1102.
----------------------------------

    Resolution: Fixed

Committed for 1.4 in rev. 1170526.


        

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097948#comment-13097948 ] 

Julien Nioche commented on NUTCH-1102:
--------------------------------------

@Markus: in the future, maybe try to keep each patch related to a single issue only? Otherwise it does not help reviewing, and we might get confused as to which patch is the correct one between this one and the one in NUTCH-1067.

Apart from that, I don't mind if the command line option is removed, as it can be specified with '-D fetcher.parse=true' anyway, but it should not be too difficult to fix it instead so that we can use either. This would have the advantage of not forcing incompatible changes on the users.
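
To illustrate the '-D fetcher.parse=true' point, here is a minimal sketch of how a generic "-D key=value" argument can override a value from the site configuration. It is plain Java with no Hadoop or Nutch classes: the class name FetcherParseSketch, both method names, and the "false" default are assumptions made for illustration, not the actual Fetcher code.

```java
import java.util.HashMap;
import java.util.Map;

public class FetcherParseSketch {

    /**
     * Returns a copy of the site configuration with every "-D key=value"
     * pair from the command line applied on top of it.
     * Hypothetical helper, not part of Nutch or Hadoop.
     */
    public static Map<String, String> applyOverrides(Map<String, String> siteConf, String[] args) {
        Map<String, String> conf = new HashMap<>(siteConf);
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-D".equals(args[i])) {
                String[] kv = args[i + 1].split("=", 2);
                if (kv.length == 2) {
                    conf.put(kv[0], kv[1]);
                }
                i++; // skip the key=value token that was just consumed
            }
        }
        return conf;
    }

    /** Reads fetcher.parse; the "false" fallback is an assumption of this sketch. */
    public static boolean isParsing(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault("fetcher.parse", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> siteConf = new HashMap<>();
        siteConf.put("fetcher.parse", "false"); // stands in for nutch-site.xml
        Map<String, String> effective = applyOverrides(siteConf, args);
        System.out.println("fetcher.parse = " + isParsing(effective));
    }
}
```

With this shape there is a single place where the value is read, so the question of authority reduces to "the last value written to the configuration wins".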


        

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142058#comment-13142058 ] 

Hudson commented on NUTCH-1102:
-------------------------------

Integrated in Nutch-nutchgora-ant #9 (See [https://builds.apache.org/job/Nutch-nutchgora-ant/9/])
    NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse

ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196516
Files : 
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/Crawler.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/fetcher/FetcherJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/metadata/Nutch.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/tools/Benchmark.java

                

        

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142818#comment-13142818 ] 

Hudson commented on NUTCH-1102:
-------------------------------

Integrated in Nutch-nutchgora-ant #10 (See [https://builds.apache.org/job/Nutch-nutchgora-ant/10/])
    NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse (forget to commit test)

ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196551
Files : 
* /nutch/branches/nutchgora/src/test/org/apache/nutch/fetcher/TestFetcher.java

                

        

[jira] [Closed] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Markus Jelsma (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-1102.
--------------------------------


Bulk close of resolved issues of 1.4. bulkclose-1.4-20111220
                

        

[jira] [Updated] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1102:
---------------------------------

    Patch Info: [Patch Available]
      Assignee: Markus Jelsma
       Summary: Fetcher, rely on fetcher.parse directive only  (was: Fetcher -noParse | -parse switches)

I have a patch, but it also includes code from NUTCH-1067. I'd prefer to commit both patches at once.
Julien, any sign of progress with 1067?

Cheers


        

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142813#comment-13142813 ] 

Hudson commented on NUTCH-1102:
-------------------------------

Integrated in Nutch-nutchgora #56 (See [https://builds.apache.org/job/Nutch-nutchgora/56/])
    NUTCH-1191 Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse (forget to commit test)

ferdy : http://svn.apache.org/viewvc/nutch/branches/nutchgora/viewvc/?view=rev&root=&revision=1196551
Files : 
* /nutch/branches/nutchgora/src/test/org/apache/nutch/fetcher/TestFetcher.java

                

        

[jira] [Updated] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1102:
---------------------------------

    Attachment: NUTCH-1102-1.4-1.patch

Here's the patch. It makes minor changes in Fetcher.fetch() and Fetcher.run(): it simply removes all parsing method arguments and no longer sets the fetcher.parse directive in the jobconf. Now we rely only on nutch-site for this directive.

The original code could never set parsing to true.
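
The shape of that change can be sketched as follows. This is not the attached patch: the class ConfigOnlyFetcherSketch, its method signatures, and the segment path are hypothetical, and a plain Map stands in for the job configuration. The point is only that fetch() takes no boolean parsing argument and the directive is read from the configuration in one place.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigOnlyFetcherSketch {
    // Stand-in for the job configuration (hypothetical, not a Hadoop class).
    private final Map<String, String> conf;

    public ConfigOnlyFetcherSketch(Map<String, String> conf) {
        this.conf = conf;
    }

    /** Single source of truth; the "false" fallback is an assumption of this sketch. */
    public boolean isParsing() {
        return Boolean.parseBoolean(conf.getOrDefault("fetcher.parse", "false"));
    }

    /**
     * Note the absence of a boolean "parsing" parameter: callers cannot
     * contradict the configuration. Returns a description of the run
     * instead of actually fetching anything.
     */
    public String fetch(String segment, int threads) {
        String mode = isParsing() ? "fetch+parse" : "fetch only";
        return mode + " on " + segment + " with " + threads + " threads";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.parse", "true"); // as if set in nutch-site.xml
        ConfigOnlyFetcherSketch fetcher = new ConfigOnlyFetcherSketch(conf);
        System.out.println(fetcher.fetch("crawl/segments/example", 10));
    }
}
```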


        

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095431#comment-13095431 ] 

Lewis John McGibbney commented on NUTCH-1102:
---------------------------------------------

This issue is quite a peculiar one. You can't help but wonder how the code ended up offering arguments for the fetch command which do not work.

Nonetheless, I will get this tested and comment in due course. Thank you.


        

[jira] [Commented] (NUTCH-1102) Fetcher, rely on fetcher.parse directive only

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097953#comment-13097953 ] 

Markus Jelsma commented on NUTCH-1102:
--------------------------------------

Yes, I know. The problem was that there were so many patches for the fetcher at the same time. Very tricky.

Well, I can, of course, change the code again to rely on the config option only. The question is what to do: right now nothing works anyway, since the change in NUTCH-872 broke it, so it is not compatible with anything right now.

Of course, I prefer not having to rewrite again to support config option only ;)
