Posted to dev@nutch.apache.org by sujen1412 <gi...@git.apache.org> on 2015/09/15 18:16:04 UTC

[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

GitHub user sujen1412 opened a pull request:

    https://github.com/apache/nutch/pull/59

    Fix for NUTCH-2099 Contributed by Sujen Shah

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sujen1412/nutch NUTCH-2099

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/59.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #59
    
----
commit 9daf998ae1c9b1ededf22a169af8db699ffdf1e0
Author: Sujen Shah <su...@gmail.com>
Date:   2015-09-15T16:01:37Z

    Refactoring REST endpoints for integration with webui

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39839080
  
    --- Diff: src/java/org/apache/nutch/metadata/Nutch.java ---
    @@ -80,4 +80,11 @@
     	public static final String STAT_PROGRESS = "progress";
     	/**Used by Nutch REST service */
     	public static final String CRAWL_ID_KEY = "storage.crawl.id";
    +	
    +	public static final String ARG_SEEDDIR = "url_dir";
    +	public static final String ARG_CRAWLDB = "crawldb";
    --- End diff --
    
    @sujen1412, any thoughts on adding some brief Javadoc to these constants?
    Even trivial documentation makes a huge difference.



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by sujen1412 <gi...@git.apache.org>.
Github user sujen1412 commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39821056
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -236,10 +237,10 @@ public int run(String[] args) throws Exception {
        * Used for Nutch REST service
        */
       @Override
    -  public Map<String, Object> run(Map<String, String> args, String crawlId) throws Exception {
    +  public Map<String, Object> run(Map<String, Object> args, String crawlId) throws Exception {
    --- End diff --
    
    @lewismc, making the value type Object lets me pass multiple inputs in the args Map (for example, the segments in the updatedb job) as an ArrayList instead of parsing them out of a string. This change also brings the 1.x code in line with 2.x and makes porting the webui easier, since it expects an Object.
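
The point about passing multiple inputs can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request: the class `ArgsSketch`, the method `readSegments`, and the key name `"segments"` are hypothetical, chosen only to show how a `Map<String, Object>` lets a caller hand over a list directly while still tolerating a comma-separated string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArgsSketch {
  // Hypothetical key name, used for illustration only.
  static final String ARG_SEGMENTS = "segments";

  // Accepts either a List (no string parsing needed) or a
  // comma-separated String, and returns the segment names.
  static List<String> readSegments(Map<String, Object> args) {
    List<String> segments = new ArrayList<>();
    Object value = args.get(ARG_SEGMENTS);
    if (value instanceof List) {
      for (Object item : (List<?>) value) {
        segments.add(String.valueOf(item));
      }
    } else if (value instanceof String) {
      for (String part : ((String) value).split(",")) {
        String trimmed = part.trim();
        if (!trimmed.isEmpty()) {
          segments.add(trimmed);
        }
      }
    }
    return segments;
  }

  public static void main(String[] unused) {
    Map<String, Object> args = new HashMap<>();
    args.put(ARG_SEGMENTS, Arrays.asList("segmentA", "segmentB"));
    System.out.println(readSegments(args));
  }
}
```

With `Map<String, String>` only the second branch would be possible, so every caller would have to agree on a delimiter and an escaping rule.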



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39662837
  
    --- Diff: src/java/org/apache/nutch/metadata/Nutch.java ---
    @@ -80,4 +80,11 @@
     	public static final String STAT_PROGRESS = "progress";
     	/**Used by Nutch REST service */
     	public static final String CRAWL_ID_KEY = "storage.crawl.id";
    +	
    +	public static final String ARG_SEEDDIR = "url_dir";
    +	public static final String ARG_CRAWLDB = "crawldb";
    --- End diff --
    
    As with the static finals above, it would be nice to add some Javadoc to the additions here. They are public, after all.
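
As a rough illustration of the request, the two constants from the diff could be documented along these lines. The constant names and values come from the diff; the enclosing class name `NutchConstantsSketch` and the Javadoc wording are assumptions made for this sketch and are not the text that was eventually committed.

```java
class NutchConstantsSketch {
  /**
   * Argument key for the seed directory passed to a job through the
   * REST service (wording assumed for illustration).
   */
  public static final String ARG_SEEDDIR = "url_dir";

  /**
   * Argument key for the crawldb location passed to a job through the
   * REST service (wording assumed for illustration).
   */
  public static final String ARG_CRAWLDB = "crawldb";

  public static void main(String[] unused) {
    System.out.println(ARG_SEEDDIR + " " + ARG_CRAWLDB);
  }
}
```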



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39662639
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -261,30 +262,68 @@ public int run(String[] args) throws Exception {
           additionsAllowed = false;
         }
         
    -    String crawldb = crawlId+"/crawldb";
    -    String segment_dir = crawlId+"/segments";
    -    File segmentsDir = new File(segment_dir);
    -    File[] segmentsList = segmentsDir.listFiles();  
    -    Arrays.sort(segmentsList, new Comparator<File>(){
    -      @Override
    -      public int compare(File f1, File f2) {
    -        if(f1.lastModified()>f2.lastModified())
    -          return -1;
    -        else
    -          return 0;
    -      }      
    -    });
    +    Path crawlDb;
    +    if(args.containsKey(Nutch.ARG_CRAWLDB)) {
    +    	Object crawldbPath = args.get(Nutch.ARG_CRAWLDB);
    +    	if(crawldbPath instanceof Path) {
    +    		crawlDb = (Path) crawldbPath;
    --- End diff --
    
    Please format all of your code.
    Julien recently committed the Eclipse code formatter to trunk:
    https://github.com/apache/nutch/blob/trunk/eclipse-codeformat.xml
    If you are using some other IDE, the general rule is 2-space indents.
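
For reference, here is a sketch of how the quoted fragment might look with two-space indents and no tabs. Several parts are assumptions: `java.nio.file.Path` stands in for the Hadoop `Path` type so the example runs without Hadoop on the classpath, the class and method names are hypothetical, and both `else` branches are guesses, since the diff excerpt stops before them.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class CrawlDbPathSketch {
  // Same value as Nutch.ARG_CRAWLDB in the quoted diff.
  static final String ARG_CRAWLDB = "crawldb";

  static Path resolveCrawlDb(Map<String, Object> args, String crawlId) {
    Path crawlDb;
    if (args.containsKey(ARG_CRAWLDB)) {
      Object crawldbPath = args.get(ARG_CRAWLDB);
      if (crawldbPath instanceof Path) {
        crawlDb = (Path) crawldbPath;
      } else {
        // Assumed branch: treat any other non-null value as a path string.
        crawlDb = Paths.get(crawldbPath.toString());
      }
    } else {
      // Assumed fallback, mirroring the removed crawlId + "/crawldb" line.
      crawlDb = Paths.get(crawlId, "crawldb");
    }
    return crawlDb;
  }

  public static void main(String[] unused) {
    Map<String, Object> args = new HashMap<>();
    System.out.println(resolveCrawlDb(args, "myCrawl"));
  }
}
```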



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by sujen1412 <gi...@git.apache.org>.
Github user sujen1412 commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39897798
  
    --- Diff: src/java/org/apache/nutch/metadata/Nutch.java ---
    @@ -80,4 +80,11 @@
     	public static final String STAT_PROGRESS = "progress";
     	/**Used by Nutch REST service */
     	public static final String CRAWL_ID_KEY = "storage.crawl.id";
    +	
    +	public static final String ARG_SEEDDIR = "url_dir";
    +	public static final String ARG_CRAWLDB = "crawldb";
    --- End diff --
    
    Hi @lewismc, I have added some documentation to the newly introduced metadata. Please let me know if it looks right. Thanks :)



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39839001
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -261,30 +262,68 @@ public int run(String[] args) throws Exception {
           additionsAllowed = false;
         }
         
    -    String crawldb = crawlId+"/crawldb";
    -    String segment_dir = crawlId+"/segments";
    -    File segmentsDir = new File(segment_dir);
    -    File[] segmentsList = segmentsDir.listFiles();  
    -    Arrays.sort(segmentsList, new Comparator<File>(){
    -      @Override
    -      public int compare(File f1, File f2) {
    -        if(f1.lastModified()>f2.lastModified())
    -          return -1;
    -        else
    -          return 0;
    -      }      
    -    });
    +    Path crawlDb;
    +    if(args.containsKey(Nutch.ARG_CRAWLDB)) {
    +    	Object crawldbPath = args.get(Nutch.ARG_CRAWLDB);
    +    	if(crawldbPath instanceof Path) {
    +    		crawlDb = (Path) crawldbPath;
    --- End diff --
    
    No problem.
    We can always sort it out pre-release; formatting the code can be a final ticket.
    Thank you for your attention to detail.



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by sujen1412 <gi...@git.apache.org>.
Github user sujen1412 commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39897636
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -261,30 +262,68 @@ public int run(String[] args) throws Exception {
           additionsAllowed = false;
         }
         
    -    String crawldb = crawlId+"/crawldb";
    -    String segment_dir = crawlId+"/segments";
    -    File segmentsDir = new File(segment_dir);
    -    File[] segmentsList = segmentsDir.listFiles();  
    -    Arrays.sort(segmentsList, new Comparator<File>(){
    -      @Override
    -      public int compare(File f1, File f2) {
    -        if(f1.lastModified()>f2.lastModified())
    -          return -1;
    -        else
    -          return 0;
    -      }      
    -    });
    +    Path crawlDb;
    +    if(args.containsKey(Nutch.ARG_CRAWLDB)) {
    +    	Object crawldbPath = args.get(Nutch.ARG_CRAWLDB);
    +    	if(crawldbPath instanceof Path) {
    +    		crawlDb = (Path) crawldbPath;
    --- End diff --
    
    @lewismc, I have corrected the formatting in the new commit. Thanks :) 



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39838934
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -236,10 +237,10 @@ public int run(String[] args) throws Exception {
        * Used for Nutch REST service
        */
       @Override
    -  public Map<String, Object> run(Map<String, String> args, String crawlId) throws Exception {
    +  public Map<String, Object> run(Map<String, Object> args, String crawlId) throws Exception {
    --- End diff --
    
    ack



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by lewismc <gi...@git.apache.org>.
Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39662447
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -236,10 +237,10 @@ public int run(String[] args) throws Exception {
        * Used for Nutch REST service
        */
       @Override
    -  public Map<String, Object> run(Map<String, String> args, String crawlId) throws Exception {
    +  public Map<String, Object> run(Map<String, Object> args, String crawlId) throws Exception {
    --- End diff --
    
    Why change to Object?



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/59



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by sujen1412 <gi...@git.apache.org>.
Github user sujen1412 commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39821072
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -261,30 +262,68 @@ public int run(String[] args) throws Exception {
           additionsAllowed = false;
         }
         
    -    String crawldb = crawlId+"/crawldb";
    -    String segment_dir = crawlId+"/segments";
    -    File segmentsDir = new File(segment_dir);
    -    File[] segmentsList = segmentsDir.listFiles();  
    -    Arrays.sort(segmentsList, new Comparator<File>(){
    -      @Override
    -      public int compare(File f1, File f2) {
    -        if(f1.lastModified()>f2.lastModified())
    -          return -1;
    -        else
    -          return 0;
    -      }      
    -    });
    +    Path crawlDb;
    +    if(args.containsKey(Nutch.ARG_CRAWLDB)) {
    +    	Object crawldbPath = args.get(Nutch.ARG_CRAWLDB);
    +    	if(crawldbPath instanceof Path) {
    +    		crawlDb = (Path) crawldbPath;
    --- End diff --
    
    I am using Eclipse and I did set up the formatter, so I don't know why this happened. I will take care from now on.



[GitHub] nutch pull request: Fix for NUTCH-2099 Contributed by Sujen Shah

Posted by chrismattmann <gi...@git.apache.org>.
Github user chrismattmann commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/59#discussion_r39874200
  
    --- Diff: src/java/org/apache/nutch/crawl/CrawlDb.java ---
    @@ -236,10 +237,10 @@ public int run(String[] args) throws Exception {
        * Used for Nutch REST service
        */
       @Override
    -  public Map<String, Object> run(Map<String, String> args, String crawlId) throws Exception {
    +  public Map<String, Object> run(Map<String, Object> args, String crawlId) throws Exception {
    --- End diff --
    
    Thanks @sujen1412 for considering [nutch-python](https://github.com/chrismattmann/nutch-python) with this.

