Posted to dev@nutch.apache.org by "Walter Tietze (JIRA)" <ji...@apache.org> on 2012/11/23 16:18:58 UTC

[jira] [Created] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

Walter Tietze created NUTCH-1499:
------------------------------------

             Summary: Usage of multiple ipv4 addresses and network cards on fetcher machines
                 Key: NUTCH-1499
                 URL: https://issues.apache.org/jira/browse/NUTCH-1499
             Project: Nutch
          Issue Type: New Feature
          Components: fetcher
    Affects Versions: 1.5.1
            Reporter: Walter Tietze
            Priority: Minor


Adds the ability for the fetcher threads to use multiple configured IPv4 addresses.

On some cluster machines several IPv4 addresses are configured, each associated with its own network interface.

This patch makes it possible to configure protocol-http and protocol-httpclient to use these network interfaces in round-robin fashion.

If the feature is enabled, a helper class reads the network configuration at *startup*. For each HTTP network connection the next IP address is taken. The method that hands out the next address is synchronized, but this should not be a bottleneck for the overall performance of the fetcher threads.

This feature has been tested on our cluster with both protocol-http and protocol-httpclient.
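
The round-robin selection described above could be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class name RoundRobinLocalAddresses and the methods fromSystem() and nextAddress() are hypothetical, and the filtering rules (interface is up, not loopback, IPv4 only) are assumptions.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical helper for illustration; not the class from the patch. */
public class RoundRobinLocalAddresses {
  private final List<InetAddress> addresses;
  private int next = 0;

  public RoundRobinLocalAddresses(List<InetAddress> addresses) {
    if (addresses.isEmpty()) {
      throw new IllegalArgumentException("no local addresses configured");
    }
    // Defensive copy: the list is fixed once, at startup.
    this.addresses = new ArrayList<>(addresses);
  }

  /** Collects the IPv4 addresses of all non-loopback interfaces that are up. */
  public static RoundRobinLocalAddresses fromSystem() throws SocketException {
    List<InetAddress> found = new ArrayList<>();
    Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
    if (nics != null) {
      for (NetworkInterface nic : Collections.list(nics)) {
        if (!nic.isUp() || nic.isLoopback()) {
          continue;
        }
        for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
          if (addr instanceof Inet4Address) {
            found.add(addr);
          }
        }
      }
    }
    return new RoundRobinLocalAddresses(found);
  }

  /** Returns the next address in turn; synchronized so threads never share a slot. */
  public synchronized InetAddress nextAddress() {
    InetAddress addr = addresses.get(next);
    next = (next + 1) % addresses.size();
    return addr;
  }
}
```

A fetcher thread could then bind its outgoing socket to the selected address before connecting, for example with socket.bind(new InetSocketAddress(selector.nextAddress(), 0)). The critical section is only a list lookup and an increment, which is why the synchronization is unlikely to matter next to the network latency of a fetch.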


 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

Posted by "Walter Tietze (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506650#comment-13506650 ] 

Walter Tietze commented on NUTCH-1499:
--------------------------------------

@Sebastian,

Sorry for not answering until now.

In our cluster we also use the bonding driver. I have already asked our partners' network administrators why they did not want to use this kind of configuration, and I am still waiting for their response.

Whether or not I get good reasons, I will let you know at once!


                

[jira] [Updated] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

Posted by "Walter Tietze (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Tietze updated NUTCH-1499:
---------------------------------

    Attachment: apache-nutch-1.5.1.NUTCH-1499.patch

Patch for using multiple ip addresses and network interfaces.
                

[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

Posted by "Sebastian Nagel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504136#comment-13504136 ] 

Sebastian Nagel commented on NUTCH-1499:
----------------------------------------

Short and precise patch. However, is there a reason why the problem is not solved at the hardware or system level, cf. bonding (http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding)?
                

[jira] [Commented] (NUTCH-1499) Usage of multiple ipv4 addresses and network cards on fetcher machines

Posted by "Sebastian Nagel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507944#comment-13507944 ] 

Sebastian Nagel commented on NUTCH-1499:
----------------------------------------

Thanks! That's a plausible reason: (let's call it) "administrative constraints".
+1 (lean patch, looks good; I'll try to test it on a machine with suitable network settings)
                