Posted to dev@nutch.apache.org by "Fuad Efendi (JIRA)" <ji...@apache.org> on 2005/10/15 21:25:44 UTC

[jira] Created: (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4

Disable permanent DNS-to-IP caching for JVM 1.4
-----------------------------------------------

         Key: NUTCH-113
         URL: http://issues.apache.org/jira/browse/NUTCH-113
     Project: Nutch
        Type: Improvement
    Versions: 0.8-dev, 0.7.2-dev    
    Reporter: Fuad Efendi
    Priority: Trivial


DNS-to-IP mappings may change during long crawls, but by default JVM 1.4 caches successful lookups forever.

Some related discussion on the Jakarta-HttpClient-User list:
http://mail-archives.apache.org/mod_mbox/jakarta-httpclient-user/200506.mbox/%3c20050627022440.SVIL13442.lakermmtao05.cox.net@zeus%3e

http://java.sun.com/j2se/1.4.2/docs/guide/net/properties.html
   networkaddress.cache.ttl (default: -1) 
   Specified in java.security to indicate the caching policy for successful name lookups from the name service. The value is specified as an integer to indicate the number of seconds to cache the successful lookup. 
   A value of -1 indicates "cache forever". 


We probably need this code in org.apache.nutch.fetcher.Fetcher:

  private static final int FETCHER_DNS_TTL_MINUTES =
    NutchConf.get().getInt("fetcher.dns.ttl.minutes", 120);

  static {
    java.security.Security.setProperty("networkaddress.cache.ttl", "" + FETCHER_DNS_TTL_MINUTES*60);
  }


And a new property in nutch-default.xml:

<property>
  <name>fetcher.dns.ttl.minutes</name>
  <value>120</value>
  <description>DNS-to-IP cache, Time-to-Live</description>
</property>
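
For anyone who wants to try the idea outside of Nutch, here is a minimal standalone sketch. It only assumes the standard java.security.Security API; the class name DnsTtlSketch and the helpers ttlSeconds/applyDnsTtl are hypothetical names made up for this illustration and are not part of Nutch. It converts a TTL in minutes to the seconds string that networkaddress.cache.ttl expects, sets the property, and reads it back.

```java
import java.security.Security;

// Hypothetical standalone sketch; not part of the Nutch code base.
public class DnsTtlSketch {

    // Name of the JVM security property that controls positive DNS caching.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    /** Converts a TTL in minutes to the seconds string the property expects. */
    static String ttlSeconds(int minutes) {
        if (minutes < 0) {
            throw new IllegalArgumentException("TTL must not be negative: " + minutes);
        }
        // Use long arithmetic so very large minute values cannot overflow.
        return String.valueOf(minutes * 60L);
    }

    /** Sets the DNS cache TTL; should be called before the first name lookup. */
    static void applyDnsTtl(int minutes) {
        Security.setProperty(TTL_PROPERTY, ttlSeconds(minutes));
    }

    public static void main(String[] args) {
        applyDnsTtl(120);
        // Read the value back to confirm the property was stored.
        System.out.println(TTL_PROPERTY + " = " + Security.getProperty(TTL_PROPERTY));
    }
}
```

As far as I know, the JVM reads this property once, when its address cache policy is first initialized, so it should be set before the first name lookup. That is why the proposal above puts the call in a static initializer.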


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


why is segslice so slow?

Posted by EM <em...@cpuedge.com>.
segslice usually processes 200-300 records/sec on my machine (which is top 
of the line and quite fast for everything else).
Is it just copying the segments minus the last part, or is some processing 
required for each record?

Any advice on how it can be optimized?

[jira] Updated: (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4

Posted by "Fuad Efendi (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-113?page=all ]

Fuad Efendi updated NUTCH-113:
------------------------------

    Component: fetcher
