Posted to user@nutch.apache.org by Eyeris Rodriguez Rueda <er...@uci.cu> on 2017/01/30 02:28:01 UTC

How to index response time for a URL?

Hi all.
I need to get and index the response time for each URL that Nutch crawls.
I have added a responseTime field in Solr for this value.

Is there any way to do this with configuration only, or do I need to write my own plugin to extract this value from the CrawlDatum metadata key "_rs_"?
Any help with the steps would be appreciated.

I have set the http.store.responsetime property to true. What am I missing?

This is the property in my nutch-site.xml:

<property>
  <name>http.store.responsetime</name>
  <value>true</value>
  <description>Enables us to record the response time of the 
  host which is the time period between start connection to end 
  connection of a pages host. The response time in milliseconds
  is stored in CrawlDb in CrawlDatum's meta data under key &quot;_rs_&quot;
  </description>
</property>

After that I added the key to db.parsemeta.to.crawldb, as shown here, but when I run parsechecker I don't see any data related to responseTime in the output.

<property>
  <name>db.parsemeta.to.crawldb</name>
  <value>&quot;_rs_&quot;</value>
  <description>Comma-separated list of parse metadata keys to transfer to the crawldb (NUTCH-779).
   Assuming for instance that the languageidentifier plugin is enabled, setting the value to 'lang' 
   will copy both the key 'lang' and its value to the corresponding entry in the crawldb.
  </description>
</property>
The @universidad_uci is Fidel. We young people will not fail.
#HastaSiempreComandante
#HastalaVictoriaSiempre


Re: [MASSMAIL] How to index response time for a URL?

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Can anybody help me with this, please?
Is this only happening to me?

