Posted to user@nutch.apache.org by Eyeris Rodriguez Rueda <er...@uci.cu> on 2017/01/30 02:28:01 UTC
How to index response time for a URL?
Hi all.
I need to capture and index the response time of each URL that Nutch crawls.
I have added a responseTime field in Solr for this value.
Is there a way to do this with configuration alone, or do I need to write my own plugin to extract this value from the CrawlDatum metadata key "_rs_"?
Any help with the steps would be appreciated.
I have set the http.store.responsetime property to true; what am I missing?
This is the property in my nutch-site.xml:
<property>
<name>http.store.responsetime</name>
<value>true</value>
<description>Enables us to record the response time of the
host, which is the time period between the start and the end of the
connection to a page's host. The response time in milliseconds
is stored in CrawlDb in CrawlDatum's meta data under key "_rs_"
</description>
</property>
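According to the property description, the value ends up in CrawlDatum metadata under "_rs_", so getting it into the responseTime Solr field presumably needs a step that copies it across at indexing time, whether from an existing plugin or a custom one. The following is a minimal, self-contained Java sketch of that data flow only. Plain Maps are hypothetical stand-ins for CrawlDatum metadata and the indexed document, and the class and method names are illustrative; this is not Nutch's IndexingFilter API or its actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the data flow: record a response time under "_rs_",
 * then copy it into a "responseTime" field. Plain maps stand in for
 * CrawlDatum metadata and the indexed document; this is NOT Nutch's API.
 */
public class ResponseTimeSketch {

    // Metadata key named in the http.store.responsetime description.
    public static final String RESPONSE_TIME_KEY = "_rs_";
    // Solr field the poster added.
    public static final String SOLR_FIELD = "responseTime";

    /** Modelled step 1: store elapsed milliseconds between start and end of a fetch. */
    public static void recordResponseTime(Map<String, String> datumMetadata,
                                          long startMillis, long endMillis) {
        datumMetadata.put(RESPONSE_TIME_KEY, Long.toString(endMillis - startMillis));
    }

    /** Modelled step 2 (hypothetical indexing filter): copy "_rs_" to "responseTime" if present. */
    public static Map<String, String> indexResponseTime(Map<String, String> datumMetadata,
                                                        Map<String, String> doc) {
        String rs = datumMetadata.get(RESPONSE_TIME_KEY);
        if (rs != null) {
            doc.put(SOLR_FIELD, rs);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        recordResponseTime(metadata, 1_000L, 1_250L); // a fetch that took 250 ms

        Map<String, String> doc = new HashMap<>();
        doc.put("url", "http://example.com/");
        indexResponseTime(metadata, doc);

        System.out.println(doc.get(SOLR_FIELD)); // prints 250
    }
}
```

In real Nutch code, CrawlDatum metadata is a Hadoop MapWritable, so keys and values would be Writable objects rather than the Strings used here.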
I have also added the following property, but when I run parsechecker I don't see any data related to responseTime in the output.
<property>
<name>db.parsemeta.to.crawldb</name>
<value>"_rs_"</value>
<description>Comma-separated list of parse metadata keys to transfer to the crawldb (NUTCH-779).
Assuming for instance that the languageidentifier plugin is enabled, setting the value to 'lang'
will copy both the key 'lang' and its value to the corresponding entry in the crawldb.
</description>
</property>
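One detail worth checking in the property above: XML does not strip quotation marks from element text, so <value>"_rs_"</value> configures the key "_rs_" with the quotes included, not the bare key _rs_. The Java sketch below illustrates the consequence, assuming the comma-separated list is split on commas and trimmed of whitespace, as the description suggests; the method name splitKeys is hypothetical, and Nutch's own parsing code is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (assumed behaviour): split a comma-separated config value on commas
 * and trim whitespace. Quotation marks are not removed, so a quoted key never
 * equals the bare key.
 */
public class ParseMetaKeysSketch {

    public static List<String> splitKeys(String value) {
        List<String> keys = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                keys.add(trimmed);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Text content of <value>"_rs_"</value>: the quotes are part of the value.
        List<String> quoted = splitKeys("\"_rs_\"");
        // Text content of <value>_rs_</value>.
        List<String> bare = splitKeys("_rs_");

        System.out.println(quoted);                   // prints ["_rs_"]
        System.out.println(quoted.contains("_rs_"));  // prints false
        System.out.println(bare.contains("_rs_"));    // prints true
    }
}
```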
@universidad_uci is Fidel. We young people will not fail.
#HastaSiempreComandante
#HastalaVictoriaSiempre
Re: [MASSMAIL] How to index response time for a URL?
Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Can anybody help me, please?
Is this happening only to me?