Posted to user@nutch.apache.org by "Ing. Eyeris Rodriguez Rueda" <er...@uci.cu> on 2012/05/11 19:55:47 UTC

Re: Indexing HTML metatags from Nutch into Solr

Hello, I am using the index-metatags plugin (I assume you have the index-metatags plugin in Nutch's plugins folder).
First, you need to include something like this in nutch-site.xml, in the plugin.includes value:
|index-(basic|anchor|metatags|more)|
You also need to list the metadata names that you want to index (in the same file):
<property>
	<name>metatags.names</name>
	<value>category;keywords;author;comments;description;subject;last_modified</value>
	<description>For plugin index-metatags: Indicate here the name of the
	html meta tag that should be
	parsed. Use a semicolon separated list if you want multiple
	tags, or use '*' to index all.
	Example: description;keywords;role
</description>
</property>
I only have these (category;keywords;author;comments;description;subject;last_modified).
After that, you have to configure your solrindex-mapping.xml like this:
<field dest="subject" source="subject" />
<field dest="description" source="description" />
<field dest="comments" source="comments" />
<field dest="author" source="author"/>
<field dest="keywords" source="keywords" />
<field dest="category" source="category" />
<field dest="lastModified" source="lastModified"/> 

I suggest cleaning your segments and Solr index and then reindexing.
I think that will solve your problem.

****************************************************************************************

----- Original message -----
From: "ML mail" <ml...@yahoo.com>
To: user@nutch.apache.org
Sent: Friday, May 11, 2012 6:40:36
Subject: Indexing HTML metatags from Nutch into Solr

Hello,

I am using Nutch 1.4 with Solr 3.6.0 and would like to get the HTML keywords and description metatags indexed into Solr. On the Nutch side I have followed http://wiki.apache.org/nutch/IndexMetatags to get Nutch parsing and extracting the metatags (using the index-metatags and parse-metatags plugins), but when I run solrindex they simply don't get indexed.

In Solr I am using the schema.xml provided by Nutch and have added the following fields for the metatags:
 
        <!-- fields for the metatags plugin -->
        <field name="metatag.description" type="text" stored="true" indexed="true"/>
        <field name="metatag.keywords" type="text" stored="true" indexed="true"/>

and have created a solrindex-mapping.xml file as follows:

<mapping>
<fields>
<field dest="description" source="metatag.description"/>
<field dest="keywords" source="metatag.keywords"/>
</fields>
</mapping>

The rest is pretty much a default install of Solr. So my question is: why can't I see the metatags indexed in Solr? Did I perhaps forget to configure something in Solr?

Any suggestions are welcome.

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

RE: Indexing HTML metatags from Nutch into Solr

Posted by "Ing. Eyeris Rodriguez Rueda" <er...@uci.cu>.
Hi ML, this is the configuration for the index-metatags plugin.

In your schema.xml (this file is the same in Solr and Nutch):
<field name="keywords" type="text" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
<field name="lastModified" type="date" stored="true" indexed="true"/>

In nutch-site.xml you need to put something like this.
Look at the name and value (not put):
<property>
    <name>metatags.names</name>
    <value>keywords;description;last_modified</value>
    <description>For plugin index-metatags: Indicate here the name of the
    html meta tag that should be
    parsed. Use a semicolon separated list if you want multiple
    tags, or use '*' to index all.
    Example: description;keywords;role
</description>
</property>

After that, you have to configure your solrindex-mapping.xml like this:
<field dest="description" source="description" /> 
<field dest="keywords" source="keywords" />
<field dest="lastModified" source="lastModified"/>
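
The three snippets above only work together if the names line up: every dest in solrindex-mapping.xml should be a field that schema.xml declares. A minimal sketch for checking that, assuming both files are plain XML without namespaces; the function names and the trimmed-down sample documents are made up for illustration, and dynamicField patterns are deliberately ignored:

```python
import xml.etree.ElementTree as ET


def schema_field_names(schema_xml):
    """Return the names of all <field> elements declared in a Solr schema."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def undeclared_destinations(mapping_xml, schema_xml):
    """List mapping 'dest' names that the schema does not declare."""
    known = schema_field_names(schema_xml)
    root = ET.fromstring(mapping_xml)
    return sorted(
        f.get("dest")
        for f in root.iter("field")
        if f.get("dest") and f.get("dest") not in known
    )


# Hypothetical, shortened stand-ins for schema.xml and solrindex-mapping.xml.
SCHEMA = """<schema><fields>
  <field name="keywords" type="text" indexed="true" stored="true"/>
  <field name="description" type="text" indexed="true" stored="true"/>
</fields></schema>"""

MAPPING = """<mapping><fields>
  <field dest="description" source="description"/>
  <field dest="keywords" source="keywords"/>
  <field dest="lastModified" source="lastModified"/>
</fields></mapping>"""

if __name__ == "__main__":
    # lastModified is mapped but missing from the shortened schema above.
    print(undeclared_destinations(MAPPING, SCHEMA))
```

With real files you would read their contents from disk and pass the text in; an empty list means every mapped destination has a matching field declaration.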



Re: Indexing HTML metatags from Nutch into Solr

Posted by ML mail <ml...@yahoo.com>.
I was wondering if adding the following 2 fields to Solr's schema file is enough:

<!-- fields for the metatags plugin -->
<field name="metatag.description" type="text" stored="true" indexed="true"/>
<field name="metatag.keywords" type="text" stored="true" indexed="true"/>


or is there anything else I need to configure/do on the Apache Solr side?

Regards





RE: Indexing HTML metatags from Nutch into Solr

Posted by "Ing. Eyeris Rodriguez Rueda" <er...@uci.cu>.
Hi.
I'm not sure I understand. If you need to see whether the metatags are
indexed, look at your Solr administration page:
http://localhost:8989/solr/admin/schema.jsp if you are using Jetty, or
http://localhost:8080/solr/admin/schema.jsp for a Tomcat deployment. On this
page you will see the fields supported by Solr. It is possible that they do
not contain any documents yet, because you need to reindex as I said.
Tell me if you understand.
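
Besides the schema page, one way to see whether a field actually holds documents is to ask Solr's select handler for everything that has a value in that field and read numFound from the JSON response. A rough sketch, assuming the select handler lives at /solr/select on the ports mentioned above; the helper names and the sample response are hypothetical, and nothing here contacts a server:

```python
import json
from urllib.parse import urlencode


def field_count_url(solr_base, field):
    """Build a select URL matching every document with a value in `field`."""
    params = {"q": f"{field}:[* TO *]", "rows": 0, "wt": "json"}
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract the hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


if __name__ == "__main__":
    # Base URL taken from the Jetty example above; adjust for Tomcat.
    print(field_count_url("http://localhost:8989/solr", "keywords"))

    # Made-up response body, only to show which part is being read.
    sample = '{"response": {"numFound": 0, "start": 0, "docs": []}}'
    print(num_found(sample))
```

A count of 0 for keywords or description after a fresh crawl and solrindex run would suggest the values never reached Solr, as opposed to the field merely being absent from the schema.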


_____________________________________________________________________
Ing. Eyeris Rodriguez Rueda
Phone: 837-3370
Universidad de las Ciencias Informáticas
_____________________________________________________________________


Re: Indexing HTML metatags from Nutch into Solr

Posted by ML mail <ml...@yahoo.com>.
I will then try deactivating the parse-metatags plugin.... 

Btw, do you or anyone else know exactly what modifications are required on the Apache Solr side to get the metatags working?

Regards





Re: Indexing HTML metatags from Nutch into Solr

Posted by "Ing. Eyeris Rodriguez Rueda" <er...@uci.cu>.
Hi.
I only have the index-metatags plugin in my nutch-site.xml and it works successfully. I also tried parse-metatags without positive results and in the end I don't use it.
Also make sure that your schema in Nutch is the same as in Solr.

If your index is not big, you can erase the folders of your Solr index and Nutch data:
Nutch (crawldb, linkdb, segments)
Solr (index, spellchecker).
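
The clean-up of the folders listed above could be scripted roughly like this. It is only a sketch: the helper name is invented, the folder names are taken from the list above (with "segments" assumed to be the actual directory name), and where those folders live depends on your install. It is probably wise to stop Solr before touching its data folder.

```python
import shutil
import tempfile
from pathlib import Path

# Folder names from the list above; their parent directories are install-specific.
NUTCH_DIRS = ("crawldb", "linkdb", "segments")
SOLR_DIRS = ("index", "spellchecker")


def remove_subdirs(parent, names):
    """Delete each named subdirectory of `parent` that exists; return those removed."""
    removed = []
    for name in names:
        target = Path(parent) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demonstration in a throwaway directory so nothing real is deleted.
    with tempfile.TemporaryDirectory() as tmp:
        crawl = Path(tmp) / "crawl"
        for name in NUTCH_DIRS:
            (crawl / name).mkdir(parents=True)
        print(remove_subdirs(crawl, NUTCH_DIRS))
```

For a real run you would call remove_subdirs once with your Nutch crawl directory and NUTCH_DIRS, and once with Solr's data directory and SOLR_DIRS, then crawl and run solrindex again.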





Re: Indexing HTML metatags from Nutch into Solr

Posted by ML mail <ml...@yahoo.com>.
Hi,

Actually I have already done all that, as I followed the Nutch Wiki for this purpose: http://wiki.apache.org/nutch/IndexMetatags

Now, your suggestion about cleaning my segments and Solr index and then re-indexing is a good idea. Could you help me with the commands to achieve these 3 steps?

Many thanks!


