Posted to user@nutch.apache.org by Vangelis karv <ka...@hotmail.com> on 2014/02/26 16:40:08 UTC
Parse Metatags 2.2.1
Hello again!
Does anybody know how to parse metatags in Nutch 2.2.1? I have found the patch https://issues.apache.org/jira/browse/NUTCH-1478 but it is not very clear. I use Eclipse and MySQL for my crawl loops.
Thank you in advance!
RE: Parse Metatags 2.2.1
Posted by Vangelis karv <ka...@hotmail.com>.
Hi Talat! Looks like you have put a lot of effort into this one!
Help us understand the patch:
1. The plugin.includes property:
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-domain|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)|urlnormalizer-(pass|regex|basic)|scoring-new</value>
  <description></description>
</property>
2. Can you tell us how to use these 4 new properties in nutch-site.xml?
<property>
  <name>index.parse.md</name>
  <value>description,keywords</value>
  <description></description>
</property>
<property>
  <name>index.content.md</name>
  <value></value>
  <description></description>
</property>
<property>
  <name>index.db.md</name>
  <value></value>
  <description></description>
</property>
<!-- parse-metatags plugin properties -->
<property>
  <name>description;keywords</name>
  <value>*</value>
  <description></description>
</property>
3. I read somewhere that we need to add
<field name="metatag.description" type="string" stored="true" indexed="true"/>
to schema.xml in both Solr and Nutch. Is that correct?
4. I want to see my chosen metatags in MySQL, because I find that more useful for my queries. Any ideas on how to implement this?
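As a quick sanity check on item 1, here is a minimal Python sketch (not part of the NUTCH-1478 patch) that reads the plugin.includes property quoted above and lists which plugin ids its regular expression would accept. It assumes a plugin is enabled only when its whole id matches the expression. The helper name `enabled_plugins` and the candidate list are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# The plugin.includes property exactly as quoted in item 1.
PROPERTY_XML = """
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-domain|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)|urlnormalizer-(pass|regex|basic)|scoring-new</value>
</property>
"""


def enabled_plugins(property_xml, candidate_ids):
    """Return the candidate plugin ids accepted by the plugin.includes regex.

    Assumption: an id counts as enabled only if the whole id matches the
    expression (full match, not a substring search).
    """
    prop = ET.fromstring(property_xml)
    if prop.findtext("name") != "plugin.includes":
        raise ValueError("expected a plugin.includes property")
    pattern = re.compile(prop.findtext("value").strip())
    return [pid for pid in candidate_ids if pattern.fullmatch(pid)]


# Hypothetical ids to check; replace with the names in your plugins/ directory.
CANDIDATES = [
    "parse-html",
    "parse-metatags",
    "index-metadata",
    "index-basic",
    "parse-js",
    "scoring-opic",
]

if __name__ == "__main__":
    for plugin_id in enabled_plugins(PROPERTY_XML, CANDIDATES):
        print(plugin_id)
```

With this value, parse-metatags and index-metadata are both accepted, so under the stated assumption the property would at least switch on the two plugins the metatag setup relies on. It says nothing about whether the remaining properties are named correctly.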
> Date: Fri, 28 Feb 2014 12:06:20 +0200
> Subject: Re: Parse Metatags 2.2.1
> From: talat@uyarer.com
> To: user@nutch.apache.org
>
> Hi Vangelis,
>
> I have updated the NUTCH-1478 issue. You can test it if you want :)
>
> Thanks
>
Re: Parse Metatags 2.2.1
Posted by Talat Uyarer <ta...@uyarer.com>.
Hi Vangelis,
I have updated the NUTCH-1478 issue. You can test it if you want :)
Thanks
2014-02-26 17:54 GMT+02:00 Vangelis karv <ka...@hotmail.com>:
> Thank you Talat! I also hope that you will help us understand your patch
> and attach a link to download it!
--
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
RE: Parse Metatags 2.2.1
Posted by Vangelis karv <ka...@hotmail.com>.
Thank you Talat! I also hope that you will help us understand your patch and attach a link to download it!
> Date: Wed, 26 Feb 2014 17:45:55 +0200
> Subject: Re: Parse Metatags 2.2.1
> From: talat@uyarer.com
> To: user@nutch.apache.org
>
> Hi Vangelis,
>
> I will update this issue tonight. You can use my latest patch after the update.
>
> Thanks.
Re: Parse Metatags 2.2.1
Posted by Talat Uyarer <ta...@uyarer.com>.
Hi Vangelis,
I will update this issue tonight. You can use my latest patch after the update.
Thanks.
2014-02-26 17:40 GMT+02:00 Vangelis karv <ka...@hotmail.com>:
> Hello again!
>
> Does anybody know how to parse metatags in Nutch 2.2.1? I have found the
> patch https://issues.apache.org/jira/browse/NUTCH-1478 but it is not very
> clear. I use Eclipse and MySQL for my crawl loops.
>
> Thank you in advance!
>