Posted to user@nutch.apache.org by blunderboy <sa...@gmail.com> on 2012/03/19 12:23:07 UTC

Re: Meta Tags

Hi Marek,
Thanks for sharing the useful info.
Actually I am facing the same issue which you have faced and solved :)
But after fetching the meta tag information from the HTML documents, I want to
store it in a MySQL database.
I don't want to create any index.

I am a beginner with Nutch, so I don't know much about it.
Can you please tell me where I should modify the code, or whether I can use some
kind of plugin functionality so that I need not recompile the code?

Also, can I extract other tag information using the above-mentioned
idea?

I am using Apache Nutch 1.4.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Meta-Tags-tp3598549p3838746.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Meta Tags

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,

Well I would urge you to have a look at NUTCH-978
https://issues.apache.org/jira/browse/NUTCH-978

This issue can probably be adapted to suit your needs. It's open as a
Google Summer of Code project if you are interested in applying. 'Most' of
the code is already there by the looks of the patch that Ammar has
attached; however, I think there is considerable updating and refining
to be done before it can even be considered for integration into
current development.

What do you think?

Lewis

On Wed, Mar 21, 2012 at 11:04 AM, blunderboy <sa...@gmail.com> wrote:

> Hi Lewis,
> After a lot of effort, I was finally able to index the meta tags in Solr.
> Thanks a lot. Is there any other plugin, like index-metatags, with which I
> can also extract div tags?
>
> Searching on Google gave me a hint to use the productdiv plugin.
> Is it the right option?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Meta-Tags-tp3598549p3845285.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
*Lewis*

Re: Meta Tags

Posted by blunderboy <sa...@gmail.com>.
Hi Lewis,
After a lot of effort, I was finally able to index the meta tags in Solr.
Thanks a lot. Is there any other plugin, like index-metatags, with which I
can also extract div tags?

Searching on Google gave me a hint to use the productdiv plugin.
Is it the right option?
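
Editorial sketch (not code from this thread, and not the productdiv plugin): extracting div content generally comes down to walking the parsed document tree and picking out the elements of interest. The class below is a minimal, standalone illustration of that walk using only the JDK's W3C DOM API. The class name DivTextCollector, the CSS class "product" and the XHTML parsing helper are all assumptions made for the example; a real Nutch parse filter would receive an already-parsed DOM from the HTML parser rather than parsing a string itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text of <div> elements with a given class.
final class DivTextCollector {

    /** Recursively visits the tree and records the trimmed text of matching divs. */
    static void collect(Node node, String cssClass, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            // getAttribute returns "" when the attribute is absent, so equals() is safe.
            if ("div".equalsIgnoreCase(el.getNodeName())
                    && cssClass.equals(el.getAttribute("class"))) {
                out.add(el.getTextContent().trim());
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, cssClass, out);
        }
    }

    /** Demo only: parses a well-formed XHTML string and collects matching divs. */
    static List<String> fromXhtml(String xhtml, String cssClass) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            collect(root, cssClass, out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input as XML", e);
        }
    }
}
```

Only exact class matches are collected here; real pages often carry several class names per element or are not well-formed XML, so treat this as a starting point rather than a working extractor.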

--
View this message in context: http://lucene.472066.n3.nabble.com/Meta-Tags-tp3598549p3845285.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Meta Tags

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,

CC'ing in dev@gora

This second part of your question is prime Gora territory.

On Tue, Mar 20, 2012 at 6:30 AM, blunderboy <sa...@gmail.com> wrote:

> Hi Lewis,
> Thanks for your solution. I have another idea.
> We fetch the meta tags and create a Solr index.
> Can we then write a separate Java program that uses the Lucene library
> to query the index and store the results in a MySQL database?
>

We have a pending patch [0] which would allow you to do a whole bunch of
analytics and processing of your Solr data. There are some problems, but
watch this space. Currently Gora supports an API which enables you to
persist your data in MySQL, HSQL, HBase or Cassandra and then run powerful
queries and processing jobs against the store; you could then persist the
results in MySQL. If you were going to write the Lucene code yourself, you
would probably be best looking into the Gora API.


> I know this does more work than is strictly required, but it suits
> my needs.
> So is it possible?
>

hth

Lewis
[0]  https://issues.apache.org/jira/browse/GORA-9



-- 
*Lewis*

Re: Meta Tags

Posted by blunderboy <sa...@gmail.com>.
Hi Lewis,
Thanks for your solution. I have another idea.
We fetch the meta tags and create a Solr index.
Can we then write a separate Java program that uses the Lucene library
to query the index and store the results in a MySQL database?
I know this does more work than is strictly required, but it suits
my needs.
So is it possible?
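
Editorial sketch (not code from this thread): one rough way to picture the "query the index, then store the result" idea. It swaps in Solr's HTTP select handler with CSV output (wt=csv, on Solr versions that ship the CSV response writer) for direct use of the Lucene library, and plain JDBC for the database side. The class name MetaTagExport, the table page_meta, its columns and the field names are all hypothetical. Fetching the URL and opening the MySQL connection are left to the caller.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names, table and fields are assumptions for illustration.
final class MetaTagExport {

    /** Builds a Solr select URL that asks for the given fields as CSV. */
    static String selectUrl(String solrBase, String query, List<String> fields, int rows) {
        return solrBase + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&fl=" + URLEncoder.encode(String.join(",", fields), StandardCharsets.UTF_8)
                + "&rows=" + rows + "&wt=csv";
    }

    /** Splits one CSV record, honouring double-quoted values and "" escapes. */
    static List<String> parseCsvLine(String line) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');   // escaped quote inside a quoted value
                    i++;
                } else if (c == '"') {
                    quoted = false;    // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                quoted = true;         // opening quote
            } else if (c == ',') {
                out.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());
        return out;
    }

    /** Inserts rows of (url, description, keywords) in a single JDBC batch. */
    static void store(Connection conn, List<List<String>> rows) throws SQLException {
        String sql = "INSERT INTO page_meta (url, description, keywords) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<String> row : rows) {
                for (int i = 0; i < 3; i++) {
                    ps.setString(i + 1, row.get(i)); // each row must hold three values
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

The CSV parser works one line at a time, so values containing line breaks would need a proper CSV library. Paging through a large index (the start parameter) is also left out.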

--
View this message in context: http://lucene.472066.n3.nabble.com/Meta-Tags-tp3598549p3841495.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Meta Tags

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,



On Mon, Mar 19, 2012 at 11:23 AM, blunderboy <sa...@gmail.com> wrote:

> But after fetching the meta tag information from the HTML documents, I want to
> store it in a MySQL database.
> I don't want to create any index.
>

You should use the Nutchgora branch, which you can check out from here:

http://svn.apache.org/repos/asf/nutch/branches/nutchgora/

There are some patches which need to be applied, so please google Nutchgora;
try Salmon Run for a recent series of blog posts on the topic. N.B. You need
to use the Gora 0.1.1-incubating release with MySQL.


>
> I am a beginner with Nutch, so I don't know much about it.
> Can you please tell me where I should modify the code, or whether I can use
> some kind of plugin functionality so that I need not recompile the code?
>

I'm afraid that in order to do what you wish you need to get clued up with
the Nutchgora branch and be able to edit code and recompile the project
with dependencies etc.

>
> Also, can I extract other tag information using the above-mentioned
> idea?
>

I don't see why not!!!

hth