Posted to dev@nutch.apache.org by Talat UYARER <ta...@agmlab.com> on 2013/10/22 13:34:40 UTC

About ParseMetadata

Hi,

While porting ATLANTBH's filter-xpath plugin, I came across a 
parse metadata object. I think it is carried over from 1.x. I did a little 
searching in 2.x and found it in HTMLParser.java: it is created, but it is 
not set anywhere. Can you explain whether we need it in 2.x, or can we 
clean up this code block? If it is unnecessary, why do we set these?

       metadata.set(Metadata.ORIGINAL_CHAR_ENCODING, encoding);
       metadata.set(Metadata.CHAR_ENCODING_FOR_CONVERSION, encoding);

Regards
Talat

Re: About ParseMetadata

Posted by Talat UYARER <ta...@agmlab.com>.
Hi Feng,

I think the same as you. For the XPath plugin I put the values in 
metadata, so OK, I will put these in metadata too.

Thanks for the information.

Talat

On 27-10-2013 16:00, feng lu wrote:
> Hi Talat
>
> Sorry for the delayed reply. Yes, the parseMeta is stored in the segment
> database. Many plugins use parseMeta, such as language-identifier,
> scoring-depth and index-meta; you can search for usages of the
> getParseMeta() method in the ParseData class.
>
> I found that in Nutch 2.x, contentMeta and parseMeta are both merged into
> WebPage, so I think we can put the metadata from HtmlParser.java into
> WebPage.


Re: About ParseMetadata

Posted by feng lu <am...@gmail.com>.
Hi Talat

Sorry for the delayed reply. Yes, the parseMeta is stored in the segment
database. Many plugins use parseMeta, such as language-identifier,
scoring-depth and index-meta; you can search for usages of the
getParseMeta() method in the ParseData class.

I found that in Nutch 2.x, contentMeta and parseMeta are both merged into
WebPage, so I think we can put the metadata from HtmlParser.java into
WebPage.
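
A minimal sketch of that idea, not the actual Nutch code: the PageStub
class below is a hypothetical stand-in for the 2.x WebPage, assumed here to
keep its metadata as a map from string keys to byte buffers, and the key
names are only illustrative. It shows parser-side values such as the
detected encoding being copied into the page so that later plugins can read
them back.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ParseMetaToPage {

  /** Hypothetical stand-in for the 2.x WebPage; only a metadata map is modelled. */
  static class PageStub {
    private final Map<String, ByteBuffer> metadata = new HashMap<>();

    void putMetadata(String key, String value) {
      metadata.put(key, ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
    }

    String getMetadata(String key) {
      ByteBuffer buf = metadata.get(key);
      if (buf == null) {
        return null;
      }
      // duplicate() so that reading does not move the stored buffer's position
      return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }
  }

  /** Copies every parser-side entry into the page's metadata. */
  static void copyParseMeta(Map<String, String> parseMeta, PageStub page) {
    for (Map.Entry<String, String> e : parseMeta.entrySet()) {
      page.putMetadata(e.getKey(), e.getValue());
    }
  }

  public static void main(String[] args) {
    Map<String, String> parseMeta = new HashMap<>();
    parseMeta.put("OriginalCharEncoding", "utf-8");      // illustrative key name
    parseMeta.put("CharEncodingForConversion", "utf-8"); // illustrative key name

    PageStub page = new PageStub();
    copyParseMeta(parseMeta, page);
    System.out.println(page.getMetadata("OriginalCharEncoding"));
  }
}
```

With this shape a plugin only needs the page to find the encoding, which
matches the suggestion of keeping everything in WebPage instead of a
separate parseMeta object.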





On Wed, Oct 23, 2013 at 6:24 PM, Talat UYARER <ta...@agmlab.com> wrote:

> Hi Feng lu,
>
> I am not familiar with 1.x. Can you give some information about when we
> need parseMeta in 1.x? Is it stored in the db?
>
> If it is necessary, I can develop it. But I should understand why we
> need it.
>
> Regards
> Talat


-- 
Don't Grow Old, Grow Up... :-)

Re: About ParseMetadata

Posted by Talat UYARER <ta...@agmlab.com>.
Hi Feng lu,

I am not familiar with 1.x. Can you give some information about when we 
need parseMeta in 1.x? Is it stored in the db?

If it is necessary, I can develop it. But I should understand why we 
need it.

Regards
Talat

On 22-10-2013 17:35, feng lu wrote:
>
> On Tue, Oct 22, 2013 at 7:34 PM, Talat UYARER <talat.uyarer@agmlab.com>
> wrote:
>
>     ORIGINAL_CHAR_ENCODING
>
>
> Yes, Nutch 2.x does not use parseMeta and contentMeta in the Parse
> object. One way is to clean up this code block, and another way is to add
> parseMeta to the Parse object; other parsers may then use this metadata.
> I am in favor of adding parseMeta to the Parse object.
>
> What do you think, Talat?
>
> Regards.
>
>
> --
> Don't Grow Old, Grow Up... :-)


Re: About ParseMetadata

Posted by feng lu <am...@gmail.com>.
On Tue, Oct 22, 2013 at 7:34 PM, Talat UYARER <ta...@agmlab.com> wrote:

> ORIGINAL_CHAR_ENCODING
>

Yes, Nutch 2.x does not use parseMeta and contentMeta in the Parse object.
One way is to clean up this code block, and another way is to add parseMeta
to the Parse object; other parsers may then use this metadata. I am in
favor of adding parseMeta to the Parse object.
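
To make the second option concrete, here is a rough sketch under stated
assumptions: ParseStub is a hypothetical stand-in, not the real 2.x Parse
class, and the method and key names are only illustrative. The parser
records the encoding on the parse result, and a later plugin reads it
without needing to know which parser produced it.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseWithMeta {

  /** Hypothetical stand-in for a Parse object that also carries parse metadata. */
  static final class ParseStub {
    private final String text;
    private final String title;
    private final Map<String, String> parseMeta = new LinkedHashMap<>();

    ParseStub(String text, String title) {
      this.text = text;
      this.title = title;
    }

    String getText() { return text; }

    String getTitle() { return title; }

    void setParseMeta(String key, String value) {
      parseMeta.put(key, value);
    }

    /** Read-only view, so that plugins cannot change what the parser recorded. */
    Map<String, String> getParseMeta() {
      return Collections.unmodifiableMap(parseMeta);
    }
  }

  /** Plays the role of the parser: records the encoding on the parse result. */
  static ParseStub parse(String text, String title, String encoding) {
    ParseStub p = new ParseStub(text, title);
    p.setParseMeta("OriginalCharEncoding", encoding);      // illustrative key name
    p.setParseMeta("CharEncodingForConversion", encoding); // illustrative key name
    return p;
  }

  /** Plays the role of a later plugin that only needs the metadata. */
  static String encodingOf(ParseStub parse) {
    return parse.getParseMeta().getOrDefault("OriginalCharEncoding", "unknown");
  }

  public static void main(String[] args) {
    ParseStub p = parse("page body", "page title", "windows-1254");
    System.out.println(encodingOf(p));
  }
}
```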

What do you think, Talat?

Regards.


-- 
Don't Grow Old, Grow Up... :-)