Posted to user@nutch.apache.org by abhayd <aj...@hotmail.com> on 2012/02/03 01:10:43 UTC

index-blacklist-whitelist plugin for multiple sets of URLs

Hi,
I have been using the index-blacklist-whitelist plugin and it works really well.

Recently, though, I wanted to use it to extract text based on a URL pattern.

For example:
for http://x.y.z/?id=12, the whitelist should only look at the div with id=12
for http://x.y.z/?id=13, the whitelist should only look at the div with id=13

Is this doable with this plugin?

I am using a nutch-site.xml file and wasn't sure whether I can add something to that
config to enable this.


Any help?
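The mapping described above can be sketched in Java, the language Nutch plugins are written in. This minimal sketch assumes the div to keep is named directly by the URL's `id` query parameter. `WhitelistIdFromUrl` and `whitelistIdFor` are hypothetical names, not part of the index-blacklist-whitelist plugin or of Nutch, and the sketch only computes the id; it does not filter any page content. It needs Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/**
 * Hypothetical helper (not part of the index-blacklist-whitelist plugin):
 * derives the id of the <div> to whitelist from a page URL's "id" query
 * parameter, e.g. http://x.y.z/?id=12 -> "12".
 */
final class WhitelistIdFromUrl {

    static Optional<String> whitelistIdFor(String pageUrl) {
        String query;
        try {
            query = new URI(pageUrl).getRawQuery();
        } catch (URISyntaxException e) {
            return Optional.empty(); // malformed URL: no per-URL whitelist id
        }
        if (query == null) {
            return Optional.empty(); // URL has no query string at all
        }
        // Scan "name=value" pairs and return the first non-empty "id" value.
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // bare flag without a value, e.g. "?debug"
            }
            String name = pair.substring(0, eq);
            if (name.equals("id")) {
                String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

For example, `whitelistIdFor("http://x.y.z/?id=13")` yields `Optional[13]`, while `http://x.y.z/` (no query) yields an empty `Optional`, which a modified plugin could treat as "use the normal, static whitelist".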


--
View this message in context: http://lucene.472066.n3.nabble.com/index-blacklist-whitelist-pluign-for-multiple-set-of-urls-tp3711697p3711697.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: index-blacklist-whitelist plugin for multiple sets of URLs

Posted by abhayd <aj...@hotmail.com>.
Thanks, Elisabeth. I updated the Jira issue:

https://issues.apache.org/jira/browse/NUTCH-585?focusedCommentId=13199746#comment-13199746

I hope you will be able to add this new feature to this nice plugin soon.


Re: index-blacklist-whitelist plugin for multiple sets of URLs

Posted by Elisabeth Adler <el...@gmail.com>.
Hi,
Out of the box, the index-blacklist-whitelist plugin can't extract text
based on a URL pattern. You would have to modify the plugin to suit your needs.
I think this is a nice feature, so you might want to add the use
case to the plugin's Jira item.
Best,
Elisabeth
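Since the plugin would have to be modified, one possible shape for that change is an ordered list of URL-regex rules, each naming the element id to whitelist, with `$1`, `$2`, ... in the id filled from the regex's capture groups. This is a hypothetical Java sketch, not the plugin's actual code or configuration: `UrlWhitelistRules`, `add`, and `resolve` are illustrative names, and how such rules would be read from nutch-site.xml and wired into the plugin's whitelist filtering is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical rule table (not part of the plugin): URL regex -> id of the
 * element to whitelist. Rules are tried in order; the first full match wins.
 */
final class UrlWhitelistRules {

    private static final class Rule {
        final Pattern urlPattern;
        final String idTemplate; // may contain $1, $2, ... referring to capture groups

        Rule(Pattern urlPattern, String idTemplate) {
            this.urlPattern = urlPattern;
            this.idTemplate = idTemplate;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule; returns this so calls can be chained. */
    UrlWhitelistRules add(String urlRegex, String idTemplate) {
        rules.add(new Rule(Pattern.compile(urlRegex), idTemplate));
        return this;
    }

    /** Returns the whitelist id from the first rule whose regex matches the whole URL. */
    Optional<String> resolve(String url) {
        for (Rule rule : rules) {
            Matcher m = rule.urlPattern.matcher(url);
            if (!m.matches()) {
                continue;
            }
            String id = rule.idTemplate;
            // Substitute from the highest group down so "$1" does not clobber "$10".
            for (int g = m.groupCount(); g >= 1; g--) {
                String value = m.group(g);
                id = id.replace("$" + g, value == null ? "" : value);
            }
            return Optional.of(id);
        }
        return Optional.empty(); // no rule matched
    }
}
```

For example, after `add("http://x\\.y\\.z/\\?id=(\\d+)", "$1")`, the URL http://x.y.z/?id=13 resolves to `13`. URLs that match no rule return an empty `Optional`, so a modified plugin could fall back to its existing static whitelist. Using an ordered, first-match-wins list keeps the behavior predictable when a specific pattern and a catch-all pattern both apply.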