Posted to java-user@lucene.apache.org by Sahi <sa...@hotmail.com> on 2009/09/02 06:09:15 UTC

Deletion of words in articles of Wikipedia

Hi,

I'm new to this site. My question is:

Articles in Wikipedia can be edited by anyone and may or may not be
accurate. If one contributor writes an article and another contributor
then deletes some of its content, that would indicate that the article is
controversial.
I need to get started on a project that ranks articles by how
controversial they are. Could anyone kindly help me with how to start?

Thanks
-- 
View this message in context: http://www.nabble.com/Deletion-of-words-in-articles-of-Wikipedia-tp25251378p25251378.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Deletion of words in articles of Wikipedia

Posted by mark harwood <ma...@yahoo.co.uk>.
>>I need to start off with this project where we can find the ranking of
>>controversial articles. Could anyone kindly help me how to start?

Check out the Wikipedia "logging" dumps, which contain the reasons for actions on page titles (including IP blocks and deletes) but without the bulk of the full-text changes.
e.g. http://download.wikimedia.org/enwiki/20090827/enwiki-20090827-pages-logging.xml.gz

Once you have this indexed in Lucene, "Luke" can help you explore the index and pinpoint the key target pages for vandalism.
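
Before indexing anything, a rough first pass over the logging dump might look like the sketch below. It streams the XML and counts log entries per page title, then sorts the titles by that count. Several things here are assumptions rather than facts from the dump documentation: the element names <logitem>, <type> and <logtitle>, the class and method names (LogActionRanker, countByTitle, rank), and the choice of the "delete" and "protect" log types as a controversy signal. Note also that these log entries record page-level actions, not words deleted inside an article, so this is only a proxy for the ranking asked about.

```java
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.zip.GZIPInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class LogActionRanker {

    /**
     * Streams the log XML and counts entries per page title, keeping only
     * entries whose log type is in wantedTypes. The element names used here
     * (logitem, type, logtitle) are assumed, not taken from a schema.
     */
    static Map<String, Integer> countByTitle(Reader xml, Set<String> wantedTypes)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        Map<String, Integer> counts = new HashMap<>();
        String type = null;
        String title = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("logitem")) {
                    // New log entry: forget values from the previous one.
                    type = null;
                    title = null;
                } else if (name.equals("type")) {
                    type = r.getElementText();
                } else if (name.equals("logtitle")) {
                    title = r.getElementText();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("logitem")) {
                if (type != null && title != null && wantedTypes.contains(type)) {
                    counts.merge(title, 1, Integer::sum);
                }
            }
        }
        r.close();
        return counts;
    }

    /** Orders titles by descending count; ties are broken alphabetically. */
    static List<Map.Entry<String, Integer>> rank(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a locally downloaded pages-logging.xml.gz file.
        try (Reader in = new InputStreamReader(
                new GZIPInputStream(new FileInputStream(args[0])),
                StandardCharsets.UTF_8)) {
            // "delete" and "protect" are an assumed choice of signal.
            List<Map.Entry<String, Integer>> ranked =
                    rank(countByTitle(in, Set.of("delete", "protect")));
            for (Map.Entry<String, Integer> e : ranked.subList(0, Math.min(20, ranked.size()))) {
                System.out.println(e.getValue() + "\t" + e.getKey());
            }
        }
    }
}
```

Run it with the path to the downloaded .gz file as the only argument. The same per-title counts could later be stored as a field alongside each title when indexing, so that the ranking can be browsed in Luke.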


Cheers,
Mark






      
