
Posted to user@spark.apache.org by daidong <da...@gmail.com> on 2014/09/22 22:37:47 UTC

The wikipedia Extraction (WEX) Dataset

I watched several presentations from AMP Camp 2013. Many of the Spark
examples extract information from the Wikipedia Extraction (WEX) dataset
in TSV format (around 66 GB). It used to be provided as a public data set
on Amazon EBS, but it seems to have disappeared.

I would really like to use these examples in our class to introduce Spark to
students. Could anybody tell me where I can find this data set, or share it
with me (if appropriate)? Recommendations for other similar data sets
would also be great!

Thanks,

- Dong 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-wikipedia-Extraction-WEX-Dataset-tp14844.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: The wikipedia Extraction (WEX) Dataset

Posted by daidong <da...@gmail.com>.

Really sorry to bother everybody; it was my mistake. The data set is still
on Amazon and can be downloaded. The reason for my failure was that I
started an instance outside the U.S., so I could not attach the EBS volume.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-wikipedia-Extraction-WEX-Dataset-tp14844p14854.html
