Posted to user@storm.apache.org by Siva Jagadeesan <si...@gmail.com> on 2015/05/27 23:35:13 UTC

SF / East Bay Area Stream Processing Meetup next Thursday (6/4)

http://www.meetup.com/Bay-Area-Stream-Processing/events/219086133/

Thursday, June 4, 2015

6:45 PM
TubeMogul
<http://maps.google.com/maps?f=q&hl=en&q=1250+53rd%2C+Emeryville%2C+CA%2C+94608%2C+us>

1250 53rd St #1
Emeryville, CA

6:45PM to 7:00PM - Socializing

7:00PM to 8:00PM - Talks

8:00PM to 8:30PM - Socializing

Speaker:

*Bill Zhao (from TubeMogul)*

Bill worked as a researcher in the UC Berkeley AMPLab during the creation
of Spark and Tachyon, where he focused on improving Spark memory
utilization and Spark-Tachyon integration.  Working at the intersection of
three massive trends (powerful machine learning, cloud computing, and
crowdsourcing), the AMPLab is integrating Algorithms, Machines, and People
to make sense of Big Data.

Topic:

*Introduction to Spark and Tachyon*

Description:

Spark is a fast and general processing engine compatible with Hadoop data.
It can run in Hadoop clusters through YARN or in Spark's standalone mode,
and it can process data in HDFS and other Hadoop-supported storage systems.
It is designed to perform both batch processing (similar to MapReduce) and
newer workloads such as streaming, interactive queries, and machine
learning.  Tachyon is a memory-centric distributed
storage system enabling reliable data sharing at memory-speed across
cluster frameworks, such as Spark and MapReduce.  It achieves high
performance by leveraging lineage information and using memory
aggressively. Tachyon caches working set files in memory, so frequently
read datasets do not have to be loaded from disk. This lets different jobs,
queries, and frameworks access cached files at memory speed.