Posted to commits@storm.apache.org by bo...@apache.org on 2016/03/16 22:01:13 UTC

svn commit: r1735297 [3/6] - in /storm/branches/bobby-versioned-site: ./ _includes/ release/ releases/0.10.0/ releases/0.9.6/

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Powered-By.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Powered-By.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Powered-By.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Powered-By.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,1040 @@
+---
+title: Companies Using Apache Storm
+layout: documentation
+documentation: true
+---
+Want to be added to this page? Send an email [here](mailto:dev@storm.apache.org).
+
+<table class="table table-striped">
+
+<tr>
+<td>
+<a href="http://groupon.com">Groupon</a>
+</td>
+<td>
+<p>
+At Groupon we use Storm to build real-time data integration systems. Storm helps us analyze, clean, normalize, and resolve large amounts of non-unique data points with low latency and high throughput.
+</p>
+</td>
+</tr>
+
+<tr>
+<td><a href="http://www.weather.com/">The Weather Channel</a></td>
+<td>
+<p>At Weather Channel we use several Storm topologies to ingest and persist weather data. Each topology is responsible for fetching one dataset from an internal or external network (the Internet), reshaping the records for use by our company, and persisting the records to relational databases. It is particularly useful to have an automatic mechanism for repeating attempts to download and manipulate the data when there is a hiccup.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.fullcontact.com/">FullContact</a>
+</td>
+<td>
+<p>
+At FullContact we currently use Storm as the backbone of the system which synchronizes our Cloud Address Book with third party services such as Google Contacts and Salesforce. We also use it to provide real-time support for our contact graph analysis and federated contact search systems.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://twitter.com">Twitter</a>
+</td>
+<td>
+<p>
+Storm powers a wide variety of Twitter systems, with applications ranging from discovery and realtime analytics to personalization, search, and revenue optimization. Storm integrates with the rest of Twitter's infrastructure, including database systems (Cassandra, Memcached, etc.), the messaging infrastructure, Mesos, and the monitoring/alerting systems. Storm's isolation scheduler makes it easy to use the same cluster both for production applications and in-development applications, and it provides a sane way to do capacity planning.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.com">Yahoo!</a>
+</td>
+<td>
+<p>
+Yahoo! is developing a next generation platform that enables the convergence of big-data and low-latency processing. While Hadoop is our primary technology for batch processing, Storm empowers stream/micro-batch processing of user events, content feeds, and application logs. 
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yahoo.co.jp/">Yahoo! JAPAN</a>
+</td>
+<td>
+<p>
+Yahoo! JAPAN is a leading web portal in Japan. Our Storm applications process various streaming data, such as logs and social data. We use Storm to feed content, monitor systems, detect trending topics, and crawl websites.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.webmd.com">WebMD</a>
+</td>
+<td>
+<p>
+We use Storm to power our Medscape Medpulse mobile application, which allows medical professionals to follow important medical trends with Medscape's curated Today on Twitter feed and selection of blogs. A Storm topology captures and processes tweets with the Twitter streaming API, enriches them with metadata and images, performs real-time NLP, and executes several business rules. Storm also monitors a selection of blogs in order to give our customers real-time updates.  We also use Storm for internal data pipelines to do ETL and for our internal marketing platform, where time and freshness are essential.
+</p>
+<p>
+We also use Storm to power our search indexing process.  We continue to discover new use cases for Storm, and it has become one of the core components of our technology stack.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.spotify.com">Spotify</a>
+</td>
+<td>
+<p>
+Spotify serves streaming music to over 10 million subscribers and 40 million active users. Storm powers a wide range of real-time features at Spotify, including music recommendation, monitoring, analytics, and ads targeting. Together with Kafka, memcached, Cassandra, and netty-zmtp based messaging, Storm enables us to build low-latency fault-tolerant distributed systems with ease.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.infochimps.com">Infochimps</a>
+</td>
+<td>
+<p>
+Infochimps uses Storm as part of its Big Data Enterprise Cloud. Specifically, it uses Storm as the basis for one of its three cloud data services: Data Delivery Services (DDS), which uses Storm to provide a fault-tolerant and linearly scalable enterprise data collection, transport, and complex in-stream processing cloud service. 
+</p>
+
+<p>
+In much the same way that Hadoop provides batch ETL and large-scale batch analytical processing, the Data Delivery Service provides real-time ETL and large-scale real-time analytical processing — the perfect complement to Hadoop (or in some cases, what you needed instead of Hadoop).
+</p>
+
+<p>
+DDS uses both Storm and Kafka along with a host of additional technologies to provide an enterprise-class real-time stream processing solution with features including:
+</p>
+
+<ul>
+<li>
+Integration connections to a wide variety of data sources in a way that is robust yet as non-invasive as possible
+</li>
+<li>
+Optimizations for highly scalable, reliable data import and distributed ETL (extract, transform, load), fulfilling data transport needs
+</li>
+<li>
+Developer tools for rapid development of decorators, which perform the real-time stream processing
+</li>
+<li>
+Guaranteed delivery framework and data failover snapshots to send processed data to analytics systems, databases, file systems, and applications with extreme reliability
+</li>
+<li>
+Rapid solution development and deployment, along with our expert Big Data methodology and best practices
+</li>
+</ul>
+
+<p>Infochimps has extensive experience in deploying its DDS to power large-scale clickstream web data flows, massive Twitter stream processes, Foursquare event processing, customer purchase data, product pricing data, and more.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://healthmarketscience.com/">Health Market Science</a>
+</td>
+<td>
+<p>
+Health Market Science (HMS) provides data management as a service for the healthcare industry.  Storm is at the core of the HMS big data platform, functioning as the data ingestion mechanism that orchestrates the data flow across multiple persistence mechanisms, allowing HMS to deliver Master Data Management (MDM) and analytics capabilities for a wide range of healthcare needs: compliance, integrity, data quality, and operational decision support.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://www.verisigninc.com/">Verisign</a>
+</td>
+<td>
+<p>
+Verisign, a global leader in domain names and Internet security, enables Internet navigation for many of the world's most recognized domain names and provides protection for enterprises around the world.  Ensuring the security, stability, and resiliency of key Internet infrastructure and services, including the .COM and .NET top level domains and two of the Internet's DNS root servers, is at the heart of Verisign’s mission.  Storm is a component of our data analytics stack that powers a variety of real-time applications.  One example is security monitoring where we are leveraging Storm to analyze the network telemetry data of our globally distributed infrastructure in order to detect and mitigate cyber attacks.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://cerner.com/">Cerner</a>
+</td>
+<td>
+<p>
+Cerner is a leader in health care information technology. We have been using Storm since its release to process massive amounts of clinical data in real-time. Storm integrates well in our architecture, allowing us to quickly provide clinicians with the data they need to make medical decisions.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.aeris.com/">Aeris Communications</a>
+</td>
+<td>
+<p>
+Aeris Communications has the only cellular network that was designed and built exclusively for machines. Our ability to provide scalable, reliable real-time analytics - powered by Storm - for machine-to-machine (M2M) communication offers immense value to our customers. We have been using Storm in production since Q1 2013.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://flipboard.com/">Flipboard</a>
+</td>
+<td>
+<p>
+Flipboard is the world's first social magazine, a single place to keep up with everything you care about and collect it in ways that reflect you. Inspired by the beauty and ease of print media, Flipboard is designed so you can easily flip through news from around the world or stories from right at home, helping people find the one thing that can inform, entertain or even inspire them every day.
+</p>
+<p>
+We are using Storm across a wide range of our services, from content search to realtime analytics to generating custom magazine feeds. We integrate Storm with our infrastructure through systems like ElasticSearch, HBase, Hadoop, and HDFS to create a highly scalable data platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.rubiconproject.com/">Rubicon Project</a>
+</td>
+<td>
+<p>
+Storm is being used in production mode at the Rubicon Project to analyze the results of auctions of ad impressions on its RTB exchange as they occur.  It is currently processing around 650 million auction results in three data centers daily (with 3 separate Storm clusters). One simple application is identifying new creatives (ads) in real time for ad quality purposes.  A more sophisticated application is an "Inventory Valuation Service" that uses DRPC to return appraisals of new impressions before the auction takes place.  The appraisals are used for various optimization problems, such as deciding whether to auction an impression or skip it when close to maximum capacity.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ooyala.com/">Ooyala</a>
+</td>
+<td>
+<p>
+Ooyala powers personalized multi-screen video experiences for some of the world's largest networks, brands and media companies. We provide all the technology and tools our customers need to manage, distribute and monetize digital video content at a global scale.
+</p>
+
+<p>
+At the core of our technology is an analytics engine that processes over two billion analytics events each day, derived from nearly 200 million viewers worldwide who watch video on an Ooyala-powered player.
+</p>
+
+<p>
+Ooyala will be deploying Storm in production to give our customers real-time streaming analytics on consumer viewing behavior and digital content trends. Storm enables us to rapidly mine one of the world's largest online video data sets to deliver up-to-the-minute business intelligence ranging from real-time viewing patterns to personalized content recommendations to dynamic programming guides and dozens of other insights for maximizing revenue with online video.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.taobao.com/index_global.php">Taobao</a>
+</td>
+<td>
+<p>
+We use Storm to compute statistics over logs and extract useful information from them in near real-time.  Logs are read from Kafka-like persistent message queues into spouts, then processed and emitted through the topologies to compute desired results, which are stored in distributed databases for use elsewhere. Daily input volume varies from 2 million to 1.5 billion log entries across projects, up to 2 terabytes in size.  The main challenge here is not only real-time processing of a big data set; storing and persisting the results is also a challenge and needs careful design and implementation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.alibaba.com/">Alibaba</a>
+</td>
+<td>
+<p>
+Alibaba is the leading B2B e-commerce website in the world. We use Storm to process application logs and database change data to supply realtime stats for data apps.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://iQIYI.COM">iQIYI</a>
+</td>
+<td>
+<p>
+iQIYI is China's largest online video platform. We use Storm in our video advertising system, video recommendation system, log analysis system, and many other scenarios. We have several standalone Storm clusters, as well as Storm clusters on Mesos and on YARN. Kafka-Storm and Storm-HBase integrations are quite common in our production environment. We have great interest in new developments around integrating Storm with other systems, like HBase, HDFS, and Kafka.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.baidu.com/">Baidu</a>
+</td>
+<td>
+<p>
+Baidu offers top search technology services for websites, audio files, and images. My group uses Storm to process search logs to supply realtime stats, such as page views and response times.
+This project helps Ops determine and monitor service status, and it can do great things in the future.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.yelp.com/">Yelp</a>
+</td>
+<td>
+<p>
+Yelp is using Storm with <a href="http://pyleus.org/">Pyleus</a> to build a platform for developers to consume and process high throughput streams of data in real time. We have ongoing projects to use Storm and Pyleus for overhauling our internal application metrics pipeline, building an automated Python profile analysis system, and for general ETL operations. As its support for non-JVM components matures, we hope to make Storm the standard way of processing streaming data at Yelp.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.klout.com/">Klout</a>
+</td>
+<td>
+<p>
+Klout helps everyone discover and be recognized for their influence by analyzing engagement with their content across social networks. Our analysis powers a daily Klout Score on a scale from 1-100 that shows how much influence social media users have and on what topics. We are using Storm to develop a realtime scoring and moments-generation pipeline. Leveraging Storm's intuitive Trident abstraction, we are able to create complex topologies that stream data from our network collectors via Kafka, process it, and write it out to HDFS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.loggly.com">Loggly</a>
+</td>
+<td>
+<p>
+Loggly is the world's most popular cloud-based log management service. It helps DevOps and technical teams make sense of the massive quantity of logs produced by a growing number of cloud-centric applications, in order to solve operational problems faster. Storm is the heart of our ingestion pipeline, where it filters, parses, and analyzes billions of log events all day, every day, in real time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://premise.is/">premise.is</a>
+</td>
+<td>
+<p>
+We're building a platform for alternative, bottom-up, high-granularity econometric data capture, particularly targeting opaque developing economies (e.g., Argentina might lie about its inflation statistics, but its black market certainly doesn't). Basically we get to funnel hedge fund money into improving global economic transparency. 
+</p>
+<p>
+We've been using Storm in production since January 2012 as a streaming, time-indexed web crawl + extraction + machine learning-based semantic markup flow (about 60 physical nodes comparable to m1.large; generating a modest 25GB/hr incremental). We wanted to have an end-to-end push-based system where new inputs get percolated through the topology in realtime and appear on the website, with no batch jobs required in between steps. Storm has been really integral to realizing this goal.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="http://www.wego.com/">Wego</a>
+</td>
+<td>
+<p>Wego is one of the world's most comprehensive travel metasearch engines, operating in 42 markets worldwide and used by millions of travelers to save time, pay less and travel more. We compare and display real-time flight and hotel pricing and availability from hundreds of leading travel sites from all around the world on one simple screen.</p>
+
+<p>At the heart of our products, Storm helps us stream real-time metasearch data from our partners to end users. Since data comes from many sources and with different timing, Storm's topology concept naturally solves concurrency issues while helping us continuously merge, slice, and clean all the data. Additionally, with a few tricks and tools provided in Storm, we can easily apply incremental updates to improve the flow of our data (1-5GB/minute).</p>
+ 
+<p>With its simplicity, scalability, and flexibility, Storm not only improves our current products but, more importantly, changes the way we work with data. Instead of keeping data static and crunching it once in a while, we constantly move data around, making use of different technologies, evaluating new ideas, and building new products. We stream critical data to memory for fast access while continuously crunching and directing huge amounts of data into various engines so that we can evaluate and make use of it instantly. Previously, this kind of system required setting up and maintaining quite a few components, but with Storm all we need is half a day of coding and a few seconds to deploy. In this sense, we never think of Storm as merely serving our products, but rather as evolving them.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://rocketfuel.com/">RocketFuel</a>
+</td>
+<td>
+<p>
+At Rocket Fuel (an ad network) we are building a real-time platform on top of Storm which mirrors the time-critical workflows of our existing Hadoop-based ETL pipeline. This platform tracks impressions, clicks, conversions, bid requests, etc. in real time. We are using Kafka as the message queue. To start with, we are pushing per-minute aggregations directly to MySQL, but we plan to go finer than one minute and may bring HBase into the picture to handle the increased write load. 
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://quicklizard.com/">QuickLizard</a>
+</td>
+<td>
+<p>
+QuickLizard builds automated pricing solutions for companies that have many products in their catalogs. Prices are influenced by multiple factors internal and external to the company.
+</p>
+
+<p>
+Currently we use Storm to choose products that need to be priced. We receive a real-time stream of events from client sites and filter it down to a much lighter stream of products, which our procedures then process to produce price recommendations.
+</p>
+
+<p>
+We also plan to use Storm for real-time data-mining model calculation, matching products described on competitor sites to client products.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://spider.io/">spider.io</a>
+</td>
+<td>
+<p>
+At spider.io we've been using Storm as a core part of our classification engine since October 2011. We run Storm topologies to combine, analyse and classify real-time streams of internet traffic, to identify suspicious or undesirable website activity. Over the past 7 months we've expanded our use of Storm, so it now manages most of our real-time processing. Our classifications are displayed in a custom analytics dashboard, where Storm's distributed remote procedure call interface is used to gather data from our database and metadata services. DRPC allows us to increase the responsiveness of our user interface by distributing processing across a cluster of Amazon EC2 instances.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://8digits.com/">8digits</a>
+</td>
+<td>
+<p>
+At 8digits, we are using Storm in our analytics engine, which is one of the most crucial parts of our infrastructure. We are utilizing several multi-core cloud servers to run a real-time system making several complex calculations. Storm is a proven, solid, and powerful framework for most big-data problems.
+</p>
+</td>
+</tr>
+
+
+
+<tr>
+<td>
+<a href="https://www.alipay.com/">Alipay</a>
+</td>
+<td>
+<p>
+Alipay is China's leading third-party online payment platform. We are using Storm in many scenarios:
+</p>
+
+<ol>
+<li>
+Calculating realtime trade quantity, trade amount, top-N seller trading information, and user registration counts: more than 100 million messages per day.
+</li>
+<li>
+Log processing: more than 6 TB of data per day.
+</li>
+</ol>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://navisite.com/">NaviSite</a>
+</td>
+<td>
+<p>
+We are using Storm as part of our server event log monitoring/auditing system.  We send log messages from thousands of servers into a RabbitMQ cluster and then use Storm to check each message against a set of regular expressions.  If there is a match (&lt; 1% of messages), then the message is sent to a bolt that stores data in a Mongo database.  Right now we are handling a load of somewhere around 5-10k messages per second, however we tested our existing RabbitMQ + Storm clusters up to about 50k per second.  We have plans to do real time intrusion detection as an enhancement to the current log message reporting system. 
+</p>
+
+<p>
+We have Storm deployed on the NaviSite Cloud platform.  We have a ZK cluster of 3 small VMs, 1 Nimbus VM and 16 dual core/4GB VMs as supervisors.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.paywithglyph.com">Glyph</a>
+</td>
+<td>
+<p>
+Glyph is in the business of providing credit card rewards intelligence to consumers. At the point of sale, Glyph suggests to its users the best cards to use at a given merchant location to earn maximum rewards. Glyph also suggests which cards a user should carry to earn maximum rewards based on their personal spending habits. Glyph provides this information by retrieving and analyzing credit card transactions from banks. Storm is used in Glyph to perform this retrieval and analysis in realtime. We are using Memcached in conjunction with Storm for handling sessions. We are impressed by how Storm makes high availability and reliability of Glyph services possible. We are now using Storm and Clojure in building Glyph's data analytics and insights services. We have open-sourced the node-drpc wrapper module for easy Storm DRPC integration with NodeJS.
+</p>
+</td>
+</tr>
+<tr>
+<td>
+<a href="http://heartbyte.com/">Heartbyte</a>
+</td>
+<td>
+<p>
+At Heartbyte, Storm is a central piece of our realtime audience participation platform.  We are often required to process a 'vote' per second from hundreds of thousands of mobile devices simultaneously and process / aggregate all of the data within a second.  Further, we are finding that Storm is a great alternative to other ingest tools for Hadoop/HBase, which we use for batch processing after our events conclude.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://2lemetry.com/">2lemetry</a>
+</td>
+<td>
+<p>
+2lemetry uses Storm to power its real-time analytics on top of the m2m.io offering. 2lemetry is partnered with Sprint, Verizon, AT&T, and Arrow Electronics to power IoT applications worldwide. Some of 2lemetry's larger projects include RTX, Kontron, and Intel. 2lemetry also works with many professional sporting teams to parse data in real time. 2lemetry receives events for every touch of the ball in every MLS soccer match; Storm is used to look for trends, like passing tendencies, as they develop during the game. 
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.nodeable.com/">Nodeable</a>
+</td>
+<td>
+<p>
+Nodeable uses Storm to deliver real-time continuous computation of the data we consume. Storm has made it significantly easier for us to scale our service more efficiently while ensuring the data we deliver is timely and accurate.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://twitsprout.com/">TwitSprout</a>
+</td>
+<td>
+<p>
+At TwitSprout, we use Storm to analyze activity on Twitter to monitor mentions of keywords (mostly client product and brand names) and trigger alerts when activity around a certain keyword spikes above normal levels. We also use Storm to back the data behind the live-infographics we produce for events sponsored by our clients. The infographics are usually in the form of a live dashboard that helps measure the audience buzz across social media as it relates to the event in realtime.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.happyelements.com/">HappyElements</a>
+</td>
+<td>
+<p>
+<a href="http://www.happyelements.com">HappyElements</a> is a leading social game developer on Facebook and other SNS platforms. We developed a real-time data analysis program based on Storm to analyze user activity in real time.  Storm is very easy to use, stable, scalable, and maintainable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.idexx.com/view/xhtml/en_us/corporate/home.jsf">IDEXX Laboratories</a>
+</td>
+<td>
+<p>
+IDEXX Laboratories is the leading maker of software and diagnostic instruments for the veterinary market. We collect and analyze veterinary medical data from thousands of veterinary clinics across the US. We recently embarked on a project to upgrade our aging data processing infrastructure that was unable to keep up with the rapid increase in the volume, velocity and variety of data that we were processing.
+</p>
+
+<p>
+We are utilizing the Storm system to take in the data that is extracted from the medical records in a number of different schemas, transform it into a standard schema that we created and store it in an Oracle RDBMS database. It is basically a souped up distributed ETL system. Storm takes on the plumbing necessary for a distributed system and is very easy to write code for. The ability to create small pieces of functionality and connect them together gives us the ultimate flexibility to parallelize each of the pieces differently.
+</p>
+
+<p>
+Our current cluster consists of four supervisor machines running 110 tasks inside 32 worker processes. We run two different topologies which receive messages and communicate with each other via RabbitMQ. The whole thing is deployed on Amazon Web Services and utilizes S3 for some intermediate storage, Redis as a key/value store and Oracle RDS for RDBMS storage. The bolts are all written in Java using the Spring framework with Hibernate as an ORM.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.umeng.com/">Umeng</a>
+</td>
+<td>
+Umeng is the leading and largest mobile app analytics and developer services platform in China. Storm powers Umeng's realtime analytics platform, processing billions of data points per day and growing. We also use Storm in other products that require realtime processing, and it has become core infrastructure in our company. 
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.admaster.com.cn/">Admaster</a>
+</td>
+<td>
+<p>
+We provide monitoring and precise delivery for Internet advertising. We use Storm to do the following:
+</p>
+
+<ol>
+<li>Calculating the PV and UV of every advertisement.</li>
+<li>Simple data cleaning: filtering out malformed data and cheating data (e.g., PV above a certain threshold).</li>
+</ol>
+Our cluster has 8 nodes and processes several billion messages, about 200 GB, per day.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://socialmetrix.com/en/">SocialMetrix</a>
+</td>
+<td>
+<p>
+Since its release, Storm has been a perfect fit for our real-time monitoring needs. Its powerful API and easy administration and deployment enabled us to rapidly build solutions to monitor presidential elections and several major events, and it is currently the processing core of our new product, "Socialmetrix Eventia".
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://needium.com/">Needium</a>
+</td>
+<td>
+<p>
+At Needium we love Ruby and JRuby. The Storm platform offers the right balance between simplicity, flexibility, and scalability. We created RedStorm, a Ruby DSL for Storm, to keep using Ruby on top of the power of Storm by leveraging Storm's JVM foundation with JRuby. We currently use Storm as our Twitter realtime data processing pipeline. We have Storm topologies for content filtering, geolocalization, and classification. Storm allows us to architect our pipeline for the scale of the full Twitter firehose.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://parse.ly/">Parse.ly</a>
+</td>
+<td>
+<p>
+Parse.ly is using Storm for its web/content analytics system. We have a home-grown data processing and storage system built with Python and Celery, with backend stores in Redis and MongoDB. We are now using Storm for real-time unique visitor counting and are exploring options for using it for some of our richer data sources such as social share data and semantic content metadata.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.parc.com/">PARC</a>
+</td>
+<td>
+<p>
+The High Performance Graph Analytics & Real-time Insights research team at PARC uses Storm as one of the building blocks of the PARC Analytics Cloud infrastructure, which comprises Nebula-based OpenStack, Hadoop, SAP HANA, Storm, PARC Graph Analytics, and a machine learning toolbox. It enables researchers to process real-time data feeds from sensors, the web, networks, social media, and security traces, and to easily ingest any other real-time data feeds of interest to PARC researchers.
+</p>
+<p>
+PARC researchers are working with a number of industry collaborators, developing new tools, algorithms, and models to analyze massive amounts of e-commerce, web clickstream, third-party syndicated, cohort, and social media data streams, as well as structured data from RDBMS, NoSQL, and NewSQL systems, in near real-time. The PARC team is developing a reference architecture and benchmarks for its near real-time automated insight discovery platform, combining the power of all the above tools with PARC's applied research in machine learning, graph analytics, reasoning, clustering, and contextual recommendations. The High Performance Graph Analytics & Real-time Insights research at PARC is headed by Surendra Reddy (<http://www.linkedin.com/in/skreddy>).  If you are interested in learning more about our experience using Storm, our research, or collaborating with PARC in this area, please feel free to contact sureddy@parc.com.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://gumgum.com/">GumGum</a>
+</td>
+<td>
+<p>
+GumGum, the leading in-image advertising platform for publishers and brands, uses Storm to produce real-time data. Storm and Trident-based topologies consume various ad-related events from Kafka and persist the aggregations in MySQL and HBase. This architecture will eventually replace most existing daily Hadoop MapReduce jobs. There are also plans for Kafka + Storm to replace the existing distributed queue processing infrastructure built with Amazon SQS.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.crowdflower.com/">CrowdFlower</a>
+</td>
+<td>
+<p>
+CrowdFlower is using Storm with Kafka to generalize our data stream
+aggregation and realtime computation infrastructure. We replaced our
+homegrown aggregation solutions with Storm because it simplified the
+creation of fault tolerant systems. We were already using Zookeeper
+and Kafka, so Storm allowed us to build more generic abstractions for
+our analytics using tools that we had already deployed and
+battle-tested in production.
+</p>
+
+<p>
+We are currently writing to DynamoDB from Storm, so we are able to
+scale our capacity quickly by bringing up additional supervisors and
+tweaking the throughput on our Dynamo tables. We look forward to
+exploring other uses for Storm in our system, especially with the
+recent release of Trident.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.dsbox.com">Digital Sandbox</a>
+</td>
+<td>
+<p>
+At Digital Sandbox we use Storm to enable our open source information feed monitoring system.  The system uses Storm to constantly monitor and pull data from structured and unstructured information sources across the internet.  For each found item, our topology applies natural language processing based concept analysis, temporal analysis, geospatial analytics and a prioritization algorithm to enable users to monitor large special events, public safety operations, and topics of interest to a multitude of individual users and teams.
+</p>
+ 
+<p>
+Our system is built using Storm for feed retrieval and annotation, Python with Flask and jQuery for business logic and web interfaces, and MongoDB for data persistence. We use NLTK for natural language processing and the WordNet, GeoNames, and OpenStreetMap databases to enable feed item concept extraction and geolocation.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://hallo.me/">Hallo</a>
+</td>
+<td>
+With several mainstream celebrities and very popular YouTubers using Hallo to communicate with their fans, we needed a good solution for notifying users via push notifications and making sure that celebrity messages were delivered to follower timelines in near realtime. Our initial approach to broadcast push notifications would take anywhere from 2 to 3 hours. After re-engineering our solution on top of Storm, that time has been cut down to 5 minutes on a very small cluster. With the user base growing and users' need for realtime communication, we are very happy knowing that we can easily scale Storm by adding nodes to maintain a baseline QoS for our users.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://keepcon.com/">Keepcon</a>
+</td>
+<td>
+We provide moderation services for classifieds, kids' communities, newspapers, chat rooms, Facebook fan pages, YouTube channels, reviews, and all kinds of UGC. We use Storm for the integration with our clients: finding evidence within each text, persisting to Cassandra and Elasticsearch, and sending results back to our clients.
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.visiblemeasures.com/">Visible Measures</a>
+</td>
+<td>
+<p>
+Visible Measures powers video campaigns and analytics for publishers and
+advertisers, tracking data for hundreds of millions of videos and billions
+of views. We are using Storm to process viewing behavior data in real time and make
+the information immediately available to our customers. We read events from
+various push and pull sources, including a Kestrel queue, filter and
+enrich the events in Storm topologies, and persist the events to Redis,
+HDFS and Vertica for real-time analytics and archiving. We are currently
+experimenting with Trident topologies, and figuring out how to move more
+of our Hadoop-based batch processing into Storm.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.o2mc.eu/en/">O2mc</a>
+</td>
+<td>
+<p>
+One of the core products of O2mc is called O2mc Community. O2mc Community performs multilingual, realtime sentiment analysis with very low latency and distributes the analyzed results to numerous clients. The input is extracted from source systems like Twitter, Facebook, e-mail, and many more. After the analysis has taken place on Storm, the results are streamed to any output system, ranging from HTTP streaming to clients, to direct database insertion, to an external business process engine that kickstarts a process.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.theladders.com">The Ladders</a>
+</td>
+<td>
+<p>
+TheLadders has been committed to finding the right person for the right job since 2003. We're using Storm in a variety of ways and are happy with its versatility, robustness, and ease of development. We use Storm in conjunction with RabbitMQ for such things as sending hiring alerts: when a recruiter submits a job to our site, Storm processes that event and will aggregate jobseekers whose profiles match the position. That list is subsequently batch-processed to send an email to the list of jobseekers. We also use Storm to persist events for Business Intelligence and internal event tracking. We're continuing to find uses for Storm where fast, asynchronous, real-time event processing is a must.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://semlab.nl">SemLab</a>
+</td>
+<td>
+<p>
+SemLab develops software for knowledge discovery and information support. Our ViewerPro platform uses information extraction, natural language processing, and semantic web technologies to extract structured data from unstructured sources, in domains such as financial news feeds and legal documents. We have successfully adapted ViewerPro's processing framework to run on top of Storm. The transition to Storm has made ViewerPro a much more scalable product, allowing us to process more in less time.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://visualrevenue.com/">Visual Revenue</a>
+</td>
+<td>
+<p>
+Here at Visual Revenue, we built a decision support system that helps online editors make choices about what, when, and where to promote their content in real time. Storm is the backbone of our real-time data processing and aggregation pipelines.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.peerindex.com/">PeerIndex</a>
+</td>
+<td>
+<p>
+PeerIndex is working to deliver influence at scale. PeerIndex does this by exposing services built on top of our Influence Graph: a directed graph of who is influencing whom on the web. PeerIndex gathers data from a number of social networks to create the Influence Graph. We use Storm to process our social data, to provide real-time aggregations, and to crawl the web, before storing our data in a manner most suitable for our Hadoop-based systems to batch-process. Storm provided us with an intuitive API and has slotted in well with the rest of our architecture. PeerIndex looks forward to investing further resources into our Storm-based real-time analytics.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://ants.vn">ANTS.VN</a>
+</td>
+<td>
+<p>
+ANTS.VN is Vietnam's unique big data advertising platform, combining ad serving, a real-time bidding (RTB) exchange, an ad server, analytics, yield optimization, and content valuation to deliver the highest revenue across every desktop, tablet, and mobile screen. At ANTS.VN we use Storm to process large amounts of data in real time and to improve our ad quality. The platform tracks impressions, clicks, conversions, bid requests, etc. in real time. Together with Kafka, Redis, memcached, and Cassandra-based messaging, Storm enables us to build low-latency fault-tolerant distributed systems with ease.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wayfair.com">Wayfair</a>
+</td>
+<td>
+<p>
+At Wayfair, we use Storm as a platform to drive our core order processing pipeline as an event-driven system. Storm allows us to reliably process tens of thousands of orders daily while providing the assurance of seamless scalability as our order load increases. Given the project's ease of use and the immense support of the community, we've managed to implement our bolts in PHP, construct a simple Puppet module for configuration management, and quickly solve issues as they arise. We can now focus most of our development efforts on the business layer. Check out more information on how we use Storm <a href="http://engineering.wayfair.com/stormin-oms/">in our engineering blog</a>.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://innoquant.com/">InnoQuant</a>
+</td>
+<td>
+<p>
+At InnoQuant, we use Storm as the backbone of our real-time big data analytics engine in the MOCA platform. MOCA is a next-generation mobile-backend-as-a-service (MBaaS) platform. It provides brands and app developers with real-time in-app tracking, context-aware push messaging, user micro-segmentation based on profile, time, and geo-context, as well as big data analytics. The Storm-based pipeline is fed with events captured by native mobile SDKs (iOS, Android), scales nicely with the number of connected mobile app users, delivers stream-based metrics and aggregations, and integrates with the rest of the MOCA infrastructure, including columnar storage (Cassandra) and graph storage (Titan).
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.fliptop.com/">Fliptop</a>
+</td>
+<td>
+<p>
+Fliptop is a customer intelligence platform that allows customers to integrate their contact and campaign data, enhance their prospects with social identities, and find their best leads and most influential customers. We have been using Storm for various tasks that require scalability and reliability, including integrating with sales/marketing platforms, appending data to contacts/leads, and computing contact/lead scores. It's one of our most robust and scalable pieces of infrastructure.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.trovit.com/">Trovit</a>
+</td>
+<td>
+<p>
+Trovit is a search engine for classified ads, present in 39 countries and different business categories (Real Estate, Cars, Jobs, Rentals, Products and Deals). Currently we use Storm to process and index ads in a distributed, low-latency fashion. Combining it with other technologies like Hadoop, HBase, and Solr has allowed us to build a scalable, low-latency platform to serve search results to the end user.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.openx.com/">OpenX</a>
+</td>
+<td>
+<p>
+OpenX is a unique platform that combines ad serving, a real-time bidding (RTB) exchange, yield optimization, and content valuation to deliver the highest revenue across every desktop, tablet, and mobile screen.
+At OpenX we use Storm to process large amounts of data and provide real-time analytics, helping us improve our ad quality.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://keen.io/">Keen IO</a>
+</td>
+<td>
+<p>
+Keen IO is an analytics backend-as-a-service. The Keen IO API makes it easy for customers to do internal analytics or expose analytics features to their customers. Keen IO uses Storm (DRPC) to query billion-event data sets at very low latencies. We also use Storm to control our ingestion pipeline, sourcing data from Kafka and storing it in Cassandra.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://liveperson.com/">LivePerson</a>
+</td>
+<td>
+<p>
+LivePerson is a provider of interaction services over the web. Interaction between an agent and a site visitor can happen via phone call, chat, banners, etc. Using Storm, LivePerson can collect and process visitor data and provide information about visitor behavior to agents in real time. Moreover, LivePerson can make better decisions about how to react to visitors in a way that best addresses their needs.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://yieldbot.com/">YieldBot</a>
+</td>
+<td>
+<p>
+Yieldbot connects ads to the real-time consumer intent streaming within premium publishers. To do this, Yieldbot leverages Storm for a wide variety of real-time processing tasks. We've open-sourced marceline, our Clojure DSL for writing Trident topologies, which we use extensively. Events are read from Kafka, most state is stored in Cassandra, and we heavily use Storm's DRPC features. Our Storm use cases range from HTML processing, to hotness-style trending, to probabilistic rankings and cardinalities. Storm topologies touch virtually all of the events generated by the Yieldbot platform.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://equinix.com/">Equinix</a>
+</td>
+<td>
+<p>
+At Equinix, we use a number of Storm topologies to process and persist various data streams generated by sensors in our data centers. We also use Storm for real-time monitoring of different infrastructure components. A few other topologies process logs in real time for internal IT systems, which also provides insight into user behavior.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://minewhat.com/">MineWhat</a>
+</td>
+<td>
+<p>
+MineWhat provides actionable analytics for ecommerce, spanning every SKU, brand, and category in the store. We use Storm to process raw clickstream ingestion from Kafka and compute live analytics. Storm topologies power our complex product-to-user interaction analysis. Storm's multi-language feature is really kick-ass; we have bolts written in Node.js, Python, and Ruby. Storm has been on our production site since Nov 2012.
+</p>
+</td>
+</tr>
+
+
+<tr>
+<td>
+<a href="http://www.360.cn/">Qihoo 360</a>
+</td>
+<td>
+<p>
+360 has deployed about 50 realtime applications on top of Storm, including web page analysis, log processing, image processing, voice processing, etc.
+</p>
+<p>
+Our use of Storm at 360 is a bit special: we deployed Storm on thousands of servers that are not dedicated to Storm, so Storm uses only a little CPU/memory/network resource on each server. These Storm clusters leverage the idle resources of the servers at nearly zero cost to provide great realtime computing power. It's amazing.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.holidaycheck.com/">HolidayCheck</a>
+</td>
+<td>
+<p>
+HolidayCheck is an online travel site and agency available in 10
+languages worldwide, visited by 30 million people a month.
+We use Storm to deliver real-time hotel and holiday package offers
+from multiple providers - reservation systems and affiliate travel
+networks - in a low-latency fashion based on user-selected criteria.
+In further reservation steps we use DRPC for vacancy checks and
+bookings of chosen offers. Alongside Storm, the offer-delivery
+system uses Scala, Akka, Hazelcast, Drools and MongoDB. The real-time
+offer stream is delivered out of the system back to the front-end
+via websocket connections.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://dataminelab.com/">DataMine Lab</a>
+</td>
+<td>
+<p>
+DataMine Lab is a consulting company integrating Storm into its
+portfolio of technologies. Storm powers a range of our customers'
+systems, allowing us to build real-time analytics on tens of millions
+of visitors to the advertising platforms we helped create. Together
+with Redis, Cassandra, and Hadoop, Storm allows us to provide a
+real-time distributed data platform at a global scale.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.wizecommerce.com/">Wize Commerce</a>
+</td>
+<td>
+<p>
+Wize Commerce® is the smartest way to grow your digital business. For over ten years, we have been helping clients maximize their revenue and traffic using optimization technologies that operate at massive scale, and across digital ecosystems. We own and operate leading comparison shopping engines including Nextag®, PriceMachine™, and <a href="http://guenstiger.de">guenstiger.de</a>, and provide services to a wide ecosystem of partner sites that use our e-commerce platform. These sites together drive over $1B in annual merchant sales.
+</p>
+<p>
+We use Storm to power our core platform infrastructure, and it has become a vital component of our search indexing system and Cassandra storage. Along with Kafka, Storm has reduced our end-to-end latencies from several hours to a few minutes. As the operator of some of the largest comparison shopping sites, pushing price updates to the live site quickly is very important, and Storm helps a lot in achieving that. We have been using Storm extensively in production since Q1 2013.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://metamarkets.com">Metamarkets</a>
+</td>
+<td>
+<p>At Metamarkets, Apache Storm is used to process real-time event data streamed from Apache Kafka message brokers, and then to load that data into a <a href="http://druid.io">Druid cluster</a>, the low-latency data store at the heart of our real-time analytics service. Our Storm topologies perform various operations, ranging from simple filtering of "outdated" events, to transformations such as ID-to-name lookups, to complex multi-stream joins. Since our service is intended to respond to ad-hoc queries within seconds of ingesting events, the speed, flexibility, and robustness of those topologies make Storm a key piece of our real-time stack.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mightytravels.com">Mighty Travels</a>
+</td>
+<td>
+<p>We are using Storm to process our real-time search data stream and
+application logs. The part we like best about Storm is the ease of
+scaling up, basically just by throwing more machines at it.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.polecat.co">Polecat</a>
+</td>
+<td>
+<p>Polecat's digital analysis platform, MeaningMine, allows users to search all online news, blogs and social media in real-time and run bespoke analysis in order to inform corporate strategy and decision making for some of the world's largest companies and governmental organisations.</p>
+<p>
+Polecat uses Storm to run an application we've called the 'Data Munger'.  We run many different topologies on a multi-host Storm cluster to process the tens of millions of online articles and posts that we collect each day.  Storm handles our analysis of these documents so that we can provide insight on realtime data to our clients.  We output our results from Storm into one of many large Apache Solr clusters for our end-user applications to query (Polecat is also a contributor to Solr).  We first started developing our app to run on Storm back in June 2012, and it has been live since roughly September 2012.  We've found Storm to be an excellent fit for our needs here, and we've always found it extremely robust and fast.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="https://www.skylight.io/">Skylight by Tilde</a>
+</td>
+<td>
+<p>Skylight is a production profiler for Ruby on Rails apps that focuses on providing detailed information about your running application that you can explore in an intuitive way. We use Storm to process traces from our agent into data structures that we can slice and dice for you in our web app.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.ad4game.com/">Ad4Game</a>
+</td>
+<td>
+<p>We are an advertising network and we use Storm to calculate priorities in real time to know which ads to show for which website, visitor and country.</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.impetus.com/">Impetus Technologies</a>
+</td>
+<td>
+<p>StreamAnalytix, a product of Impetus Technologies, enables enterprises to analyze and respond to events in real-time at Big Data scale. Based on Apache Storm, StreamAnalytix is designed to rapidly build and deploy streaming analytics applications for any industry vertical, any data format, and any use case. This high-performance, scalable platform comes with a pre-integrated package of components like Cassandra, Storm, Kafka, and more. In addition, it brings together the proven open source technology stack with Hadoop and NoSQL to provide massive scalability, dynamic data pipelines, and a visual designer for rapid application development.</p>
+<p>
+Through StreamAnalytix, users can ingest, store, and analyze millions of events per second and discover exceptions, patterns, and trends through live dashboards. It also provides seamless integration with an indexing store (ElasticSearch) and NoSQL databases (HBase, Cassandra, and Oracle NoSQL) for writing data in real time. With the use of Storm, the product delivers high-business-value solutions such as log analytics, streaming ETL, deep social listening, real-time marketing, business process acceleration, and predictive maintenance.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.akazoo.com/en">Akazoo</a>
+</td>
+<td>
+<p>
+Akazoo is a platform providing music streaming services.  Storm is the backbone of all our real-time analytical processing. We use it for tracking and analyzing application events and for various other tasks, including recommendations and parallel task execution.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mapillary.com">Mapillary</a>
+</td>
+<td>
+<p>
+At Mapillary we use Storm for a wide variety of tasks. Having a system that is 100% based on Kafka input, Storm, and Trident makes reasoning about our data a breeze.  
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.gutscheinrausch.de/">Gutscheinrausch.de</a>
+</td>
+<td>
+<p>
+We recently upgraded our existing IT infrastructure, using Storm as one of our main tools.
+Each day we collect sales, clicks, visits, and various other ecommerce metrics from many different systems (webpages, affiliate reportings, networks, tracking scripts, etc.). We process this continually generated data using Storm before entering it into the backend systems for further use.
+</p>
+<p>
+Using Storm we were able to decouple our heterogeneous frontend systems from our backends and take load off the data warehouse applications by feeding in pre-processed data. This way we can easily collect and process all data and then run realtime OLAP queries using our proprietary data warehouse technology.
+</p>
+<p>
+We are most impressed by the high-speed, low-maintenance approach Storm has provided us with. Being able to easily scale up the system using more machines is also a big plus. Since we're a small team, it allows us to focus more on our core business instead of the underlying technology. You could say it has taken our hearts by storm!
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.appriver.com">AppRiver</a>
+</td>
+<td>
+<p>
+We are using Storm to track internet threats from varied sources around the web.  It is always fast and reliable.
+</p>
+</td>
+</tr>
+
+<tr>
+<td>
+<a href="http://www.mercadolibre.com/">MercadoLibre</a>
+</td>
+<td>
+</td>
+</tr>
+
+
+</table>

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Project-ideas.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Project-ideas.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Project-ideas.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Project-ideas.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,6 @@
+---
+layout: documentation
+---
+ * **DSLs for non-JVM languages:** These DSLs should be all-inclusive and not require any Java for the creation of topologies, spouts, or bolts. Since topologies are [Thrift](http://thrift.apache.org/) structs, Nimbus is a Thrift service, and bolts can be written in any language, this is possible (the thin Java shim such a DSL would eliminate is sketched after this list).
+ * **Online machine learning algorithms:** Something like [Mahout](http://mahout.apache.org/) but for online algorithms
+ * **Suite of performance benchmarks:** These benchmarks should test Storm's performance on CPU and IO intensive workloads. There should be benchmarks for different classes of applications, such as stream processing (where throughput is the priority) and distributed RPC (where latency is the priority). 
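+
+For context on the first idea, here is the thin JVM shim that non-JVM bolts currently require. This is a hedged sketch following the storm-starter word-count example; `splitsentence.py` holds the actual bolt logic, spoken over Storm's multilang protocol, and an all-inclusive DSL would remove exactly this wrapper:
+
+```java
+import backtype.storm.task.ShellBolt;
+import backtype.storm.topology.IRichBolt;
+import backtype.storm.topology.OutputFieldsDeclarer;
+import backtype.storm.tuple.Fields;
+import java.util.Map;
+
+public class SplitSentence extends ShellBolt implements IRichBolt {
+    public SplitSentence() {
+        // Launch the Python subprocess that implements the bolt logic.
+        super("python", "splitsentence.py");
+    }
+
+    @Override
+    public void declareOutputFields(OutputFieldsDeclarer declarer) {
+        // Declare the output schema on the JVM side, even though the
+        // processing itself happens in Python.
+        declarer.declare(new Fields("word"));
+    }
+
+    @Override
+    public Map<String, Object> getComponentConfiguration() {
+        return null;
+    }
+}
+```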
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Rationale.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Rationale.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Rationale.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Rationale.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,33 @@
+---
+title: Rationale
+layout: documentation
+documentation: true
+---
+The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There's no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing.
+
+However, realtime data processing at massive scale is becoming more and more of a requirement for businesses. The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem.
+
+Storm fills that hole.
+
+Before Storm, you would typically have to manually build a network of queues and workers to do realtime processing. Workers would process messages off a queue, update databases, and send new messages to other queues for further processing. Unfortunately, this approach has serious limitations:
+
+1. **Tedious**: You spend most of your development time configuring where to send messages, deploying workers, and deploying intermediate queues. The realtime processing logic that you care about corresponds to a relatively small percentage of your codebase.
+2. **Brittle**: There's little fault-tolerance. You're responsible for keeping each worker and queue up.
+3. **Painful to scale**: When the message throughput gets too high for a single worker or queue, you need to partition how the data is spread around. You also need to reconfigure the other workers so they know the new locations to send messages to. This introduces moving parts and new pieces that can fail.
+
+Although the queues and workers paradigm breaks down for large numbers of messages, message processing is clearly the fundamental paradigm for realtime computation. The question is: how do you do it in a way that doesn't lose data, scales to huge volumes of messages, and is dead-simple to use and operate?
+
+Storm satisfies these goals. 
+
+## Why Storm is important
+
+Storm exposes a set of primitives for doing realtime computation. Just as MapReduce greatly eases the writing of parallel batch processing code, Storm's primitives greatly ease the writing of parallel realtime computation.
+
+The key properties of Storm are:
+
+1. **Extremely broad set of use cases**: Storm can be used for processing messages and updating databases (stream processing), doing a continuous query on data streams and streaming the results into clients (continuous computation), parallelizing an intense query like a search query on the fly (distributed RPC), and more. Storm's small set of primitives satisfy a stunning number of use cases.
+2. **Scalable**: Storm scales to massive numbers of messages per second. To scale a topology, all you have to do is add machines and increase the parallelism settings of the topology. As an example of Storm's scale, one of Storm's initial applications processed 1,000,000 messages per second on a 10 node cluster, including hundreds of database calls per second as part of the topology. Storm's usage of Zookeeper for cluster coordination makes it scale to much larger cluster sizes.
+3. **Guarantees no data loss**: A realtime system must have strong guarantees about data being successfully processed. A system that drops data has a very limited set of use cases. Storm guarantees that every message will be processed, and this is in direct contrast with other systems like S4. 
+4. **Extremely robust**: Unlike systems like Hadoop, which are notorious for being difficult to manage, Storm clusters just work. It is an explicit goal of the Storm project to make the user experience of managing Storm clusters as painless as possible.
+5. **Fault-tolerant**: If there are faults during execution of your computation, Storm will reassign tasks as necessary. Storm makes sure that a computation can run forever (or until you kill the computation).
+6. **Programming language agnostic**: Robust and scalable realtime processing shouldn't be limited to a single platform. Storm topologies and processing components can be defined in any language, making Storm accessible to nearly anyone.

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Running-topologies-on-a-production-cluster.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,77 @@
+---
+title: Running Topologies on a Production Cluster
+layout: documentation
+documentation: true
+---
+Running topologies on a production cluster is similar to running in [Local mode](Local-mode.html). Here are the steps:
+
+1) Define the topology (Use [TopologyBuilder](/javadoc/apidocs/backtype/storm/topology/TopologyBuilder.html) if defining using Java)
+
+2) Use [StormSubmitter](/javadoc/apidocs/backtype/storm/StormSubmitter.html) to submit the topology to the cluster. `StormSubmitter` takes as input the name of the topology, a configuration for the topology, and the topology itself. For example:
+
+```java
+Config conf = new Config();
+conf.setNumWorkers(20);
+conf.setMaxSpoutPending(5000);
+StormSubmitter.submitTopology("mytopology", conf, topology);
+```
+
+3) Create a jar containing your code and all the dependencies of your code (except for Storm -- the Storm jars will be added to the classpath on the worker nodes).
+
+If you're using Maven, the [Maven Assembly Plugin](http://maven.apache.org/plugins/maven-assembly-plugin/) can do the packaging for you. Just add this to your pom.xml:
+
+```xml
+  <plugin>
+    <artifactId>maven-assembly-plugin</artifactId>
+    <configuration>
+      <descriptorRefs>  
+        <descriptorRef>jar-with-dependencies</descriptorRef>
+      </descriptorRefs>
+      <archive>
+        <manifest>
+          <mainClass>com.path.to.main.Class</mainClass>
+        </manifest>
+      </archive>
+    </configuration>
+  </plugin>
+```
+Then run `mvn assembly:assembly` to get an appropriately packaged jar. Make sure you [exclude](http://maven.apache.org/plugins/maven-assembly-plugin/examples/single/including-and-excluding-artifacts.html) the Storm jars since the cluster already has Storm on the classpath.
+
+4) Submit the topology to the cluster using the `storm` client, specifying the path to your jar, the classname to run, and any arguments it will use:
+
+`storm jar path/to/allmycode.jar org.me.MyTopology arg1 arg2 arg3`
+
+`storm jar` will submit the jar to the cluster and configure the `StormSubmitter` class to talk to the right cluster. In this example, after uploading the jar `storm jar` calls the main function on `org.me.MyTopology` with the arguments "arg1", "arg2", and "arg3".
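+
+For reference, here is a minimal sketch of what such a main class might look like. The `org.me.MyTopology` name and the commented-out spout/bolt wiring are hypothetical; `TopologyBuilder`, `Config`, and `StormSubmitter` are the classes described above:
+
+```java
+package org.me;
+
+import backtype.storm.Config;
+import backtype.storm.StormSubmitter;
+import backtype.storm.topology.TopologyBuilder;
+
+public class MyTopology {
+    public static void main(String[] args) throws Exception {
+        TopologyBuilder builder = new TopologyBuilder();
+        // Wire up your (hypothetical) spouts and bolts here, e.g.:
+        // builder.setSpout("events", new MyEventSpout(), 4);
+        // builder.setBolt("counts", new MyCountBolt(), 8).shuffleGrouping("events");
+
+        Config conf = new Config();
+        conf.setNumWorkers(20);
+
+        // args[0] is "arg1" from the `storm jar` invocation above; here it
+        // is used as the topology name purely for illustration.
+        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
+    }
+}
+```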
+
+You can find out how to configure your `storm` client to talk to a Storm cluster on [Setting up development environment](Setting-up-development-environment.html).
+
+### Common configurations
+
+There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found [here](/javadoc/apidocs/backtype/storm/Config.html). The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology (a combined example follows the list):
+
+1. **Config.TOPOLOGY_WORKERS**: This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components in the topology, each worker process will have 6 tasks running within it as threads.
+2. **Config.TOPOLOGY_ACKERS**: This sets the number of tasks that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm's reliability model and you can read more about them on [Guaranteeing message processing](Guaranteeing-message-processing.html).
+3. **Config.TOPOLOGY_MAX_SPOUT_PENDING**: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.
+4. **Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS**: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies. See [Guaranteeing message processing](Guaranteeing-message-processing.html) for more information on how Storm's reliability model works.
+5. **Config.TOPOLOGY_SERIALIZATIONS**: You can register more serializers to Storm using this config so that you can use custom types within tuples.
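+
+As a sketch, the first four of these can also be set programmatically when building the topology configuration. `setNumWorkers` and `setMaxSpoutPending` appear earlier on this page; `setNumAckers` and `setMessageTimeoutSecs` are assumed to follow the same naming convention (the values are illustrative, not recommendations):
+
+```java
+Config conf = new Config();
+conf.setNumWorkers(25);          // Config.TOPOLOGY_WORKERS
+conf.setNumAckers(2);            // Config.TOPOLOGY_ACKERS
+conf.setMaxSpoutPending(5000);   // Config.TOPOLOGY_MAX_SPOUT_PENDING
+conf.setMessageTimeoutSecs(60);  // Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS
+```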
+
+
+### Killing a topology
+
+To kill a topology, simply run:
+
+`storm kill {stormname}`
+
+Give the same name to `storm kill` as you used when submitting the topology.
+
+Storm won't kill the topology immediately. Instead, it deactivates all the spouts so that they don't emit any more tuples, and then Storm waits `Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS` seconds before destroying all the workers. This gives the topology enough time to finish processing any tuples that were in flight when the kill command was issued.
+
+### Updating a running topology
+
+To update a running topology, the only option currently is to kill the current topology and resubmit a new one. A planned feature is to implement a `storm swap` command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time. 
+
+### Monitoring topologies
+
+The best place to monitor a topology is using the Storm UI. The Storm UI provides information about errors happening in tasks and fine-grained stats on the throughput and latency performance of each component of each running topology.
+
+You can also look at the worker logs on the cluster machines.
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Serialization-(prior-to-0.6.0).md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Serialization-%28prior-to-0.6.0%29.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Serialization-(prior-to-0.6.0).md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Serialization-(prior-to-0.6.0).md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,52 @@
+---
+title: Serialization (Prior to 0.6.0)
+layout: documentation
+documentation: true
+---
+Tuples can be composed of objects of any type. Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they're passed between tasks. By default Storm can serialize ints, shorts, longs, floats, doubles, bools, bytes, strings, and byte arrays, but if you want to use another type in your tuples, you'll need to implement a custom serializer.
+
+### Dynamic typing
+
+There are no type declarations for fields in a Tuple. You put objects in fields and Storm figures out the serialization dynamically. Before we get to the interface for serialization, let's spend a moment understanding why Storm's tuples are dynamically typed.
+
+Adding static typing to tuple fields would add a large amount of complexity to Storm's API. Hadoop, for example, statically types its keys and values but requires a huge amount of annotation on the part of the user. Hadoop's API is a burden to use and the "type safety" isn't worth it. Dynamic typing is simply easier to use.
+
+Further than that, it's not possible to statically type Storm's tuples in any reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from all those streams may have different types across the fields. When a Bolt receives a `Tuple` in `execute`, that tuple could have come from any stream and so could have any combination of types. There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.
+
+Finally, another reason for using dynamic typing is so Storm can be used in a straightforward manner from dynamically typed languages like Clojure and JRuby.
+
+### Custom serialization
+
+Let's dive into Storm's API for defining custom serializations. There are two steps you need to take as a user to create a custom serialization: implement the serializer, and register the serializer to Storm.
+
+#### Creating a serializer
+
+Custom serializers implement the [ISerialization](/javadoc/apidocs/backtype/storm/serialization/ISerialization.html) interface. Implementations specify how to serialize and deserialize types into a binary format.
+
+The interface looks like this:
+
+```java
+public interface ISerialization<T> {
+    public boolean accept(Class c);
+    public void serialize(T object, DataOutputStream stream) throws IOException;
+    public T deserialize(DataInputStream stream) throws IOException;
+}
+```
+
+Storm uses the `accept` method to determine if a type can be serialized by this serializer. Remember, Storm's tuples are dynamically typed so Storm determines what serializer to use at runtime.
+
+`serialize` writes the object out to the output stream in binary format. The field must be written in a way such that it can be deserialized later. For example, if you're writing out a list of objects, you'll need to write out the size of the list first so that you know how many elements to deserialize.
+
+`deserialize` reads the serialized object off of the stream and returns it.
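+
+As a minimal sketch, here is what an implementation could look like for a hypothetical two-field `Point` class (the `Point` type is invented for illustration; `ISerialization` is the interface shown above):
+
+```java
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+
+import backtype.storm.serialization.ISerialization;
+
+// Hypothetical tuple field type used for illustration.
+class Point {
+    final int x, y;
+    Point(int x, int y) { this.x = x; this.y = y; }
+}
+
+public class PointSerialization implements ISerialization<Point> {
+    public boolean accept(Class c) {
+        // Only claim the Point type; Storm calls accept() at runtime
+        // to pick a serializer, since tuples are dynamically typed.
+        return Point.class.equals(c);
+    }
+
+    public void serialize(Point p, DataOutputStream stream) throws IOException {
+        // Write the fields in a fixed order so they can be read back.
+        stream.writeInt(p.x);
+        stream.writeInt(p.y);
+    }
+
+    public Point deserialize(DataInputStream stream) throws IOException {
+        // Read the fields in the same order they were written.
+        return new Point(stream.readInt(), stream.readInt());
+    }
+}
+```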
+
+You can see example serialization implementations in the source for [SerializationFactory](https://github.com/apache/storm/blob/0.5.4/src/jvm/backtype/storm/serialization/SerializationFactory.java).
+
+#### Registering a serializer
+
+Once you create a serializer, you need to tell Storm it exists. This is done through the Storm configuration (See [Concepts](Concepts.html) for information about how configuration works in Storm). You can register serializations either through the config given when submitting a topology or in the storm.yaml files across your cluster.
+
+Serializer registrations are done through the `Config.TOPOLOGY_SERIALIZATIONS` config, which is simply a list of serialization class names.
+
+Storm provides helpers for registering serializers in a topology config. The [Config](/javadoc/apidocs/backtype/storm/Config.html) class has a method called `addSerialization` that takes in a serializer class to add to the config.
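+
+Continuing the sketch above, registering `PointSerialization` through that helper might look like this (assuming `addSerialization` accepts the serializer class, as described above):
+
+```java
+Config conf = new Config();
+// Register the PointSerialization sketch from the previous section.
+conf.addSerialization(PointSerialization.class);
+```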
+
+There's an advanced config called Config.TOPOLOGY_SKIP_MISSING_SERIALIZATIONS. If you set this to true, Storm will ignore any serializations that are registered but do not have their code available on the classpath. Otherwise, Storm will throw errors when it can't find a serialization. This is useful if you run many topologies on a cluster that each have different serializations, but you want to declare all the serializations across all topologies in the `storm.yaml` files.
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Serialization.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,62 @@
+---
+title: Serialization
+layout: documentation
+documentation: true
+---
+This page is about how the serialization system in Storm works for versions 0.6.0 and onwards. Storm used a different serialization system prior to 0.6.0 which is documented on [Serialization (prior to 0.6.0)](Serialization-\(prior-to-0.6.0\).html). 
+
+Tuples can be composed of objects of any type. Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they're passed between tasks.
+
+Storm uses [Kryo](http://code.google.com/p/kryo/) for serialization. Kryo is a flexible and fast serialization library that produces small serializations.
+
+By default, Storm can serialize primitive types, strings, byte arrays, ArrayList, HashMap, HashSet, and the Clojure collection types. If you want to use another type in your tuples, you'll need to register a custom serializer.
+
+### Dynamic typing
+
+There are no type declarations for fields in a Tuple. You put objects in fields and Storm figures out the serialization dynamically. Before we get to the interface for serialization, let's spend a moment understanding why Storm's tuples are dynamically typed.
+
+Adding static typing to tuple fields would add a large amount of complexity to Storm's API. Hadoop, for example, statically types its keys and values but requires a huge amount of annotation on the part of the user. Hadoop's API is a burden to use and the "type safety" isn't worth it. Dynamic typing is simply easier to use.
+
+Further than that, it's not possible to statically type Storm's tuples in any reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from all those streams may have different types across the fields. When a Bolt receives a `Tuple` in `execute`, that tuple could have come from any stream and so could have any combination of types. There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.
+
+Finally, another reason for using dynamic typing is so Storm can be used in a straightforward manner from dynamically typed languages like Clojure and JRuby.
+
+### Custom serialization
+
+As mentioned, Storm uses Kryo for serialization. To implement custom serializers, you need to register new serializers with Kryo. It's highly recommended that you read over [Kryo's home page](http://code.google.com/p/kryo/) to understand how it handles custom serialization.
+
+Adding custom serializers is done through the "topology.kryo.register" property in your topology config. It takes a list of registrations, where each registration can take one of two forms:
+
+1. The name of a class to register. In this case, Storm will use Kryo's `FieldsSerializer` to serialize the class. This may or may not be optimal for the class -- see the Kryo docs for more details.
+2. A map from the name of a class to register to an implementation of [com.esotericsoftware.kryo.Serializer](http://code.google.com/p/kryo/source/browse/trunk/src/com/esotericsoftware/kryo/Serializer.java).
+
+Let's look at an example.
+
+```yaml
+topology.kryo.register:
+  - com.mycompany.CustomType1
+  - com.mycompany.CustomType2: com.mycompany.serializer.CustomType2Serializer
+  - com.mycompany.CustomType3
+```
+
+`com.mycompany.CustomType1` and `com.mycompany.CustomType3` will use the `FieldsSerializer`, whereas `com.mycompany.CustomType2` will use `com.mycompany.serializer.CustomType2Serializer` for serialization.
+
+Storm provides helpers for registering serializers in a topology config. The [Config](/javadoc/apidocs/backtype/storm/Config.html) class has a method called `registerSerialization` that takes in a registration to add to the config.
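+
+For instance, the registrations from the YAML example above could be expressed in code roughly like this (assuming one-argument and two-argument forms of `registerSerialization`):
+
+```java
+Config conf = new Config();
+// Kryo's FieldsSerializer will be used for this type.
+conf.registerSerialization(com.mycompany.CustomType1.class);
+// An explicit Kryo Serializer implementation for this type.
+conf.registerSerialization(com.mycompany.CustomType2.class,
+                           com.mycompany.serializer.CustomType2Serializer.class);
+```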
+
+There's an advanced config called `Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS`. If you set this to true, Storm will ignore any serializations that are registered but do not have their code available on the classpath. Otherwise, Storm will throw errors when it can't find a serialization. This is useful if you run many topologies on a cluster that each have different serializations, but you want to declare all the serializations across all topologies in the `storm.yaml` files.
+
+### Java serialization
+
+If Storm encounters a type for which it doesn't have a serialization registered, it will use Java serialization if possible. If the object can't be serialized with Java serialization, then Storm will throw an error.
+
+Beware that Java serialization is extremely expensive, both in terms of CPU cost as well as the size of the serialized object. It is highly recommended that you register custom serializers when you put the topology in production. The Java serialization behavior is there so that it's easy to prototype new topologies.
+
+You can turn off the behavior to fall back on Java serialization by setting the `Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION` config to false.
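+
+As a sketch, assuming a `setFallBackOnJavaSerialization` helper on `Config` mirroring the config name above, turning the fallback off could look like:
+
+```java
+Config conf = new Config();
+// Assumed helper mirroring Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION:
+// unregistered types now fail fast instead of silently falling back to
+// expensive Java serialization.
+conf.setFallBackOnJavaSerialization(false);
+```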
+
+### Component-specific serialization registrations
+
+Storm 0.7.0 lets you set component-specific configurations (read more about this at [Configuration](Configuration.html)). Of course, if one component defines a serialization, that serialization will need to be available to other bolts -- otherwise they won't be able to receive messages from that component!
+
+When a topology is submitted, a single set of serializations is chosen to be used by all components in the topology for sending messages. This is done by merging the component-specific serializer registrations with the regular set of serialization registrations. If two components define serializers for the same class, one of the serializers is chosen arbitrarily.
+
+To force a serializer for a particular class if there's a conflict between two component-specific registrations, just define the serializer you want to use in the topology-specific configuration. The topology-specific configuration has precedence over component-specific configurations for serialization registrations.
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Serializers.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Serializers.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Serializers.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Serializers.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,4 @@
+---
+layout: documentation
+---
+* [storm-json](https://github.com/rapportive-oss/storm-json): Simple JSON serializer for Storm
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-cluster.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,96 @@
+---
+title: Setting up a Storm Cluster
+layout: documentation
+documentation: true
+---
+This page outlines the steps for getting a Storm cluster up and running. If you're on AWS, you should check out the [storm-deploy](https://github.com/nathanmarz/storm-deploy/wiki) project. [storm-deploy](https://github.com/nathanmarz/storm-deploy/wiki) completely automates the provisioning, configuration, and installation of Storm clusters on EC2. It also sets up Ganglia for you so you can monitor CPU, disk, and network usage.
+
+If you run into difficulties with your Storm cluster, first check the [Troubleshooting](Troubleshooting.html) page for a solution. Otherwise, email the mailing list.
+
+Here's a summary of the steps for setting up a Storm cluster:
+
+1. Set up a Zookeeper cluster
+2. Install dependencies on Nimbus and worker machines
+3. Download and extract a Storm release to Nimbus and worker machines
+4. Fill in mandatory configurations into storm.yaml
+5. Launch daemons under supervision using "storm" script and a supervisor of your choice
+
+### Set up a Zookeeper cluster
+
+Storm uses Zookeeper for coordinating the cluster. Zookeeper **is not** used for message passing, so the load Storm places on Zookeeper is quite low. A single-node Zookeeper cluster should be sufficient for most cases, but if you want failover or are deploying large Storm clusters you may want a larger Zookeeper cluster. Instructions for deploying Zookeeper are [here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html). 
+
+A few notes about Zookeeper deployment:
+
+1. It's critical that you run Zookeeper under supervision, since Zookeeper is fail-fast and will exit the process if it encounters any error case. See [here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_supervision) for more details. 
+2. It's critical that you set up a cron to compact Zookeeper's data and transaction logs. The Zookeeper daemon does not do this on its own, and if you don't set up a cron, Zookeeper will quickly run out of disk space. See [here](http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_maintenance) for more details.
+
+### Install dependencies on Nimbus and worker machines
+
+Next you need to install Storm's dependencies on Nimbus and the worker machines. These are:
+
+1. Java 6
+2. Python 2.6.6
+
+These are the versions of the dependencies that have been tested with Storm. Storm may or may not work with different versions of Java and/or Python.
+
+
+### Download and extract a Storm release to Nimbus and worker machines
+
+Next, download a Storm release and extract the zip file somewhere on Nimbus and each of the worker machines. The Storm releases can be downloaded [from here](http://github.com/apache/storm/releases).
+
+### Fill in mandatory configurations into storm.yaml
+
+The Storm release contains a file at `conf/storm.yaml` that configures the Storm daemons. You can see the default configuration values [here](https://github.com/apache/storm/blob/master/conf/defaults.yaml). `storm.yaml` overrides anything in `defaults.yaml`. There are a few configurations that are mandatory to get a working cluster:
+
+1) **storm.zookeeper.servers**: This is a list of the hosts in the Zookeeper cluster for your Storm cluster. It should look something like:
+
+```yaml
+storm.zookeeper.servers:
+  - "111.222.333.444"
+  - "555.666.777.888"
+```
+
+If the port that your Zookeeper cluster uses is different than the default, you should set **storm.zookeeper.port** as well.
+
+2) **storm.local.dir**: The Nimbus and Supervisor daemons require a directory on the local disk to store small amounts of state (like jars, confs, and things like that).
+ You should create that directory on each machine, give it proper permissions, and then fill in the directory location using this config. For example:
+
+```yaml
+storm.local.dir: "/mnt/storm"
+```
+If you run Storm on Windows, it could be:
+
+```yaml
+storm.local.dir: "C:\\storm-local"
+```
+
+If you use a relative path, it will be relative to where you installed Storm (`STORM_HOME`). You can leave it unset, in which case it defaults to `$STORM_HOME/storm-local`.
+
+3) **nimbus.host**: The worker nodes need to know which machine is the master in order to download topology jars and confs. For example:
+
+```yaml
+nimbus.host: "111.222.333.44"
+```
+
+4) **supervisor.slots.ports**: For each worker machine, you configure how many workers run on that machine with this config. Each worker uses a single port for receiving messages, and this setting defines which ports are open for use. If you define five ports here, then Storm will allocate up to five workers to run on this machine. If you define three ports, Storm will only run up to three. By default, this setting is configured to run 4 workers on the ports 6700, 6701, 6702, and 6703. For example:
+
+```yaml
+supervisor.slots.ports:
+    - 6700
+    - 6701
+    - 6702
+    - 6703
+```
+
+### Configure external libraries and environmental variables (optional)
+
+If you need support from external libraries or custom plugins, you can place such jars into the `extlib/` and `extlib-daemon/` directories. Note that the `extlib-daemon/` directory stores jars used only by daemons (Nimbus, Supervisor, DRPC, UI, Logviewer), e.g., HDFS and customized scheduling libraries. Accordingly, users can set the two environment variables `STORM_EXT_CLASSPATH` and `STORM_EXT_CLASSPATH_DAEMON` to include the external classpath and daemon-only external classpath.
+
+
+### Launch daemons under supervision using "storm" script and a supervisor of your choice
+
+The last step is to launch all the Storm daemons. It is critical that you run each of these daemons under supervision. Storm is a __fail-fast__ system which means the processes will halt whenever an unexpected error is encountered. Storm is designed so that it can safely halt at any point and recover correctly when the process is restarted. This is why Storm keeps no state in-process -- if Nimbus or the Supervisors restart, the running topologies are unaffected. Here's how to run the Storm daemons:
+
+1. **Nimbus**: Run the command "bin/storm nimbus" under supervision on the master machine.
+2. **Supervisor**: Run the command "bin/storm supervisor" under supervision on each worker machine. The supervisor daemon is responsible for starting and stopping worker processes on that machine.
+3. **UI**: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command "bin/storm ui" under supervision. The UI can be accessed by navigating your web browser to http://{nimbus host}:8080. 
+
+As you can see, running the daemons is very straightforward. The daemons will log to the `logs/` directory under wherever you extracted the Storm release.

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-project-in-Eclipse.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-project-in-Eclipse.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-project-in-Eclipse.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-a-Storm-project-in-Eclipse.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1 @@
+- fill me in
\ No newline at end of file

Added: storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md
URL: http://svn.apache.org/viewvc/storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md?rev=1735297&view=auto
==============================================================================
--- storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md (added)
+++ storm/branches/bobby-versioned-site/releases/0.10.0/Setting-up-development-environment.md Wed Mar 16 21:01:12 2016
@@ -0,0 +1,41 @@
+---
+title: Setting Up a Development Environment
+layout: documentation
+documentation: true
+---
+This page outlines what you need to do to get a Storm development environment set up. In summary, the steps are:
+
+1. Download a [Storm release](../downloads.html), unpack it, and put the unpacked `bin/` directory on your PATH
+2. To be able to start and stop topologies on a remote cluster, put the cluster information in `~/.storm/storm.yaml`
+
+More detail on each of these steps is below.
+
+### What is a development environment?
+
+Storm has two modes of operation: local mode and remote mode. In local mode, you can develop and test topologies completely in process on your local machine. In remote mode, you submit topologies for execution on a cluster of machines.
+
+A Storm development environment has everything installed so that you can develop and test Storm topologies in local mode, package topologies for execution on a remote cluster, and submit/kill topologies on a remote cluster.
+
+Let's quickly go over the relationship between your machine and a remote cluster. A Storm cluster is managed by a master node called "Nimbus". Your machine communicates with Nimbus to submit code (packaged as a jar) and topologies for execution on the cluster, and Nimbus will take care of distributing that code around the cluster and assigning workers to run your topology. Your machine uses a command line client called `storm` to communicate with Nimbus. The `storm` client is only used for remote mode; it is not used for developing and testing topologies in local mode.
+
+### Installing a Storm release locally
+
+If you want to be able to submit topologies to a remote cluster from your machine, you should install a Storm release locally. Installing a Storm release will give you the `storm` client that you can use to interact with remote clusters. To install Storm locally, download a release [from here](https://github.com/apache/storm/releases) and unzip it somewhere on your computer. Then add the unpacked `bin/` directory onto your `PATH` and make sure the `bin/storm` script is executable.
+
+Installing a Storm release locally is only for interacting with remote clusters. For developing and testing topologies in local mode, it is recommended that you use Maven to include Storm as a dev dependency for your project. You can read more about using Maven for this purpose on [Maven](Maven.html). 
+
+### Starting and stopping topologies on a remote cluster
+
+The previous step installed the `storm` client on your machine, which is used to communicate with remote Storm clusters. Now all you have to do is tell the client which Storm cluster to talk to: put the host address of the master in the `~/.storm/storm.yaml` file. It should look something like this:
+
+```yaml
+nimbus.host: "123.45.678.890"
+```
+
+Alternatively, if you use the [storm-deploy](https://github.com/nathanmarz/storm-deploy) project to provision Storm clusters on AWS, it will automatically set up your `~/.storm/storm.yaml` file. You can manually attach to a Storm cluster (or switch between multiple clusters) using the "attach" command, like so:
+
+```
+lein run :deploy --attach --name mystormcluster
+```
+
+More information is on the storm-deploy [wiki](https://github.com/nathanmarz/storm-deploy/wiki).
\ No newline at end of file