Posted to commits@samza.apache.org by ja...@apache.org on 2018/10/15 23:46:36 UTC

samza git commit: Fix case-studies for LinkedIn, Optimizely, Tripadvisor, Slack. Re-word some of them.

Repository: samza
Updated Branches:
  refs/heads/master 988260a20 -> 7d3eb08b3


Fix case-studies for LinkedIn, Optimizely, Tripadvisor, Slack. Re-word some of them.

Author: Jagadish <jv...@linkedin.com>

Reviewers: Jagadish <ja...@apache.org>

Closes #729 from vjagadish1989/website-reorg18


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/7d3eb08b
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/7d3eb08b
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/7d3eb08b

Branch: refs/heads/master
Commit: 7d3eb08b3e0ed025e15a7f70c50ff84e803e39d7
Parents: 988260a
Author: Jagadish <jv...@linkedin.com>
Authored: Mon Oct 15 16:46:33 2018 -0700
Committer: Jagadish <jv...@linkedin.com>
Committed: Mon Oct 15 16:46:33 2018 -0700

----------------------------------------------------------------------
 docs/_case-studies/ebay.md                      |  2 +-
 docs/_case-studies/linkedin.md                  | 23 +++++----
 docs/_case-studies/optimizely.md                | 53 ++++++++++----------
 docs/_case-studies/redfin.md                    | 40 +++++++--------
 docs/_case-studies/slack.md                     | 23 ++++-----
 docs/_case-studies/tripadvisor.md               | 29 +++++------
 docs/_powered-by/linkedin.md                    |  4 ++
 .../versioned/core-concepts/core-concepts.md    |  4 +-
 8 files changed, 90 insertions(+), 88 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/_case-studies/ebay.md
----------------------------------------------------------------------
diff --git a/docs/_case-studies/ebay.md b/docs/_case-studies/ebay.md
index 96821f0..7156ce5 100644
--- a/docs/_case-studies/ebay.md
+++ b/docs/_case-studies/ebay.md
@@ -65,5 +65,5 @@ Key Samza features: *Stateful processing*, *Windowing*, *Kafka-integration*, *JM
 
 More information:
 
--   [https://www.slideshare.net/edibice/extremely-low-latency-web-scale-fraud-prevention-with-apache-samza-kafka-and-friends](https://www.slideshare.net/edibice/extremely-low-latency-web-scale-fraud-prevention-with-apache-samza-kafka-and-friends)
+-   [Slides: Low latency Fraud prevention with Apache Samza](https://www.slideshare.net/edibice/extremely-low-latency-web-scale-fraud-prevention-with-apache-samza-kafka-and-friends)
 -   [http://ebayenterprise.com/](http://ebayenterprise.com/)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/_case-studies/linkedin.md
----------------------------------------------------------------------
diff --git a/docs/_case-studies/linkedin.md b/docs/_case-studies/linkedin.md
index df62764..66e4108 100644
--- a/docs/_case-studies/linkedin.md
+++ b/docs/_case-studies/linkedin.md
@@ -25,25 +25,28 @@ excerpt_separator: <!--more-->
 -->
 
 How LinkedIn built Air Traffic Controller, a stateful stream processing system to optimize email communications?
+
 <!--more-->
 
-LinkedIn is a professional networking company that offers various services and platform for job seekers, employers and sales professionals. With a growing user base and multiple product offerings, it becomes imperative to streamline and standardize our communications to the users. In order to ensure member experience comes first before individual product metrics, LinkedIn developed a new email and notifications platform called *Air Traffic Controller*.
+LinkedIn is a professional networking company that offers various services and platforms for job seekers, employers and sales professionals. With a growing user base and multiple product offerings, it becomes imperative to streamline communications to members. To ensure member experience comes first, LinkedIn developed a new email and notifications platform called *Air Traffic Controller (ATC)*.
 
-ATC is an intelligent platform, that is capable of tracking all the outgoing communications to the user and delivering the communication through the right channel to the right member at the right time.
+ATC is designed to be an intelligent platform that tracks all outgoing communications and delivers the communication through the right channel to the right member at the right time.
 
 <img src="/img/{{site.version}}/case-studies/linkedin-atc-samza-pipeline.png" alt="architecture" style="max-width: 80%; height: auto;" onclick="window.open(this.src)"/>
 
-It has a three main components,
+Any service that wants to send out a notification to members writes its request to a Kafka topic, which ATC later reads from. The ATC platform comprises three components: <br/>
+
_Partitioners_ read incoming communication requests from Kafka and distribute them across _Pipeline_ instances based on the hash of the recipient. They also do some
filtering early on to drop malformed messages. <br/><br/>
The _Relevance processors_ read personalized machine-learning models from Kafka and store them in Samza's state store for later evaluation. They use these models to score incoming requests and determine the right channel for the notification (e.g., drop it, send an email, or send a push notification). <br/><br/>
The _ATC pipeline_ processors aggregate the output from the _Relevance processors_ and the _Partitioners_, thereby making the final call on the notification. They heavily leverage Samza's local state to batch and aggregate notifications, and decide the frequency of notifications - duplicate notifications are merged and notifications are capped at a certain threshold. The _Pipeline_ also implements a _scheduler_ on top of Samza's local store so that it can schedule messages for later delivery; for example, it may not be helpful to send a push notification at midnight. <br/><br/>
 
-- **Partitioner**: Partition communication requests, metrics based on user
-- **Pipeline**: Handle partitioned communication requests which performs aggregation and consults with the relevance model to determine delivery time
-- **Relevance processor**: Provide insights on how relevant is the content to the user, the right delivery time, etc.
 
-ATC, leverages Samza extensively and uses a lot of features including but not limited to:
+ATC uses several of Samza's features:
 
-- **Stateful processing**: The ML models in the relevance module are stored locally in RocksDb which are updated realtime time based on user feedback.
-- **Async APIs and Multi-threading**: Samza’s multi-threading and Async APIs allows ATC to perform remote calls with high-throughput. This helps bring down the 90th percentile (P90) end-to-end latency for end to end latency for push notifications from about 12 seconds to about 1.5 seconds.
-- **Host affinity**: Co-location of local state stores along with host awareness helps ATC to achieve zero downtime and instant recovery.
**1. Stateful processing**: The ML models in the relevance module are stored locally in RocksDB and are updated in real-time based on user feedback. <br/><br/>
**2. Async APIs and Multi-threading**: Samza’s multi-threading and Async APIs allow ATC to perform remote calls with high throughput. This helps bring down the 90th percentile end-to-end latency for push notifications. <br/><br/>
**3. Host affinity**: Samza's incremental checkpointing and host-affinity enable ATC to achieve zero downtime during upgrades and instant recovery during failures. <br/><br/>
 
 Key Samza Features: *Stateful processing*, *Async API*, *Host affinity*
 

http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/_case-studies/optimizely.md
----------------------------------------------------------------------
diff --git a/docs/_case-studies/optimizely.md b/docs/_case-studies/optimizely.md
index 5df32c4..c93c4b2 100644
--- a/docs/_case-studies/optimizely.md
+++ b/docs/_case-studies/optimizely.md
@@ -5,7 +5,7 @@ title: Real Time Session Aggregation
 study_domain: optimizely.com
 priority: 2
 menu_title: Optimizely
-exclude_from_loop: true
+exclude_from_loop: false
 excerpt_separator: <!--more-->
 ---
 <!--
@@ -29,29 +29,31 @@ Real Time Session Aggregation
 
 <!--more-->
 
-Optimizely is a world’s leading experimentation platform, enabling businesses to 
+Optimizely is the world’s leading experimentation platform, enabling businesses to 
 deliver continuous experimentation and personalization across websites, mobile 
 apps and connected devices. At Optimizely, billions of events are tracked on a 
-daily basis. Session metrics are among the key metrics provided to their end user 
-in real time. Prior to introducing Samza for their realtime computation, the 
+daily basis, and session metrics are provided to their users in real-time. 
+
+Prior to introducing Samza for their realtime computation, the 
 engineering team at Optimizely built their data-pipeline using a complex 
-[Lambda architecture] (http://lambda-architecture.net/) leveraging 
-[Druid and Hbase] (https://medium.com/engineers-optimizely/building-a-scalable-data-pipeline-bfe3f531eb38). 
-As business requirements evolve, this solution became more and more challenging.
+[Lambda architecture](http://lambda-architecture.net/) using 
+[Druid and Hbase](https://medium.com/engineers-optimizely/building-a-scalable-data-pipeline-bfe3f531eb38). 
+Since some session metrics were computed using Map-Reduce jobs, they 
+could be delayed by up to several hours after the events were received. As business requirements evolved, 
+this solution became more and [more challenging](https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-1-aed2051dd7a3) to scale. 
+
 
-The engineering team at Optimizely decided to move away from Druid and focus on 
-HBase as the store, and introduced stream processing to pre-aggregate and 
-deduplicate session events. In their solution, every session event is tagged 
-with an identifier for up to 30 minutes; upon receiving a session event, the 
-Samza job updates session metadata and aggregates counters for the session 
-that is stored in a local RocksDB state store. At the end of each one-minute 
-window, aggregated session metrics are ingested to HBase. With the new solution
+The engineering team at Optimizely turned to stream processing to reduce latencies. 
+In their solution, each upstream client associates a _sessionId_ with the events it generates. Upon receiving each event, the Samza job extracts various
+fields (e.g., IP address, location, browser version) and updates aggregated metrics
+for the session. At the end of a time-window, the merged metrics for that session are ingested to HBase. 
 
--   The median query latency was reduced from 40+ ms to 5 ms
--   Session metrics are now available in realtime
--   HBase query response time is improved due to reduced write-rate
--   HBase storage requirement are drastically reduced
--   Lower development effort thanks to out-of-the-box Kafka integration
+With the new solution: <br/>
+-   The median query latency was reduced from 40+ ms to 5 ms <br/>
+-   Session metrics are now available in real-time <br/>
+-   Write-rate to HBase is reduced, since the metrics are pre-aggregated by Samza<br/>
+-   Storage requirements on HBase are drastically reduced <br/>
+-   Lower development effort thanks to out-of-the-box Kafka integration <br/>
  
 Here is a testimonial from Optimizely
 
@@ -61,17 +63,16 @@ for analysis. Apache Samza has been a great asset to Optimizely's Event
 ingestion pipeline allowing us to perform large scale, real time stream 
 computing such as aggregations (e.g. session computations) and data enrichment 
 on a multiple billion events / day scale. The programming model, durability 
-and the close integration with Apache Kafka fit our needs perfectly” said 
-Vignesh Sukumar, Senior Engineering Manager at Optimizely”
+and the close integration with Apache Kafka fit our needs perfectly” says 
+Vignesh Sukumar, Senior Engineering Manager at Optimizely.
 
-In addition, stream processing is also applied to other use cases such as 
-data enrichment, event stream partitioning and metrics processing at Optimizely.
+In addition to this case-study, Apache Samza is also leveraged for other use cases such as 
+data enrichment, re-partitioning of event streams and computing real-time metrics.
 
 Key Samza features: *Stateful processing*, *Windowing*, *Kafka-integration*
 
 More information
 
--   [https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-1-aed2051dd7a3](https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-1-aed2051dd7a3)
-c9715fbc85f973907807cccc26c9d7d3ed983df
--   [https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-2-b596350a7820](https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-2-b596350a7820)
+-   [From batching to streaming at Optimizely - Part 1](https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-1-aed2051dd7a3)
+-   [From batching to streaming at Optimizely - Part 2](https://medium.com/engineers-optimizely/from-batching-to-streaming-real-time-session-metrics-using-samza-part-2-b596350a7820)
     
\ No newline at end of file
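
The per-session aggregation described above can be sketched as follows. The event schema (`session_id`, `kind`) is illustrative, not Optimizely's actual one; in the real job the counters live in a RocksDB state store and are flushed to HBase at the end of each window:

```python
from collections import defaultdict

def aggregate_sessions(events):
    """Merge per-session counters within one window, producing one
    pre-aggregated row per session ready for ingestion to HBase."""
    sessions = defaultdict(lambda: defaultdict(int))
    for ev in events:
        sessions[ev["session_id"]][ev["kind"]] += 1
    return {sid: dict(counts) for sid, counts in sessions.items()}

# One window's worth of raw events.
window = [
    {"session_id": "s1", "kind": "pageview"},
    {"session_id": "s1", "kind": "click"},
    {"session_id": "s1", "kind": "pageview"},
    {"session_id": "s2", "kind": "pageview"},
]
rows = aggregate_sessions(window)
```

This is why the write-rate to HBase drops: four raw events collapse into two rows, and the reduction grows with session length.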

http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/_case-studies/redfin.md
----------------------------------------------------------------------
diff --git a/docs/_case-studies/redfin.md b/docs/_case-studies/redfin.md
index 4341f18..d03be86 100644
--- a/docs/_case-studies/redfin.md
+++ b/docs/_case-studies/redfin.md
@@ -35,34 +35,34 @@ emails, scheduled digests and push notifications. Thousands of emails are delive
 to customers every minute at peak. 
 
 The notification system used to be a monolithic system, which served the company 
-well. However, as business grew and requirements evolved, it became harder and 
+well. However, as the business grew and requirements evolved, it became harder and 
 harder to maintain and scale. 
 
 ![Samza pipeline at Redfin](/img/case-studies/redfin.svg)
 
-The engineering team at Redfin decided to replace 
-the existing system with Samza primarily for Samza’s performance, scalability, 
-support for stateful processing and Kafka-integration. A multi-stage stream 
-processing pipeline was developed. At the Identify stage, external events 
-such as new Listings are identified as candidates for new notification;
-then potential recipients of notifications are determined by analyzing data in 
-events and customer profiles, results are grouped by customer at the end of 
-each time window at the Match Stage; once recipients and notification outlines are 
-identified, the Organize stage retrieves adjunct data necessary to appear in each 
-notification from various data sources by joining them with notification and 
-customer profiles, results are stored/merged in local RocksDB state store; finally 
-notifications are formatted at the Format stage and sent to notification
- delivery system at the Notify stage. 
+The engineering team at Redfin decided to replace the existing system with Samza 
+primarily for Samza’s performance, scalability, support for stateful processing and 
+Kafka-integration. A multi-stage stream 
+processing pipeline was developed. At the _Identify_ stage, external events 
+such as new listings are identified as candidates for sending a new notification;
+then potential recipients of notifications are determined by analyzing data in 
+the events and customer profiles. The results are grouped by customer at the end of 
+each time window during the _Match_ stage. Once notifications and recipients are 
+identified, the _Organize_ stage further joins them with additional data-sources (e.g., 
+notification settings, customer profiles) leveraging Samza's support for local state. 
+It makes heavy use of RocksDB to store and merge individual notifications before sending
+them to customers. Finally, the notifications are formatted at the _Format_ stage and 
+sent to the delivery system at the _Notify_ stage.
 
-With the new notification system
+With the new notification system based on Apache Samza, Redfin observed that
 
--   The system is more performant and horizontally scalable
 -   It is now easier to add support for new use cases
--   Reduced pressure on other system due to the use of local RocksDB state store
--   Processing stages can be scaled individually
+-   The new system is more performant and horizontally scalable
+-   Reduced pressure on downstream services due to the use of local RocksDB state store
+-   Processing stages can be scaled individually since they are isolated
 
-Other engineering teams at Redfin are also using Samza for business metrics 
-calculation, document processing, event scheduling.
+In addition to the notifications platform, other engineering teams at Redfin also use Samza for 
+calculating business metrics, document processing and event scheduling.
 
 Key Samza Features: *Stateful processing*, *Windowing*, *Kafka-integration*
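
The _Match_ and _Organize_ stages described above can be sketched as two small functions. All names and the `settings` lookup are illustrative; in Redfin's pipeline the joined data would sit in RocksDB local state rather than an in-memory dict:

```python
def match_stage(candidates):
    """Group candidate notifications by customer, as the _Match_ stage
    does at the end of each time window."""
    by_customer = {}
    for customer, listing in candidates:
        by_customer.setdefault(customer, []).append(listing)
    return by_customer

def organize_stage(by_customer, settings):
    """Join grouped notifications with per-customer settings, the kind
    of adjunct data the _Organize_ stage pulls from local state."""
    return {
        customer: {"listings": listings,
                   "channel": settings.get(customer, "email")}
        for customer, listings in by_customer.items()
    }

grouped = match_stage([("ann", "L1"), ("bob", "L2"), ("ann", "L3")])
organized = organize_stage(grouped, {"ann": "push"})
```

Keeping each stage a separate job is what lets them be scaled individually, as the case study notes.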
 

http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/_case-studies/slack.md
----------------------------------------------------------------------
diff --git a/docs/_case-studies/slack.md b/docs/_case-studies/slack.md
index 6bfc023..bc5b5fd 100644
--- a/docs/_case-studies/slack.md
+++ b/docs/_case-studies/slack.md
@@ -28,28 +28,25 @@ How Slack monitors their infrastructure using Samza's streaming data-pipelines?
 
 <!--more-->
 
-Slack is a cloud based company that offers collaboration tools and services to increase productivity. With a rapidly growing user base, and a daily active users north of 8 million, there is an imminent need to react quickly to issues and proactively monitor the health of the application. With a lack of existing monitoring solution, the team went on to build a new data pipeline with the following requirements
+Slack is a cloud-based company that offers collaboration tools and services to increase productivity. With a rapidly growing user base and daily active users north of 8 million, they needed to react quickly to issues and proactively monitor the application health. For this, the team went on to build a new monitoring solution using Apache Samza with the following requirements:
 
-- Near realtime alerting
-- Fault tolerant and high throughput data pipeline
-- Process billions of metric data, logs and derive timely insights on the health of application
-- Extend the pipeline to other use cases such as experimentation, performance etc.
+- Near real-time alerting to quickly surface issues
+- Fault-tolerant processing of data streams
+- Process billions of events from metrics, logs and derive timely insights on application health
+- Ease of extensibility to other use cases like experimentation
 
 <img src="/img/{{site.version}}/case-studies/slack-samza-pipeline.png" alt="architecture" style="max-width: 80%; height: auto;" onclick="window.open(this.src)"/>
 
-The engineering team built a data platform using Apache Samza. It has three main components,
+The engineering team at Slack built their data platform using Apache Samza. It has three types of Samza jobs - _Routers_, _Processors_ and _Converters_.
 
-- **Router**: Deserialize Kafka events and add instrumentation
-- **Processor**: Registers with the routers to process subset of message types and performs aggregation
-- **Converter**: Enrich the processed data before piping the data to analytics store.  
+All services at Slack emit their logs in a well-defined format, which end up in a Kafka cluster. The logs are processed by a fleet of Samza jobs called _Routers_. The routers deserialize
+incoming log events, decorate them and add instrumentation on top of them. The output of the routers is processed by another pipeline, the _Processors_, which perform aggregations using Samza's state-store. Finally, the processed results are enriched by the last stage, the _Converters_, which pipe the data into Druid for analytics and querying. Performance anomalies trigger an alert to a slackbot for further action. Slack built the data-platform to be extensible, thereby enabling other teams within the company to build their own applications on top of it.
 
-The clients and backend servers channels the logs and exceptions through Kafka to content routers a.k.a samza partitioners. The partitioned data then flows through processors where it is stored in RocksDb before being joined with other metrics data. The enriched data is stored in druid which powers analytics queries and also acts as a trigger to alert slackbot notifications.
-
-Other notable use case includes experimentation framework that leverage the data pipeline to track the results of A/B testing in near realtime. The metrics data is joined with the exposure table (members part of the experiment) to derive insights on the experiment. The periodic snapshots of RocksDb is also used to perform data quality check with the batch pipeline.
+Another noteworthy use case powered by Samza is their experimentation framework. It leverages a data pipeline to measure the results of A/B testing in near real-time. The pipeline uses Samza to join a stream of performance-related metrics with additional data on experiments that the customer was a part of. This enables Slack to learn how each experiment affects their overall customer experience. 
 
 Key Samza Features: *Stateful processing*, *Join*, *Windowing*
 
 More information
 
 - [Talk: Streaming data pipelines at Slack](https://www.youtube.com/watch?v=wbS1P9ehgd0)
-- [Slides: Streaming data pipelines at Slack](https://speakerdeck.com/vananth22/streaming-data-pipelines-at-slack)
+- [Slides: Streaming data pipelines at Slack](https://speakerdeck.com/vananth22/streaming-data-pipelines-at-slack)
\ No newline at end of file
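
The experimentation join described above can be sketched as enriching each metric with the experiments its user was exposed to. Field names (`user`, `latency_ms`) are illustrative; in the real pipeline the exposure side of the join would be held in a Samza state store:

```python
def join_with_exposures(metrics, exposures):
    """Stream-table join: emit one enriched record per
    (metric, experiment) pair for users in an experiment."""
    joined = []
    for metric in metrics:
        for experiment in exposures.get(metric["user"], []):
            joined.append({**metric, "experiment": experiment})
    return joined

metrics = [
    {"user": "u1", "latency_ms": 120},
    {"user": "u2", "latency_ms": 80},   # not in any experiment
]
exposures = {"u1": ["new-sidebar"]}
enriched = join_with_exposures(metrics, exposures)
```

Aggregating the enriched stream per experiment then yields near real-time A/B results.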

http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/_case-studies/tripadvisor.md
----------------------------------------------------------------------
diff --git a/docs/_case-studies/tripadvisor.md b/docs/_case-studies/tripadvisor.md
index 80ee33b..b3887c6 100644
--- a/docs/_case-studies/tripadvisor.md
+++ b/docs/_case-studies/tripadvisor.md
@@ -27,21 +27,20 @@ Hedwig - Converting Hadoop M/R ETL systems to Stream Processing at TripAdvisor
 
 <!--more-->
 
-TripAdvisor is one of the world’s largest travel website that provides hotel 
+TripAdvisor is one of the world’s largest travel websites that provides hotel 
 and restaurant reviews, accommodation bookings and other travel-related 
-content. It produces and processes billions events processed everyday 
+content. It produces and processes billions of events every day 
 including billing records, reports, monitoring events and application 
 notifications.
 
-Prior to migrating to Samza, TripAdvisor used Hadoop to ETL its data. Raw 
-data was rolled up to hourly and daily in a number of stages with joins 
-and sliding windows applied, session data is then produced from daily data. 
-About 300 million sessions are produced daily. With this solution, the 
+Prior to migrating to Samza, TripAdvisor used Hadoop to ETL its data. In this model, raw 
+data was rolled up to hourly and daily snapshots in a number of stages with joins 
+and sliding windows applied. Session data was then extracted from the daily snapshots. 
+About 300 million sessions were produced daily. With this solution, the 
 engineering team were faced with a few challenges
   
--   Long lag time to downstream that is business critical
+-   Long lag time to produce business-critical metrics
 -   Difficult to debug and troubleshoot due to scripts, environments, etc.
--   Adding more nodes doesn’t help to scale
  
 The engineering team at TripAdvisor decided to replace the Hadoop solution 
 with a multi-stage Samza pipeline. 
@@ -49,11 +48,11 @@ with a multi-stage Samza pipeline.
 ![Samza pipeline at TripAdvisor](/img/case-studies/trip-advisor.svg)
 
 In the new solution, after raw data is first collected by Flume and ingested 
-through a Kafka cluster, they are parsed, cleansed and partitioned by the
-Lookback Router; then processing logic such as windowing, grouping, joining, 
-fraud detection are applied by the Session Collector and the Fraud Collector, 
-RocksDB is used as the local store for intermediate states; finally the Uploader 
-uploads results to HDFS, ElasticSearch, RedShift and Hive. 
+through a Kafka cluster, it is parsed, cleansed and re-partitioned by the
+_Lookback Router_; then processing logic such as windowing, grouping, joining and 
+fraud detection is applied by the _Session Collector_ and the _Fraud Collector_. 
+The pipeline uses Samza's RocksDB store to perform stateful aggregations; finally the 
+_Uploader_ writes results to ElasticSearch, RedShift and Hive.
 
 The new solution achieved significant improvements:
 
@@ -66,6 +65,4 @@ Key Samza features: *Stateful processing*, *Windowing*, *Kafka-integration*
 
 More information
 
--   [https://www.youtube.com/watch?v=KQ5OnL2hMBY](https://www.youtube.com/watch?v=KQ5OnL2hMBY)
--   [https://www.tripadvisor.com/](https://www.tripadvisor.com/)
-    
\ No newline at end of file
+-   [Converting Hadoop M/R ETL to use Stream Processing at TripAdvisor](https://www.youtube.com/watch?v=KQ5OnL2hMBY)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/_powered-by/linkedin.md
----------------------------------------------------------------------
diff --git a/docs/_powered-by/linkedin.md b/docs/_powered-by/linkedin.md
index 8b5f6ca..0bd53d6 100644
--- a/docs/_powered-by/linkedin.md
+++ b/docs/_powered-by/linkedin.md
@@ -1,7 +1,11 @@
 ---
 name: LinkedIn
 domain: linkedin.com
priority: 0
 ---
 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more

http://git-wip-us.apache.org/repos/asf/samza/blob/7d3eb08b/docs/learn/documentation/versioned/core-concepts/core-concepts.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/core-concepts/core-concepts.md b/docs/learn/documentation/versioned/core-concepts/core-concepts.md
index c4e5c21..c1724fb 100644
--- a/docs/learn/documentation/versioned/core-concepts/core-concepts.md
+++ b/docs/learn/documentation/versioned/core-concepts/core-concepts.md
@@ -1,4 +1,4 @@
---
 layout: page
 title: Core concepts
 ---
@@ -43,7 +43,7 @@ _**Unified API:**_ Use a simple API to describe your application-logic in a mann
 
 *Massive scale:* Battle-tested on applications that use several terabytes of state and run on thousands of cores. It [powers](/powered-by/) multiple large companies including LinkedIn, Uber, TripAdvisor, Slack etc. 
 
-Next, we will introduce Samza’s terminology. You will realize that it is extremely easy to get started with [building](/quickstart/{{site.version}}) your first stream-processing application. 
+Next, we will introduce Samza’s terminology. You will realize that it is extremely easy to [get started](/quickstart/{{site.version}}) with building your first application. 
 
 
 ## Streams, Partitions