Posted to reviews@spark.apache.org by joyyoj <gi...@git.apache.org> on 2014/08/03 18:26:29 UTC

[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

GitHub user joyyoj opened a pull request:

    https://github.com/apache/spark/pull/1755

    [Spark-2201] Improve FlumeInputDStream's stability and make it scalable

    re-submit

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joyyoj/spark SPARK-2201

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1755.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1755
    
----
commit f4660c5cb41d9b5ef737b38e7e38abf3b2f2e31c
Author: joyyoj <su...@gmail.com>
Date:   2014-06-03T13:15:11Z

    [SPARK-1998] SparkFlumeEvent with body bigger than 1020 bytes are not read properly

commit 3f4a6022b3e54a69a8b67729b2ce3dbbf0d7c85b
Author: joyyoj <su...@gmail.com>
Date:   2014-07-06T14:27:47Z

    Merge remote-tracking branch 'remotes/apache/master'

commit 22e7821e439ec3304c784217cbde94ae6db1b75e
Author: joyyoj <su...@gmail.com>
Date:   2014-07-31T15:08:51Z

    Merge remote-tracking branch 'apache/master'

commit 6bb8372b6637d61691a9254ae367a27e41b98e2c
Author: joyyoj <su...@gmail.com>
Date:   2014-07-31T15:26:15Z

    SparkPushSink

commit 1de7f6eb27a48246d990d581df41b053f5d13d7a
Author: joyyoj <su...@gmail.com>
Date:   2014-08-03T15:30:54Z

    update flume-sink

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1755#discussion_r15781582
  
    --- Diff: external/flume-sink/src/main/java/org/apache/spark/streaming/flume/sink/SparkRpcClient.java ---
    @@ -0,0 +1,354 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.flume.sink;
    +
    +import com.google.common.base.Preconditions;
    +import com.google.common.base.Strings;
    +import org.apache.flume.Event;
    +import org.apache.flume.EventDeliveryException;
    +import org.apache.flume.FlumeException;
    +import org.apache.flume.api.*;
    +import org.apache.spark.streaming.flume.sink.utils.LogicalHostRouter;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Properties;
    +import java.util.concurrent.CopyOnWriteArrayList;
    +import java.util.concurrent.CountDownLatch;
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +/*
    + * configuration example:
    + * agent.sinks.ls1.hostname = benchmark
    + * agent.sinks.ls1.router.path=192.168.59.128:2181/spark  [zookeeper path to logical host]
    + * agent.sinks.ls1.port = 0
    + * agent.sinks.ls1.router.retry.times=1  [optional]
    + * agent.sinks.ls1.router.retry.interval=1000 [optional]
    + */
    +public class SparkRpcClient extends AbstractRpcClient implements RpcClient {
    --- End diff --
    
    AbstractRpcClient already implements RpcClient, so you don't need to declare "implements RpcClient" again here.
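
    To illustrate the point above, here is a minimal sketch using stand-in types. Client, AbstractClient and SinkClient are hypothetical names chosen for this example; they are not the Flume classes. It only shows that a subclass inherits the interface through its abstract base class, so repeating the "implements" clause adds nothing.

```java
// Stand-in types for illustration only; these are NOT the Flume classes.
interface Client {
    boolean isActive();
}

// The abstract base class declares the interface once.
abstract class AbstractClient implements Client {
    // Subclasses inherit the Client contract through this class.
}

// No "implements Client" here: it is already inherited from AbstractClient.
class SinkClient extends AbstractClient {
    @Override
    public boolean isActive() {
        return true;
    }
}

public class RedundantImplementsDemo {
    public static void main(String[] args) {
        Object sink = new SinkClient();
        // The subclass is still usable wherever a Client is expected.
        System.out.println(sink instanceof Client); // prints "true"
        Client client = new SinkClient();
        System.out.println(client.isActive());      // prints "true"
    }
}
```

    Writing "extends AbstractClient implements Client" would also compile, but the extra clause is redundant.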




[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by joyyoj <gi...@git.apache.org>.
Github user joyyoj closed the pull request at:

    https://github.com/apache/spark/pull/1755




[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1755#issuecomment-60014231
  
    Hey @joyyoj 
    
    This is very cool functionality! In fact, we added a subset of it (the sink) as part of Spark 1.1, which validates your approach. However, while this approach provides more functionality, it does so at the cost of maintaining and debugging it, and of adding dependencies, such as ZooKeeper, to the Spark project.
    
    We are seeing a lot of such big contributions from the community, which is very encouraging and useful, but they will be very hard for us to maintain. So we are trying to figure out a way for the community to contribute such custom functionality and maintain it themselves. Say, something like a contribs repo...
    
    I will let you know when we have figured something out. Till then please bear with us. :)




[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1755#issuecomment-68076472
  
    @joyyoj 
    We have set up http://spark-packages.org as a way for the community to contribute significant functionality and maintain it themselves. Please consider adding this Flume functionality to http://spark-packages.org/ and let the community use it.
    
    Regarding this PR, mind closing it?




[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1755#issuecomment-54694520
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1755#issuecomment-50994969
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/1755#issuecomment-68406804
  
    @joyyoj Mind closing this?




[GitHub] spark pull request: [Spark-2201] Improve FlumeInputDStream's stabi...

Posted by joyyoj <gi...@git.apache.org>.
Github user joyyoj commented on the pull request:

    https://github.com/apache/spark/pull/1755#issuecomment-68418459
  
    I don't mind. Very sorry for replying so late.

