You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@storm.apache.org by harshach <gi...@git.apache.org> on 2015/11/10 02:56:57 UTC

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

GitHub user harshach opened a pull request:

    https://github.com/apache/storm/pull/871

    STORM-1030. Hive Connector Fixes.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harshach/incubator-storm STORM-1030-V1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #871
    
----
commit f0760e86fe53be6ef4bf5d7eb4ed72c590ec4a82
Author: Sriharsha Chintalapani <ma...@harsha.io>
Date:   2015-11-10T00:40:34Z

    STORM-1030. Hive Connector Fixes.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.

Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-162580970
  
    Still +1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r49201838
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/common/HiveConnector.java ---
    @@ -0,0 +1,241 @@
    +package org.apache.storm.hive.common;
    +
    +
    +import org.apache.hadoop.security.UserGroupInformation;
    +import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    +import org.apache.hive.hcatalog.streaming.StreamingException;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.io.IOException;
    +import java.util.*;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Executors;
    +import com.google.common.util.concurrent.ThreadFactoryBuilder;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.ConcurrentHashMap;
    +import java.util.concurrent.atomic.AtomicBoolean;
    +
    +public class HiveConnector {
    +    private static final Logger LOG = LoggerFactory.getLogger(HiveConnector.class);
    +    private HiveOptions options;
    +    private transient Timer heartBeatTimer;
    +    private AtomicBoolean sendHeartBeat = new AtomicBoolean(true);
    +    private UserGroupInformation ugi = null;
    +    private Map<HiveEndPoint, HiveWriter> allWriters;
    +    private ExecutorService callTimeoutPool;
    +
    +    public HiveConnector(HiveOptions options) {
    +        this.options = options;
    +
    +    }
    +
    +    public void configure() {
    +        boolean kerberosEnabled;
    +
    +        if(this.options.getKerberosPrincipal() == null && this.options.getKerberosKeytab() == null) {
    +            kerberosEnabled = false;
    +        } else if(options.getKerberosPrincipal() != null && options.getKerberosKeytab() != null) {
    +            kerberosEnabled = true;
    +        } else {
    +            throw new IllegalArgumentException("To enable Kerberos, need to set both KerberosPrincipal " +
    +                                               " & KerberosKeytab");
    +        }
    +
    +        if (kerberosEnabled) {
    +            try {
    +                ugi = HiveUtils.authenticate(options.getKerberosKeytab(), options.getKerberosPrincipal());
    +            } catch(HiveUtils.AuthenticationFailed ex) {
    +                LOG.error("Hive Kerberos authentication failed " + ex.getMessage(), ex);
    +                throw new IllegalArgumentException(ex);
    +            }
    +        }
    +
    +        allWriters = new ConcurrentHashMap<HiveEndPoint, HiveWriter>();
    +        String timeoutName = "hive-bolt-%d";
    +        this.callTimeoutPool = Executors.newFixedThreadPool(1,
    +                   new ThreadFactoryBuilder().setNameFormat(timeoutName).build());
    +        heartBeatTimer = new Timer();
    +        setupHeartBeatTimer();
    +    }
    +
    +
    --- End diff --
    
    Extra sapce


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r49201617
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -25,75 +26,40 @@
     import backtype.storm.topology.OutputFieldsDeclarer;
     import backtype.storm.utils.TupleUtils;
     import backtype.storm.Config;
    +import org.apache.storm.hive.common.HiveConnector;
     import org.apache.storm.hive.common.HiveWriter;
    -import com.google.common.util.concurrent.ThreadFactoryBuilder;
     import org.apache.hive.hcatalog.streaming.*;
     import org.apache.storm.hive.common.HiveOptions;
     import org.apache.storm.hive.common.HiveUtils;
    -import org.apache.hadoop.security.UserGroupInformation;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;
     
     import java.util.List;
    -import java.util.ArrayList;
     import java.util.Map;
    -import java.util.HashMap;
    -import java.util.Timer;
    -import java.util.TimerTask;
    -import java.util.Map.Entry;
    -import java.util.concurrent.ExecutorService;
    -import java.util.concurrent.Executors;
    -import java.util.concurrent.TimeUnit;
    -import java.util.concurrent.atomic.AtomicBoolean;
    -import java.util.List;
     import java.util.LinkedList;
    -import java.io.IOException;
     
     public class HiveBolt extends  BaseRichBolt {
         private static final Logger LOG = LoggerFactory.getLogger(HiveBolt.class);
         private OutputCollector collector;
         private HiveOptions options;
    -    private ExecutorService callTimeoutPool;
    -    private transient Timer heartBeatTimer;
    -    private Boolean kerberosEnabled = false;
    -    private AtomicBoolean timeToSendHeartBeat = new AtomicBoolean(false);
    -    private UserGroupInformation ugi = null;
    -    HashMap<HiveEndPoint, HiveWriter> allWriters;
    +    private HiveConnector hiveConnector;
         private List<Tuple> tupleBatch;
    +    transient CountMetric serializationErrorMetric;
     
         public HiveBolt(HiveOptions options) {
             this.options = options;
             tupleBatch = new LinkedList<>();
    +
         }
     
         @Override
         public void prepare(Map conf, TopologyContext topologyContext, OutputCollector collector)  {
             try {
    -            if(options.getKerberosPrincipal() == null && options.getKerberosKeytab() == null) {
    -                kerberosEnabled = false;
    -            } else if(options.getKerberosPrincipal() != null && options.getKerberosKeytab() != null) {
    -                kerberosEnabled = true;
    -            } else {
    -                throw new IllegalArgumentException("To enable Kerberos, need to set both KerberosPrincipal " +
    -                                                   " & KerberosKeytab");
    -            }
    -
    -            if (kerberosEnabled) {
    -                try {
    -                    ugi = HiveUtils.authenticate(options.getKerberosKeytab(), options.getKerberosPrincipal());
    -                } catch(HiveUtils.AuthenticationFailed ex) {
    -                    LOG.error("Hive Kerberos authentication failed " + ex.getMessage(), ex);
    -                    throw new IllegalArgumentException(ex);
    -                }
    -            }
                 this.collector = collector;
    -            allWriters = new HashMap<HiveEndPoint,HiveWriter>();
    -            String timeoutName = "hive-bolt-%d";
    -            this.callTimeoutPool = Executors.newFixedThreadPool(1,
    -                                new ThreadFactoryBuilder().setNameFormat(timeoutName).build());
    -            heartBeatTimer = new Timer();
    -            setupHeartBeatTimer();
    -
    +            this.hiveConnector  =  new HiveConnector(this.options);
    +            this.hiveConnector.configure();
    +            this.serializationErrorMetric = new CountMetric();
    +            topologyContext.registerMetric("hive_serialization_error_count", serializationErrorMetric, 0);
    --- End diff --
    
    Not really sure what registering a metric that is reported every 0 seconds does.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r49201793
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/common/HiveConnector.java ---
    @@ -0,0 +1,241 @@
    +package org.apache.storm.hive.common;
    +
    +
    +import org.apache.hadoop.security.UserGroupInformation;
    +import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    +import org.apache.hive.hcatalog.streaming.StreamingException;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.io.IOException;
    +import java.util.*;
    +import java.util.concurrent.ExecutorService;
    +import java.util.concurrent.Executors;
    +import com.google.common.util.concurrent.ThreadFactoryBuilder;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.ConcurrentHashMap;
    +import java.util.concurrent.atomic.AtomicBoolean;
    +
    +public class HiveConnector {
    +    private static final Logger LOG = LoggerFactory.getLogger(HiveConnector.class);
    +    private HiveOptions options;
    +    private transient Timer heartBeatTimer;
    +    private AtomicBoolean sendHeartBeat = new AtomicBoolean(true);
    +    private UserGroupInformation ugi = null;
    +    private Map<HiveEndPoint, HiveWriter> allWriters;
    +    private ExecutorService callTimeoutPool;
    +
    +    public HiveConnector(HiveOptions options) {
    +        this.options = options;
    +
    --- End diff --
    
    Extra line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44598712
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    Also the other error handling code now deals with a tuple batch and fails/cleans it up.  Does this code need to handle that too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by harshach <gi...@git.apache.org>.

Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44680811
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    @revans2 makes sense. I'll add the metric and update the patch. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-170037330
  
    Sorry this took so long to get back to.  The code looks good for the most part.  I have a few minor formatting nits, but the header needs to be updated and I would like to see the metric registered with a non-zero poll interval.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.

Github user Parth-Brahmbhatt commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44365491
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuples will NOT be acknowledged ", tuple);
    --- End diff --
    
    the message says tuple won't be acked and in next line we ack it, one of the two is not right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by ptgoetz <gi...@git.apache.org>.

Github user ptgoetz commented on a diff in the pull request:

https://github.com/apache/storm/pull/871#discussion_r57597051

--- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/trident/HiveUpdater.java ---
@@ -1,20 +1,3 @@
-/**
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
--- End diff --

Need to keep the Apache header.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44598471
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    Why ack this, when before we would fail it?  I realize replaying this is not going to "fix" the problem, but hiding a failure seems worse.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.

Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-155269316
  
    overall I am +1, couple of clarifying questions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by ptgoetz <gi...@git.apache.org>.

Github user ptgoetz commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r57596988
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/trident/HiveStateFactory.java ---
    @@ -1,20 +1,3 @@
    -/**
    - * Licensed to the Apache Software Foundation (ASF) under one
    - * or more contributor license agreements.  See the NOTICE file
    - * distributed with this work for additional information
    - * regarding copyright ownership.  The ASF licenses this file
    - * to you under the Apache License, Version 2.0 (the
    - * "License"); you may not use this file except in compliance
    - * with the License.  You may obtain a copy of the License at
    - *
    - * http://www.apache.org/licenses/LICENSE-2.0
    - *
    - * Unless required by applicable law or agreed to in writing, software
    - * distributed under the License is distributed on an "AS IS" BASIS,
    - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    - * See the License for the specific language governing permissions and
    - * limitations under the License.
    - */
    --- End diff --
    
    Why remove the Apache header?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by ptgoetz <gi...@git.apache.org>.

Github user ptgoetz commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-202559608
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by harshach <gi...@git.apache.org>.

Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44365638
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -87,7 +87,7 @@ public void prepare(Map conf, TopologyContext topologyContext, OutputCollector c
                     }
                 }
                 this.collector = collector;
    -            allWriters = new HashMap<HiveEndPoint,HiveWriter>();
    +            allWriters = new ConcurrentHashMap<HiveEndPoint,HiveWriter>();
    --- End diff --
    
    We are running heartbeat as separate timer thread. Every time we add a new HiveEndPoint, HiveWriter we add this to allWriters Map and heartbeat thread goes through all the values to call heartbeat method. To have the values visible  in both threads we need to make concurrent


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by ptgoetz <gi...@git.apache.org>.

Github user ptgoetz commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r57602797
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
             } catch(Exception e) {
                 this.collector.reportError(e);
                 collector.fail(tuple);
    -            try {
    -                flushAndCloseWriters();
    -                LOG.info("acknowledging tuples after writers flushed and closed");
    -                for (Tuple t : tupleBatch)
    -                    collector.ack(t);
    -                tupleBatch.clear();
    -            } catch (Exception e1) {
    -                //If flushAndClose fails assume tuples are lost, do not ack
    -                LOG.warn("Error while flushing and closing writers, tuples will NOT be acknowledged");
    -                for (Tuple t : tupleBatch)
    -                    collector.fail(t);
    -                tupleBatch.clear();
    -            }
    +            for (Tuple t : tupleBatch)
    --- End diff --
    
    I agree. Please wrap the for block with '{}'.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by harshach <gi...@git.apache.org>.

Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44600086
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    @revans2 Hive SerializationError means the tuple fields to table column mapping is nto right. For example if table has 6 rows and this tuple has only 5 fields than it can throw SerializationError. In this case instead of failing the tuple and keep re-running into this error. Its better we ack and log an error. 
    hiveWriter.write will throw an error on the current tuple and it wil be not be added to the tupleBatch.  Since this is only happening for current the other tuples in the batch are good and we can still proceed writing them to hive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by harshach <gi...@git.apache.org>.

Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44670951
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    @dossett we can if that improves the readability of it but still the same though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by dossett <gi...@git.apache.org>.

Github user dossett commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44699532
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    Yeah, I think right try/catch blocks improve readability greatly but that
    may be a style preference. Would a call to collector.reporterror also be
    appropriate in that catch?
    
    On Thu, Nov 12, 2015 at 8:44 AM Harsha <no...@github.com> wrote:
    
    > In
    > external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java
    > <https://github.com/apache/storm/pull/871#discussion_r44680811>:
    >
    > > @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
    > >                      collector.ack(t);
    > >                  tupleBatch.clear();
    > >              }
    > > +        } catch(SerializationError se) {
    > > +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    > > +            collector.ack(tuple);
    >
    > @revans2 <https://github.com/revans2> makes sense. I'll add the metric
    > and update the patch. Thanks.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/storm/pull/871/files#r44680811>.
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r49201673
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/common/HiveConnector.java ---
    @@ -0,0 +1,241 @@
    +package org.apache.storm.hive.common;
    --- End diff --
    
    Need a license header.  Rat is complaining.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/871


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44673758
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    I am fine with it how the code is, like I said my comments were all minor.  I was just thinking that if there are issues with the serialization I as a user am not going to be looking at the logs for errors, I am going to be looking at metrics, like failed tuples.  Perhaps we could have a follow on JIRA to add an IMetric for the number of tuples lost because of serialization errors. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44597850
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -56,9 +56,9 @@
         private ExecutorService callTimeoutPool;
         private transient Timer heartBeatTimer;
         private Boolean kerberosEnabled = false;
    -    private AtomicBoolean timeToSendHeartBeat = new AtomicBoolean(false);
    +    private Boolean sendHeartBeat = true;
    --- End diff --
    
    I think this needs to be volatile or atomic.  It is written in one thread and read in another with no locks around any of it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-155934542
  
    Done with a quick pass through the code, for the most part it looks good, just a few minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.

Github user Parth-Brahmbhatt commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44365442
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -87,7 +87,7 @@ public void prepare(Map conf, TopologyContext topologyContext, OutputCollector c
                     }
                 }
                 this.collector = collector;
    -            allWriters = new HashMap<HiveEndPoint,HiveWriter>();
    +            allWriters = new ConcurrentHashMap<HiveEndPoint,HiveWriter>();
    --- End diff --
    
    any reason to make it concurrent?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by HeartSaVioR <gi...@git.apache.org>.

Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-202747713
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by revans2 <gi...@git.apache.org>.

Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44598767
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
             } catch(Exception e) {
                 this.collector.reportError(e);
                 collector.fail(tuple);
    -            try {
    -                flushAndCloseWriters();
    -                LOG.info("acknowledging tuples after writers flushed and closed");
    -                for (Tuple t : tupleBatch)
    -                    collector.ack(t);
    -                tupleBatch.clear();
    -            } catch (Exception e1) {
    -                //If flushAndClose fails assume tuples are lost, do not ack
    -                LOG.warn("Error while flushing and closing writers, tuples will NOT be acknowledged");
    -                for (Tuple t : tupleBatch)
    -                    collector.fail(t);
    -                tupleBatch.clear();
    -            }
    +            for (Tuple t : tupleBatch)
    --- End diff --
    
    Could you wrap the body of this in '{}' I know it is just one line, but for coding conventions sake.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by dossett <gi...@git.apache.org>.

Github user dossett commented on a diff in the pull request:

    https://github.com/apache/storm/pull/871#discussion_r44600468
  
    --- Diff: external/storm-hive/src/main/java/org/apache/storm/hive/bolt/HiveBolt.java ---
    @@ -134,22 +131,16 @@ public void execute(Tuple tuple) {
                         collector.ack(t);
                     tupleBatch.clear();
                 }
    +        } catch(SerializationError se) {
    +            LOG.info("Serialization exception occurred, tuple is acknowledged but not written to Hive.", tuple);
    +            collector.ack(tuple);
    --- End diff --
    
    Should the try {...} catch (SerializationError) block be moved to cover just the writer.write() call then?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by harshach <gi...@git.apache.org>.

Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-162259764
  
    @revans2 @Parth-Brahmbhatt requesting for one more review as I deleted duplicated code and addressed previous comments. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by Parth-Brahmbhatt <gi...@git.apache.org>.

Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-200564783
  
    This PR has been open for a long time, i am still +1 and will merge this this weekend if no one objects.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by ptgoetz <gi...@git.apache.org>.

Github user ptgoetz commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-202502403
  
    Two very minor nits. Two files are missing Apache headers, and a for loop that should be enclosed in '{}'.
    
    Aside from that, I'm +1 for the patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: STORM-1030. Hive Connector Fixes.

Posted by harshach <gi...@git.apache.org>.

Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/871#issuecomment-202514351
  
    Thanks for the review @Parth-Brahmbhatt @ptgoetz . Addressed the comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---