You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by roshannaik <gi...@git.apache.org> on 2018/06/10 12:38:58 UTC

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

GitHub user roshannaik opened a pull request:

    https://github.com/apache/storm/pull/2711

    STORM-3100 : Introducing CustomIndexArray to speedup HashMap lookups

    - Added Unit Tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/roshannaik/storm STORM-3100

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2711.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2711
    
----
commit 1b14f6aad86fa96abe5fec07fea287c02004cc3f
Author: Roshan Naik <na...@...>
Date:   2018-06-10T12:36:08Z

    STORM-3100 : Introducing CustomIndexArray. Using it for task2componentId mapping & ExecutorTransfer.localReceiveQueues

----


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r197605298
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/CustomIndexArray.java ---
    @@ -0,0 +1,174 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.utils;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +/***
    + *  A fixed size array with a customizable indexing range. The index range can have -ve lower / upper bound.
    + *  Intended to be a faster alternative to HashMap<Integer, .. >. Only applicable as a substitute if the
    + *  largest & smallest indices are known at time of creation.
    + *  This class does not inherit from :
    + *   - Map<Integer,T> : For performance reasons. The get(Object key) method requires key to be cast to Integer before use.
    + *   - ArrayList<T> : as this class supports negative indexes & cannot grow/shrink.
    + */
    +
    +public class CustomIndexArray<T>  {
    +
    +    public final int baseIndex;
    +    private final ArrayList<T> elements;
    +    private final int elemCount;
    +
    +    /**
    +     * Creates the array with (upperIndex-lowerIndex+1) elements, initialized to null.
    +     * @param lowerIndex Smallest (inclusive) valid index for the array. Can be +ve, -ve or 0.
    +     * @param upperIndex Largest  (inclusive) valid index for the array. Can be +ve, -ve or 0. Must be > lowerIndex
    +     */
    +    public CustomIndexArray(int lowerIndex, int upperIndex) {
    +        if (lowerIndex >= upperIndex) {
    +            throw new IllegalArgumentException("lowerIndex must be < upperIndex");
    +        }
    +
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +    }
    +
    +    private static <T> ArrayList<T> makeNullInitializedArray(int elemCount) {
    +        ArrayList<T> result = new ArrayList<T>(elemCount);
    +        for (int i = 0; i < elemCount; i++) {
    +            result.add(null);
    +        }
    +        return result;
    +    }
    +
    +    /**
    +     * Initializes the array with elements from a HashTable.
    +     *
    +     * @param src  the source map from which to initialize
    +     */
    +    public CustomIndexArray(Map<Integer, T> src) {
    +        Integer lowerIndex = Integer.MAX_VALUE;
    +        Integer upperIndex = Integer.MIN_VALUE;
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes
    +            Integer key =  entry.getKey();
    +            if (key < lowerIndex) {
    +                lowerIndex = key;
    +            }
    +            if (key > upperIndex) {
    +                upperIndex = key;
    +            }
    +        }
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) {
    +            elements.set(entry.getKey() - baseIndex, entry.getValue());
    +        }
    +    }
    +
    +    /**
    +     * Get the value at the index.
    +     * @throws IndexOutOfBoundsException if index is outside of bounds specified to constructor
    +     */
    +    public T get(int index) {
    +        return elements.get(index - baseIndex);
    +    }
    +
    +    /**
    +     * Set the value at the index.
    +     * @throws IndexOutOfBoundsException if index is outside of bounds specified to constructor
    +     */
    +    public T set(int index, T value) {
    +        return elements.set(index - baseIndex, value);
    +    }
    +
    +    /**
    +     * Returns the number of elements (including null elements).
    +     */
    +    public int size() {
    +        return elemCount;
    +    }
    +
    +    /**
    +     * Always returns true as this cannot be empty.
    +     */
    +    public boolean isEmpty() {
    +        return false;
    +    }
    +
    +
    +    /**
    +     * Check if index is valid.
    +     */
    +    public boolean isValidIndex(int index) {
    +        int actualIndex = index - baseIndex;
    +        if (actualIndex < 0  ||  actualIndex >= elemCount) {
    +            return false;
    +        }
    +        return true;
    +    }
    +
    +    /**
    +     * Returns the index range as a Set.
    +     */
    +    public Set<Integer> indices() {
    --- End diff --
    
    I felt its intuitive and nice to maintain the integers in order even if not absolutely necessary.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196843356
  
    --- Diff: examples/storm-elasticsearch-examples/src/main/java/org/apache/storm/elasticsearch/common/EsTestUtil.java ---
    @@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source,
             GeneralTopologyContext topologyContext = new GeneralTopologyContext(
                     builder.createTopology(),
                     new Config(),
    -                new HashMap<>(),
    +                new CustomIndexArray<String>(0,1),
    --- End diff --
    
    Also why does this need to be 0,1?


---

[GitHub] storm issue #2711: STORM-3100 : Introducing CustomIndexArray to speedup Hash...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on the issue:

    https://github.com/apache/storm/pull/2711
  
    Spent sometime trying to triage this failure but unable to figure out the cause. i know the issue has to do with this patch as it passes w/o this patch.  I think it will take me some more time to get to the bottom of this.  I did notice another (kind of significant) performance issue is simple to address. Given that this is a relatively minor optimization, I am planning to provide a fix for that one first, and then revert back to this.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196839063
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/CustomIndexArray.java ---
    @@ -0,0 +1,174 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.utils;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +/***
    + *  A fixed size array with a customizable indexing range. The index range can have -ve lower / upper bound.
    + *  Intended to be a faster alternative to HashMap<Integer, .. >. Only applicable as a substitute if the
    + *  largest & smallest indices are known at time of creation.
    + *  This class does not inherit from :
    + *   - Map<Integer,T> : For performance reasons. The get(Object key) method requires key to be cast to Integer before use.
    + *   - ArrayList<T> : as this class supports negative indexes & cannot grow/shrink.
    + */
    +
    +public class CustomIndexArray<T>  {
    +
    +    public final int baseIndex;
    +    private final ArrayList<T> elements;
    +    private final int elemCount;
    +
    +    /**
    +     * Creates the array with (upperIndex-lowerIndex+1) elements, initialized to null.
    +     * @param lowerIndex Smallest (inclusive) valid index for the array. Can be +ve, -ve or 0.
    +     * @param upperIndex Largest  (inclusive) valid index for the array. Can be +ve, -ve or 0. Must be > lowerIndex
    +     */
    +    public CustomIndexArray(int lowerIndex, int upperIndex) {
    +        if (lowerIndex >= upperIndex) {
    +            throw new IllegalArgumentException("lowerIndex must be < upperIndex");
    +        }
    +
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +    }
    +
    +    private static <T> ArrayList<T> makeNullInitializedArray(int elemCount) {
    +        ArrayList<T> result = new ArrayList<T>(elemCount);
    +        for (int i = 0; i < elemCount; i++) {
    +            result.add(null);
    +        }
    +        return result;
    +    }
    +
    +    /**
    +     * Initializes the array with elements from a HashTable.
    +     *
    +     * @param src  the source map from which to initialize
    +     */
    +    public CustomIndexArray(Map<Integer, T> src) {
    +        Integer lowerIndex = Integer.MAX_VALUE;
    +        Integer upperIndex = Integer.MIN_VALUE;
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes
    +            Integer key =  entry.getKey();
    +            if (key < lowerIndex) {
    +                lowerIndex = key;
    +            }
    +            if (key > upperIndex) {
    +                upperIndex = key;
    +            }
    +        }
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) {
    --- End diff --
    
    nit: would calling `set` improve code reuse a little?


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196846579
  
    --- Diff: storm-client/src/jvm/org/apache/storm/daemon/Task.java ---
    @@ -112,15 +112,15 @@ public Task(Executor executor, Integer taskId) throws IOException {
         public List<Integer> getOutgoingTasks(Integer outTaskId, String stream, List<Object> values) {
             if (debug) {
                 LOG.info("Emitting direct: {}; {} {} {} ", outTaskId, componentId, stream, values);
    -        }
    --- End diff --
    
    I think grouping check should happen even if debug is not enabled.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196842510
  
    --- Diff: external/storm-elasticsearch/src/test/java/org/apache/storm/elasticsearch/common/EsTestUtil.java ---
    @@ -63,7 +64,7 @@
         public static Tuple generateTestTuple(String source, String index, String type, String id) {
             TopologyBuilder builder = new TopologyBuilder();
             GeneralTopologyContext topologyContext = new GeneralTopologyContext(builder.createTopology(),
    -                new Config(), new HashMap<>(), new HashMap<>(), new HashMap<>(), "") {
    +                new Config(),  new CustomIndexArray<String>(0,1), new HashMap<>(), new HashMap<>(), "") {
    --- End diff --
    
    nit: extra space.


---

[GitHub] storm issue #2711: STORM-3100 : Introducing CustomIndexArray to speedup Hash...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on the issue:

    https://github.com/apache/storm/pull/2711
  
    Looks like tester_bolt_metrics.py under storm-core is failing. I see that it can be run via mvn clojure:test under storm-core. But unable to figure out what really going wrong. Any help would be appreciated. thanks.


---

[GitHub] storm issue #2711: STORM-3100 : Introducing CustomIndexArray to speedup Hash...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on the issue:

    https://github.com/apache/storm/pull/2711
  
    Sorry for not being able to address the comments sooner. Have been traveling and should be able to get back to this soon after jul 12 th.  


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r197605410
  
    --- Diff: examples/storm-elasticsearch-examples/src/main/java/org/apache/storm/elasticsearch/common/EsTestUtil.java ---
    @@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source,
             GeneralTopologyContext topologyContext = new GeneralTopologyContext(
                     builder.createTopology(),
                     new Config(),
    -                new HashMap<>(),
    +                new CustomIndexArray<String>(0,1),
    --- End diff --
    
    Just any non empty range.


---

[GitHub] storm issue #2711: STORM-3100 : Introducing CustomIndexArray to speedup Hash...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2711
  
    Looks like the build still fails and relevant to this patch.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196848589
  
    --- Diff: storm-server/src/main/java/org/apache/storm/Testing.java ---
    @@ -660,8 +661,8 @@ public static Tuple testTuple(List<Object> values, MkTupleParam param) {
                 }
             }
     
    -        Map<Integer, String> taskToComp = new HashMap<>();
    -        taskToComp.put(task, component);
    +        CustomIndexArray<String> taskToComp = new CustomIndexArray<>(task,task+1);
    --- End diff --
    
    If lower and upper are inclusive why do we need task, task+1?


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r197605381
  
    --- Diff: examples/storm-elasticsearch-examples/src/main/java/org/apache/storm/elasticsearch/common/EsTestUtil.java ---
    @@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source,
             GeneralTopologyContext topologyContext = new GeneralTopologyContext(
                     builder.createTopology(),
                     new Config(),
    -                new HashMap<>(),
    +                new CustomIndexArray<String>(0,1),
    --- End diff --
    
    It is mostly for readability .. when reading this code its unclear what the deduced type will be without further checking the parameter types for the constructor. Happy to change if you prefer the other way.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196841115
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/CustomIndexArray.java ---
    @@ -0,0 +1,174 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.utils;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +/***
    + *  A fixed size array with a customizable indexing range. The index range can have -ve lower / upper bound.
    + *  Intended to be a faster alternative to HashMap<Integer, .. >. Only applicable as a substitute if the
    + *  largest & smallest indices are known at time of creation.
    + *  This class does not inherit from :
    + *   - Map<Integer,T> : For performance reasons. The get(Object key) method requires key to be cast to Integer before use.
    + *   - ArrayList<T> : as this class supports negative indexes & cannot grow/shrink.
    + */
    +
    +public class CustomIndexArray<T>  {
    +
    +    public final int baseIndex;
    +    private final ArrayList<T> elements;
    +    private final int elemCount;
    +
    +    /**
    +     * Creates the array with (upperIndex-lowerIndex+1) elements, initialized to null.
    +     * @param lowerIndex Smallest (inclusive) valid index for the array. Can be +ve, -ve or 0.
    +     * @param upperIndex Largest  (inclusive) valid index for the array. Can be +ve, -ve or 0. Must be > lowerIndex
    +     */
    +    public CustomIndexArray(int lowerIndex, int upperIndex) {
    +        if (lowerIndex >= upperIndex) {
    +            throw new IllegalArgumentException("lowerIndex must be < upperIndex");
    +        }
    +
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +    }
    +
    +    private static <T> ArrayList<T> makeNullInitializedArray(int elemCount) {
    +        ArrayList<T> result = new ArrayList<T>(elemCount);
    +        for (int i = 0; i < elemCount; i++) {
    +            result.add(null);
    +        }
    +        return result;
    +    }
    +
    +    /**
    +     * Initializes the array with elements from a HashTable.
    +     *
    +     * @param src  the source map from which to initialize
    +     */
    +    public CustomIndexArray(Map<Integer, T> src) {
    +        Integer lowerIndex = Integer.MAX_VALUE;
    +        Integer upperIndex = Integer.MIN_VALUE;
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes
    +            Integer key =  entry.getKey();
    +            if (key < lowerIndex) {
    +                lowerIndex = key;
    +            }
    +            if (key > upperIndex) {
    +                upperIndex = key;
    +            }
    +        }
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) {
    +            elements.set(entry.getKey() - baseIndex, entry.getValue());
    +        }
    +    }
    +
    +    /**
    +     * Get the value at the index.
    +     * @throws IndexOutOfBoundsException if index is outside of bounds specified to constructor
    +     */
    +    public T get(int index) {
    +        return elements.get(index - baseIndex);
    +    }
    +
    +    /**
    +     * Set the value at the index.
    +     * @throws IndexOutOfBoundsException if index is outside of bounds specified to constructor
    +     */
    +    public T set(int index, T value) {
    +        return elements.set(index - baseIndex, value);
    +    }
    +
    +    /**
    +     * Returns the number of elements (including null elements).
    +     */
    +    public int size() {
    +        return elemCount;
    +    }
    +
    +    /**
    +     * Always returns true as this cannot be empty.
    +     */
    +    public boolean isEmpty() {
    +        return false;
    +    }
    +
    +
    +    /**
    +     * Check if index is valid.
    +     */
    +    public boolean isValidIndex(int index) {
    +        int actualIndex = index - baseIndex;
    +        if (actualIndex < 0  ||  actualIndex >= elemCount) {
    +            return false;
    +        }
    +        return true;
    +    }
    +
    +    /**
    +     * Returns the index range as a Set.
    +     */
    +    public Set<Integer> indices() {
    --- End diff --
    
    Nit: we are keeping the order of the indices here, but it is not documented in the javadocs, and it is not needed, as the only place this is used is for doing a set difference.
    
    It would also be nice to indicate that this is not a cheap call so people don't abuse it calling it a lot.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196843622
  
    --- Diff: external/storm-hdfs/src/test/java/org/apache/storm/hdfs/bolt/format/TestSimpleFileNameFormat.java ---
    @@ -69,8 +70,8 @@ public void testTimeFormat() {
         }
     
         private TopologyContext createTopologyContext(Map<String, Object> topoConf) {
    -        Map<Integer, String> taskToComponent = new HashMap<Integer, String>();
    -        taskToComponent.put(7, "Xcom");
    +        CustomIndexArray<String> taskToComponent = new CustomIndexArray<>(0,8);
    --- End diff --
    
    Why 0,8 if all we are setting is 7?


---

[GitHub] storm issue #2711: STORM-3100 : Introducing CustomIndexArray to speedup Hash...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2711
  
    @roshannaik 
    
    https://travis-ci.org/apache/storm/jobs/403972342
    https://travis-ci.org/apache/storm/jobs/403972347
    https://travis-ci.org/apache/storm/jobs/403972348
    https://travis-ci.org/apache/storm/jobs/403972349
    
    Travis CI build is failing and failures look like related to the changeset. Could you check above build result and track? 


---

[GitHub] storm issue #2711: STORM-3100 : Introducing CustomIndexArray to speedup Hash...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2711
  
    It looks like Zk issue but not 100% sure. If it is happening intermittently we might be fine. Could you setup Travis to your fork and run build couple of times?


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r197605635
  
    --- Diff: storm-client/src/jvm/org/apache/storm/daemon/Task.java ---
    @@ -112,15 +112,15 @@ public Task(Executor executor, Integer taskId) throws IOException {
         public List<Integer> getOutgoingTasks(Integer outTaskId, String stream, List<Object> values) {
             if (debug) {
                 LOG.info("Emitting direct: {}; {} {} {} ", outTaskId, componentId, stream, values);
    -        }
    --- End diff --
    
    That was part of an older PR. From what I recall ... on profiling I had noticed that the grouping check was expensive in the critical path due to the fact that it needed lookups in three (now down to 2) hashmaps :   streamComponentToGrouper  &  componentGrouping. Since neither were keyed on Integer, the CustomIndexArray style optimization was not possible. 


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by roshannaik <gi...@git.apache.org>.
Github user roshannaik commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r197605303
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/CustomIndexArray.java ---
    @@ -0,0 +1,174 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.utils;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +/***
    + *  A fixed size array with a customizable indexing range. The index range can have -ve lower / upper bound.
    + *  Intended to be a faster alternative to HashMap<Integer, .. >. Only applicable as a substitute if the
    + *  largest & smallest indices are known at time of creation.
    + *  This class does not inherit from :
    + *   - Map<Integer,T> : For performance reasons. The get(Object key) method requires key to be cast to Integer before use.
    + *   - ArrayList<T> : as this class supports negative indexes & cannot grow/shrink.
    + */
    +
    +public class CustomIndexArray<T>  {
    +
    +    public final int baseIndex;
    +    private final ArrayList<T> elements;
    +    private final int elemCount;
    +
    +    /**
    +     * Creates the array with (upperIndex-lowerIndex+1) elements, initialized to null.
    +     * @param lowerIndex Smallest (inclusive) valid index for the array. Can be +ve, -ve or 0.
    +     * @param upperIndex Largest  (inclusive) valid index for the array. Can be +ve, -ve or 0. Must be > lowerIndex
    +     */
    +    public CustomIndexArray(int lowerIndex, int upperIndex) {
    +        if (lowerIndex >= upperIndex) {
    +            throw new IllegalArgumentException("lowerIndex must be < upperIndex");
    +        }
    +
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +    }
    +
    +    private static <T> ArrayList<T> makeNullInitializedArray(int elemCount) {
    +        ArrayList<T> result = new ArrayList<T>(elemCount);
    +        for (int i = 0; i < elemCount; i++) {
    +            result.add(null);
    +        }
    +        return result;
    +    }
    +
    +    /**
    +     * Initializes the array with elements from a HashTable.
    +     *
    +     * @param src  the source map from which to initialize
    +     */
    +    public CustomIndexArray(Map<Integer, T> src) {
    +        Integer lowerIndex = Integer.MAX_VALUE;
    +        Integer upperIndex = Integer.MIN_VALUE;
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes
    +            Integer key =  entry.getKey();
    +            if (key < lowerIndex) {
    +                lowerIndex = key;
    +            }
    +            if (key > upperIndex) {
    +                upperIndex = key;
    +            }
    +        }
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) {
    --- End diff --
    
    Yes, calling set() directly instead of  elements.set()  would have been nice..But normally I dont call instance methods from constructors since the object is not considered as fully initialized until the constructor has finished.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196838549
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/CustomIndexArray.java ---
    @@ -0,0 +1,174 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.utils;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +/***
    + *  A fixed size array with a customizable indexing range. The index range can have -ve lower / upper bound.
    + *  Intended to be a faster alternative to HashMap<Integer, .. >. Only applicable as a substitute if the
    + *  largest & smallest indices are known at time of creation.
    + *  This class does not inherit from :
    + *   - Map<Integer,T> : For performance reasons. The get(Object key) method requires key to be cast to Integer before use.
    + *   - ArrayList<T> : as this class supports negative indexes & cannot grow/shrink.
    + */
    +
    +public class CustomIndexArray<T>  {
    +
    +    public final int baseIndex;
    +    private final ArrayList<T> elements;
    +    private final int elemCount;
    +
    +    /**
    +     * Creates the array with (upperIndex-lowerIndex+1) elements, initialized to null.
    +     * @param lowerIndex Smallest (inclusive) valid index for the array. Can be +ve, -ve or 0.
    +     * @param upperIndex Largest  (inclusive) valid index for the array. Can be +ve, -ve or 0. Must be > lowerIndex
    +     */
    +    public CustomIndexArray(int lowerIndex, int upperIndex) {
    +        if (lowerIndex >= upperIndex) {
    +            throw new IllegalArgumentException("lowerIndex must be < upperIndex");
    +        }
    +
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +    }
    +
    +    private static <T> ArrayList<T> makeNullInitializedArray(int elemCount) {
    +        ArrayList<T> result = new ArrayList<T>(elemCount);
    +        for (int i = 0; i < elemCount; i++) {
    +            result.add(null);
    +        }
    +        return result;
    +    }
    +
    +    /**
    +     * Initializes the array with elements from a HashTable.
    +     *
    +     * @param src  the source map from which to initialize
    +     */
    +    public CustomIndexArray(Map<Integer, T> src) {
    +        Integer lowerIndex = Integer.MAX_VALUE;
    +        Integer upperIndex = Integer.MIN_VALUE;
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes
    --- End diff --
    
    nit: would `src.keySet()` be simpler because we don't use the values?


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196839436
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/CustomIndexArray.java ---
    @@ -0,0 +1,174 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.utils;
    +
    +import java.util.ArrayList;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeSet;
    +
    +/***
    + *  A fixed size array with a customizable indexing range. The index range can have -ve lower / upper bound.
    + *  Intended to be a faster alternative to HashMap<Integer, .. >. Only applicable as a substitute if the
    + *  largest & smallest indices are known at time of creation.
    + *  This class does not inherit from :
    + *   - Map<Integer,T> : For performance reasons. The get(Object key) method requires key to be cast to Integer before use.
    + *   - ArrayList<T> : as this class supports negative indexes & cannot grow/shrink.
    + */
    +
    +public class CustomIndexArray<T>  {
    +
    +    public final int baseIndex;
    +    private final ArrayList<T> elements;
    +    private final int elemCount;
    +
    +    /**
    +     * Creates the array with (upperIndex-lowerIndex+1) elements, initialized to null.
    +     * @param lowerIndex Smallest (inclusive) valid index for the array. Can be +ve, -ve or 0.
    +     * @param upperIndex Largest  (inclusive) valid index for the array. Can be +ve, -ve or 0. Must be > lowerIndex
    +     */
    +    public CustomIndexArray(int lowerIndex, int upperIndex) {
    +        if (lowerIndex >= upperIndex) {
    +            throw new IllegalArgumentException("lowerIndex must be < upperIndex");
    +        }
    +
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +    }
    +
    +    private static <T> ArrayList<T> makeNullInitializedArray(int elemCount) {
    +        ArrayList<T> result = new ArrayList<T>(elemCount);
    +        for (int i = 0; i < elemCount; i++) {
    +            result.add(null);
    +        }
    +        return result;
    +    }
    +
    +    /**
    +     * Initializes the array with elements from a HashTable.
    +     *
    +     * @param src  the source map from which to initialize
    +     */
    +    public CustomIndexArray(Map<Integer, T> src) {
    +        Integer lowerIndex = Integer.MAX_VALUE;
    +        Integer upperIndex = Integer.MIN_VALUE;
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes
    +            Integer key =  entry.getKey();
    +            if (key < lowerIndex) {
    +                lowerIndex = key;
    +            }
    +            if (key > upperIndex) {
    +                upperIndex = key;
    +            }
    +        }
    +        this.baseIndex = lowerIndex;
    +        this.elemCount = upperIndex - lowerIndex + 1;
    +
    +        this.elements = makeNullInitializedArray(elemCount);
    +
    +        for (Map.Entry<Integer, T> entry : src.entrySet()) {
    +            elements.set(entry.getKey() - baseIndex, entry.getValue());
    +        }
    +    }
    +
    +    /**
    +     * Get the value at the index.
    +     * @throws IndexOutOfBoundsException if index is outside of bounds specified to constructor
    +     */
    +    public T get(int index) {
    +        return elements.get(index - baseIndex);
    +    }
    +
    +    /**
    +     * Set the value at the index.
    +     * @throws IndexOutOfBoundsException if index is outside of bounds specified to constructor
    +     */
    +    public T set(int index, T value) {
    +        return elements.set(index - baseIndex, value);
    +    }
    +
    +    /**
    +     * Returns the number of elements (including null elements).
    +     */
    +    public int size() {
    +        return elemCount;
    +    }
    +
    +    /**
    +     * Always returns true as this cannot be empty.
    +     */
    +    public boolean isEmpty() {
    --- End diff --
    
    Why have this at all.  It does not implement collection so we don't need it.


---

[GitHub] storm pull request #2711: STORM-3100 : Introducing CustomIndexArray to speed...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2711#discussion_r196843223
  
    --- Diff: examples/storm-elasticsearch-examples/src/main/java/org/apache/storm/elasticsearch/common/EsTestUtil.java ---
    @@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source,
             GeneralTopologyContext topologyContext = new GeneralTopologyContext(
                     builder.createTopology(),
                     new Config(),
    -                new HashMap<>(),
    +                new CustomIndexArray<String>(0,1),
    --- End diff --
    
    Nit: java should be able to figure out that it is a `String` from the required type for GeneralTopologyContext 


---