Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/01/26 08:25:28 UTC

[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/10920

    [SPARK-12937][SQL] bloom filter serialization

    This PR adds serialization support for BloomFilter.
    
    A version number is added to version the serialized binary format.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bloom-filter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10920
    
----
commit 4b05a35d58cdabccd915582894d303ba437bee0f
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-01-26T07:23:51Z

    bloom filter serialization

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174881466
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50801901
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BitArray.java ---
    @@ -24,6 +27,9 @@
       private long bitCount;
     
       static int numWords(long numBits) {
    +    if (numBits <= 0) {
    +      throw new IllegalArgumentException("numBits must be positive");
    --- End diff --
    
    Also include the current value of numBits in the error message.
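
As a minimal sketch of this suggestion (not the code from the PR): the check below reports the offending value in the exception message. The class name `NumWordsSketch` is hypothetical, and the rounding logic is an assumption, since the body of `numWords` is not shown in the diff.

```java
// Illustrative sketch only; NumWordsSketch is a hypothetical name.
final class NumWordsSketch {
  private NumWordsSketch() {}

  // Number of 64-bit words needed to hold numBits bits.
  static int numWords(long numBits) {
    if (numBits <= 0) {
      // Report the offending value so the failure is easier to diagnose.
      throw new IllegalArgumentException("numBits must be positive, but got " + numBits);
    }
    // Assumed rounding: ceil(numBits / 64), written to avoid overflow for large numBits.
    long numWords = (numBits - 1) / 64 + 1;
    if (numWords > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("Can't allocate enough space for " + numBits + " bits");
    }
    return (int) numWords;
  }
}
```

With this shape, a call such as `NumWordsSketch.numWords(-5)` fails with a message that names `-5` rather than only stating the constraint.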




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50802030
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java ---
    @@ -83,7 +87,7 @@
        * bloom filters are appropriately sized to avoid saturating them.
        *
        * @param other The bloom filter to combine this bloom filter with. It is not mutated.
    -   * @throws IllegalArgumentException if {@code isCompatible(that) == false}
    +   * @throws IncompatibleMergeException if {@code isCompatible(that) == false}
    --- End diff --
    
    The parameter is named "other", but the @throws clause refers to "that". Make them consistent.
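
A minimal sketch of the merge contract described in the quoted Javadoc, assuming compatibility means equal bit-array size and equal number of hash functions (the PR's actual `isCompatible` is not shown here). All class names are hypothetical stand-ins; the parameter is called `other` throughout, matching the `@param` tag.

```java
// Illustrative sketch only; all names are hypothetical stand-ins for the PR's classes.
class IncompatibleMergeExceptionSketch extends Exception {
  IncompatibleMergeExceptionSketch(String message) {
    super(message);
  }
}

final class MergeSketch {
  final long[] words;
  final int numHashFunctions;

  MergeSketch(long[] words, int numHashFunctions) {
    this.words = words;
    this.numHashFunctions = numHashFunctions;
  }

  // Assumed definition of compatibility: same size and same number of hash functions.
  boolean isCompatible(MergeSketch other) {
    return other != null
        && words.length == other.words.length
        && numHashFunctions == other.numHashFunctions;
  }

  // ORs the bits of `other` into this filter; `other` is not mutated.
  MergeSketch mergeInPlace(MergeSketch other) throws IncompatibleMergeExceptionSketch {
    if (!isCompatible(other)) {
      throw new IncompatibleMergeExceptionSketch("Cannot merge incompatible bloom filters");
    }
    for (int i = 0; i < words.length; i++) {
      words[i] |= other.words[i];
    }
    return this;
  }
}
```

Using a checked exception forces callers to handle the incompatible case explicitly, which is the behavior the updated `@throws` tag documents.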




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50805558
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java ---
    @@ -39,6 +43,28 @@
      * The implementation is largely based on the {@code BloomFilter} class from guava.
      */
     public abstract class BloomFilter {
    +
    +  public enum Version {
    +    /**
    +     * {@code BloomFilter} binary format version 1 (all values written in big-endian order):
    +     * - Version number, always 1 (32 bit)
    +     * - Total number of words of the underlying bit array (32 bit)
    +     * - The words/longs (numWords * 64 bit)
    +     * - Number of hash functions (32 bit)
    --- End diff --
    
    why do we write the number of hash functions at the end rather than before the words?
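
A minimal sketch of the version 1 layout listed in the quoted Javadoc (version number, word count, words, number of hash functions), assuming `DataOutputStream` and `DataInputStream` for the big-endian encoding. `FormatV1Sketch` and its members are hypothetical names, not the PR's classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch only; FormatV1Sketch and its members are hypothetical names.
final class FormatV1Sketch {
  static final int VERSION = 1;

  final long[] words;
  final int numHashFunctions;

  FormatV1Sketch(long[] words, int numHashFunctions) {
    this.words = words;
    this.numHashFunctions = numHashFunctions;
  }

  byte[] toBytes() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // DataOutputStream writes multi-byte values high byte first (big-endian).
    DataOutputStream dos = new DataOutputStream(bytes);
    dos.writeInt(VERSION);           // version number, always 1 (32 bit)
    dos.writeInt(words.length);      // number of words (32 bit)
    for (long word : words) {
      dos.writeLong(word);           // the words (numWords * 64 bit)
    }
    dos.writeInt(numHashFunctions);  // number of hash functions (32 bit)
    dos.flush();
    return bytes.toByteArray();
  }

  static FormatV1Sketch fromBytes(byte[] data) throws IOException {
    DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
    int version = dis.readInt();
    if (version != VERSION) {
      throw new IOException("Unexpected Bloom filter version number (" + version + ")");
    }
    long[] words = new long[dis.readInt()];
    for (int i = 0; i < words.length; i++) {
      words[i] = dis.readLong();
    }
    int numHashFunctions = dis.readInt();
    return new FormatV1Sketch(words, numHashFunctions);
  }
}
```

Moving the hash-function count ahead of the words, as the question suggests, would only reorder the `writeInt`/`readInt` calls; it would place all fixed-size fields before the variable-length payload.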





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174881467
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50081/




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174869956
  
    cc @rxin @liancheng 




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174889409
  
    **[Test build #50084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50084/consoleFull)** for PR 10920 at commit [`38d674c`](https://github.com/apache/spark/commit/38d674c99af18aaf807c120647efb16442b5a967).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174898617
  
    **[Test build #50086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50086/consoleFull)** for PR 10920 at commit [`c9b29c9`](https://github.com/apache/spark/commit/c9b29c94d6cbb6bde098e6d5b971de118be0218b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50802492
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Version.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.util.sketch;
    +
    +/**
    + * Version number of the serialized binary format for bloom filter or count-min sketch.
    + */
    +public enum Version {
    --- End diff --
    
    I think we should move it back, because:
    
    1. The version enum is actually the best place to document the binary protocol.
    
    2. This will be really confusing when bloomfilter has v2 and yet count-min sketch has only v1.
    
    3. The amount of code duplication you save is teeny (actually you probably added more loc by having an apache licensing header).
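
A minimal sketch of the layout argued for above, with each sketch type owning its own nested `Version` enum so the formats can be documented and evolve independently. The class names and the `V2` constant are hypothetical; `V2` exists only to illustrate point 2.

```java
// Illustrative sketch only; class names and the V2 constant are hypothetical.
abstract class BloomFilterSketch {
  enum Version {
    /** Binary format version 1: version number, word count, words, number of hash functions. */
    V1(1),
    /** Hypothetical later format, shown only to illustrate independent versioning. */
    V2(2);

    private final int versionNumber;

    Version(int versionNumber) {
      this.versionNumber = versionNumber;
    }

    int getVersionNumber() {
      return versionNumber;
    }
  }
}

abstract class CountMinSketchSketch {
  enum Version {
    /** The count-min sketch binary format would be documented here, next to its constant. */
    V1(1);

    private final int versionNumber;

    Version(int versionNumber) {
      this.versionNumber = versionNumber;
    }

    int getVersionNumber() {
      return versionNumber;
    }
  }
}
```

The duplicated constructor and getter are the small cost mentioned in point 3; in exchange, `BloomFilterSketch.Version.V2` can exist without implying anything about the count-min sketch format.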




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10920




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174899235
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50801991
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BitArray.java ---
    @@ -32,13 +38,14 @@ static int numWords(long numBits) {
       }
     
       BitArray(long numBits) {
    -    if (numBits <= 0) {
    -      throw new IllegalArgumentException("numBits must be positive");
    -    }
    -    this.data = new long[numWords(numBits)];
    +    this(new long[numWords(numBits)]);
    +  }
    +
    +  private BitArray(long[] data) {
    +    this.data = data;
         long bitCount = 0;
    -    for (long value : data) {
    -      bitCount += Long.bitCount(value);
    +    for (long datum : data) {
    --- End diff --
    
    It is a little bit weird to say "datum" here, since each element actually holds 64 bits at once. Maybe "word"?
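
A minimal sketch of the constructor with the suggested naming, assuming the bit array is backed by a `long[]` as in the diff. `BitArraySketch` and `cardinality()` are hypothetical names used for illustration.

```java
// Illustrative sketch only; BitArraySketch is a hypothetical stand-in for BitArray.
final class BitArraySketch {
  private final long[] data;
  private final long bitCount;

  BitArraySketch(long[] data) {
    this.data = data;
    long bitCount = 0;
    // Each array element is a 64-bit word, so "word" describes it better than "datum".
    for (long word : data) {
      bitCount += Long.bitCount(word);
    }
    this.bitCount = bitCount;
  }

  // Number of bits currently set across all words.
  long cardinality() {
    return bitCount;
  }

  // Total capacity in bits.
  long bitSize() {
    return (long) data.length * Long.SIZE;
  }
}
```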




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174875525
  
    **[Test build #50081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50081/consoleFull)** for PR 10920 at commit [`4b05a35`](https://github.com/apache/spark/commit/4b05a35d58cdabccd915582894d303ba437bee0f).




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50805028
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/CountMinSketch.java ---
    @@ -67,13 +78,11 @@
           this.versionNumber = versionNumber;
         }
     
    -    public int getVersionNumber() {
    +    int getVersionNumber() {
           return versionNumber;
         }
       }
     
    -  public abstract Version version();
    --- End diff --
    
    cc @liancheng, I removed this because the design doc says users should not care about the version being used.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174885419
  
    **[Test build #50084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50084/consoleFull)** for PR 10920 at commit [`38d674c`](https://github.com/apache/spark/commit/38d674c99af18aaf807c120647efb16442b5a967).




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50801787
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Version.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.util.sketch;
    +
    +/**
    + * Version number of the serialized binary format for bloom filter or count-min sketch.
    + */
    +public enum Version {
    --- End diff --
    
    Bloom filter and count-min sketch can have different version values, but they can share the same Version class.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174889570
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50084/




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174901242
  
    I'm going to merge this first. Please move the number-of-hash-functions field in your next PR. Thanks.





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50905576
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java ---
    @@ -39,6 +43,28 @@
      * The implementation is largely based on the {@code BloomFilter} class from guava.
      */
     public abstract class BloomFilter {
    +
    +  public enum Version {
    +    /**
    +     * {@code BloomFilter} binary format version 1 (all values written in big-endian order):
    +     * - Version number, always 1 (32 bit)
    +     * - Total number of words of the underlying bit array (32 bit)
    +     * - The words/longs (numWords * 64 bit)
    +     * - Number of hash functions (32 bit)
    --- End diff --
    
    Nit: Scaladoc requires an extra space before `-` to form an unordered list. I'll fix this one in #10911.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174890369
  
    **[Test build #50086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50086/consoleFull)** for PR 10920 at commit [`c9b29c9`](https://github.com/apache/spark/commit/c9b29c94d6cbb6bde098e6d5b971de118be0218b).




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50802349
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilterImpl.java ---
    @@ -161,4 +194,24 @@ public BloomFilter mergeInPlace(BloomFilter other) throws IncompatibleMergeExcep
         this.bits.putAll(that.bits);
         return this;
       }
    +
    +  @Override
    +  public void writeTo(OutputStream out) throws IOException {
    +    DataOutputStream dos = new DataOutputStream(out);
    +
    +    dos.writeInt(Version.V1.getVersionNumber());
    +    bits.writeTo(dos);
    +    dos.writeInt(numHashFunctions);
    +  }
    +
    +  public static BloomFilterImpl readFrom(InputStream in) throws IOException {
    +    DataInputStream dis = new DataInputStream(in);
    +
    +    int version = dis.readInt();
    +    if (version != Version.V1.getVersionNumber()) {
    +      throw new IOException("Unexpected Bloom Filter version number (" + version + ")");
    --- End diff --
    
    Write this as either "BloomFilter" or "Bloom filter", not "Bloom Filter".




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174881345
  
    **[Test build #50081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50081/consoleFull)** for PR 10920 at commit [`4b05a35`](https://github.com/apache/spark/commit/4b05a35d58cdabccd915582894d303ba437bee0f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174899242
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50086/




[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10920#discussion_r50802512
  
    --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Version.java ---
    @@ -0,0 +1,35 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.util.sketch;
    +
    +/**
    + * Version number of the serialized binary format for bloom filter or count-min sketch.
    + */
    +public enum Version {
    --- End diff --
    
    cc @liancheng on point 1 - the best place to document the binary protocol is in Version!





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10920#issuecomment-174889568
  
    Merged build finished. Test PASSed.

