You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by hnfgns <gi...@git.apache.org> on 2015/07/08 20:26:24 UTC

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

GitHub user hnfgns opened a pull request:

    https://github.com/apache/drill/pull/81

    DRILL-3313: Eliminate redundant #load methods and unit-test loading & exporting of vectors

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hnfgns/incubator-drill DRILL-3313

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/81.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #81
    
----
commit ef79c18d408582dc07b83c8f62075d2329216bd1
Author: Hanifi Gunes <hg...@maprtech.com>
Date:   2015-06-23T19:05:45Z

    DRILL-3313: Eliminate redundant #load methods and unit-test loading & exporting of vectors

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35810091
  
    --- Diff: exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java ---
    @@ -119,29 +117,22 @@ public int getVarByteLength(){
       @Override
       public SerializedField getMetadata() {
         return getMetadataBuilder() //
    +             .addChild(offsetVector.getMetadata())
                  .setValueCount(getAccessor().getValueCount()) //
    -             .setVarByteLength(getVarByteLength()) //
    +//             .setVarByteLength(getVarByteLength()) //
    --- End diff --
    
    We should probably at least leave a comment as to why this was here originally and why it was removed and/or may re-enabled at some point. That or just remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35810016
  
    --- Diff: exec/java-exec/src/main/codegen/templates/NullableValueVectors.java ---
    @@ -239,37 +183,30 @@ public void allocateNew(int valueCount) {
         accessor.reset();
       }
     
    -  /**
    -   * {@inheritDoc}
    -   */
    +  @Override
       public void zeroVector() {
    -    this.values.zeroVector();
    -    this.bits.zeroVector();
    +    bits.zeroVector();
    +    values.zeroVector();
       }
    +  </#if>
     
    -  @Override
    -  public int load(int valueCount, DrillBuf buf){
    -    clear();
    -    int loaded = bits.load(valueCount, buf);
    -
    -    // remove bits part of buffer.
    -    buf = buf.slice(loaded, buf.capacity() - loaded);
    -    loaded += values.load(valueCount, buf);
    -    return loaded;
    -  }
     
       @Override
       public void load(SerializedField metadata, DrillBuf buffer) {
    -    assert this.field.matches(metadata);
    -    int loaded = load(metadata.getValueCount(), buffer);
    -    assert metadata.getBufferLength() == loaded;
    -  }
    +    clear();
    +    final SerializedField bitsField = metadata.getChild(0);
    --- End diff --
    
    I understand previously we weren't storing any metadata for the bit vector, but I think this addition could be a little clearer by at least commenting that these indexes are based on the order of the calls in getMetadataBuilder(). It would probably be worth it to add a comment up on the method itself that these additions are order sensitive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35811005
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/FixedWidthVector.java ---
    @@ -26,21 +25,8 @@
        *
        * @param valueCount   Number of values in the vector.
        */
    -  public void allocateNew(int valueCount);
    -
    -  /**
    -   * Load the records in the provided buffer based on the given number of values.
    -   * @param valueCount Number of values the buffer contains.
    -   * @param buf Incoming buffer.
    -   * @return The number of bytes of the buffer that were consumed.
    -   */
    -  public int load(int valueCount, DrillBuf buf);
    +  void allocateNew(int valueCount);
     
    +  void zeroVector();
     
    -  public abstract Mutator getMutator();
    -
    -  /**
    -   * Zero out the underlying buffer backing this vector.
    --- End diff --
    
    I think this comment should be left in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35809948
  
    --- Diff: exec/java-exec/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -156,21 +156,17 @@ public void zeroVector() {
       }
     
       @Override
    -  public int load(int valueCount, DrillBuf buf){
    +  public void load(SerializedField metadata, DrillBuf buffer) {
    +    final int actualLength = metadata.getBufferLength();
    +    final int valueCount = metadata.getValueCount();
    +    final int expectedLength = valueCount * ${type.width};
    +    assert actualLength == expectedLength : String.format("Expected to load %d bytes but actually loaded %d bytes", expectedLength, actualLength);
    +
         clear();
    -    int len = valueCount * ${type.width};
    -    data = buf.slice(0, len);
    +    data = buffer.slice(0, actualLength);
    +    data.writerIndex(actualLength);
         data.retain();
    -    data.writerIndex(len);
    -    return len;
    -  }
    -
    -  @Override
    -  public void load(SerializedField metadata, DrillBuf buffer) {
    -    assert this.field.matches(metadata) : String.format("The field %s doesn't match the provided metadata %s.", this.field, metadata);
    --- End diff --
    
    Are you sure we want to actually remove this assert, are we certain that it is no longer useful for catching inconsistent state problems?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35816854
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java ---
    @@ -100,7 +100,7 @@ public void setInitialCapacity(int numRecords) {
     
       @Override
       public int getValueCapacity() {
    -    final int offsetValueCapacity = offsets.getValueCapacity() - 1;
    +    final int offsetValueCapacity = Math.max(offsets.getValueCapacity() - 1, 0);
    --- End diff --
    
    Small comment, but there are two patterns used to handle special zero cases in the vector code. The one I have seen is actually for a slightly different case, the example copied below is out of the VarCharVector.setValueCount(int) method. It might be worth making the pattern consistent by using the ternary operator for both of these cases, so the fact that they are distinct is more obvious. This might seem a little contradictory, having them look totally different might be better. However, on first glance I thought this was a different way to express the same concept.
    
    valueCount == 0 ? 0 : valueCount+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35810123
  
    --- Diff: exec/java-exec/src/main/codegen/templates/VariableLengthVectors.java ---
    @@ -119,29 +117,22 @@ public int getVarByteLength(){
       @Override
       public SerializedField getMetadata() {
         return getMetadataBuilder() //
    +             .addChild(offsetVector.getMetadata())
                  .setValueCount(getAccessor().getValueCount()) //
    -             .setVarByteLength(getVarByteLength()) //
    +//             .setVarByteLength(getVarByteLength()) //
                  .setBufferLength(getBufferSize()) //
                  .build();
       }
     
    -  public int load(int dataBytes, int valueCount, DrillBuf buf){
    -    if(valueCount == 0){
    -      allocateNew(0,0);
    -      return 0;
    -    }
    -    clear();
    -    int loaded = offsetVector.load(valueCount+1, buf);
    -    data = buf.slice(loaded, dataBytes - loaded);
    -    data.retain();
    -    return  dataBytes;
    -  }
    -
       @Override
       public void load(SerializedField metadata, DrillBuf buffer) {
    -    assert this.field.matches(metadata) : String.format("The field %s doesn't match the provided metadata %s.", this.field, metadata);
    -    int loaded = load(metadata.getBufferLength(), metadata.getValueCount(), buffer);
    -    assert metadata.getBufferLength() == loaded : String.format("Expected to load %d bytes but actually loaded %d bytes", metadata.getBufferLength(), loaded);
    +    final SerializedField offsetField = metadata.getChild(0);
    --- End diff --
    
    Same as above in regards to these ordinals and their relationship to getMetadataBuilder().


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35810323
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/BitVector.java ---
    @@ -177,9 +167,16 @@ public boolean copyFromSafe(int inIndex, int outIndex, BitVector from) {
     
       @Override
       public void load(SerializedField metadata, DrillBuf buffer) {
    -    assert this.field.matches(metadata);
    -    int loaded = load(metadata.getValueCount(), buffer);
    -    assert metadata.getBufferLength() == loaded;
    +    assert field.matches(metadata);
    --- End diff --
    
    If you decide to add this back in the case I mentioned above where it was removed, it would probably be worth having them produce the same message when the assert fails.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by jaltekruse <gi...@git.apache.org>.
Github user jaltekruse commented on the pull request:

    https://github.com/apache/drill/pull/81#issuecomment-126091654
  
    Apologies for the forthcoming double comments. I forgot to make comments on the pull request and instead was viewing the individual commit. These comments are nicely aggregated on the github IU, but are not currently picked up the the JIRA bot to copy the comments to the issue report. I will be duplicating my comments on the same lines they were on the individual commit on the pull request summary diff to make sure they get copied over.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-3313: Eliminate redundant #load methods ...

Posted by hnfgns <gi...@git.apache.org>.
Github user hnfgns commented on a diff in the pull request:

    https://github.com/apache/drill/pull/81#discussion_r35994287
  
    --- Diff: exec/java-exec/src/main/codegen/templates/NullableValueVectors.java ---
    @@ -239,37 +183,30 @@ public void allocateNew(int valueCount) {
         accessor.reset();
       }
     
    -  /**
    -   * {@inheritDoc}
    -   */
    +  @Override
       public void zeroVector() {
    -    this.values.zeroVector();
    -    this.bits.zeroVector();
    +    bits.zeroVector();
    +    values.zeroVector();
       }
    +  </#if>
     
    -  @Override
    -  public int load(int valueCount, DrillBuf buf){
    -    clear();
    -    int loaded = bits.load(valueCount, buf);
    -
    -    // remove bits part of buffer.
    -    buf = buf.slice(loaded, buf.capacity() - loaded);
    -    loaded += values.load(valueCount, buf);
    -    return loaded;
    -  }
     
       @Override
       public void load(SerializedField metadata, DrillBuf buffer) {
    -    assert this.field.matches(metadata);
    -    int loaded = load(metadata.getValueCount(), buffer);
    -    assert metadata.getBufferLength() == loaded;
    -  }
    +    clear();
    +    final SerializedField bitsField = metadata.getChild(0);
    --- End diff --
    
    That would be a really nice addition to VV documentation. Thanks for pointing me this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---