Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2017/05/11 19:46:41 UTC

[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/832

    DRILL-5504: Vector validator to diagnose offset vector issues

    Validates offset vectors in VarChar and repeated vectors, including the
    special case of repeated VarChar vectors (two layers of offsets).
    
    Provides two new session variables to turn on validation. One enables
    the existing operator (iterator) validation; the other adds vector
    validation. This allows validation to occur in a “production” Drill
    without restarting Drill with assertions enabled, as previously required.
    
    Unit tests validate the validator. Another test validates the
    integration but requires manual steps, so it is ignored by default.
    
    This version is a first cut: all work is done within a single class,
    which allows back-porting to an earlier version to solve specific
    issues. A later revision should move some of the work into generated code
    (or refactor vectors to allow outside access), since offset vectors are
    defined on each subclass rather than on a base class that would allow
    generic operations.
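
To make the idea of offset vector validation concrete, here is a minimal sketch of the kind of structural checks involved. It is not the BatchValidator code from this pull request: the class name OffsetVectorCheck, its method names, and the use of plain int arrays in place of Drill's vector classes are all assumptions made for illustration. The sketch assumes that an offset vector for N values holds N + 1 entries, starts at 0, never decreases, and ends within the data buffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone check; not the BatchValidator from the PR. */
final class OffsetVectorCheck {

  /**
   * Checks one offset layer. Assumes valueCount values need valueCount + 1
   * offsets, the first offset is 0, offsets never decrease, and no offset
   * points past dataLength. Returns error messages (empty if valid).
   */
  static List<String> validate(int[] offsets, int valueCount, int dataLength) {
    List<String> errors = new ArrayList<>();
    if (valueCount == 0) {
      return errors; // nothing to check for an empty batch in this sketch
    }
    if (offsets.length < valueCount + 1) {
      errors.add("Offset vector has " + offsets.length
          + " entries, expected at least " + (valueCount + 1));
      return errors;
    }
    if (offsets[0] != 0) {
      errors.add("Offset (0) must be 0 but was " + offsets[0]);
    }
    int prev = offsets[0];
    for (int i = 1; i <= valueCount; i++) {
      int offset = offsets[i];
      if (offset < prev) {
        errors.add("Decreasing offsets at (" + (i - 1) + ", " + i
            + ") = (" + prev + ", " + offset + ")");
      } else if (offset > dataLength) {
        errors.add("Offset (" + i + ") = " + offset
            + " is past the end of the data: " + dataLength);
      }
      prev = offset;
    }
    return errors;
  }

  /**
   * Two layers, as in a repeated VarChar: row offsets index into the value
   * offsets, and value offsets index into the bytes.
   */
  static List<String> validateRepeated(int[] rowOffsets, int rowCount,
                                       int[] valueOffsets, int dataLength) {
    int maxValues = Math.max(0, valueOffsets.length - 1);
    List<String> errors = validate(rowOffsets, rowCount, maxValues);
    if (!errors.isEmpty() || rowCount == 0) {
      return errors; // the inner layer cannot be trusted if the outer is bad
    }
    int valueCount = rowOffsets[rowCount]; // total values across all rows
    errors.addAll(validate(valueOffsets, valueCount, dataLength));
    return errors;
  }
}
```

For example, the offsets {0, 3, 5, 9} describe three values of lengths 3, 2, and 4 in a 9-byte buffer, so validate(new int[] {0, 3, 5, 9}, 3, 9) reports no errors in this sketch. Returning a list of messages rather than throwing on the first problem lets a caller decide whether to log, count, or fail.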

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5504

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/832.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #832
    
----
commit 175e592419ca6bda1fd0259cc42b033616facc3d
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-05-11T19:46:15Z

    DRILL-5504: Vector validator to diagnose offset vector issues
    

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/832



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116094668
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + ******************************************************************************/
    +package org.apache.drill.exec.physical.impl.validate;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.exec.record.SimpleVectorWrapper;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.BaseDataValueVector;
    +import org.apache.drill.exec.vector.FixedWidthVector;
    +import org.apache.drill.exec.vector.NullableVarCharVector;
    +import org.apache.drill.exec.vector.NullableVector;
    +import org.apache.drill.exec.vector.RepeatedVarCharVector;
    +import org.apache.drill.exec.vector.UInt4Vector;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VarCharVector;
    +import org.apache.drill.exec.vector.VariableWidthVector;
    +import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
    +import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
    +
    +
    +/**
    + * Validate a batch of value vectors. It is not possible to validate the
    + * data, but we can validate the structure, especially offset vectors.
    + * Only handles single (non-hyper) vectors at present. Current form is
    + * self-contained. Better checks can be done by moving checks inside
    + * vectors or by exposing more metadata from vectors.
    + */
    +
    +public class BatchValidator {
    +  private static final org.slf4j.Logger logger =
    +      org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
    +
    +  public static final int MAX_ERRORS = 100;
    +
    +  private final int rowCount;
    +  private final VectorAccessible batch;
    +  private final List<String> errorList;
    +  private int errorCount;
    +
    +  public BatchValidator(VectorAccessible batch) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    errorList = null;
    +  }
    +
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    --- End diff --
    
    Just a thought. Is there a way to enable these checks (and fail if invalid) for pre-commit tests as well?



[GitHub] drill issue #832: DRILL-5504: Vector validator to diagnose offset vector iss...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/832
  
    Fixed typo in log message and rebased onto latest master.



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116092613
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + ******************************************************************************/
    +package org.apache.drill.exec.physical.impl.validate;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.exec.record.SimpleVectorWrapper;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.BaseDataValueVector;
    +import org.apache.drill.exec.vector.FixedWidthVector;
    +import org.apache.drill.exec.vector.NullableVarCharVector;
    +import org.apache.drill.exec.vector.NullableVector;
    +import org.apache.drill.exec.vector.RepeatedVarCharVector;
    +import org.apache.drill.exec.vector.UInt4Vector;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VarCharVector;
    +import org.apache.drill.exec.vector.VariableWidthVector;
    +import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
    +import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
    +
    +
    +/**
    + * Validate a batch of value vectors. It is not possible to validate the
    + * data, but we can validate the structure, especially offset vectors.
    + * Only handles single (non-hyper) vectors at present. Current form is
    + * self-contained. Better checks can be done by moving checks inside
    + * vectors or by exposing more metadata from vectors.
    + */
    +
    +public class BatchValidator {
    +  private static final org.slf4j.Logger logger =
    +      org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
    +
    +  public static final int MAX_ERRORS = 100;
    +
    +  private final int rowCount;
    +  private final VectorAccessible batch;
    +  private final List<String> errorList;
    +  private int errorCount;
    +
    +  public BatchValidator(VectorAccessible batch) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    errorList = null;
    +  }
    +
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    +    for (VectorWrapper<? extends ValueVector> w : batch) {
    +      validateWrapper(w);
    +    }
    +  }
    +
    +  private void validateWrapper(VectorWrapper<? extends ValueVector> w) {
    +    if (w instanceof SimpleVectorWrapper) {
    +      validateVector(w.getValueVector());
    +    }
    --- End diff --
    
    You mentioned above that HyperVectorWrapper is not validated. Can you open a ticket for the functionality still to be implemented in this validator?



[GitHub] drill issue #832: DRILL-5504: Vector validator to diagnose offset vector iss...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/832
  
    +1
    
    Please squash the commits.



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116091914
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -449,4 +449,19 @@
       String PERSISTENT_TABLE_UMASK = "exec.persistent_table.umask";
       StringValidator PERSISTENT_TABLE_UMASK_VALIDATOR = new StringValidator(PERSISTENT_TABLE_UMASK, "002");
     
    +  /**
    +   * When iterator validation is enabled, additionally validates the vectors in
    +   * each batch passed to each iterator.
    +   */
    +  String ENABLE_VECTOR_VALIDATION = "debug.validate_vectors";
    +  BooleanValidator ENABLE_VECTOR_VALIDATOR = new BooleanValidator(ENABLE_VECTOR_VALIDATION, true);
    --- End diff --
    
    This should be false by default, here and below.



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r117362498
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + ******************************************************************************/
    +package org.apache.drill.exec.physical.impl.validate;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.exec.record.SimpleVectorWrapper;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.BaseDataValueVector;
    +import org.apache.drill.exec.vector.FixedWidthVector;
    +import org.apache.drill.exec.vector.NullableVarCharVector;
    +import org.apache.drill.exec.vector.NullableVector;
    +import org.apache.drill.exec.vector.RepeatedVarCharVector;
    +import org.apache.drill.exec.vector.UInt4Vector;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VarCharVector;
    +import org.apache.drill.exec.vector.VariableWidthVector;
    +import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
    +import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
    +
    +
    +/**
    + * Validate a batch of value vectors. It is not possible to validate the
    + * data, but we can validate the structure, especially offset vectors.
    + * Only handles single (non-hyper) vectors at present. Current form is
    + * self-contained. Better checks can be done by moving checks inside
    + * vectors or by exposing more metadata from vectors.
    + */
    +
    +public class BatchValidator {
    +  private static final org.slf4j.Logger logger =
    +      org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
    +
    +  public static final int MAX_ERRORS = 100;
    +
    +  private final int rowCount;
    +  private final VectorAccessible batch;
    +  private final List<String> errorList;
    +  private int errorCount;
    +
    +  public BatchValidator(VectorAccessible batch) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    errorList = null;
    +  }
    +
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    +    for (VectorWrapper<? extends ValueVector> w : batch) {
    +      validateWrapper(w);
    +    }
    +  }
    +
    +  private void validateWrapper(VectorWrapper<? extends ValueVector> w) {
    +    if (w instanceof SimpleVectorWrapper) {
    +      validateVector(w.getValueVector());
    +    }
    --- End diff --
    
    Done. See DRILL-5526.



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r117358723
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -449,4 +449,19 @@
       String PERSISTENT_TABLE_UMASK = "exec.persistent_table.umask";
       StringValidator PERSISTENT_TABLE_UMASK_VALIDATOR = new StringValidator(PERSISTENT_TABLE_UMASK, "002");
     
    +  /**
    +   * When iterator validation is enabled, additionally validates the vectors in
    +   * each batch passed to each iterator.
    +   */
    +  String ENABLE_VECTOR_VALIDATION = "debug.validate_vectors";
    +  BooleanValidator ENABLE_VECTOR_VALIDATOR = new BooleanValidator(ENABLE_VECTOR_VALIDATION, true);
    --- End diff --
    
    Good catch. Fixed.
    
    But that error accidentally caught a bug...



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r116092232
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ImplCreator.java ---
    @@ -69,9 +70,18 @@ public static RootExec getExec(FragmentContext context, FragmentRoot root) throw
         Preconditions.checkNotNull(root);
         Preconditions.checkNotNull(context);
     
    -    if (AssertionUtil.isAssertionsEnabled()) {
    +    // Enable iterator (operator) validation if assertions are enabled (debug mode)
    +    // or if in production mode and the ENABLE_ITERATOR_VALIDATION option is set
    +    // to true.
    +
    +    boolean enableValidation = AssertionUtil.isAssertionsEnabled();
    +    if (! enableValidation) {
    +      enableValidation = context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR);
    +    }
    +    if (enableValidation) {
    --- End diff --
    
    ```
    if (AssertionUtil.isAssertionsEnabled() ||
        context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR)) { ... }
    ```
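
The suggestion above relies on the two-step form in the diff and a single short-circuit `||` expression being equivalent. A minimal, self-contained sketch of that equivalence follows. The class ValidationSwitch and its method names are hypothetical, and assertionsEnabled() uses the common assert side-effect idiom as an assumed stand-in for AssertionUtil.isAssertionsEnabled(), whose implementation is not shown in this thread.

```java
/** Hypothetical stand-in for the decision made in ImplCreator. */
final class ValidationSwitch {

  /** Detects whether the JVM was started with assertions enabled (-ea). */
  static boolean assertionsEnabled() {
    boolean enabled = false;
    assert enabled = true; // the assignment only runs when assertions are on
    return enabled;
  }

  /** Two-step form, mirroring the structure of the code in the diff. */
  static boolean twoStep(boolean assertionsOn, boolean optionSet) {
    boolean enableValidation = assertionsOn;
    if (!enableValidation) {
      enableValidation = optionSet;
    }
    return enableValidation;
  }

  /** Single-expression form, mirroring the review suggestion. */
  static boolean oneExpression(boolean assertionsOn, boolean optionSet) {
    // Short-circuit: the option is only consulted when assertions are off.
    return assertionsOn || optionSet;
  }
}
```

Both forms agree for all four input combinations, and in both the option lookup is skipped when assertions are already enabled.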



[GitHub] drill issue #832: DRILL-5504: Vector validator to diagnose offset vector iss...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/832
  
    Commits squashed.



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r117359430
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ImplCreator.java ---
    @@ -69,9 +70,18 @@ public static RootExec getExec(FragmentContext context, FragmentRoot root) throw
         Preconditions.checkNotNull(root);
         Preconditions.checkNotNull(context);
     
    -    if (AssertionUtil.isAssertionsEnabled()) {
    +    // Enable iterator (operator) validation if assertions are enabled (debug mode)
    +    // or if in production mode and the ENABLE_ITERATOR_VALIDATION option is set
    +    // to true.
    +
    +    boolean enableValidation = AssertionUtil.isAssertionsEnabled();
    +    if (! enableValidation) {
    +      enableValidation = context.getOptionSet().getOption(ExecConstants.ENABLE_ITERATOR_VALIDATOR);
    +    }
    +    if (enableValidation) {
    --- End diff --
    
    Fixed.



[GitHub] drill pull request #832: DRILL-5504: Vector validator to diagnose offset vec...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/832#discussion_r117361619
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java ---
    @@ -0,0 +1,205 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + ******************************************************************************/
    +package org.apache.drill.exec.physical.impl.validate;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.exec.record.SimpleVectorWrapper;
    +import org.apache.drill.exec.record.VectorAccessible;
    +import org.apache.drill.exec.record.VectorWrapper;
    +import org.apache.drill.exec.vector.BaseDataValueVector;
    +import org.apache.drill.exec.vector.FixedWidthVector;
    +import org.apache.drill.exec.vector.NullableVarCharVector;
    +import org.apache.drill.exec.vector.NullableVector;
    +import org.apache.drill.exec.vector.RepeatedVarCharVector;
    +import org.apache.drill.exec.vector.UInt4Vector;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VarCharVector;
    +import org.apache.drill.exec.vector.VariableWidthVector;
    +import org.apache.drill.exec.vector.complex.BaseRepeatedValueVector;
    +import org.apache.drill.exec.vector.complex.RepeatedFixedWidthVectorLike;
    +
    +
    +/**
    + * Validate a batch of value vectors. It is not possible to validate the
    + * data, but we can validate the structure, especially offset vectors.
    + * Only handles single (non-hyper) vectors at present. Current form is
    + * self-contained. Better checks can be done by moving checks inside
    + * vectors or by exposing more metadata from vectors.
    + */
    +
    +public class BatchValidator {
    +  private static final org.slf4j.Logger logger =
    +      org.slf4j.LoggerFactory.getLogger(BatchValidator.class);
    +
    +  public static final int MAX_ERRORS = 100;
    +
    +  private final int rowCount;
    +  private final VectorAccessible batch;
    +  private final List<String> errorList;
    +  private int errorCount;
    +
    +  public BatchValidator(VectorAccessible batch) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    errorList = null;
    +  }
    +
    +  public BatchValidator(VectorAccessible batch, boolean captureErrors) {
    +    rowCount = batch.getRecordCount();
    +    this.batch = batch;
    +    if (captureErrors) {
    +      errorList = new ArrayList<>();
    +    } else {
    +      errorList = null;
    +    }
    +  }
    +
    +  public void validate() {
    --- End diff --
    
    Great idea! I added a config option that forces vector validation. To enable it, add the following to the Surefire options in the pom.xml file:
    
    {code}
    -Ddrill.exec.debug.validate_vectors=true
    {code}
    
    I will try this out and enable the checks in a separate JIRA ticket and PR.
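
A `-D` argument of this form sets a JVM system property for the forked test process. The sketch below shows one way such a property could be read from Java. It is an illustration only: the class ValidationFlag is hypothetical, and how Drill's own configuration system consumes the setting is not shown in this thread.

```java
/** Hypothetical reader for the flag; not Drill's config mechanism. */
final class ValidationFlag {

  /** Property name taken from the -D option quoted above. */
  static final String KEY = "drill.exec.debug.validate_vectors";

  /** True only if the system property exists and equals "true". */
  static boolean isSet() {
    return Boolean.getBoolean(KEY);
  }
}
```

With this approach an unset property reads as false, which matches the reviewer's request that validation be off by default.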

