You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by prasadns14 <gi...@git.apache.org> on 2017/09/11 01:27:58 UTC

[GitHub] drill pull request #939: DRILL-5550: Missing CSV column value set to null

GitHub user prasadns14 opened a pull request:

    https://github.com/apache/drill/pull/939

    DRILL-5550: Missing CSV column value set to null

    1) Changed the CSV datatype to nullable VARCHAR
    2) Missing CSV column value set to null
    3) Unit test added

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/prasadns14/drill DRILL-5550

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/939.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #939
    
----
commit 0171696142c3d1843b92dae61d21c190b7913c01
Author: Prasad Subramanya <pr...@psubram-1010.local>
Date:   2017-09-11T01:23:06Z

    DRILL-5550: Missing CSV column value set to null

----


---

[GitHub] drill pull request #939: DRILL-5550: Missing CSV column value set to null

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/939#discussion_r138685942
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/FieldVarCharOutput.java ---
    @@ -45,11 +45,11 @@
       static final String COL_NAME = "columns";
     
       // array of output vector
    -  private final VarCharVector [] vectors;
    +  private final NullableVarCharVector [] vectors;
    --- End diff --
    
    Let's think about this. CSV is defined as a fixed set of columns. For this reason, the original code used non-nullable values for each column.
    
    The question would be, what about this fix means that defined columns should now become null? Are there other solutions to this bug?
    
    Changing the column from required to nullable can have impact on the user experience and on the client: where code once expected required columns, now that code must change to handle nullable types.


---

[GitHub] drill pull request #939: DRILL-5550: Missing CSV column value set to null

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/939#discussion_r138687170
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/FieldVarCharOutput.java ---
    @@ -211,10 +210,10 @@ public void finishRecord() {
           endField();
         }
     
    -    // Fill in null (really empty) values.
    +    // Fill in null values
     
         for (int i = 0; i < nullCols.length; i++) {
    -      vectors[nullCols[i]].getMutator().setSafe(recordCount, nullValue, 0, 0);
    +      vectors[nullCols[i]].getMutator().setNull(recordCount);
    --- End diff --
    
    Here we are combining null columns and data columns into a single array. Perhaps we should have an array of projected columns (non-null), and a separate array of null columns (optional varChar).
    
    Note that "memory fragmentation" project has fixed this in the general case. In that solution, the solution is similar to that described above: projected columns are required, missing columns are nullable.


---

[GitHub] drill issue #939: DRILL-5550: Missing CSV column value set to null

Posted by prasadns14 <gi...@git.apache.org>.
Github user prasadns14 commented on the issue:

    https://github.com/apache/drill/pull/939
  
    @paul-rogers Can you please review the PR?


---