You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2020/11/06 16:23:50 UTC

[GitHub] [orc] pgaref opened a new pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

pgaref opened a new pull request #568:
URL: https://github.com/apache/orc/pull/568


   ### What changes were proposed in this pull request?
   Orc type conversion is throwing ArrayIndexOutOfBoundsException when using ColumnVectors larger than 1024.
   The issue is originating in ConvertTreeReaderFactory where we convert types always using the default len instead of the specific batch size.
   
   
   
   
   ### Why are the changes needed?
   This PR takes into account batchSize when converting types and adds tests for all possible type conversions (with large batchSize) as part of TestConvertTreeReaderFactory.
   
   
   ### How was this patch tested?
   TestConvertTreeReaderFactory
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723183635


   Thank you, @pgaref . According to the JIRA, is this a bug for 1.5/1.6/1.7?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref edited a comment on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref edited a comment on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723187458


   > Thank you, @pgaref . According to the JIRA, is this a bug for 1.5/1.6/1.7?
   
   Hey @dongjoon-hyun  -- thats correct.
   This has to be backported along with ORC-598 (which for some reason never made it to 1.5 and 1.6 branches)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723289411


   Got it. Thanks, @pgaref .
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519735828



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -32,29 +34,105 @@
 import org.apache.orc.OrcFile;
 import org.apache.orc.Reader;
 import org.apache.orc.RecordReader;
-import org.apache.orc.TestVectorOrcFile;
+import org.apache.orc.TestProlepticConversions;
 import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.junit.Before;
+import org.junit.Rule;
 import org.junit.Test;
+import org.junit.rules.TestName;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.sql.Timestamp;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
+import java.util.GregorianCalendar;
+import java.util.concurrent.TimeUnit;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;

Review comment:
       Hey @dongjoon-hyun -- you are right, seems that we dont follow a consistent import layout across the project. Switched to java -> others -> static (that seems to be followed in most of the src files)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519737609



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -65,30 +143,163 @@ public void testArraySizeBiggerThan1024AndConvertToDecimal() throws Exception {
     assertEquals(batch.cols.length, 1);
     assertTrue(batch.cols[0] instanceof ListColumnVector);
     assertEquals(((ListColumnVector) batch.cols[0]).child.getClass(), expectedColumnType);
-    return (TExpectedColumn) ((ListColumnVector) batch.cols[0]).child;
+    return (TExpectedColumnVector) ((ListColumnVector) batch.cols[0]).child;
+  }
+
+  public void testConvertToDecimal() throws Exception {
+    Decimal64ColumnVector columnVector =
+        readORCFileWithLargeArray("decimal(6,1)", Decimal64ColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToVarchar() throws Exception {
+    BytesColumnVector columnVector = readORCFileWithLargeArray("varchar(10)", BytesColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToDouble() throws Exception {
+    DoubleColumnVector columnVector = readORCFileWithLargeArray("double", DoubleColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToInteger() throws Exception {
+    LongColumnVector columnVector = readORCFileWithLargeArray("int", LongColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToTimestamp() throws Exception {
+    TimestampColumnVector columnVector =
+        readORCFileWithLargeArray("timestamp", TimestampColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.time.length);
+  }
+
+  public void testConvertToDate() throws Exception {
+    DateColumnVector columnVector = readORCFileWithLargeArray("date", DateColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
   }
 
   @Test
-  public void testArraySizeBiggerThan1024AndConvertToVarchar() throws Exception {
-    BytesColumnVector columnVector = testArraySizeBiggerThan1024("varchar(10)", BytesColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  public void testDecimalArrayBiggerThanDefault() throws Exception {
+    String typeStr = "decimal(6,1)";
+    Class typeClass = DecimalColumnVector.class;
+
+    TypeDescription schema = TypeDescription.fromString("struct<col1:array<" + typeStr + ">>");
+    createORCFileWithLargeArray(schema, typeClass, typeClass.equals(Decimal64ColumnVector.class));
+    // Test all possible conversions
+    testConvertToDecimal();
+    testConvertToVarchar();
+    testConvertToDouble();
+    testConvertToInteger();
+    testConvertToTimestamp();
+
+    // Make sure we delete file across tests

Review comment:
       Right, added finally to make sure other tests run in case of a failure (not catching the exception is intentional for testing)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519490364



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -65,30 +143,163 @@ public void testArraySizeBiggerThan1024AndConvertToDecimal() throws Exception {
     assertEquals(batch.cols.length, 1);
     assertTrue(batch.cols[0] instanceof ListColumnVector);
     assertEquals(((ListColumnVector) batch.cols[0]).child.getClass(), expectedColumnType);
-    return (TExpectedColumn) ((ListColumnVector) batch.cols[0]).child;
+    return (TExpectedColumnVector) ((ListColumnVector) batch.cols[0]).child;
+  }
+
+  public void testConvertToDecimal() throws Exception {
+    Decimal64ColumnVector columnVector =
+        readORCFileWithLargeArray("decimal(6,1)", Decimal64ColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToVarchar() throws Exception {
+    BytesColumnVector columnVector = readORCFileWithLargeArray("varchar(10)", BytesColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToDouble() throws Exception {
+    DoubleColumnVector columnVector = readORCFileWithLargeArray("double", DoubleColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToInteger() throws Exception {
+    LongColumnVector columnVector = readORCFileWithLargeArray("int", LongColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToTimestamp() throws Exception {
+    TimestampColumnVector columnVector =
+        readORCFileWithLargeArray("timestamp", TimestampColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.time.length);
+  }
+
+  public void testConvertToDate() throws Exception {
+    DateColumnVector columnVector = readORCFileWithLargeArray("date", DateColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
   }
 
   @Test
-  public void testArraySizeBiggerThan1024AndConvertToVarchar() throws Exception {
-    BytesColumnVector columnVector = testArraySizeBiggerThan1024("varchar(10)", BytesColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  public void testDecimalArrayBiggerThanDefault() throws Exception {
+    String typeStr = "decimal(6,1)";
+    Class typeClass = DecimalColumnVector.class;
+
+    TypeDescription schema = TypeDescription.fromString("struct<col1:array<" + typeStr + ">>");
+    createORCFileWithLargeArray(schema, typeClass, typeClass.equals(Decimal64ColumnVector.class));
+    // Test all possible conversions
+    testConvertToDecimal();
+    testConvertToVarchar();
+    testConvertToDouble();
+    testConvertToInteger();
+    testConvertToTimestamp();
+
+    // Make sure we delete file across tests
+    fs.delete(testFilePath, false);
   }
 
   @Test
-  public void testArraySizeBiggerThan1024AndConvertToDouble() throws Exception {
-    DoubleColumnVector columnVector = testArraySizeBiggerThan1024("double", DoubleColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  public void testDecimal64ArrayBiggerThanDefault() throws Exception {
+    String typeStr = "decimal(6,1)";
+    Class typeClass = Decimal64ColumnVector.class;
+
+    TypeDescription schema = TypeDescription.fromString("struct<col1:array<" + typeStr + ">>");
+    createORCFileWithLargeArray(schema, typeClass, typeClass.equals(Decimal64ColumnVector.class));
+    // Test all possible conversions
+    testConvertToDecimal();
+    testConvertToVarchar();
+    testConvertToDouble();
+    testConvertToInteger();
+    testConvertToTimestamp();
+
+    // Make sure we delete file across tests
+    fs.delete(testFilePath, false);

Review comment:
       ditto. `try ... catch ... finally`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-724307042


   Merged to master/1.6.
   
   Could you make a backporting PR to branch-1.5, too?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519488593



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -32,29 +34,105 @@
 import org.apache.orc.OrcFile;
 import org.apache.orc.Reader;
 import org.apache.orc.RecordReader;
-import org.apache.orc.TestVectorOrcFile;
+import org.apache.orc.TestProlepticConversions;
 import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.junit.Before;
+import org.junit.Rule;
 import org.junit.Test;
+import org.junit.rules.TestName;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.sql.Timestamp;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
+import java.util.GregorianCalendar;
+import java.util.concurrent.TimeUnit;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;

Review comment:
       Hi, @pgaref . Although this is a test file, the `import` statement order looks a little strange to me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-724315353


   Thank you always!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519737833



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -65,30 +143,163 @@ public void testArraySizeBiggerThan1024AndConvertToDecimal() throws Exception {
     assertEquals(batch.cols.length, 1);
     assertTrue(batch.cols[0] instanceof ListColumnVector);
     assertEquals(((ListColumnVector) batch.cols[0]).child.getClass(), expectedColumnType);
-    return (TExpectedColumn) ((ListColumnVector) batch.cols[0]).child;
+    return (TExpectedColumnVector) ((ListColumnVector) batch.cols[0]).child;
+  }
+
+  public void testConvertToDecimal() throws Exception {
+    Decimal64ColumnVector columnVector =
+        readORCFileWithLargeArray("decimal(6,1)", Decimal64ColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToVarchar() throws Exception {
+    BytesColumnVector columnVector = readORCFileWithLargeArray("varchar(10)", BytesColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToDouble() throws Exception {
+    DoubleColumnVector columnVector = readORCFileWithLargeArray("double", DoubleColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToInteger() throws Exception {
+    LongColumnVector columnVector = readORCFileWithLargeArray("int", LongColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToTimestamp() throws Exception {
+    TimestampColumnVector columnVector =
+        readORCFileWithLargeArray("timestamp", TimestampColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.time.length);
+  }
+
+  public void testConvertToDate() throws Exception {
+    DateColumnVector columnVector = readORCFileWithLargeArray("date", DateColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
   }
 
   @Test
-  public void testArraySizeBiggerThan1024AndConvertToVarchar() throws Exception {
-    BytesColumnVector columnVector = testArraySizeBiggerThan1024("varchar(10)", BytesColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  public void testDecimalArrayBiggerThanDefault() throws Exception {
+    String typeStr = "decimal(6,1)";
+    Class typeClass = DecimalColumnVector.class;
+
+    TypeDescription schema = TypeDescription.fromString("struct<col1:array<" + typeStr + ">>");
+    createORCFileWithLargeArray(schema, typeClass, typeClass.equals(Decimal64ColumnVector.class));
+    // Test all possible conversions

Review comment:
       sure, updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref edited a comment on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref edited a comment on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723187458


   > Thank you, @pgaref . According to the JIRA, is this a bug for 1.5/1.6/1.7?
   
   Hey @dongjoon-hyun  -- thats correct.
   This has to be backported along with ORC-598 (which for some reason never made it to 1.5 and 1.6 branches)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723183072


   cc @omalley , @wgtmac 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519998678



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -32,29 +43,96 @@
 import org.apache.orc.OrcFile;
 import org.apache.orc.Reader;
 import org.apache.orc.RecordReader;
-import org.apache.orc.TestVectorOrcFile;
+import org.apache.orc.TestProlepticConversions;
 import org.apache.orc.TypeDescription;
+import org.apache.orc.Writer;
+import org.junit.Before;
+import org.junit.Rule;
 import org.junit.Test;
+import org.junit.rules.TestName;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 public class TestConvertTreeReaderFactory {
 
-  @Test
-  public void testArraySizeBiggerThan1024AndConvertToDecimal() throws Exception {
-    Decimal64ColumnVector columnVector = testArraySizeBiggerThan1024("decimal(6,1)", Decimal64ColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  private Path workDir =
+      new Path(System.getProperty("test.tmp.dir", "target" + File.separator + "test" + File.separator + "tmp"));
+
+  private Configuration conf;
+  private FileSystem fs;
+  private Path testFilePath;
+  private int LARGE_BATCH_SIZE;
+
+  @Rule
+  public TestName testCaseName = new TestName();
+
+  @Before
+  public void setupPath() throws Exception {
+    // Default CV length is 1024
+    this.LARGE_BATCH_SIZE = 1030;
+    this.conf = new Configuration();
+    this.fs = FileSystem.getLocal(conf);
+    this.testFilePath = new Path(workDir, TestWriterImpl.class.getSimpleName() + testCaseName.getMethodName().
+        replaceFirst("\\[[0-9]+]", "") + ".orc");
+    fs.delete(testFilePath, false);
+  }
+
+  public <TExpectedColumnVector extends ColumnVector> TExpectedColumnVector createORCFileWithLargeArray(
+      TypeDescription schema, Class<TExpectedColumnVector> expectedColumnType, boolean useDecimal64)
+      throws IOException, ParseException {
+    conf = new Configuration();
+    fs = FileSystem.getLocal(conf);
+    fs.setWorkingDirectory(workDir);
+    Writer w = OrcFile.createWriter(testFilePath, OrcFile.writerOptions(conf).setSchema(schema));
+
+    SimpleDateFormat dateFormat = TestProlepticConversions.createParser("yyyy-MM-dd", new GregorianCalendar());
+    VectorizedRowBatch batch = schema.createRowBatch(
+        useDecimal64 ? TypeDescription.RowBatchVersion.USE_DECIMAL64 : TypeDescription.RowBatchVersion.ORIGINAL,
+        LARGE_BATCH_SIZE);
+
+    ListColumnVector listCol = (ListColumnVector) batch.cols[0];
+    TExpectedColumnVector dcv = (TExpectedColumnVector) (listCol).child;
+    batch.size = 1;
+    for (int row = 0; row < LARGE_BATCH_SIZE; ++row) {
+      if (dcv instanceof DecimalColumnVector) {
+        ((DecimalColumnVector) dcv).set(row, HiveDecimal.create(row * 2 + 1));
+      } else if (dcv instanceof DoubleColumnVector) {
+        ((DoubleColumnVector) dcv).vector[row] = row * 2 + 1;
+      } else if (dcv instanceof BytesColumnVector) {
+        ((BytesColumnVector) dcv).setVal(row, ((row * 2 + 1) + "").getBytes(StandardCharsets.UTF_8));
+      } else if (dcv instanceof LongColumnVector) {
+        ((LongColumnVector) dcv).vector[row] = row * 2 + 1;
+      } else if (dcv instanceof TimestampColumnVector) {
+        ((TimestampColumnVector) dcv).set(row, Timestamp.valueOf((1900 + row) + "-04-01 12:34:56.9"));
+      } else if (dcv instanceof DateColumnVector) {
+        String date = String.format("%04d-01-23", row * 2 + 1);
+        ((DateColumnVector) dcv).vector[row] = TimeUnit.MILLISECONDS.toDays(dateFormat.parse(date).getTime());
+      } else {
+        throw new IllegalStateException("Writing File with a large array of: "+ expectedColumnType + " not supported!");

Review comment:
       `of:` -> `of` 
   `not supported!` -> `is not supported!`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #568:
URL: https://github.com/apache/orc/pull/568


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r520043679



##########
File path: java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java
##########
@@ -1396,7 +1396,7 @@ public void nextVector(ColumnVector previousVector,
                            FilterContext filterContext) throws IOException {
       if (inBytesColVector == null) {
         // Allocate column vector for file; cast column vector for reader.
-        inBytesColVector = new BytesColumnVector();
+        inBytesColVector = new BytesColumnVector(batchSize);

Review comment:
       @pgaref . Could you add a test coverage for this change? It seems that we are missing `Varchar` to `Binary` conversion test case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519490307



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -65,30 +143,163 @@ public void testArraySizeBiggerThan1024AndConvertToDecimal() throws Exception {
     assertEquals(batch.cols.length, 1);
     assertTrue(batch.cols[0] instanceof ListColumnVector);
     assertEquals(((ListColumnVector) batch.cols[0]).child.getClass(), expectedColumnType);
-    return (TExpectedColumn) ((ListColumnVector) batch.cols[0]).child;
+    return (TExpectedColumnVector) ((ListColumnVector) batch.cols[0]).child;
+  }
+
+  public void testConvertToDecimal() throws Exception {
+    Decimal64ColumnVector columnVector =
+        readORCFileWithLargeArray("decimal(6,1)", Decimal64ColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToVarchar() throws Exception {
+    BytesColumnVector columnVector = readORCFileWithLargeArray("varchar(10)", BytesColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToDouble() throws Exception {
+    DoubleColumnVector columnVector = readORCFileWithLargeArray("double", DoubleColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToInteger() throws Exception {
+    LongColumnVector columnVector = readORCFileWithLargeArray("int", LongColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToTimestamp() throws Exception {
+    TimestampColumnVector columnVector =
+        readORCFileWithLargeArray("timestamp", TimestampColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.time.length);
+  }
+
+  public void testConvertToDate() throws Exception {
+    DateColumnVector columnVector = readORCFileWithLargeArray("date", DateColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
   }
 
   @Test
-  public void testArraySizeBiggerThan1024AndConvertToVarchar() throws Exception {
-    BytesColumnVector columnVector = testArraySizeBiggerThan1024("varchar(10)", BytesColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  public void testDecimalArrayBiggerThanDefault() throws Exception {
+    String typeStr = "decimal(6,1)";
+    Class typeClass = DecimalColumnVector.class;
+
+    TypeDescription schema = TypeDescription.fromString("struct<col1:array<" + typeStr + ">>");
+    createORCFileWithLargeArray(schema, typeClass, typeClass.equals(Decimal64ColumnVector.class));
+    // Test all possible conversions
+    testConvertToDecimal();
+    testConvertToVarchar();
+    testConvertToDouble();
+    testConvertToInteger();
+    testConvertToTimestamp();
+
+    // Make sure we delete file across tests

Review comment:
       Without `try ... catch .. finally`, this will fail when one of the test fails.

##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -65,30 +143,163 @@ public void testArraySizeBiggerThan1024AndConvertToDecimal() throws Exception {
     assertEquals(batch.cols.length, 1);
     assertTrue(batch.cols[0] instanceof ListColumnVector);
     assertEquals(((ListColumnVector) batch.cols[0]).child.getClass(), expectedColumnType);
-    return (TExpectedColumn) ((ListColumnVector) batch.cols[0]).child;
+    return (TExpectedColumnVector) ((ListColumnVector) batch.cols[0]).child;
+  }
+
+  public void testConvertToDecimal() throws Exception {
+    Decimal64ColumnVector columnVector =
+        readORCFileWithLargeArray("decimal(6,1)", Decimal64ColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToVarchar() throws Exception {
+    BytesColumnVector columnVector = readORCFileWithLargeArray("varchar(10)", BytesColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToDouble() throws Exception {
+    DoubleColumnVector columnVector = readORCFileWithLargeArray("double", DoubleColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToInteger() throws Exception {
+    LongColumnVector columnVector = readORCFileWithLargeArray("int", LongColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToTimestamp() throws Exception {
+    TimestampColumnVector columnVector =
+        readORCFileWithLargeArray("timestamp", TimestampColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.time.length);
+  }
+
+  public void testConvertToDate() throws Exception {
+    DateColumnVector columnVector = readORCFileWithLargeArray("date", DateColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
   }
 
   @Test
-  public void testArraySizeBiggerThan1024AndConvertToVarchar() throws Exception {
-    BytesColumnVector columnVector = testArraySizeBiggerThan1024("varchar(10)", BytesColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  public void testDecimalArrayBiggerThanDefault() throws Exception {
+    String typeStr = "decimal(6,1)";
+    Class typeClass = DecimalColumnVector.class;
+
+    TypeDescription schema = TypeDescription.fromString("struct<col1:array<" + typeStr + ">>");
+    createORCFileWithLargeArray(schema, typeClass, typeClass.equals(Decimal64ColumnVector.class));
+    // Test all possible conversions
+    testConvertToDecimal();
+    testConvertToVarchar();
+    testConvertToDouble();
+    testConvertToInteger();
+    testConvertToTimestamp();
+
+    // Make sure we delete file across tests

Review comment:
       Without `try ... catch .. finally`, this will fail when one of the tests fails.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref edited a comment on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref edited a comment on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723187458


   > Thank you, @pgaref . According to the JIRA, is this a bug for 1.5/1.6/1.7?
   
   Hey @dongjoon-hyun  -- thats correct.
   This has to be backported along with ORC-598 (which for some reason was never made it to 1.5 and 1.6 branches)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r520054180



##########
File path: java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java
##########
@@ -1396,7 +1396,7 @@ public void nextVector(ColumnVector previousVector,
                            FilterContext filterContext) throws IOException {
       if (inBytesColVector == null) {
         // Allocate column vector for file; cast column vector for reader.
-        inBytesColVector = new BytesColumnVector();
+        inBytesColVector = new BytesColumnVector(batchSize);

Review comment:
       Good catch @dongjoon-hyun ! test added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723291111


   I backported ORC-598 to branch-1.6/1.5 for next releases.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-723187458


   > Thank you, @pgaref . According to the JIRA, is this a bug for 1.5/1.6/1.7?
   
   Hey @dongjoon-hyun  -- thats correct.
   This has to be backported along with ORC-598 (which for some reason was never backported)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #568:
URL: https://github.com/apache/orc/pull/568#issuecomment-724312869


   > Merged to master/1.6.
   > 
   > Could you make a backporting PR to branch-1.5, too?
   
   @dongjoon-hyun  Thanks for taking care of this -- 1.5 backport available at https://github.com/apache/orc/pull/569


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #568: ORC-672: FIX ORC type conversion within arrays where array.length > 1…

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #568:
URL: https://github.com/apache/orc/pull/568#discussion_r519490137



##########
File path: java/core/src/test/org/apache/orc/impl/TestConvertTreeReaderFactory.java
##########
@@ -65,30 +143,163 @@ public void testArraySizeBiggerThan1024AndConvertToDecimal() throws Exception {
     assertEquals(batch.cols.length, 1);
     assertTrue(batch.cols[0] instanceof ListColumnVector);
     assertEquals(((ListColumnVector) batch.cols[0]).child.getClass(), expectedColumnType);
-    return (TExpectedColumn) ((ListColumnVector) batch.cols[0]).child;
+    return (TExpectedColumnVector) ((ListColumnVector) batch.cols[0]).child;
+  }
+
+  public void testConvertToDecimal() throws Exception {
+    Decimal64ColumnVector columnVector =
+        readORCFileWithLargeArray("decimal(6,1)", Decimal64ColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToVarchar() throws Exception {
+    BytesColumnVector columnVector = readORCFileWithLargeArray("varchar(10)", BytesColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToDouble() throws Exception {
+    DoubleColumnVector columnVector = readORCFileWithLargeArray("double", DoubleColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToInteger() throws Exception {
+    LongColumnVector columnVector = readORCFileWithLargeArray("int", LongColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
+  }
+
+  public void testConvertToTimestamp() throws Exception {
+    TimestampColumnVector columnVector =
+        readORCFileWithLargeArray("timestamp", TimestampColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.time.length);
+  }
+
+  public void testConvertToDate() throws Exception {
+    DateColumnVector columnVector = readORCFileWithLargeArray("date", DateColumnVector.class);
+    assertEquals(LARGE_BATCH_SIZE, columnVector.vector.length);
   }
 
   @Test
-  public void testArraySizeBiggerThan1024AndConvertToVarchar() throws Exception {
-    BytesColumnVector columnVector = testArraySizeBiggerThan1024("varchar(10)", BytesColumnVector.class);
-    assertEquals(columnVector.vector.length, 1025);
+  public void testDecimalArrayBiggerThanDefault() throws Exception {
+    String typeStr = "decimal(6,1)";
+    Class typeClass = DecimalColumnVector.class;
+
+    TypeDescription schema = TypeDescription.fromString("struct<col1:array<" + typeStr + ">>");
+    createORCFileWithLargeArray(schema, typeClass, typeClass.equals(Decimal64ColumnVector.class));
+    // Test all possible conversions

Review comment:
       Does this include all? Could you add the reference comment pointing to the ORC?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org