You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by GitBox <gi...@apache.org> on 2022/10/17 11:43:22 UTC

[GitHub] [orc] deshanxiao opened a new pull request, #1278: ORC-1289: Refine DynamicArray

deshanxiao opened a new pull request, #1278:
URL: https://github.com/apache/orc/pull/1278

   ### What changes were proposed in this pull request?
   Refine DynamicIntArray and DynamicByteArray
   
   ### Why are the changes needed?
   This PR introduces the following changes:
   1. Removed unused methods DynamicByteArray#readAll
   2. Use Arrays.fill to replace the original implementation
   3. In DynamicIntArray#getSizeInBytes, the method should return long to avoid out of bounds.
   4. Update some comments.
   
   
   ### How was this patch tested?
   UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997662624


##########
java/core/src/java/org/apache/orc/impl/DynamicIntArray.java:
##########
@@ -22,14 +22,14 @@
 /**
  * Dynamic int array that uses primitive types and chunks to avoid copying
  * large number of integers when it resizes.
- *
+ * <p>
  * The motivation for this class is memory optimization, i.e. space efficient
  * storage of potentially huge arrays without good a-priori size guesses.
- *
+ * <p>
  * The API of this class is between a primitive array and a AbstractList. It's
  * not a Collection implementation because it handles primitive types, but the
  * API could be extended to support iterators and the like.
- *
+ * <p>

Review Comment:
   Yes, will remove it and create a new one for it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997284976


##########
java/core/src/java/org/apache/orc/impl/DynamicByteArray.java:
##########
@@ -122,29 +121,6 @@ public int add(byte[] value, int valueOffset, int valueLength) {
     return result;
   }
 
-  /**
-   * Read the entire stream into this array.
-   * @param in the stream to read from
-   * @throws IOException
-   */
-  public void readAll(InputStream in) throws IOException {
-    int currentChunk = length / chunkSize;
-    int currentOffset = length % chunkSize;
-    grow(currentChunk);
-    int currentLength = in.read(data[currentChunk], currentOffset,
-      chunkSize - currentOffset);
-    while (currentLength > 0) {
-      length += currentLength;
-      currentOffset = length % chunkSize;
-      if (currentOffset == 0) {
-        currentChunk = length / chunkSize;
-        grow(currentChunk);
-      }
-      currentLength = in.read(data[currentChunk], currentOffset,
-        chunkSize - currentOffset);
-    }
-  }
-

Review Comment:
   Sorry but I'm -1 for removing like this at 1.9.0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997280101


##########
java/core/src/java/org/apache/orc/impl/DynamicByteArray.java:
##########
@@ -190,9 +166,7 @@ public int size() {
    */
   public void clear() {
     length = 0;
-    for(int i=0; i < data.length; ++i) {
-      data[i] = null;
-    }
+    Arrays.fill(data, null);

Review Comment:
   This is orthogonal to redefine something, @deshanxiao .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997285956


##########
java/core/src/java/org/apache/orc/impl/DynamicIntArray.java:
##########
@@ -22,14 +22,14 @@
 /**
  * Dynamic int array that uses primitive types and chunks to avoid copying
  * large number of integers when it resizes.
- *
+ * <p>
  * The motivation for this class is memory optimization, i.e. space efficient
  * storage of potentially huge arrays without good a-priori size guesses.
- *
+ * <p>
  * The API of this class is between a primitive array and a AbstractList. It's
  * not a Collection implementation because it handles primitive types, but the
  * API could be extended to support iterators and the like.
- *
+ * <p>

Review Comment:
   Do you think you can proceed this kind of the documentation fixes as an independent PR which is global to all modules?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997662396


##########
java/core/src/java/org/apache/orc/impl/DynamicByteArray.java:
##########
@@ -122,29 +121,6 @@ public int add(byte[] value, int valueOffset, int valueLength) {
     return result;
   }
 
-  /**
-   * Read the entire stream into this array.
-   * @param in the stream to read from
-   * @throws IOException
-   */
-  public void readAll(InputStream in) throws IOException {
-    int currentChunk = length / chunkSize;
-    int currentOffset = length % chunkSize;
-    grow(currentChunk);
-    int currentLength = in.read(data[currentChunk], currentOffset,
-      chunkSize - currentOffset);
-    while (currentLength > 0) {
-      length += currentLength;
-      currentOffset = length % chunkSize;
-      if (currentOffset == 0) {
-        currentChunk = length / chunkSize;
-        grow(currentChunk);
-      }
-      currentLength = in.read(data[currentChunk], currentOffset,
-        chunkSize - currentOffset);
-    }
-  }
-

Review Comment:
   Sorry, will remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997286747


##########
java/core/src/java/org/apache/orc/impl/DynamicIntArray.java:
##########
@@ -137,8 +135,8 @@ public String toString() {
     return sb.append('}').toString();
   }
 
-  public int getSizeInBytes() {
-    return 4 * initializedChunks * chunkSize;
+  public long getSizeInBytes() {

Review Comment:
   From this PR, I believe this is the valid one, but could you make a test case for the claim?
   > In DynamicIntArray#getSizeInBytes, the method should return long to avoid out of bounds.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997280577


##########
java/core/src/java/org/apache/orc/impl/DynamicIntArray.java:
##########
@@ -113,9 +113,7 @@ public int size() {
 
   public void clear() {
     length = 0;
-    for(int i=0; i < data.length; ++i) {
-      data[i] = null;
-    }
+    Arrays.fill(data, null);

Review Comment:
   This is orthogonal to redefine something, @deshanxiao .



##########
java/core/src/java/org/apache/orc/impl/DynamicIntArray.java:
##########
@@ -113,9 +113,7 @@ public int size() {
 
   public void clear() {
     length = 0;
-    for(int i=0; i < data.length; ++i) {
-      data[i] = null;
-    }
+    Arrays.fill(data, null);

Review Comment:
   This is orthogonal to redefine something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997672524


##########
java/core/src/java/org/apache/orc/impl/DynamicIntArray.java:
##########
@@ -137,8 +135,8 @@ public String toString() {
     return sb.append('}').toString();
   }
 
-  public int getSizeInBytes() {
-    return 4 * initializedChunks * chunkSize;
+  public long getSizeInBytes() {

Review Comment:
   Sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on PR #1278:
URL: https://github.com/apache/orc/pull/1278#issuecomment-1284860203

   > Could you rebase this PR to the `main` branch, @deshanxiao ?
   
   Hi @dongjoon-hyun 
   I tried adding this unit test, but it requires a lot of memory(2GB * 4 =  8GB).
   I'm worried that this will cause problems with the whole test system.
   But it's easy to reproduce in DynamicByteArray because it requires less memory.
   
   do you have any good suggestions?
   
   ```
     @Test
     public void testIntArray2() throws Exception {
       DynamicIntArray array = new DynamicIntArray(10);
       array.set(Integer.MAX_VALUE, 10);
   
       assertEquals(1L + Integer.MAX_VALUE, array.size()) ;
     }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on PR #1278:
URL: https://github.com/apache/orc/pull/1278#issuecomment-1286456065

   Turn it off before thinking about how to do it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on PR #1278:
URL: https://github.com/apache/orc/pull/1278#issuecomment-1281736281

   Thank you @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997680916


##########
java/core/src/java/org/apache/orc/impl/DynamicByteArray.java:
##########
@@ -190,9 +166,7 @@ public int size() {
    */
   public void clear() {
     length = 0;
-    for(int i=0; i < data.length; ++i) {
-      data[i] = null;
-    }
+    Arrays.fill(data, null);

Review Comment:
   I will remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao commented on a diff in pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao commented on code in PR #1278:
URL: https://github.com/apache/orc/pull/1278#discussion_r997673694


##########
java/core/src/java/org/apache/orc/impl/DynamicByteArray.java:
##########
@@ -190,9 +166,7 @@ public int size() {
    */
   public void clear() {
     length = 0;
-    for(int i=0; i < data.length; ++i) {
-      data[i] = null;
-    }
+    Arrays.fill(data, null);

Review Comment:
   Yes, it is equivalent to the previous one. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] deshanxiao closed pull request #1278: ORC-1289: Refine DynamicArray

Posted by GitBox <gi...@apache.org>.
deshanxiao closed pull request #1278: ORC-1289: Refine DynamicArray
URL: https://github.com/apache/orc/pull/1278


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org