Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/08/06 02:51:57 UTC

[GitHub] [incubator-uniffle] jerqi opened a new issue, #134: [Improvement] Support more tasks of the application

jerqi opened a new issue, #134:
URL: https://github.com/apache/incubator-uniffle/issues/134

   The current blockId is designed as follows:
   ```
    // BlockId is long and composed by partitionId, executorId and AtomicInteger
    // AtomicInteger is first 19 bit, max value is 2^19 - 1
    // partitionId is next 24 bit, max value is 2^24 - 1
    // taskAttemptId is rest of 20 bit, max value is 2^20 - 1
   ```
   **Why do we need blockId?**
   It is used for data checks, filtering, reading in-memory data, etc.
   
   **Why is blockId designed as above?**
   BlockIds are stored in the shuffle server. To reduce memory cost, a RoaringBitmap is used to cache them.
   Given how RoaringBitmap is implemented, the BlockId layout is designed so that a `BitmapContainer` is used instead of an `ArrayContainer`, which saves memory.
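
As background for the container choice: in RoaringBitmap, values that share the same high bits fall into one container covering 2^16 possible low values. An array container stores each present value as a 16-bit entry and is used up to 4096 values; beyond that, a fixed 8 KiB bitmap is used. The following is a rough sketch of the payload sizes only, ignoring object overhead and run containers. The class and method names are hypothetical and not part of RoaringBitmap or Uniffle.

```java
// Hypothetical sketch: approximate payload size of one Roaring container.
final class RoaringContainerCostSketch {
    static final int CHUNK_SIZE = 1 << 16;           // possible low 16-bit values per container
    static final int ARRAY_MAX_CARDINALITY = 4096;   // above this, a bitmap container is used
    static final int BITMAP_BYTES = CHUNK_SIZE / 8;  // fixed 8 KiB, one bit per possible value

    // An array container keeps one 16-bit entry per stored value.
    static int arrayBytes(int cardinality) {
        return 2 * cardinality;
    }

    static int containerBytes(int cardinality) {
        return cardinality <= ARRAY_MAX_CARDINALITY ? arrayBytes(cardinality) : BITMAP_BYTES;
    }

    // Average cost per stored value; cardinality must be greater than 0.
    static double bitsPerValue(int cardinality) {
        return 8.0 * containerBytes(cardinality) / cardinality;
    }

    public static void main(String[] args) {
        System.out.println("sparse (100 values): " + bitsPerValue(100) + " bits per value");
        System.out.println("dense (65536 values): " + bitsPerValue(CHUNK_SIZE) + " bits per value");
    }
}
```

The denser the ids are within one 2^16 range, the closer the cost gets to one bit per id, which is the saving the layout aims for.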
   
   **What's the problem with blockId?**
   It can't support a taskId greater than 2^20 - 1.
   
   **Proposal**
   I think 19 bits is more than the atomic int needs, and we can reassign some of them to the taskId.
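
To make the layout concrete, here is a minimal sketch of how three fields could be packed into one long using the 19/24/20 split from the comment quoted in this issue. Only the bit widths come from the issue text; the class name, method names, and range checks are hypothetical and are not Uniffle's actual implementation.

```java
// Hypothetical sketch: only the bit widths are taken from the issue.
final class BlockIdLayoutSketch {
    static final int SEQUENCE_BITS = 19;         // the "AtomicInteger" part, highest bits
    static final int PARTITION_ID_BITS = 24;     // middle bits
    static final int TASK_ATTEMPT_ID_BITS = 20;  // lowest bits

    // Largest value that fits in the given number of bits.
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    static long pack(long sequenceNo, long partitionId, long taskAttemptId) {
        check("sequenceNo", sequenceNo, SEQUENCE_BITS);
        check("partitionId", partitionId, PARTITION_ID_BITS);
        check("taskAttemptId", taskAttemptId, TASK_ATTEMPT_ID_BITS);
        return (sequenceNo << (PARTITION_ID_BITS + TASK_ATTEMPT_ID_BITS))
            | (partitionId << TASK_ATTEMPT_ID_BITS)
            | taskAttemptId;
    }

    static long sequenceNo(long blockId) {
        return blockId >>> (PARTITION_ID_BITS + TASK_ATTEMPT_ID_BITS);
    }

    static long partitionId(long blockId) {
        return (blockId >>> TASK_ATTEMPT_ID_BITS) & maxValue(PARTITION_ID_BITS);
    }

    static long taskAttemptId(long blockId) {
        return blockId & maxValue(TASK_ATTEMPT_ID_BITS);
    }

    // Rejects values that would overflow into a neighbouring field.
    private static void check(String name, long value, int bits) {
        if (value < 0 || value > maxValue(bits)) {
            throw new IllegalArgumentException(name + " " + value + " does not fit in " + bits + " bits");
        }
    }

    public static void main(String[] args) {
        long id = pack(5, 42, 1000);
        System.out.println(sequenceNo(id) + " " + partitionId(id) + " " + taskAttemptId(id));
        System.out.println("max taskAttemptId = " + maxValue(TASK_ATTEMPT_ID_BITS));
    }
}
```

Because the three widths add up to 63 bits, moving a bit from the sequence field to the task field doubles the number of supported task attempts and halves the number of blocks one task attempt can emit.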


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Improvement] Support more tasks of the application [incubator-uniffle]

Posted by "EnricoMi (via GitHub)" <gi...@apache.org>.
EnricoMi commented on issue #134:
URL: https://github.com/apache/incubator-uniffle/issues/134#issuecomment-1915509797

   > If spark.rss.writer.buffer.size is set to the default value of 3m, then the data written by a taskAttempt does not exceed 3m*2^19=1536G
   
   You also have to take `spark.rss.writer.buffer.spill.size` (default of 128m) into consideration. Once that amount of memory is occupied by all buffers (one buffer per partition), **all** buffers will be sent to the server, no matter how full the buffers are. If you map to 100,000 partitions, you will flush 100,000 buffers of about 1.28 kByte each (128m / 100,000). Every flushed buffer consumes a sequence number, so sequence numbers grow quickly, and with 2^19 of them a task attempt can then write at most about 1.28 kByte * 2^19 = 0.64G.
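
The arithmetic behind both estimates can be sketched as follows. This is a back-of-the-envelope calculation, assuming 2^19 sequence numbers per task attempt and that every flushed buffer consumes exactly one sequence number; the class and method names are hypothetical.

```java
// Hypothetical sketch: upper bounds on the data one task attempt can write.
final class SequenceCapacitySketch {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;
    static final long GIB = 1024L * MIB;

    // Number of available sequence numbers times the size of each block.
    static long maxBytesPerTaskAttempt(int sequenceBits, long bytesPerBlock) {
        return (1L << sequenceBits) * bytesPerBlock;
    }

    // Average block size when a spill flushes every partition buffer at once.
    static long bytesPerBlockOnSpill(long spillSizeBytes, int partitions) {
        return spillSizeBytes / partitions;
    }

    public static void main(String[] args) {
        // Best case: every block is a full 3m buffer.
        long full = maxBytesPerTaskAttempt(19, 3 * MIB);
        System.out.println(full / GIB + " GiB with full 3m buffers");

        // Spill case: 128m shared by 100,000 partition buffers.
        long perBlock = bytesPerBlockOnSpill(128 * MIB, 100_000);
        long spilled = maxBytesPerTaskAttempt(19, perBlock);
        System.out.println(spilled / MIB + " MiB when every spill flushes 100,000 buffers");
    }
}
```

With binary units the spill case comes out at roughly two thirds of a GiB, the same order of magnitude as the 0.64G estimate, which rounds 128m to 128,000 kByte.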




[GitHub] [incubator-uniffle] leixm commented on issue #134: [Improvement] Support more tasks of the application

Posted by GitBox <gi...@apache.org>.
leixm commented on issue #134:
URL: https://github.com/apache/incubator-uniffle/issues/134#issuecomment-1227987891

   It looks like taskAttemptId has been modified from 2^20 to 2^21, but the comment in org.apache.uniffle.client.util.ClientUtils has not been updated.
   org.apache.uniffle.common.util.Constants:
   `public static final int TASK_ATTEMPT_ID_MAX_LENGTH = 21;`
   `public static final int ATOMIC_INT_MAX_LENGTH = 18;`
   org.apache.uniffle.client.util.ClientUtils:
   `// AtomicInteger is first 19 bit, max value is 2^19 - 1`
   `// taskAttemptId is rest of 20 bit, max value is 2^20 - 1`
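
For reference, the maxima implied by the two constants can be derived from them instead of being restated by hand in a comment. This is a minimal sketch: the constant names and values are copied from the comment above, while the class and the `maxValue` helper are hypothetical.

```java
// Hypothetical sketch: derive the documented maxima from the bit widths.
final class BlockIdMaxValuesSketch {
    // Values as quoted from org.apache.uniffle.common.util.Constants in this thread.
    static final int TASK_ATTEMPT_ID_MAX_LENGTH = 21;
    static final int ATOMIC_INT_MAX_LENGTH = 18;

    // Largest value that fits in the given number of bits.
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    public static void main(String[] args) {
        System.out.println("AtomicInteger: " + ATOMIC_INT_MAX_LENGTH
            + " bits, max value " + maxValue(ATOMIC_INT_MAX_LENGTH));
        System.out.println("taskAttemptId: " + TASK_ATTEMPT_ID_MAX_LENGTH
            + " bits, max value " + maxValue(TASK_ATTEMPT_ID_MAX_LENGTH));
    }
}
```

Assuming partitionId stays at 24 bits, the three widths still add up to 63 bits (18 + 24 + 21).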




[GitHub] [incubator-uniffle] jerqi commented on issue #134: [Improvement] Support more tasks of the application

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #134:
URL: https://github.com/apache/incubator-uniffle/issues/134#issuecomment-1228009519

   > It looks like taskAttemptId has been modified from 2^20 to 2^21, but the comment for org.apache.uniffle.client.util.ClientUtils has not been modified.
   > org.apache.uniffle.common.util.Constants
   > `public static final int TASK_ATTEMPT_ID_MAX_LENGTH = 21;`
   > `public static final int ATOMIC_INT_MAX_LENGTH = 18;`
   > org.apache.uniffle.client.util.ClientUtils
   > `// AtomicInteger is first 19 bit, max value is 2^19 - 1`
   > `// taskAttemptId is rest of 20 bit, max value is 2^20 - 1`
   
   You're right.




[GitHub] [incubator-uniffle] jerqi commented on issue #134: [Improvement] Support more tasks of the application

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #134:
URL: https://github.com/apache/incubator-uniffle/issues/134#issuecomment-1228010095

   > Currently we can support 2^20 tasks, which is not a small number. If spark.rss.writer.buffer.size is set to the default value of 3m, then the data written by a taskAttempt does not exceed 3m*2^19=1536G. That much can be guaranteed. How can we measure whether the AtomicInteger can be reduced, and by how much?
   
   It depends on the production environment application.




[GitHub] [incubator-uniffle] leixm commented on issue #134: [Improvement] Support more tasks of the application

Posted by GitBox <gi...@apache.org>.
leixm commented on issue #134:
URL: https://github.com/apache/incubator-uniffle/issues/134#issuecomment-1227470284

   Currently we can support 2^20 tasks, which is not a small number. If spark.rss.writer.buffer.size is set to the default value of 3m, then the data written by a taskAttempt does not exceed 3m*2^19=1536G. That much can be guaranteed. How can we measure whether the AtomicInteger can be reduced, and by how much?




Re: [I] [Improvement] Support more tasks of the application [incubator-uniffle]

Posted by "EnricoMi (via GitHub)" <gi...@apache.org>.
EnricoMi commented on issue #134:
URL: https://github.com/apache/incubator-uniffle/issues/134#issuecomment-1916251014

   > It looks like taskAttemptId has been modified from 2^20 to 2^21
   
   Fixed in #1492.




Re: [I] [Improvement] Support more tasks of the application [incubator-uniffle]

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston closed issue #134: [Improvement] Support more tasks of the application
URL: https://github.com/apache/incubator-uniffle/issues/134

