You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2024/01/02 11:03:08 UTC

[PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

beliefer opened a new pull request, #44563:
URL: https://github.com/apache/spark/pull/44563

   ### What changes were proposed in this pull request?
   This PR propose to remove ThreadLocal for ReadAheadInputStream.
   
   
   ### Why are the changes needed?
   ReadAheadInputStream has a field oneByte declared as TheadLocal.
   In fact, oneByte only used in read.
   We can remove it by the way that the closure of local variables in instance methods can provide thread safety guarantees.
   
   On the other hand, the TheadLocal occupies a certain amount of space in the heap and there are allocation and GC costs.
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   
   
   ### How was this patch tested?
   Exists test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44563:
URL: https://github.com/apache/spark/pull/44563#issuecomment-1875408383

   Thank you, @beliefer and all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #44563:
URL: https://github.com/apache/spark/pull/44563#discussion_r1439418301


##########
core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java:
##########
@@ -247,7 +245,7 @@ public int read() throws IOException {
       // short path - just get one byte.
       return activeBuffer.get() & 0xFF;
     } else {
-      byte[] oneByteArray = oneByte.get();

Review Comment:
   `byte[1]` is very cheap and it could avoid GC.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44563:
URL: https://github.com/apache/spark/pull/44563#discussion_r1439401202


##########
core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java:
##########
@@ -247,7 +245,7 @@ public int read() throws IOException {
       // short path - just get one byte.
       return activeBuffer.get() & 0xFF;
     } else {
-      byte[] oneByteArray = oneByte.get();

Review Comment:
   The original writing seems to be to avoid creating `byte[1]` repeatedly. Have you encountered any bad cases? The new writing seems to create more objects of type `byte[1]`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #44563: [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream
URL: https://github.com/apache/spark/pull/44563


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44563:
URL: https://github.com/apache/spark/pull/44563#issuecomment-1875548508

   @srowen @dongjoon-hyun @LuciferYang Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44563:
URL: https://github.com/apache/spark/pull/44563#issuecomment-1875402775

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #44563:
URL: https://github.com/apache/spark/pull/44563#discussion_r1439424026


##########
core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java:
##########
@@ -247,7 +245,7 @@ public int read() throws IOException {
       // short path - just get one byte.
       return activeBuffer.get() & 0xFF;
     } else {
-      byte[] oneByteArray = oneByte.get();

Review Comment:
   We can decorate it with `private final` if we really need to avoid creating `byte[1]` repeatedly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #44563:
URL: https://github.com/apache/spark/pull/44563#discussion_r1439418301


##########
core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java:
##########
@@ -247,7 +245,7 @@ public int read() throws IOException {
       // short path - just get one byte.
       return activeBuffer.get() & 0xFF;
     } else {
-      byte[] oneByteArray = oneByte.get();

Review Comment:
   `byte[1]` is very cheap and change it to local variable could avoid GC.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44563:
URL: https://github.com/apache/spark/pull/44563#issuecomment-1875114597

   ping @dongjoon-hyun @srowen cc @sitalkedia


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "srowen (via GitHub)" <gi...@apache.org>.
srowen commented on PR #44563:
URL: https://github.com/apache/spark/pull/44563#issuecomment-1875150705

   I agree with this change, I would be surprised if it's even an optimization 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44563:
URL: https://github.com/apache/spark/pull/44563#issuecomment-1875117596

   The GA failure is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org