You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/05 13:30:04 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #23977: [WIP][SPARK-26923][SQL][R] Refactor ArrowRRunner and RRunner to share one BaseRRunner

HyukjinKwon opened a new pull request #23977: [WIP][SPARK-26923][SQL][R] Refactor ArrowRRunner and RRunner to share one BaseRRunner
URL: https://github.com/apache/spark/pull/23977
 
 
   ## What changes were proposed in this pull request?
   
   This PR proposes to have one base R runner. 
   
   In the high level,
   
   Previously, it had `ArrowRRunner` and it inherited `RRunner`:
   
   ```
   └── RRunner
       └── ArrowRRunner
   ```
   
   After this PR, now it has a `BaseRRunner`, and `ArrowRRunner` and `RRunner` inherit `BaseRRunner`:
   
   ```
   └── BaseRRunner
       ├── ArrowRRunner
       └── RRunner
   ```
   
   This way is consistent with Python's.
   
   In more details, see below:
   
   ```scala
   class BaseRRunner[IN, OUT] {
   
     def compute: Iterator[OUT] = {
       ...
       newWriterThread(...)
       ...
       newReaderIterator(...)
       ...
     }
   
     // Make a thread that writes data from JVM to R process
     abstract protected def newWriterThread(..., iter: Iterator[IN], ...): WriterThread
   
     // Make an iterator that reads data from the R process to JVM
     abstract protected def newReaderIterator(...): ReaderIterator
   
     abstract class WriterThread(..., iter: Iterator[IN], ...) extends Thread {
       override def run(): Unit {
         ...
         writeIteratorToStream(...)
         ...
       }
   
       // Actually writing logic to the socket stream.
       abstract protected def writeIteratorToStream(dataOut: DataOutputStream): Unit
     }
   
     abstract class ReaderIterator extends Iterator[OUT] {
       override def hasNext: Boolean = {
   
       }
   
       override def next: OUT = {
         ...
         hasNext
         ...    
       }
   
       // Actually reading logic from the socket stream.
       abstract protected def read: OUT
     }
   }
   ```
   
   ```scala
   case RRunner extends BaseRRunner {
     override def newWriterThread(...) {
       new WriterThread(...) {
         override def writeIteratorToStream {
           ...
         }
       }
     }
   
     override def newReaderIterator(...) {
       new ReaderIterator(...) {
         override def read {
           ...
         }
       }
     }
   }
   ```
   
   ## How was this patch tested?
   
   Manually tested and existing tests should cover.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org