Posted to dev@tinkerpop.apache.org by "Russell Alexander Spitzer (JIRA)" <ji...@apache.org> on 2015/12/07 19:59:11 UTC

[jira] [Commented] (TINKERPOP-1017) Get InputRDDFormat to work with Multiple Splits

    [ https://issues.apache.org/jira/browse/TINKERPOP-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045489#comment-15045489 ] 

Russell Alexander Spitzer commented on TINKERPOP-1017:
------------------------------------------------------

Is the goal here to draw from a remote RDD into an Iterator on the client machine? Because I think you've done that in about the best way it can be done currently with Spark :)

The underlying code for that is 
{code}
    def collectPartition(p: Int): Array[T] = {
      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
    }
    (0 until partitions.length).iterator.flatMap(i => collectPartition(i))
{code}

This should pull an entire task's worth of data into memory at once (it only "streams" in the sense that it fetches one task at a time), so for many use cases this means filling one giant block of memory after another. I'm not sure there is a much better way to do this in Spark ATM.
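
To make the memory behavior concrete, here is a minimal sketch in plain Scala with no Spark involved. The names {{LocalIteratorSketch}}, {{partitions}}, and {{fetched}} are hypothetical and exist only for illustration; {{collectPartition}} here is a stand-in that copies a nested collection instead of calling {{sc.runJob}}. The snippet quoted above looks like the body of Spark's {{RDD.toLocalIterator}}, and this sketch assumes the same shape: a lazy iterator over partition indices whose {{flatMap}} materializes each partition as a whole array.
{code}
object LocalIteratorSketch {
  // Hypothetical stand-in for an RDD: each inner Vector plays the role of one partition.
  val partitions: Vector[Vector[Int]] = Vector(Vector(1, 2, 3), Vector(4, 5), Vector(6))

  // Records which partitions have been materialized so far, to make the laziness visible.
  val fetched = scala.collection.mutable.ArrayBuffer.empty[Int]

  // Stand-in for sc.runJob(this, iter => iter.toArray, Seq(p)).head:
  // the whole partition is copied into one array before any element is handed out.
  def collectPartition(p: Int): Array[Int] = {
    fetched += p
    partitions(p).toArray
  }

  // Same shape as the quoted code: partitions are fetched lazily, one index at a time.
  val it: Iterator[Int] = partitions.indices.iterator.flatMap(i => collectPartition(i))

  val first: Int = it.next()                        // forces partition 0 only
  val fetchedAfterFirst: List[Int] = fetched.toList // List(0)
  val rest: List[Int] = it.toList                   // drains partitions 1 and 2 in order
}
{code}
Asking for the first element already materializes all of partition 0, but nothing from partitions 1 or 2. So the client-side footprint is bounded by the largest single partition rather than by the whole data set, which is why more (and therefore smaller) partitions would be the main lever for reducing it under this approach.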

> Get InputRDDFormat to work with Multiple Splits
> -----------------------------------------------
>
>                 Key: TINKERPOP-1017
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1017
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>
> {{InputFormatRDD}} was recently added to enable {{HadoopGraph}} to OLTP stream in {{InputRDD}} data. It is currently single threaded. I tried to make it parallel, but ran into some {{Exceptions}} I didn't understand. For OLTP it doesn't matter; however, it would be good to make it work with multiple Hadoop {{InputSplits}}, and then Hadoop could read from Spark in OLAP too :). I don't know why that would ever be used... ? But if it's easy enough to do, just do it.
> [~rspitzer] --- When https://issues.apache.org/jira/browse/TINKERPOP-1011 you will see {{InputFormatRDD}}. You might have an idea on how to do this. If you care -- no worries though.


