Posted to user@flink.apache.org by ri...@sina.cn on 2016/09/06 12:56:29 UTC

Reply: Re: Reply: Re: fromParallelCollection

My data comes from an HBase table; it looks like a List[rowkey,Map[String,String]].
class MySplittableIterator extends SplittableIterator[String] {

  // Members declared in java.util.Iterator
  def hasNext(): Boolean = {

  }
  def next(): Nothing = {

  }

  // Members declared in org.apache.flink.util.SplittableIterator
  def getMaximumNumberOfSplits(): Int = {

  }
  def split(num: Int): Array[Iterator[String]] = {

  }
}

I do not know how to write these methods. Can you give me an example?
----- Original Message -----
From: Timo Walther <tw...@apache.org>
To: user@flink.apache.org
Subject: Re: Reply: Re: fromParallelCollection
Date: 2016-09-06 17:03

Hi,

you have to implement a class that extends
"org.apache.flink.util.SplittableIterator". The runtime will ask
this class for multiple "java.util.Iterator"s over your split
data. How you split your data and what an iterator looks like
depend on your data and implementation.
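
As an illustration of the contract described above, here is a minimal sketch of
such a class. It is not taken from the Flink code base: the class name
VectorSplittableIterator, the object FromParallelCollectionSketch, and the
sample rows are hypothetical, and it assumes the rows have already been read
into an in-memory Vector that is serializable. It requires the
flink-streaming-scala dependency on the classpath.

```scala
import java.util.{Iterator => JIterator}

import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.SplittableIterator

// Hypothetical helper: iterates over an in-memory Vector and can hand out
// contiguous slices of the not-yet-consumed elements as separate iterators.
class VectorSplittableIterator[T](items: Vector[T]) extends SplittableIterator[T] {

  // Index of the next element returned by next().
  private var pos = 0

  override def hasNext(): Boolean = pos < items.length

  override def next(): T = {
    if (!hasNext()) throw new NoSuchElementException("iterator is exhausted")
    val item = items(pos)
    pos += 1
    item
  }

  override def remove(): Unit =
    throw new UnsupportedOperationException("remove is not supported")

  // One element per split is the finest useful granularity; never report 0.
  override def getMaximumNumberOfSplits(): Int = math.max(1, items.length)

  // Cuts the remaining elements into numPartitions contiguous slices.
  // Slices may be empty when there are fewer elements than partitions.
  override def split(numPartitions: Int): Array[JIterator[T]] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    val remaining = items.drop(pos)
    Array.tabulate[JIterator[T]](numPartitions) { i =>
      // Boundaries are computed in Long to avoid Int overflow.
      val from = (i.toLong * remaining.length / numPartitions).toInt
      val until = ((i + 1).toLong * remaining.length / numPartitions).toInt
      new VectorSplittableIterator[T](remaining.slice(from, until))
    }
  }
}

object FromParallelCollectionSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Made-up sample rows in the (rowkey, column -> value) shape from the question.
    val rows: Vector[(String, Map[String, String])] = Vector(
      "row1" -> Map("cf:a" -> "1"),
      "row2" -> Map("cf:a" -> "2"),
      "row3" -> Map("cf:a" -> "3")
    )

    val stream = env.fromParallelCollection(new VectorSplittableIterator(rows))
    stream.print()
    env.execute("from-parallel-collection-sketch")
  }
}
```

Note that this sketch still builds the whole collection in the client program
before the job starts, so it may not be a good fit for a very large table.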
      
      

      

If you need more help, you should show us some examples of your
data.

Timo

On 06/09/16 at 09:46, rimin515@sina.cn wrote:

    
fromCollection is not parallel and the data is huge, so I want to use
env.fromParallelCollection(data), but I do not know how to initialize
the data.

----- Original Message -----
From: Maximilian Michels <mx...@apache.org>
To: "user@flink.apache.org" <us...@flink.apache.org>, rimin515@sina.cn
Subject: Re: fromParallelCollection
Date: 2016-09-05 16:58

        
        

        

Please give us a bit more insight on what you're trying to do.

On Sat, Sep 3, 2016 at 5:01 AM, <ri...@sina.cn> wrote:
> Hi,
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val tr = env.fromParallelCollection(data)
>
> I do not know how to initialize the data. Can someone tell me?

      
    
    

    

    
    -- 
Freundliche Grüße / Kind Regards

Timo Walther 

Follow me: @twalthr
https://www.linkedin.com/in/twalthr

Re: Reply: Re: Reply: Re: fromParallelCollection

Posted by Timo Walther <tw...@apache.org>.

If your data comes from HBase, it might also be good to implement an
HBase source. An HBase sink is currently in the making:
https://github.com/apache/flink/pull/2332

Alternatively, it may be better to save your data in HDFS (e.g. as a CSV
file) and use the built-in "readFile()", which handles the parallelism
automatically.
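
A minimal sketch of the file-based approach, under stated assumptions: it uses
readTextFile (the plain-text variant of the file sources) rather than
readFile with an explicit input format, the HDFS path is a placeholder, and
the line layout "rowkey,column=value,column=value" is made up for
illustration. The names ReadRowsFromFileSketch and parseLine are hypothetical.

```scala
import org.apache.flink.streaming.api.scala._

object ReadRowsFromFileSketch {

  // Parses one line of the assumed layout "rowkey,col1=val1,col2=val2"
  // into (rowkey, Map(col -> val)). Fields without '=' are skipped.
  def parseLine(line: String): (String, Map[String, String]) = {
    // limit -1 keeps trailing empty fields, so the array is never empty
    val fields = line.split(",", -1).toList
    val columns = fields.tail.flatMap { field =>
      field.split("=", 2) match {
        case Array(key, value) => Some(key -> value)
        case _                 => None
      }
    }.toMap
    (fields.head, columns)
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Placeholder path; point it at the exported file.
    val rows = env
      .readTextFile("hdfs:///path/to/rows.csv")
      .map(parseLine _)

    rows.print()
    env.execute("read-rows-from-file-sketch")
  }
}
```

With this approach the file source splits the input across the parallel
instances, so no custom SplittableIterator is needed.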




