Posted to user@beam.apache.org by "郭亚峰(默岭)" <ya...@alibaba-inc.com> on 2016/11/24 09:04:09 UTC

Re: how to use key-value storage like redis with PCollection?

Hi Jean-Baptiste,
Morning. My case is quite similar to Jing's. I want to join some relatively static data from HBase (which is bulk loaded every day) in an unbounded pipeline. I'd like to take a look at your code for reference. I checked your GitHub but couldn't find anything close to the RedisIO you mentioned. Did I overlook anything? Or could you send me a link to your RedisIO?
Thanks a lot.
Ya-Feng
------------------------------------------------------------------
From: Jean-Baptiste Onofré <jb...@nanthrax.net>
Sent: Tuesday, November 22, 2016 03:29
To: user <us...@beam.incubator.apache.org>
Subject: Re: how to use key-value storage like redis with PCollection?

Hi Amir,

I'm working on MqttIO right now; I will push the RedisIO to my GitHub
right after.

I will let you know.

Regards
JB

On 11/21/2016 08:13 PM, amir bahmanyari wrote:
> Am very curious about the RedisIO() example you mentioned JB...
> Thanks !
>
>
> ------------------------------------------------------------------------
> *From:* Lukasz Cwik <lc...@google.com>
> *To:* user@beam.incubator.apache.org
> *Sent:* Monday, November 21, 2016 5:42 AM
> *Subject:* Re: how to use key-value storage like redis with PCollection?
>
> Have you taken a look at the PCollectionView?
>
> It allows you to use various views of a PCollection from within a DoFn.
> This
> <https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L461> is
> a short example where a multimap view is used to join two PCollections.
> In your pipeline you would have the bounded PCollection used as a map or
> multimap view. You would then use a DoFn that had a main input with an
> unbounded PCollection and a side input of the view.
>
> On Mon, Nov 21, 2016 at 3:28 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>
>     Sure, it's on a private repo, let me push on the public one.
>
>     I will let you know as soon as it's done.
>
>     Thanks !
>     Regards
>     JB
>
>     On 11/21/2016 10:25 AM, 陈竞 wrote:
>
>         ok, thank you very much. Could you show me your branch address?
>
>         2016-11-21 17:20 GMT+08:00 Jean-Baptiste Onofré <jb@nanthrax.net>:
>
>             I have an example, but with the RedisIO.
>
>             So, if you are interested, I can share my branch.
>
>             Regards
>             JB
>
>             On 11/21/2016 10:18 AM, 陈竞 wrote:
>
>                 Could you show the example code for a Redis query with PCollection?
>
>                 2016-11-21 16:41 GMT+08:00 Jean-Baptiste Onofré <jb@nanthrax.net>:
>
>
>                     Hi,
>
>                     you can convert your PCollection<KV<?,?>> to a PCollection<POJO> and
>                     then create a DoFn to do the query.
>
>                     By the way, I have a RedisIO mostly ready.
>
>                     Regards
>                     JB
>
>
>                     On 11/21/2016 09:14 AM, 陈竞 wrote:
>
>                         My dataflow use case is as follows:
>
>                         stream:
>                         a stream needs to query some data from Redis by key
>
>                         batch:
>                         a table is left-joined with another table on a key
>
>                         I want to unify the two scenarios above with a transform like MapJoin,
>                         so I need to use a PCollection to represent the data in Redis. The
>                         problem is that PCollection has no interface that makes it queryable.
>                         Is there any solution for my case?
>
>
>                     --
>                     Jean-Baptiste Onofré
>                     jbonofre@apache.org
>                     http://blog.nanthrax.net
>                     Talend - http://www.talend.com
>
>
>
>
>                 --
>                 陈竞, High Performance Computer Center, Institute of Computing Technology, Chinese Academy of Sciences
>                 Jing Chen HPCC.ICT.AC <http://hpcc.ict.ac/>
>                 China
>
>
>             --
>             Jean-Baptiste Onofré
>             jbonofre@apache.org
>             http://blog.nanthrax.net
>             Talend - http://www.talend.com
>
>
>
>
>         --
>         陈竞, High Performance Computer Center, Institute of Computing Technology, Chinese Academy of Sciences
>         Jing Chen HPCC.ICT.AC <http://hpcc.ict.ac/>
>         China
>
>
>     --
>     Jean-Baptiste Onofré
>     jbonofre@apache.org
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
>
>
>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Re: how to use key-value storage like redis with PCollection?

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Ya-Feng,

The RedisIO is on my private repo (not yet public). I will push it to a
public repo ASAP.

Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Re: how to use key-value storage like redis with PCollection?

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

I created a pull request with the RedisIO/RedisPubSubIO:

https://github.com/apache/beam/pull/1687

You have two IOs available: RedisIO for the key-value store,
and RedisPubSubIO for Redis PubSub.

I will update the PR today or tomorrow with:
- complete RedisCluster support (especially for sharding in RedisIO)
- support for List, Set, Hash, and Z* key-value pairs. Right now, RedisIO
only deals with String key-value pairs.

Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
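
Note on the side-input join that Lukasz describes earlier in this thread: the bounded data set becomes a read-only map view, and a DoFn over the unbounded main input looks each key up in that view. In the Beam Java SDK this is usually wired with View.asMap() and ParDo's withSideInputs(...). The sketch below is not Beam code and does not come from the thread; it only models the left-join semantics in plain Java so that it runs without any dependency. The class name SideInputJoinSketch, the method leftJoin, the "<none>" placeholder, and the sample data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SideInputJoinSketch {

    // Left-joins each (key, value) event against a small, read-only lookup table.
    // The lookup map stands in for the map view (side input); the events list
    // stands in for the main input. Both names are hypothetical.
    static List<String> leftJoin(List<Map.Entry<String, String>> events,
                                 Map<String, String> lookup) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> event : events) {
            // Left-join semantics: keep the event even when the key has no match.
            String match = lookup.getOrDefault(event.getKey(), "<none>");
            joined.add(event.getKey() + ": " + event.getValue() + " | " + match);
        }
        return joined;
    }

    public static void main(String[] args) {
        // Relatively static reference data, e.g. loaded once per day.
        Map<String, String> lookup = Map.of("u1", "Alice", "u2", "Bob");

        // A few sample events; a real stream would be unbounded.
        List<Map.Entry<String, String>> events = List.of(
            Map.entry("u1", "click"),
            Map.entry("u3", "view"));

        leftJoin(events, lookup).forEach(System.out::println);
    }
}
```

The point of this design is that the lookup data is read per element from a local view instead of through a remote query, which fits reference data that is small and changes rarely. Data that is large or changes often would more likely need a direct query from within the DoFn, as JB suggests.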