Posted to dev@flink.apache.org by 马国维 <ma...@outlook.com> on 2015/06/28 04:29:40 UTC

Is there Any api that let DataStream join DataSet ?

Hi everyone,
Is there any API that lets a DataStream join a DataSet? I have read all the documentation but could not find one.
If Flink does not have such an API now, will it support one in the future?
Thanks a lot!

Re: Is there Any api that let DataStream join DataSet ?

Posted by Gyula Fóra <gy...@gmail.com>.
You are right, one cannot use the current window-join implementation to
do this.

A workaround is to implement a custom binary stream operator that waits
until it has received the whole file and then starts joining.
For instance: filestream.connect(streamToJoinWith).flatMap(customCoFlatMap),
where customCoFlatMap is a CoFlatMap that does the join.
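
A minimal sketch of this buffering idea in plain Java, without the Flink
dependency so that it runs on its own. The method names flatMap1 and
flatMap2 mirror the two callbacks of a CoFlatMap; the class name
BufferingFileJoin, the String key/value records, and the fileFinished()
end-of-file signal are assumptions made for illustration. How a real
operator would learn that the file input is exhausted is left open here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a two-input operator that joins a finite file input with a stream. */
final class BufferingFileJoin {
    // All file records seen so far, grouped by join key.
    private final Map<String, List<String>> fileTable = new HashMap<>();
    // Stream elements (key, value) that arrived before the file was complete.
    private final List<String[]> pending = new ArrayList<>();
    private final List<String> output = new ArrayList<>();
    private boolean fileComplete = false;

    /** Input 1: one (key, value) record read from the file. */
    void flatMap1(String key, String fileValue) {
        fileTable.computeIfAbsent(key, k -> new ArrayList<>()).add(fileValue);
    }

    /** Hypothetical end-of-file signal: flush everything that was buffered. */
    void fileFinished() {
        fileComplete = true;
        for (String[] element : pending) {
            join(element[0], element[1]);
        }
        pending.clear();
    }

    /** Input 2: one (key, value) element from the unbounded stream. */
    void flatMap2(String key, String streamValue) {
        if (fileComplete) {
            join(key, streamValue);
        } else {
            pending.add(new String[] {key, streamValue});
        }
    }

    // Emits one result per file record that shares the key.
    private void join(String key, String streamValue) {
        for (String fileValue : fileTable.getOrDefault(key, Collections.<String>emptyList())) {
            output.add(key + ":" + streamValue + "+" + fileValue);
        }
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        BufferingFileJoin op = new BufferingFileJoin();
        op.flatMap2("a", "s1");   // arrives early, gets buffered
        op.flatMap1("a", "f1");
        op.fileFinished();        // buffered element is joined now
        op.flatMap2("a", "s2");   // joined immediately
        System.out.println(op.output());
    }
}
```

Stream elements that arrive before the file is complete are held in
memory, so this approach probably only fits cases where the file can be
read quickly relative to the stream rate.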

Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote (on Mon, Jun 29, 2015, 11:40):

> I am wondering what the semantics of a DataStream created from a file
> is. It should be a regular (but finite) stream. From my understanding, a
> Window-Join is defined with some ts-constraint. So the static file part
> will also have this restriction in the join, right? However, a
> file-stream-join should join *all* data from the file with each element
> in the stream... It seems to me, that a file-DataStream would not yield
> this result. Am I wrong?

Re: Is there Any api that let DataStream join DataSet ?

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
I am wondering what the semantics of a DataStream created from a file
is. It should be a regular (but finite) stream. From my understanding, a
Window-Join is defined with some ts-constraint. So the static file part
will also have this restriction in the join, right? However, a
file-stream-join should join *all* data from the file with each element
in the stream... It seems to me, that a file-DataStream would not yield
this result. Am I wrong?
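
To make the difference concrete, here is a small plain-Java model (not
Flink code). It assumes tumbling windows of a fixed length as a simplified
stand-in for the timestamp constraint of a window join; Rec,
windowJoinCount and fullJoinCount are hypothetical names used only for
this illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Compares a window-constrained join with a join against all file records. */
final class JoinSemantics {
    /** A keyed record with a non-negative timestamp in milliseconds. */
    static final class Rec {
        final String key;
        final long ts;

        Rec(String key, long ts) {
            this.key = key;
            this.ts = ts;
        }
    }

    /** Window join: keys must match AND both timestamps must fall into the same window. */
    static int windowJoinCount(List<Rec> left, List<Rec> right, long windowMs) {
        int matches = 0;
        for (Rec l : left) {
            for (Rec r : right) {
                if (l.key.equals(r.key) && l.ts / windowMs == r.ts / windowMs) {
                    matches++;
                }
            }
        }
        return matches;
    }

    /** File-stream join: every stream element is matched against all file records by key only. */
    static int fullJoinCount(List<Rec> file, List<Rec> stream) {
        int matches = 0;
        for (Rec f : file) {
            for (Rec s : stream) {
                if (f.key.equals(s.key)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The file is read at the start, so its records carry early timestamps.
        List<Rec> file = Arrays.asList(new Rec("a", 0L));
        List<Rec> stream = Arrays.asList(new Rec("a", 10L), new Rec("a", 1500L), new Rec("a", 5000L));
        System.out.println("window join matches: " + windowJoinCount(file, stream, 1000L));
        System.out.println("full join matches:   " + fullJoinCount(file, stream));
    }
}
```

In this model the later stream elements find no partner in the window
join because the file records fall into an earlier window, which is the
mismatch described above.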


On 06/29/2015 11:00 AM, Stephan Ewen wrote:
> If you only want to "join" a finite data set (like a file) to a stream, you
> can do that: you can create a DataStream from a (distributed) file.
> 
> If you want specific batch API operations, that is still on the roadmap
> and not in yet, as Marton said.


Re: Is there Any api that let DataStream join DataSet ?

Posted by Stephan Ewen <se...@apache.org>.
If you only want to "join" a finite data set (like a file) to a stream, you
can do that: you can create a DataStream from a (distributed) file.

If you want specific batch API operations, that is still on the roadmap
and not in yet, as Marton said.

On Sun, Jun 28, 2015 at 10:45 AM, Márton Balassi <ba...@gmail.com>
wrote:

> Hi,
>
> Flink currently does not have explicit API support for that, but it is
> definitely possible to do. In fact, Gyula (cc'd) mocked up a prototype for a
> similar problem some time ago.
>
> The idea needs some refinement to properly support all the viable use cases,
> though, and the streaming API currently has some more pressing challenges
> than this integration. :)
>
> It's on our roadmap, but it is not an immediate task. Could you tell us more
> about your use case?
>
> Best,
> Marton

Re: Is there Any api that let DataStream join DataSet ?

Posted by Márton Balassi <ba...@gmail.com>.
Hi,

Flink currently does not have explicit API support for that, but it is
definitely possible to do. In fact, Gyula (cc'd) mocked up a prototype for a
similar problem some time ago.

The idea needs some refinement to properly support all the viable use cases,
though, and the streaming API currently has some more pressing challenges
than this integration. :)

It's on our roadmap, but it is not an immediate task. Could you tell us more
about your use case?

Best,
Marton
On Jun 28, 2015 8:29 AM, "马国维" <ma...@outlook.com> wrote:

> Hi everyone,
> Is there any API that lets a DataStream join a DataSet? I have read all
> the documentation but could not find one.
> If Flink does not have such an API now, will it support one in the future?
> Thanks a lot!
>

RE: Is there Any api that let DataStream join DataSet ?

Posted by 马国维 <ma...@outlook.com>.
Hi everyone, thanks a lot for all your help.

In my case, there is a data stream and a huge data set. Each element in the data stream needs to be joined with the huge data set to produce a new data stream.

But this cannot use a join strategy such as shuffling or broadcasting the data set, because of time or memory constraints.

I want a data set that can be queried, so my problem could be solved by an API like this:
DataStream.join(HugeQueryableDataSet).byKey(…)

This is a common problem in production.
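
A rough sketch of what such a keyed lookup could look like, written as
plain Java rather than against any Flink API. Everything here is an
assumption for illustration: the LookupJoin class, the store function
standing in for the huge queryable data set, and the small
least-recently-used cache in front of it. Each stream element triggers at
most one query by key, so the data set is never shuffled or broadcast.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Joins stream elements against a queryable data set, one keyed lookup per element. */
final class LookupJoin<K, V> {
    // Hypothetical query interface to the huge data set (for example a key-value store).
    private final Function<K, Optional<V>> store;
    // Access-ordered map that evicts the least recently used entry when full.
    private final Map<K, Optional<V>> cache;
    private int storeQueries = 0;

    LookupJoin(Function<K, Optional<V>> store, final int cacheSize) {
        this.store = store;
        this.cache = new LinkedHashMap<K, Optional<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Optional<V>> eldest) {
                return size() > cacheSize;
            }
        };
    }

    /** Joins one stream element by key; returns empty if the data set has no match. */
    Optional<String> join(K key, String streamValue) {
        Optional<V> hit = cache.get(key);
        if (hit == null) {
            // Cache miss: query the data set once and remember the answer (also for misses).
            storeQueries++;
            hit = store.apply(key);
            cache.put(key, hit);
        }
        return hit.map(v -> streamValue + "+" + v);
    }

    int storeQueries() {
        return storeQueries;
    }

    public static void main(String[] args) {
        Map<String, String> hugeDataSet = new HashMap<>();
        hugeDataSet.put("user-1", "Berlin");
        LookupJoin<String, String> join =
            new LookupJoin<String, String>(k -> Optional.ofNullable(hugeDataSet.get(k)), 1000);
        System.out.println(join.join("user-1", "click"));
        System.out.println(join.join("user-2", "click"));
    }
}
```

The cache only helps when keys repeat in the stream, and cached entries
can become stale if the data set changes, so its size and lifetime would
need tuning for a real workload.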

By the way, because of problems with my email client, I can't quote the earlier emails. I will fix that. Thanks again!

