Posted to dev@flink.apache.org by Stephan Ewen <se...@apache.org> on 2015/10/14 21:07:14 UTC

Question on Storm Split Streams

I am running into some issues with the Storm Compatibility layer when
dealing with split streams.
Specifically, this is the situation tested in
"FlinkTopologyBuilderTest.testFieldsGroupingOnMultipleSpoutOutputStreams()".

The topology builder creates a SplitStreamKeySelector, which internally
uses an array key selector. The type of the stream is, however, not
"array", but "SplitStreamType".

With my changes to the KeyedStream and the state handling, there are now
more checks for consistency, and this now throws an exception. From an
initial look, it seems absolutely correct that this fails, because it
attempts to build a program where a POJO stream is accessed through an array
key selector (generically via the java.lang.reflect.Array class at runtime),
which would fail.
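
The failure mode described above can be reproduced in isolation. The
following is a minimal sketch, not Flink's actual key selector:
ArrayAccessDemo, Wrapper, and readField are hypothetical names chosen for
illustration. The only real API it relies on is java.lang.reflect.Array.get,
which throws an IllegalArgumentException when its first argument is not an
array.

```java
import java.lang.reflect.Array;

public class ArrayAccessDemo {

    // Hypothetical stand-in for a POJO wrapper type such as SplitStreamType.
    public static class Wrapper {
        public final String streamId;
        public final Object value;

        public Wrapper(String streamId, Object value) {
            this.streamId = streamId;
            this.value = value;
        }
    }

    // Reads one position generically, the way an array-based selector would.
    // Returns null when the record turns out not to be an array.
    public static Object readField(Object record, int index) {
        try {
            return Array.get(record, index);
        } catch (IllegalArgumentException e) {
            // Array.get rejects any argument that is not an array.
            return null;
        }
    }

    public static void main(String[] args) {
        Object[] tuple = {"key", 42};
        Wrapper wrapped = new Wrapper("stream-1", tuple);

        // Element access works on a real array.
        System.out.println(readField(tuple, 0));
        // The wrapper is a plain object, so generic array access fails.
        System.out.println(readField(wrapped, 0));
    }
}
```

Passing wrapped.value instead of wrapped to readField would succeed, which
is the mismatch in question: the selector expects the inner record, while
the stream carries the wrapper.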

Is this a bug in the Storm API that is simply untested at runtime (such a
program can only be produced at client time in the first place because all
types are raw and no generic checks can be performed), or is there something
else going on implicitly behind the scenes?

Thanks for helping me out,
Stephan

Re: Question on Storm Split Streams

Posted by Stephan Ewen <se...@apache.org>.
Sorry, I did not fully understand this. Is this splitting a feature in the
topology builder that is not supported at runtime?

On Wed, Oct 14, 2015 at 10:10 PM, Matthias J. Sax <mj...@apache.org> wrote:

> SplitStreamKeySelector was built for TupleX output types only
> (FlinkTopologyBuilder never used primitive or POJO types). So splitting
> a POJO type stream is currently not supported by the Storm layer. And
> therefore, there is also no test for it.
>
> It would not be too complicated to add this feature.
>
> -Matthias

Re: Question on Storm Split Streams

Posted by "Matthias J. Sax" <mj...@apache.org>.
SplitStreamKeySelector was built for TupleX output types only
(FlinkTopologyBuilder never used primitive or POJO types). So splitting
a POJO type stream is currently not supported by the Storm layer. And
therefore, there is also no test for it.

It would not be too complicated to add this feature.
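
One possible shape for such a feature, sketched under assumptions: the key
selector first unwraps the split-stream record and then delegates to an
extractor for the inner record type. SplitStreamKeyDemo, SplitRecord, and
unwrapping are hypothetical names, and java.util.function.Function stands in
for a key selector interface; none of this is taken from the Storm
compatibility layer's code.

```java
import java.util.function.Function;

public class SplitStreamKeyDemo {

    // Hypothetical wrapper carrying the stream id next to the actual record.
    public static class SplitRecord<T> {
        public final String streamId;
        public final T value;

        public SplitRecord(String streamId, T value) {
            this.streamId = streamId;
            this.value = value;
        }
    }

    // Builds a key extractor for wrapped records by delegating to an
    // extractor that understands the inner record type.
    public static <T, K> Function<SplitRecord<T>, K> unwrapping(Function<T, K> inner) {
        return wrapped -> inner.apply(wrapped.value);
    }

    public static void main(String[] args) {
        // Inner extractor for array-like records: the key is position 0.
        Function<Object[], Object> firstField = fields -> fields[0];

        Function<SplitRecord<Object[]>, Object> selector = unwrapping(firstField);

        SplitRecord<Object[]> record =
                new SplitRecord<>("stream-1", new Object[] {"user-7", 3});
        System.out.println(selector.apply(record));
    }
}
```

Because the inner extractor is a parameter, the same wrapper logic could in
principle be combined with an extractor for tuple, array, or POJO records.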

-Matthias

