You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by Jungtaek Lim <ka...@gmail.com> on 2016/09/21 23:02:27 UTC

[DISCUSS] Plan for merge SQE and Storm SQL

Hi dev,

While code contribution of SQE is in progress, I would like to continue
discussion how to merge SQE and Storm SQL.

I did an analysis of merging SQE and Storm SQL in both side, integrating
SQE to Storm SQL and vice versa.
https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL

As I commented to that page, since I'm working on Storm SQL for some weeks
I can be (heavily) biased. So I'd really appreciated if someone can do
another analysis.

Please feel free to share your thought about this analysis, or another
proposal if you have any, or other things about the merging.

Thanks,
Jungtaek Lim (HeartSaVioR)

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Jungtaek Lim <ka...@gmail.com>.
Another update on Storm SQL:

I wrote up what what tasks we have done with Storm SQL after Phase 1 (which
is shipped to official releases), and enumerate next works and also (near
or far) future works.
https://cwiki.apache.org/confluence/display/STORM/Milestone+of+Storm+SQL

Please let me know if you think something is missed.

Contributions on these issues are really appreciated! If someone has
interest on the work, we can discuss and set next milestone together. Or
I'll just set next milestone and receive feedbacks.

Thanks,
Jungtaek Lim (HeartSaVioR)


2016년 10월 15일 (토) 오후 12:36, Jungtaek Lim <ka...@gmail.com>님이 작성:

> Thanks Haohui to share your thoughts. Let me add some comments inline.
>
> 2016년 10월 13일 (목) 오후 2:31, Haohui Mai <ri...@gmail.com>님이 작성:
>
> Please allow me to share some of the thoughts on this front:
>
> 1. The concepts of monotonic primary key is ensure forward progress for the
> query from the relational standpoints -- essentially the results are
> append-only streams. The key does not need to be tied to the data itself.
> It can be an automatic generated primary key, somewhat similar to an
> auto_inc key in MySQL.
>
>
> What users have it now is unlikely having monotonic primary key and we
> can't use it as datasource so migration or additional processing is needed.
> Moreover, forcing data source to ensure having monotonic primary key is in
> fact not easy to achieve. It requires users to maintain global monotonic
> ID. If underlying external component for data source supports global
> monotonic ID like auto_inc that would be happy, but kinds of supporting
> components are less than others.
>
> Issuing process time as rowtime to PK could remedy the pain, but since
> spouts are parallelized and distributed to several nodes (fortunately but
> in this case unfortunately) we can't always ensure monotonic because of
> time sync issue between servers.
> We also need to have a policy: if tuple needs to be retried, do we issue
> rowtime again, or we try to use last issued rowtime? This is related to
> fault-tolerance, since Storm issues rowtime to tuple so origin data doesn't
> have it, which means we need to maintain across the workers (because Spout
> can be restarted to different worker or even different node.)
> If we pick former, we should tolerate out-of-order, and if we pick latter,
> we should tolerate maintaining storage.
> I might be going too deeply, but IMO they're valid.
>
> So requiring monotonic key is hard for both users and Storm. If Storm (and
> Calcite) supports quasi-monotonic that would be ideal.
>
> And I'm curious that we can keep the semantic 'PRIMARY KEY' as SQL
> standard. Storm can't verify its uniqueness, and doesn't null check (yet).
>
>
>
> One important thing to succeed in StormSQL or other SQL engines on stream
> processing is to separate the parts that are side-effect-free and the parts
> that are not. Relation algebras are great to express the computations that
> are have no side-effects (e.g., projections, filtering), but you'll need
> side effects to interact with the external world which should be expressed
> in DML and encapsulate them in the table. This drives a lot of design
> decisions in StormSQL. We have picked Calcite to keep the query experiences
> mostly the same but we have a bunch of DML to capture various side-effects
> from data sources / destinations.
>
>
> For example, let say there is a use case that uses Storm to driver the
> dashboards. The Storm topology processes the data and pushes the aggregated
> results into MySQL / Redis continuously. To handle data that arrives late,
> the topology patches the state in MySQL / Redis if it sees the late data.
> In
> StormSQL it can be done as two correlated queries: (1) a SELECT / GROUP BY
> query which produces the aggregated results and (2) a INSERT query that
> patches the state for the external store. The SELECT query can be written
> in a quite standard way, but the table that the INSERT query operates might
> have additional information or even code.
>
>
> I'm afraid if I understand your comment.
> AFAIK, Streaming SQL which Julian Hyde is sketching out is not relying on
> stored data even aggregation. That is done with windowing semantic, and I
> didn't see how to handle late tuple, but once window is discarded, I think
> late tuple should be discarded unless we are able to re-calculate window.
> (we might move discarded tuples to somewhere so that gets notified to
> users.)
>
> Currently Storm SQL uses state to push the result to storage.  It doesn't
> handle global aggregation states indeed, since I'm not sure this fits SQL
> semantic. But as I said earlier I may be persuaded.
>
>
> 2. Partitioned / sharded data sources is orthogonal -- they can be the
> implementation details of the data sources. It can be treated similarly as
> partitioned tables in databases. They do not show up in the algebra but the
> information can be useful for optimization. Same as parallelism.
>
>
> Yes for now we can start from adjusting parallelism. Filed STORM-2147
> <https://issues.apache.org/jira/browse/STORM-2147> for this.
>
>
>
> 3. SQL engines separates logical plans and physical plans for the purposes
> of optimization. Optimizations that are data-agnostic, e.g., constant
> propagation, common subexpression elimination (CSE) can be done on the
> logical plan via a common library. Optimizations that depend on the
> characteristics of the data, such as pruning the partitions during joins
> need to be for each engine.
>
>
> IMHO, since we handle stream data it's not easy to build statistics and
> determining characteristics (doesn't it affect performance?), and also
> currently not possible to re-plan query and update physical plan, and
> finally update topology. But if it's possible Storm topology can have
> elasticity in runtime which should be amazing, and these optimizations
> can be applied to other runtime env. (core, Trident).
>
>
>
> Personally I think the highest priority is to have a story about schemas in
> StormSQL / SQE. JSON is flexible but different use cases expect different
> formats. Avro, Thrifts are quite popular and offer different levels of
> performances depending on the use cases. Today StormSQL kind of forces the
> schema of the original data, but once this limitation is gone and having a
> standardize way to specify schemas will greatly help the adoption.
>
>
> Agreed. That's the point I was considering since current implementation
> just implies JSON schema even though SQL statement has INPUTFORMAT and the
> OUTPUTFORMAT, Do you have a picture in mind?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> Thanks,
> Haohui
>
> On Mon, Oct 10, 2016 at 3:27 PM Jungtaek Lim <ka...@gmail.com> wrote:
>
> > One more, Flink seems to made lots of progress on SQL (still in
> > experimenting), but they set more limited features to streaming SQL.
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table_api.html#limitations-1
> > SELECT, FROM, WHERE, and UNION clauses are all that Flink supports for
> > streaming SQL. I guess they seem to wait for "Streaming SQL"
> specification
> > to be stable, since they don't have micro-batch so window in "Streaming
> > SQL" is mandatory to implement aggregation, join, etc.
> >
> > - Jungtaek Lim (HeartSaVioR)
> >
> > 2016년 10월 11일 (화) 오전 6:36, Jungtaek Lim <ka...@gmail.com>님이 작성:
> >
> > > Hi Morrigan,
> > >
> > > since we're making SQL feature, we need to respect SQL standard,
> > behavior,
> > > and common sense, instead of changing its behavior to meet Storm's
> > > streaming fashion. Some limitations are actually unavoidable.
> > > And I didn't deeply think about "Streaming SQL" yet, since there're
> still
> > > lots of works (small and large) left for micro-batch query. So I
> couldn't
> > > answer some questions clearly about "streaming query" and I might be
> > wrong.
> > > "Streaming SQL" is even still in progress in Calcite project, and may
> be
> > > subject to change.
> > >
> > > Answered inline:
> > >
> > >
> > >
> > >
> > >
> > >
> > > *1) From reading the Calcite documentation, a monotonic expression
> > > isrequired in GROUP BYs of streaming queries. The most common use case
> > > isprobably some sort of date/time/timestamp field. This seems
> > > extremelylimiting to me, especially when streaming off of Kafka where
> > data
> > > may notbe strictly ordered. How is this handled? Is late data not
> allowed
> > > at all?(quasi-monotonicity is allowed, but still doesn't solve late
> data
> > > issue)*
> > >
> > > > First of all, I didn't deeply think about "streaming query" yet,
> since
> > > there're some works (small and large) left for micro-batch query.
> > > In streaming query, aggregation should be done within window. As you
> said
> > > there're late data and out of order issue, and watermark is for trying
> > > best-effort to handle this. It may not solve late data issue
> completely,
> > > and users need to know about that. Beam and Flink strongly publicize
> > about
> > > this, since it's their most strong point.
> > >
> > > Please read these links once if you haven't read these. I admit that I
> > > couldn't understand these clearly so willing to revisit if I need to.
> > > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
> > > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
> > >
> > >
> > > *2) Monotonicty becomes even more complicated when your dataset
> > > ispartitioned, again like Kafka. How is this handled?*
> > >
> > > > I think same rules are applied to partitioned dataset. It just tries
> > its
> > > best to wait for late tuples, and drop late tuples if they're not
> > arriving
> > > before window is discarded.
> > >
> > >
> > > *3) Does Calcite have any understanding of partitioned/sharded streams
> > > atall?*
> > >
> > > > I'm not sure about this, but I guess we can include metadata to
> Storm's
> > > own logical relation if needed, and can utilize while constructing
> > physical
> > > plan. We didn't reach to have Storm's own logical relation, and it
> should
> > > take time since it requires deep understanding of Calcite concepts. If
> > > someone can help on this area that should be great.
> > >
> > >
> > >
> > > *4) You say here that "aggregation is limited to micro-batch so result
> > > isnon deterministic." What do you mean here? Trident supports
> > > statefulaggregations, not just within the micro-batch.*
> > >
> > > > It requires every aggregate functions to store its column to State
> > > independently (please correct me if I'm wrong) whereas we're handling
> > 'row'
> > > and expect to insert all columns in row together. Yes it's more limited
> > but
> > > for me it's closer to origin SQL's behavior. I would like to see more
> > > opinions around this.
> > > Btw, when we change Storm SQL to use windowing rather than micro-batch,
> > > aggregation should be done within window, which is more clear to the
> > users,
> > > and more powerful (for example, 5 min average, 1 min sum, etc.) instead
> > of
> > > aggregating all time window.
> > >
> > >
> > > *5) In your technical analysis, you say that "flexible data sources"
> are
> > > apro for Storm SQL, as opposed to SQE. What do you mean?*
> > >
> > > > You can see that all the things for datasource are abstracted in
> > > storm-sql-core, and datasource specific configurations are all handled
> > from
> > > its own. storm-sql-core doesn't even know the existence of
> > storm-sql-kafka,
> > > and also upcoming datasources, and in fact it should.
> > >
> > >
> > > *The work on another high level API is also interesting. Is this meant
> > > tocompletely replace Trident?*
> > >
> > > > I can't clearly say about that, but IMO, when higher-level API can
> > > ensure exactly-once eventually we don't have to rely on Trident.
> > > Micro-batch semantics can be replaced with process time or ingest time
> > > windowing.
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > >
> > > 2016년 10월 11일 (화) 오전 5:08, Morrigan Jones <mo...@jwplayer.com>님이
> 작성:
> > >
> > > Thank you for this update Jungtaek. I've been meaning to do my own
> > followup
> > > to your technical analysis, but unfortunately had some Kafka work eat
> up
> > a
> > > lot of my time. I do have some immediate questions/thoughts, though:
> > >
> > > 1) From reading the Calcite documentation, a monotonic expression is
> > > required in GROUP BYs of streaming queries. The most common use case is
> > > probably some sort of date/time/timestamp field. This seems extremely
> > > limiting to me, especially when streaming off of Kafka where data may
> not
> > > be strictly ordered. How is this handled? Is late data not allowed at
> > all?
> > > (quasi-monotonicity is allowed, but still doesn't solve late data
> issue)
> > > 2) Monotonicty becomes even more complicated when your dataset is
> > > partitioned, again like Kafka. How is this handled?
> > > 3) Does Calcite have any understanding of partitioned/sharded streams
> at
> > > all?
> > > 4) You say here that "aggregation is limited to micro-batch so result
> is
> > > non deterministic." What do you mean here? Trident supports stateful
> > > aggregations, not just within the micro-batch.
> > > 5) In your technical analysis, you say that "flexible data sources"
> are a
> > > pro for Storm SQL, as opposed to SQE. What do you mean?
> > >
> > > The work on another high level API is also interesting. Is this meant
> to
> > > completely replace Trident?
> > >
> > >
> > > On Fri, Oct 7, 2016 at 1:25 AM, Jungtaek Lim <ka...@gmail.com>
> wrote:
> > >
> > > > For preventing this discussion to be staled, I'd like to put a note
> how
> > > > Storm SQL is going now.
> > > > (including the change after my tech. analysis)
> > > >
> > > > 1. There're 6 pull requests opened regarding Storm SQL.
> > > > https://github.com/apache/storm/pulls/HeartSaVioR
> > > >
> > > > 2. When STORM-2125 <https://github.com/apache/storm/pull/1714> is
> > merged
> > > > to
> > > > master, Storm SQL can handle most of things which Calcite publishes
> to
> > > SQL
> > > > reference page. I wrote Storm SQL's own reference page
> > > > <
> > >
> https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6
> > > > a314910437/docs/storm-sql-reference.md>
> > > > and submitted it to another pull request (one of six pull requests)
> > > >
> > > > STORM-2125 is having +1 but waiting for Calcite 1.10.0 to be released
> > in
> > > > order to apply bugfix. (Yes, someone can state that it's a bit
> tightly
> > > > coupled.) I'm expecting they can cut 1.10.0 RC1 in next week. If it
> > > doesn't
> > > > happen in next week, we can just stick with Calcite 1.9.0 and
> > immediately
> > > > upgrade 1.10.0 when it's available.
> > > >
> > > > 3. During adding tests I found some of bugs on Calcite side (left
> notes
> > > for
> > > > each test and also reference doc), which we can contribute back to
> > > Calcite
> > > > community. I think this would be a positive feedback loop between
> Storm
> > > and
> > > > Calcite project.
> > > >
> > > > There're still many rooms available to work on Storm SQL.
> > > >
> > > > 1. I haven't had much time to do now, but would like to learn Calcite
> > and
> > > > try to optimize Storm SQL via STORM-1446
> > > > <https://issues.apache.org/jira/browse/STORM-1446>. If someone who
> > > > understands Calcite well takes over and contributes this area I'd be
> > much
> > > > appreciated.
> > > >
> > > > 2. I think Trident seems not good backend API for Storm SQL for a
> long
> > > > term.
> > > > Trident doesn't support typed operation, join and aggregation is
> > limited
> > > to
> > > > micro-batch so result is non deterministic. I'm waiting for
> > higher-level
> > > > API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is
> > the
> > > > start) and plan to move the backend API.
> > > >
> > > > 3. Storm SQL needs to have more connectors as data sources. It isn't
> > hard
> > > > to work on, but a bit time-consuming.
> > > >
> > > > 4. Storm SQL needs to have time to stabilize by experiment loop:
> > > > experimenting from early adopters, reporting, and fixing bugs.
> > > >
> > > > 5. Long term work: we can expand supporting features on Storm SQL.
> Join
> > > > between streaming data source and normal table should help to enrich
> or
> > > > filter data with other external data source. There's also continuous
> > > effort
> > > > from Calcite community: 'Streaming SQL' led by Julian. etc.
> > > >
> > > > Looking forward to continue discussion with JW Player folks.
> > > >
> > > > Thanks,
> > > > Jungtaek Lim (HeartSaVioR)
> > > >
> > > > ps.
> > > > I'd really like to make revamped Storm SQL available to the
> community.
> > > > Do we think merge process can pause this effort? I hope not, but I
> can
> > > > follow the community's decision.
> > > >
> > > > 2016년 9월 30일 (금) 오전 3:58, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> > > >
> > > > FYI, I’ve merged the SQE code and documentation into the sqe_import
> > > branch:
> > > >
> > > > https://github.com/apache/storm/tree/sqe_import
> > > >
> > > > Note the build will fail on the last component (storm-sqe) due to the
> > > > compilation issues mentioned earlier.
> > > >
> > > > -Taylor
> > > >
> > > >
> > > > On Sep 29, 2016, at 11:13 AM, Bobby Evans
> <evans@yahoo-inc.com.INVALID
> > >
> > > > wrote:
> > > >
> > > > Agreed, or if we can find a way to not break compatibility (like
> > perhaps
> > > > having a common base class for most of the logic and one subclass
> that
> > > uses
> > > > a string while another that uses the byte array).   - Bobby
> > > >
> > > >    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <
> > > > kabhwan@gmail.com>
> > > > wrote:
> > > >
> > > >
> > > > The change on storm-redis seems to require breaking backward
> > > compatibility,
> > > > so I would love to see another pull request to integrate it via
> general
> > > > review process.
> > > > If it can be integrated to SQE without changing storm-redis, that
> would
> > > be
> > > > nice.
> > > >
> > > > Does it make sense?
> > > >
> > > > - Jungtaek Lim (HeartSaVioR)
> > > >
> > > > 2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> > > >
> > > > The changes are available in GitHub, I just overlooked it. And
> they’re
> > > > actually neatly contained in two commits:
> > > >
> > > > The changes to storm-kafka can be found here:
> > > >
> > > >
> > > >
> > https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > > > 836104755a
> > > >
> > > >
> > > > The changes to storm-redis are here:
> > > >
> > > >
> > > >
> > https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a7
> > > > 05fb2928b8
> > > > <
> > > >
> > https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > > > 836104755a
> > > > >
> > > >
> > > >
> > > > The changes to storm-kafka are straightforward and implemented in
> such
> > a
> > > > way that they would be useful for use cases outside of SQE. As the
> > commit
> > > > message states, it adds a new kafka deserialization scheme
> (FullScheme)
> > > > that includes the key, value, topic, partition and offset when
> reading
> > > from
> > > > kafka, which is a feature I can see as being valuable for some use
> > > cases. I
> > > > would be +1 for merging that code.
> > > >
> > > > The changes to storm-redis are a little different, as Morrigan
> pointed
> > > > out, because it only addresses the Trident API, but IMHO it looks
> like
> > a
> > > > good direction.
> > > >
> > > > @HeartSavior — Would you have some time to take a look at the
> > storm-redis
> > > > changes and provide your opinion, since you’re one of the original
> > > authors
> > > > of that code?
> > > >
> > > > -Taylor
> > > >
> > > >
> > > > On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> > > >
> > > > Great!
> > > >
> > > > For storm-redis, we might need to modify key/value mapper to use
> byte[]
> > > > rather than String.
> > > > When I co-authored storm-redis, I forgot considering binary format of
> > > > serde. If we want to address that part, we can also address it.
> > > >
> > > > 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이
> 작성:
> > > >
> > > > Sure, when I can. Storm-kafka should be pretty easy. The storm-redis
> > one
> > > > will require more work to make it more complete.
> > > >
> > > > On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> > > > wrote:
> > > >
> > > > Thanks for the explanation Morrigan!
> > > >
> > > > Would you be willing to provide a pull request or patch so the
> > community
> > > > can review?
> > > >
> > > > It sounds like at least some of the changes you mention could be
> useful
> > > >
> > > > to
> > > >
> > > > the broader community (beyond the SQL effort).
> > > >
> > > > Thanks again,
> > > >
> > > > -Taylor
> > > >
> > > > On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
> > > >
> > > > wrote:
> > > >
> > > >
> > > > storm-kafka - This is needed because storm-kafka does not provide a
> > > >
> > > > scheme
> > > >
> > > > class that gives you the key, value (payload), partition, and offset.
> > > > MessageMetadataScheme.java comes comes closest, but is missing the
> key.
> > > > This was a pretty simple change on my part.
> > > >
> > > > storm-redis - This is needed for proper support of Redis hashes. The
> > > > existing storm-redis uses a static string (additionalKey in
> > > > the RedisDataTypeDescription class) for the field name in hash
> types. I
> > > > updated it to use a configurable KeyFactory for both the hash name
> and
> > > >
> > > > the
> > > >
> > > > field name. We also added some limited support for set types. This is
> > > > admittedly the messiest between the two jars since we only cared
> about
> > > >
> > > > the
> > > >
> > > > trident states and would require a lot more changes to get
> storm-redis
> > > >
> > > > more
> > > >
> > > > "feature complete" overall.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > >
> > > > Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
> > > >
> > > > that
> > > >
> > > > direction. Otherwise I’ll come back with my findings and we can
> > > >
> > > > discuss
> > > >
> > > > it
> > > >
> > > > further.
> > > >
> > > > I notice there are jars in the git repo that we obviously can’t
> > > >
> > > > import.
> > > >
> > > > They look like they might be custom JWPlayer builds of storm-kafka
> and
> > > > storm-redis.
> > > >
> > > > Morrigan — Do you know if there is any differences there that
> required
> > > > custom builds of those components?
> > > >
> > > > -Taylor
> > > >
> > > > On Sep 26, 2016, at 3:31 PM, Bobby Evans
> > > >
> > > > <evans@yahoo-inc.com.INVALID
> > > >
> > > >
> > > > wrote:
> > > >
> > > > Does it compile against 2.X?  If so I would prefer to have it go
> > > >
> > > > there,
> > > >
> > > > and then possibly 1.x if people what it there too. - Bobby
> > > >
> > > >
> > > >   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> > > >
> > > > ptgoetz@gmail.com> wrote:
> > > >
> > > >
> > > > The IP Clearance vote has passed and we are now able to import the
> > > >
> > > > SQE
> > > >
> > > > code.
> > > >
> > > >
> > > > The question now is to where do we want to import the code?
> > > >
> > > > My inclination is to import it to “external” in the 1.x branch. It
> > > >
> > > > can
> > > >
> > > > be ported to other branches as necessary/if desired. An alternative
> > > >
> > > > would
> > > >
> > > > be to treat it as a feature branch, but I’d rather take the former
> > > >
> > > > approach.
> > > >
> > > >
> > > > Thought/opinions?
> > > >
> > > > -Taylor
> > > >
> > > > On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > >
> > > > My apologies. I meant to cc dev@, but didn't. Will forward in a
> > > >
> > > > bit...
> > > >
> > > >
> > > > The vote (lazy consensus) is underway on general@incubator, and
> > > >
> > > > will
> > > >
> > > > close in less than 72 hours. After that the code can be merged.
> > > >
> > > >
> > > > -Taylor
> > > >
> > > > On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
> > > >
> > > > wrote:
> > > >
> > > >
> > > > Hi dev,
> > > >
> > > > While code contribution of SQE is in progress, I would like to
> > > >
> > > > continue
> > > >
> > > > discussion how to merge SQE and Storm SQL.
> > > >
> > > > I did an analysis of merging SQE and Storm SQL in both side,
> > > >
> > > > integrating
> > > >
> > > > SQE to Storm SQL and vice versa.
> > > > https://cwiki.apache.org/confluence/display/STORM/
> > > >
> > > > Technical+analysis+of+merging+SQE+and+Storm+SQL
> > > >
> > > >
> > > > As I commented to that page, since I'm working on Storm SQL for
> > > >
> > > > some
> > > >
> > > > weeks
> > > >
> > > > I can be (heavily) biased. So I'd really appreciated if someone can
> > > >
> > > > do
> > > >
> > > > another analysis.
> > > >
> > > > Please feel free to share your thought about this analysis, or
> > > >
> > > > another
> > > >
> > > > proposal if you have any, or other things about the merging.
> > > >
> > > > Thanks,
> > > > Jungtaek Lim (HeartSaVioR)
> > > >
> > > >
> > > >
> > > > --
> > > > Morrigan Jones
> > > > Principal Engineer
> > > > *JW*PLAYER  |  Your Way to Play
> > > > morrigan@jwplayer.com  |  jwplayer.com
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Morrigan Jones
> > > > Principal Engineer
> > > > *JW*PLAYER  |  Your Way to Play
> > > > morrigan@jwplayer.com  |  jwplayer.com
> > > >
> > >
> > >
> > >
> > > --
> > > Morrigan Jones
> > > Principal Engineer
> > > *JW*PLAYER  |  Your Way to Play
> > > morrigan@jwplayer.com  |  jwplayer.com
> > >
> > >
> >
>
>

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Jungtaek Lim <ka...@gmail.com>.
Thanks Haohui to share your thoughts. Let me add some comments inline.

2016년 10월 13일 (목) 오후 2:31, Haohui Mai <ri...@gmail.com>님이 작성:

Please allow me to share some of the thoughts on this front:

1. The concepts of monotonic primary key is ensure forward progress for the
query from the relational standpoints -- essentially the results are
append-only streams. The key does not need to be tied to the data itself.
It can be an automatic generated primary key, somewhat similar to an
auto_inc key in MySQL.


What users have it now is unlikely having monotonic primary key and we
can't use it as datasource so migration or additional processing is needed.
Moreover, forcing data source to ensure having monotonic primary key is in
fact not easy to achieve. It requires users to maintain global monotonic
ID. If underlying external component for data source supports global
monotonic ID like auto_inc that would be happy, but kinds of supporting
components are less than others.

Issuing process time as rowtime to PK could remedy the pain, but since
spouts are parallelized and distributed to several nodes (fortunately but
in this case unfortunately) we can't always ensure monotonic because of
time sync issue between servers.
We also need to have a policy: if tuple needs to be retried, do we issue
rowtime again, or we try to use last issued rowtime? This is related to
fault-tolerance, since Storm issues rowtime to tuple so origin data doesn't
have it, which means we need to maintain across the workers (because Spout
can be restarted to different worker or even different node.)
If we pick former, we should tolerate out-of-order, and if we pick latter,
we should tolerate maintaining storage.
I might be going too deeply, but IMO they're valid.

So requiring monotonic key is hard for both users and Storm. If Storm (and
Calcite) supports quasi-monotonic that would be ideal.

And I'm curious that we can keep the semantic 'PRIMARY KEY' as SQL
standard. Storm can't verify its uniqueness, and doesn't null check (yet).



One important thing to succeed in StormSQL or other SQL engines on stream
processing is to separate the parts that are side-effect-free and the parts
that are not. Relation algebras are great to express the computations that
are have no side-effects (e.g., projections, filtering), but you'll need
side effects to interact with the external world which should be expressed
in DML and encapsulate them in the table. This drives a lot of design
decisions in StormSQL. We have picked Calcite to keep the query experiences
mostly the same but we have a bunch of DML to capture various side-effects
from data sources / destinations.


For example, let say there is a use case that uses Storm to driver the
dashboards. The Storm topology processes the data and pushes the aggregated
results into MySQL / Redis continuously. To handle data that arrives late,
the topology patches the state in MySQL / Redis if it sees the late data. In
StormSQL it can be done as two correlated queries: (1) a SELECT / GROUP BY
query which produces the aggregated results and (2) a INSERT query that
patches the state for the external store. The SELECT query can be written
in a quite standard way, but the table that the INSERT query operates might
have additional information or even code.


I'm afraid if I understand your comment.
AFAIK, Streaming SQL which Julian Hyde is sketching out is not relying on
stored data even aggregation. That is done with windowing semantic, and I
didn't see how to handle late tuple, but once window is discarded, I think
late tuple should be discarded unless we are able to re-calculate window.
(we might move discarded tuples to somewhere so that gets notified to
users.)

Currently Storm SQL uses state to push the result to storage.  It doesn't
handle global aggregation states indeed, since I'm not sure this fits SQL
semantic. But as I said earlier I may be persuaded.


2. Partitioned / sharded data sources is orthogonal -- they can be the
implementation details of the data sources. It can be treated similarly as
partitioned tables in databases. They do not show up in the algebra but the
information can be useful for optimization. Same as parallelism.


Yes for now we can start from adjusting parallelism. Filed STORM-2147
<https://issues.apache.org/jira/browse/STORM-2147> for this.



3. SQL engines separates logical plans and physical plans for the purposes
of optimization. Optimizations that are data-agnostic, e.g., constant
propagation, common subexpression elimination (CSE) can be done on the
logical plan via a common library. Optimizations that depend on the
characteristics of the data, such as pruning the partitions during joins
need to be for each engine.


IMHO, since we handle stream data it's not easy to build statistics and
determining characteristics (doesn't it affect performance?), and also
currently not possible to re-plan query and update physical plan, and
finally update topology. But if it's possible Storm topology can have
elasticity in runtime which should be amazing, and these optimizations can
be applied to other runtime env. (core, Trident).



Personally I think the highest priority is to have a story about schemas in
StormSQL / SQE. JSON is flexible but different use cases expect different
formats. Avro, Thrifts are quite popular and offer different levels of
performances depending on the use cases. Today StormSQL kind of forces the
schema of the original data, but once this limitation is gone and having a
standardize way to specify schemas will greatly help the adoption.


Agreed. That's the point I was considering since current implementation
just implies JSON schema even though SQL statement has INPUTFORMAT and the
OUTPUTFORMAT, Do you have a picture in mind?

Thanks,
Jungtaek Lim (HeartSaVioR)


Thanks,
Haohui

On Mon, Oct 10, 2016 at 3:27 PM Jungtaek Lim <ka...@gmail.com> wrote:

> One more, Flink seems to made lots of progress on SQL (still in
> experimenting), but they set more limited features to streaming SQL.
>
>
https://ci.apache.org/projects/flink/flink-docs-master/dev/table_api.html#limitations-1
> SELECT, FROM, WHERE, and UNION clauses are all that Flink supports for
> streaming SQL. I guess they seem to wait for "Streaming SQL" specification
> to be stable, since they don't have micro-batch so window in "Streaming
> SQL" is mandatory to implement aggregation, join, etc.
>
> - Jungtaek Lim (HeartSaVioR)
>
> 2016년 10월 11일 (화) 오전 6:36, Jungtaek Lim <ka...@gmail.com>님이 작성:
>
> > Hi Morrigan,
> >
> > since we're making SQL feature, we need to respect SQL standard,
> behavior,
> > and common sense, instead of changing its behavior to meet Storm's
> > streaming fashion. Some limitations are actually unavoidable.
> > And I didn't deeply think about "Streaming SQL" yet, since there're
still
> > lots of works (small and large) left for micro-batch query. So I
couldn't
> > answer some questions clearly about "streaming query" and I might be
> wrong.
> > "Streaming SQL" is even still in progress in Calcite project, and may be
> > subject to change.
> >
> > Answered inline:
> >
> >
> >
> >
> >
> >
> > *1) From reading the Calcite documentation, a monotonic expression
> > isrequired in GROUP BYs of streaming queries. The most common use case
> > isprobably some sort of date/time/timestamp field. This seems
> > extremelylimiting to me, especially when streaming off of Kafka where
> data
> > may notbe strictly ordered. How is this handled? Is late data not
allowed
> > at all?(quasi-monotonicity is allowed, but still doesn't solve late data
> > issue)*
> >
> > > First of all, I didn't deeply think about "streaming query" yet, since
> > there're some works (small and large) left for micro-batch query.
> > In streaming query, aggregation should be done within window. As you
said
> > there're late data and out of order issue, and watermark is for trying
> > best-effort to handle this. It may not solve late data issue completely,
> > and users need to know about that. Beam and Flink strongly publicize
> about
> > this, since it's their most strong point.
> >
> > Please read these links once if you haven't read these. I admit that I
> > couldn't understand these clearly so willing to revisit if I need to.
> > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
> > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
> >
> >
> > *2) Monotonicty becomes even more complicated when your dataset
> > ispartitioned, again like Kafka. How is this handled?*
> >
> > > I think same rules are applied to partitioned dataset. It just tries
> its
> > best to wait for late tuples, and drop late tuples if they're not
> arriving
> > before window is discarded.
> >
> >
> > *3) Does Calcite have any understanding of partitioned/sharded streams
> > atall?*
> >
> > > I'm not sure about this, but I guess we can include metadata to
Storm's
> > own logical relation if needed, and can utilize while constructing
> physical
> > plan. We didn't reach to have Storm's own logical relation, and it
should
> > take time since it requires deep understanding of Calcite concepts. If
> > someone can help on this area that should be great.
> >
> >
> >
> > *4) You say here that "aggregation is limited to micro-batch so result
> > isnon deterministic." What do you mean here? Trident supports
> > statefulaggregations, not just within the micro-batch.*
> >
> > > It requires every aggregate functions to store its column to State
> > independently (please correct me if I'm wrong) whereas we're handling
> 'row'
> > and expect to insert all columns in row together. Yes it's more limited
> but
> > for me it's closer to origin SQL's behavior. I would like to see more
> > opinions around this.
> > Btw, when we change Storm SQL to use windowing rather than micro-batch,
> > aggregation should be done within window, which is more clear to the
> users,
> > and more powerful (for example, 5 min average, 1 min sum, etc.) instead
> of
> > aggregating all time window.
> >
> >
> > *5) In your technical analysis, you say that "flexible data sources" are
> > apro for Storm SQL, as opposed to SQE. What do you mean?*
> >
> > > You can see that all the things for datasource are abstracted in
> > storm-sql-core, and datasource specific configurations are all handled
> from
> > its own. storm-sql-core doesn't even know the existence of
> storm-sql-kafka,
> > and also upcoming datasources, and in fact it should.
> >
> >
> > *The work on another high level API is also interesting. Is this meant
> > tocompletely replace Trident?*
> >
> > > I can't clearly say about that, but IMO, when higher-level API can
> > ensure exactly-once eventually we don't have to rely on Trident.
> > Micro-batch semantics can be replaced with process time or ingest time
> > windowing.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> >
> > 2016년 10월 11일 (화) 오전 5:08, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> >
> > Thank you for this update Jungtaek. I've been meaning to do my own
> followup
> > to your technical analysis, but unfortunately had some Kafka work eat up
> a
> > lot of my time. I do have some immediate questions/thoughts, though:
> >
> > 1) From reading the Calcite documentation, a monotonic expression is
> > required in GROUP BYs of streaming queries. The most common use case is
> > probably some sort of date/time/timestamp field. This seems extremely
> > limiting to me, especially when streaming off of Kafka where data may
not
> > be strictly ordered. How is this handled? Is late data not allowed at
> all?
> > (quasi-monotonicity is allowed, but still doesn't solve late data issue)
> > 2) Monotonicty becomes even more complicated when your dataset is
> > partitioned, again like Kafka. How is this handled?
> > 3) Does Calcite have any understanding of partitioned/sharded streams at
> > all?
> > 4) You say here that "aggregation is limited to micro-batch so result is
> > non deterministic." What do you mean here? Trident supports stateful
> > aggregations, not just within the micro-batch.
> > 5) In your technical analysis, you say that "flexible data sources" are
a
> > pro for Storm SQL, as opposed to SQE. What do you mean?
> >
> > The work on another high level API is also interesting. Is this meant to
> > completely replace Trident?
> >
> >
> > On Fri, Oct 7, 2016 at 1:25 AM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> > > For preventing this discussion to be staled, I'd like to put a note
how
> > > Storm SQL is going now.
> > > (including the change after my tech. analysis)
> > >
> > > 1. There're 6 pull requests opened regarding Storm SQL.
> > > https://github.com/apache/storm/pulls/HeartSaVioR
> > >
> > > 2. When STORM-2125 <https://github.com/apache/storm/pull/1714> is
> merged
> > > to
> > > master, Storm SQL can handle most of things which Calcite publishes to
> > SQL
> > > reference page. I wrote Storm SQL's own reference page
> > > <
> > https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6
> > > a314910437/docs/storm-sql-reference.md>
> > > and submitted it to another pull request (one of six pull requests)
> > >
> > > STORM-2125 is having +1 but waiting for Calcite 1.10.0 to be released
> in
> > > order to apply bugfix. (Yes, someone can state that it's a bit tightly
> > > coupled.) I'm expecting they can cut 1.10.0 RC1 in next week. If it
> > doesn't
> > > happen in next week, we can just stick with Calcite 1.9.0 and
> immediately
> > > upgrade 1.10.0 when it's available.
> > >
> > > 3. During adding tests I found some of bugs on Calcite side (left
notes
> > for
> > > each test and also reference doc), which we can contribute back to
> > Calcite
> > > community. I think this would be a positive feedback loop between
Storm
> > and
> > > Calcite project.
> > >
> > > There're still many rooms available to work on Storm SQL.
> > >
> > > 1. I haven't had much time to do now, but would like to learn Calcite
> and
> > > try to optimize Storm SQL via STORM-1446
> > > <https://issues.apache.org/jira/browse/STORM-1446>. If someone who
> > > understands Calcite well takes over and contributes this area I'd be
> much
> > > appreciated.
> > >
> > > 2. I think Trident seems not good backend API for Storm SQL for a long
> > > term.
> > > Trident doesn't support typed operation, join and aggregation is
> limited
> > to
> > > micro-batch so result is non deterministic. I'm waiting for
> higher-level
> > > API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is
> the
> > > start) and plan to move the backend API.
> > >
> > > 3. Storm SQL needs to have more connectors as data sources. It isn't
> hard
> > > to work on, but a bit time-consuming.
> > >
> > > 4. Storm SQL needs to have time to stabilize by experiment loop:
> > > experimenting from early adopters, reporting, and fixing bugs.
> > >
> > > 5. Long term work: we can expand supporting features on Storm SQL.
Join
> > > between streaming data source and normal table should help to enrich
or
> > > filter data with other external data source. There's also continuous
> > effort
> > > from Calcite community: 'Streaming SQL' led by Julian. etc.
> > >
> > > Looking forward to continue discussion with JW Player folks.
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > ps.
> > > I'd really like to make revamped Storm SQL available to the community.
> > > Do we think merge process can pause this effort? I hope not, but I can
> > > follow the community's decision.
> > >
> > > 2016년 9월 30일 (금) 오전 3:58, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> > >
> > > FYI, I’ve merged the SQE code and documentation into the sqe_import
> > branch:
> > >
> > > https://github.com/apache/storm/tree/sqe_import
> > >
> > > Note the build will fail on the last component (storm-sqe) due to the
> > > compilation issues mentioned earlier.
> > >
> > > -Taylor
> > >
> > >
> > > On Sep 29, 2016, at 11:13 AM, Bobby Evans <evans@yahoo-inc.com.INVALID
> >
> > > wrote:
> > >
> > > Agreed, or if we can find a way to not break compatibility (like
> perhaps
> > > having a common base class for most of the logic and one subclass that
> > uses
> > > a string while another that uses the byte array).   - Bobby
> > >
> > >    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <
> > > kabhwan@gmail.com>
> > > wrote:
> > >
> > >
> > > The change on storm-redis seems to require breaking backward
> > compatibility,
> > > so I would love to see another pull request to integrate it via
general
> > > review process.
> > > If it can be integrated to SQE without changing storm-redis, that
would
> > be
> > > nice.
> > >
> > > Does it make sense?
> > >
> > > - Jungtaek Lim (HeartSaVioR)
> > >
> > > 2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> > >
> > > The changes are available in GitHub, I just overlooked it. And they’re
> > > actually neatly contained in two commits:
> > >
> > > The changes to storm-kafka can be found here:
> > >
> > >
> > >
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > > 836104755a
> > >
> > >
> > > The changes to storm-redis are here:
> > >
> > >
> > >
> https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a7
> > > 05fb2928b8
> > > <
> > >
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > > 836104755a
> > > >
> > >
> > >
> > > The changes to storm-kafka are straightforward and implemented in such
> a
> > > way that they would be useful for use cases outside of SQE. As the
> commit
> > > message states, it adds a new kafka deserialization scheme
(FullScheme)
> > > that includes the key, value, topic, partition and offset when reading
> > from
> > > kafka, which is a feature I can see as being valuable for some use
> > cases. I
> > > would be +1 for merging that code.
> > >
> > > The changes to storm-redis are a little different, as Morrigan pointed
> > > out, because it only addresses the Trident API, but IMHO it looks like
> a
> > > good direction.
> > >
> > > @HeartSavior — Would you have some time to take a look at the
> storm-redis
> > > changes and provide your opinion, since you’re one of the original
> > authors
> > > of that code?
> > >
> > > -Taylor
> > >
> > >
> > > On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> > >
> > > Great!
> > >
> > > For storm-redis, we might need to modify key/value mapper to use
byte[]
> > > rather than String.
> > > When I co-authored storm-redis, I forgot considering binary format of
> > > serde. If we want to address that part, we can also address it.
> > >
> > > 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> > >
> > > Sure, when I can. Storm-kafka should be pretty easy. The storm-redis
> one
> > > will require more work to make it more complete.
> > >
> > > On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> > > wrote:
> > >
> > > Thanks for the explanation Morrigan!
> > >
> > > Would you be willing to provide a pull request or patch so the
> community
> > > can review?
> > >
> > > It sounds like at least some of the changes you mention could be
useful
> > >
> > > to
> > >
> > > the broader community (beyond the SQL effort).
> > >
> > > Thanks again,
> > >
> > > -Taylor
> > >
> > > On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
> > >
> > > wrote:
> > >
> > >
> > > storm-kafka - This is needed because storm-kafka does not provide a
> > >
> > > scheme
> > >
> > > class that gives you the key, value (payload), partition, and offset.
> > > MessageMetadataScheme.java comes comes closest, but is missing the
key.
> > > This was a pretty simple change on my part.
> > >
> > > storm-redis - This is needed for proper support of Redis hashes. The
> > > existing storm-redis uses a static string (additionalKey in
> > > the RedisDataTypeDescription class) for the field name in hash types.
I
> > > updated it to use a configurable KeyFactory for both the hash name and
> > >
> > > the
> > >
> > > field name. We also added some limited support for set types. This is
> > > admittedly the messiest between the two jars since we only cared about
> > >
> > > the
> > >
> > > trident states and would require a lot more changes to get storm-redis
> > >
> > > more
> > >
> > > "feature complete" overall.
> > >
> > >
> > >
> > >
> > > On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
> > >
> > > wrote:
> > >
> > >
> > > Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
> > >
> > > that
> > >
> > > direction. Otherwise I’ll come back with my findings and we can
> > >
> > > discuss
> > >
> > > it
> > >
> > > further.
> > >
> > > I notice there are jars in the git repo that we obviously can’t
> > >
> > > import.
> > >
> > > They look like they might be custom JWPlayer builds of storm-kafka and
> > > storm-redis.
> > >
> > > Morrigan — Do you know if there is any differences there that required
> > > custom builds of those components?
> > >
> > > -Taylor
> > >
> > > On Sep 26, 2016, at 3:31 PM, Bobby Evans
> > >
> > > <evans@yahoo-inc.com.INVALID
> > >
> > >
> > > wrote:
> > >
> > > Does it compile against 2.X?  If so I would prefer to have it go
> > >
> > > there,
> > >
> > > and then possibly 1.x if people what it there too. - Bobby
> > >
> > >
> > >   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> > >
> > > ptgoetz@gmail.com> wrote:
> > >
> > >
> > > The IP Clearance vote has passed and we are now able to import the
> > >
> > > SQE
> > >
> > > code.
> > >
> > >
> > > The question now is to where do we want to import the code?
> > >
> > > My inclination is to import it to “external” in the 1.x branch. It
> > >
> > > can
> > >
> > > be ported to other branches as necessary/if desired. An alternative
> > >
> > > would
> > >
> > > be to treat it as a feature branch, but I’d rather take the former
> > >
> > > approach.
> > >
> > >
> > > Thought/opinions?
> > >
> > > -Taylor
> > >
> > > On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
> > >
> > > wrote:
> > >
> > >
> > > My apologies. I meant to cc dev@, but didn't. Will forward in a
> > >
> > > bit...
> > >
> > >
> > > The vote (lazy consensus) is underway on general@incubator, and
> > >
> > > will
> > >
> > > close in less than 72 hours. After that the code can be merged.
> > >
> > >
> > > -Taylor
> > >
> > > On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
> > >
> > > wrote:
> > >
> > >
> > > Hi dev,
> > >
> > > While code contribution of SQE is in progress, I would like to
> > >
> > > continue
> > >
> > > discussion how to merge SQE and Storm SQL.
> > >
> > > I did an analysis of merging SQE and Storm SQL in both side,
> > >
> > > integrating
> > >
> > > SQE to Storm SQL and vice versa.
> > > https://cwiki.apache.org/confluence/display/STORM/
> > >
> > > Technical+analysis+of+merging+SQE+and+Storm+SQL
> > >
> > >
> > > As I commented to that page, since I'm working on Storm SQL for
> > >
> > > some
> > >
> > > weeks
> > >
> > > I can be (heavily) biased. So I'd really appreciated if someone can
> > >
> > > do
> > >
> > > another analysis.
> > >
> > > Please feel free to share your thought about this analysis, or
> > >
> > > another
> > >
> > > proposal if you have any, or other things about the merging.
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > >
> > >
> > > --
> > > Morrigan Jones
> > > Principal Engineer
> > > *JW*PLAYER  |  Your Way to Play
> > > morrigan@jwplayer.com  |  jwplayer.com
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Morrigan Jones
> > > Principal Engineer
> > > *JW*PLAYER  |  Your Way to Play
> > > morrigan@jwplayer.com  |  jwplayer.com
> > >
> >
> >
> >
> > --
> > Morrigan Jones
> > Principal Engineer
> > *JW*PLAYER  |  Your Way to Play
> > morrigan@jwplayer.com  |  jwplayer.com
> >
> >
>

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Haohui Mai <ri...@gmail.com>.
Please allow me to share some of the thoughts on this front:

1. The concepts of monotonic primary key is ensure forward progress for the
query from the relational standpoints -- essentially the results are
append-only streams. The key does not need to be tied to the data itself.
It can be an automatic generated primary key, somewhat similar to an
auto_inc key in MySQL.

One important thing to succeed in StormSQL or other SQL engines on stream
processing is to separate the parts that are side-effect-free and the parts
that are not. Relation algebras are great to express the computations that
are have no side-effects (e.g., projections, filtering), but you'll need
side effects to interact with the external world which should be expressed
in DML and encapsulate them in the table. This drives a lot of design
decisions in StormSQL. We have picked Calcite to keep the query experiences
mostly the same but we have a bunch of DML to capture various side-effects
from data sources / destinations.

For example, let say there is a use case that uses Storm to driver the
dashboards. The Storm topology processes the data and pushes the aggregated
results into MySQL / Redis continuously. To handle data that arrives late,
the topology patches the state in MySQL / Redis if it sees the late data. In
StormSQL it can be done as two correlated queries: (1) a SELECT / GROUP BY
query which produces the aggregated results and (2) a INSERT query that
patches the state for the external store. The SELECT query can be written
in a quite standard way, but the table that the INSERT query operates might
have additional information or even code.

2. Partitioned / sharded data sources is orthogonal -- they can be the
implementation details of the data sources. It can be treated similarly as
partitioned tables in databases. They do not show up in the algebra but the
information can be useful for optimization. Same as parallelism.

3. SQL engines separates logical plans and physical plans for the purposes
of optimization. Optimizations that are data-agnostic, e.g., constant
propagation, common subexpression elimination (CSE) can be done on the
logical plan via a common library. Optimizations that depend on the
characteristics of the data, such as pruning the partitions during joins
need to be for each engine.


Personally I think the highest priority is to have a story about schemas in
StormSQL / SQE. JSON is flexible but different use cases expect different
formats. Avro, Thrifts are quite popular and offer different levels of
performances depending on the use cases. Today StormSQL kind of forces the
schema of the original data, but once this limitation is gone and having a
standardize way to specify schemas will greatly help the adoption.

Thanks,
Haohui

On Mon, Oct 10, 2016 at 3:27 PM Jungtaek Lim <ka...@gmail.com> wrote:

> One more, Flink seems to made lots of progress on SQL (still in
> experimenting), but they set more limited features to streaming SQL.
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table_api.html#limitations-1
> SELECT, FROM, WHERE, and UNION clauses are all that Flink supports for
> streaming SQL. I guess they seem to wait for "Streaming SQL" specification
> to be stable, since they don't have micro-batch so window in "Streaming
> SQL" is mandatory to implement aggregation, join, etc.
>
> - Jungtaek Lim (HeartSaVioR)
>
> 2016년 10월 11일 (화) 오전 6:36, Jungtaek Lim <ka...@gmail.com>님이 작성:
>
> > Hi Morrigan,
> >
> > since we're making SQL feature, we need to respect SQL standard,
> behavior,
> > and common sense, instead of changing its behavior to meet Storm's
> > streaming fashion. Some limitations are actually unavoidable.
> > And I didn't deeply think about "Streaming SQL" yet, since there're still
> > lots of works (small and large) left for micro-batch query. So I couldn't
> > answer some questions clearly about "streaming query" and I might be
> wrong.
> > "Streaming SQL" is even still in progress in Calcite project, and may be
> > subject to change.
> >
> > Answered inline:
> >
> >
> >
> >
> >
> >
> > *1) From reading the Calcite documentation, a monotonic expression
> > isrequired in GROUP BYs of streaming queries. The most common use case
> > isprobably some sort of date/time/timestamp field. This seems
> > extremelylimiting to me, especially when streaming off of Kafka where
> data
> > may notbe strictly ordered. How is this handled? Is late data not allowed
> > at all?(quasi-monotonicity is allowed, but still doesn't solve late data
> > issue)*
> >
> > > First of all, I didn't deeply think about "streaming query" yet, since
> > there're some works (small and large) left for micro-batch query.
> > In streaming query, aggregation should be done within window. As you said
> > there're late data and out of order issue, and watermark is for trying
> > best-effort to handle this. It may not solve late data issue completely,
> > and users need to know about that. Beam and Flink strongly publicize
> about
> > this, since it's their most strong point.
> >
> > Please read these links once if you haven't read these. I admit that I
> > couldn't understand these clearly so willing to revisit if I need to.
> > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
> > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
> >
> >
> > *2) Monotonicty becomes even more complicated when your dataset
> > ispartitioned, again like Kafka. How is this handled?*
> >
> > > I think same rules are applied to partitioned dataset. It just tries
> its
> > best to wait for late tuples, and drop late tuples if they're not
> arriving
> > before window is discarded.
> >
> >
> > *3) Does Calcite have any understanding of partitioned/sharded streams
> > atall?*
> >
> > > I'm not sure about this, but I guess we can include metadata to Storm's
> > own logical relation if needed, and can utilize while constructing
> physical
> > plan. We didn't reach to have Storm's own logical relation, and it should
> > take time since it requires deep understanding of Calcite concepts. If
> > someone can help on this area that should be great.
> >
> >
> >
> > *4) You say here that "aggregation is limited to micro-batch so result
> > isnon deterministic." What do you mean here? Trident supports
> > statefulaggregations, not just within the micro-batch.*
> >
> > > It requires every aggregate functions to store its column to State
> > independently (please correct me if I'm wrong) whereas we're handling
> 'row'
> > and expect to insert all columns in row together. Yes it's more limited
> but
> > for me it's closer to origin SQL's behavior. I would like to see more
> > opinions around this.
> > Btw, when we change Storm SQL to use windowing rather than micro-batch,
> > aggregation should be done within window, which is more clear to the
> users,
> > and more powerful (for example, 5 min average, 1 min sum, etc.) instead
> of
> > aggregating all time window.
> >
> >
> > *5) In your technical analysis, you say that "flexible data sources" are
> > apro for Storm SQL, as opposed to SQE. What do you mean?*
> >
> > > You can see that all the things for datasource are abstracted in
> > storm-sql-core, and datasource specific configurations are all handled
> from
> > its own. storm-sql-core doesn't even know the existence of
> storm-sql-kafka,
> > and also upcoming datasources, and in fact it should.
> >
> >
> > *The work on another high level API is also interesting. Is this meant
> > tocompletely replace Trident?*
> >
> > > I can't clearly say about that, but IMO, when higher-level API can
> > ensure exactly-once eventually we don't have to rely on Trident.
> > Micro-batch semantics can be replaced with process time or ingest time
> > windowing.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> >
> > 2016년 10월 11일 (화) 오전 5:08, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> >
> > Thank you for this update Jungtaek. I've been meaning to do my own
> followup
> > to your technical analysis, but unfortunately had some Kafka work eat up
> a
> > lot of my time. I do have some immediate questions/thoughts, though:
> >
> > 1) From reading the Calcite documentation, a monotonic expression is
> > required in GROUP BYs of streaming queries. The most common use case is
> > probably some sort of date/time/timestamp field. This seems extremely
> > limiting to me, especially when streaming off of Kafka where data may not
> > be strictly ordered. How is this handled? Is late data not allowed at
> all?
> > (quasi-monotonicity is allowed, but still doesn't solve late data issue)
> > 2) Monotonicty becomes even more complicated when your dataset is
> > partitioned, again like Kafka. How is this handled?
> > 3) Does Calcite have any understanding of partitioned/sharded streams at
> > all?
> > 4) You say here that "aggregation is limited to micro-batch so result is
> > non deterministic." What do you mean here? Trident supports stateful
> > aggregations, not just within the micro-batch.
> > 5) In your technical analysis, you say that "flexible data sources" are a
> > pro for Storm SQL, as opposed to SQE. What do you mean?
> >
> > The work on another high level API is also interesting. Is this meant to
> > completely replace Trident?
> >
> >
> > On Fri, Oct 7, 2016 at 1:25 AM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> > > For preventing this discussion to be staled, I'd like to put a note how
> > > Storm SQL is going now.
> > > (including the change after my tech. analysis)
> > >
> > > 1. There're 6 pull requests opened regarding Storm SQL.
> > > https://github.com/apache/storm/pulls/HeartSaVioR
> > >
> > > 2. When STORM-2125 <https://github.com/apache/storm/pull/1714> is
> merged
> > > to
> > > master, Storm SQL can handle most of things which Calcite publishes to
> > SQL
> > > reference page. I wrote Storm SQL's own reference page
> > > <
> > https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6
> > > a314910437/docs/storm-sql-reference.md>
> > > and submitted it to another pull request (one of six pull requests)
> > >
> > > STORM-2125 is having +1 but waiting for Calcite 1.10.0 to be released
> in
> > > order to apply bugfix. (Yes, someone can state that it's a bit tightly
> > > coupled.) I'm expecting they can cut 1.10.0 RC1 in next week. If it
> > doesn't
> > > happen in next week, we can just stick with Calcite 1.9.0 and
> immediately
> > > upgrade 1.10.0 when it's available.
> > >
> > > 3. During adding tests I found some of bugs on Calcite side (left notes
> > for
> > > each test and also reference doc), which we can contribute back to
> > Calcite
> > > community. I think this would be a positive feedback loop between Storm
> > and
> > > Calcite project.
> > >
> > > There're still many rooms available to work on Storm SQL.
> > >
> > > 1. I haven't had much time to do now, but would like to learn Calcite
> and
> > > try to optimize Storm SQL via STORM-1446
> > > <https://issues.apache.org/jira/browse/STORM-1446>. If someone who
> > > understands Calcite well takes over and contributes this area I'd be
> much
> > > appreciated.
> > >
> > > 2. I think Trident seems not good backend API for Storm SQL for a long
> > > term.
> > > Trident doesn't support typed operation, join and aggregation is
> limited
> > to
> > > micro-batch so result is non deterministic. I'm waiting for
> higher-level
> > > API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is
> the
> > > start) and plan to move the backend API.
> > >
> > > 3. Storm SQL needs to have more connectors as data sources. It isn't
> hard
> > > to work on, but a bit time-consuming.
> > >
> > > 4. Storm SQL needs to have time to stabilize by experiment loop:
> > > experimenting from early adopters, reporting, and fixing bugs.
> > >
> > > 5. Long term work: we can expand supporting features on Storm SQL. Join
> > > between streaming data source and normal table should help to enrich or
> > > filter data with other external data source. There's also continuous
> > effort
> > > from Calcite community: 'Streaming SQL' led by Julian. etc.
> > >
> > > Looking forward to continue discussion with JW Player folks.
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > ps.
> > > I'd really like to make revamped Storm SQL available to the community.
> > > Do we think merge process can pause this effort? I hope not, but I can
> > > follow the community's decision.
> > >
> > > 2016년 9월 30일 (금) 오전 3:58, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> > >
> > > FYI, I’ve merged the SQE code and documentation into the sqe_import
> > branch:
> > >
> > > https://github.com/apache/storm/tree/sqe_import
> > >
> > > Note the build will fail on the last component (storm-sqe) due to the
> > > compilation issues mentioned earlier.
> > >
> > > -Taylor
> > >
> > >
> > > On Sep 29, 2016, at 11:13 AM, Bobby Evans <evans@yahoo-inc.com.INVALID
> >
> > > wrote:
> > >
> > > Agreed, or if we can find a way to not break compatibility (like
> perhaps
> > > having a common base class for most of the logic and one subclass that
> > uses
> > > a string while another that uses the byte array).   - Bobby
> > >
> > >    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <
> > > kabhwan@gmail.com>
> > > wrote:
> > >
> > >
> > > The change on storm-redis seems to require breaking backward
> > compatibility,
> > > so I would love to see another pull request to integrate it via general
> > > review process.
> > > If it can be integrated to SQE without changing storm-redis, that would
> > be
> > > nice.
> > >
> > > Does it make sense?
> > >
> > > - Jungtaek Lim (HeartSaVioR)
> > >
> > > 2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> > >
> > > The changes are available in GitHub, I just overlooked it. And they’re
> > > actually neatly contained in two commits:
> > >
> > > The changes to storm-kafka can be found here:
> > >
> > >
> > >
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > > 836104755a
> > >
> > >
> > > The changes to storm-redis are here:
> > >
> > >
> > >
> https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a7
> > > 05fb2928b8
> > > <
> > >
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > > 836104755a
> > > >
> > >
> > >
> > > The changes to storm-kafka are straightforward and implemented in such
> a
> > > way that they would be useful for use cases outside of SQE. As the
> commit
> > > message states, it adds a new kafka deserialization scheme (FullScheme)
> > > that includes the key, value, topic, partition and offset when reading
> > from
> > > kafka, which is a feature I can see as being valuable for some use
> > cases. I
> > > would be +1 for merging that code.
> > >
> > > The changes to storm-redis are a little different, as Morrigan pointed
> > > out, because it only addresses the Trident API, but IMHO it looks like
> a
> > > good direction.
> > >
> > > @HeartSavior — Would you have some time to take a look at the
> storm-redis
> > > changes and provide your opinion, since you’re one of the original
> > authors
> > > of that code?
> > >
> > > -Taylor
> > >
> > >
> > > On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> > >
> > > Great!
> > >
> > > For storm-redis, we might need to modify key/value mapper to use byte[]
> > > rather than String.
> > > When I co-authored storm-redis, I forgot considering binary format of
> > > serde. If we want to address that part, we can also address it.
> > >
> > > 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> > >
> > > Sure, when I can. Storm-kafka should be pretty easy. The storm-redis
> one
> > > will require more work to make it more complete.
> > >
> > > On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> > > wrote:
> > >
> > > Thanks for the explanation Morrigan!
> > >
> > > Would you be willing to provide a pull request or patch so the
> community
> > > can review?
> > >
> > > It sounds like at least some of the changes you mention could be useful
> > >
> > > to
> > >
> > > the broader community (beyond the SQL effort).
> > >
> > > Thanks again,
> > >
> > > -Taylor
> > >
> > > On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
> > >
> > > wrote:
> > >
> > >
> > > storm-kafka - This is needed because storm-kafka does not provide a
> > >
> > > scheme
> > >
> > > class that gives you the key, value (payload), partition, and offset.
> > > MessageMetadataScheme.java comes comes closest, but is missing the key.
> > > This was a pretty simple change on my part.
> > >
> > > storm-redis - This is needed for proper support of Redis hashes. The
> > > existing storm-redis uses a static string (additionalKey in
> > > the RedisDataTypeDescription class) for the field name in hash types. I
> > > updated it to use a configurable KeyFactory for both the hash name and
> > >
> > > the
> > >
> > > field name. We also added some limited support for set types. This is
> > > admittedly the messiest between the two jars since we only cared about
> > >
> > > the
> > >
> > > trident states and would require a lot more changes to get storm-redis
> > >
> > > more
> > >
> > > "feature complete" overall.
> > >
> > >
> > >
> > >
> > > On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
> > >
> > > wrote:
> > >
> > >
> > > Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
> > >
> > > that
> > >
> > > direction. Otherwise I’ll come back with my findings and we can
> > >
> > > discuss
> > >
> > > it
> > >
> > > further.
> > >
> > > I notice there are jars in the git repo that we obviously can’t
> > >
> > > import.
> > >
> > > They look like they might be custom JWPlayer builds of storm-kafka and
> > > storm-redis.
> > >
> > > Morrigan — Do you know if there is any differences there that required
> > > custom builds of those components?
> > >
> > > -Taylor
> > >
> > > On Sep 26, 2016, at 3:31 PM, Bobby Evans
> > >
> > > <evans@yahoo-inc.com.INVALID
> > >
> > >
> > > wrote:
> > >
> > > Does it compile against 2.X?  If so I would prefer to have it go
> > >
> > > there,
> > >
> > > and then possibly 1.x if people what it there too. - Bobby
> > >
> > >
> > >   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> > >
> > > ptgoetz@gmail.com> wrote:
> > >
> > >
> > > The IP Clearance vote has passed and we are now able to import the
> > >
> > > SQE
> > >
> > > code.
> > >
> > >
> > > The question now is to where do we want to import the code?
> > >
> > > My inclination is to import it to “external” in the 1.x branch. It
> > >
> > > can
> > >
> > > be ported to other branches as necessary/if desired. An alternative
> > >
> > > would
> > >
> > > be to treat it as a feature branch, but I’d rather take the former
> > >
> > > approach.
> > >
> > >
> > > Thought/opinions?
> > >
> > > -Taylor
> > >
> > > On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
> > >
> > > wrote:
> > >
> > >
> > > My apologies. I meant to cc dev@, but didn't. Will forward in a
> > >
> > > bit...
> > >
> > >
> > > The vote (lazy consensus) is underway on general@incubator, and
> > >
> > > will
> > >
> > > close in less than 72 hours. After that the code can be merged.
> > >
> > >
> > > -Taylor
> > >
> > > On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
> > >
> > > wrote:
> > >
> > >
> > > Hi dev,
> > >
> > > While code contribution of SQE is in progress, I would like to
> > >
> > > continue
> > >
> > > discussion how to merge SQE and Storm SQL.
> > >
> > > I did an analysis of merging SQE and Storm SQL in both side,
> > >
> > > integrating
> > >
> > > SQE to Storm SQL and vice versa.
> > > https://cwiki.apache.org/confluence/display/STORM/
> > >
> > > Technical+analysis+of+merging+SQE+and+Storm+SQL
> > >
> > >
> > > As I commented to that page, since I'm working on Storm SQL for
> > >
> > > some
> > >
> > > weeks
> > >
> > > I can be (heavily) biased. So I'd really appreciated if someone can
> > >
> > > do
> > >
> > > another analysis.
> > >
> > > Please feel free to share your thought about this analysis, or
> > >
> > > another
> > >
> > > proposal if you have any, or other things about the merging.
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > >
> > >
> > > --
> > > Morrigan Jones
> > > Principal Engineer
> > > *JW*PLAYER  |  Your Way to Play
> > > morrigan@jwplayer.com  |  jwplayer.com
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Morrigan Jones
> > > Principal Engineer
> > > *JW*PLAYER  |  Your Way to Play
> > > morrigan@jwplayer.com  |  jwplayer.com
> > >
> >
> >
> >
> > --
> > Morrigan Jones
> > Principal Engineer
> > *JW*PLAYER  |  Your Way to Play
> > morrigan@jwplayer.com  |  jwplayer.com
> >
> >
>

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Jungtaek Lim <ka...@gmail.com>.
One more, Flink seems to made lots of progress on SQL (still in
experimenting), but they set more limited features to streaming SQL.
https://ci.apache.org/projects/flink/flink-docs-master/dev/table_api.html#limitations-1
SELECT, FROM, WHERE, and UNION clauses are all that Flink supports for
streaming SQL. I guess they seem to wait for "Streaming SQL" specification
to be stable, since they don't have micro-batch so window in "Streaming
SQL" is mandatory to implement aggregation, join, etc.

- Jungtaek Lim (HeartSaVioR)

2016년 10월 11일 (화) 오전 6:36, Jungtaek Lim <ka...@gmail.com>님이 작성:

> Hi Morrigan,
>
> since we're making SQL feature, we need to respect SQL standard, behavior,
> and common sense, instead of changing its behavior to meet Storm's
> streaming fashion. Some limitations are actually unavoidable.
> And I didn't deeply think about "Streaming SQL" yet, since there're still
> lots of works (small and large) left for micro-batch query. So I couldn't
> answer some questions clearly about "streaming query" and I might be wrong.
> "Streaming SQL" is even still in progress in Calcite project, and may be
> subject to change.
>
> Answered inline:
>
>
>
>
>
>
> *1) From reading the Calcite documentation, a monotonic expression
> isrequired in GROUP BYs of streaming queries. The most common use case
> isprobably some sort of date/time/timestamp field. This seems
> extremelylimiting to me, especially when streaming off of Kafka where data
> may notbe strictly ordered. How is this handled? Is late data not allowed
> at all?(quasi-monotonicity is allowed, but still doesn't solve late data
> issue)*
>
> > First of all, I didn't deeply think about "streaming query" yet, since
> there're some works (small and large) left for micro-batch query.
> In streaming query, aggregation should be done within window. As you said
> there're late data and out of order issue, and watermark is for trying
> best-effort to handle this. It may not solve late data issue completely,
> and users need to know about that. Beam and Flink strongly publicize about
> this, since it's their most strong point.
>
> Please read these links once if you haven't read these. I admit that I
> couldn't understand these clearly so willing to revisit if I need to.
> https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
> https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
>
>
> *2) Monotonicty becomes even more complicated when your dataset
> ispartitioned, again like Kafka. How is this handled?*
>
> > I think same rules are applied to partitioned dataset. It just tries its
> best to wait for late tuples, and drop late tuples if they're not arriving
> before window is discarded.
>
>
> *3) Does Calcite have any understanding of partitioned/sharded streams
> atall?*
>
> > I'm not sure about this, but I guess we can include metadata to Storm's
> own logical relation if needed, and can utilize while constructing physical
> plan. We didn't reach to have Storm's own logical relation, and it should
> take time since it requires deep understanding of Calcite concepts. If
> someone can help on this area that should be great.
>
>
>
> *4) You say here that "aggregation is limited to micro-batch so result
> isnon deterministic." What do you mean here? Trident supports
> statefulaggregations, not just within the micro-batch.*
>
> > It requires every aggregate functions to store its column to State
> independently (please correct me if I'm wrong) whereas we're handling 'row'
> and expect to insert all columns in row together. Yes it's more limited but
> for me it's closer to origin SQL's behavior. I would like to see more
> opinions around this.
> Btw, when we change Storm SQL to use windowing rather than micro-batch,
> aggregation should be done within window, which is more clear to the users,
> and more powerful (for example, 5 min average, 1 min sum, etc.) instead of
> aggregating all time window.
>
>
> *5) In your technical analysis, you say that "flexible data sources" are
> apro for Storm SQL, as opposed to SQE. What do you mean?*
>
> > You can see that all the things for datasource are abstracted in
> storm-sql-core, and datasource specific configurations are all handled from
> its own. storm-sql-core doesn't even know the existence of storm-sql-kafka,
> and also upcoming datasources, and in fact it should.
>
>
> *The work on another high level API is also interesting. Is this meant
> tocompletely replace Trident?*
>
> > I can't clearly say about that, but IMO, when higher-level API can
> ensure exactly-once eventually we don't have to rely on Trident.
> Micro-batch semantics can be replaced with process time or ingest time
> windowing.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> 2016년 10월 11일 (화) 오전 5:08, Morrigan Jones <mo...@jwplayer.com>님이 작성:
>
> Thank you for this update Jungtaek. I've been meaning to do my own followup
> to your technical analysis, but unfortunately had some Kafka work eat up a
> lot of my time. I do have some immediate questions/thoughts, though:
>
> 1) From reading the Calcite documentation, a monotonic expression is
> required in GROUP BYs of streaming queries. The most common use case is
> probably some sort of date/time/timestamp field. This seems extremely
> limiting to me, especially when streaming off of Kafka where data may not
> be strictly ordered. How is this handled? Is late data not allowed at all?
> (quasi-monotonicity is allowed, but still doesn't solve late data issue)
> 2) Monotonicty becomes even more complicated when your dataset is
> partitioned, again like Kafka. How is this handled?
> 3) Does Calcite have any understanding of partitioned/sharded streams at
> all?
> 4) You say here that "aggregation is limited to micro-batch so result is
> non deterministic." What do you mean here? Trident supports stateful
> aggregations, not just within the micro-batch.
> 5) In your technical analysis, you say that "flexible data sources" are a
> pro for Storm SQL, as opposed to SQE. What do you mean?
>
> The work on another high level API is also interesting. Is this meant to
> completely replace Trident?
>
>
> On Fri, Oct 7, 2016 at 1:25 AM, Jungtaek Lim <ka...@gmail.com> wrote:
>
> > For preventing this discussion to be staled, I'd like to put a note how
> > Storm SQL is going now.
> > (including the change after my tech. analysis)
> >
> > 1. There're 6 pull requests opened regarding Storm SQL.
> > https://github.com/apache/storm/pulls/HeartSaVioR
> >
> > 2. When STORM-2125 <https://github.com/apache/storm/pull/1714> is merged
> > to
> > master, Storm SQL can handle most of things which Calcite publishes to
> SQL
> > reference page. I wrote Storm SQL's own reference page
> > <
> https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6
> > a314910437/docs/storm-sql-reference.md>
> > and submitted it to another pull request (one of six pull requests)
> >
> > STORM-2125 is having +1 but waiting for Calcite 1.10.0 to be released in
> > order to apply bugfix. (Yes, someone can state that it's a bit tightly
> > coupled.) I'm expecting they can cut 1.10.0 RC1 in next week. If it
> doesn't
> > happen in next week, we can just stick with Calcite 1.9.0 and immediately
> > upgrade 1.10.0 when it's available.
> >
> > 3. During adding tests I found some of bugs on Calcite side (left notes
> for
> > each test and also reference doc), which we can contribute back to
> Calcite
> > community. I think this would be a positive feedback loop between Storm
> and
> > Calcite project.
> >
> > There're still many rooms available to work on Storm SQL.
> >
> > 1. I haven't had much time to do now, but would like to learn Calcite and
> > try to optimize Storm SQL via STORM-1446
> > <https://issues.apache.org/jira/browse/STORM-1446>. If someone who
> > understands Calcite well takes over and contributes this area I'd be much
> > appreciated.
> >
> > 2. I think Trident seems not good backend API for Storm SQL for a long
> > term.
> > Trident doesn't support typed operation, join and aggregation is limited
> to
> > micro-batch so result is non deterministic. I'm waiting for higher-level
> > API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is the
> > start) and plan to move the backend API.
> >
> > 3. Storm SQL needs to have more connectors as data sources. It isn't hard
> > to work on, but a bit time-consuming.
> >
> > 4. Storm SQL needs to have time to stabilize by experiment loop:
> > experimenting from early adopters, reporting, and fixing bugs.
> >
> > 5. Long term work: we can expand supporting features on Storm SQL. Join
> > between streaming data source and normal table should help to enrich or
> > filter data with other external data source. There's also continuous
> effort
> > from Calcite community: 'Streaming SQL' led by Julian. etc.
> >
> > Looking forward to continue discussion with JW Player folks.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > ps.
> > I'd really like to make revamped Storm SQL available to the community.
> > Do we think merge process can pause this effort? I hope not, but I can
> > follow the community's decision.
> >
> > 2016년 9월 30일 (금) 오전 3:58, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> >
> > FYI, I’ve merged the SQE code and documentation into the sqe_import
> branch:
> >
> > https://github.com/apache/storm/tree/sqe_import
> >
> > Note the build will fail on the last component (storm-sqe) due to the
> > compilation issues mentioned earlier.
> >
> > -Taylor
> >
> >
> > On Sep 29, 2016, at 11:13 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID>
> > wrote:
> >
> > Agreed, or if we can find a way to not break compatibility (like perhaps
> > having a common base class for most of the logic and one subclass that
> uses
> > a string while another that uses the byte array).   - Bobby
> >
> >    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <
> > kabhwan@gmail.com>
> > wrote:
> >
> >
> > The change on storm-redis seems to require breaking backward
> compatibility,
> > so I would love to see another pull request to integrate it via general
> > review process.
> > If it can be integrated to SQE without changing storm-redis, that would
> be
> > nice.
> >
> > Does it make sense?
> >
> > - Jungtaek Lim (HeartSaVioR)
> >
> > 2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> >
> > The changes are available in GitHub, I just overlooked it. And they’re
> > actually neatly contained in two commits:
> >
> > The changes to storm-kafka can be found here:
> >
> >
> > https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > 836104755a
> >
> >
> > The changes to storm-redis are here:
> >
> >
> > https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a7
> > 05fb2928b8
> > <
> > https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > 836104755a
> > >
> >
> >
> > The changes to storm-kafka are straightforward and implemented in such a
> > way that they would be useful for use cases outside of SQE. As the commit
> > message states, it adds a new kafka deserialization scheme (FullScheme)
> > that includes the key, value, topic, partition and offset when reading
> from
> > kafka, which is a feature I can see as being valuable for some use
> cases. I
> > would be +1 for merging that code.
> >
> > The changes to storm-redis are a little different, as Morrigan pointed
> > out, because it only addresses the Trident API, but IMHO it looks like a
> > good direction.
> >
> > @HeartSavior — Would you have some time to take a look at the storm-redis
> > changes and provide your opinion, since you’re one of the original
> authors
> > of that code?
> >
> > -Taylor
> >
> >
> > On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> > Great!
> >
> > For storm-redis, we might need to modify key/value mapper to use byte[]
> > rather than String.
> > When I co-authored storm-redis, I forgot considering binary format of
> > serde. If we want to address that part, we can also address it.
> >
> > 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> >
> > Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
> > will require more work to make it more complete.
> >
> > On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> > wrote:
> >
> > Thanks for the explanation Morrigan!
> >
> > Would you be willing to provide a pull request or patch so the community
> > can review?
> >
> > It sounds like at least some of the changes you mention could be useful
> >
> > to
> >
> > the broader community (beyond the SQL effort).
> >
> > Thanks again,
> >
> > -Taylor
> >
> > On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
> >
> > wrote:
> >
> >
> > storm-kafka - This is needed because storm-kafka does not provide a
> >
> > scheme
> >
> > class that gives you the key, value (payload), partition, and offset.
> > MessageMetadataScheme.java comes comes closest, but is missing the key.
> > This was a pretty simple change on my part.
> >
> > storm-redis - This is needed for proper support of Redis hashes. The
> > existing storm-redis uses a static string (additionalKey in
> > the RedisDataTypeDescription class) for the field name in hash types. I
> > updated it to use a configurable KeyFactory for both the hash name and
> >
> > the
> >
> > field name. We also added some limited support for set types. This is
> > admittedly the messiest between the two jars since we only cared about
> >
> > the
> >
> > trident states and would require a lot more changes to get storm-redis
> >
> > more
> >
> > "feature complete" overall.
> >
> >
> >
> >
> > On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
> >
> > wrote:
> >
> >
> > Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
> >
> > that
> >
> > direction. Otherwise I’ll come back with my findings and we can
> >
> > discuss
> >
> > it
> >
> > further.
> >
> > I notice there are jars in the git repo that we obviously can’t
> >
> > import.
> >
> > They look like they might be custom JWPlayer builds of storm-kafka and
> > storm-redis.
> >
> > Morrigan — Do you know if there is any differences there that required
> > custom builds of those components?
> >
> > -Taylor
> >
> > On Sep 26, 2016, at 3:31 PM, Bobby Evans
> >
> > <evans@yahoo-inc.com.INVALID
> >
> >
> > wrote:
> >
> > Does it compile against 2.X?  If so I would prefer to have it go
> >
> > there,
> >
> > and then possibly 1.x if people what it there too. - Bobby
> >
> >
> >   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> >
> > ptgoetz@gmail.com> wrote:
> >
> >
> > The IP Clearance vote has passed and we are now able to import the
> >
> > SQE
> >
> > code.
> >
> >
> > The question now is to where do we want to import the code?
> >
> > My inclination is to import it to “external” in the 1.x branch. It
> >
> > can
> >
> > be ported to other branches as necessary/if desired. An alternative
> >
> > would
> >
> > be to treat it as a feature branch, but I’d rather take the former
> >
> > approach.
> >
> >
> > Thought/opinions?
> >
> > -Taylor
> >
> > On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
> >
> > wrote:
> >
> >
> > My apologies. I meant to cc dev@, but didn't. Will forward in a
> >
> > bit...
> >
> >
> > The vote (lazy consensus) is underway on general@incubator, and
> >
> > will
> >
> > close in less than 72 hours. After that the code can be merged.
> >
> >
> > -Taylor
> >
> > On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
> >
> > wrote:
> >
> >
> > Hi dev,
> >
> > While code contribution of SQE is in progress, I would like to
> >
> > continue
> >
> > discussion how to merge SQE and Storm SQL.
> >
> > I did an analysis of merging SQE and Storm SQL in both side,
> >
> > integrating
> >
> > SQE to Storm SQL and vice versa.
> > https://cwiki.apache.org/confluence/display/STORM/
> >
> > Technical+analysis+of+merging+SQE+and+Storm+SQL
> >
> >
> > As I commented to that page, since I'm working on Storm SQL for
> >
> > some
> >
> > weeks
> >
> > I can be (heavily) biased. So I'd really appreciated if someone can
> >
> > do
> >
> > another analysis.
> >
> > Please feel free to share your thought about this analysis, or
> >
> > another
> >
> > proposal if you have any, or other things about the merging.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> >
> >
> > --
> > Morrigan Jones
> > Principal Engineer
> > *JW*PLAYER  |  Your Way to Play
> > morrigan@jwplayer.com  |  jwplayer.com
> >
> >
> >
> >
> >
> > --
> > Morrigan Jones
> > Principal Engineer
> > *JW*PLAYER  |  Your Way to Play
> > morrigan@jwplayer.com  |  jwplayer.com
> >
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>
>

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Jungtaek Lim <ka...@gmail.com>.
Hi Morrigan,

since we're making SQL feature, we need to respect SQL standard, behavior,
and common sense, instead of changing its behavior to meet Storm's
streaming fashion. Some limitations are actually unavoidable.
And I didn't deeply think about "Streaming SQL" yet, since there're still
lots of works (small and large) left for micro-batch query. So I couldn't
answer some questions clearly about "streaming query" and I might be wrong.
"Streaming SQL" is even still in progress in Calcite project, and may be
subject to change.

Answered inline:






*1) From reading the Calcite documentation, a monotonic expression
isrequired in GROUP BYs of streaming queries. The most common use case
isprobably some sort of date/time/timestamp field. This seems
extremelylimiting to me, especially when streaming off of Kafka where data
may notbe strictly ordered. How is this handled? Is late data not allowed
at all?(quasi-monotonicity is allowed, but still doesn't solve late data
issue)*

> First of all, I didn't deeply think about "streaming query" yet, since
there're some works (small and large) left for micro-batch query.
In streaming query, aggregation should be done within window. As you said
there're late data and out of order issue, and watermark is for trying
best-effort to handle this. It may not solve late data issue completely,
and users need to know about that. Beam and Flink strongly publicize about
this, since it's their most strong point.

Please read these links once if you haven't read these. I admit that I
couldn't understand these clearly so willing to revisit if I need to.
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102


*2) Monotonicty becomes even more complicated when your dataset
ispartitioned, again like Kafka. How is this handled?*

> I think same rules are applied to partitioned dataset. It just tries its
best to wait for late tuples, and drop late tuples if they're not arriving
before window is discarded.


*3) Does Calcite have any understanding of partitioned/sharded streams
atall?*

> I'm not sure about this, but I guess we can include metadata to Storm's
own logical relation if needed, and can utilize while constructing physical
plan. We didn't reach to have Storm's own logical relation, and it should
take time since it requires deep understanding of Calcite concepts. If
someone can help on this area that should be great.



*4) You say here that "aggregation is limited to micro-batch so result
isnon deterministic." What do you mean here? Trident supports
statefulaggregations, not just within the micro-batch.*

> It requires every aggregate functions to store its column to State
independently (please correct me if I'm wrong) whereas we're handling 'row'
and expect to insert all columns in row together. Yes it's more limited but
for me it's closer to origin SQL's behavior. I would like to see more
opinions around this.
Btw, when we change Storm SQL to use windowing rather than micro-batch,
aggregation should be done within window, which is more clear to the users,
and more powerful (for example, 5 min average, 1 min sum, etc.) instead of
aggregating all time window.


*5) In your technical analysis, you say that "flexible data sources" are
apro for Storm SQL, as opposed to SQE. What do you mean?*

> You can see that all the things for datasource are abstracted in
storm-sql-core, and datasource specific configurations are all handled from
its own. storm-sql-core doesn't even know the existence of storm-sql-kafka,
and also upcoming datasources, and in fact it should.


*The work on another high level API is also interesting. Is this meant
tocompletely replace Trident?*

> I can't clearly say about that, but IMO, when higher-level API can ensure
exactly-once eventually we don't have to rely on Trident. Micro-batch
semantics can be replaced with process time or ingest time windowing.

Thanks,
Jungtaek Lim (HeartSaVioR)


2016년 10월 11일 (화) 오전 5:08, Morrigan Jones <mo...@jwplayer.com>님이 작성:

> Thank you for this update Jungtaek. I've been meaning to do my own followup
> to your technical analysis, but unfortunately had some Kafka work eat up a
> lot of my time. I do have some immediate questions/thoughts, though:
>
> 1) From reading the Calcite documentation, a monotonic expression is
> required in GROUP BYs of streaming queries. The most common use case is
> probably some sort of date/time/timestamp field. This seems extremely
> limiting to me, especially when streaming off of Kafka where data may not
> be strictly ordered. How is this handled? Is late data not allowed at all?
> (quasi-monotonicity is allowed, but still doesn't solve late data issue)
> 2) Monotonicty becomes even more complicated when your dataset is
> partitioned, again like Kafka. How is this handled?
> 3) Does Calcite have any understanding of partitioned/sharded streams at
> all?
> 4) You say here that "aggregation is limited to micro-batch so result is
> non deterministic." What do you mean here? Trident supports stateful
> aggregations, not just within the micro-batch.
> 5) In your technical analysis, you say that "flexible data sources" are a
> pro for Storm SQL, as opposed to SQE. What do you mean?
>
> The work on another high level API is also interesting. Is this meant to
> completely replace Trident?
>
>
> On Fri, Oct 7, 2016 at 1:25 AM, Jungtaek Lim <ka...@gmail.com> wrote:
>
> > For preventing this discussion to be staled, I'd like to put a note how
> > Storm SQL is going now.
> > (including the change after my tech. analysis)
> >
> > 1. There're 6 pull requests opened regarding Storm SQL.
> > https://github.com/apache/storm/pulls/HeartSaVioR
> >
> > 2. When STORM-2125 <https://github.com/apache/storm/pull/1714> is merged
> > to
> > master, Storm SQL can handle most of things which Calcite publishes to
> SQL
> > reference page. I wrote Storm SQL's own reference page
> > <
> https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6
> > a314910437/docs/storm-sql-reference.md>
> > and submitted it to another pull request (one of six pull requests)
> >
> > STORM-2125 is having +1 but waiting for Calcite 1.10.0 to be released in
> > order to apply bugfix. (Yes, someone can state that it's a bit tightly
> > coupled.) I'm expecting they can cut 1.10.0 RC1 in next week. If it
> doesn't
> > happen in next week, we can just stick with Calcite 1.9.0 and immediately
> > upgrade 1.10.0 when it's available.
> >
> > 3. During adding tests I found some of bugs on Calcite side (left notes
> for
> > each test and also reference doc), which we can contribute back to
> Calcite
> > community. I think this would be a positive feedback loop between Storm
> and
> > Calcite project.
> >
> > There're still many rooms available to work on Storm SQL.
> >
> > 1. I haven't had much time to do now, but would like to learn Calcite and
> > try to optimize Storm SQL via STORM-1446
> > <https://issues.apache.org/jira/browse/STORM-1446>. If someone who
> > understands Calcite well takes over and contributes this area I'd be much
> > appreciated.
> >
> > 2. I think Trident seems not good backend API for Storm SQL for a long
> > term.
> > Trident doesn't support typed operation, join and aggregation is limited
> to
> > micro-batch so result is non deterministic. I'm waiting for higher-level
> > API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is the
> > start) and plan to move the backend API.
> >
> > 3. Storm SQL needs to have more connectors as data sources. It isn't hard
> > to work on, but a bit time-consuming.
> >
> > 4. Storm SQL needs to have time to stabilize by experiment loop:
> > experimenting from early adopters, reporting, and fixing bugs.
> >
> > 5. Long term work: we can expand supporting features on Storm SQL. Join
> > between streaming data source and normal table should help to enrich or
> > filter data with other external data source. There's also continuous
> effort
> > from Calcite community: 'Streaming SQL' led by Julian. etc.
> >
> > Looking forward to continue discussion with JW Player folks.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > ps.
> > I'd really like to make revamped Storm SQL available to the community.
> > Do we think merge process can pause this effort? I hope not, but I can
> > follow the community's decision.
> >
> > 2016년 9월 30일 (금) 오전 3:58, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> >
> > FYI, I’ve merged the SQE code and documentation into the sqe_import
> branch:
> >
> > https://github.com/apache/storm/tree/sqe_import
> >
> > Note the build will fail on the last component (storm-sqe) due to the
> > compilation issues mentioned earlier.
> >
> > -Taylor
> >
> >
> > On Sep 29, 2016, at 11:13 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID>
> > wrote:
> >
> > Agreed, or if we can find a way to not break compatibility (like perhaps
> > having a common base class for most of the logic and one subclass that
> uses
> > a string while another that uses the byte array).   - Bobby
> >
> >    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <
> > kabhwan@gmail.com>
> > wrote:
> >
> >
> > The change on storm-redis seems to require breaking backward
> compatibility,
> > so I would love to see another pull request to integrate it via general
> > review process.
> > If it can be integrated to SQE without changing storm-redis, that would
> be
> > nice.
> >
> > Does it make sense?
> >
> > - Jungtaek Lim (HeartSaVioR)
> >
> > 2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> >
> > The changes are available in GitHub, I just overlooked it. And they’re
> > actually neatly contained in two commits:
> >
> > The changes to storm-kafka can be found here:
> >
> >
> > https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > 836104755a
> >
> >
> > The changes to storm-redis are here:
> >
> >
> > https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a7
> > 05fb2928b8
> > <
> > https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> > 836104755a
> > >
> >
> >
> > The changes to storm-kafka are straightforward and implemented in such a
> > way that they would be useful for use cases outside of SQE. As the commit
> > message states, it adds a new kafka deserialization scheme (FullScheme)
> > that includes the key, value, topic, partition and offset when reading
> from
> > kafka, which is a feature I can see as being valuable for some use
> cases. I
> > would be +1 for merging that code.
> >
> > The changes to storm-redis are a little different, as Morrigan pointed
> > out, because it only addresses the Trident API, but IMHO it looks like a
> > good direction.
> >
> > @HeartSavior — Would you have some time to take a look at the storm-redis
> > changes and provide your opinion, since you’re one of the original
> authors
> > of that code?
> >
> > -Taylor
> >
> >
> > On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> > Great!
> >
> > For storm-redis, we might need to modify key/value mapper to use byte[]
> > rather than String.
> > When I co-authored storm-redis, I forgot considering binary format of
> > serde. If we want to address that part, we can also address it.
> >
> > 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> >
> > Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
> > will require more work to make it more complete.
> >
> > On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> > wrote:
> >
> > Thanks for the explanation Morrigan!
> >
> > Would you be willing to provide a pull request or patch so the community
> > can review?
> >
> > It sounds like at least some of the changes you mention could be useful
> >
> > to
> >
> > the broader community (beyond the SQL effort).
> >
> > Thanks again,
> >
> > -Taylor
> >
> > On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
> >
> > wrote:
> >
> >
> > storm-kafka - This is needed because storm-kafka does not provide a
> >
> > scheme
> >
> > class that gives you the key, value (payload), partition, and offset.
> > MessageMetadataScheme.java comes comes closest, but is missing the key.
> > This was a pretty simple change on my part.
> >
> > storm-redis - This is needed for proper support of Redis hashes. The
> > existing storm-redis uses a static string (additionalKey in
> > the RedisDataTypeDescription class) for the field name in hash types. I
> > updated it to use a configurable KeyFactory for both the hash name and
> >
> > the
> >
> > field name. We also added some limited support for set types. This is
> > admittedly the messiest between the two jars since we only cared about
> >
> > the
> >
> > trident states and would require a lot more changes to get storm-redis
> >
> > more
> >
> > "feature complete" overall.
> >
> >
> >
> >
> > On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
> >
> > wrote:
> >
> >
> > Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
> >
> > that
> >
> > direction. Otherwise I’ll come back with my findings and we can
> >
> > discuss
> >
> > it
> >
> > further.
> >
> > I notice there are jars in the git repo that we obviously can’t
> >
> > import.
> >
> > They look like they might be custom JWPlayer builds of storm-kafka and
> > storm-redis.
> >
> > Morrigan — Do you know if there is any differences there that required
> > custom builds of those components?
> >
> > -Taylor
> >
> > On Sep 26, 2016, at 3:31 PM, Bobby Evans
> >
> > <evans@yahoo-inc.com.INVALID
> >
> >
> > wrote:
> >
> > Does it compile against 2.X?  If so I would prefer to have it go
> >
> > there,
> >
> > and then possibly 1.x if people what it there too. - Bobby
> >
> >
> >   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> >
> > ptgoetz@gmail.com> wrote:
> >
> >
> > The IP Clearance vote has passed and we are now able to import the
> >
> > SQE
> >
> > code.
> >
> >
> > The question now is to where do we want to import the code?
> >
> > My inclination is to import it to “external” in the 1.x branch. It
> >
> > can
> >
> > be ported to other branches as necessary/if desired. An alternative
> >
> > would
> >
> > be to treat it as a feature branch, but I’d rather take the former
> >
> > approach.
> >
> >
> > Thought/opinions?
> >
> > -Taylor
> >
> > On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
> >
> > wrote:
> >
> >
> > My apologies. I meant to cc dev@, but didn't. Will forward in a
> >
> > bit...
> >
> >
> > The vote (lazy consensus) is underway on general@incubator, and
> >
> > will
> >
> > close in less than 72 hours. After that the code can be merged.
> >
> >
> > -Taylor
> >
> > On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
> >
> > wrote:
> >
> >
> > Hi dev,
> >
> > While code contribution of SQE is in progress, I would like to
> >
> > continue
> >
> > discussion how to merge SQE and Storm SQL.
> >
> > I did an analysis of merging SQE and Storm SQL in both side,
> >
> > integrating
> >
> > SQE to Storm SQL and vice versa.
> > https://cwiki.apache.org/confluence/display/STORM/
> >
> > Technical+analysis+of+merging+SQE+and+Storm+SQL
> >
> >
> > As I commented to that page, since I'm working on Storm SQL for
> >
> > some
> >
> > weeks
> >
> > I can be (heavily) biased. So I'd really appreciated if someone can
> >
> > do
> >
> > another analysis.
> >
> > Please feel free to share your thought about this analysis, or
> >
> > another
> >
> > proposal if you have any, or other things about the merging.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> >
> >
> > --
> > Morrigan Jones
> > Principal Engineer
> > *JW*PLAYER  |  Your Way to Play
> > morrigan@jwplayer.com  |  jwplayer.com
> >
> >
> >
> >
> >
> > --
> > Morrigan Jones
> > Principal Engineer
> > *JW*PLAYER  |  Your Way to Play
> > morrigan@jwplayer.com  |  jwplayer.com
> >
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Morrigan Jones <mo...@jwplayer.com>.
Thank you for this update Jungtaek. I've been meaning to do my own followup
to your technical analysis, but unfortunately had some Kafka work eat up a
lot of my time. I do have some immediate questions/thoughts, though:

1) From reading the Calcite documentation, a monotonic expression is
required in GROUP BYs of streaming queries. The most common use case is
probably some sort of date/time/timestamp field. This seems extremely
limiting to me, especially when streaming off of Kafka where data may not
be strictly ordered. How is this handled? Is late data not allowed at all?
(quasi-monotonicity is allowed, but still doesn't solve late data issue)
2) Monotonicty becomes even more complicated when your dataset is
partitioned, again like Kafka. How is this handled?
3) Does Calcite have any understanding of partitioned/sharded streams at
all?
4) You say here that "aggregation is limited to micro-batch so result is
non deterministic." What do you mean here? Trident supports stateful
aggregations, not just within the micro-batch.
5) In your technical analysis, you say that "flexible data sources" are a
pro for Storm SQL, as opposed to SQE. What do you mean?

The work on another high level API is also interesting. Is this meant to
completely replace Trident?


On Fri, Oct 7, 2016 at 1:25 AM, Jungtaek Lim <ka...@gmail.com> wrote:

> For preventing this discussion to be staled, I'd like to put a note how
> Storm SQL is going now.
> (including the change after my tech. analysis)
>
> 1. There're 6 pull requests opened regarding Storm SQL.
> https://github.com/apache/storm/pulls/HeartSaVioR
>
> 2. When STORM-2125 <https://github.com/apache/storm/pull/1714> is merged
> to
> master, Storm SQL can handle most of things which Calcite publishes to SQL
> reference page. I wrote Storm SQL's own reference page
> <https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6
> a314910437/docs/storm-sql-reference.md>
> and submitted it to another pull request (one of six pull requests)
>
> STORM-2125 is having +1 but waiting for Calcite 1.10.0 to be released in
> order to apply bugfix. (Yes, someone can state that it's a bit tightly
> coupled.) I'm expecting they can cut 1.10.0 RC1 in next week. If it doesn't
> happen in next week, we can just stick with Calcite 1.9.0 and immediately
> upgrade 1.10.0 when it's available.
>
> 3. During adding tests I found some of bugs on Calcite side (left notes for
> each test and also reference doc), which we can contribute back to Calcite
> community. I think this would be a positive feedback loop between Storm and
> Calcite project.
>
> There're still many rooms available to work on Storm SQL.
>
> 1. I haven't had much time to do now, but would like to learn Calcite and
> try to optimize Storm SQL via STORM-1446
> <https://issues.apache.org/jira/browse/STORM-1446>. If someone who
> understands Calcite well takes over and contributes this area I'd be much
> appreciated.
>
> 2. I think Trident seems not good backend API for Storm SQL for a long
> term.
> Trident doesn't support typed operation, join and aggregation is limited to
> micro-batch so result is non deterministic. I'm waiting for higher-level
> API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is the
> start) and plan to move the backend API.
>
> 3. Storm SQL needs to have more connectors as data sources. It isn't hard
> to work on, but a bit time-consuming.
>
> 4. Storm SQL needs to have time to stabilize by experiment loop:
> experimenting from early adopters, reporting, and fixing bugs.
>
> 5. Long term work: we can expand supporting features on Storm SQL. Join
> between streaming data source and normal table should help to enrich or
> filter data with other external data source. There's also continuous effort
> from Calcite community: 'Streaming SQL' led by Julian. etc.
>
> Looking forward to continue discussion with JW Player folks.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> ps.
> I'd really like to make revamped Storm SQL available to the community.
> Do we think merge process can pause this effort? I hope not, but I can
> follow the community's decision.
>
> 2016년 9월 30일 (금) 오전 3:58, P. Taylor Goetz <pt...@gmail.com>님이 작성:
>
> FYI, I’ve merged the SQE code and documentation into the sqe_import branch:
>
> https://github.com/apache/storm/tree/sqe_import
>
> Note the build will fail on the last component (storm-sqe) due to the
> compilation issues mentioned earlier.
>
> -Taylor
>
>
> On Sep 29, 2016, at 11:13 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID>
> wrote:
>
> Agreed, or if we can find a way to not break compatibility (like perhaps
> having a common base class for most of the logic and one subclass that uses
> a string while another that uses the byte array).   - Bobby
>
>    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <
> kabhwan@gmail.com>
> wrote:
>
>
> The change on storm-redis seems to require breaking backward compatibility,
> so I would love to see another pull request to integrate it via general
> review process.
> If it can be integrated to SQE without changing storm-redis, that would be
> nice.
>
> Does it make sense?
>
> - Jungtaek Lim (HeartSaVioR)
>
> 2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:
>
> The changes are available in GitHub, I just overlooked it. And they’re
> actually neatly contained in two commits:
>
> The changes to storm-kafka can be found here:
>
>
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> 836104755a
>
>
> The changes to storm-redis are here:
>
>
> https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a7
> 05fb2928b8
> <
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572
> 836104755a
> >
>
>
> The changes to storm-kafka are straightforward and implemented in such a
> way that they would be useful for use cases outside of SQE. As the commit
> message states, it adds a new kafka deserialization scheme (FullScheme)
> that includes the key, value, topic, partition and offset when reading from
> kafka, which is a feature I can see as being valuable for some use cases. I
> would be +1 for merging that code.
>
> The changes to storm-redis are a little different, as Morrigan pointed
> out, because it only addresses the Trident API, but IMHO it looks like a
> good direction.
>
> @HeartSavior — Would you have some time to take a look at the storm-redis
> changes and provide your opinion, since you’re one of the original authors
> of that code?
>
> -Taylor
>
>
> On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>
> Great!
>
> For storm-redis, we might need to modify key/value mapper to use byte[]
> rather than String.
> When I co-authored storm-redis, I forgot considering binary format of
> serde. If we want to address that part, we can also address it.
>
> 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
>
> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
> will require more work to make it more complete.
>
> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
>
> Thanks for the explanation Morrigan!
>
> Would you be willing to provide a pull request or patch so the community
> can review?
>
> It sounds like at least some of the changes you mention could be useful
>
> to
>
> the broader community (beyond the SQL effort).
>
> Thanks again,
>
> -Taylor
>
> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
>
> wrote:
>
>
> storm-kafka - This is needed because storm-kafka does not provide a
>
> scheme
>
> class that gives you the key, value (payload), partition, and offset.
> MessageMetadataScheme.java comes comes closest, but is missing the key.
> This was a pretty simple change on my part.
>
> storm-redis - This is needed for proper support of Redis hashes. The
> existing storm-redis uses a static string (additionalKey in
> the RedisDataTypeDescription class) for the field name in hash types. I
> updated it to use a configurable KeyFactory for both the hash name and
>
> the
>
> field name. We also added some limited support for set types. This is
> admittedly the messiest between the two jars since we only cared about
>
> the
>
> trident states and would require a lot more changes to get storm-redis
>
> more
>
> "feature complete" overall.
>
>
>
>
> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
>
> wrote:
>
>
> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
>
> that
>
> direction. Otherwise I’ll come back with my findings and we can
>
> discuss
>
> it
>
> further.
>
> I notice there are jars in the git repo that we obviously can’t
>
> import.
>
> They look like they might be custom JWPlayer builds of storm-kafka and
> storm-redis.
>
> Morrigan — Do you know if there is any differences there that required
> custom builds of those components?
>
> -Taylor
>
> On Sep 26, 2016, at 3:31 PM, Bobby Evans
>
> <evans@yahoo-inc.com.INVALID
>
>
> wrote:
>
> Does it compile against 2.X?  If so I would prefer to have it go
>
> there,
>
> and then possibly 1.x if people what it there too. - Bobby
>
>
>   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>
> ptgoetz@gmail.com> wrote:
>
>
> The IP Clearance vote has passed and we are now able to import the
>
> SQE
>
> code.
>
>
> The question now is to where do we want to import the code?
>
> My inclination is to import it to “external” in the 1.x branch. It
>
> can
>
> be ported to other branches as necessary/if desired. An alternative
>
> would
>
> be to treat it as a feature branch, but I’d rather take the former
>
> approach.
>
>
> Thought/opinions?
>
> -Taylor
>
> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
>
> wrote:
>
>
> My apologies. I meant to cc dev@, but didn't. Will forward in a
>
> bit...
>
>
> The vote (lazy consensus) is underway on general@incubator, and
>
> will
>
> close in less than 72 hours. After that the code can be merged.
>
>
> -Taylor
>
> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
>
> wrote:
>
>
> Hi dev,
>
> While code contribution of SQE is in progress, I would like to
>
> continue
>
> discussion how to merge SQE and Storm SQL.
>
> I did an analysis of merging SQE and Storm SQL in both side,
>
> integrating
>
> SQE to Storm SQL and vice versa.
> https://cwiki.apache.org/confluence/display/STORM/
>
> Technical+analysis+of+merging+SQE+and+Storm+SQL
>
>
> As I commented to that page, since I'm working on Storm SQL for
>
> some
>
> weeks
>
> I can be (heavily) biased. So I'd really appreciated if someone can
>
> do
>
> another analysis.
>
> Please feel free to share your thought about this analysis, or
>
> another
>
> proposal if you have any, or other things about the merging.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>
>
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>



-- 
Morrigan Jones
Principal Engineer
*JW*PLAYER  |  Your Way to Play
morrigan@jwplayer.com  |  jwplayer.com

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Jungtaek Lim <ka...@gmail.com>.
For preventing this discussion to be staled, I'd like to put a note how
Storm SQL is going now.
(including the change after my tech. analysis)

1. There're 6 pull requests opened regarding Storm SQL.
https://github.com/apache/storm/pulls/HeartSaVioR

2. When STORM-2125 <https://github.com/apache/storm/pull/1714> is merged to
master, Storm SQL can handle most of things which Calcite publishes to SQL
reference page. I wrote Storm SQL's own reference page
<https://github.com/HeartSaVioR/storm/blob/43478edbd7369047ccc417d6b81ab6a314910437/docs/storm-sql-reference.md>
and submitted it to another pull request (one of six pull requests)

STORM-2125 is having +1 but waiting for Calcite 1.10.0 to be released in
order to apply bugfix. (Yes, someone can state that it's a bit tightly
coupled.) I'm expecting they can cut 1.10.0 RC1 in next week. If it doesn't
happen in next week, we can just stick with Calcite 1.9.0 and immediately
upgrade 1.10.0 when it's available.

3. During adding tests I found some of bugs on Calcite side (left notes for
each test and also reference doc), which we can contribute back to Calcite
community. I think this would be a positive feedback loop between Storm and
Calcite project.

There're still many rooms available to work on Storm SQL.

1. I haven't had much time to do now, but would like to learn Calcite and
try to optimize Storm SQL via STORM-1446
<https://issues.apache.org/jira/browse/STORM-1446>. If someone who
understands Calcite well takes over and contributes this area I'd be much
appreciated.

2. I think Trident seems not good backend API for Storm SQL for a long
term.
Trident doesn't support typed operation, join and aggregation is limited to
micro-batch so result is non deterministic. I'm waiting for higher-level
API (STORM-1961 <http://issues.apache.org/jira/browse/STORM-1961> is the
start) and plan to move the backend API.

3. Storm SQL needs to have more connectors as data sources. It isn't hard
to work on, but a bit time-consuming.

4. Storm SQL needs to have time to stabilize by experiment loop:
experimenting from early adopters, reporting, and fixing bugs.

5. Long term work: we can expand supporting features on Storm SQL. Join
between streaming data source and normal table should help to enrich or
filter data with other external data source. There's also continuous effort
from Calcite community: 'Streaming SQL' led by Julian. etc.

Looking forward to continue discussion with JW Player folks.

Thanks,
Jungtaek Lim (HeartSaVioR)

ps.
I'd really like to make revamped Storm SQL available to the community.
Do we think merge process can pause this effort? I hope not, but I can
follow the community's decision.

2016년 9월 30일 (금) 오전 3:58, P. Taylor Goetz <pt...@gmail.com>님이 작성:

FYI, I’ve merged the SQE code and documentation into the sqe_import branch:

https://github.com/apache/storm/tree/sqe_import

Note the build will fail on the last component (storm-sqe) due to the
compilation issues mentioned earlier.

-Taylor


On Sep 29, 2016, at 11:13 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID>
wrote:

Agreed, or if we can find a way to not break compatibility (like perhaps
having a common base class for most of the logic and one subclass that uses
a string while another that uses the byte array).   - Bobby

   On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <ka...@gmail.com>
wrote:


The change on storm-redis seems to require breaking backward compatibility,
so I would love to see another pull request to integrate it via general
review process.
If it can be integrated to SQE without changing storm-redis, that would be
nice.

Does it make sense?

- Jungtaek Lim (HeartSaVioR)

2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:

The changes are available in GitHub, I just overlooked it. And they’re
actually neatly contained in two commits:

The changes to storm-kafka can be found here:


https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a


The changes to storm-redis are here:


https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a705fb2928b8
<
https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a
>


The changes to storm-kafka are straightforward and implemented in such a
way that they would be useful for use cases outside of SQE. As the commit
message states, it adds a new kafka deserialization scheme (FullScheme)
that includes the key, value, topic, partition and offset when reading from
kafka, which is a feature I can see as being valuable for some use cases. I
would be +1 for merging that code.

The changes to storm-redis are a little different, as Morrigan pointed
out, because it only addresses the Trident API, but IMHO it looks like a
good direction.

@HeartSavior — Would you have some time to take a look at the storm-redis
changes and provide your opinion, since you’re one of the original authors
of that code?

-Taylor


On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:

Great!

For storm-redis, we might need to modify key/value mapper to use byte[]
rather than String.
When I co-authored storm-redis, I forgot considering binary format of
serde. If we want to address that part, we can also address it.

2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:

Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
will require more work to make it more complete.

On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
wrote:

Thanks for the explanation Morrigan!

Would you be willing to provide a pull request or patch so the community
can review?

It sounds like at least some of the changes you mention could be useful

to

the broader community (beyond the SQL effort).

Thanks again,

-Taylor

On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>

wrote:


storm-kafka - This is needed because storm-kafka does not provide a

scheme

class that gives you the key, value (payload), partition, and offset.
MessageMetadataScheme.java comes comes closest, but is missing the key.
This was a pretty simple change on my part.

storm-redis - This is needed for proper support of Redis hashes. The
existing storm-redis uses a static string (additionalKey in
the RedisDataTypeDescription class) for the field name in hash types. I
updated it to use a configurable KeyFactory for both the hash name and

the

field name. We also added some limited support for set types. This is
admittedly the messiest between the two jars since we only cared about

the

trident states and would require a lot more changes to get storm-redis

more

"feature complete" overall.




On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>

wrote:


Sounds good. I’ll find out if it builds against 2.x. If so I’ll go

that

direction. Otherwise I’ll come back with my findings and we can

discuss

it

further.

I notice there are jars in the git repo that we obviously can’t

import.

They look like they might be custom JWPlayer builds of storm-kafka and
storm-redis.

Morrigan — Do you know if there is any differences there that required
custom builds of those components?

-Taylor

On Sep 26, 2016, at 3:31 PM, Bobby Evans

<evans@yahoo-inc.com.INVALID


wrote:

Does it compile against 2.X?  If so I would prefer to have it go

there,

and then possibly 1.x if people what it there too. - Bobby


  On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <

ptgoetz@gmail.com> wrote:


The IP Clearance vote has passed and we are now able to import the

SQE

code.


The question now is to where do we want to import the code?

My inclination is to import it to “external” in the 1.x branch. It

can

be ported to other branches as necessary/if desired. An alternative

would

be to treat it as a feature branch, but I’d rather take the former

approach.


Thought/opinions?

-Taylor

On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>

wrote:


My apologies. I meant to cc dev@, but didn't. Will forward in a

bit...


The vote (lazy consensus) is underway on general@incubator, and

will

close in less than 72 hours. After that the code can be merged.


-Taylor

On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>

wrote:


Hi dev,

While code contribution of SQE is in progress, I would like to

continue

discussion how to merge SQE and Storm SQL.

I did an analysis of merging SQE and Storm SQL in both side,

integrating

SQE to Storm SQL and vice versa.
https://cwiki.apache.org/confluence/display/STORM/

Technical+analysis+of+merging+SQE+and+Storm+SQL


As I commented to that page, since I'm working on Storm SQL for

some

weeks

I can be (heavily) biased. So I'd really appreciated if someone can

do

another analysis.

Please feel free to share your thought about this analysis, or

another

proposal if you have any, or other things about the merging.

Thanks,
Jungtaek Lim (HeartSaVioR)



--
Morrigan Jones
Principal Engineer
*JW*PLAYER  |  Your Way to Play
morrigan@jwplayer.com  |  jwplayer.com





--
Morrigan Jones
Principal Engineer
*JW*PLAYER  |  Your Way to Play
morrigan@jwplayer.com  |  jwplayer.com

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
FYI, I’ve merged the SQE code and documentation into the sqe_import branch:

https://github.com/apache/storm/tree/sqe_import <https://github.com/apache/storm/tree/sqe_import>

Note the build will fail on the last component (storm-sqe) due to the compilation issues mentioned earlier.

-Taylor


> On Sep 29, 2016, at 11:13 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> Agreed, or if we can find a way to not break compatibility (like perhaps having a common base class for most of the logic and one subclass that uses a string while another that uses the byte array).   - Bobby
> 
>    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> 
> The change on storm-redis seems to require breaking backward compatibility,
> so I would love to see another pull request to integrate it via general
> review process.
> If it can be integrated to SQE without changing storm-redis, that would be
> nice.
> 
> Does it make sense?
> 
> - Jungtaek Lim (HeartSaVioR)
> 
> 2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:
> 
>> The changes are available in GitHub, I just overlooked it. And they’re
>> actually neatly contained in two commits:
>> 
>> The changes to storm-kafka can be found here:
>> 
>> 
>> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a
>> 
>> 
>> The changes to storm-redis are here:
>> 
>> 
>> https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a705fb2928b8
>> <https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a>
>> 
>> 
>> The changes to storm-kafka are straightforward and implemented in such a
>> way that they would be useful for use cases outside of SQE. As the commit
>> message states, it adds a new kafka deserialization scheme (FullScheme)
>> that includes the key, value, topic, partition and offset when reading from
>> kafka, which is a feature I can see as being valuable for some use cases. I
>> would be +1 for merging that code.
>> 
>> The changes to storm-redis are a little different, as Morrigan pointed
>> out, because it only addresses the Trident API, but IMHO it looks like a
>> good direction.
>> 
>> @HeartSavior — Would you have some time to take a look at the storm-redis
>> changes and provide your opinion, since you’re one of the original authors
>> of that code?
>> 
>> -Taylor
>> 
>> 
>> On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>> 
>> Great!
>> 
>> For storm-redis, we might need to modify key/value mapper to use byte[]
>> rather than String.
>> When I co-authored storm-redis, I forgot considering binary format of
>> serde. If we want to address that part, we can also address it.
>> 
>> 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
>> 
>> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
>> will require more work to make it more complete.
>> 
>> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
>> wrote:
>> 
>> Thanks for the explanation Morrigan!
>> 
>> Would you be willing to provide a pull request or patch so the community
>> can review?
>> 
>> It sounds like at least some of the changes you mention could be useful
>> 
>> to
>> 
>> the broader community (beyond the SQL effort).
>> 
>> Thanks again,
>> 
>> -Taylor
>> 
>> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
>> 
>> wrote:
>> 
>> 
>> storm-kafka - This is needed because storm-kafka does not provide a
>> 
>> scheme
>> 
>> class that gives you the key, value (payload), partition, and offset.
>> MessageMetadataScheme.java comes comes closest, but is missing the key.
>> This was a pretty simple change on my part.
>> 
>> storm-redis - This is needed for proper support of Redis hashes. The
>> existing storm-redis uses a static string (additionalKey in
>> the RedisDataTypeDescription class) for the field name in hash types. I
>> updated it to use a configurable KeyFactory for both the hash name and
>> 
>> the
>> 
>> field name. We also added some limited support for set types. This is
>> admittedly the messiest between the two jars since we only cared about
>> 
>> the
>> 
>> trident states and would require a lot more changes to get storm-redis
>> 
>> more
>> 
>> "feature complete" overall.
>> 
>> 
>> 
>> 
>> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
>> 
>> wrote:
>> 
>> 
>> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
>> 
>> that
>> 
>> direction. Otherwise I’ll come back with my findings and we can
>> 
>> discuss
>> 
>> it
>> 
>> further.
>> 
>> I notice there are jars in the git repo that we obviously can’t
>> 
>> import.
>> 
>> They look like they might be custom JWPlayer builds of storm-kafka and
>> storm-redis.
>> 
>> Morrigan — Do you know if there is any differences there that required
>> custom builds of those components?
>> 
>> -Taylor
>> 
>> On Sep 26, 2016, at 3:31 PM, Bobby Evans
>> 
>> <evans@yahoo-inc.com.INVALID
>> 
>> 
>> wrote:
>> 
>> Does it compile against 2.X?  If so I would prefer to have it go
>> 
>> there,
>> 
>> and then possibly 1.x if people what it there too. - Bobby
>> 
>> 
>>   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>> 
>> ptgoetz@gmail.com> wrote:
>> 
>> 
>> The IP Clearance vote has passed and we are now able to import the
>> 
>> SQE
>> 
>> code.
>> 
>> 
>> The question now is to where do we want to import the code?
>> 
>> My inclination is to import it to “external” in the 1.x branch. It
>> 
>> can
>> 
>> be ported to other branches as necessary/if desired. An alternative
>> 
>> would
>> 
>> be to treat it as a feature branch, but I’d rather take the former
>> 
>> approach.
>> 
>> 
>> Thought/opinions?
>> 
>> -Taylor
>> 
>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
>> 
>> wrote:
>> 
>> 
>> My apologies. I meant to cc dev@, but didn't. Will forward in a
>> 
>> bit...
>> 
>> 
>> The vote (lazy consensus) is underway on general@incubator, and
>> 
>> will
>> 
>> close in less than 72 hours. After that the code can be merged.
>> 
>> 
>> -Taylor
>> 
>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
>> 
>> wrote:
>> 
>> 
>> Hi dev,
>> 
>> While code contribution of SQE is in progress, I would like to
>> 
>> continue
>> 
>> discussion how to merge SQE and Storm SQL.
>> 
>> I did an analysis of merging SQE and Storm SQL in both side,
>> 
>> integrating
>> 
>> SQE to Storm SQL and vice versa.
>> https://cwiki.apache.org/confluence/display/STORM/
>> 
>> Technical+analysis+of+merging+SQE+and+Storm+SQL
>> 
>> 
>> As I commented to that page, since I'm working on Storm SQL for
>> 
>> some
>> 
>> weeks
>> 
>> I can be (heavily) biased. So I'd really appreciated if someone can
>> 
>> do
>> 
>> another analysis.
>> 
>> Please feel free to share your thought about this analysis, or
>> 
>> another
>> 
>> proposal if you have any, or other things about the merging.
>> 
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>> 
>> 
>> 
>> --
>> Morrigan Jones
>> Principal Engineer
>> *JW*PLAYER  |  Your Way to Play
>> morrigan@jwplayer.com  |  jwplayer.com
>> 
>> 
>> 
>> 
>> 
>> --
>> Morrigan Jones
>> Principal Engineer
>> *JW*PLAYER  |  Your Way to Play
>> morrigan@jwplayer.com  |  jwplayer.com
>> 
>> 
>> 
> 


Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
Agreed, or if we can find a way to not break compatibility (like perhaps having a common base class for most of the logic and one subclass that uses a string while another that uses the byte array).   - Bobby 

    On Thursday, September 29, 2016 10:05 AM, Jungtaek Lim <ka...@gmail.com> wrote:
 

 The change on storm-redis seems to require breaking backward compatibility,
so I would love to see another pull request to integrate it via general
review process.
If it can be integrated to SQE without changing storm-redis, that would be
nice.

Does it make sense?

- Jungtaek Lim (HeartSaVioR)

2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:

> The changes are available in GitHub, I just overlooked it. And they’re
> actually neatly contained in two commits:
>
> The changes to storm-kafka can be found here:
>
>
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a
>
>
> The changes to storm-redis are here:
>
>
> https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a705fb2928b8
> <https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a>
>
>
> The changes to storm-kafka are straightforward and implemented in such a
> way that they would be useful for use cases outside of SQE. As the commit
> message states, it adds a new kafka deserialization scheme (FullScheme)
> that includes the key, value, topic, partition and offset when reading from
> kafka, which is a feature I can see as being valuable for some use cases. I
> would be +1 for merging that code.
>
> The changes to storm-redis are a little different, as Morrigan pointed
> out, because it only addresses the Trident API, but IMHO it looks like a
> good direction.
>
> @HeartSavior — Would you have some time to take a look at the storm-redis
> changes and provide your opinion, since you’re one of the original authors
> of that code?
>
> -Taylor
>
>
> On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>
> Great!
>
> For storm-redis, we might need to modify key/value mapper to use byte[]
> rather than String.
> When I co-authored storm-redis, I forgot considering binary format of
> serde. If we want to address that part, we can also address it.
>
> 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
>
> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
> will require more work to make it more complete.
>
> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
>
> Thanks for the explanation Morrigan!
>
> Would you be willing to provide a pull request or patch so the community
> can review?
>
> It sounds like at least some of the changes you mention could be useful
>
> to
>
> the broader community (beyond the SQL effort).
>
> Thanks again,
>
> -Taylor
>
> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
>
> wrote:
>
>
> storm-kafka - This is needed because storm-kafka does not provide a
>
> scheme
>
> class that gives you the key, value (payload), partition, and offset.
> MessageMetadataScheme.java comes comes closest, but is missing the key.
> This was a pretty simple change on my part.
>
> storm-redis - This is needed for proper support of Redis hashes. The
> existing storm-redis uses a static string (additionalKey in
> the RedisDataTypeDescription class) for the field name in hash types. I
> updated it to use a configurable KeyFactory for both the hash name and
>
> the
>
> field name. We also added some limited support for set types. This is
> admittedly the messiest between the two jars since we only cared about
>
> the
>
> trident states and would require a lot more changes to get storm-redis
>
> more
>
> "feature complete" overall.
>
>
>
>
> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
>
> wrote:
>
>
> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
>
> that
>
> direction. Otherwise I’ll come back with my findings and we can
>
> discuss
>
> it
>
> further.
>
> I notice there are jars in the git repo that we obviously can’t
>
> import.
>
> They look like they might be custom JWPlayer builds of storm-kafka and
> storm-redis.
>
> Morrigan — Do you know if there is any differences there that required
> custom builds of those components?
>
> -Taylor
>
> On Sep 26, 2016, at 3:31 PM, Bobby Evans
>
> <evans@yahoo-inc.com.INVALID
>
>
> wrote:
>
> Does it compile against 2.X?  If so I would prefer to have it go
>
> there,
>
> and then possibly 1.x if people what it there too. - Bobby
>
>
>  On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>
> ptgoetz@gmail.com> wrote:
>
>
> The IP Clearance vote has passed and we are now able to import the
>
> SQE
>
> code.
>
>
> The question now is to where do we want to import the code?
>
> My inclination is to import it to “external” in the 1.x branch. It
>
> can
>
> be ported to other branches as necessary/if desired. An alternative
>
> would
>
> be to treat it as a feature branch, but I’d rather take the former
>
> approach.
>
>
> Thought/opinions?
>
> -Taylor
>
> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
>
> wrote:
>
>
> My apologies. I meant to cc dev@, but didn't. Will forward in a
>
> bit...
>
>
> The vote (lazy consensus) is underway on general@incubator, and
>
> will
>
> close in less than 72 hours. After that the code can be merged.
>
>
> -Taylor
>
> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
>
> wrote:
>
>
> Hi dev,
>
> While code contribution of SQE is in progress, I would like to
>
> continue
>
> discussion how to merge SQE and Storm SQL.
>
> I did an analysis of merging SQE and Storm SQL in both side,
>
> integrating
>
> SQE to Storm SQL and vice versa.
> https://cwiki.apache.org/confluence/display/STORM/
>
> Technical+analysis+of+merging+SQE+and+Storm+SQL
>
>
> As I commented to that page, since I'm working on Storm SQL for
>
> some
>
> weeks
>
> I can be (heavily) biased. So I'd really appreciated if someone can
>
> do
>
> another analysis.
>
> Please feel free to share your thought about this analysis, or
>
> another
>
> proposal if you have any, or other things about the merging.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>
>
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>
>
>

   

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Jungtaek Lim <ka...@gmail.com>.
The change on storm-redis seems to require breaking backward compatibility,
so I would love to see another pull request to integrate it via general
review process.
If it can be integrated to SQE without changing storm-redis, that would be
nice.

Does it make sense?

- Jungtaek Lim (HeartSaVioR)

2016년 9월 29일 (목) 오전 6:08, P. Taylor Goetz <pt...@gmail.com>님이 작성:

> The changes are available in GitHub, I just overlooked it. And they’re
> actually neatly contained in two commits:
>
> The changes to storm-kafka can be found here:
>
>
> https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a
>
>
> The changes to storm-redis are here:
>
>
> https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a705fb2928b8
> <https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a>
>
>
> The changes to storm-kafka are straightforward and implemented in such a
> way that they would be useful for use cases outside of SQE. As the commit
> message states, it adds a new kafka deserialization scheme (FullScheme)
> that includes the key, value, topic, partition and offset when reading from
> kafka, which is a feature I can see as being valuable for some use cases. I
> would be +1 for merging that code.
>
> The changes to storm-redis are a little different, as Morrigan pointed
> out, because it only addresses the Trident API, but IMHO it looks like a
> good direction.
>
> @HeartSavior — Would you have some time to take a look at the storm-redis
> changes and provide your opinion, since you’re one of the original authors
> of that code?
>
> -Taylor
>
>
> On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>
> Great!
>
> For storm-redis, we might need to modify key/value mapper to use byte[]
> rather than String.
> When I co-authored storm-redis, I forgot considering binary format of
> serde. If we want to address that part, we can also address it.
>
> 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
>
> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
> will require more work to make it more complete.
>
> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
>
> Thanks for the explanation Morrigan!
>
> Would you be willing to provide a pull request or patch so the community
> can review?
>
> It sounds like at least some of the changes you mention could be useful
>
> to
>
> the broader community (beyond the SQL effort).
>
> Thanks again,
>
> -Taylor
>
> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
>
> wrote:
>
>
> storm-kafka - This is needed because storm-kafka does not provide a
>
> scheme
>
> class that gives you the key, value (payload), partition, and offset.
> MessageMetadataScheme.java comes comes closest, but is missing the key.
> This was a pretty simple change on my part.
>
> storm-redis - This is needed for proper support of Redis hashes. The
> existing storm-redis uses a static string (additionalKey in
> the RedisDataTypeDescription class) for the field name in hash types. I
> updated it to use a configurable KeyFactory for both the hash name and
>
> the
>
> field name. We also added some limited support for set types. This is
> admittedly the messiest between the two jars since we only cared about
>
> the
>
> trident states and would require a lot more changes to get storm-redis
>
> more
>
> "feature complete" overall.
>
>
>
>
> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
>
> wrote:
>
>
> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
>
> that
>
> direction. Otherwise I’ll come back with my findings and we can
>
> discuss
>
> it
>
> further.
>
> I notice there are jars in the git repo that we obviously can’t
>
> import.
>
> They look like they might be custom JWPlayer builds of storm-kafka and
> storm-redis.
>
> Morrigan — Do you know if there is any differences there that required
> custom builds of those components?
>
> -Taylor
>
> On Sep 26, 2016, at 3:31 PM, Bobby Evans
>
> <evans@yahoo-inc.com.INVALID
>
>
> wrote:
>
> Does it compile against 2.X?  If so I would prefer to have it go
>
> there,
>
> and then possibly 1.x if people what it there too. - Bobby
>
>
>  On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>
> ptgoetz@gmail.com> wrote:
>
>
> The IP Clearance vote has passed and we are now able to import the
>
> SQE
>
> code.
>
>
> The question now is to where do we want to import the code?
>
> My inclination is to import it to “external” in the 1.x branch. It
>
> can
>
> be ported to other branches as necessary/if desired. An alternative
>
> would
>
> be to treat it as a feature branch, but I’d rather take the former
>
> approach.
>
>
> Thought/opinions?
>
> -Taylor
>
> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
>
> wrote:
>
>
> My apologies. I meant to cc dev@, but didn't. Will forward in a
>
> bit...
>
>
> The vote (lazy consensus) is underway on general@incubator, and
>
> will
>
> close in less than 72 hours. After that the code can be merged.
>
>
> -Taylor
>
> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
>
> wrote:
>
>
> Hi dev,
>
> While code contribution of SQE is in progress, I would like to
>
> continue
>
> discussion how to merge SQE and Storm SQL.
>
> I did an analysis of merging SQE and Storm SQL in both side,
>
> integrating
>
> SQE to Storm SQL and vice versa.
> https://cwiki.apache.org/confluence/display/STORM/
>
> Technical+analysis+of+merging+SQE+and+Storm+SQL
>
>
> As I commented to that page, since I'm working on Storm SQL for
>
> some
>
> weeks
>
> I can be (heavily) biased. So I'd really appreciated if someone can
>
> do
>
> another analysis.
>
> Please feel free to share your thought about this analysis, or
>
> another
>
> proposal if you have any, or other things about the merging.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>
>
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>
>
>

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
The changes are available in GitHub, I just overlooked it. And they’re actually neatly contained in two commits:

The changes to storm-kafka can be found here:

https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a <https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a>


The changes to storm-redis are here:

https://github.com/jwplayer/storm/commit/30d000d3ff673efa8b927d23e554a705fb2928b8 <https://github.com/jwplayer/storm/commit/2069c76695a225e4bb8f402c89e572836104755a>


The changes to storm-kafka are straightforward and implemented in such a way that they would be useful for use cases outside of SQE. As the commit message states, it adds a new kafka deserialization scheme (FullScheme) that includes the key, value, topic, partition and offset when reading from kafka, which is a feature I can see as being valuable for some use cases. I would be +1 for merging that code.

The changes to storm-redis are a little different, as Morrigan pointed out, because it only addresses the Trident API, but IMHO it looks like a good direction.

@HeartSavior — Would you have some time to take a look at the storm-redis changes and provide your opinion, since you’re one of the original authors of that code?

-Taylor


> On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> Great!
> 
> For storm-redis, we might need to modify key/value mapper to use byte[]
> rather than String.
> When I co-authored storm-redis, I forgot considering binary format of
> serde. If we want to address that part, we can also address it.
> 
> 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> 
>> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
>> will require more work to make it more complete.
>> 
>> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
>> wrote:
>> 
>>> Thanks for the explanation Morrigan!
>>> 
>>> Would you be willing to provide a pull request or patch so the community
>>> can review?
>>> 
>>> It sounds like at least some of the changes you mention could be useful
>> to
>>> the broader community (beyond the SQL effort).
>>> 
>>> Thanks again,
>>> 
>>> -Taylor
>>> 
>>>> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
>>> wrote:
>>>> 
>>>> storm-kafka - This is needed because storm-kafka does not provide a
>>> scheme
>>>> class that gives you the key, value (payload), partition, and offset.
>>>> MessageMetadataScheme.java comes comes closest, but is missing the key.
>>>> This was a pretty simple change on my part.
>>>> 
>>>> storm-redis - This is needed for proper support of Redis hashes. The
>>>> existing storm-redis uses a static string (additionalKey in
>>>> the RedisDataTypeDescription class) for the field name in hash types. I
>>>> updated it to use a configurable KeyFactory for both the hash name and
>>> the
>>>> field name. We also added some limited support for set types. This is
>>>> admittedly the messiest between the two jars since we only cared about
>>> the
>>>> trident states and would require a lot more changes to get storm-redis
>>> more
>>>> "feature complete" overall.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
>>> wrote:
>>>>> 
>>>>> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
>> that
>>>>> direction. Otherwise I’ll come back with my findings and we can
>> discuss
>>> it
>>>>> further.
>>>>> 
>>>>> I notice there are jars in the git repo that we obviously can’t
>> import.
>>>>> They look like they might be custom JWPlayer builds of storm-kafka and
>>>>> storm-redis.
>>>>> 
>>>>> Morrigan — Do you know if there is any differences there that required
>>>>> custom builds of those components?
>>>>> 
>>>>> -Taylor
>>>>> 
>>>>>>> On Sep 26, 2016, at 3:31 PM, Bobby Evans
>> <evans@yahoo-inc.com.INVALID
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>> Does it compile against 2.X?  If so I would prefer to have it go
>> there,
>>>>> and then possibly 1.x if people what it there too. - Bobby
>>>>>> 
>>>>>>>  On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>>>>>> ptgoetz@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> The IP Clearance vote has passed and we are now able to import the
>> SQE
>>>>> code.
>>>>>> 
>>>>>> The question now is to where do we want to import the code?
>>>>>> 
>>>>>> My inclination is to import it to “external” in the 1.x branch. It
>> can
>>>>> be ported to other branches as necessary/if desired. An alternative
>>> would
>>>>> be to treat it as a feature branch, but I’d rather take the former
>>> approach.
>>>>>> 
>>>>>> Thought/opinions?
>>>>>> 
>>>>>> -Taylor
>>>>>> 
>>>>>>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>> My apologies. I meant to cc dev@, but didn't. Will forward in a
>>> bit...
>>>>>>> 
>>>>>>> The vote (lazy consensus) is underway on general@incubator, and
>> will
>>>>> close in less than 72 hours. After that the code can be merged.
>>>>>>> 
>>>>>>> -Taylor
>>>>>>> 
>>>>>>>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
>> wrote:
>>>>>>>> 
>>>>>>>> Hi dev,
>>>>>>>> 
>>>>>>>> While code contribution of SQE is in progress, I would like to
>>> continue
>>>>>>>> discussion how to merge SQE and Storm SQL.
>>>>>>>> 
>>>>>>>> I did an analysis of merging SQE and Storm SQL in both side,
>>>>> integrating
>>>>>>>> SQE to Storm SQL and vice versa.
>>>>>>>> https://cwiki.apache.org/confluence/display/STORM/
>>>>> Technical+analysis+of+merging+SQE+and+Storm+SQL
>>>>>>>> 
>>>>>>>> As I commented to that page, since I'm working on Storm SQL for
>> some
>>>>> weeks
>>>>>>>> I can be (heavily) biased. So I'd really appreciated if someone can
>>> do
>>>>>>>> another analysis.
>>>>>>>> 
>>>>>>>> Please feel free to share your thought about this analysis, or
>>> another
>>>>>>>> proposal if you have any, or other things about the merging.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>> 
>>>> 
>>>> --
>>>> Morrigan Jones
>>>> Principal Engineer
>>>> *JW*PLAYER  |  Your Way to Play
>>>> morrigan@jwplayer.com  |  jwplayer.com
>>> 
>> 
>> 
>> 
>> --
>> Morrigan Jones
>> Principal Engineer
>> *JW*PLAYER  |  Your Way to Play
>> morrigan@jwplayer.com  |  jwplayer.com
>> 


Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
If there are no objections, I’d like to proceed with this approach.

-Taylor


> On Sep 27, 2016, at 2:22 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> Indeed it does not compile against any version due to the missing changes to storm-kafka and storm-redis that Morrigan pointed out.
> 
> So for now I’m leaning toward a feature branch so we can at least import the code as is (knowing it will not compile). From there we can request pull requests from the JWPlayer developers for the pieces missing from storm-kafka and storm-redis, and evaluate pulling those changes into other branches.
> 
> I’d like to propose creating a “sqe_import” branch based on master, and importing the SQE code into a “storm-sqe” module in external. I would incorporate it into the main build, but make it the last module to build (knowing it won’t compile) so other modules have a chance to successfully build. Once we have the patches in place for the build to succeed, we can decide which branch(es) we want it to land in.
> 
> How does that sound?
> 
> -Taylor
> 
>> On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>> 
>> Great!
>> 
>> For storm-redis, we might need to modify key/value mapper to use byte[]
>> rather than String.
>> When I co-authored storm-redis, I forgot considering binary format of
>> serde. If we want to address that part, we can also address it.
>> 
>> 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
>> 
>>> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
>>> will require more work to make it more complete.
>>> 
>>> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
>>> wrote:
>>> 
>>>> Thanks for the explanation Morrigan!
>>>> 
>>>> Would you be willing to provide a pull request or patch so the community
>>>> can review?
>>>> 
>>>> It sounds like at least some of the changes you mention could be useful
>>> to
>>>> the broader community (beyond the SQL effort).
>>>> 
>>>> Thanks again,
>>>> 
>>>> -Taylor
>>>> 
>>>>> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
>>>> wrote:
>>>>> 
>>>>> storm-kafka - This is needed because storm-kafka does not provide a
>>>> scheme
>>>>> class that gives you the key, value (payload), partition, and offset.
>>>>> MessageMetadataScheme.java comes comes closest, but is missing the key.
>>>>> This was a pretty simple change on my part.
>>>>> 
>>>>> storm-redis - This is needed for proper support of Redis hashes. The
>>>>> existing storm-redis uses a static string (additionalKey in
>>>>> the RedisDataTypeDescription class) for the field name in hash types. I
>>>>> updated it to use a configurable KeyFactory for both the hash name and
>>>> the
>>>>> field name. We also added some limited support for set types. This is
>>>>> admittedly the messiest between the two jars since we only cared about
>>>> the
>>>>> trident states and would require a lot more changes to get storm-redis
>>>> more
>>>>> "feature complete" overall.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
>>>> wrote:
>>>>>> 
>>>>>> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
>>> that
>>>>>> direction. Otherwise I’ll come back with my findings and we can
>>> discuss
>>>> it
>>>>>> further.
>>>>>> 
>>>>>> I notice there are jars in the git repo that we obviously can’t
>>> import.
>>>>>> They look like they might be custom JWPlayer builds of storm-kafka and
>>>>>> storm-redis.
>>>>>> 
>>>>>> Morrigan — Do you know if there is any differences there that required
>>>>>> custom builds of those components?
>>>>>> 
>>>>>> -Taylor
>>>>>> 
>>>>>>>> On Sep 26, 2016, at 3:31 PM, Bobby Evans
>>> <evans@yahoo-inc.com.INVALID
>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Does it compile against 2.X?  If so I would prefer to have it go
>>> there,
>>>>>> and then possibly 1.x if people what it there too. - Bobby
>>>>>>> 
>>>>>>>> On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>>>>>>> ptgoetz@gmail.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> The IP Clearance vote has passed and we are now able to import the
>>> SQE
>>>>>> code.
>>>>>>> 
>>>>>>> The question now is to where do we want to import the code?
>>>>>>> 
>>>>>>> My inclination is to import it to “external” in the 1.x branch. It
>>> can
>>>>>> be ported to other branches as necessary/if desired. An alternative
>>>> would
>>>>>> be to treat it as a feature branch, but I’d rather take the former
>>>> approach.
>>>>>>> 
>>>>>>> Thought/opinions?
>>>>>>> 
>>>>>>> -Taylor
>>>>>>> 
>>>>>>>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
>>>> wrote:
>>>>>>>> 
>>>>>>>> My apologies. I meant to cc dev@, but didn't. Will forward in a
>>>> bit...
>>>>>>>> 
>>>>>>>> The vote (lazy consensus) is underway on general@incubator, and
>>> will
>>>>>> close in less than 72 hours. After that the code can be merged.
>>>>>>>> 
>>>>>>>> -Taylor
>>>>>>>> 
>>>>>>>>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi dev,
>>>>>>>>> 
>>>>>>>>> While code contribution of SQE is in progress, I would like to
>>>> continue
>>>>>>>>> discussion how to merge SQE and Storm SQL.
>>>>>>>>> 
>>>>>>>>> I did an analysis of merging SQE and Storm SQL in both side,
>>>>>> integrating
>>>>>>>>> SQE to Storm SQL and vice versa.
>>>>>>>>> https://cwiki.apache.org/confluence/display/STORM/
>>>>>> Technical+analysis+of+merging+SQE+and+Storm+SQL
>>>>>>>>> 
>>>>>>>>> As I commented to that page, since I'm working on Storm SQL for
>>> some
>>>>>> weeks
>>>>>>>>> I can be (heavily) biased. So I'd really appreciated if someone can
>>>> do
>>>>>>>>> another analysis.
>>>>>>>>> 
>>>>>>>>> Please feel free to share your thought about this analysis, or
>>>> another
>>>>>>>>> proposal if you have any, or other things about the merging.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>> 
>>>>> 
>>>>> --
>>>>> Morrigan Jones
>>>>> Principal Engineer
>>>>> *JW*PLAYER  |  Your Way to Play
>>>>> morrigan@jwplayer.com  |  jwplayer.com
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Morrigan Jones
>>> Principal Engineer
>>> *JW*PLAYER  |  Your Way to Play
>>> morrigan@jwplayer.com  |  jwplayer.com
>>> 
> 


Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Indeed it does not compile against any version due to the missing changes to storm-kafka and storm-redis that Morrigan pointed out.

So for now I’m leaning toward a feature branch so we can at least import the code as is (knowing it will not compile). From there we can request pull requests from the JWPlayer developers for the pieces missing from storm-kafka and storm-redis, and evaluate pulling those changes into other branches.

I’d like to propose creating a “sqe_import” branch based on master, and importing the SQE code into a “storm-sqe” module in external. I would incorporate it into the main build, but make it the last module to build (knowing it won’t compile) so other modules have a chance to successfully build. Once we have the patches in place for the build to succeed, we can decide which branch(es) we want it to land in.

How does that sound?

-Taylor

> On Sep 26, 2016, at 6:28 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> Great!
> 
> For storm-redis, we might need to modify key/value mapper to use byte[]
> rather than String.
> When I co-authored storm-redis, I forgot considering binary format of
> serde. If we want to address that part, we can also address it.
> 
> 2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:
> 
>> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
>> will require more work to make it more complete.
>> 
>> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
>> wrote:
>> 
>>> Thanks for the explanation Morrigan!
>>> 
>>> Would you be willing to provide a pull request or patch so the community
>>> can review?
>>> 
>>> It sounds like at least some of the changes you mention could be useful
>> to
>>> the broader community (beyond the SQL effort).
>>> 
>>> Thanks again,
>>> 
>>> -Taylor
>>> 
>>>> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
>>> wrote:
>>>> 
>>>> storm-kafka - This is needed because storm-kafka does not provide a
>>> scheme
>>>> class that gives you the key, value (payload), partition, and offset.
>>>> MessageMetadataScheme.java comes comes closest, but is missing the key.
>>>> This was a pretty simple change on my part.
>>>> 
>>>> storm-redis - This is needed for proper support of Redis hashes. The
>>>> existing storm-redis uses a static string (additionalKey in
>>>> the RedisDataTypeDescription class) for the field name in hash types. I
>>>> updated it to use a configurable KeyFactory for both the hash name and
>>> the
>>>> field name. We also added some limited support for set types. This is
>>>> admittedly the messiest between the two jars since we only cared about
>>> the
>>>> trident states and would require a lot more changes to get storm-redis
>>> more
>>>> "feature complete" overall.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
>>> wrote:
>>>>> 
>>>>> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
>> that
>>>>> direction. Otherwise I’ll come back with my findings and we can
>> discuss
>>> it
>>>>> further.
>>>>> 
>>>>> I notice there are jars in the git repo that we obviously can’t
>> import.
>>>>> They look like they might be custom JWPlayer builds of storm-kafka and
>>>>> storm-redis.
>>>>> 
>>>>> Morrigan — Do you know if there is any differences there that required
>>>>> custom builds of those components?
>>>>> 
>>>>> -Taylor
>>>>> 
>>>>>>> On Sep 26, 2016, at 3:31 PM, Bobby Evans
>> <evans@yahoo-inc.com.INVALID
>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>> Does it compile against 2.X?  If so I would prefer to have it go
>> there,
>>>>> and then possibly 1.x if people what it there too. - Bobby
>>>>>> 
>>>>>>>  On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>>>>>> ptgoetz@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> The IP Clearance vote has passed and we are now able to import the
>> SQE
>>>>> code.
>>>>>> 
>>>>>> The question now is to where do we want to import the code?
>>>>>> 
>>>>>> My inclination is to import it to “external” in the 1.x branch. It
>> can
>>>>> be ported to other branches as necessary/if desired. An alternative
>>> would
>>>>> be to treat it as a feature branch, but I’d rather take the former
>>> approach.
>>>>>> 
>>>>>> Thought/opinions?
>>>>>> 
>>>>>> -Taylor
>>>>>> 
>>>>>>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>> My apologies. I meant to cc dev@, but didn't. Will forward in a
>>> bit...
>>>>>>> 
>>>>>>> The vote (lazy consensus) is underway on general@incubator, and
>> will
>>>>> close in less than 72 hours. After that the code can be merged.
>>>>>>> 
>>>>>>> -Taylor
>>>>>>> 
>>>>>>>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
>> wrote:
>>>>>>>> 
>>>>>>>> Hi dev,
>>>>>>>> 
>>>>>>>> While code contribution of SQE is in progress, I would like to
>>> continue
>>>>>>>> discussion how to merge SQE and Storm SQL.
>>>>>>>> 
>>>>>>>> I did an analysis of merging SQE and Storm SQL in both side,
>>>>> integrating
>>>>>>>> SQE to Storm SQL and vice versa.
>>>>>>>> https://cwiki.apache.org/confluence/display/STORM/
>>>>> Technical+analysis+of+merging+SQE+and+Storm+SQL
>>>>>>>> 
>>>>>>>> As I commented to that page, since I'm working on Storm SQL for
>> some
>>>>> weeks
>>>>>>>> I can be (heavily) biased. So I'd really appreciated if someone can
>>> do
>>>>>>>> another analysis.
>>>>>>>> 
>>>>>>>> Please feel free to share your thought about this analysis, or
>>> another
>>>>>>>> proposal if you have any, or other things about the merging.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>> 
>>>> 
>>>> --
>>>> Morrigan Jones
>>>> Principal Engineer
>>>> *JW*PLAYER  |  Your Way to Play
>>>> morrigan@jwplayer.com  |  jwplayer.com
>>> 
>> 
>> 
>> 
>> --
>> Morrigan Jones
>> Principal Engineer
>> *JW*PLAYER  |  Your Way to Play
>> morrigan@jwplayer.com  |  jwplayer.com
>> 


Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Jungtaek Lim <ka...@gmail.com>.
Great!

For storm-redis, we might need to modify key/value mapper to use byte[]
rather than String.
When I co-authored storm-redis, I forgot considering binary format of
serde. If we want to address that part, we can also address it.

2016년 9월 27일 (화) 오전 7:19, Morrigan Jones <mo...@jwplayer.com>님이 작성:

> Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
> will require more work to make it more complete.
>
> On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
>
> > Thanks for the explanation Morrigan!
> >
> > Would you be willing to provide a pull request or patch so the community
> > can review?
> >
> > It sounds like at least some of the changes you mention could be useful
> to
> > the broader community (beyond the SQL effort).
> >
> > Thanks again,
> >
> > -Taylor
> >
> > > On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
> > wrote:
> > >
> > > storm-kafka - This is needed because storm-kafka does not provide a
> > scheme
> > > class that gives you the key, value (payload), partition, and offset.
> > > MessageMetadataScheme.java comes comes closest, but is missing the key.
> > > This was a pretty simple change on my part.
> > >
> > > storm-redis - This is needed for proper support of Redis hashes. The
> > > existing storm-redis uses a static string (additionalKey in
> > > the RedisDataTypeDescription class) for the field name in hash types. I
> > > updated it to use a configurable KeyFactory for both the hash name and
> > the
> > > field name. We also added some limited support for set types. This is
> > > admittedly the messiest between the two jars since we only cared about
> > the
> > > trident states and would require a lot more changes to get storm-redis
> > more
> > > "feature complete" overall.
> > >
> > >
> > >
> > >
> > >> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
> > wrote:
> > >>
> > >> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go
> that
> > >> direction. Otherwise I’ll come back with my findings and we can
> discuss
> > it
> > >> further.
> > >>
> > >> I notice there are jars in the git repo that we obviously can’t
> import.
> > >> They look like they might be custom JWPlayer builds of storm-kafka and
> > >> storm-redis.
> > >>
> > >> Morrigan — Do you know if there is any differences there that required
> > >> custom builds of those components?
> > >>
> > >> -Taylor
> > >>
> > >>>> On Sep 26, 2016, at 3:31 PM, Bobby Evans
> <evans@yahoo-inc.com.INVALID
> > >
> > >>> wrote:
> > >>>
> > >>> Does it compile against 2.X?  If so I would prefer to have it go
> there,
> > >> and then possibly 1.x if people what it there too. - Bobby
> > >>>
> > >>>>   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> > >>> ptgoetz@gmail.com> wrote:
> > >>>
> > >>>
> > >>> The IP Clearance vote has passed and we are now able to import the
> SQE
> > >> code.
> > >>>
> > >>> The question now is to where do we want to import the code?
> > >>>
> > >>> My inclination is to import it to “external” in the 1.x branch. It
> can
> > >> be ported to other branches as necessary/if desired. An alternative
> > would
> > >> be to treat it as a feature branch, but I’d rather take the former
> > approach.
> > >>>
> > >>> Thought/opinions?
> > >>>
> > >>> -Taylor
> > >>>
> > >>>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
> > wrote:
> > >>>>
> > >>>> My apologies. I meant to cc dev@, but didn't. Will forward in a
> > bit...
> > >>>>
> > >>>> The vote (lazy consensus) is underway on general@incubator, and
> will
> > >> close in less than 72 hours. After that the code can be merged.
> > >>>>
> > >>>> -Taylor
> > >>>>
> > >>>>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com>
> wrote:
> > >>>>>
> > >>>>> Hi dev,
> > >>>>>
> > >>>>> While code contribution of SQE is in progress, I would like to
> > continue
> > >>>>> discussion how to merge SQE and Storm SQL.
> > >>>>>
> > >>>>> I did an analysis of merging SQE and Storm SQL in both side,
> > >> integrating
> > >>>>> SQE to Storm SQL and vice versa.
> > >>>>> https://cwiki.apache.org/confluence/display/STORM/
> > >> Technical+analysis+of+merging+SQE+and+Storm+SQL
> > >>>>>
> > >>>>> As I commented to that page, since I'm working on Storm SQL for
> some
> > >> weeks
> > >>>>> I can be (heavily) biased. So I'd really appreciated if someone can
> > do
> > >>>>> another analysis.
> > >>>>>
> > >>>>> Please feel free to share your thought about this analysis, or
> > another
> > >>>>> proposal if you have any, or other things about the merging.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Jungtaek Lim (HeartSaVioR)
> > >
> > >
> > > --
> > > Morrigan Jones
> > > Principal Engineer
> > > *JW*PLAYER  |  Your Way to Play
> > > morrigan@jwplayer.com  |  jwplayer.com
> >
>
>
>
> --
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com
>

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Morrigan Jones <mo...@jwplayer.com>.
Sure, when I can. Storm-kafka should be pretty easy. The storm-redis one
will require more work to make it more complete.

On Mon, Sep 26, 2016 at 6:09 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> Thanks for the explanation Morrigan!
>
> Would you be willing to provide a pull request or patch so the community
> can review?
>
> It sounds like at least some of the changes you mention could be useful to
> the broader community (beyond the SQL effort).
>
> Thanks again,
>
> -Taylor
>
> > On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com>
> wrote:
> >
> > storm-kafka - This is needed because storm-kafka does not provide a
> scheme
> > class that gives you the key, value (payload), partition, and offset.
> > MessageMetadataScheme.java comes comes closest, but is missing the key.
> > This was a pretty simple change on my part.
> >
> > storm-redis - This is needed for proper support of Redis hashes. The
> > existing storm-redis uses a static string (additionalKey in
> > the RedisDataTypeDescription class) for the field name in hash types. I
> > updated it to use a configurable KeyFactory for both the hash name and
> the
> > field name. We also added some limited support for set types. This is
> > admittedly the messiest between the two jars since we only cared about
> the
> > trident states and would require a lot more changes to get storm-redis
> more
> > "feature complete" overall.
> >
> >
> >
> >
> >> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
> >>
> >> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go that
> >> direction. Otherwise I’ll come back with my findings and we can discuss
> it
> >> further.
> >>
> >> I notice there are jars in the git repo that we obviously can’t import.
> >> They look like they might be custom JWPlayer builds of storm-kafka and
> >> storm-redis.
> >>
> >> Morrigan — Do you know if there is any differences there that required
> >> custom builds of those components?
> >>
> >> -Taylor
> >>
> >>>> On Sep 26, 2016, at 3:31 PM, Bobby Evans <evans@yahoo-inc.com.INVALID
> >
> >>> wrote:
> >>>
> >>> Does it compile against 2.X?  If so I would prefer to have it go there,
> >> and then possibly 1.x if people what it there too. - Bobby
> >>>
> >>>>   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> >>> ptgoetz@gmail.com> wrote:
> >>>
> >>>
> >>> The IP Clearance vote has passed and we are now able to import the SQE
> >> code.
> >>>
> >>> The question now is to where do we want to import the code?
> >>>
> >>> My inclination is to import it to “external” in the 1.x branch. It can
> >> be ported to other branches as necessary/if desired. An alternative
> would
> >> be to treat it as a feature branch, but I’d rather take the former
> approach.
> >>>
> >>> Thought/opinions?
> >>>
> >>> -Taylor
> >>>
> >>>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
> >>>>
> >>>> My apologies. I meant to cc dev@, but didn't. Will forward in a
> bit...
> >>>>
> >>>> The vote (lazy consensus) is underway on general@incubator, and will
> >> close in less than 72 hours. After that the code can be merged.
> >>>>
> >>>> -Taylor
> >>>>
> >>>>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >>>>>
> >>>>> Hi dev,
> >>>>>
> >>>>> While code contribution of SQE is in progress, I would like to
> continue
> >>>>> discussion how to merge SQE and Storm SQL.
> >>>>>
> >>>>> I did an analysis of merging SQE and Storm SQL in both side,
> >> integrating
> >>>>> SQE to Storm SQL and vice versa.
> >>>>> https://cwiki.apache.org/confluence/display/STORM/
> >> Technical+analysis+of+merging+SQE+and+Storm+SQL
> >>>>>
> >>>>> As I commented to that page, since I'm working on Storm SQL for some
> >> weeks
> >>>>> I can be (heavily) biased. So I'd really appreciated if someone can
> do
> >>>>> another analysis.
> >>>>>
> >>>>> Please feel free to share your thought about this analysis, or
> another
> >>>>> proposal if you have any, or other things about the merging.
> >>>>>
> >>>>> Thanks,
> >>>>> Jungtaek Lim (HeartSaVioR)
> >
> >
> > --
> > Morrigan Jones
> > Principal Engineer
> > *JW*PLAYER  |  Your Way to Play
> > morrigan@jwplayer.com  |  jwplayer.com
>



-- 
Morrigan Jones
Principal Engineer
*JW*PLAYER  |  Your Way to Play
morrigan@jwplayer.com  |  jwplayer.com

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Thanks for the explanation Morrigan!

Would you be willing to provide a pull request or patch so the community can review?

It sounds like at least some of the changes you mention could be useful to the broader community (beyond the SQL effort).

Thanks again,

-Taylor

> On Sep 26, 2016, at 4:40 PM, Morrigan Jones <mo...@jwplayer.com> wrote:
> 
> storm-kafka - This is needed because storm-kafka does not provide a scheme
> class that gives you the key, value (payload), partition, and offset.
> MessageMetadataScheme.java comes comes closest, but is missing the key.
> This was a pretty simple change on my part.
> 
> storm-redis - This is needed for proper support of Redis hashes. The
> existing storm-redis uses a static string (additionalKey in
> the RedisDataTypeDescription class) for the field name in hash types. I
> updated it to use a configurable KeyFactory for both the hash name and the
> field name. We also added some limited support for set types. This is
> admittedly the messiest between the two jars since we only cared about the
> trident states and would require a lot more changes to get storm-redis more
> "feature complete" overall.
> 
> 
> 
> 
>> On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
>> 
>> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go that
>> direction. Otherwise I’ll come back with my findings and we can discuss it
>> further.
>> 
>> I notice there are jars in the git repo that we obviously can’t import.
>> They look like they might be custom JWPlayer builds of storm-kafka and
>> storm-redis.
>> 
>> Morrigan — Do you know if there is any differences there that required
>> custom builds of those components?
>> 
>> -Taylor
>> 
>>>> On Sep 26, 2016, at 3:31 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID>
>>> wrote:
>>> 
>>> Does it compile against 2.X?  If so I would prefer to have it go there,
>> and then possibly 1.x if people what it there too. - Bobby
>>> 
>>>>   On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
>>> ptgoetz@gmail.com> wrote:
>>> 
>>> 
>>> The IP Clearance vote has passed and we are now able to import the SQE
>> code.
>>> 
>>> The question now is to where do we want to import the code?
>>> 
>>> My inclination is to import it to “external” in the 1.x branch. It can
>> be ported to other branches as necessary/if desired. An alternative would
>> be to treat it as a feature branch, but I’d rather take the former approach.
>>> 
>>> Thought/opinions?
>>> 
>>> -Taylor
>>> 
>>>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
>>>> 
>>>> My apologies. I meant to cc dev@, but didn't. Will forward in a bit...
>>>> 
>>>> The vote (lazy consensus) is underway on general@incubator, and will
>> close in less than 72 hours. After that the code can be merged.
>>>> 
>>>> -Taylor
>>>> 
>>>>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>>>>> 
>>>>> Hi dev,
>>>>> 
>>>>> While code contribution of SQE is in progress, I would like to continue
>>>>> discussion how to merge SQE and Storm SQL.
>>>>> 
>>>>> I did an analysis of merging SQE and Storm SQL in both side,
>> integrating
>>>>> SQE to Storm SQL and vice versa.
>>>>> https://cwiki.apache.org/confluence/display/STORM/
>> Technical+analysis+of+merging+SQE+and+Storm+SQL
>>>>> 
>>>>> As I commented to that page, since I'm working on Storm SQL for some
>> weeks
>>>>> I can be (heavily) biased. So I'd really appreciated if someone can do
>>>>> another analysis.
>>>>> 
>>>>> Please feel free to share your thought about this analysis, or another
>>>>> proposal if you have any, or other things about the merging.
>>>>> 
>>>>> Thanks,
>>>>> Jungtaek Lim (HeartSaVioR)
> 
> 
> -- 
> Morrigan Jones
> Principal Engineer
> *JW*PLAYER  |  Your Way to Play
> morrigan@jwplayer.com  |  jwplayer.com

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Morrigan Jones <mo...@jwplayer.com>.
storm-kafka - This is needed because storm-kafka does not provide a scheme
class that gives you the key, value (payload), partition, and offset.
MessageMetadataScheme.java comes comes closest, but is missing the key.
This was a pretty simple change on my part.

storm-redis - This is needed for proper support of Redis hashes. The
existing storm-redis uses a static string (additionalKey in
the RedisDataTypeDescription class) for the field name in hash types. I
updated it to use a configurable KeyFactory for both the hash name and the
field name. We also added some limited support for set types. This is
admittedly the messiest between the two jars since we only cared about the
trident states and would require a lot more changes to get storm-redis more
"feature complete" overall.




On Mon, Sep 26, 2016 at 4:03 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> Sounds good. I’ll find out if it builds against 2.x. If so I’ll go that
> direction. Otherwise I’ll come back with my findings and we can discuss it
> further.
>
> I notice there are jars in the git repo that we obviously can’t import.
> They look like they might be custom JWPlayer builds of storm-kafka and
> storm-redis.
>
> Morrigan — Do you know if there is any differences there that required
> custom builds of those components?
>
> -Taylor
>
> > On Sep 26, 2016, at 3:31 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID>
> wrote:
> >
> > Does it compile against 2.X?  If so I would prefer to have it go there,
> and then possibly 1.x if people what it there too. - Bobby
> >
> >    On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <
> ptgoetz@gmail.com> wrote:
> >
> >
> > The IP Clearance vote has passed and we are now able to import the SQE
> code.
> >
> > The question now is to where do we want to import the code?
> >
> > My inclination is to import it to “external” in the 1.x branch. It can
> be ported to other branches as necessary/if desired. An alternative would
> be to treat it as a feature branch, but I’d rather take the former approach.
> >
> > Thought/opinions?
> >
> > -Taylor
> >
> >> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> >>
> >> My apologies. I meant to cc dev@, but didn't. Will forward in a bit...
> >>
> >> The vote (lazy consensus) is underway on general@incubator, and will
> close in less than 72 hours. After that the code can be merged.
> >>
> >> -Taylor
> >>
> >>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >>>
> >>> Hi dev,
> >>>
> >>> While code contribution of SQE is in progress, I would like to continue
> >>> discussion how to merge SQE and Storm SQL.
> >>>
> >>> I did an analysis of merging SQE and Storm SQL in both side,
> integrating
> >>> SQE to Storm SQL and vice versa.
> >>> https://cwiki.apache.org/confluence/display/STORM/
> Technical+analysis+of+merging+SQE+and+Storm+SQL
> >>>
> >>> As I commented to that page, since I'm working on Storm SQL for some
> weeks
> >>> I can be (heavily) biased. So I'd really appreciated if someone can do
> >>> another analysis.
> >>>
> >>> Please feel free to share your thought about this analysis, or another
> >>> proposal if you have any, or other things about the merging.
> >>>
> >>> Thanks,
> >>> Jungtaek Lim (HeartSaVioR)
> >
> >
>
>


-- 
Morrigan Jones
Principal Engineer
*JW*PLAYER  |  Your Way to Play
morrigan@jwplayer.com  |  jwplayer.com

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Sounds good. I’ll find out if it builds against 2.x. If so I’ll go that direction. Otherwise I’ll come back with my findings and we can discuss it further.

I notice there are jars in the git repo that we obviously can’t import. They look like they might be custom JWPlayer builds of storm-kafka and storm-redis.

Morrigan — Do you know if there is any differences there that required custom builds of those components?

-Taylor

> On Sep 26, 2016, at 3:31 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> Does it compile against 2.X?  If so I would prefer to have it go there, and then possibly 1.x if people what it there too. - Bobby
> 
>    On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> 
> The IP Clearance vote has passed and we are now able to import the SQE code.
> 
> The question now is to where do we want to import the code?
> 
> My inclination is to import it to “external” in the 1.x branch. It can be ported to other branches as necessary/if desired. An alternative would be to treat it as a feature branch, but I’d rather take the former approach.
> 
> Thought/opinions?
> 
> -Taylor
> 
>> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
>> 
>> My apologies. I meant to cc dev@, but didn't. Will forward in a bit...
>> 
>> The vote (lazy consensus) is underway on general@incubator, and will close in less than 72 hours. After that the code can be merged.
>> 
>> -Taylor
>> 
>>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>>> 
>>> Hi dev,
>>> 
>>> While code contribution of SQE is in progress, I would like to continue
>>> discussion how to merge SQE and Storm SQL.
>>> 
>>> I did an analysis of merging SQE and Storm SQL in both side, integrating
>>> SQE to Storm SQL and vice versa.
>>> https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL
>>> 
>>> As I commented to that page, since I'm working on Storm SQL for some weeks
>>> I can be (heavily) biased. So I'd really appreciated if someone can do
>>> another analysis.
>>> 
>>> Please feel free to share your thought about this analysis, or another
>>> proposal if you have any, or other things about the merging.
>>> 
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
> 
> 


Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
Does it compile against 2.X?  If so I would prefer to have it go there, and then possibly 1.x if people what it there too. - Bobby 

    On Monday, September 26, 2016 12:47 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
 

 The IP Clearance vote has passed and we are now able to import the SQE code.

The question now is to where do we want to import the code?

My inclination is to import it to “external” in the 1.x branch. It can be ported to other branches as necessary/if desired. An alternative would be to treat it as a feature branch, but I’d rather take the former approach.

Thought/opinions?

-Taylor

> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> My apologies. I meant to cc dev@, but didn't. Will forward in a bit...
> 
> The vote (lazy consensus) is underway on general@incubator, and will close in less than 72 hours. After that the code can be merged.
> 
> -Taylor
> 
>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>> 
>> Hi dev,
>> 
>> While code contribution of SQE is in progress, I would like to continue
>> discussion how to merge SQE and Storm SQL.
>> 
>> I did an analysis of merging SQE and Storm SQL in both side, integrating
>> SQE to Storm SQL and vice versa.
>> https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL
>> 
>> As I commented to that page, since I'm working on Storm SQL for some weeks
>> I can be (heavily) biased. So I'd really appreciated if someone can do
>> another analysis.
>> 
>> Please feel free to share your thought about this analysis, or another
>> proposal if you have any, or other things about the merging.
>> 
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)


   

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
The IP Clearance vote has passed and we are now able to import the SQE code.

The question now is to where do we want to import the code?

My inclination is to import it to “external” in the 1.x branch. It can be ported to other branches as necessary/if desired. An alternative would be to treat it as a feature branch, but I’d rather take the former approach.

Thought/opinions?

-Taylor

> On Sep 21, 2016, at 8:39 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> My apologies. I meant to cc dev@, but didn't. Will forward in a bit...
> 
> The vote (lazy consensus) is underway on general@incubator, and will close in less than 72 hours. After that the code can be merged.
> 
> -Taylor
> 
>> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>> 
>> Hi dev,
>> 
>> While code contribution of SQE is in progress, I would like to continue
>> discussion how to merge SQE and Storm SQL.
>> 
>> I did an analysis of merging SQE and Storm SQL in both side, integrating
>> SQE to Storm SQL and vice versa.
>> https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL
>> 
>> As I commented to that page, since I'm working on Storm SQL for some weeks
>> I can be (heavily) biased. So I'd really appreciated if someone can do
>> another analysis.
>> 
>> Please feel free to share your thought about this analysis, or another
>> proposal if you have any, or other things about the merging.
>> 
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)


Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
My apologies. I meant to cc dev@, but didn't. Will forward in a bit...

The vote (lazy consensus) is underway on general@incubator, and will close in less than 72 hours. After that the code can be merged.

-Taylor

> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> Hi dev,
> 
> While code contribution of SQE is in progress, I would like to continue
> discussion how to merge SQE and Storm SQL.
> 
> I did an analysis of merging SQE and Storm SQL in both side, integrating
> SQE to Storm SQL and vice versa.
> https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL
> 
> As I commented to that page, since I'm working on Storm SQL for some weeks
> I can be (heavily) biased. So I'd really appreciated if someone can do
> another analysis.
> 
> Please feel free to share your thought about this analysis, or another
> proposal if you have any, or other things about the merging.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by Morrigan Jones <mo...@jwplayer.com>.
Thanks Jungtaek for the analysis. We've been super busy the past week, but
I will perform my own analysis this week.

On Wed, Sep 21, 2016 at 9:14 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> Thank you for the analysis Jungtaek. I particularly appreciate your
> admitted bias as a big contributor to storm-SQL.
>
> It will take time for us to sort out how the merge will happen. Similar to
> what we are discussing with Kafka and ES, we could support two versions of
> SQL, and eventually deprecate the one one that has the least traction.
>
> I'd rather not do that if possible. I'd rather see the community rally
> around a (TBD) approach to a single SQL interface. SQE has an advantage
> that it used in production, but being newly donated/open sourced, it does
> not have much of a user community. Storm-SQL has the same community issue,
> likely because it's never been announced. And it's never been announced
> because it hasn't been ready (missing MVP features).
>
> It would be good to see some sort of analysis from the SQE devs who are
> more intimately acquainted with the SQE codebase.
>
> -Taylor
>
>
> > On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> > Hi dev,
> >
> > While code contribution of SQE is in progress, I would like to continue
> > discussion how to merge SQE and Storm SQL.
> >
> > I did an analysis of merging SQE and Storm SQL in both side, integrating
> > SQE to Storm SQL and vice versa.
> > https://cwiki.apache.org/confluence/display/STORM/
> Technical+analysis+of+merging+SQE+and+Storm+SQL
> >
> > As I commented to that page, since I'm working on Storm SQL for some
> weeks
> > I can be (heavily) biased. So I'd really appreciated if someone can do
> > another analysis.
> >
> > Please feel free to share your thought about this analysis, or another
> > proposal if you have any, or other things about the merging.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
>



-- 
Morrigan Jones
Principal Engineer
*JW*PLAYER  |  Your Way to Play
morrigan@jwplayer.com  |  jwplayer.com

Re: [DISCUSS] Plan for merge SQE and Storm SQL

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Thank you for the analysis Jungtaek. I particularly appreciate your admitted bias as a big contributor to storm-SQL.

It will take time for us to sort out how the merge will happen. Similar to what we are discussing with Kafka and ES, we could support two versions of SQL, and eventually deprecate the one one that has the least traction.

I'd rather not do that if possible. I'd rather see the community rally around a (TBD) approach to a single SQL interface. SQE has an advantage that it used in production, but being newly donated/open sourced, it does not have much of a user community. Storm-SQL has the same community issue, likely because it's never been announced. And it's never been announced because it hasn't been ready (missing MVP features).

It would be good to see some sort of analysis from the SQE devs who are more intimately acquainted with the SQE codebase.

-Taylor


> On Sep 21, 2016, at 7:02 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> Hi dev,
> 
> While code contribution of SQE is in progress, I would like to continue
> discussion how to merge SQE and Storm SQL.
> 
> I did an analysis of merging SQE and Storm SQL in both side, integrating
> SQE to Storm SQL and vice versa.
> https://cwiki.apache.org/confluence/display/STORM/Technical+analysis+of+merging+SQE+and+Storm+SQL
> 
> As I commented to that page, since I'm working on Storm SQL for some weeks
> I can be (heavily) biased. So I'd really appreciated if someone can do
> another analysis.
> 
> Please feel free to share your thought about this analysis, or another
> proposal if you have any, or other things about the merging.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)