
Posted to dev@apex.apache.org by Thomas Weise <th...@apache.org> on 2016/12/08 19:27:47 UTC

[ANNOUNCE] Apache Apex Malhar 3.6.0 released

Dear Community,

The Apache Apex community is pleased to announce release 3.6.0 of the
Malhar library. The release resolved 70 JIRAs <https://s.apache.org/9b0t>.

The release adds the first iteration of SQL support via Apache Calcite.
Features include SELECT, INSERT, INNER JOIN with a non-empty equi-join
condition, the WHERE clause, scalar functions implemented in Calcite, and
custom scalar functions. Endpoints can be files, Kafka, or internal
streaming ports, for both input and output. The CSV format is implemented
for both input and output. See the examples
<https://github.com/apache/apex-malhar/tree/release-3.6/demos/sql/src/main/java/org/apache/apex/malhar/sql/sample>
for usage of the new API.
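For readers new to the term, an INNER JOIN with an equi-join condition pairs rows whose key columns are equal and drops rows that have no match. The sketch below illustrates only those semantics on in-memory CSV rows using a hash join. It is not the Malhar SQL API: the class EquiJoinSketch, the method innerEquiJoin and the sample data are hypothetical. A streaming join also has to decide how long to retain rows, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of INNER JOIN ... ON left.col = right.col over CSV
 * rows. This is not the Malhar SQL API; it only shows equi-join semantics.
 */
public class EquiJoinSketch {

  /** Joins rows where column leftKey of a left row equals column rightKey of a right row. */
  static List<String> innerEquiJoin(List<String> left, int leftKey,
                                    List<String> right, int rightKey) {
    // Build phase: index the right-hand rows by their join key.
    Map<String, List<String[]>> index = new HashMap<>();
    for (String row : right) {
      String[] cols = row.split(",");
      index.computeIfAbsent(cols[rightKey], k -> new ArrayList<>()).add(cols);
    }
    // Probe phase: emit one output row per matching pair; unmatched left rows are dropped.
    List<String> out = new ArrayList<>();
    for (String row : left) {
      String[] cols = row.split(",");
      List<String[]> matches = index.getOrDefault(cols[leftKey], Collections.emptyList());
      for (String[] match : matches) {
        out.add(row + "," + String.join(",", match));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> orders = Arrays.asList("1,apple,3", "2,pear,5", "3,plum,1");
    List<String> prices = Arrays.asList("apple,0.50", "plum,0.25");
    // Equivalent to: SELECT * FROM orders INNER JOIN prices ON orders.product = prices.product
    innerEquiJoin(orders, 1, prices, 0).forEach(System.out::println);
  }
}
```

The "pear" order has no price row, so an inner join leaves it out of the result; an outer join (not among the features listed for this release) would keep it.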

Windowed state management (WindowedOperator) has been improved. There is
now an option to use spillable data structures for state storage. This
enables the operator to store large state and perform efficient
checkpointing.
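The general idea behind spillable state and incremental checkpointing can be shown with a small, self-contained sketch. It is not the Malhar spillable data structure API: the class SpillableStateSketch, its least-recently-used eviction policy, and the HashMap standing in for disk are all assumptions made for illustration. It keeps at most a fixed number of entries in memory, spills the rest to a backing store, and at checkpoint time writes only the keys changed since the previous checkpoint.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a spillable key-value state. Not the Malhar API:
 * a HashMap stands in for durable storage, and eviction is plain LRU.
 */
public class SpillableStateSketch {
  private final int maxInMemory;
  // Access-ordered map: iteration starts at the least recently used entry.
  private final LinkedHashMap<String, Long> cache = new LinkedHashMap<>(16, 0.75f, true);
  private final Map<String, Long> backingStore = new HashMap<>(); // stands in for disk
  private final Set<String> dirty = new HashSet<>(); // in-memory keys changed since last checkpoint

  SpillableStateSketch(int maxInMemory) { this.maxInMemory = maxInMemory; }

  void put(String key, long value) {
    cache.put(key, value);
    dirty.add(key);
    evictIfNeeded();
  }

  Long get(String key) {
    Long v = cache.get(key);
    if (v == null && backingStore.containsKey(key)) {
      v = backingStore.get(key);
      cache.put(key, v); // reload into memory; not dirty, the store already has it
      evictIfNeeded();
    }
    return v;
  }

  /** Spills least recently used entries until the in-memory bound holds. */
  private void evictIfNeeded() {
    while (cache.size() > maxInMemory) {
      String eldest = cache.keySet().iterator().next();
      backingStore.put(eldest, cache.remove(eldest)); // spill to the backing store
      dirty.remove(eldest); // its latest value is now persisted
    }
  }

  /** Persists only keys modified since the last checkpoint; returns how many were written. */
  int checkpoint() {
    for (String key : dirty) {
      backingStore.put(key, cache.get(key));
    }
    int written = dirty.size();
    dirty.clear();
    return written;
  }

  public static void main(String[] args) {
    SpillableStateSketch state = new SpillableStateSketch(2);
    state.put("a", 1);
    state.put("b", 2);
    state.put("c", 3); // exceeds the bound, so "a" spills to the backing store
    System.out.println(state.get("a")); // 1: "a" is reloaded from the backing store
    System.out.println(state.checkpoint() + " key(s) written at checkpoint");
  }
}
```

The point of the design is that checkpoint cost tracks how much state changed, not how much state exists. The actual Managed State layer is considerably more involved than a dirty set over a map.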

We also benchmarked WindowedOperator with the spillable data structures.
Based on the results, we greatly improved how objects are serialized and
considerably reduced garbage collection in the Managed State layer. Work is
still in progress on purging state that is no longer needed and on further
improving the performance of Managed State, which the spillable data
structures depend on. More information about the windowing support can be
found here
<http://apex.apache.org/docs/malhar/operators/windowedOperator/>.

This release also adds a new, alternative Cassandra output operator
(non-transactional, upsert-based) and support for the fixed-length file
format in the enrichment operator.
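Why an upsert-based output operator can work without transactions can be sketched in a few lines, assuming that after a failure the operator receives the same tuples again, in the same order. UpsertReplaySketch and its Row type are hypothetical, and a LinkedHashMap stands in for a Cassandra table keyed by primary key; this is not the Malhar operator's API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: an upsert keyed by primary key overwrites any earlier
 * version, so replaying tuples after a failure yields the same final table.
 * A LinkedHashMap stands in for a Cassandra table.
 */
public class UpsertReplaySketch {

  /** A row with a primary key and one value column. */
  static final class Row {
    final String id;
    final int qty;
    Row(String id, int qty) { this.id = id; this.qty = qty; }
  }

  /** Insert if the key is absent, overwrite if present. */
  static void upsert(Map<String, Integer> table, Row row) {
    table.put(row.id, row.qty);
  }

  public static void main(String[] args) {
    List<Row> window = Arrays.asList(new Row("k1", 5), new Row("k2", 7), new Row("k1", 6));
    Map<String, Integer> table = new LinkedHashMap<>();
    // First attempt: assume the operator fails after writing only two rows.
    upsert(table, window.get(0));
    upsert(table, window.get(1));
    // Recovery replays the whole window from the start.
    for (Row r : window) {
      upsert(table, r);
    }
    System.out.println(table); // prints {k1=6, k2=7}, same as one failure-free pass
  }
}
```

The argument relies on each write being idempotent. A non-idempotent write such as a counter increment would double-count on replay, which is the case a transactional output operator is meant to cover.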

The user documentation <http://apex.apache.org/docs/malhar-3.6/> has been
expanded to cover more operators. See https://s.apache.org/9b0t for other
enhancements and fixes in this release.

Apache Apex is an enterprise grade native YARN big data-in-motion platform
that unifies stream and batch processing. Apex was built for scalability
and low-latency processing, high availability and operability.

Apex provides features that similar platforms currently don’t offer, such
as fine-grained, incremental recovery to only reset the portion of a
topology that is affected by a failure, support for elastic scaling based
on the ability to acquire (and release) resources as needed as well as the
ability to alter topology and operator properties on running applications.

Apex has been developed since 2012 and became an ASF top-level project
earlier this year, following 8 months of incubation. Apex early on brought
the combination of high throughput, low latency and fault tolerance with
strong processing guarantees to the stream data processing space and gained
maturity through important production use cases at several organizations.
See the Powered By page and resources on the project web site for more
information:

http://apex.apache.org/powered-by-apex.html
http://apex.apache.org/docs.html

The Apex engine is supplemented by Malhar, the library of pre-built
operators, including adapters that integrate with many existing
technologies as sources and destinations, like message buses, databases,
files or social media feeds.

An easy way to get started with Apex is to pick one of the examples as a
starting point. They cover many common and recurring tasks, such as data
consumption from different sources, output to various sinks, partitioning
and fault tolerance:

https://github.com/DataTorrent/examples/tree/master/tutorials

Apex Malhar and Core (the engine) are separate repositories and releases.
We expect more frequent releases of Malhar to roll out new connectors and
other operators based on a stable engine API. Release 3.6.0 works with the
existing Apex Core 3.4.0, so users only need to upgrade the Maven
dependency in their project.

The source release can be found at:

http://apex.apache.org/downloads.html

We welcome your help and feedback. For more information on the project and
how to get involved, visit our website at:

http://apex.apache.org/

Regards,
The Apache Apex community

Re: [ANNOUNCE] Apache Apex Malhar 3.6.0 released

Posted by sebb <se...@gmail.com>.

What is the project about? Why should I be interested in it?
[rhetorical questions]

The Announce emails are sent to people not on the developer or user lists.
Most will have no idea what the project is about.

So the e-mails should contain at least brief details of what the
product does, and some info on why the new release might be of
interest to them.

Readers should not have to click the link to find out the basic information
(although of course it is useful to have such links for further detail).

In this case the information on what Malhar does is present, but it is
buried deep within the e-mail.

For future releases, please could the email be adjusted to make it
easy to find out what the project does?

For example put the following two sentences near the beginning:

Apache Apex is an enterprise grade native ...
The Apex engine is supplemented by Malhar, the library ...

Also, Cassandra should be "Apache Cassandra", at least for first mention.
And YARN should be "Apache Hadoop YARN", please.

Thanks.
