Posted to dev@hudi.apache.org by sharanf <sh...@apache.org> on 2022/03/11 13:01:16 UTC

Performance Engineering Track at ApacheCon NA?

Hi All

The call for tracks for ApacheCon NA is open. There is a suggestion to
try and run a Performance Engineering track at ApacheCon. At the end of
this message I have included some details, including a definition of
what we mean by it and some reasoning about why it could be good to run.
We have a list of projects that have something to do with performance
engineering, and if you take a look you will see that this project is
on the list!

So what I need is some feedback as to whether the community thinks that
this could be an interesting track topic to run at ApacheCon and, more
importantly, whether the community would be willing to submit talks for
it or attend ApacheCon to see it.

As I say, this is just an idea at this stage. If the Performance
Engineering track does get approval to be included at ApacheCon, do we
have any volunteers willing to help with managing and promoting the
track on behalf of the project?

Thanks
Sharan

-----------------------------

*Performance Engineering* is the science and practice of engineering
software with the required performance and scalability characteristics.
Many Apache projects focus on solving hard Big Data performance and
scalability challenges, while others provide tools for performance
engineering - but there are few projects that don’t care about some
aspect of software performance.

This track will enable Apache project members to share their
experiences of performance engineering best practices, tools,
techniques, and results, from their own communities, with the benefits
of cross-fertilization between projects. Performance Engineering in the
wider open source community is pervasive and includes methods and tools
(including automation and agile approaches) for performance:
architecting and design, benchmarking, monitoring, tracing, analysis,
prediction, modeling and simulation, testing and reporting, regression
testing, and source code analysis and instrumentation techniques.

Performance Engineering also has wider applicability to DevOps, the
operation of cloud platforms by managed service providers (hence some
overlap with SRE - Site Reliability Engineering), and customer
application performance and tuning.  This track would therefore be
applicable to the wider open source community.

*SUPPORTING DETAILS*

*Google Searches*
Google “Open source performance engineering” has 4,180,000,000 results
Google “site:apache.org performance” has 147,000 results

*Apache Projects* which may have some interest in, or focus on,
performance (just the top results):
JMeter, Cassandra, Storm, Spark, Samza, Pulsar, Kafka, Log4j, SystemML,
Drill, HTTP Server, Cayenne, ActiveMQ, Impala, Geode, Flink, Ignite,
Lucene, TVM, Tika, YuniKorn, Solr, Iceberg, Dubbo, Hudi, Accumulo,
Xerces, MXNet, ZooKeeper

*Incubator projects* which may have some interest in, or focus on,
performance (again just top results):
Crail, Eagle, Nemo, SkyWalking, MXNet, HAWQ, Mnemonic, CarbonData,
Drill, ShenYu, Tephra, Sedona

*References* (randomly selected to show the range of open-source
performance engineering topics available, rather than the quality of
articles):

  1. Performance Engineering for Apache Spark and Databricks Runtime
     ETHZ, Big Data HS19
     <https://archive-systems.ethz.ch/sites/default/files/courses/2019-fall/bigdata/Databricks%20ETHZ%20Big%20Data%20HS19.pdf>
  2. Real time insights into LinkedIn's performance using Apache Samza
     <https://engineering.linkedin.com/samza/real-time-insights-linkedins-performance-using-apache-samza>
  3. A day in the life of an open source performance engineering team
     <https://opensource.com/article/19/5/life-performance-engineer>
  4. Locating Performance Regression Root Causes in the Field Operations
     of Web-based Systems: An Experience Report. Published in: IEEE
     Transactions on Software Engineering (Early Access)
     <https://ieeexplore.ieee.org/document/9629300>
  5. How to Detect Performance Changes in Software History: Performance
     Analysis of Software System Versions
     <https://dl.acm.org/doi/10.1145/3185768.3186404>
  6. Performance-Regression Pitfalls Every Project Should Avoid
     <https://www.eetimes.eu/performance-regression-pitfalls-every-project-should-avoid/>
  7. How to benchmark your websites with the open source Apache Bench
     tool
     <https://www.techrepublic.com/article/how-to-benchmark-your-websites-with-the-open-source-apache-bench-tool/>
  8. Benchmarking Pulsar and Kafka - A More Accurate Perspective on
     Pulsar’s Performance
     <https://streamnative.io/blog/tech/2020-11-09-benchmark-pulsar-kafka-performance/>
  9. Performance-Analyse: Apache Cassandra 4.0.0 Release
     <https://benchant.com/blog/cassandra-4-performance>
 10. Log4J Performance - This page compares the performance of a number
     of logging frameworks
     <https://logging.apache.org/log4j/2.x/performance.html>
 11. SystemML Performance Testing
     <https://systemds.apache.org/docs/1.0.0/python-performance-test.html>