Posted to dev@spark.apache.org by Kip M Twitchell <ki...@us.ibm.com> on 2021/09/24 17:55:17 UTC

Apache Spark Architecture and GenevaERS Open Source Community

Spark Development Community:



I lead an open-source project called GenevaERS, which has been continuously
developed since the 1990s and shares many characteristics with Spark.  Our
project has been experimenting with Apache Spark for a year now to see if
there are complementary areas between the projects.  I wonder whether someone
from the Spark team would be interested in discussing that.



GenevaERS runs on z/OS, a mainframe operating system, and is an Active Project
of the Linux Foundation’s Open Mainframe Project.  It has Extract and Format
Phases, similar in some respects to Map-Reduce.  It is a parallel processing
engine that generates and executes highly efficient machine code, created to
resolve all processes (“queries” if you will) in one scan of the source data.
It is often used to scan billions of rows of data in a few minutes, at times
performing billions of joins or look-ups.



The project team thinks the Spark community may find architectural benefits in
learning about the GenevaERS extract engine.  Specifically, we think Spark
might benefit from the idea of automatically performing multiple functions in
one pass through a source, perhaps in the map phase.
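
As a rough illustration of the one-pass idea, here is a toy sketch in plain
Python.  It is not GenevaERS or Spark code: the record fields, the query
names, and the single_pass helper are all hypothetical, chosen only to show
several independent "queries" being resolved while each source record is read
exactly once.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"account": "A", "region": "east", "amount": 100},
    {"account": "B", "region": "west", "amount": 250},
    {"account": "A", "region": "west", "amount": 50},
    {"account": "C", "region": "east", "amount": 75},
]

# Each "query" is a function that updates its own state from one record.
def total_by_account(state, rec):
    state[rec["account"]] += rec["amount"]

def count_by_region(state, rec):
    state[rec["region"]] += 1

def large_amount_total(state, rec):
    if rec["amount"] >= 100:
        state["large"] += rec["amount"]

def single_pass(source, queries):
    """Read each record once and hand it to every registered query."""
    states = {name: defaultdict(int) for name in queries}
    reads = 0
    for rec in source:
        reads += 1  # count how many times the source is touched
        for name, update in queries.items():
            update(states[name], rec)
    return {name: dict(state) for name, state in states.items()}, reads

queries = {
    "total_by_account": total_by_account,
    "count_by_region": count_by_region,
    "large_amount_total": large_amount_total,
}

results, reads = single_pass(records, queries)
print(results)
print("records read:", reads)
```

Running the three queries separately would read the source three times; in
this sketch the number of reads stays equal to the number of records however
many queries are registered.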



We typically hold an Open R&D Hour on Fridays at noon ET at the Webex link
below, if that is convenient, but we would be willing to set up another
session if desired.  <https://ibm.webex.com/meet/kip.twitchell>  



Kip Twitchell

Technical Steering Committee Chair

GenevaERS Project  
IBM Global Business Services  
Kip.Twitchell@us.ibm.com  
630-248-0443 (cell)

  
  