Posted to commits@beam.apache.org by ec...@apache.org on 2019/01/04 10:38:22 UTC
[beam] branch spark-runner_structured-streaming updated (3533779 -> 6392179)
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a change to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git.
discard 3533779 Fix spotlessJava issues
discard 77cacde Add ReadSourceTranslatorStreaming
discard 91baa65 Cleaning
discard 36a72f7 Use raw Encoder<WindowedValue> also in regular ReadSourceTranslatorBatch
discard d9869c4 Split batch and streaming sources and translators
discard 8591d63 Run pipeline in batch mode or in streaming mode
discard 87bec8e Move DatasetSourceMock to proper batch mode
discard 017dcb9 clean deps
discard e510711 Use raw WindowedValue so that spark Encoders could work (temporary)
discard 98d9049 fix mock, wire mock in translators and create a main test.
discard 83f8487 Add source mocks
discard 44fd6c7 Experiment over using spark Catalog to pass in Beam Source through spark Table
discard e7ed784 Improve type enforcement in ReadSourceTranslator
discard 8bcfa5f Improve exception flow
discard e86247f start source instanciation
discard 8e08c58 Apply spotless
discard f54899b update TODO
discard 3e87c5e Implement read transform
discard b2d37bf Use Iterators.transform() to return Iterable
discard 7b00f7c Add primitive GroupByKeyTranslatorBatch implementation
discard 5ca19f2 Add Flatten transformation translator
discard 59acff8 Create Datasets manipulation methods
discard 26238ce Create PCollections manipulation methods
discard 9061cb0 Add basic pipeline execution. Refactor translatePipeline() to return the translationContext on which we can run startPipeline()
discard 463178b Added SparkRunnerRegistrar
discard 358170a Add precise TODO for multiple TransformTranslator per transform URN
discard c567d40 Post-pone batch qualifier in all classes names for readability
discard af6a350 Add TODOs
discard 74054fe Make codestyle and firebug happy
discard 92ed130 apply spotless for e-formatting
discard 41742be Move common translation context components to superclass
discard fc03a4a Move SparkTransformOverrides to correct package
discard 3251ab2 Improve javadocs
discard 8184de0 Make transform translation clearer: renaming, comments
discard 3ce792d Refactoring: -move batch/streaming common translation visitor and utility methods to PipelineTranslator -rename batch dedicated classes to Batch* to differentiate with their streaming counterparts -Introduce TranslationContext for common batch/streaming components
discard 53b2e71 Initialise BatchTranslationContext
discard 5cb1add Organise methods in PipelineTranslator
discard b6b426e Renames: better differenciate pipeline translator for transform translator
discard b77c7bb Wire node translators with pipeline translator
discard 91f283a Add nodes translators structure
discard 999761a Add global pipeline translation structure
discard ad32304 Start pipeline translation
discard 8b5c33c Add SparkPipelineOptions
discard 0a89b75 Fix missing dep
discard 03d333d Add an empty spark-structured-streaming runner project targeting spark 2.4.0
add cb4358c [BEAM-6098] Support lookup join symmetric in left/right inputs
add b484402 Merge pull request #7118: [BEAM-6098] Support lookup join symmetric in left/right inputs
add 7822091 [BEAM-6082] Fix enum for SQL query 5 and 7
add 9d05eed Merge pull request #7133: [BEAM-6082] Fix enum for SQL query 5 and 7
add 7963df4 Updates the script for cutting a release branch
add bedd4d9 Merge pull request #7108: Updates the script for cutting a release branch
add e662b5f [BEAM-6102] Legacy-worker gradle changes
add adf659e [BEAM-6114] Add isBounded() to BeamRelNode and BeamSqlTable, use for JOIN
add 048471b Merge pull request #7121: [BEAM-6114] Add isBounded() to BeamRelNode and BeamSqlTable, use for JOIN
add 148808a [BEAM-6102] Move the worker jar before the pipeline jar when using the --dataflowWorkerJar option (#7143)
add 02c763b [BEAM-5197] Fix UnboundedSourceWrapper#testWatermarkEmission (#7138)
add f84feac [BEAM-6058] Adding option for flink configuration directory and setting config in exectution environment
add c2aaf2d Merge pull request #7031: [BEAM-6058] Adding option for Flink configuration directory and setting config in execution environment
add d922f70 [BEAM-6102] Fix several packages that were being bundled without relocation within the Dataflow worker. (#7145)
add 4082ce5 [BEAM-2687] Correctly handle read-before-write semantics for user state. (#7102)
add 01ef416 Add argument parsing and filtering to coders microbenchmark.
add b76f38b Missing type declaration in default coder.
add 45c114d Optimize sequence coders.
add 18fde2e Restore unicode typing after Py3 changes.
add 0a33d9b Optimize LengthPrefixCoder.
add eb4a65a Optimize WindowedValue encoding.
add 1cc6efc Optimize IntervalWindow encoding.
add 06d4a56 Merge pull request #7130 from Optimize Python coders.
add 400c174 BEAM-6134: MongoDbIO add support for projection
add 62b63fc Merge pull request #7148 from chaimt/BEAM-6134
add 68be16a [BEAM-6100] Collect metrics properly in Load tests (#7087)
add 3fc46b5 [BEAM-5197] Fix possible overflow of timestamp for processing timer
add 0261783 Add an option to create Dataflow piplines from a snapshot
add 55fae0d Fix merge conflicts
add b9e6fa1 Merge pull request #7092 from dpmills/snapshots
add b1c0993 Update Java Container beam-master-20181128
add 05a7801 Undo accidental commit b1c09939a54527ca47d02daf742001012a2be149
add 8aff81d Update Dataflow worker container to beam-master-20181128
add df23d47 Merge pull request #7154 from charlesccychen/update-container
add de3bd4af Future-proofing for TensorFlow2 (#7155)
add 004cc54 Revert "Merge pull request #7051 from markflyhigh/py-move-off-apitools-1"
add d027d96 Merge pull request #7156: [BEAM-6145] Revert "Merge pull request #7051 from markflyhigh/py-move-off-apitools-1"
add 0b7dc3d [BEAM-6148] ptest support arbitrary runners
add 64a6a3c Merge pull request #7158 from lostluck/ptest
add a440344 [BEAM-4726] Add heap profiling hook
add d9ae90f Merge pull request #7159 from lostluck/heap
add 0a45041 [BEAM-3612] Closurize method invocations
add 1024472 Merge pull request #7161 from lostluck/wrap
add 6cd0d9b [BEAM-3661] Port TriggerExampleTest off DoFnTester
add 2b5a604 Merge pull request #7125: [BEAM-3661] Port TriggerExampleTest off DoFnTester
add b629727 [SQL] Add support for TableMacro UDF
add 2f0f3f2 Merge pull request #7141 from kanterov/kanterov_table_functions
add b8d2423 Fix Go lints on Initialized
add 96fe92d Merge pull request #7167 from lostluck/fixtypos
add 4a6527d Revert "Optimize several Python coder implementations."
add fc373df Merge pull request #7166 from apache/revert-7130-fast-coders
add 5b9641d [BEAM-5978] Use dynamic port when starting auto starting jobserver
add 38e6be9 [BEAM-5978] Adding libltdl7 to flink job server docker
add 65136cf [BEAM-5978] Correctly pick the docker executable
add 63a4c18 [BEAM-5978] Increase portable wordcount threads to avoid dead lock
add 49c8386 [BEAM-6146] Add precommit for portable python
add 5506335 [BEAM-6146] Portable Python Precommit test
add b06b8e5 Merge pull request #6954: [BEAM-6146] Add portable WordCount to Python PreCommit
add 4ca0e89 [BEAM-3659] Port UserScoreTest off DoFnTester
add bd81e8b Merge pull request #7126: [BEAM-3659] Port UserScoreTest off DoFnTester
add d136637 [BEAM-2939] SplittableDoFn Java SDK API Changes (#6969)
add e55c514 [BEAM-6111] Fix flaky PortableTimersExecutionTest (#7171)
add ac10c91 Add packageDeclaration Checkstyle check + fix an issue it threw up (#7172)
add ad0cb4f Simplifying a few junit assertions (#7139)
add d116dd3 [BEAM-6162] Fix PipelineOptions argparse behavior
add 1768511 Merge pull request #7176 from charlesccychen/fix-argparse
add a9dff65 Merge pull request #7175: [BEAM-5884] Move the nullable attribute onto FieldType.
add 6119242 [BEAM-6163] Build python boot for mac and support process env on mac
add 6517025 Merge pull request #7178: [BEAM-6163] Build python boot for mac and support process env on mac
add 60cc4c1 Merge pull request #7147 : [BEAM-4453] Use constructors to generate schema POJOs
add 42f39e9 [BEAM-6143] Upgrade to Mongo Java Driver 3.9.1 and update the API
add b469880 Merge pull request #7151: [BEAM-6143] Upgrade to Mongo Java Driver 3.9.1 and update the API
add f2d1581 [BEAM-5396] Savepoint restore option in Flink runner.
add 236d0dd Merge pull request #7169: [BEAM-5396] Savepoint restore option in Flink runner
add 3432c04 [BEAM-6032] Move PortableValidatesRunner configuration out of BeamModulePlugin (#7173)
add ff93af2 [BEAM-5925] Add .withSocketAndRetryTimeout() and .withConnectTimeout() to ElasticseachIO.ConnectionConfiguration
add a566874 [BEAM-5925] Set a socket and retry timeout of 1.5min and a connect timeout of 5s for all connections in all elasticsearch unit tests"
add d98c9f4 Merge pull request #7065 from wscheep/elastic_maxretrytimeout
add aeeb085 [BEAM-5984] Enable publishing load test results to BigQuery
add 9d3daf2 [BEAM-5984] Use more high level method for creating BQ rows in Nexmark too
add ab3519e [BEAM-5984] Provide generic BigQueryResultPublisher class
add 90ef4eb Merge pull request #7090: [BEAM-5984] Enable publishing load test results to Big Query
add 4c97d0c [BEAM-6146] Run pre-commit wordcount in batch and streaming mode.
add b917a86 Merge pull request #7180: [BEAM-6146] Run pre-commit wordcount in batch and streaming mode
add f040a4e [BEAM-5859] Improve operator names for portable pipelines
add e2e31c9 Merge pull request #7150: [BEAM-5859] Improve operator names for portable pipelines
add aa4d9bc Add more AVRO utilities to convert between Beam and Avro. Add schema-conversion utilities as well as a conversion from a Beam Row into GenericRecord.
add 6b7cf42 Merge pull request #7181 : [BEAM-4454] Add more AVRO utilities to convert between Beam and Avro.
add 78c1a10 [BEAM-6160] Use service server rather than service (#7168)
add 270ef6a [BEAM-2939] Add support for backlog reporting to byte key and offset restriction trackers. (#7177)
add 3494a8f [BEAM-5058] Run basic ITs in Python Precommit in parallel
add 3a348e8 Merge pull request #7163 from markflyhigh/py-precommit-it
add 34420cc [BEAM-6174] Kryo dependency removed.
add 54214f8 Merge pull request #7194: [BEAM-6174] Kryo dependency removed.
add 2aa5d07 [BEAM-5778] Add integrations of Metrics API to Big Query for SyntheticSources load tests in Python SDK
add ee515bf Merge pull request #6943: [BEAM-5778] Add integrations of Metrics API to Big Query for SyntheticcSources load tests in Python SDK
add 08dafbe [BEAM-1628] Allow empty port for flink master url
add bec7dac Merge pull request #7187: [BEAM-1628] Allow empty port for flink master url
add 9c018ac [BEAM-6122] Update committer guidelines
add 1c67861 Address review comments.
add 702b9de Move squash paragraph under merging.
add e5d9cf4 Merge pull request #7129: [BEAM-6122] Update committer guidelines
add 9593adb [BEAM-6077] If available, use max_parallelism for splitting unbounded source
add 5565b0a [BEAM-6077] Tests for read source translator
add a3a8a32 Merge pull request #7128: [BEAM-6077] If available, use max_parallelism for splitting unbounded source
add 74ed7ac [BEAM-5462] get rid of <pipeline>.options deprecation warnings in tests
add 95d0ac5 Merge pull request #6930: [BEAM-5462] get rid of <pipeline>.options deprecation warnings in tests
add a3d2611 [BEAM-2400] Use getJobId() consistently
add a5b36c5 Merge pull request #7199: [BEAM-2400] Use getJobId() consistently
add 2f9330c [BEAM-6180] Remove duplicated IdGenerator from runner harness and use IdGenerator from fnexecution instead. (#7201)
add 681b5cd Merge pull request #7204: [BEAM-5884] Fix FieldType comparison in BeamSQL
add 33453c2 Revert "Revert "Optimize several Python coder implementations.""
add 50d8392 [BEAM-6153] Stricter interval window comparison.
add e7ab8c4 Merge pull request #7170 from [BEAM-6153] Re-enable coder optimization.
add 43fe997 [BEAM-5817] Add SQL bounded side input join to queries that are actually run
add a3510e0 Merge pull request #7205: [BEAM-5817] Add SQL bounded side input join to queries that are actually run
add d897c5c BEAM-6151: MongoDbIO add support mongodb server with self signed ssl
add 9fbe80e Merge branch 'master' into BEAM-6151
add 385f2a1 Merge pull request #7162 from chaimt/BEAM-6151
add 9448dba [BEAM-6182] Disable conscrypt by default (#7203)
add 8f15b88 [BEAM-3912] Add HadoopOutputFormatIO support
add 406f8d7 [BEAM-3912] Remove useless dep
add 86f723e [BEAM-3912] Add HadoopOutputFormatIO support
add 9863c79 [BEAM-3912] Remove useless dep
add fa9cc4a Fix typo in test name
add 757b71e [BEAM-3912] Implement HadoopFormatIO.Write
add 20e3e24 [BEAM-5309] Add streaming support for HadoopFormatIO
add 4adc254 [BEAM-5309] Add streaming support for HadoopFormatIO
add aec6d82 Merge pull request #6691: [BEAM-5309] Add streaming support for HadoopFormatIO
add af05ee2 Add portable-runner dependency to wordcount example as one of the defaults.
add 736077c Merge pull request #7213 from [BEAM-6184] Add portable-runner dependency to example pom.xml
add 60da04a [BEAM-5859] Better handle fused composite stage names.
add 5ae80df Merge pull request #7208: [BEAM-5859] Better handle fused composite stage names.
add 0edc85e [BEAM-6067] In Python SDK, specify pipeline_proto_coder_id property in non-Beam-standard CloudObject coders (#7081)
add ecc2d84 Fixup User_COUNTER_URN_PREFIX to contain the trailing: (#7188)
add 75d45e2 [BEAM-6159] Make Dataflow worker use ExecutableStage to process bundle (#7015)
add a466104 Update PortableTimersExecutionTest to use PAssert, to prevent a concurrency issue collecting the test results.
add 79df784 Remove extra timeout code from PortableTimersExecutionTest.
add ce15b25 Merge pull request #7214: Update PortableTimersExecutionTest to use PAssert
add 5e94da3 [BEAM-5920] Add additional owners for Community Metrics
add 5850c00 Merge pull request #7186: [BEAM-5920] Add additional owners for Community Metrics
add bc859cc [BEAM-6181] Reporting user counters via MonitoringInfos in Portable Dataflow Runner. (#7202)
add 9159d9b [BEAM-5321] Port transforms package to Python 3 (#7104)
add 4cd1226 Add QueueingBeamFnDataClient and make process, finish and start run on the same thread to support metrics. (#6786)
add 1f6dd22 [BEAM-6155] Updates the GCS library the Go SDK uses.
add ca6ee63 Merge pull request #7182 from bramp/BEAM-6155
add 4d63cd3 [BEAM-5167] Log unscheduled process bundle requrests
add e6add6a Merge pull request #7192 from [BEAM-5167] Log unscheduled process bundle requests
add fceeaef Removing some unnecessary parentheses
add 64c62b1 Merge pull request #7185 Removing some unnecessary parentheses
add ee0801f Enabling the ArrayTypeStyle checkstyle module
add ec5602f Merge pull request #7062 Enabling the ArrayTypeStyle checkstyle module
add 61e8106 Fix translate_pattern test on Python 3.7
add 40977f4 Merge pull request #6739 [BEAM-5787] Fix test_translate_pattern on Python 3.7
add c314dfe Add a MonitoringInfoSpec proto and SimpleMonitoringInfoBuilder to provide specs and validate MonitoringInfos are properly populated.
add a21b196 [BEAM-6194] Follow up with cleanup for https://github.com/apache/beam/pull/7015 (#7219)
add c3102a9 Fix precommits due to concurrent submissions.
add aa09bbb Merge pull request #7230 Fix precommits due to concurrent submissions.
add ab59d6d [BEAM-6139] Adding support for BQ GEOGRAPHY data type (#7144)
add ac3a9df Redirect from nexmark page's older location
add a226343 Merge pull request #7225 from udim/patch-2
add 17968a2 [BEAM-6195] Make ProcessRemoteBundleOperation map PCollectionId into correct OutputReceiver and throws Exception when there is more than one input PCollection. (#7223)
add 378d907 [BEAM-6167] Add class ReadFromTextWithFilename (Python) (#7193)
add 36e3f98 [BEAM-4150] Use unwindowed coder in FnApiRunner optimization phases.
add 6e1a8cd [BEAM-6186] Optimization cleanup: move phase utilities out of local scope.
add a076444 Merge pull request #7227 from [BEAM-6186] Optimization cleanup
add e147bf9 [BEAM-6120] Support retrieval of large gbk iterables over the state API.
add c2a9cac Merge pull request #7127 from [BEAM-6120] Large gbk iterables
add c3e636d Add instructions to post-commit policy web page, according to discussions in dev mailing list.
add f976430 Update website/src/contribute/postcommits-policies-details.md
add 6e3cf84 Merge pull request #7095 from HuangLED/update_postcommit_doc
add 2b95624 Move string literals to the left hand side of the expression in a few places
add e7b2f30 Merge pull request #6887 from coheigea/string_literals
add 2905227 Clarify usage of PipelineOptions subclass
add e963882 Merge pull request #6872 Clarify snippet for PipelineOptions subclass
add d4fd5a1 [BEAM-5866] Override structuralValue in ListCoder
add 8b4f60e [BEAM-5866] Override structuralValue in MapCoder
add 7a394cf Merge pull request #6862 from [BEAM-5866] structuralValue in List/MapCoder
add 8783994 [BEAM-4444] Parquet IO for Python SDK (#6763)
add a07da9e Upgrade to Apache Tika 19.1
add 1c6f145 Merge pull request #6719 Upgrade to Apache Tika 1.19.1
add 8100f32 [BEAM-6079] Add ability for CassandraIO to delete data
add 07d9311 [BEAM-6079] Fix access level and clean up generics issues
add b0aae58 Merge pull request #7064: [BEAM-6079] Add ability for CassandraIO to delete data
add 010357d [BEAM-3657] Port JoinExamplesTest off DoFnTester
add 29a7917 Merge pull request #7179: [BEAM-3657] Port JoinExamplesTest off DoFnTester
add f8ef83b [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
add b592f94 Merge pull request #7189: [BEAM-5514] BigQueryIO doesn't handle quotaExceeded errors properly
add dc0946f [BEAM-6201] Move fromJsonString() method to SyntheticOptions class
add cd8f014 [BEAM-6201] Add SyntheticDataPubSubPublisher pipeline
add b0069eb Merge pull request #7238: [BEAM-6201] Data insertion pipeline
add eb9065b [BEAM-6176] Support IPv6 addresses for Flink master url
add 4d65d72 Merge pull request #7196: [BEAM-6176] Support IPv6 addresses for Flink master url
add f57def8 [BEAM-6172] Adjust Flink metric names / Add metric reporting tests
add d80957c Merge pull request #7207: [BEAM-6172] Adjust Flink metric names / Add metric reporting tests
add 57c248b Remove trailing whitespace.
add 0d2ec39 Quiet c-extension-no-member lint warning.
add edc37f9 Merge pull request #7249: Fix various lint errors in Python presubmit.
add a9709ff [BEAM-6186] Move combiner lifting out of line.
add 3398a88 [BEAM-4678] Support combiner lifting in portable Flink runner.
add d4237ec Merge pull request #7228 from [BEAM-4678] Combiner lifting in portable Flink.
add bfd1be9 [BEAM-6178] Adding beam-sdks-java-bom, adding exportJavadoc flag for applyJavaNature (#7197)
add 17a881b [BEAM-6181] Unexpected metrics should be non-fatal.
add 4160b67 Merge pull request #7250 from [BEAM-6181] Unexpected metrics non-fatal.
add cb06639 [BEAM-6033] normalize httplib2.Http initialization and usage
add 3a182d6 Merge pull request #7032: [BEAM-6033] normalize httplib2.Http initialization and usage
add fddb684 [BEAM-5058] remove unused envdir parameter
add e8cbbf6 Merge pull request #7246: [BEAM-5058] remove unused envdir parameter
add ebe81b9 [BEAM-2943] Fix typo in filename on FlinkRunner page
add 8c8b7c3 Merge pull request #7254: [BEAM-2943] Fix typo in filename on FlinkRunner page
add 73a4325 Use environment to control worker startup in FnApiRunner.
add efd261a [BEAM-6094] Implement external environment for Portable Beam.
add ec55000 Merge pull request #7078 from [BEAM-6094] External portable BeamPython.
add 9853bd7 [BEAM-6213] Fix matching of glob patterns in windows local filesystem
add 289d2b2 Merge pull request #7258: [BEAM-6213] Fix matching of glob patterns in windows local filesystem
add c00dfa5 Fix combiner lifting for external job service.
add 16b34cf Merge pull request #7260 Fix combiner lifting for external job service.
add bc11c06 [BEAM-6216] Update flinkMaster URL in the nexmark wep-page to reflect change in FlinkExecutionEnvironments.
add ffcd9f4 [BEAM-6216] Update flinkMaster URL in nexmark postCommit script to reflect change in FlinkExecutionEnvironments.
add a53f56a Merge pull request #7261 from echauchot/BEAM-6216-flink-local
add 9e8ac83 [BEAM-6240] Clean up duplicated SocketAddressFactory class.
add 798b3b3 Merge pull request #7241: [BEAM-6204] Clean up duplicated SocketAddressFactory class.
add 5e506bf [BEAM-6205] Setup gradle task ro run fnapi worker test with use_executable_stage_bundle_execution
add 5574f47 Merge pull request #7243: [BEAM-6205] Setup gradle task ro run fnapi worker test with use_execu…
add 8828b16 Add a MonitoringInfoLabelProps proto to attach the proper key string name to MonitoringInfo label Enums
add a61f2c5 Merge pull request #7242: Add a MonitoringInfoLabelProps proto to attach the proper key string name to MonitoringInfo label Enums
add 11513c3 Merge remote-tracking branch 'upstream/master' into pr7244
add dc10f75 Merge pull request #7244: [BEAM-6138] Add a MonitoringInfoSpec proto and SimpleMonitoringInfoBuilder to pro…
add 41719ac Fixing publishing problem introduced in #7197
add 20df151 Merge pull request #7265: Fixing publishing problem introduced in #7197
add 07bae67 [BEAM-5419] Add Flink multi-version build layout
add 6308a6d [BEAM-5267] Make source Flink v1.6 compatible
add d0254f0 [BEAM-5267] Add Flink v1.6 target for Flink Runner
add 3400ba6 Merge pull request #7229: [BEAM-5419] Add Flink multi-version builds
add 2a40c57 Add remaining Schema support for AVRO records: * Add support for SpecificRecord using ByteBuddy codegen. * Add helper methods for GenericRecord. * Fix uncovered bugs involving nullable support.
add 0a74b17 Add period to sentence.
add 8b20602 Merge pull request #7233: [BEAM-4454] Add remaining functionality for AVRO schemas
add 3fafec1 [BEAM-5320] [BEAM-6106] Finish Python 3 porting for testing module (#7262)
add bd0103c [BEAM-5978] Changing parallalim for wordcount to 1
add 1518361 Merge pull request #7174: [BEAM-5978] Changing parallelism for wordcount to 1
add a89d296 Fix broken beam-sdks-python:test
add 4eb7744 Merge pull request #7273: Fix broken beam-sdks-python:test
add 0b3b9e0 [BEAM-6138] Add User Counter Metric Support to Java SDK (#6799)
add 1775c19 Stop subclassing user pojos.
add 6280255 spotless.
add f8f9ca5 Merge pull request #7234: [BEAM-4453] Stop subclassing user POJOs.
add e2db4d5 Updates Beam Website to use 2.9.0 as the latest release
add 0f14b40 Merge pull request #7215: Updates Beam Website to use 2.9.0 as the latest release
add daed1e6 Blog for Apache Beam 2.9.0 release
add 5df8cb2 Merge pull request #7275: Blog for Apache Beam 2.9.0 release
add 2cac7ba Updates blog for 2.9.0 release
add e21b80a Merge pull request #7278: Updates blog for 2.9.0 release
add eae5521 Add reshuffle option to Create.
add a34e459 Merge pull request #7274 Add reshuffle option to Create.
add 0d60a89 [BEAM-6229] Fix LoadTestResult to store propoer timestamp and runtime
add 88f181c Merge pull request #7283: [BEAM-6229] Fix LoadTestResult to store propoer timestamp and runtime
add fc38359 [BEAM-6227] Fix GroupByKey with null values in Flink Runner
add 2886473 Merge pull request #7282: [BEAM-6227] Fix GroupByKey with null values in Flink Runner
add 788ce61 Merge pull request #7267: [BEAM-4454] Support Avro POJO objects
add 54e2fc1 [BEAM-6206] Add CustomHttpErrors a tool to allow adding custom errors for specific failing http calls. Plus, add a custom error message in BigQueryServicesImpl. (#7270)
add 69358c5 [BEAM-6191] Remove redundant error logging for Dataflow exception handling
add 977080f Merge pull request #7220: [BEAM-6191] Remove redundant error logging for Dataflow exception handling
add f6c1dd5 [BEAM-6179] Fixing bundle estimation when all xs are same
add 71890da Merge pull request #7280 from angoenka/fix_bundle_estimation
add 5ec695b [BEAM-6150] Superinterface for SerializableFunction allowing declared exceptions (#7160)
add 4f90294 Remove Gradle from Jenkins job names
add ec3f792 Merge pull request #7286 from swegner/jenkins_gradle
add b7035c1 [BEAM-6225] Setup Jenkins Job to Run VR with ExecutableStage (#7271)
add 53a5ce7 [BEAM-4594] Support state continuation tokens over the Fn API.
add 39721d2 Merge pull request #7252 from [BEAM-4594] State continuation tokens.
add f9bc485 Mention portable Flink runner support for state and timers in 2.9.0 release blog
add bd5bbf9 Merge pull request #7294: Mention portable Flink runner support for state and timers in 2.9.0 release blog
add c26d532 More robust FnApi Runner.
add ca3eb14 Merge pull request #7251 from More robust FnApi Runner.
add e4f6517 [BEAM-6227] Do not compare recovered state against structural null value
add e108cca Merge pull request #7291: [BEAM-6227] Do not compare recovered state against structural null value
add 8a333e7 [BEAM-6235] Upgrade AutoValue to version 1.6.3
add 4588c25 Merge pull request #7285: [BEAM-6235] Upgrade AutoValue to version 1.6.3
add 89ad88d [BEAM-2873] Setting number of shards for writes with runner determined sharding
add a68f209 Merge pull request #4760: [BEAM-2873] Setting number of shards for writes with runner determined sharding
add a6d4345 [BEAM-6200] Deprecate old HadoopInputFormatIO, move it into new HadoopFormatIO
add 67d0f78 Merge pull request #7263: [BEAM-6200] Deprecate old HadoopInputFormatIO, move it into new HadoopFormatIO
add 2a5cc73 [BEAM-5419] Simplify job-server-container targets
add 1061dc9 Merge pull request #7299: [BEAM-5419] Simplify job-server-container targets
add da7fec2 [BEAM-6190] Add processing stuck message to Pantheon.
add b070e89 Fixed some style errors and tests that needed an additional parameter.
add fa522c5 Merge pull request #7240 from dustin12/lullError
add 2555945 Revert "[BEAM-5978] Changing parallelism for wordcount to 1"
add dc62028 [BEAM-6067] Update BeamPython side-input support in the Dataflow runner for the unified worker. (#7269)
add bc1609d [BEAM-6170] Change Nexmark stuckness warnings to not cause pipeline failure.
add c29092e Merge pull request #7191 from scwhittle/remove_stuck_error
add 0cdfca4 Avoid creating a variable and immediately returning it.
add af5d7bf Merge pull request #7007 from coheigea/return
add eab1d0e [BEAM-6252] SamzaRunner: Add a registrar to allow customized DoFnInvoker
add 6a6fbe4 Allow Samza DoFnInvoker to generate configs and pass in TaskContext
add 319a370 Merge pull request #7301: [BEAM-6252] SamzaRunner: Add a registrar to allow customized DoFnInvoker
add 473e5a6 [BEAM-6253] SamzaRunner: Add a few customized transforms for runner use cases
add 6ba3a45 Merge pull request #7302: [BEAM-6253] SamzaRunner: Add a few customized transforms for runner use cases
add 2563e92 Update Dataflow Python container to 20181218
add 335fdb2 Merge pull request #7306 from charlesccychen/update-container
add a7315c5 [BEAM-6197] Log time for Dataflow GCS upload of staged files
add 095ce13 Merge pull request #7235 from alanmyrvold/uploadGCS
add df4b623 [BEAM-5167] Ensure monitoring thread does not prevent process exit.
add d3fbf80 Merge pull request #7248 from robertwb/monitor-thread
add 4573c56 Fix time format on Windows.
add e4577ed Merge pull request #7257 Fix time format on Windows.
add 0ef5e66 Update release guide with new Jenkins job name.
add aa0fdc5 Merge pull request #7311: Update release guide with new Jenkins job name.
add 384de82 WEBSITE: update community nav, add in-person page
add 5845e2a Merge pull request #7314: WEBSITE: update community nav, add in-person page
add b5223f4 [BEAM-6263] Prevent NullPointer on concurrent JobServer shutdown
add a50ae05 [BEAM-6263] Prevent port collisions in FlinkJobServerDriveTest
add 5d3b420 [BEAM-6263] Prevent stderr->stdout redirection
add 19f7dfa [BEAM-6263] Restore stderr on exceptions to print the error
add 302b883 Merge pull request #7309: [BEAM-6263] Fix error-prone test setup for FlinkJobServerDriver
add 6ff3129 [BEAM-6186] Finish moving optimization phases.
add dba30b4 Merge pull request #7281 from [BEAM-6186] Finish moving optimization phases.
add 6de7cb7 [BEAM-6094] Add loopback environment for java.
add a49c835 More complete exception message.
add 66ff825 Merge pull request #7307 [BEAM-6094] Add loopback environment for java.
add 6e2ca58 [BEAM-6245] Set translation mode directly on PipelineOptions
add f67ac59 [BEAM-6245] Add integration test for FlinkTransformOverrides
add 85bd95e Merge pull request #7296: [BEAM-6245] Add integration test for FlinkTransformOverrides
add 8214cb6 [BEAM-5449] Tagging failing ULR ValidatesRunner tests.
add bd68ef6 Merge pull request #7295: [BEAM-5449] Tagging failing ULR ValidatesRunner tests.
add 0c589db [BEAM-5993] Create SideInput Load test (#7020)
add 61d7e53 Fix documentation for PCollection Type in XMLIO example
add f036699 Merge pull request #7320: [BEAM-6270] Fix documentation for PCollection Type in XmlIO example
add c0655d1 Rename v1_13_1 to v1p13p1.
add 85ea7ba Depend on local vendored jar instead of the one from Maven
add 2afc42b Disable 'jar' task for vendoring projects.
add a5a139d Merge pull request #7324: [BEAM-6056] Rename vendored guava relocation to v1p13p1
add 33c85e1 [BEAM-6268] Ignore failed HadoopFormatIOCassandraTest (#7325)
add 8850ad7 Python 3 port io.filesystem module
add 04a647d Merge pull request #7318 from RobbeSneyders/filesystem
add 7217b47 [BEAM-6273] Update dependencies pages with 2.9.0
add 230c7c9 Merge pull request #7327: [BEAM-6273] Update dependencies pages with 2.9.0
add 9a51eb3 [BEAM-6179] Fixing itterable comparison
add d9e318c Fix whitespace.
add bed5747 Merge pull request #7313 from angoenka/fix_bundle_estimation
add 7248602 [BEAM-6262] KinesisIO - gracefully shutdown executor service
add e9d51ec Merge pull request #7315: [BEAM-6262] KinesisIO - gracefully shutdown executor service
add 5d305af Fix performance regression.
add c039440 Merge pull request #7331: [BEAM-6276] Fix performance regression.
add 6c39b61 Adds a link to release notes for Beam 2.9.0.
add 7c8f7ba Merge pull request #7329: Adds a link to release notes for Beam 2.9.0
add 1e41220 [BEAM-6165] Send metrics to Flink in portable Flink runner (#7183)
add e07374a [BEAM-5334] Remove unused 'language' argument
add 4290921 Merge pull request #7264 from udim/perf-tests
add 9d4302d [BEAM-5539] Beam Dependency Update Request: google-cloud-pubsub
add 0644c55 Merge pull request #7268 from ihji/upgrade_pubsub
add b44ecb4 [BEAM-6286] Add SamzaRunner profile to mvn archetype
add fb4b151 Merge pull request #7335: [BEAM-6286] Add SamzaRunner profile to mvn archetype
add 8d611b4 [BEAM-6212] Add MongoDbIO ordered option
add 094586d Merge pull request #7256: [BEAM-6212] Add MongoDbIO ordered option
add 7302aef Update Slack invitation on #general vs #beam channel
add 845de99 [BEAM-6283] Convert PortableStateExecutionTest and PortableExecutionTest to using PAssert
add f1e339b [BEAM-6295] Fix versions in 2.8.0 Java dependencies table
add 941cb27 Merge pull request #7341 from melap/dependencies
add e6e85ed Reimplement GCS copies with rewrites.
add b196397 Merge pull request #7050: [BEAM-5959] Reimplement GCS copies with rewrites.
add 24aa20f Add toplevel :sqlPostCommit gradle command
add 0f3560a Add Jenkins job to run :sqlPostCommit
add 0b88bca Merge pull request #7338: [BEAM-6288] Add SQL postcommit
add 80b0c6a [BEAM-6295] Fix versions in 2.7.0 Java dependencies table
add 76b180b Merge pull request #7342 from melap/dependencies
add 1c2d631 Update JUnit
add ad43619 Merge pull request #7344: [BEAM-6299] Update JUnit to fix bug with parameterized tests
add 2f5ba05 Put generated getter/setter/creator classes in the same package as the class they are modified.
add a02d884 Merge pull request #7345: [BEAM-6300] Put generated getter/setter/creator classes in the same package as the class they access
add 8bdbb33 Add schema support to AvroIO and PubsubIO. For backwards-compatibility reasons, Beam schema support must be explicitly enabled in these sources.
add 4f23004 Remove unneeded @Rule.
add 2681c25 Merge pull request #7290: [BEAM-4454] Support avro schema inference in sources
add a90dabf [website] Point Slack link to #beam channel instead of #general
add 9460fee Merge pull request #7346: [website] Point Slack link to #beam channel instead of #general
add 77791da [BEAM-6239] Add session side input join to Nexmark
add ac8c956 Merge pull request #7287: [BEAM-6239] Add session side input join to Nexmark
add 0ad4a5d [BEAM-6244] Restore updateProducerProperties
add 9b0d8fb [BEAM-6244] Restore validate
add e636294 Merge pull request #7343: [BEAM-6244] KafkaIO: keep KafkaIO.Write compatibility with 2.9.0
add 45a61e4 BEAM-6306 Upgrade Jackson to version 2.9.8
add ba01b8e Merge pull request #7352: [BEAM-6306] Upgrade Jackson to version 2.9.8
add 4ca3cf0 Upgrade to Calcite 1.18
add 718aef7 Merge pull request #7209 from apilloud/upgrade
add 8588d52 Add time usage in seconds for staging files.
add 41eeb39 Merge pull request #7336: Add time usage in seconds for staging files.
add 6439fb1 Python 3 port io.filesystemio module
add d5638e7 Add apache_beam.io.localfilesystem_test to python 3 test suite
add 7710391 Merge pull request #7326: [BEAM-5315] [BEAM-5627] Python 3 port io.filesystemio module
add be7549c [BEAM-6287] pyarrow is not supported on Windows Python 2
add dc01009 Merge pull request #7337: [BEAM-6287] pyarrow is not supported on Windows Python 2
add bac909b Treat VarInt encoding as a Beam primitive encoding in Dataflow runner (#7351)
add f720985 [BEAM-6110] For SQL CoGBK-based join use EARLIEST output timestamp
add f190152 Merge pull request #7115 from kennknowles/sql-join-cogbk-timestamps
add ffec485 Flink 1.5.6 upgrade (#7322)
add 671ed3f Update data source for syncing jobs from Jenkins.
add 8f50ac2 Update deployment versions.
add 498b186 Merge pull request #7364 from Ardagan/FixBMetrics
add e413099 [BEAM-4726] Add arity specialization for calling and returns.
add a47b697 Merge pull request #7355 from lostluck/arity
add 643e562 Enforce the checkstyle IllegalThrows rule for throwing Error + RuntimeException
add 680d911 Merge pull request #7259 from coheigea/illegal_throws
add 100c561 [BEAM-5918] Fix CastTest
add d7c64e7 Merge pull request #7372: [BEAM-5918] Fix CastTest
add 25865d3 [BEAM-5467] Increase test timeout for portable ValidatesRunner tests
add d1384b9 Merge pull request #7376: [BEAM-5467] Increase test timeout for portable ValidatesRunner tests
add 4e8a07b [BEAM-6294] Use Flink rebalance for shuffle.
add 9f2eb34 Merge pull request #7360 [BEAM-6294] Use Flink rebalance for shuffle.
add 2481ee6 Disable BigQueryIO validation since datasets and tables are created during runtime.
add 3a0f70e Merge pull request #7368 from boyuanzz/fix_bq
add eab6759 [BEAM-4725] Use unsafe to avoid small allocations to the heap.
add 8f38b46 Merge pull request #7357 from lostluck/smallbuf
add 54d3857 [BEAM-6325] Cast cross compile output from []byte to string for printing
add 7c9babd Merge pull request #7375 from lostluck/cast
add fb7ae4f [BEAM-5112] Generate code for BeamCalcRel DoFn
add 1ad4aff Remove Beam Interpreter
add e398bee Merge pull request #6417 from apilloud/codegen
add d3a38f5 [BEAM-6316] Fix container image name for PreCommit PortableWordCount
add 9ad1074 Merge pull request #7377: [BEAM-6316] Fix container image name for PreCommit PortableWordCount
add 0d50a17 Fix go runtime break
add ca4defe Merge pull request #7379 from lostluck/fixbreak
add ec6384a [BEAM-6329] Address synchronization issue for portable timers (#7359)
add 1a4db4b [BEAM-5386] Prevent CheckpointMarks from not getting acknowledged
add ea275e4 [BEAM-5386] Assert that source thread enters sleep instead of terminating
add f56c86f Merge pull request #7349: [BEAM-5386] Prevent CheckpointMarks from not getting acknowledged
add 926361b [BEAM-5386] Move assertion out of finally block to not swallow original exception
add 3b8abca Upgrade vendored gRPC artifact version to 0.2
add 15aa88d Merge pull request #7328: [BEAM-6056] Upgrade vendored gRPC artifact version to 0.2
add 14781c7 [BEAM-6056] Source vendored grpc dependency from Maven central
add a25b64d Merge pull request #7388: [BEAM-6056] Source vendored grpc dependency from Maven central
add 095870f Python 3 port io.range_trackers
add 359ddb9 Add io.restriction_trackers_test to Python 3 test suite
add bca5c60 Merge pull request #7358 from RobbeSneyders/trackers
add 5ce0933 Updates release validating to run LeaderBoard example using Dataflow Streaming Engine
add 5dd597e Merge pull request #7365: [BEAM-6249] Adds an Streaming Engine based test to release validation
add 5cdf3a7 [BEAM-5315] Python 3 port io.source* and io.concat_source* modules (#7383)
add 4b039e4 [BEAM-5315] Python 3 port io.filebased_* modules (#7386)
add fc482f1 [BEAM-5959] Add performance testing for writing many files
add 41dd6e1 Merge pull request #7266 from udim/cmek-perf
add a24b1af Move org.apache.beam.runners.samza.util.Base64Serializer to org.apache.beam.runners.core.serialization.Base64Serializer to be used by other runners
add 3b8ae00 Fix visibility of deserialize method
add 0783779 Add missing package-info
add 4660895 Merge pull request #7384 from echauchot/Base64Serializer
add c4590a0 split SerializablePipelineOptions into serialization utils and instance class.
add 5130bcb Merge pull request #7385 from echauchot/exposeSerializationSerializablePipelineOptions
add a404cee Add paddedcell fix to spotlessJava rules.
add c148c35 Merge pull request #7390: [BEAM-6339] Add paddedcell fix to spotlessJava rules.
add c028ebc Upgrade html-proofer and dependencies to latest
add 07c279a Remove broken links to datatorrent.com
add b09e721 Fix pydoc link to GoogleCloudOptions
add fd5e321 Remove broken link to atrato.io
add a79ef89 Fix link to internal anchor
add 5466ac0 Remove stale exclusions from HTML link checker.
add a2986cc Merge pull request #7393: [BEAM-5662] Clean up website html-proofer config
add b02f79f Disable UsesMetricsPusher tests for direct-runner
add f74c979 Fix SplittableDoFnTest#testBoundedness
add 459e730 [BEAM-6352] Ignore tests using Watch PTransform
add 26c73ef [BEAM-6353] Fix TFRecordIOTest
add 92a6c23 [BEAM-6354] Add timeout and ignore hanging tests
add 55ffd97 Add :beam-runners-direct-java:needsRunnerTests to javaPreCommit
add c591727 Merge pull request #7374: Add :beam-runners-direct-java:needsRunnerTests to javaPreCommit
add 3948595 [BEAM-5959] Reorder methods according to convention
add 5716dba Merge pull request #7403 from udim/cmek-perf
add 5212b71 [BEAM-6030] Split metrics related options out of PipelineOptions
add 185cb1a [BEAM-6030] Add Experimental label on MetricsOptions
add bd80118 Merge pull request #7400 from echauchot/BEAM-6030-metrics-sinks-pipelineOptions
new ce39f93 Add an empty spark-structured-streaming runner project targeting spark 2.4.0
new 737af2f Fix missing dep
new 018c773 Add SparkPipelineOptions
new 1c97788 Start pipeline translation
new abf4b46 Add global pipeline translation structure
new 28a9422 Add nodes translators structure
new 3a743c2 Wire node translators with pipeline translator
new 051e8dc Renames: better differenciate pipeline translator for transform translator
new 6695d64 Organise methods in PipelineTranslator
new ec9d634 Initialise BatchTranslationContext
new 476cae8 Refactoring: -move batch/streaming common translation visitor and utility methods to PipelineTranslator -rename batch dedicated classes to Batch* to differentiate with their streaming counterparts -Introduce TranslationContext for common batch/streaming components
new ce484e9 Make transform translation clearer: renaming, comments
new 0033f89 Improve javadocs
new 26f2e4b Move SparkTransformOverrides to correct package
new 4777e22 Move common translation context components to superclass
new 0cfa70d apply spotless for e-formatting
new 91f9ef5 Make codestyle and firebug happy
new 901a1ac Add TODOs
new 2ccccdd Post-pone batch qualifier in all classes names for readability
new b37da3e Add precise TODO for multiple TransformTranslator per transform URN
new bbf583c Added SparkRunnerRegistrar
new 866ef13 Add basic pipeline execution. Refactor translatePipeline() to return the translationContext on which we can run startPipeline()
new 7a645e1 Create PCollections manipulation methods
new 31fb182 Create Datasets manipulation methods
new 286d7f3 Add Flatten transformation translator
new 9e6fc2c Add primitive GroupByKeyTranslatorBatch implementation
new 57ce2d1 Use Iterators.transform() to return Iterable
new 4f150da Implement read transform
new 1ec9356 update TODO
new ebbab69 Apply spotless
new d531bb5 start source instanciation
new a3a87b4 Improve exception flow
new b7283d7 Improve type enforcement in ReadSourceTranslator
new e9ac3c3 Experiment over using spark Catalog to pass in Beam Source through spark Table
new 0452733 Add source mocks
new 8cdc20f fix mock, wire mock in translators and create a main test.
new 1060121 Use raw WindowedValue so that spark Encoders could work (temporary)
new 1184022 clean deps
new 49ee259 Move DatasetSourceMock to proper batch mode
new 340991e Run pipeline in batch mode or in streaming mode
new 1ca4192 Split batch and streaming sources and translators
new 4e0f7a0 Use raw Encoder<WindowedValue> also in regular ReadSourceTranslatorBatch
new 758c1ce Cleaning
new 92a104e Add ReadSourceTranslatorStreaming
new 2f5bdd3 Move Source and translator mocks to a mock package.
new 1cea29d Pass Beam Source and PipelineOptions to the spark DataSource as serialized strings
new 92c94b1 Refactor DatasetSource fields
new d1b549e Wire real SourceTransform and not mock and update the test
new 878ff4e Add missing 0-arg public constructor
new 6392179 Apply spotless
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O (3533779)
            \
             N -- N -- N refs/heads/spark-runner_structured-streaming (6392179)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
The 50 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.github/PULL_REQUEST_TEMPLATE.md | 4 +-
.gitignore | 1 +
.test-infra/jenkins/CommonJobProperties.groovy | 2 +-
.../job_PerformanceTests_FileBasedIO_IT.groovy | 16 +
...job_PerformanceTests_FileBasedIO_IT_HDFS.groovy | 13 +
.../jenkins/job_PerformanceTests_Python.groovy | 2 +-
...GradleBuild.groovy => job_PostCommit_Go.groovy} | 3 +-
...adleBuild.groovy => job_PostCommit_Java.groovy} | 3 +-
.../job_PostCommit_Java_Nexmark_Flink.groovy | 10 +-
...y => job_PostCommit_Java_PortabilityApi.groovy} | 3 +-
...job_PostCommit_Java_ValidatesRunner_Apex.groovy | 5 +-
...PostCommit_Java_ValidatesRunner_Dataflow.groovy | 5 +-
...nner_DataflowPortabilityExecutableStage.groovy} | 11 +-
...ob_PostCommit_Java_ValidatesRunner_Flink.groovy | 3 +-
...PostCommit_Java_ValidatesRunner_Gearpump.groovy | 5 +-
..._ValidatesRunner_PortabilityApi_Dataflow.groovy | 3 +-
...ob_PostCommit_Java_ValidatesRunner_Samza.groovy | 3 +-
...ob_PostCommit_Java_ValidatesRunner_Spark.groovy | 5 +-
...radleBuild.groovy => job_PostCommit_SQL.groovy} | 11 +-
.../jenkins/job_PreCommit_Portable_Python.groovy | 18 +-
...t.groovy => job_Release_NightlySnapshot.groovy} | 3 +-
.../job_beam_PerformanceTests_Analysis.groovy | 2 +
.test-infra/metrics/OWNERS | 3 +
.test-infra/metrics/beamgrafana-deploy.yaml | 2 +-
.../dashboards/stability_critical_jobs_status.json | 4 +-
.test-infra/metrics/sync/jenkins/syncjenkins.py | 3 +-
README.md | 6 +-
build.gradle | 10 +
.../org/apache/beam/gradle/BeamModulePlugin.groovy | 281 ++-
.../org/apache/beam/gradle/GrpcVendoring.groovy | 8 +-
.../org/apache/beam/gradle/VendorJavaPlugin.groovy | 4 +
examples/java/build.gradle | 2 +-
.../beam/examples/complete/TrafficRoutes.java | 3 +-
.../beam/examples/complete/game/UserScore.java | 36 +-
.../beam/examples/cookbook/FilterExamples.java | 3 +-
.../beam/examples/cookbook/TriggerExample.java | 3 +-
.../examples/complete/game/LeaderBoardTest.java | 2 +-
.../beam/examples/complete/game/UserScoreTest.java | 31 +-
.../beam/examples/cookbook/JoinExamplesTest.java | 38 +-
.../beam/examples/cookbook/TriggerExampleTest.java | 23 +-
model/fn-execution/build.gradle | 4 +-
.../fn-execution/src/main/proto/beam_fn_api.proto | 300 ++-
model/job-management/build.gradle | 4 +-
model/pipeline/build.gradle | 2 +-
.../pipeline/src/main/proto/beam_runner_api.proto | 40 +
.../src/main/groovy/MobileGamingCommands.groovy | 20 +-
.../main/groovy/mobilegaming-java-dataflow.groovy | 100 +-
release/src/main/scripts/cut_release_branch.sh | 4 +-
.../operators/ApexProcessFnOperator.java | 35 +-
.../core/construction/ArtifactServiceStager.java | 6 +-
.../beam/runners/core/construction/BeamUrns.java | 2 +-
.../core/construction/CoderTranslation.java | 2 +-
.../core/construction/CombineTranslation.java | 2 +-
.../CreatePCollectionViewTranslation.java | 2 +-
.../core/construction/DisplayDataTranslation.java | 4 +-
.../runners/core/construction/Environments.java | 31 +-
.../construction/ExecutableStageTranslation.java | 90 +
.../construction/PCollectionViewTranslation.java | 2 +-
.../core/construction/PTransformTranslation.java | 6 +
.../core/construction/ParDoTranslation.java | 4 +-
...java => PipelineOptionsSerializationUtils.java} | 34 +-
.../construction/PipelineOptionsTranslation.java | 6 +-
.../runners/core/construction/ReadTranslation.java | 4 +-
.../construction/SerializablePipelineOptions.java | 26 +-
.../runners/core/construction/SplittableParDo.java | 2 +
.../construction/SplittableParDoNaiveBounded.java | 4 +-
.../core/construction/TestStreamTranslation.java | 2 +-
.../core/construction/WindowIntoTranslation.java | 2 +-
.../construction/WindowingStrategyTranslation.java | 8 +-
.../core/construction/WriteFilesTranslation.java | 2 +-
.../graph/GreedyPCollectionFusers.java | 30 +-
.../core/construction/graph/QueryablePipeline.java | 8 +-
.../construction/ArtifactServiceStagerTest.java | 6 +-
.../runners/core/construction/CommonCoderTest.java | 2 +-
.../ExecutableStageTranslationTest.java | 120 ++
.../InMemoryArtifactStagerService.java | 2 +-
.../PipelineOptionsTranslationTest.java | 6 +-
.../construction/WindowIntoTranslationTest.java | 2 +-
.../construction/graph/ProtoOverridesTest.java | 2 +-
.../runners/core/LateDataDroppingDoFnRunner.java | 17 +-
.../apache/beam/runners/core/SideInputHandler.java | 3 +-
.../core/SplittableParDoViaKeyedWorkItems.java | 7 +-
.../runners/core/metrics/DistributionCell.java | 5 +
.../runners/core/metrics/MetricsContainerImpl.java | 19 +
.../beam/runners/core/metrics/MetricsPusher.java | 6 +-
.../beam/runners/core/metrics/NoOpMetricsSink.java | 4 +-
.../core/metrics/SimpleMonitoringInfoBuilder.java | 219 +++
.../core/serialization}/Base64Serializer.java | 4 +-
.../runners/core/serialization}/package-info.java | 4 +-
.../beam/runners/core/ReduceFnRunnerTest.java | 2 +-
.../runners/core/SplittableParDoProcessFnTest.java | 20 +-
.../runners/core/metrics/MetricsPusherTest.java | 7 +-
.../metrics/SimpleMonitoringInfoBuilderTest.java | 87 +
.../beam/runners/core/metrics/TestMetricsSink.java | 4 +-
runners/direct-java/build.gradle | 35 +-
.../runners/direct/ParDoMultiOverrideFactory.java | 32 +-
.../runners/direct/WindowEvaluatorFactory.java | 3 +-
.../runners/direct/portable/ReferenceRunner.java | 15 +-
.../direct/portable/WindowEvaluatorFactory.java | 3 +-
.../LocalFileSystemArtifactRetrievalService.java | 6 +-
.../LocalFileSystemArtifactStagerService.java | 8 +-
.../runners/direct/portable/job/PreparingJob.java | 2 +-
.../portable/job/ReferenceRunnerJobService.java | 6 +-
.../beam/runners/direct/DirectRunnerTest.java | 12 +-
.../direct/UnboundedReadEvaluatorFactoryTest.java | 5 +-
.../runners/direct/WindowEvaluatorFactoryTest.java | 3 +-
.../direct/portable/ReferenceRunnerTest.java | 3 +-
.../portable/RemoteStageEvaluatorFactoryTest.java | 5 +-
...ocalFileSystemArtifactRetrievalServiceTest.java | 4 +-
.../LocalFileSystemArtifactStagerServiceTest.java | 10 +-
.../UnsupportedArtifactRetrievalServiceTest.java | 2 +-
.../job/ReferenceRunnerJobServiceTest.java | 4 +-
runners/extensions-java/metrics/build.gradle | 2 +-
.../extensions/metrics/MetricsGraphiteSink.java | 4 +-
.../extensions/metrics/MetricsHttpSink.java | 4 +-
.../metrics/MetricsGraphiteSinkTest.java | 6 +-
.../extensions/metrics/MetricsHttpSinkTest.java | 6 +-
{model/pipeline => runners/flink/1.6}/build.gradle | 22 +-
.../flink/1.6/job-server-container}/build.gradle | 10 +-
.../flink/1.6/job-server}/build.gradle | 19 +-
runners/flink/build.gradle | 129 +-
.../flink/{build.gradle => flink_runner.gradle} | 34 +-
runners/flink/job-server-container/Dockerfile | 6 +-
runners/flink/job-server-container/build.gradle | 38 +-
...ld.gradle => flink_job_server_container.gradle} | 26 +-
runners/flink/job-server/build.gradle | 80 +-
.../{build.gradle => flink_job_server.gradle} | 79 +-
.../FlinkBatchPortablePipelineTranslator.java | 71 +-
.../runners/flink/FlinkExecutionEnvironments.java | 173 +-
.../beam/runners/flink/FlinkJobInvocation.java | 88 +-
.../apache/beam/runners/flink/FlinkJobInvoker.java | 3 +-
.../beam/runners/flink/FlinkJobServerDriver.java | 25 +-
.../flink/FlinkPipelineExecutionEnvironment.java | 15 +-
.../beam/runners/flink/FlinkPipelineOptions.java | 22 +
.../flink/FlinkPortablePipelineTranslator.java | 19 +-
.../org/apache/beam/runners/flink/FlinkRunner.java | 5 +-
.../flink/FlinkStreamingPipelineTranslator.java | 70 +
.../FlinkStreamingPortablePipelineTranslator.java | 49 +-
.../flink/FlinkStreamingTransformTranslators.java | 26 +-
.../runners/flink/FlinkTransformOverrides.java | 24 +-
....java => PipelineTranslationModeOptimizer.java} | 26 +-
.../beam/runners/flink/metrics/FileReporter.java | 75 +
.../flink/metrics/FlinkMetricContainer.java | 97 +-
.../apache/beam/runners/flink/metrics/Metrics.java | 56 +
.../FlinkDefaultExecutableStageContext.java | 3 +
.../functions/FlinkExecutableStageFunction.java | 23 +-
.../utils/FlinkPipelineTranslatorUtils.java | 2 +-
.../runners/flink/translation/utils/NoopLock.java | 72 +
.../wrappers/streaming/DoFnOperator.java | 3 +-
.../streaming/ExecutableStageDoFnOperator.java | 144 +-
.../streaming/io/UnboundedSourceWrapper.java | 46 +-
.../state/FlinkKeyGroupStateInternals.java | 3 +-
.../streaming/state/FlinkStateInternals.java | 56 +-
.../flink/FlinkExecutionEnvironmentsTest.java | 252 ++-
.../runners/flink/FlinkJobServerDriverTest.java | 27 +-
.../FlinkPipelineExecutionEnvironmentTest.java | 26 +
.../FlinkStreamingTransformTranslatorsTest.java | 238 +++
.../runners/flink/FlinkTransformOverridesTest.java | 116 ++
.../beam/runners/flink/PipelineOptionsTest.java | 3 +
.../PipelineTranslationModeOptimizerTest.java | 63 +
.../beam/runners/flink/PortableExecutionTest.java | 95 +-
.../runners/flink/PortableStateExecutionTest.java | 194 +-
.../runners/flink/PortableTimersExecutionTest.java | 36 +-
.../flink/metrics/FlinkMetricContainerTest.java | 134 ++
.../flink/streaming/BoundedSourceRestoreTest.java | 1 +
.../streaming/ExecutableStageDoFnOperatorTest.java | 2 +-
.../flink/streaming/GroupByWithNullValuesTest.java | 92 +
.../FlinkPipelineTranslatorUtilsTest.java | 44 +
.../FlinkDefaultExecutableStageContextTest.java | 2 +-
.../FlinkExecutableStageFunctionTest.java | 2 +-
.../wrappers/streaming/io}/TestCountingSource.java | 17 +-
.../streaming/io}/UnboundedSourceWrapperTest.java | 150 +-
runners/google-cloud-dataflow-java/build.gradle | 64 +-
.../examples-streaming/build.gradle | 2 +-
.../examples/build.gradle | 6 +-
.../beam/runners/dataflow/DataflowPipelineJob.java | 6 +-
.../dataflow/DataflowPipelineTranslator.java | 2 +-
.../beam/runners/dataflow/DataflowRunner.java | 3 +
.../dataflow/options/DataflowPipelineOptions.java | 8 +
.../runners/dataflow/util/CloudObjectKinds.java | 1 +
.../dataflow/util/CloudObjectTranslators.java | 6 +-
.../beam/runners/dataflow/util/GcsStager.java | 4 +-
.../beam/runners/dataflow/util/PackageUtil.java | 9 +-
.../beam/runners/dataflow/util/GCSUploadMain.java} | 29 +-
.../google-cloud-dataflow-java/worker/build.gradle | 26 +-
.../worker/legacy-worker/build.gradle | 114 +-
.../dataflow/worker/BatchDataflowWorker.java | 59 +-
.../worker/BeamFnMapTaskExecutorFactory.java | 113 +-
.../runners/dataflow/worker/ByteStringCoder.java | 2 +-
.../worker/DataflowMapTaskExecutorFactory.java | 13 +-
.../dataflow/worker/DataflowOperationContext.java | 2 +-
.../dataflow/worker/DataflowRunnerHarness.java | 23 +-
.../worker/DataflowWorkerHarnessHelper.java | 15 +-
.../dataflow/worker/DeltaDistributionCell.java | 5 +
.../runners/dataflow/worker/ExperimentContext.java | 6 +-
...FetchAndFilterStreamingSideInputsOperation.java | 6 +-
.../dataflow/worker/FnApiWindowMappingFn.java | 12 +-
.../worker/GroupAlsoByWindowParDoFnFactory.java | 2 +-
.../worker/IntrinsicMapTaskExecutorFactory.java | 15 +-
.../dataflow/worker/IsmSideInputReader.java | 3 +-
.../worker/MetricTrackingWindmillServerStub.java | 2 +-
.../beam/runners/dataflow/worker/PubsubSink.java | 2 +-
.../beam/runners/dataflow/worker/ReaderCache.java | 2 +-
...HarnessCoderCloudObjectTranslatorRegistrar.java | 2 -
.../dataflow/worker/SdkHarnessRegistries.java | 16 +-
.../dataflow/worker/SdkHarnessRegistry.java | 9 +-
.../beam/runners/dataflow/worker/StateFetcher.java | 2 +-
.../dataflow/worker/StreamingDataflowWorker.java | 53 +-
.../worker/StreamingModeExecutionContext.java | 30 +-
.../dataflow/worker/StreamingSideInputFetcher.java | 4 +-
.../dataflow/worker/WindmillNamespacePrefix.java | 2 +-
.../beam/runners/dataflow/worker/WindmillSink.java | 2 +-
.../dataflow/worker/WindmillStateCache.java | 2 +-
.../dataflow/worker/WindmillStateInternals.java | 2 +-
.../dataflow/worker/WindmillStateReader.java | 2 +-
.../dataflow/worker/WindmillTimerInternals.java | 2 +-
.../dataflow/worker/WorkItemStatusClient.java | 27 +-
.../dataflow/worker/WorkerCustomSources.java | 2 +-
.../FixMultiOutputInfosOnParDoInstructions.java | 8 +-
.../dataflow/worker/fn/BeamFnControlService.java | 2 +-
.../runners/dataflow/worker/fn/ServerFactory.java | 229 ---
.../dataflow/worker/fn/SocketAddressFactory.java | 68 -
.../worker/fn/control/BeamFnMapTaskExecutor.java | 262 ++-
.../fn/control/ProcessRemoteBundleOperation.java | 105 +
.../control/RegisterAndProcessBundleOperation.java | 38 +-
.../worker/fn/data/BeamFnDataGrpcService.java | 14 +-
.../fn/data/RemoteGrpcPortReadOperation.java | 8 +-
.../fn/data/RemoteGrpcPortWriteOperation.java | 9 +-
.../worker/fn/logging/BeamFnLoggingService.java | 4 +-
.../fn/stream/ServerStreamObserverFactory.java | 6 +-
.../graph/CloneAmbiguousFlattensFunction.java | 9 +-
...java => CreateExecutableStageNodeFunction.java} | 345 +---
.../graph/CreateRegisterFnOperationFunction.java | 49 +-
...nsertFetchAndFilterStreamingSideInputNodes.java | 7 +-
.../worker/graph/LengthPrefixUnknownCoders.java | 7 +-
.../worker/graph/MapTaskToNetworkFunction.java | 11 +-
.../beam/runners/dataflow/worker/graph/Nodes.java | 43 +-
.../worker/graph/RegisterNodeFunction.java | 30 +-
.../common/worker/BatchingShuffleEntryReader.java | 3 +-
.../worker/util/common/worker/MapTaskExecutor.java | 2 +-
.../worker/windmill/DirectStreamObserver.java | 4 +-
.../windmill/ForwardingClientResponseObserver.java | 6 +-
.../worker/windmill/GrpcWindmillServer.java | 32 +-
.../worker/windmill/StreamObserverFactory.java | 4 +-
.../runners/dataflow/harness/test/TestStreams.java | 4 +-
.../runners/dataflow/worker/ConcatReaderTest.java | 3 +-
.../runners/dataflow/worker/DataflowMatchers.java | 2 +-
.../worker/DataflowWorkerHarnessHelperTest.java | 2 +-
.../dataflow/worker/FnApiWindowMappingFnTest.java | 4 +-
.../dataflow/worker/GroupingShuffleReaderTest.java | 2 +-
.../IntrinsicMapTaskExecutorFactoryTest.java | 37 +-
.../dataflow/worker/IsmSideInputReaderTest.java | 3 +-
.../runners/dataflow/worker/PubsubReaderTest.java | 2 +-
.../runners/dataflow/worker/PubsubSinkTest.java | 2 +-
.../runners/dataflow/worker/ReaderCacheTest.java | 2 +-
.../dataflow/worker/ShuffleReaderFactoryTest.java | 3 +-
.../runners/dataflow/worker/StateFetcherTest.java | 2 +-
.../worker/StreamingDataflowWorkerTest.java | 6 +-
.../worker/StreamingGroupAlsoByWindowFnsTest.java | 2 +-
...reamingGroupAlsoByWindowsReshuffleDoFnTest.java | 2 +-
.../worker/StreamingModeExecutionContextTest.java | 16 +-
.../worker/StreamingSideInputDoFnRunnerTest.java | 2 +-
.../worker/StreamingSideInputFetcherTest.java | 2 +-
.../dataflow/worker/WindmillKeyedWorkItemTest.java | 2 +-
.../worker/WindmillReaderIteratorBaseTest.java | 2 +-
.../dataflow/worker/WindmillStateCacheTest.java | 2 +-
.../worker/WindmillStateInternalsTest.java | 2 +-
.../dataflow/worker/WindmillStateReaderTest.java | 4 +-
.../dataflow/worker/WorkerCustomSourcesTest.java | 4 +-
...FixMultiOutputInfosOnParDoInstructionsTest.java | 20 +-
.../worker/fn/BeamFnControlServiceTest.java | 13 +-
.../dataflow/worker/fn/ServerFactoryTest.java | 244 ---
.../worker/fn/SocketAddressFactoryTest.java | 55 -
.../fn/control/BeamFnMapTaskExecutorTest.java | 219 ++-
.../RegisterAndProcessBundleOperationTest.java | 96 +-
.../SingularProcessBundleProgressTrackerTest.java | 6 +-
.../worker/fn/data/BeamFnDataGrpcServiceTest.java | 32 +-
.../fn/data/RemoteGrpcPortReadOperationTest.java | 14 +-
.../fn/data/RemoteGrpcPortWriteOperationTest.java | 16 +-
.../fn/logging/BeamFnLoggingServiceTest.java | 12 +-
.../fn/stream/ServerStreamObserverFactoryTest.java | 4 +-
.../graph/CloneAmbiguousFlattensFunctionTest.java | 7 +-
.../CreateRegisterFnOperationFunctionTest.java | 6 +-
.../graph/DeduceFlattenLocationsFunctionTest.java | 2 +-
.../graph/DeduceNodeLocationsFunctionTest.java | 4 +-
...tFetchAndFilterStreamingSideInputNodesTest.java | 9 +-
.../graph/LengthPrefixUnknownCodersTest.java | 2 +-
.../worker/graph/MapTaskToNetworkFunctionTest.java | 25 +-
.../runners/dataflow/worker/graph/NodesTest.java | 16 +-
.../RemoveFlattenInstructionsFunctionTest.java | 74 +-
.../ReplacePgbkWithPrecombineFunctionTest.java | 2 +-
.../logging/DataflowWorkerLoggingHandlerTest.java | 2 +-
.../util/common/worker/ReadOperationTest.java | 6 +-
.../worker/windmill/GrpcWindmillServerTest.java | 10 +-
.../worker/windmill/build.gradle | 2 +-
.../apache/beam/runners/fnexecution/FnService.java | 6 +-
.../GrpcContextHeaderAccessorProvider.java | 16 +-
.../beam/runners/fnexecution/GrpcFnServer.java | 19 +-
.../fnexecution/InProcessServerFactory.java | 39 +-
.../beam/runners/fnexecution/ServerFactory.java | 192 +-
.../BeamFileSystemArtifactRetrievalService.java | 10 +-
.../BeamFileSystemArtifactStagingService.java | 10 +-
.../control/DefaultJobBundleFactory.java | 4 +-
.../fnexecution/control/FnApiControlClient.java | 6 +-
.../control/FnApiControlClientPoolService.java | 2 +-
.../control/ProcessBundleDescriptors.java | 2 +-
.../SingleEnvironmentInstanceJobBundleFactory.java | 13 +-
.../runners/fnexecution/data/GrpcDataService.java | 11 +-
...actory.java => ExternalEnvironmentFactory.java} | 116 +-
.../environment/ProcessEnvironmentFactory.java | 2 +-
.../environment/StaticRemoteEnvironment.java | 64 +
.../StaticRemoteEnvironmentFactory.java | 70 +
.../jobsubmission/InMemoryJobService.java | 10 +-
.../fnexecution/jobsubmission/JobInvoker.java | 2 +-
.../fnexecution/jobsubmission/JobPreparation.java | 2 +-
.../fnexecution/logging/GrpcLoggingService.java | 2 +-
.../runners/fnexecution/provisioning/JobInfo.java | 2 +-
.../provisioning/StaticGrpcProvisionService.java | 2 +-
.../splittabledofn/SDFFeederViaStateAndTimers.java | 4 +-
.../fnexecution/state/GrpcStateService.java | 6 +-
.../fnexecution/state/StateRequestHandlers.java | 2 +-
.../GrpcContextHeaderAccessorProviderTest.java | 24 +-
.../runners/fnexecution/ServerFactoryTest.java | 48 +-
.../BeamFileSystemArtifactServicesTest.java | 8 +-
.../control/DefaultJobBundleFactoryTest.java | 4 +-
.../control/FnApiControlClientPoolServiceTest.java | 4 +-
.../control/FnApiControlClientTest.java | 2 +-
.../fnexecution/control/RemoteExecutionTest.java | 220 ++-
...gleEnvironmentInstanceJobBundleFactoryTest.java | 3 +-
.../fnexecution/data/GrpcDataServiceTest.java | 8 +-
.../jobsubmission/InMemoryJobServiceTest.java | 4 +-
.../logging/GrpcLoggingServiceTest.java | 6 +-
.../StaticGrpcProvisionServiceTest.java | 10 +-
.../fnexecution/state/GrpcStateServiceTest.java | 4 +-
.../apache/beam/runners/local/StructuralKey.java | 14 +-
runners/reference/java/build.gradle | 2 +
.../runners/reference/ExternalWorkerService.java | 87 +
.../reference/JobServicePipelineResult.java | 10 +-
.../beam/runners/reference/PortableRunner.java | 34 +-
.../runners/reference/testing/TestJobService.java | 2 +-
.../beam/runners/reference/PortableRunnerTest.java | 4 +-
runners/reference/job-server/build.gradle | 1 +
runners/samza/build.gradle | 2 +-
.../org/apache/beam/runners/samza/SamzaRunner.java | 2 +-
.../runners/samza/adapter/BoundedSourceSystem.java | 2 +-
.../samza/adapter/UnboundedSourceSystem.java | 2 +-
.../apache/beam/runners/samza/runtime/DoFnOp.java | 14 +-
.../beam/runners/samza/runtime/GroupByKeyOp.java | 2 +-
.../samza/runtime/SamzaDoFnInvokerRegistrar.java | 21 +-
.../samza/runtime/SamzaStoreStateInternals.java | 10 +-
.../samza/transforms/GroupWithoutRepartition.java | 60 +
.../samza/transforms/UpdatingCombineFn.java | 19 +-
.../runners/samza/transforms}/package-info.java | 4 +-
.../runners/samza/translation/ConfigBuilder.java | 2 +-
.../runners/samza/translation/ConfigContext.java | 9 +-
.../samza/translation/GroupByKeyTranslator.java | 22 +-
.../translation/ParDoBoundMultiTranslator.java | 17 +
.../runners/samza/translation/ReadTranslator.java | 2 +-
.../samza/translation/SamzaPipelineTranslator.java | 7 +-
.../samza/translation/TranslationContext.java | 47 +-
.../runners/samza/util/PipelineDotRenderer.java | 3 +-
.../translation/TranslationContext.java | 10 +-
.../translation/batch/DatasetSourceBatch.java | 62 +-
.../translation/batch/FlattenTranslatorBatch.java | 2 +-
.../translation/batch/PipelineTranslatorBatch.java | 2 +-
.../batch/ReadSourceTranslatorBatch.java | 34 +-
.../batch/{ => mocks}/DatasetSourceMockBatch.java | 46 +-
.../{ => mocks}/ReadSourceTranslatorMockBatch.java | 22 +-
.../streaming/DatasetSourceStreaming.java | 16 +-
.../streaming/ReadSourceTranslatorStreaming.java | 22 +-
.../spark/structuredstreaming/SourceTest.java | 21 +-
runners/spark/build.gradle | 6 +-
.../org/apache/beam/runners/spark/SparkRunner.java | 4 +-
.../beam/runners/spark/examples/WordCount.java | 4 +-
.../spark/translation/TranslationUtils.java | 5 +-
.../streaming/StreamingTransformTranslator.java | 4 +-
.../runners/spark/ProvidedSparkContextTest.java | 2 +-
.../spark/metrics/SparkMetricsPusherTest.java | 10 +-
sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go | 14 +-
sdks/go/pkg/beam/artifact/gcsproxy/staging.go | 6 +-
sdks/go/pkg/beam/core/graph/coder/int.go | 24 +-
sdks/go/pkg/beam/core/graph/coder/varint.go | 10 +-
sdks/go/pkg/beam/core/graph/fn.go | 11 +
sdks/go/pkg/beam/core/runtime/exec/coder.go | 3 +-
sdks/go/pkg/beam/core/runtime/exec/fn.go | 104 +-
sdks/go/pkg/beam/core/runtime/exec/fn_arity.go | 251 +++
sdks/go/pkg/beam/core/runtime/exec/fn_arity.tmpl | 69 +
sdks/go/pkg/beam/core/runtime/exec/fn_test.go | 242 ++-
sdks/go/pkg/beam/core/runtime/exec/fullvalue.go | 58 +
.../pkg/beam/core/runtime/exec/fullvalue_test.go | 58 +
sdks/go/pkg/beam/core/runtime/init.go | 2 +-
sdks/go/pkg/beam/core/util/ioutilx/read.go | 42 +
.../beam/core/util/ioutilx/{read.go => write.go} | 30 +-
sdks/go/pkg/beam/core/util/reflectx/call.go | 2 +-
sdks/go/pkg/beam/core/util/reflectx/structs.go | 73 +
sdks/go/pkg/beam/forward.go | 4 +-
sdks/go/pkg/beam/io/filesystem/gcs/gcs.go | 70 +-
sdks/go/pkg/beam/runners/dataflow/dataflow.go | 4 +-
.../pkg/beam/runners/dataflow/dataflowlib/stage.go | 4 +-
.../beam/runners/universal/runnerlib/compile.go | 2 +-
sdks/go/pkg/beam/testing/ptest/ptest.go | 43 +-
sdks/go/pkg/beam/util/gcsx/gcs.go | 66 +-
.../util/ioutilx/read.go => util/gcsx/gcs_test.go} | 40 +-
sdks/go/pkg/beam/util/shimx/generate.go | 49 +-
sdks/go/pkg/beam/util/shimx/generate_test.go | 17 +
sdks/go/pkg/beam/util/starcgenx/starcgenx.go | 69 +-
sdks/go/pkg/beam/util/starcgenx/starcgenx_test.go | 4 +-
sdks/go/pkg/beam/x/hooks/perf/perf.go | 68 +-
sdks/java/bom/build.gradle | 122 ++
sdks/java/bom/pom.xml.template | 83 +
sdks/java/build-tools/build.gradle | 2 +-
.../src/main/resources/beam/checkstyle.xml | 10 +
.../java/org/apache/beam/sdk/coders/ListCoder.java | 19 +
.../java/org/apache/beam/sdk/coders/MapCoder.java | 19 +
.../apache/beam/sdk/coders/RowCoderGenerator.java | 4 +-
.../main/java/org/apache/beam/sdk/io/AvroIO.java | 81 +-
.../org/apache/beam/sdk/io/BlockBasedSource.java | 4 +-
.../org/apache/beam/sdk/io/LocalFileSystem.java | 6 +-
.../java/org/apache/beam/sdk/io/TFRecordIO.java | 6 +-
.../main/java/org/apache/beam/sdk/io/TextIO.java | 2 +-
.../org/apache/beam/sdk/metrics/Distribution.java | 2 +
.../java/org/apache/beam/sdk/metrics/Metrics.java | 8 +
.../apache/beam/sdk/metrics/MetricsOptions.java | 83 +
.../apache/beam/sdk/options/PipelineOptions.java | 55 -
.../org/apache/beam/sdk/options/ValueProvider.java | 6 +-
.../apache/beam/sdk/schemas/AvroRecordSchema.java | 53 +
.../apache/beam/sdk/schemas/CachingFactory.java | 55 +
.../java/org/apache/beam/sdk/schemas/Factory.java} | 9 +-
.../beam/sdk/schemas/FieldTypeDescriptors.java | 3 +-
.../apache/beam/sdk/schemas/FieldValueGetter.java | 2 -
.../beam/sdk/schemas/FieldValueGetterFactory.java | 6 +-
.../apache/beam/sdk/schemas/FieldValueSetter.java | 16 -
.../beam/sdk/schemas/FieldValueSetterFactory.java | 6 +-
.../sdk/schemas/FieldValueTypeInformation.java | 224 +++
.../schemas/FieldValueTypeInformationFactory.java} | 12 +-
.../beam/sdk/schemas/FromRowUsingCreator.java | 149 ++
.../sdk/schemas/GetterBasedSchemaProvider.java | 175 +-
.../apache/beam/sdk/schemas/JavaBeanSchema.java | 70 +-
.../apache/beam/sdk/schemas/JavaFieldSchema.java | 42 +-
.../java/org/apache/beam/sdk/schemas/Schema.java | 107 +-
.../schemas/SchemaUserTypeConstructorCreator.java | 44 +
.../beam/sdk/schemas/SchemaUserTypeCreator.java} | 13 +-
.../sdk/schemas/SetterBasedCreatorFactory.java | 57 +
.../beam/sdk/schemas/UserTypeCreatorFactory.java | 14 +-
.../apache/beam/sdk/schemas/transforms/Cast.java | 2 +-
.../apache/beam/sdk/schemas/transforms/Select.java | 7 +-
.../beam/sdk/schemas/utils/AvroByteBuddyUtils.java | 125 ++
.../apache/beam/sdk/schemas/utils/AvroUtils.java | 686 ++++++-
.../beam/sdk/schemas/utils/ByteBuddyUtils.java | 159 +-
...terFactory.java => FieldValueTypeSupplier.java} | 20 +-
.../sdk/schemas/utils/JavaBeanGetterFactory.java | 31 -
.../sdk/schemas/utils/JavaBeanSetterFactory.java | 31 -
.../beam/sdk/schemas/utils/JavaBeanUtils.java | 151 +-
.../apache/beam/sdk/schemas/utils/POJOUtils.java | 211 +-
.../sdk/schemas/utils/PojoValueSetterFactory.java | 31 -
.../beam/sdk/schemas/utils/ReflectUtils.java | 20 +-
.../sdk/schemas/utils/StaticSchemaInference.java | 100 +-
...aflowPortabilityExecutableStageUnsupported.java | 17 +-
.../apache/beam/sdk/testing/UsesSideInputs.java | 16 +-
.../org/apache/beam/sdk/transforms/Contextful.java | 4 +-
.../org/apache/beam/sdk/transforms/Create.java | 9 +-
.../java/org/apache/beam/sdk/transforms/DoFn.java | 49 +-
.../org/apache/beam/sdk/transforms/Filter.java | 23 +-
.../beam/sdk/transforms/FlatMapElements.java | 40 +-
...{SimpleFunction.java => InferableFunction.java} | 49 +-
.../apache/beam/sdk/transforms/MapElements.java | 36 +-
...ializableFunction.java => ProcessFunction.java} | 17 +-
.../beam/sdk/transforms/SerializableFunction.java | 11 +-
.../apache/beam/sdk/transforms/SimpleFunction.java | 38 +-
.../org/apache/beam/sdk/transforms/ToString.java | 8 +-
.../reflect/ByteBuddyDoFnInvokerFactory.java | 6 +-
.../reflect/ByteBuddyOnTimerInvokerFactory.java | 4 +-
.../beam/sdk/transforms/reflect/DoFnInvoker.java | 2 +
.../sdk/transforms/reflect/DoFnSignatures.java | 13 +-
.../sdk/transforms/splittabledofn/Backlog.java | 90 +
.../sdk/transforms/splittabledofn/Backlogs.java | 58 +
.../splittabledofn/ByteKeyRangeTracker.java | 30 +-
.../splittabledofn/OffsetRangeTracker.java | 28 +-
.../splittabledofn/RestrictionTracker.java | 9 +-
.../transforms/splittabledofn/Restrictions.java | 17 +-
.../java/org/apache/beam/sdk/util/CoderUtils.java | 3 +-
.../main/java/org/apache/beam/sdk/values/Row.java | 40 +-
.../org/apache/beam/sdk/values/RowWithGetters.java | 13 +-
.../apache/beam/sdk/values/TypeDescriptors.java | 68 +-
.../avro/org/apache/beam/sdk/schemas/test.avsc | 29 +
.../org/apache/beam/sdk/coders/ListCoderTest.java | 21 +
.../org/apache/beam/sdk/coders/MapCoderTest.java | 21 +
.../java/org/apache/beam/sdk/io/AvroIOTest.java | 2066 ++++++++++----------
.../sdk/io/BoundedReadFromUnboundedSourceTest.java | 4 +-
.../org/apache/beam/sdk/io/CountingSourceTest.java | 13 +-
.../java/org/apache/beam/sdk/io/FileIOTest.java | 2 +
.../sdk/io/SerializableAvroCodecFactoryTest.java | 2 +-
.../org/apache/beam/sdk/io/TFRecordIOTest.java | 11 +-
.../org/apache/beam/sdk/io/TextIOReadTest.java | 2 +
.../apache/beam/sdk/schemas/AvroSchemaTest.java | 368 ++++
.../beam/sdk/schemas/FieldTypeDescriptorsTest.java | 4 +-
.../beam/sdk/schemas/transforms/CastTest.java | 37 +-
.../beam/sdk/schemas/utils/AvroUtilsTest.java | 335 +++-
.../beam/sdk/schemas/utils/JavaBeanUtilsTest.java | 24 +-
.../beam/sdk/schemas/utils/POJOUtilsTest.java | 20 +-
.../org/apache/beam/sdk/testing/PAssertTest.java | 16 +-
.../apache/beam/sdk/transforms/CombineFnsTest.java | 5 +-
.../apache/beam/sdk/transforms/CombineTest.java | 70 +-
.../org/apache/beam/sdk/transforms/FilterTest.java | 17 +
.../beam/sdk/transforms/FlatMapElementsTest.java | 60 +-
.../apache/beam/sdk/transforms/FlattenTest.java | 3 +-
.../apache/beam/sdk/transforms/GroupByKeyTest.java | 11 +-
.../beam/sdk/transforms/MapElementsTest.java | 163 +-
.../org/apache/beam/sdk/transforms/ParDoTest.java | 17 +-
.../org/apache/beam/sdk/transforms/ReifyTest.java | 4 +-
.../beam/sdk/transforms/ReifyTimestampsTest.java | 4 +-
.../apache/beam/sdk/transforms/ReshuffleTest.java | 11 +-
.../beam/sdk/transforms/SplittableDoFnTest.java | 22 +-
.../org/apache/beam/sdk/transforms/ViewTest.java | 2 +
.../org/apache/beam/sdk/transforms/WatchTest.java | 9 +
.../beam/sdk/transforms/join/CoGroupByKeyTest.java | 8 +-
.../sdk/transforms/reflect/DoFnInvokersTest.java | 20 +-
.../reflect/DoFnSignaturesSplittableDoFnTest.java | 52 +-
.../splittabledofn/ByteKeyRangeTrackerTest.java | 40 +
.../splittabledofn/OffsetRangeTrackerTest.java | 35 +-
.../beam/sdk/transforms/windowing/WindowTest.java | 5 +-
.../sdk/transforms/windowing/WindowingTest.java | 9 +-
sdks/java/extensions/euphoria/build.gradle | 7 +-
.../core/translate/BeamMetricsTranslationTest.java | 14 +-
.../sdk/extensions/gcp/options/GcsOptions.java | 10 +
.../sdk/extensions/gcp/storage/GcsFileSystem.java | 20 +
.../org/apache/beam/sdk/util/CustomHttpErrors.java | 141 ++
.../java/org/apache/beam/sdk/util/GcsUtil.java | 31 +-
.../apache/beam/sdk/util/HttpCallCustomError.java | 13 +-
.../org/apache/beam/sdk/util/HttpCallMatcher.java | 16 +-
.../apache/beam/sdk/util/HttpRequestWrapper.java} | 28 +-
.../apache/beam/sdk/util/HttpResponseWrapper.java | 24 +-
.../beam/sdk/util/RetryHttpRequestInitializer.java | 32 +-
.../apache/beam/sdk/util/CustomHttpErrorsTest.java | 128 ++
.../apache/beam/sdk/util/gcsfs/GcsPathTest.java | 3 +-
sdks/java/extensions/kryo/build.gradle | 7 +-
.../beam/sdk/extensions/sorter/SortValues.java | 2 +-
sdks/java/extensions/sql/build.gradle | 4 +-
sdks/java/extensions/sql/jdbc/build.gradle | 7 +-
.../extensions/sql/src/main/codegen/config.fmpp | 291 +++
.../beam/sdk/extensions/sql/BeamSqlTable.java | 3 +
.../sdk/extensions/sql/impl/BeamCalciteTable.java | 9 +-
.../impl/{UdfImpl.java => ScalarFunctionImpl.java} | 34 +-
.../beam/sdk/extensions/sql/impl/UdfImpl.java | 144 +-
.../interpreter/BeamSqlExpressionEnvironment.java | 42 -
.../interpreter/BeamSqlExpressionEnvironments.java | 147 --
.../sql/impl/interpreter/BeamSqlFnExecutor.java | 550 ------
.../operator/BeamSqlBinaryOperator.java | 40 -
.../operator/BeamSqlCaseExpression.java | 64 -
.../operator/BeamSqlCastExpression.java | 138 --
.../operator/BeamSqlCorrelVariableExpression.java | 48 -
.../operator/BeamSqlDefaultExpression.java | 38 -
.../interpreter/operator/BeamSqlDotExpression.java | 57 -
.../interpreter/operator/BeamSqlExpression.java | 79 -
.../operator/BeamSqlInputRefExpression.java | 48 -
.../operator/BeamSqlLocalRefExpression.java | 48 -
.../operator/BeamSqlOperatorExpression.java | 52 -
.../interpreter/operator/BeamSqlPrimitive.java | 180 --
.../interpreter/operator/BeamSqlUdfExpression.java | 89 -
.../interpreter/operator/BeamSqlUnaryOperator.java | 41 -
.../impl/interpreter/operator/DateOperators.java | 164 --
.../impl/interpreter/operator/StringOperators.java | 245 ---
.../arithmetic/BeamSqlArithmeticExpression.java | 125 --
.../arithmetic/BeamSqlDivideExpression.java | 35 -
.../arithmetic/BeamSqlMinusExpression.java | 34 -
.../operator/arithmetic/BeamSqlModExpression.java | 34 -
.../arithmetic/BeamSqlMultiplyExpression.java | 34 -
.../operator/arithmetic/BeamSqlPlusExpression.java | 34 -
.../operator/array/BeamSqlArrayExpression.java | 51 -
.../operator/array/BeamSqlArrayItemExpression.java | 50 -
.../collection/BeamSqlCardinalityExpression.java | 50 -
.../collection/BeamSqlSingleElementExpression.java | 64 -
.../comparison/BeamSqlCompareExpression.java | 97 -
.../comparison/BeamSqlEqualsExpression.java | 53 -
.../comparison/BeamSqlGreaterThanExpression.java | 53 -
.../BeamSqlGreaterThanOrEqualsExpression.java | 53 -
.../comparison/BeamSqlIsNotNullExpression.java | 52 -
.../comparison/BeamSqlIsNullExpression.java | 52 -
.../comparison/BeamSqlLessThanExpression.java | 53 -
.../BeamSqlLessThanOrEqualsExpression.java | 53 -
.../operator/comparison/BeamSqlLikeExpression.java | 51 -
.../comparison/BeamSqlNotEqualsExpression.java | 53 -
.../comparison/BeamSqlNotLikeExpression.java | 52 -
.../date/BeamSqlCurrentDateExpression.java | 49 -
.../date/BeamSqlCurrentTimeExpression.java | 53 -
.../date/BeamSqlCurrentTimestampExpression.java | 53 -
.../date/BeamSqlDatetimeMinusExpression.java | 98 -
.../BeamSqlDatetimeMinusIntervalExpression.java | 77 -
.../date/BeamSqlDatetimePlusExpression.java | 118 --
.../date/BeamSqlIntervalMultiplyExpression.java | 95 -
.../BeamSqlTimestampMinusIntervalExpression.java | 79 -
.../BeamSqlTimestampMinusTimestampExpression.java | 97 -
.../interpreter/operator/date/TimeUnitUtils.java | 63 -
.../operator/logical/BeamSqlAndExpression.java | 47 -
.../operator/logical/BeamSqlNotExpression.java | 48 -
.../operator/logical/BeamSqlOrExpression.java | 48 -
.../operator/map/BeamSqlMapExpression.java | 60 -
.../operator/map/BeamSqlMapItemExpression.java | 49 -
.../operator/math/BeamSqlAbsExpression.java | 63 -
.../operator/math/BeamSqlAcosExpression.java | 38 -
.../operator/math/BeamSqlAsinExpression.java | 38 -
.../operator/math/BeamSqlAtan2Expression.java | 41 -
.../operator/math/BeamSqlAtanExpression.java | 38 -
.../operator/math/BeamSqlCeilExpression.java | 43 -
.../operator/math/BeamSqlCosExpression.java | 38 -
.../operator/math/BeamSqlCotExpression.java | 38 -
.../operator/math/BeamSqlDegreesExpression.java | 38 -
.../operator/math/BeamSqlExpExpression.java | 38 -
.../operator/math/BeamSqlFloorExpression.java | 43 -
.../operator/math/BeamSqlLnExpression.java | 38 -
.../operator/math/BeamSqlLogExpression.java | 38 -
.../operator/math/BeamSqlMathBinaryExpression.java | 63 -
.../operator/math/BeamSqlMathUnaryExpression.java | 57 -
.../operator/math/BeamSqlPiExpression.java | 44 -
.../operator/math/BeamSqlPowerExpression.java | 41 -
.../operator/math/BeamSqlRadiansExpression.java | 38 -
.../operator/math/BeamSqlRandExpression.java | 54 -
.../math/BeamSqlRandIntegerExpression.java | 58 -
.../operator/math/BeamSqlRoundExpression.java | 116 --
.../operator/math/BeamSqlSignExpression.java | 78 -
.../operator/math/BeamSqlSinExpression.java | 38 -
.../operator/math/BeamSqlTanExpression.java | 38 -
.../operator/math/BeamSqlTruncateExpression.java | 88 -
.../interpreter/operator/math/package-info.java | 20 -
.../impl/interpreter/operator/package-info.java | 20 -
.../reinterpret/BeamSqlReinterpretExpression.java | 70 -
.../DatetimeReinterpretConversions.java | 48 -
.../reinterpret/IntegerReinterpretConversions.java | 36 -
.../reinterpret/ReinterpretConversion.java | 112 --
.../operator/reinterpret/Reinterpreter.java | 94 -
.../operator/reinterpret/package-info.java | 20 -
.../operator/row/BeamSqlFieldAccessExpression.java | 69 -
.../interpreter/operator/row/package-info.java | 24 -
.../sql/impl/interpreter/package-info.java | 20 -
.../sql/impl/parser/SqlCreateExternalTable.java | 2 +-
.../BeamJavaTypeFactory.java} | 33 +-
.../sql/impl/planner/BeamRelDataTypeSystem.java | 6 +
.../sdk/extensions/sql/impl/rel/BeamCalcRel.java | 391 +++-
.../extensions/sql/impl/rel/BeamIOSourceRel.java | 5 +
.../sdk/extensions/sql/impl/rel/BeamJoinRel.java | 257 ++-
.../sdk/extensions/sql/impl/rel/BeamRelNode.java | 17 +
.../sql/impl/rel/BeamSetOperatorRelBase.java | 10 +-
.../sql/impl/schema/BeamPCollectionTable.java | 5 +
.../extensions/sql/impl/schema/BeamTableUtils.java | 2 +-
.../sql/impl/transform/BeamJoinTransforms.java | 22 +-
.../impl/transform/agg/CovarianceAccumulator.java | 37 +-
.../sql/impl/udf/BuiltinStringFunctions.java | 8 +-
.../extensions/sql/impl/utils/CalciteUtils.java | 4 +-
.../extensions/sql/impl/utils/SqlTypeUtils.java | 59 -
.../meta/provider/bigquery/BeamBigQueryTable.java | 5 +
.../sql/meta/provider/kafka/BeamKafkaTable.java | 5 +
.../meta/provider/kafka/KafkaTableProvider.java | 3 +-
.../meta/provider/pubsub/PubsubIOJsonTable.java | 5 +
.../provider/pubsub/PubsubJsonTableProvider.java | 7 +-
.../sql/meta/provider/test/TestBoundedTable.java | 5 +
.../sql/meta/provider/test/TestTableProvider.java | 8 +-
.../sql/meta/provider/test/TestUnboundedTable.java | 5 +
.../sql/meta/provider/text/TextTable.java | 5 +
.../sql/meta/provider/text/TextTableProvider.java | 5 +-
.../beam/sdk/extensions/sql/BeamSqlCastTest.java | 33 +-
.../sql/BeamSqlDslSqlStdOperatorsTest.java | 41 +-
.../sdk/extensions/sql/BeamSqlDslUdfUdafTest.java | 43 +-
.../impl/interpreter/BeamSqlFnExecutorTest.java | 204 --
.../interpreter/BeamSqlFnExecutorTestBase.java | 91 -
.../operator/BeamNullExpressionTest.java | 56 -
.../operator/BeamSqlAndOrExpressionTest.java | 70 -
.../operator/BeamSqlCaseExpressionTest.java | 101 -
.../operator/BeamSqlCastExpressionTest.java | 156 --
.../operator/BeamSqlCompareExpressionTest.java | 170 --
.../operator/BeamSqlDotExpressionTest.java | 76 -
.../operator/BeamSqlInputRefExpressionTest.java | 63 -
.../interpreter/operator/BeamSqlPrimitiveTest.java | 95 -
.../operator/BeamSqlReinterpretExpressionTest.java | 130 --
.../operator/BeamSqlUdfExpressionTest.java | 50 -
.../BeamSqlArithmeticExpressionTest.java | 332 ----
.../operator/array/BeamSqlArrayExpressionTest.java | 80 -
.../array/BeamSqlArrayItemExpressionTest.java | 98 -
.../BeamSqlCardinalityExpressionTest.java | 94 -
.../BeamSqlSingleElementExpressionTest.java | 94 -
.../date/BeamSqlCurrentDateExpressionTest.java | 36 -
.../date/BeamSqlCurrentTimeExpressionTest.java | 40 -
.../BeamSqlCurrentTimestampExpressionTest.java | 40 -
.../date/BeamSqlDateExpressionTestBase.java | 36 -
.../date/BeamSqlDatetimeMinusExpressionTest.java | 150 --
...BeamSqlDatetimeMinusIntervalExpressionTest.java | 142 --
.../date/BeamSqlDatetimePlusExpressionTest.java | 186 --
.../BeamSqlIntervalMultiplyExpressionTest.java | 110 --
...eamSqlTimestampMinusIntervalExpressionTest.java | 170 --
...amSqlTimestampMinusTimestampExpressionTest.java | 210 --
.../operator/date/TimeUnitUtilsTest.java | 59 -
.../operator/logical/BeamSqlNotExpressionTest.java | 55 -
.../math/BeamSqlMathBinaryExpressionTest.java | 289 ---
.../math/BeamSqlMathUnaryExpressionTest.java | 446 -----
.../DatetimeReinterpretConversionsTest.java | 68 -
.../IntegerReinterpretConversionsTest.java | 76 -
.../reinterpret/ReinterpretConversionTest.java | 106 -
.../operator/reinterpret/ReinterpreterTest.java | 180 --
.../row/BeamSqlFieldAccessExpressionTest.java | 91 -
.../sql/impl/rel/BeamEnumerableConverterTest.java | 5 +
.../rel/BeamJoinRelUnboundedVsBoundedTest.java | 25 +
.../sql/impl/utils/SqlTypeUtilsTest.java | 76 -
.../BeamSqlComparisonOperatorsIntegrationTest.java | 44 +-
.../BeamSqlDateFunctionsIntegrationTest.java | 32 +-
.../beam/sdk/fn/channel/ManagedChannelFactory.java | 18 +-
.../beam/sdk/fn/channel/SocketAddressFactory.java | 2 +-
.../data/BeamFnDataBufferingOutboundObserver.java | 4 +-
.../sdk/fn/data/BeamFnDataGrpcMultiplexer.java | 4 +-
.../beam/sdk/fn/data/RemoteGrpcPortRead.java | 2 +-
.../beam/sdk/fn/data/RemoteGrpcPortWrite.java | 2 +-
.../sdk/fn/splittabledofn/RestrictionTrackers.java | 90 +-
.../sdk/fn/stream/BufferingStreamObserver.java | 4 +-
.../org/apache/beam/sdk/fn/stream/DataStreams.java | 2 +-
.../beam/sdk/fn/stream/DirectStreamObserver.java | 4 +-
.../stream/ForwardingClientResponseObserver.java | 6 +-
.../sdk/fn/stream/OutboundObserverFactory.java | 4 +-
.../sdk/fn/stream/SynchronizedStreamObserver.java | 2 +-
.../fn/test/InProcessManagedChannelFactory.java | 4 +-
.../org/apache/beam/sdk/fn/test/TestStreams.java | 4 +-
.../sdk/fn/windowing/EncodedBoundedWindow.java | 2 +-
.../sdk/fn/channel/ManagedChannelFactoryTest.java | 6 +-
.../sdk/fn/channel/SocketAddressFactoryTest.java | 2 +-
.../BeamFnDataBufferingOutboundObserverTest.java | 2 +-
.../sdk/fn/data/BeamFnDataGrpcMultiplexerTest.java | 2 +-
.../beam/sdk/fn/data/RemoteGrpcPortReadTest.java | 2 +-
.../beam/sdk/fn/data/RemoteGrpcPortWriteTest.java | 2 +-
.../fn/splittabledofn/RestrictionTrackersTest.java | 72 +
.../apache/beam/sdk/fn/stream/DataStreamsTest.java | 2 +-
.../ForwardingClientResponseObserverTest.java | 6 +-
.../sdk/fn/stream/OutboundObserverFactoryTest.java | 4 +-
.../sdk/fn/windowing/EncodedBoundedWindowTest.java | 2 +-
sdks/java/harness/build.gradle | 2 +-
.../beam/fn/harness/BoundedSourceRunner.java | 2 +-
.../java/org/apache/beam/fn/harness/FnHarness.java | 2 +-
.../beam/fn/harness/PrecombineGroupingTable.java | 2 +-
.../harness/SplittableProcessElementsRunner.java | 8 +-
.../harness/control/AddHarnessIdInterceptor.java | 8 +-
.../fn/harness/control/BeamFnControlClient.java | 4 +-
.../fn/harness/control/ProcessBundleHandler.java | 58 +-
.../beam/fn/harness/control/RegisterHandler.java | 2 +-
.../beam/fn/harness/data/BeamFnDataGrpcClient.java | 2 +-
.../fn/harness/data/QueueingBeamFnDataClient.java | 182 ++
.../fn/harness/logging/BeamFnLoggingClient.java | 12 +-
.../apache/beam/fn/harness/state/BagUserState.java | 2 +-
.../harness/state/BeamFnStateGrpcClientCache.java | 4 +-
.../beam/fn/harness/state/FnApiStateAccessor.java | 2 +-
.../beam/fn/harness/state/MultimapSideInput.java | 2 +-
.../fn/harness/state/StateFetchingIterators.java | 2 +-
.../stream/HarnessStreamObserverFactories.java | 2 +-
.../beam/fn/harness/BoundedSourceRunnerTest.java | 2 +-
.../beam/fn/harness/FnApiDoFnRunnerTest.java | 138 +-
.../org/apache/beam/fn/harness/FnHarnessTest.java | 6 +-
.../harness/control/BeamFnControlClientTest.java | 8 +-
.../harness/control/ProcessBundleHandlerTest.java | 2 +-
.../fn/harness/data/BeamFnDataGrpcClientTest.java | 14 +-
.../data/BeamFnDataInboundObserverTest.java | 2 +-
...Test.java => QueueingBeamFnDataClientTest.java} | 229 ++-
.../harness/logging/BeamFnLoggingClientTest.java | 16 +-
.../beam/fn/harness/state/BagUserStateTest.java | 2 +-
.../state/BeamFnStateGrpcClientCacheTest.java | 16 +-
.../fn/harness/state/FakeBeamFnStateClient.java | 2 +-
.../fn/harness/state/MultimapSideInputTest.java | 2 +-
.../harness/state/StateFetchingIteratorsTest.java | 2 +-
.../stream/HarnessStreamObserverFactoriesTest.java | 4 +-
.../beam/sdk/io/aws/options/AwsModuleTest.java | 4 +-
.../beam/sdk/io/aws/s3/S3ResourceIdTest.java | 2 +-
.../sdk/io/aws/s3/S3WritableByteChannelTest.java | 2 +-
.../org/apache/beam/sdk/io/aws/sns/SnsIOTest.java | 3 +-
.../apache/beam/sdk/io/cassandra/CassandraIO.java | 161 +-
.../beam/sdk/io/cassandra/CassandraService.java | 16 +-
.../sdk/io/cassandra/CassandraServiceImpl.java | 76 +-
.../beam/sdk/io/cassandra/CassandraIOTest.java | 66 +-
sdks/java/io/common/build.gradle | 2 +-
.../sdk/io/elasticsearch/ElasticsearchIOTest.java | 4 +-
.../sdk/io/elasticsearch/ElasticsearchIOTest.java | 4 +-
.../sdk/io/elasticsearch/ElasticsearchIOTest.java | 4 +-
.../beam/sdk/io/elasticsearch/ElasticsearchIO.java | 74 +-
sdks/java/io/file-based-io-tests/build.gradle | 3 +-
.../io/common/FileBasedIOTestPipelineOptions.java | 19 +
.../java/org/apache/beam/sdk/io/text/TextIOIT.java | 44 +-
.../apache/beam/sdk/io/gcp/bigquery/AvroUtils.java | 2 +-
.../beam/sdk/io/gcp/bigquery/BigQueryHelpers.java | 10 +-
.../beam/sdk/io/gcp/bigquery/BigQueryIO.java | 4 +-
.../sdk/io/gcp/bigquery/BigQueryServicesImpl.java | 35 +-
.../beam/sdk/io/gcp/bigquery/BigQueryUtils.java | 2 +-
.../beam/sdk/io/gcp/bigtable/BigtableIO.java | 2 +-
.../apache/beam/sdk/io/gcp/pubsub/PubsubIO.java | 136 +-
.../beam/sdk/io/gcp/testing/BigqueryClient.java | 8 +-
.../sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java | 2 +-
.../io/gcp/bigquery/BigQueryServicesImplTest.java | 88 +-
.../sdk/io/gcp/bigquery/BigQueryToTableIT.java | 6 +-
.../beam/sdk/io/gcp/datastore/V1TestUtil.java | 2 +-
.../beam/sdk/io/gcp/pubsub/PubsubIOTest.java | 193 ++
.../beam/sdk/io/gcp/spanner/OrderedCodeTest.java | 2 +-
.../build.gradle | 55 +-
.../io/hadoop/format/ExternalSynchronization.java | 62 +
.../sdk/io/hadoop/format/HDFSSynchronization.java | 186 ++
.../beam/sdk/io/hadoop/format/HadoopFormatIO.java | 1987 +++++++++++++++++++
.../beam/sdk/io/hadoop/format/HadoopFormats.java | 243 +++
.../sdk/io/hadoop/format/IterableCombinerFn.java | 140 ++
.../beam/sdk/io/hadoop/format}/package-info.java | 10 +-
.../format/ConfigurableEmployeeInputFormat.java | 126 ++
.../apache/beam/sdk/io/hadoop/format/Employee.java | 87 +
.../sdk/io/hadoop/format}/EmployeeInputFormat.java | 10 +-
.../sdk/io/hadoop/format/EmployeeOutputFormat.java | 73 +
.../io/hadoop/format/HDFSSynchronizationTest.java | 173 ++
.../hadoop/format/HadoopFormatIOCassandraIT.java | 197 ++
.../hadoop/format/HadoopFormatIOCassandraTest.java | 235 +++
.../io/hadoop/format/HadoopFormatIOElasticIT.java | 220 +++
.../hadoop/format/HadoopFormatIOElasticTest.java | 277 +++
.../sdk/io/hadoop/format/HadoopFormatIOIT.java | 189 ++
.../io/hadoop/format/HadoopFormatIOReadTest.java} | 151 +-
.../format/HadoopFormatIOSequenceFileTest.java | 372 ++++
.../hadoop/format/HadoopFormatIOTestOptions.java | 76 +
.../io/hadoop/format/HadoopFormatIOWriteTest.java | 314 +++
.../sdk/io/hadoop/format/IterableCombinerTest.java | 98 +
.../format}/ReuseObjectsEmployeeInputFormat.java | 10 +-
.../sdk/io/hadoop/format}/TestEmployeeDataSet.java | 4 +-
.../sdk/io/hadoop/format}/TestRowDBWritable.java | 13 +-
.../src/test/resources/cassandra.yaml | 0
sdks/java/io/hadoop-input-format/build.gradle | 1 +
.../io/hadoop/inputformat/HadoopInputFormatIO.java | 871 +--------
.../io/hadoop/inputformat/EmployeeInputFormat.java | 6 +-
.../inputformat/HadoopInputFormatIOTest.java | 462 -----
.../ReuseObjectsEmployeeInputFormat.java | 4 +-
.../io/hadoop/inputformat/TestEmployeeDataSet.java | 2 +-
.../io/hadoop/inputformat/TestRowDBWritable.java | 9 +-
.../beam/sdk/io/hbase/HBaseReadSplittableDoFn.java | 3 +-
.../java/org/apache/beam/sdk/io/jdbc/JdbcIO.java | 3 +-
.../java/org/apache/beam/sdk/io/jms/JmsIO.java | 29 +-
.../java/org/apache/beam/sdk/io/kafka/KafkaIO.java | 13 +
.../org/apache/beam/sdk/io/kafka/KafkaIOTest.java | 4 +-
.../beam/sdk/io/kinesis/ShardReadersPool.java | 25 +-
.../beam/sdk/io/kinesis/ShardReadersPoolTest.java | 2 +-
sdks/java/io/kudu/build.gradle | 2 +-
sdks/java/io/mongodb/build.gradle | 6 +-
.../beam/sdk/io/mongodb/MongoDbGridFSIO.java | 4 +-
.../org/apache/beam/sdk/io/mongodb/MongoDbIO.java | 186 +-
.../org/apache/beam/sdk/io/mongodb/SSLUtils.java | 75 +
.../beam/sdk/io/mongodb/MongoDBGridFSIOTest.java | 2 +-
.../apache/beam/sdk/io/mongodb/MongoDbIOTest.java | 93 +
sdks/java/io/rabbitmq/build.gradle | 2 +-
sdks/java/io/synthetic/build.gradle | 2 +-
.../beam/sdk/io/synthetic/SyntheticOptions.java | 8 +
sdks/java/io/tika/build.gradle | 2 +-
.../java/org/apache/beam/sdk/io/xml/XmlIO.java | 4 +-
sdks/java/javadoc/build.gradle | 69 +-
sdks/java/maven-archetypes/examples/build.gradle | 2 +-
.../src/main/resources/archetype-resources/pom.xml | 28 +
sdks/java/maven-archetypes/starter/build.gradle | 2 +-
sdks/java/testing/load-tests/build.gradle | 3 +-
.../beam/sdk/loadtests/CoGroupByKeyLoadTest.java | 22 +-
.../apache/beam/sdk/loadtests/CombineLoadTest.java | 22 +-
.../beam/sdk/loadtests/ConsoleResultPublisher.java | 15 +-
.../beam/sdk/loadtests/GroupByKeyLoadTest.java | 13 +-
.../org/apache/beam/sdk/loadtests/LoadTest.java | 64 +-
.../apache/beam/sdk/loadtests/LoadTestOptions.java | 17 +
.../apache/beam/sdk/loadtests/LoadTestResult.java | 67 +
.../apache/beam/sdk/loadtests/ParDoLoadTest.java | 9 +-
.../loadtests/SyntheticDataPubSubPublisher.java | 107 +
.../{MetricsPublisher.java => ByteMonitor.java} | 30 +-
.../{MetricsMonitor.java => TimeMonitor.java} | 20 +-
sdks/java/testing/nexmark/build.gradle | 2 +-
.../java/org/apache/beam/sdk/nexmark/Main.java | 39 +-
.../beam/sdk/nexmark/NexmarkConfiguration.java | 4 +
.../apache/beam/sdk/nexmark/NexmarkLauncher.java | 62 +-
.../org/apache/beam/sdk/nexmark/NexmarkPerf.java | 11 +-
.../apache/beam/sdk/nexmark/NexmarkQueryName.java | 3 +-
.../org/apache/beam/sdk/nexmark/NexmarkUtils.java | 58 +-
.../sdk/nexmark/queries/SessionSideInputJoin.java | 106 +
.../nexmark/queries/SessionSideInputJoinModel.java | 151 ++
.../sdk/nexmark/sources/generator/Generator.java | 3 +-
.../sources/generator/model/PersonGenerator.java | 8 +-
.../beam/sdk/nexmark/PerfsToBigQueryTest.java | 50 +-
.../nexmark/queries/SessionSideInputJoinTest.java | 212 ++
sdks/java/testing/test-utils/build.gradle | 2 +-
.../apache/beam/sdk/testutils/NamedTestResult.java | 76 +
.../org/apache/beam/sdk/testutils/TestResult.java | 16 +-
.../beam/sdk/testutils/metrics/MetricsReader.java | 97 +-
.../apache/beam/sdk/testutils}/package-info.java | 5 +-
.../sdk/testutils/publishing/BigQueryClient.java | 12 +
.../publishing/BigQueryResultsPublisher.java | 60 +
.../sdk/testutils/fakes/FakeBigQueryClient.java | 4 +-
...ient.java => FakeBigQueryResultsPublisher.java} | 44 +-
.../sdk/testutils/metrics/MetricsReaderTest.java | 52 +-
.../publishing/BigQueryResultsPublisherTest.java | 102 +
sdks/python/apache_beam/coders/coder_impl.pxd | 65 +-
sdks/python/apache_beam/coders/coder_impl.py | 169 +-
sdks/python/apache_beam/coders/coders.py | 147 +-
.../apache_beam/coders/coders_test_common.py | 52 +-
.../cookbook/bigquery_tornadoes_it_test.py | 2 +-
.../apache_beam/examples/snippets/snippets.py | 5 +-
sdks/python/apache_beam/internal/http_client.py | 70 +
.../apache_beam/internal/http_client_test.py | 109 ++
sdks/python/apache_beam/io/__init__.py | 1 +
sdks/python/apache_beam/io/filebasedsource_test.py | 20 +-
sdks/python/apache_beam/io/filesystem.py | 4 +-
sdks/python/apache_beam/io/filesystem_test.py | 23 +-
sdks/python/apache_beam/io/filesystemio.py | 4 +-
sdks/python/apache_beam/io/filesystemio_test.py | 33 +-
sdks/python/apache_beam/io/gcp/bigquery.py | 18 +-
sdks/python/apache_beam/io/gcp/bigquery_test.py | 21 +-
sdks/python/apache_beam/io/gcp/gcsio.py | 174 +-
.../apache_beam/io/gcp/gcsio_integration_test.py | 183 ++
sdks/python/apache_beam/io/gcp/gcsio_test.py | 48 +-
sdks/python/apache_beam/io/gcp/pubsub_test.py | 106 +-
sdks/python/apache_beam/io/parquetio.py | 472 +++++
sdks/python/apache_beam/io/parquetio_it_test.py | 176 ++
sdks/python/apache_beam/io/parquetio_test.py | 463 +++++
sdks/python/apache_beam/io/range_trackers.py | 28 +-
sdks/python/apache_beam/io/range_trackers_test.py | 133 +-
.../apache_beam/io/source_test_utils_test.py | 29 +-
sdks/python/apache_beam/io/sources_test.py | 18 +-
sdks/python/apache_beam/io/textio.py | 27 +-
sdks/python/apache_beam/io/textio_test.py | 26 +
sdks/python/apache_beam/io/tfrecordio_test.py | 9 +-
.../python/apache_beam/options/pipeline_options.py | 8 +
.../apache_beam/options/pipeline_options_test.py | 17 +
sdks/python/apache_beam/pipeline.py | 13 +-
sdks/python/apache_beam/portability/python_urns.py | 16 +
.../runners/dataflow/dataflow_runner.py | 106 +-
.../runners/dataflow/internal/apiclient.py | 51 +-
.../apache_beam/runners/dataflow/internal/names.py | 4 +-
.../runners/dataflow/test_dataflow_runner.py | 13 +-
.../apache_beam/runners/direct/direct_runner.py | 14 +-
.../runners/direct/test_direct_runner.py | 10 +-
.../runners/interactive/interactive_runner.py | 12 +-
.../python/apache_beam/runners/pipeline_context.py | 7 +-
.../runners/portability/flink_runner_test.py | 93 +-
.../runners/portability/fn_api_runner.py | 1277 ++++--------
.../runners/portability/fn_api_runner_test.py | 69 +-
.../portability/fn_api_runner_transforms.py | 948 +++++++++
.../apache_beam/runners/portability/job_server.py | 27 +-
.../runners/portability/local_job_service.py | 84 +-
.../runners/portability/local_job_service_main.py | 4 +-
.../runners/portability/portable_runner.py | 153 +-
.../runners/portability/portable_runner_test.py | 45 +-
.../apache_beam/runners/portability/stager.py | 3 +-
sdks/python/apache_beam/runners/runner.py | 14 +-
.../apache_beam/runners/worker/bundle_processor.py | 90 +-
.../apache_beam/runners/worker/data_plane.py | 14 +-
.../apache_beam/runners/worker/operations.py | 2 +
.../apache_beam/runners/worker/sdk_worker.py | 77 +-
.../runners/worker/worker_id_interceptor.py | 5 +-
.../testing/load_tests/co_group_by_key_test.py | 75 +-
.../apache_beam/testing/load_tests/combine_test.py | 69 +-
.../testing/load_tests/group_by_key_test.py | 76 +-
.../testing/load_tests/load_test_metrics_utils.py | 165 +-
.../apache_beam/testing/load_tests/pardo_test.py | 132 +-
.../testing/load_tests/sideinput_test.py | 203 ++
.../apache_beam/testing/synthetic_pipeline.py | 2 +-
.../apache_beam/testing/synthetic_pipeline_test.py | 2 +-
sdks/python/apache_beam/testing/test_utils.py | 7 +-
.../apache_beam/tools/coders_microbenchmark.py | 50 +-
sdks/python/apache_beam/transforms/core.py | 6 +-
.../apache_beam/transforms/ptransform_test.py | 114 +-
.../apache_beam/transforms/userstate_test.py | 32 +-
sdks/python/apache_beam/transforms/util.py | 18 +-
sdks/python/apache_beam/transforms/util_test.py | 8 +
sdks/python/apache_beam/transforms/window.py | 26 +-
sdks/python/apache_beam/utils/proto_utils.py | 8 +-
sdks/python/apache_beam/utils/windowed_value.pxd | 19 +
sdks/python/apache_beam/utils/windowed_value.py | 61 +-
sdks/python/build.gradle | 161 +-
sdks/python/container/base_image_requirements.txt | 2 +-
sdks/python/container/build.gradle | 4 +-
.../python/precommit/dataflow}/build.gradle | 27 +-
sdks/python/scripts/generate_pydoc.sh | 1 +
sdks/python/scripts/run_integration_test.sh | 20 +-
sdks/python/setup.py | 24 +-
sdks/python/tox.ini | 4 +-
settings.gradle | 20 +
vendor/grpc-1_13_1/build.gradle | 4 +-
vendor/sdks-java-extensions-protobuf/build.gradle | 13 +-
website/Gemfile.lock | 16 +-
website/Rakefile | 13 +-
website/_config.yml | 2 +-
website/src/.htaccess | 2 +-
website/src/_data/authors.yml | 3 +
website/src/_includes/section-menu/community.html | 1 +
website/src/_posts/2017-01-09-added-apex-runner.md | 4 +-
website/src/_posts/2018-12-13-beam-2.9.0.md | 62 +
website/src/community/contact-us.md | 2 +-
website/src/community/in-person.md | 47 +
website/src/contribute/committer-guide.md | 37 +-
website/src/contribute/index.md | 25 +-
website/src/contribute/postcommits-guides.md | 2 +-
.../src/contribute/postcommits-policies-details.md | 18 +
website/src/contribute/ptransform-style-guide.md | 4 +-
website/src/contribute/release-guide.md | 2 +-
.../documentation/io/built-in-google-bigquery.md | 2 +-
website/src/documentation/runners/apex.md | 2 +-
website/src/documentation/runners/flink.md | 2 +-
website/src/documentation/sdks/euphoria.md | 4 +-
.../src/documentation/sdks/java-dependencies.md | 497 +++--
website/src/documentation/sdks/nexmark.md | 7 +-
.../src/documentation/sdks/python-dependencies.md | 64 +-
website/src/get-started/downloads.md | 7 +
997 files changed, 28980 insertions(+), 21156 deletions(-)
rename .test-infra/jenkins/{job_PostCommit_Go_GradleBuild.groovy => job_PostCommit_Go.groovy} (91%)
copy .test-infra/jenkins/{job_PostCommit_Java_GradleBuild.groovy => job_PostCommit_Java.groovy} (92%)
rename .test-infra/jenkins/{job_PostCommit_Java_PortabilityApi_GradleBuild.groovy => job_PostCommit_Java_PortabilityApi.groovy} (94%)
copy .test-infra/jenkins/{job_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow.groovy => job_PostCommit_Java_ValidatesRunner_DataflowPortabilityExecutableStage.groovy} (83%)
rename .test-infra/jenkins/{job_PostCommit_Java_GradleBuild.groovy => job_PostCommit_SQL.groovy} (78%)
rename sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/array/package-info.java => .test-infra/jenkins/job_PreCommit_Portable_Python.groovy (71%)
rename .test-infra/jenkins/{job_Release_Gradle_NightlySnapshot.groovy => job_Release_NightlySnapshot.groovy} (96%)
copy runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/{SerializablePipelineOptions.java => PipelineOptionsSerializationUtils.java} (58%)
create mode 100644 runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExecutableStageTranslationTest.java
create mode 100644 runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilder.java
rename runners/{samza/src/main/java/org/apache/beam/runners/samza/util => core-java/src/main/java/org/apache/beam/runners/core/serialization}/Base64Serializer.java (94%)
rename {sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date => runners/core-java/src/main/java/org/apache/beam/runners/core/serialization}/package-info.java (89%)
create mode 100644 runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/SimpleMonitoringInfoBuilderTest.java
copy {model/pipeline => runners/flink/1.6}/build.gradle (59%)
copy {sdks/java/build-tools => runners/flink/1.6/job-server-container}/build.gradle (79%)
copy {model/pipeline => runners/flink/1.6/job-server}/build.gradle (65%)
copy runners/flink/{build.gradle => flink_runner.gradle} (89%)
copy runners/flink/job-server-container/{build.gradle => flink_job_server_container.gradle} (61%)
copy runners/flink/job-server/{build.gradle => flink_job_server.gradle} (50%)
rename runners/flink/src/main/java/org/apache/beam/runners/flink/{PipelineTranslationOptimizer.java => PipelineTranslationModeOptimizer.java} (79%)
create mode 100644 runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FileReporter.java
create mode 100644 runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/Metrics.java
create mode 100644 runners/flink/src/main/java/org/apache/beam/runners/flink/translation/utils/NoopLock.java
create mode 100644 runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkStreamingTransformTranslatorsTest.java
create mode 100644 runners/flink/src/test/java/org/apache/beam/runners/flink/FlinkTransformOverridesTest.java
create mode 100644 runners/flink/src/test/java/org/apache/beam/runners/flink/PipelineTranslationModeOptimizerTest.java
create mode 100644 runners/flink/src/test/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainerTest.java
create mode 100644 runners/flink/src/test/java/org/apache/beam/runners/flink/streaming/GroupByWithNullValuesTest.java
create mode 100644 runners/flink/src/test/java/org/apache/beam/runners/flink/translation/FlinkPipelineTranslatorUtilsTest.java
rename runners/flink/src/test/java/org/apache/beam/runners/flink/{streaming => translation/wrappers/streaming/io}/TestCountingSource.java (94%)
rename runners/flink/src/test/java/org/apache/beam/runners/flink/{streaming => translation/wrappers/streaming/io}/UnboundedSourceWrapperTest.java (82%)
rename runners/google-cloud-dataflow-java/{worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/IdGeneratorTest.java => src/test/java/org/apache/beam/runners/dataflow/util/GCSUploadMain.java} (55%)
delete mode 100644 runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/ServerFactory.java
delete mode 100644 runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/SocketAddressFactory.java
create mode 100644 runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/ProcessRemoteBundleOperation.java
copy runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/graph/{RegisterNodeFunction.java => CreateExecutableStageNodeFunction.java} (56%)
delete mode 100644 runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/ServerFactoryTest.java
delete mode 100644 runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/fn/SocketAddressFactoryTest.java
copy runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/{ProcessEnvironmentFactory.java => ExternalEnvironmentFactory.java} (59%)
create mode 100644 runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironment.java
create mode 100644 runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/StaticRemoteEnvironmentFactory.java
create mode 100644 runners/reference/java/src/main/java/org/apache/beam/runners/reference/ExternalWorkerService.java
rename sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlOperator.java => runners/samza/src/main/java/org/apache/beam/runners/samza/runtime/SamzaDoFnInvokerRegistrar.java (57%)
create mode 100644 runners/samza/src/main/java/org/apache/beam/runners/samza/transforms/GroupWithoutRepartition.java
copy sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FieldValueGetterFactory.java => runners/samza/src/main/java/org/apache/beam/runners/samza/transforms/UpdatingCombineFn.java (57%)
rename {sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic => runners/samza/src/main/java/org/apache/beam/runners/samza/transforms}/package-info.java (87%)
rename runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/{ => mocks}/DatasetSourceMockBatch.java (72%)
rename runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/{ => mocks}/ReadSourceTranslatorMockBatch.java (81%)
create mode 100644 sdks/go/pkg/beam/core/runtime/exec/fn_arity.go
create mode 100644 sdks/go/pkg/beam/core/runtime/exec/fn_arity.tmpl
copy sdks/go/pkg/beam/core/util/ioutilx/{read.go => write.go} (67%)
create mode 100644 sdks/go/pkg/beam/core/util/reflectx/structs.go
copy sdks/go/pkg/beam/{core/util/ioutilx/read.go => util/gcsx/gcs_test.go} (63%)
create mode 100644 sdks/java/bom/build.gradle
create mode 100644 sdks/java/bom/pom.xml.template
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricsOptions.java
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/CachingFactory.java
rename sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/package-info.java => core/src/main/java/org/apache/beam/sdk/schemas/Factory.java} (76%)
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FieldValueTypeInformation.java
rename sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/map/package-info.java => core/src/main/java/org/apache/beam/sdk/schemas/FieldValueTypeInformationFactory.java} (71%)
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/FromRowUsingCreator.java
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaUserTypeConstructorCreator.java
rename sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/collection/package-info.java => core/src/main/java/org/apache/beam/sdk/schemas/SchemaUserTypeCreator.java} (73%)
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SetterBasedCreatorFactory.java
copy runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/UserTypeCreatorFactory.java (76%)
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
rename sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/{PojoValueGetterFactory.java => FieldValueTypeSupplier.java} (64%)
delete mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JavaBeanGetterFactory.java
delete mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/JavaBeanSetterFactory.java
delete mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/PojoValueSetterFactory.java
copy runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/core/src/main/java/org/apache/beam/sdk/testing/DataflowPortabilityExecutableStageUnsupported.java (69%)
copy runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/core/src/main/java/org/apache/beam/sdk/testing/UsesSideInputs.java (77%)
copy sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/{SimpleFunction.java => InferableFunction.java} (64%)
copy sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/{SerializableFunction.java => ProcessFunction.java} (51%)
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/Backlog.java
create mode 100644 sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/Backlogs.java
copy runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/Restrictions.java (65%)
create mode 100644 sdks/java/core/src/test/avro/org/apache/beam/sdk/schemas/test.avsc
create mode 100644 sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/AvroSchemaTest.java
create mode 100644 sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/CustomHttpErrors.java
copy runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallCustomError.java (70%)
copy runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpCallMatcher.java (67%)
rename sdks/java/extensions/{sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlExpressionExecutor.java => google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpRequestWrapper.java} (54%)
rename runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/IdGenerator.java => sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/HttpResponseWrapper.java (55%)
create mode 100644 sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/util/CustomHttpErrorsTest.java
copy sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/{UdfImpl.java => ScalarFunctionImpl.java} (84%)
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlExpressionEnvironment.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlExpressionEnvironments.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutor.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlBinaryOperator.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlCaseExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlCastExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlCorrelVariableExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlDefaultExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlDotExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlInputRefExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlLocalRefExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlOperatorExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlPrimitive.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlUdfExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlUnaryOperator.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/DateOperators.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/StringOperators.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic/BeamSqlArithmeticExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic/BeamSqlDivideExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic/BeamSqlMinusExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic/BeamSqlModExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic/BeamSqlMultiplyExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic/BeamSqlPlusExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/array/BeamSqlArrayExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/array/BeamSqlArrayItemExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/collection/BeamSqlCardinalityExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/collection/BeamSqlSingleElementExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlCompareExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlEqualsExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlGreaterThanExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlGreaterThanOrEqualsExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlIsNotNullExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlIsNullExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlLessThanExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlLessThanOrEqualsExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlLikeExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlNotEqualsExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/comparison/BeamSqlNotLikeExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlCurrentDateExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlCurrentTimeExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlCurrentTimestampExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimePlusExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlIntervalMultiplyExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlTimestampMinusIntervalExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlTimestampMinusTimestampExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/TimeUnitUtils.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/logical/BeamSqlAndExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/logical/BeamSqlNotExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/logical/BeamSqlOrExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/map/BeamSqlMapExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/map/BeamSqlMapItemExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlAbsExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlAcosExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlAsinExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlAtan2Expression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlAtanExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlCeilExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlCosExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlCotExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlDegreesExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlExpExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlFloorExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlLnExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlLogExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlMathBinaryExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlMathUnaryExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlPiExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlPowerExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlRadiansExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlRandExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlRandIntegerExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlRoundExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlSignExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlSinExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlTanExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlTruncateExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/package-info.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/package-info.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/BeamSqlReinterpretExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/DatetimeReinterpretConversions.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/IntegerReinterpretConversions.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/ReinterpretConversion.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/Reinterpreter.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/package-info.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/row/BeamSqlFieldAccessExpression.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/row/package-info.java
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/package-info.java
rename sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/{interpreter/operator/logical/BeamSqlLogicalExpression.java => planner/BeamJavaTypeFactory.java} (50%)
delete mode 100644 sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/SqlTypeUtils.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutorTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/BeamSqlFnExecutorTestBase.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamNullExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlAndOrExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlCaseExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlCastExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlCompareExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlDotExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlInputRefExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlPrimitiveTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlReinterpretExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/BeamSqlUdfExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/arithmetic/BeamSqlArithmeticExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/array/BeamSqlArrayExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/array/BeamSqlArrayItemExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/collection/BeamSqlCardinalityExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/collection/BeamSqlSingleElementExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlCurrentDateExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlCurrentTimeExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlCurrentTimestampExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDateExpressionTestBase.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimePlusExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlIntervalMultiplyExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlTimestampMinusIntervalExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlTimestampMinusTimestampExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/TimeUnitUtilsTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/logical/BeamSqlNotExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlMathBinaryExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/math/BeamSqlMathUnaryExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/DatetimeReinterpretConversionsTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/IntegerReinterpretConversionsTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/ReinterpretConversionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/reinterpret/ReinterpreterTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/row/BeamSqlFieldAccessExpressionTest.java
delete mode 100644 sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/utils/SqlTypeUtilsTest.java
create mode 100644 sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/QueueingBeamFnDataClient.java
copy sdks/java/harness/src/test/java/org/apache/beam/fn/harness/data/{BeamFnDataGrpcClientTest.java => QueueingBeamFnDataClientTest.java} (63%)
copy sdks/java/io/{hadoop-input-format => hadoop-format}/build.gradle (66%)
create mode 100644 sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/ExternalSynchronization.java
create mode 100644 sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HDFSSynchronization.java
create mode 100644 sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
create mode 100644 sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormats.java
create mode 100644 sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/IterableCombinerFn.java
copy sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/logical => io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format}/package-info.java (76%)
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/ConfigurableEmployeeInputFormat.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/Employee.java
copy sdks/java/io/{hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat => hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format}/EmployeeInputFormat.java (93%)
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/EmployeeOutputFormat.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HDFSSynchronizationTest.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraIT.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOCassandraTest.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOElasticIT.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOElasticTest.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOIT.java
copy sdks/java/io/{hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIOTest.java => hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOReadTest.java} (85%)
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOSequenceFileTest.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOTestOptions.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIOWriteTest.java
create mode 100644 sdks/java/io/hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format/IterableCombinerTest.java
copy sdks/java/io/{hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat => hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format}/ReuseObjectsEmployeeInputFormat.java (93%)
copy sdks/java/io/{hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat => hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format}/TestEmployeeDataSet.java (97%)
copy sdks/java/io/{hadoop-input-format/src/test/java/org/apache/beam/sdk/io/hadoop/inputformat => hadoop-format/src/test/java/org/apache/beam/sdk/io/hadoop/format}/TestRowDBWritable.java (88%)
copy sdks/java/io/{hadoop-input-format => hadoop-format}/src/test/resources/cassandra.yaml (100%)
create mode 100644 sdks/java/io/mongodb/src/main/java/org/apache/beam/sdk/io/mongodb/SSLUtils.java
copy runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ConsoleResultPublisher.java (69%)
create mode 100644 sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTestResult.java
create mode 100644 sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/SyntheticDataPubSubPublisher.java
rename sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/metrics/{MetricsPublisher.java => ByteMonitor.java} (51%)
rename sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/metrics/{MetricsMonitor.java => TimeMonitor.java} (69%)
create mode 100644 sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/SessionSideInputJoin.java
create mode 100644 sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/SessionSideInputJoinModel.java
create mode 100644 sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/queries/SessionSideInputJoinTest.java
create mode 100644 sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/NamedTestResult.java
rename runners/flink/src/main/java/org/apache/beam/runners/flink/TranslationMode.java => sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/TestResult.java (70%)
rename sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/logical => testing/test-utils/src/main/java/org/apache/beam/sdk/testutils}/package-info.java (88%)
create mode 100644 sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/publishing/BigQueryResultsPublisher.java
copy sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/fakes/{FakeBigQueryClient.java => FakeBigQueryResultsPublisher.java} (52%)
create mode 100644 sdks/java/testing/test-utils/src/test/java/org/apache/beam/sdk/testutils/publishing/BigQueryResultsPublisherTest.java
create mode 100644 sdks/python/apache_beam/internal/http_client.py
create mode 100644 sdks/python/apache_beam/internal/http_client_test.py
create mode 100644 sdks/python/apache_beam/io/gcp/gcsio_integration_test.py
create mode 100644 sdks/python/apache_beam/io/parquetio.py
create mode 100644 sdks/python/apache_beam/io/parquetio_it_test.py
create mode 100644 sdks/python/apache_beam/io/parquetio_test.py
create mode 100644 sdks/python/apache_beam/runners/portability/fn_api_runner_transforms.py
create mode 100644 sdks/python/apache_beam/testing/load_tests/sideinput_test.py
copy {model/job-management => sdks/python/precommit/dataflow}/build.gradle (50%)
create mode 100644 website/src/_posts/2018-12-13-beam-2.9.0.md
create mode 100644 website/src/community/in-person.md
[beam] 32/50: Improve exception flow
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit a3a87b49d589061c280cfc982a85ec1f85dd0138
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Tue Dec 11 16:00:26 2018 +0100
Improve exception flow
---
.../spark/structuredstreaming/translation/io/DatasetSource.java | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
index 75cdd5d..d23ecf3 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
@@ -30,7 +30,6 @@ import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.spark.sql.catalyst.InternalRow;
-import org.apache.spark.sql.sources.DataSourceRegister;
import org.apache.spark.sql.sources.v2.ContinuousReadSupport;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
@@ -137,6 +136,8 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport{
try {
reader = source.createReader(options);
} catch (IOException e) {
+ throw new RuntimeException(
+ "Error creating BoundedReader " + reader.getClass().getCanonicalName(), e);
}
return new DatasetMicroBatchPartitionReader(reader);
}
@@ -145,9 +146,9 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport{
return result;
} catch (Exception e) {
- e.printStackTrace();
+ throw new RuntimeException(
+ "Error in splitting BoundedSource " + source.getClass().getCanonicalName(), e);
}
- return result;
}
}
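One subtlety in the hunk above: when source.createReader(options) throws, the local variable reader is still null, so the new error message calls getCanonicalName() on a null reader and would itself raise a NullPointerException. A minimal null-safe sketch of the same idea, assuming the surrounding DatasetSource fields from the diff (ReaderCreation is a hypothetical helper, not committed code):

    import java.io.IOException;
    import org.apache.beam.sdk.io.BoundedSource;
    import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
    import org.apache.beam.sdk.options.PipelineOptions;

    class ReaderCreation {
      static <T> BoundedReader<T> createReaderOrFail(
          BoundedSource<T> source, PipelineOptions options) {
        try {
          return source.createReader(options);
        } catch (IOException e) {
          // Name the source class: the reader does not exist yet when
          // createReader throws, so reader.getClass() would NPE here.
          throw new RuntimeException(
              "Error creating BoundedReader for " + source.getClass().getCanonicalName(), e);
        }
      }
    }

Wrapping the checked IOException in a RuntimeException keeps the original exception as the cause, so the stack trace survives while the signature of createPartitionReader() stays unchanged.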
[beam] 24/50: Create Datasets manipulation methods
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 31fb182784d86a633ad619d27dd5454ecff3291f
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 29 16:11:35 2018 +0100
Create Datasets manipulation methods
---
.../translation/TranslationContext.java | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index a3276bf..98f77af 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -46,7 +46,9 @@ import org.apache.spark.sql.streaming.StreamingQueryException;
*/
public class TranslationContext {
+ /** All the datasets of the DAG */
private final Map<PValue, Dataset<?>> datasets;
+ /** datasets that are not used as input to other datasets (leaves of the DAG) */
private final Set<Dataset<?>> leaves;
private final SparkPipelineOptions options;
@@ -68,7 +70,7 @@ public class TranslationContext {
this.sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
this.options = options;
this.datasets = new HashMap<>();
- this.leaves = new LinkedHashSet<>();
+ this.leaves = new HashSet<>();
}
// --------------------------------------------------------------------------------------------
@@ -82,6 +84,20 @@ public class TranslationContext {
// Datasets methods
// --------------------------------------------------------------------------------------------
+ @SuppressWarnings("unchecked")
+ public <T> Dataset<WindowedValue<T>> getDataset(PValue value) {
+ Dataset<?> dataset = datasets.get(value);
+ // assume that the Dataset is used as an input if retrieved here. So it is not a leaf anymore
+ leaves.remove(dataset);
+ return (Dataset<WindowedValue<T>>) dataset;
+ }
+
+ public <T> void putDataset(PValue value, Dataset<WindowedValue<T>> dataset) {
+ if (!datasets.containsKey(value)) {
+ datasets.put(value, dataset);
+ leaves.add(dataset);
+ }
+ }
// --------------------------------------------------------------------------------------------
// PCollections methods
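The two methods above encode a simple invariant: a Dataset registered with putDataset() is a leaf of the DAG until some downstream translator consumes it through getDataset(), and only the surviving leaves need a terminal writer when the pipeline starts. A self-contained sketch of that bookkeeping (LeafTracker is an illustrative class, not part of the commit):

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class LeafTracker<K, D> {
      private final Map<K, D> datasets = new HashMap<>();
      private final Set<D> leaves = new HashSet<>();

      void put(K key, D dataset) {
        if (!datasets.containsKey(key)) {
          datasets.put(key, dataset);
          leaves.add(dataset); // freshly produced: nothing consumes it yet
        }
      }

      D get(K key) {
        D dataset = datasets.get(key);
        leaves.remove(dataset); // used as an input, so no longer a leaf
        return dataset;
      }

      Set<D> leaves() {
        return leaves; // only these need an action/writer at run time
      }
    }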
[beam] 30/50: Apply spotless
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit ebbab698a32c2c3b721f21a9805ad99927246f22
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Dec 7 12:08:51 2018 +0100
Apply spotless
---
.../translation/TranslationContext.java | 7 --
.../batch/ReadSourceTranslatorBatch.java | 4 -
.../translation/io/DatasetSource.java | 109 +++++++++++++--------
3 files changed, 68 insertions(+), 52 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 52ed11f..0f2493d 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -33,18 +33,11 @@ import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PValue;
import org.apache.beam.sdk.values.TupleTag;
-import org.apache.beam.sdk.values.WindowingStrategy;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.ForeachWriter;
-import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
-import org.apache.spark.sql.execution.datasources.DataSource;
-import org.apache.spark.sql.execution.streaming.Source;
-import org.apache.spark.sql.sources.v2.DataSourceOptions;
-import org.apache.spark.sql.sources.v2.ReadSupport;
-import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.streaming.StreamingQueryException;
/**
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index 05dc374..63f2fdf 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -19,14 +19,12 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import java.io.IOException;
import org.apache.beam.runners.core.construction.ReadTranslation;
-import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSource;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.spark.sql.Dataset;
@@ -57,6 +55,4 @@ class ReadSourceTranslatorBatch<T>
context.putDataset(output, dataset);
}
-
-
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
index 60bdab6..f230a70 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.io;
import static com.google.common.base.Preconditions.checkArgument;
@@ -25,8 +42,8 @@ import org.apache.spark.sql.types.StructType;
/**
* This is a spark structured streaming {@link DataSourceV2} implementation. As Continuous streaming
- * is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}.
- * This class is just a mix-in.
+ * is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}. This
+ * class is just a mix-in.
*/
public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
@@ -41,79 +58,87 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
this.numPartitions = context.getSparkSession().sparkContext().defaultParallelism();
checkArgument(this.numPartitions > 0, "Number of partitions must be greater than zero.");
this.bundleSize = context.getOptions().getBundleSize();
-
}
- @Override public MicroBatchReader createMicroBatchReader(Optional<StructType> schema,
- String checkpointLocation, DataSourceOptions options) {
+ @Override
+ public MicroBatchReader createMicroBatchReader(
+ Optional<StructType> schema, String checkpointLocation, DataSourceOptions options) {
return new DatasetMicroBatchReader(schema, checkpointLocation, options);
}
- /**
- * This class can be mapped to Beam {@link BoundedSource}.
- */
+ /** This class can be mapped to Beam {@link BoundedSource}. */
private class DatasetMicroBatchReader implements MicroBatchReader {
private Optional<StructType> schema;
private String checkpointLocation;
private DataSourceOptions options;
- private DatasetMicroBatchReader(Optional<StructType> schema, String checkpointLocation,
- DataSourceOptions options) {
+ private DatasetMicroBatchReader(
+ Optional<StructType> schema, String checkpointLocation, DataSourceOptions options) {
//TODO deal with schema and options
}
- @Override public void setOffsetRange(Optional<Offset> start, Optional<Offset> end) {
+ @Override
+ public void setOffsetRange(Optional<Offset> start, Optional<Offset> end) {
//TODO extension point for SDF
}
- @Override public Offset getStartOffset() {
+ @Override
+ public Offset getStartOffset() {
//TODO extension point for SDF
return null;
}
- @Override public Offset getEndOffset() {
+ @Override
+ public Offset getEndOffset() {
//TODO extension point for SDF
return null;
}
- @Override public Offset deserializeOffset(String json) {
+ @Override
+ public Offset deserializeOffset(String json) {
//TODO extension point for SDF
return null;
}
- @Override public void commit(Offset end) {
+ @Override
+ public void commit(Offset end) {
//TODO no more to read after end Offset
}
- @Override public void stop() {
- }
+ @Override
+ public void stop() {}
- @Override public StructType readSchema() {
+ @Override
+ public StructType readSchema() {
return null;
}
- @Override public List<InputPartition<InternalRow>> planInputPartitions() {
+ @Override
+ public List<InputPartition<InternalRow>> planInputPartitions() {
List<InputPartition<InternalRow>> result = new ArrayList<>();
long desiredSizeBytes;
SparkPipelineOptions options = context.getOptions();
try {
- desiredSizeBytes = (bundleSize == null) ?
- source.getEstimatedSizeBytes(options) / numPartitions :
- bundleSize;
+ desiredSizeBytes =
+ (bundleSize == null)
+ ? source.getEstimatedSizeBytes(options) / numPartitions
+ : bundleSize;
List<? extends BoundedSource<T>> sources = source.split(desiredSizeBytes, options);
for (BoundedSource<T> source : sources) {
- result.add(new InputPartition<InternalRow>() {
-
- @Override public InputPartitionReader<InternalRow> createPartitionReader() {
- BoundedReader<T> reader = null;
- try {
- reader = source.createReader(options);
- } catch (IOException e) {
- }
- return new DatasetMicroBatchPartitionReader(reader);
- }
- });
+ result.add(
+ new InputPartition<InternalRow>() {
+
+ @Override
+ public InputPartitionReader<InternalRow> createPartitionReader() {
+ BoundedReader<T> reader = null;
+ try {
+ reader = source.createReader(options);
+ } catch (IOException e) {
+ }
+ return new DatasetMicroBatchPartitionReader(reader);
+ }
+ });
}
return result;
@@ -122,12 +147,9 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
}
return result;
}
-
}
- /**
- * This class can be mapped to Beam {@link BoundedReader}
- */
+ /** This class can be mapped to Beam {@link BoundedReader} */
private class DatasetMicroBatchPartitionReader implements InputPartitionReader<InternalRow> {
BoundedReader<T> reader;
@@ -140,7 +162,8 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
this.closed = false;
}
- @Override public boolean next() throws IOException {
+ @Override
+ public boolean next() throws IOException {
if (!started) {
started = true;
return reader.start();
@@ -149,13 +172,17 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
}
}
- @Override public InternalRow get() {
+ @Override
+ public InternalRow get() {
List<Object> list = new ArrayList<>();
- list.add(WindowedValue.timestampedValueInGlobalWindow(reader.getCurrent(), reader.getCurrentTimestamp()));
+ list.add(
+ WindowedValue.timestampedValueInGlobalWindow(
+ reader.getCurrent(), reader.getCurrentTimestamp()));
return InternalRow.apply(asScalaBuffer(list).toList());
}
- @Override public void close() throws IOException {
+ @Override
+ public void close() throws IOException {
closed = true;
reader.close();
}
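Beyond reformatting, the get() method above shows the data-interchange idiom of this source: each Beam element travels through Spark as a one-column InternalRow whose single cell holds the raw WindowedValue. A minimal sketch of that wrapping (RowWrapping is a hypothetical helper; the conversion call mirrors the line in the diff):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.sql.catalyst.InternalRow;

    class RowWrapping {
      static InternalRow wrap(Object windowedValue) {
        List<Object> columns = new ArrayList<>();
        columns.add(windowedValue); // exactly one column per row
        return InternalRow.apply(
            scala.collection.JavaConversions.asScalaBuffer(columns).toList());
      }
    }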
[beam] 44/50: Add ReadSourceTranslatorStreaming
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 92a104e680bb03a7ba16068ae80f055bbd82ea3a
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Dec 28 10:28:18 2018 +0100
Add ReadSourceTranslatorStreaming
---
...mingSource.java => DatasetSourceStreaming.java} | 2 +-
.../streaming/ReadSourceTranslatorStreaming.java | 76 ++++++++++++++++++++++
2 files changed, 77 insertions(+), 1 deletion(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java
similarity index 99%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java
index 6947b6d..fad68d3 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java
@@ -55,7 +55,7 @@ import scala.collection.immutable.Map;
* This is a spark structured streaming {@link DataSourceV2} implementation. As Continuous streaming
* is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}.
*/
-public class DatasetStreamingSource<T> implements DataSourceV2, MicroBatchReadSupport{
+public class DatasetSourceStreaming<T> implements DataSourceV2, MicroBatchReadSupport{
private int numPartitions;
private Long bundleSize;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java
new file mode 100644
index 0000000..6066822
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
+
+import java.io.IOException;
+import org.apache.beam.runners.core.construction.ReadTranslation;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.runners.spark.structuredstreaming.translation.batch.DatasetSourceBatch;
+import org.apache.beam.sdk.io.BoundedSource;
+import org.apache.beam.sdk.io.UnboundedSource;
+import org.apache.beam.sdk.runners.AppliedPTransform;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.spark.api.java.function.MapFunction;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+
+class ReadSourceTranslatorStreaming<T>
+ implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
+
+ private String SOURCE_PROVIDER_CLASS = DatasetSourceStreaming.class.getCanonicalName();
+
+ @SuppressWarnings("unchecked")
+ @Override
+ public void translateTransform(
+ PTransform<PBegin, PCollection<T>> transform, TranslationContext context) {
+ AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>> rootTransform =
+ (AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>)
+ context.getCurrentTransform();
+
+ UnboundedSource<T, UnboundedSource.CheckpointMark> source;
+ try {
+ source = ReadTranslation
+ .unboundedSourceFromTransform(rootTransform);
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ SparkSession sparkSession = context.getSparkSession();
+
+ Dataset<Row> rowDataset = sparkSession.readStream().format(SOURCE_PROVIDER_CLASS).load();
+
+ //TODO pass the source and the translation context serialized as string to the DatasetSource
+ MapFunction<Row, WindowedValue> func = new MapFunction<Row, WindowedValue>() {
+ @Override public WindowedValue call(Row value) throws Exception {
+ //there is only one value put in each Row by the InputPartitionReader
+ return value.<WindowedValue>getAs(0);
+ }
+ };
+ //TODO: is there a better way than using the raw WindowedValue? Can an Encoder<WindowedValue<T>>
+ // be created ?
+ Dataset<WindowedValue> dataset = rowDataset.map(func, Encoders.kryo(WindowedValue.class));
+
+ PCollection<T> output = (PCollection<T>) context.getOutput();
+ context.putDatasetRaw(output, dataset);
+ }
+}
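The TODO above asks whether an Encoder<WindowedValue<T>> can be created directly. With stock Spark encoders the short answer is that Encoders.kryo only accepts a raw Class token, so the element type parameter is erased and an unchecked cast is needed, which is exactly why the translator works with the raw WindowedValue. A sketch of what such a cast-based encoder might look like (an assumption on my part, not committed code):

    import org.apache.beam.sdk.util.WindowedValue;
    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;

    class WindowedValueEncoders {
      @SuppressWarnings("unchecked")
      static <T> Encoder<WindowedValue<T>> windowedValueEncoder() {
        // Assumes Kryo can serialize every concrete WindowedValue subclass;
        // the generic parameter T is erased, hence the double cast.
        return (Encoder<WindowedValue<T>>) (Encoder<?>) Encoders.kryo(WindowedValue.class);
      }
    }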
[beam] 26/50: Add primitive GroupByKeyTranslatorBatch implementation
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 9e6fc2cf8c4297996554473a19b2166254ee3f4d
Author: Alexey Romanenko <ar...@gmail.com>
AuthorDate: Fri Dec 7 10:54:12 2018 +0100
Add primitive GroupByKeyTranslatorBatch implementation
---
...KeyTranslatorBatch.java => EncoderHelpers.java} | 22 ++++------
.../translation/TranslationContext.java | 4 +-
.../batch/GroupByKeyTranslatorBatch.java | 49 ++++++++++++++++++++--
3 files changed, 56 insertions(+), 19 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EncoderHelpers.java
similarity index 56%
copy from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
copy to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EncoderHelpers.java
index 4ee77fb..4c56922 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EncoderHelpers.java
@@ -15,20 +15,16 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+package org.apache.beam.runners.spark.structuredstreaming.translation;
-import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
-import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.KV;
-import org.apache.beam.sdk.values.PCollection;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
-class GroupByKeyTranslatorBatch<K, InputT>
- implements TransformTranslator<
- PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
+/** {@link Encoders} utility class. */
+public class EncoderHelpers {
- @Override
- public void translateTransform(
- PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> transform,
- TranslationContext context) {}
+ @SuppressWarnings("unchecked")
+ public static <T> Encoder<T> encoder() {
+ return Encoders.kryo((Class<T>) Object.class);
+ }
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 3c29867..e66bc90 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -46,9 +46,9 @@ import org.apache.spark.sql.streaming.StreamingQueryException;
*/
public class TranslationContext {
- /** All the datasets of the DAG */
+ /** All the datasets of the DAG. */
private final Map<PValue, Dataset<?>> datasets;
- /** datasets that are not used as input to other datasets (leaves of the DAG) */
+ /** datasets that are not used as input to other datasets (leaves of the DAG). */
private final Set<Dataset<?>> leaves;
private final SparkPipelineOptions options;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
index 4ee77fb..7f2d7fa 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
@@ -17,18 +17,59 @@
*/
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Lists;
+import java.util.List;
+import org.apache.beam.runners.spark.structuredstreaming.translation.EncoderHelpers;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
+import org.apache.spark.api.java.function.MapFunction;
+import org.apache.spark.api.java.function.MapGroupsFunction;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.KeyValueGroupedDataset;
-class GroupByKeyTranslatorBatch<K, InputT>
+class GroupByKeyTranslatorBatch<K, V>
implements TransformTranslator<
- PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
+ PTransform<PCollection<KV<K, V>>, PCollection<KV<K, Iterable<V>>>>> {
@Override
public void translateTransform(
- PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> transform,
- TranslationContext context) {}
+ PTransform<PCollection<KV<K, V>>, PCollection<KV<K, Iterable<V>>>> transform,
+ TranslationContext context) {
+
+ Dataset<WindowedValue<KV<K, V>>> input = context.getDataset(context.getInput());
+
+ // group by key only.
+ KeyValueGroupedDataset<K, KV<K, V>> grouped =
+ input
+ .map(
+ (MapFunction<WindowedValue<KV<K, V>>, KV<K, V>>) WindowedValue::getValue,
+ EncoderHelpers.encoder())
+ .groupByKey((MapFunction<KV<K, V>, K>) KV::getKey, EncoderHelpers.<K>encoder());
+
+ Dataset<KV<K, Iterable<V>>> materialized =
+ grouped.mapGroups(
+ (MapGroupsFunction<K, KV<K, V>, KV<K, Iterable<V>>>)
+ (key, iterator) -> {
+ // TODO: can we use here just "Iterable<V> iterable = () -> iterator;" ?
+ List<V> values = Lists.newArrayList();
+ while (iterator.hasNext()) {
+ values.add(iterator.next().getValue());
+ }
+ return KV.of(key, Iterables.unmodifiableIterable(values));
+ },
+ EncoderHelpers.encoder());
+
+ Dataset<WindowedValue<KV<K, Iterable<V>>>> output =
+ materialized.map(
+ (MapFunction<KV<K, Iterable<V>>, WindowedValue<KV<K, Iterable<V>>>>)
+ WindowedValue::valueInGlobalWindow,
+ EncoderHelpers.encoder());
+
+ context.putDataset(context.getOutput(), output);
+ }
}
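The translator above follows a three-step Dataset shape: map each windowed element to its value, groupByKey on the KV key, then mapGroups to materialize each group's values into a list. A self-contained sketch of the same shape on plain Spark types (not Beam code; local-mode session purely for illustration):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.api.java.function.MapGroupsFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.KeyValueGroupedDataset;
    import org.apache.spark.sql.SparkSession;
    import scala.Tuple2;

    public class GroupByKeySketch {
      public static void main(String[] args) {
        SparkSession spark =
            SparkSession.builder().master("local[*]").appName("gbk-sketch").getOrCreate();

        Dataset<Tuple2<String, Integer>> input =
            spark.createDataset(
                Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("a", 2), new Tuple2<>("b", 3)),
                Encoders.tuple(Encoders.STRING(), Encoders.INT()));

        // Steps 1 and 2: extract the key and group by it.
        KeyValueGroupedDataset<String, Tuple2<String, Integer>> grouped =
            input.groupByKey(
                (MapFunction<Tuple2<String, Integer>, String>) Tuple2::_1, Encoders.STRING());

        // Step 3: materialize each group's values, as the translator does.
        Dataset<String> result =
            grouped.mapGroups(
                (MapGroupsFunction<String, Tuple2<String, Integer>, String>)
                    (key, it) -> {
                      List<Integer> values = new ArrayList<>();
                      while (it.hasNext()) {
                        values.add(it.next()._2());
                      }
                      return key + " -> " + values;
                    },
                Encoders.STRING());

        result.show(); // e.g. a -> [1, 2] and b -> [3]
        spark.stop();
      }
    }

Materializing the iterator into a List also answers the TODO in the diff: mapGroups hands over a one-shot Iterator, so wrapping it directly as "() -> iterator" would produce an Iterable that can only be traversed once, violating the Iterable contract.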
[beam] 40/50: Run pipeline in batch mode or in streaming mode
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 340991eb2d68e040637d88d13bf1e60bc5b2fc74
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Dec 27 16:37:42 2018 +0100
Run pipeline in batch mode or in streaming mode
---
.../spark/structuredstreaming/translation/TranslationContext.java | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index fb36b37..82aa80b 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -164,7 +164,12 @@ public class TranslationContext {
try {
// to start a pipeline we need a DataStreamWriter to start
for (Dataset<?> dataset : leaves) {
- dataset.writeStream().foreach(new NoOpForeachWriter<>()).start().awaitTermination();
+
+ if (options.isStreaming()) {
+ dataset.writeStream().foreach(new NoOpForeachWriter<>()).start().awaitTermination();
+ } else {
+ dataset.write();
+ }
}
} catch (StreamingQueryException e) {
throw new RuntimeException("Pipeline execution failed: " + e);
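One caveat in the batch branch above: Dataset.write() only returns a DataFrameWriter and triggers no computation by itself, so as committed the batch path does not actually execute the DAG. A sketch of one way to force evaluation (my assumption about the intent, not the committed fix):

    import org.apache.spark.api.java.function.ForeachFunction;
    import org.apache.spark.sql.Dataset;

    class BatchExecution {
      static <T> void forceEvaluation(Dataset<T> dataset) {
        // The action itself is what runs the DAG; the body can stay empty,
        // mirroring the NoOpForeachWriter used on the streaming side.
        dataset.foreach((ForeachFunction<T>) element -> {});
      }
    }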
[beam] 43/50: Cleaning
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 758c1ce371e9dc41018fa5c1668cb24cc1751c99
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Dec 28 10:24:11 2018 +0100
Cleaning
---
.../translation/batch/DatasetSourceBatch.java | 3 +-
.../streaming/DatasetStreamingSource.java | 172 +--------------------
2 files changed, 2 insertions(+), 173 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
index 1ad16eb..f4cd885 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
@@ -41,8 +41,7 @@ import org.apache.spark.sql.types.StructType;
/**
* This is a spark structured streaming {@link DataSourceV2} implementation. As Continuous streaming
- * is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}. This
- * class is just a mix-in.
+ * is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}.
*/
public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
index 8701a83..6947b6d 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
@@ -53,8 +53,7 @@ import scala.collection.immutable.Map;
/**
* This is a spark structured streaming {@link DataSourceV2} implementation. As Continuous streaming
- * is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}. This
- * class is just a mix-in.
+ * is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}.
*/
public class DatasetStreamingSource<T> implements DataSourceV2, MicroBatchReadSupport{
@@ -196,173 +195,4 @@ public class DatasetStreamingSource<T> implements DataSourceV2, MicroBatchReadSu
reader.close();
}
}
-
- private static class DatasetCatalog<T> extends Catalog {
-
- TranslationContext context;
- Source<T> source;
-
- private DatasetCatalog(TranslationContext context, Source<T> source) {
- this.context = context;
- this.source = source;
- }
-
- @Override public String currentDatabase() {
- return null;
- }
-
- @Override public void setCurrentDatabase(String dbName) {
-
- }
-
- @Override public Dataset<Database> listDatabases() {
- return null;
- }
-
- @Override public Dataset<Table> listTables() {
- return null;
- }
-
- @Override public Dataset<Table> listTables(String dbName) throws AnalysisException {
- return null;
- }
-
- @Override public Dataset<Function> listFunctions() {
- return null;
- }
-
- @Override public Dataset<Function> listFunctions(String dbName) throws AnalysisException {
- return null;
- }
-
- @Override public Dataset<Column> listColumns(String tableName) throws AnalysisException {
- return null;
- }
-
- @Override public Dataset<Column> listColumns(String dbName, String tableName)
- throws AnalysisException {
- return null;
- }
-
- @Override public Database getDatabase(String dbName) throws AnalysisException {
- return null;
- }
-
- @Override public Table getTable(String tableName) throws AnalysisException {
- return new DatasetTable<>("beam", "beaam", "beam fake table to wire up with Beam sources",
- null, true, source, context);
- }
-
- @Override public Table getTable(String dbName, String tableName) throws AnalysisException {
- return null;
- }
-
- @Override public Function getFunction(String functionName) throws AnalysisException {
- return null;
- }
-
- @Override public Function getFunction(String dbName, String functionName)
- throws AnalysisException {
- return null;
- }
-
- @Override public boolean databaseExists(String dbName) {
- return false;
- }
-
- @Override public boolean tableExists(String tableName) {
- return false;
- }
-
- @Override public boolean tableExists(String dbName, String tableName) {
- return false;
- }
-
- @Override public boolean functionExists(String functionName) {
- return false;
- }
-
- @Override public boolean functionExists(String dbName, String functionName) {
- return false;
- }
-
- @Override public Dataset<Row> createTable(String tableName, String path) {
- return null;
- }
-
- @Override public Dataset<Row> createTable(String tableName, String path, String source) {
- return null;
- }
-
- @Override public Dataset<Row> createTable(String tableName, String source,
- Map<String, String> options) {
- return null;
- }
-
- @Override public Dataset<Row> createTable(String tableName, String source, StructType schema,
- Map<String, String> options) {
- return null;
- }
-
- @Override public boolean dropTempView(String viewName) {
- return false;
- }
-
- @Override public boolean dropGlobalTempView(String viewName) {
- return false;
- }
-
- @Override public void recoverPartitions(String tableName) {
-
- }
-
- @Override public boolean isCached(String tableName) {
- return false;
- }
-
- @Override public void cacheTable(String tableName) {
-
- }
-
- @Override public void cacheTable(String tableName, StorageLevel storageLevel) {
-
- }
-
- @Override public void uncacheTable(String tableName) {
-
- }
-
- @Override public void clearCache() {
-
- }
-
- @Override public void refreshTable(String tableName) {
-
- }
-
- @Override public void refreshByPath(String path) {
-
- }
-
- private static class DatasetTable<T> extends Table {
-
- private Source<T> source;
- private TranslationContext context;
-
- public DatasetTable(String name, String database, String description, String tableType,
- boolean isTemporary, Source<T> source, TranslationContext context) {
- super(name, database, description, tableType, isTemporary);
- this.source = source;
- this.context = context;
- }
-
- private Source<T> getSource() {
- return source;
- }
-
- private TranslationContext getContext() {
- return context;
- }
- }
- }
}
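For orientation, the micro-batch entry point that such a source must provide in the Spark 2.4 DataSourceV2 API is sketched below. Only the interface and method signature come from Spark; the class name and the placeholder body are illustrative.

    import java.util.Optional;
    import org.apache.spark.sql.sources.v2.DataSourceOptions;
    import org.apache.spark.sql.sources.v2.DataSourceV2;
    import org.apache.spark.sql.sources.v2.MicroBatchReadSupport;
    import org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader;
    import org.apache.spark.sql.types.StructType;

    public class MinimalStreamingSource implements DataSourceV2, MicroBatchReadSupport {

      @Override
      public MicroBatchReader createMicroBatchReader(
          Optional<StructType> schema, String checkpointLocation, DataSourceOptions options) {
        // Spark calls this once per streaming query; the returned reader plans the
        // InputPartitions of each micro-batch and tracks offsets between batches.
        throw new UnsupportedOperationException("wire the Beam source in here");
      }
    }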
[beam] 16/50: apply spotless for re-formatting
commit 0cfa70d0bc09fd2594c4dfd328b8a5745664f80a
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 22 12:04:11 2018 +0100
apply spotless for re-formatting
---
.../structuredstreaming/SparkPipelineOptions.java | 1 -
.../structuredstreaming/SparkPipelineResult.java | 32 +++++++++++---
.../spark/structuredstreaming/SparkRunner.java | 41 +++++++++++++----
.../translation/PipelineTranslator.java | 46 ++++++++++++-------
.../translation/TransformTranslator.java | 21 +++++++--
.../translation/TranslationContext.java | 27 +++++++++---
.../batch/BatchCombinePerKeyTranslator.java | 29 +++++++++---
.../batch/BatchFlattenPCollectionTranslator.java | 28 +++++++++---
.../batch/BatchGroupByKeyTranslator.java | 29 +++++++++---
.../translation/batch/BatchParDoTranslator.java | 28 +++++++++---
.../translation/batch/BatchPipelineTranslator.java | 51 ++++++++++++++--------
.../batch/BatchReadSourceTranslator.java | 27 +++++++++---
.../batch/BatchReshuffleTranslator.java | 22 ++++++++--
.../translation/batch/BatchTranslationContext.java | 24 +++++++---
.../batch/BatchWindowAssignTranslator.java | 27 +++++++++---
.../streaming/StreamingPipelineTranslator.java | 32 +++++++++++---
.../streaming/StreamingTranslationContext.java | 21 +++++++--
17 files changed, 375 insertions(+), 111 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineOptions.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineOptions.java
index d381b5f..2e6653b 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineOptions.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineOptions.java
@@ -89,7 +89,6 @@ public interface SparkPipelineOptions
void setEnableSparkMetricSinks(Boolean enableSparkMetricSinks);
-
/**
* List of local files to make available to workers.
*
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
index 82d1b90..a8b3640 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming;
import java.io.IOException;
@@ -7,23 +24,28 @@ import org.joda.time.Duration;
public class SparkPipelineResult implements PipelineResult {
- @Override public State getState() {
+ @Override
+ public State getState() {
return null;
}
- @Override public State cancel() throws IOException {
+ @Override
+ public State cancel() throws IOException {
return null;
}
- @Override public State waitUntilFinish(Duration duration) {
+ @Override
+ public State waitUntilFinish(Duration duration) {
return null;
}
- @Override public State waitUntilFinish() {
+ @Override
+ public State waitUntilFinish() {
return null;
}
- @Override public MetricResults metrics() {
+ @Override
+ public MetricResults metrics() {
return null;
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index de20133..3a530f0 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -1,9 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming;
import static org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage;
-import org.apache.beam.runners.spark.structuredstreaming.translation.batch.BatchPipelineTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.batch.BatchPipelineTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineRunner;
@@ -65,12 +82,14 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
* @return A pipeline runner that will execute with specified options.
*/
public static SparkRunner fromOptions(PipelineOptions options) {
- SparkPipelineOptions sparkOptions = PipelineOptionsValidator
- .validate(SparkPipelineOptions.class, options);
+ SparkPipelineOptions sparkOptions =
+ PipelineOptionsValidator.validate(SparkPipelineOptions.class, options);
if (sparkOptions.getFilesToStage() == null) {
- sparkOptions.setFilesToStage(detectClassPathResourcesToStage(SparkRunner.class.getClassLoader()));
- LOG.info("PipelineOptions.filesToStage was not specified. "
+ sparkOptions.setFilesToStage(
+ detectClassPathResourcesToStage(SparkRunner.class.getClassLoader()));
+ LOG.info(
+ "PipelineOptions.filesToStage was not specified. "
+ "Defaulting to files from the classpath: will stage {} files. "
+ "Enable logging at DEBUG level to see which files will be staged.",
sparkOptions.getFilesToStage().size());
@@ -88,19 +107,23 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
this.options = options;
}
- @Override public SparkPipelineResult run(final Pipeline pipeline) {
+ @Override
+ public SparkPipelineResult run(final Pipeline pipeline) {
translatePipeline(pipeline);
executePipeline(pipeline);
return new SparkPipelineResult();
}
- private void translatePipeline(Pipeline pipeline){
+ private void translatePipeline(Pipeline pipeline) {
PipelineTranslator.detectTranslationMode(pipeline, options);
PipelineTranslator.replaceTransforms(pipeline, options);
PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options);
- PipelineTranslator pipelineTranslator = options.isStreaming() ? new StreamingPipelineTranslator(options) : new BatchPipelineTranslator(options);
+ PipelineTranslator pipelineTranslator =
+ options.isStreaming()
+ ? new StreamingPipelineTranslator(options)
+ : new BatchPipelineTranslator(options);
pipelineTranslator.translate(pipeline);
}
- private void executePipeline(Pipeline pipeline) {}
+ private void executePipeline(Pipeline pipeline) {}
}
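As a usage sketch of the options plumbing above (standard Beam API; the Create step is a placeholder, and the streaming flag is the one consulted in translatePipeline()):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class SparkRunnerExample {
      public static void main(String[] args) {
        SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
        // pick the structured-streaming runner
        options.setRunner(SparkRunner.class);
        // false selects BatchPipelineTranslator, true selects StreamingPipelineTranslator
        options.setStreaming(false);

        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(Create.of(1, 2, 3)); // placeholder transform
        pipeline.run(); // translatePipeline() then executePipeline()
      }
    }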
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index c05fc92..bb40631 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation;
import org.apache.beam.runners.core.construction.PTransformTranslation;
@@ -14,17 +31,16 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
* {@link Pipeline.PipelineVisitor} that translates the Beam operators to their Spark counterparts.
- * It also does the pipeline preparation: mode detection, transforms replacement, classpath preparation.
- * If we have a streaming job, it is instantiated as a {@link StreamingPipelineTranslator}.
- * If we have a batch job, it is instantiated as a {@link BatchPipelineTranslator}.
+ * It also does the pipeline preparation: mode detection, transforms replacement, classpath
+ * preparation. If we have a streaming job, it is instantiated as a {@link
+ * StreamingPipelineTranslator}. If we have a batch job, it is instantiated as a {@link
+ * BatchPipelineTranslator}.
*/
-
-public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
+public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults {
private int depth = 0;
private static final Logger LOG = LoggerFactory.getLogger(PipelineTranslator.class);
protected TranslationContext translationContext;
-
// --------------------------------------------------------------------------------------------
// Pipeline preparation methods
// --------------------------------------------------------------------------------------------
@@ -41,13 +57,14 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
}
}
- public static void replaceTransforms(Pipeline pipeline, SparkPipelineOptions options){
+ public static void replaceTransforms(Pipeline pipeline, SparkPipelineOptions options) {
pipeline.replaceAll(SparkTransformOverrides.getDefaultOverrides(options.isStreaming()));
-
}
-
- /** Visit the pipeline to determine the translation mode (batch/streaming) and update options accordingly. */
+ /**
+ * Visit the pipeline to determine the translation mode (batch/streaming) and update options
+ * accordingly.
+ */
public static void detectTranslationMode(Pipeline pipeline, SparkPipelineOptions options) {
TranslationModeDetector detector = new TranslationModeDetector();
pipeline.traverseTopologically(detector);
@@ -117,17 +134,15 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
/**
* get a {@link TransformTranslator} for the given {@link TransformHierarchy.Node}
+ *
* @param node
* @return
*/
protected abstract TransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node);
- /**
- * Apply the given TransformTranslator to the given node.
- */
+ /** Apply the given TransformTranslator to the given node. */
private <T extends PTransform<?, ?>> void applyTransformTranslator(
- TransformHierarchy.Node node,
- TransformTranslator<?> transformTranslator) {
+ TransformHierarchy.Node node, TransformTranslator<?> transformTranslator) {
// create the applied PTransform on the translationContext
translationContext.setCurrentTransform(node.toAppliedPTransform(getPipeline()));
@@ -141,7 +156,6 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
typedTransformTranslator.translateTransform(typedTransform, translationContext);
}
-
// --------------------------------------------------------------------------------------------
// Pipeline visitor entry point
// --------------------------------------------------------------------------------------------
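The TranslationModeDetector that detectTranslationMode() relies on is not shown in this hunk; conceptually it only needs to switch to streaming once any unbounded PCollection appears, along the lines of this sketch (the visitor hook is Beam's, the class body is illustrative):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.runners.TransformHierarchy;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PValue;

    class TranslationModeDetectorSketch extends Pipeline.PipelineVisitor.Defaults {
      private boolean streaming = false;

      @Override
      public void visitValue(PValue value, TransformHierarchy.Node producer) {
        // a single unbounded PCollection anywhere in the graph forces streaming execution
        if (value instanceof PCollection
            && ((PCollection<?>) value).isBounded() == PCollection.IsBounded.UNBOUNDED) {
          streaming = true;
        }
      }

      boolean isStreaming() {
        return streaming;
      }
    }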
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
index 54b0a85..fc55a9e 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation;
import org.apache.beam.sdk.transforms.PTransform;
@@ -5,7 +22,5 @@ import org.apache.beam.sdk.transforms.PTransform;
public interface TransformTranslator<TransformT extends PTransform> {
/** Base class for translators of {@link PTransform}. */
-
void translateTransform(TransformT transform, TranslationContext context);
- }
-
+}
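To make the contract concrete, an identity translator could be written as follows; it leans on the getDataset()/putDataset()/getInput()/getOutput() context helpers that are added in later commits on this branch:

    import org.apache.beam.sdk.transforms.PTransform;
    import org.apache.beam.sdk.values.PCollection;

    class IdentityTranslatorBatch<T>
        implements TransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {

      @Override
      public void translateTransform(
          PTransform<PCollection<T>, PCollection<T>> transform, TranslationContext context) {
        // re-register the input dataset under the output PValue: a no-op on the Spark side
        context.putDataset(context.getOutput(), context.getDataset(context.getInput()));
      }
    }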
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index e651e70..8f61d0c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation;
import java.util.HashMap;
@@ -11,8 +28,8 @@ import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
/**
- * Base class that gives a context for {@link PTransform} translation: keeping track of the datasets,
- * the {@link SparkSession}, the current transform being translated.
+ * Base class that gives a context for {@link PTransform} translation: keeping track of the
+ * datasets, the {@link SparkSession}, the current transform being translated.
*/
public class TranslationContext {
@@ -33,12 +50,8 @@ public class TranslationContext {
sparkConf.setJars(options.getFilesToStage().toArray(new String[0]));
}
- this.sparkSession = SparkSession
- .builder()
- .config(sparkConf)
- .getOrCreate();
+ this.sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
this.options = options;
this.datasets = new HashMap<>();
}
-
}
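For context, the constructor around the reformatted builder call wires SparkPipelineOptions into a SparkConf roughly as below; only setJars() and the builder line appear in this hunk, and getSparkMaster()/getAppName() are assumed option getters:

    SparkConf sparkConf = new SparkConf();
    sparkConf.setMaster(options.getSparkMaster()); // assumed getter, e.g. "local[4]"
    sparkConf.setAppName(options.getAppName()); // assumed getter
    if (options.getFilesToStage() != null && !options.getFilesToStage().isEmpty()) {
      sparkConf.setJars(options.getFilesToStage().toArray(new String[0]));
    }
    this.sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();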
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
index 858df18..4a10329 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
@@ -6,12 +23,12 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
-class BatchCombinePerKeyTranslator<K, InputT, AccumT, OutputT> implements
- TransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>>> {
+class BatchCombinePerKeyTranslator<K, InputT, AccumT, OutputT>
+ implements TransformTranslator<
+ PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>>> {
- @Override public void translateTransform(
+ @Override
+ public void translateTransform(
PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>> transform,
- TranslationContext context) {
-
- }
+ TranslationContext context) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
index 90c487a..d24f60c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
@@ -6,11 +23,10 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
-class BatchFlattenPCollectionTranslator<T> implements
- TransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
+class BatchFlattenPCollectionTranslator<T>
+ implements TransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
- @Override public void translateTransform(PTransform<PCollectionList<T>, PCollection<T>> transform,
- TranslationContext context) {
-
- }
+ @Override
+ public void translateTransform(
+ PTransform<PCollectionList<T>, PCollection<T>> transform, TranslationContext context) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
index 52a3c39..829ba8a 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
@@ -6,12 +23,12 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
-class BatchGroupByKeyTranslator<K, InputT> implements
- TransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
+class BatchGroupByKeyTranslator<K, InputT>
+ implements TransformTranslator<
+ PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
- @Override public void translateTransform(
+ @Override
+ public void translateTransform(
PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> transform,
- TranslationContext context) {
-
- }
+ TranslationContext context) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
index 6e7f342..56aa504 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
@@ -6,11 +23,10 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
-class BatchParDoTranslator<InputT, OutputT> implements
- TransformTranslator<PTransform<PCollection<InputT>, PCollectionTuple>> {
+class BatchParDoTranslator<InputT, OutputT>
+ implements TransformTranslator<PTransform<PCollection<InputT>, PCollectionTuple>> {
- @Override public void translateTransform(PTransform<PCollection<InputT>, PCollectionTuple> transform,
- TranslationContext context) {
-
- }
+ @Override
+ public void translateTransform(
+ PTransform<PCollection<InputT>, PCollectionTuple> transform, TranslationContext context) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
index 38324c0..6648539 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import java.util.HashMap;
@@ -11,13 +28,13 @@ import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.PTransform;
-/** {@link PipelineTranslator} for executing a {@link Pipeline} in Spark in batch mode.
- * This contains only the components specific to batch: {@link BatchTranslationContext},
- * registry of batch {@link TransformTranslator} and registry lookup code. */
-
+/**
+ * {@link PipelineTranslator} for executing a {@link Pipeline} in Spark in batch mode. This contains
+ * only the components specific to batch: {@link BatchTranslationContext}, registry of batch {@link
+ * TransformTranslator} and registry lookup code.
+ */
public class BatchPipelineTranslator extends PipelineTranslator {
-
// --------------------------------------------------------------------------------------------
// Transform Translator Registry
// --------------------------------------------------------------------------------------------
@@ -26,21 +43,23 @@ public class BatchPipelineTranslator extends PipelineTranslator {
private static final Map<String, TransformTranslator> TRANSFORM_TRANSLATORS = new HashMap<>();
static {
- TRANSFORM_TRANSLATORS.put(PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN,
- new BatchCombinePerKeyTranslator());
- TRANSFORM_TRANSLATORS
- .put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new BatchGroupByKeyTranslator());
+ TRANSFORM_TRANSLATORS.put(
+ PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN, new BatchCombinePerKeyTranslator());
+ TRANSFORM_TRANSLATORS.put(
+ PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new BatchGroupByKeyTranslator());
TRANSFORM_TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new BatchReshuffleTranslator());
- TRANSFORM_TRANSLATORS
- .put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new BatchFlattenPCollectionTranslator());
+ TRANSFORM_TRANSLATORS.put(
+ PTransformTranslation.FLATTEN_TRANSFORM_URN, new BatchFlattenPCollectionTranslator());
- TRANSFORM_TRANSLATORS
- .put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new BatchWindowAssignTranslator());
+ TRANSFORM_TRANSLATORS.put(
+ PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new BatchWindowAssignTranslator());
- TRANSFORM_TRANSLATORS.put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new BatchParDoTranslator());
+ TRANSFORM_TRANSLATORS.put(
+ PTransformTranslation.PAR_DO_TRANSFORM_URN, new BatchParDoTranslator());
- TRANSFORM_TRANSLATORS.put(PTransformTranslation.READ_TRANSFORM_URN, new BatchReadSourceTranslator());
+ TRANSFORM_TRANSLATORS.put(
+ PTransformTranslation.READ_TRANSFORM_URN, new BatchReadSourceTranslator());
}
public BatchPipelineTranslator(SparkPipelineOptions options) {
@@ -58,6 +77,4 @@ public class BatchPipelineTranslator extends PipelineTranslator {
@Nullable String urn = PTransformTranslation.urnForTransformOrNull(transform);
return (urn == null) ? null : TRANSFORM_TRANSLATORS.get(urn);
}
-
-
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
index 4236b1c..d9fcfbb 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
@@ -6,10 +23,10 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
-class BatchReadSourceTranslator<T> implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
+class BatchReadSourceTranslator<T>
+ implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
- @Override public void translateTransform(PTransform<PBegin, PCollection<T>> transform,
- TranslationContext context) {
-
- }
+ @Override
+ public void translateTransform(
+ PTransform<PBegin, PCollection<T>> transform, TranslationContext context) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
index 5baa331..1423308 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
@@ -6,7 +23,6 @@ import org.apache.beam.sdk.transforms.Reshuffle;
class BatchReshuffleTranslator<K, InputT> implements TransformTranslator<Reshuffle<K, InputT>> {
- @Override public void translateTransform(Reshuffle<K, InputT> transform, TranslationContext context) {
-
- }
+ @Override
+ public void translateTransform(Reshuffle<K, InputT> transform, TranslationContext context) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
index 02aad71..6f50895 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
@@ -1,18 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
-import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.values.PValue;
-import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.SparkSession;
-/**
- * This class contains only batch specific context components.
- */
+/** This class contains only batch specific context components. */
public class BatchTranslationContext extends TranslationContext {
/**
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
index 1a8f68b..65a7cae 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
@@ -5,10 +22,10 @@ import org.apache.beam.runners.spark.structuredstreaming.translation.Translation
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
-class BatchWindowAssignTranslator<T> implements
- TransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {
+class BatchWindowAssignTranslator<T>
+ implements TransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {
- @Override public void translateTransform(PTransform<PCollection<T>, PCollection<T>> transform,
- TranslationContext context) {
- }
+ @Override
+ public void translateTransform(
+ PTransform<PCollection<T>, PCollection<T>> transform, TranslationContext context) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
index 9cbfbed..437aa25 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
@@ -6,16 +23,17 @@ import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTr
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
-/** {@link PipelineTranslator} for executing a {@link Pipeline} in Spark in streaming mode.
- * This contains only the components specific to streaming: {@link StreamingTranslationContext},
- * registry of batch {@link TransformTranslator} and registry lookup code. */
-
+/**
+ * {@link PipelineTranslator} for executing a {@link Pipeline} in Spark in streaming mode. This
+ * contains only the components specific to streaming: {@link StreamingTranslationContext}, registry
+ * of streaming {@link TransformTranslator} and registry lookup code.
+ */
public class StreamingPipelineTranslator extends PipelineTranslator {
- public StreamingPipelineTranslator(SparkPipelineOptions options) {
- }
+ public StreamingPipelineTranslator(SparkPipelineOptions options) {}
- @Override protected TransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node) {
+ @Override
+ protected TransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node) {
return null;
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
index ebccfa7..f827cc4 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
@@ -1,11 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
-/**
- * This class contains only streaming specific context components.
- */
+/** This class contains only streaming specific context components. */
public class StreamingTranslationContext extends TranslationContext {
public StreamingTranslationContext(SparkPipelineOptions options) {
[beam] 25/50: Add Flatten transformation translator
commit 286d7f36480d79ad54f2e92f0b8af8c4ba716621
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 29 16:02:11 2018 +0100
Add Flatten transformation translator
---
.../translation/TranslationContext.java | 4 +++
...latorBatch.java => FlattenTranslatorBatch.java} | 35 ++++++++++++++++++++--
.../translation/batch/PipelineTranslatorBatch.java | 2 +-
3 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 98f77af..3c29867 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -83,6 +83,10 @@ public class TranslationContext {
// --------------------------------------------------------------------------------------------
// Datasets methods
// --------------------------------------------------------------------------------------------
+ @SuppressWarnings("unchecked")
+ public <T> Dataset<T> emptyDataset() {
+ return (Dataset<T>) sparkSession.emptyDataset(Encoders.bean(Void.class));
+ }
@SuppressWarnings("unchecked")
public <T> Dataset<WindowedValue<T>> getDataset(PValue value) {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java
similarity index 55%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java
index 87a250e..2739e83 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java
@@ -17,16 +17,47 @@
*/
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+import static com.google.common.base.Preconditions.checkArgument;
+
+import java.util.Map;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.spark.sql.Dataset;
-class FlattenPCollectionTranslatorBatch<T>
+class FlattenTranslatorBatch<T>
implements TransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
@Override
public void translateTransform(
- PTransform<PCollectionList<T>, PCollection<T>> transform, TranslationContext context) {}
+ PTransform<PCollectionList<T>, PCollection<T>> transform, TranslationContext context) {
+ Map<TupleTag<?>, PValue> inputs = context.getInputs();
+ Dataset<WindowedValue<T>> result = null;
+
+ if (inputs.isEmpty()) {
+ result = context.emptyDataset();
+ } else {
+ for (PValue pValue : inputs.values()) {
+ checkArgument(
+ pValue instanceof PCollection,
+ "Got non-PCollection input to flatten: %s of type %s",
+ pValue,
+ pValue.getClass().getSimpleName());
+ @SuppressWarnings("unchecked")
+ PCollection<T> pCollection = (PCollection<T>) pValue;
+ Dataset<WindowedValue<T>> current = context.getDataset(pCollection);
+ if (result == null) {
+ result = current;
+ } else {
+ result = result.union(current);
+ }
+ }
+ }
+ context.putDataset(context.getOutput(), result);
+ }
}
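For reference, the pipeline shape this translator now handles is Beam's standard Flatten; step names and values below are placeholders, and the empty list exercises the new emptyDataset() branch:

    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.Flatten;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionList;

    PCollection<String> first = pipeline.apply("First", Create.of("a", "b"));
    PCollection<String> second = pipeline.apply("Second", Create.of("c"));
    PCollection<String> merged =
        PCollectionList.of(first).and(second).apply(Flatten.pCollections());

    // an empty input list falls back to context.emptyDataset() in the translator
    PCollection<String> none =
        PCollectionList.<String>empty(pipeline).apply(Flatten.pCollections());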
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
index 318d74c..26f1b9c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
@@ -56,7 +56,7 @@ public class PipelineTranslatorBatch extends PipelineTranslator {
TRANSFORM_TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new ReshuffleTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionTranslatorBatch());
+ PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowAssignTranslatorBatch());
[beam] 33/50: Improve type enforcement in ReadSourceTranslator
commit b7283d7810f5ac0fbbd6003dbacfd65d20458563
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Tue Dec 11 16:21:05 2018 +0100
Improve type enforcement in ReadSourceTranslator
---
.../translation/batch/ReadSourceTranslatorBatch.java | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index a75730a..2c1aa93 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -26,6 +26,7 @@ import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
@@ -62,11 +63,11 @@ class ReadSourceTranslatorBatch<T>
// instantiates to be able to call DatasetSource.initialize()
MapFunction<Row, WindowedValue<T>> func = new MapFunction<Row, WindowedValue<T>>() {
@Override public WindowedValue<T> call(Row value) throws Exception {
- //TODO fix row content extraction: I guess cast is not enough
- return (WindowedValue<T>) value.get(0);
+ //there is only one value put in each Row by the InputPartitionReader
+ return value.<WindowedValue<T>>getAs(0);
}
};
- //TODO fix encoder
+ //TODO fix encoder: how to get an Encoder<WindowedValue<T>>
Dataset<WindowedValue<T>> dataset = rowDataset.map(func, null);
PCollection<T> output = (PCollection<T>) context.getOutput();
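On the remaining TODO: one blunt stopgap that satisfies the map() signature is a Kryo-backed encoder, sketched below at the call site. This is an assumed workaround, not necessarily what the branch eventually adopts:

    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;

    // assumption: serialize WindowedValue<T> opaquely with Kryo instead of passing null
    @SuppressWarnings("unchecked")
    Encoder<WindowedValue<T>> encoder =
        Encoders.kryo((Class<WindowedValue<T>>) (Class<?>) WindowedValue.class);
    Dataset<WindowedValue<T>> dataset = rowDataset.map(func, encoder);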
[beam] 23/50: Create PCollections manipulation methods
commit 7a645e1bd44f95cae5108c63d5f49f555c91f7d6
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 29 11:48:20 2018 +0100
Create PCollections manipulation methods
---
.../translation/TranslationContext.java | 56 +++++++++++++++++++++-
1 file changed, 55 insertions(+), 1 deletion(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 71ae276..a3276bf 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -17,17 +17,25 @@
*/
package org.apache.beam.runners.spark.structuredstreaming.translation;
+import com.google.common.collect.Iterables;
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import java.util.HashMap;
-import java.util.LinkedHashSet;
+import java.util.HashSet;
import java.util.Map;
import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.runners.core.construction.TransformInputs;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PValue;
+import org.apache.beam.sdk.values.TupleTag;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.ForeachWriter;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQueryException;
@@ -40,6 +48,7 @@ public class TranslationContext {
private final Map<PValue, Dataset<?>> datasets;
private final Set<Dataset<?>> leaves;
+
private final SparkPipelineOptions options;
@SuppressFBWarnings("URF_UNREAD_FIELD") // make findbug happy
@@ -62,10 +71,55 @@ public class TranslationContext {
-    this.leaves = new LinkedHashSet<>();
+    this.leaves = new HashSet<>();
}
+ // --------------------------------------------------------------------------------------------
+ // Transforms methods
+ // --------------------------------------------------------------------------------------------
public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
this.currentTransform = currentTransform;
}
+ // --------------------------------------------------------------------------------------------
+ // Datasets methods
+ // --------------------------------------------------------------------------------------------
+
+
+ // --------------------------------------------------------------------------------------------
+ // PCollections methods
+ // --------------------------------------------------------------------------------------------
+ @SuppressWarnings("unchecked")
+ public PValue getInput() {
+ return Iterables.getOnlyElement(TransformInputs.nonAdditionalInputs(currentTransform));
+ }
+
+ @SuppressWarnings("unchecked")
+ public Map<TupleTag<?>, PValue> getInputs() {
+ return currentTransform.getInputs();
+ }
+
+ @SuppressWarnings("unchecked")
+ public PValue getOutput() {
+ return Iterables.getOnlyElement(currentTransform.getOutputs().values());
+ }
+
+ @SuppressWarnings("unchecked")
+ public Map<TupleTag<?>, PValue> getOutputs() {
+ return currentTransform.getOutputs();
+ }
+
+ @SuppressWarnings("unchecked")
+ public Map<TupleTag<?>, Coder<?>> getOutputCoders() {
+ return currentTransform
+ .getOutputs()
+ .entrySet()
+ .stream()
+ .filter(e -> e.getValue() instanceof PCollection)
+ .collect(Collectors.toMap(e -> e.getKey(), e -> ((PCollection) e.getValue()).getCoder()));
+ }
+
+ // --------------------------------------------------------------------------------------------
+ // Pipeline methods
+ // --------------------------------------------------------------------------------------------
+
public void startPipeline() {
try {
// to start a pipeline we need a DataStreamWriter to start
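As a quick orientation, a hedged sketch of how a transform translator might consume
these new accessors. Only the TranslationContext calls come from the diff above; the
class and method names below are illustrative, and the class is assumed to sit in the
same package as TranslationContext:

import java.util.Map;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.values.PValue;
import org.apache.beam.sdk.values.TupleTag;

class AccessorUsageSketch {
  static void inspect(TranslationContext context) {
    // Single-input/single-output transforms can use the getOnlyElement variants.
    PValue input = context.getInput();
    PValue output = context.getOutput();
    System.out.println("translating " + input + " -> " + output);
    // Multi-output transforms (e.g. a ParDo with additional outputs) walk the maps;
    // getOutputCoders() filters to PCollections since only they carry coders.
    for (Map.Entry<TupleTag<?>, Coder<?>> e : context.getOutputCoders().entrySet()) {
      System.out.println(e.getKey().getId() + " uses coder " + e.getValue());
    }
  }
}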
[beam] 09/50: Organise methods in PipelineTranslator
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 6695d6462020afd46857c7b50f981d4187c4a802
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 21 11:31:43 2018 +0100
Organise methods in PipelineTranslator
---
.../spark/structuredstreaming/SparkRunner.java | 1 -
.../translation/PipelineTranslator.java | 64 +++++++++++++---------
2 files changed, 38 insertions(+), 27 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index 3e3b112..ab2215b 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -99,7 +99,6 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
PipelineTranslator.replaceTransforms(pipeline, options);
PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options);
PipelineTranslator pipelineTranslator = options.isStreaming() ? new StreamingPipelineTranslator() : new BatchPipelineTranslator();
- //init pipelineTranslator with subclass based on mode and env
pipelineTranslator.translate(pipeline);
}
private void executePipeline(Pipeline pipeline) {}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index db5c354..8eb1fb6 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -21,6 +21,9 @@ import org.slf4j.LoggerFactory;
public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
+ // --------------------------------------------------------------------------------------------
+ // Pipeline preparation methods
+ // --------------------------------------------------------------------------------------------
/**
* Local configurations work in the same JVM and have no problems with improperly formatted files
* on classpath (eg. directories with .class files or empty directories). Prepare files for
@@ -49,32 +52,6 @@ public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
}
}
- /**
- * Utility formatting method.
- *
- * @param n number of spaces to generate
- * @return String with "|" followed by n spaces
- */
- protected static String genSpaces(int n) {
- StringBuilder builder = new StringBuilder();
- for (int i = 0; i < n; i++) {
- builder.append("| ");
- }
- return builder.toString();
- }
-
- /**
- * Translates the pipeline by passing this class as a visitor.
- *
- * @param pipeline The pipeline to be translated
- */
- public void translate(Pipeline pipeline) {
- pipeline.traverseTopologically(this);
- }
-
-
-
-
/** The translation mode of the Beam Pipeline. */
private enum TranslationMode {
@@ -116,4 +93,39 @@ public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
}
}
+ // --------------------------------------------------------------------------------------------
+ // Pipeline utility methods
+ // --------------------------------------------------------------------------------------------
+
+ /**
+ * Utility formatting method.
+ *
+ * @param n number of spaces to generate
+ * @return String with "|" followed by n spaces
+ */
+ protected static String genSpaces(int n) {
+ StringBuilder builder = new StringBuilder();
+ for (int i = 0; i < n; i++) {
+ builder.append("| ");
+ }
+ return builder.toString();
+ }
+
+ // --------------------------------------------------------------------------------------------
+ // Pipeline visitor methods
+ // --------------------------------------------------------------------------------------------
+
+ /**
+ * Translates the pipeline by passing this class as a visitor.
+ *
+ * @param pipeline The pipeline to be translated
+ */
+ public void translate(Pipeline pipeline) {
+ pipeline.traverseTopologically(this);
+ }
+
+
+
+
+
}
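The relocated genSpaces helper exists to indent visitor log output by composite
depth. A standalone sketch of that intended use, with println standing in for the
runner's logger and the node names invented for illustration:

class VisitorIndentSketch {
  // Mirrors PipelineTranslator.genSpaces: one "|   " prefix per nesting level.
  private static String genSpaces(int n) {
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < n; i++) {
      builder.append("|   ");
    }
    return builder.toString();
  }

  public static void main(String[] args) {
    // Produces a readable tree of the visited transform hierarchy:
    System.out.println(genSpaces(0) + "Pipeline");
    System.out.println(genSpaces(1) + "Read");
    System.out.println(genSpaces(2) + "Read/Impulse");
  }
}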
[beam] 10/50: Initialise BatchTranslationContext
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit ec9d63462703ebe9fdac170c8ffa1aeb1d14972c
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 21 12:13:21 2018 +0100
Initialise BatchTranslationContext
---
.../runners/spark/structuredstreaming/SparkRunner.java | 2 +-
.../translation/batch/BatchPipelineTranslator.java | 7 ++++++-
.../translation/batch/BatchTranslationContext.java | 17 ++++++++++++++---
.../streaming/StreamingPipelineTranslator.java | 5 ++++-
4 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index ab2215b..de20133 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -98,7 +98,7 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
PipelineTranslator.detectTranslationMode(pipeline, options);
PipelineTranslator.replaceTransforms(pipeline, options);
PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options);
- PipelineTranslator pipelineTranslator = options.isStreaming() ? new StreamingPipelineTranslator() : new BatchPipelineTranslator();
+ PipelineTranslator pipelineTranslator = options.isStreaming() ? new StreamingPipelineTranslator(options) : new BatchPipelineTranslator(options);
pipelineTranslator.translate(pipeline);
}
private void executePipeline(Pipeline pipeline) {}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
index 2459372..1bf660f 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
@@ -4,10 +4,13 @@ import java.util.HashMap;
import java.util.Map;
import javax.annotation.Nullable;
import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.spark.SparkConf;
+import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -45,7 +48,9 @@ public class BatchPipelineTranslator extends PipelineTranslator {
}
private static final Logger LOG = LoggerFactory.getLogger(BatchPipelineTranslator.class);
-
+ public BatchPipelineTranslator(SparkPipelineOptions options) {
+ translationContext = new BatchTranslationContext(options);
+ }
/** Returns a translator for the given node, if it is possible, otherwise null. */
private static BatchTransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node) {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
index 1d991f1..b53aa19 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
@@ -5,6 +5,7 @@ import java.util.Map;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.values.PValue;
+import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
@@ -20,14 +21,24 @@ public class BatchTranslationContext {
*/
private final Map<PValue, Dataset<?>> danglingDataSets;
- private final SparkSession sparkSession;
+ private SparkSession sparkSession;
private final SparkPipelineOptions options;
private AppliedPTransform<?, ?, ?> currentTransform;
- public BatchTranslationContext(SparkSession sparkSession, SparkPipelineOptions options) {
- this.sparkSession = sparkSession;
+ public BatchTranslationContext(SparkPipelineOptions options) {
+ SparkConf sparkConf = new SparkConf();
+ sparkConf.setMaster(options.getSparkMaster());
+ sparkConf.setAppName(options.getAppName());
+ if (options.getFilesToStage() != null && !options.getFilesToStage().isEmpty()) {
+ sparkConf.setJars(options.getFilesToStage().toArray(new String[0]));
+ }
+
+ SparkSession sparkSession = SparkSession
+ .builder()
+ .config(sparkConf)
+ .getOrCreate();
this.options = options;
this.datasets = new HashMap<>();
this.danglingDataSets = new HashMap<>();
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
index 547083c..7bed930 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
@@ -1,7 +1,10 @@
package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
public class StreamingPipelineTranslator extends PipelineTranslator {
-//TODO impl
+
+ public StreamingPipelineTranslator(SparkPipelineOptions options) {
+ }
}
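Note that in the BatchTranslationContext hunk above the builder result is assigned to
a local SparkSession variable, so the sparkSession field declared earlier stays unset.
For reference, a self-contained sketch of the same bootstrap; "local[2]" and the app
name are placeholder values standing in for the pipeline options:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

class SessionBootstrapSketch {
  public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf();
    sparkConf.setMaster("local[2]");                     // options.getSparkMaster()
    sparkConf.setAppName("structured-streaming-sketch"); // options.getAppName()
    // getOrCreate() reuses an existing session with a compatible config if present.
    SparkSession sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
    System.out.println("running Spark " + sparkSession.version());
    sparkSession.stop();
  }
}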
[beam] 06/50: Add nodes translators structure
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 28a9422293fc7390286bea084d2c7c895d2b32b6
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 21 09:36:49 2018 +0100
Add nodes translators structure
---
.../translation/BatchPipelineTranslator.java | 20 -------
.../translation/batch/BatchPipelineTranslator.java | 66 ++++++++++++++++++++++
.../batch/BatchTransformTranslator.java | 11 ++++
.../translation/batch/BatchTranslationContext.java | 36 ++++++++++++
.../batch/CombinePerKeyTranslatorBatch.java | 14 +++++
.../batch/FlattenPCollectionTranslatorBatch.java | 13 +++++
.../batch/GroupByKeyTranslatorBatch.java | 14 +++++
.../translation/batch/ParDoTranslatorBatch.java | 13 +++++
.../batch/ReadSourceTranslatorBatch.java | 12 ++++
.../batch/ReshuffleTranslatorBatch.java | 11 ++++
.../batch/WindowAssignTranslatorBatch.java | 12 ++++
.../StreamingPipelineTranslator.java | 6 +-
12 files changed, 206 insertions(+), 22 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/BatchPipelineTranslator.java
deleted file mode 100644
index e66555c..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/BatchPipelineTranslator.java
+++ /dev/null
@@ -1,20 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation;
-
-import org.apache.beam.sdk.Pipeline;
-import org.apache.beam.sdk.runners.TransformHierarchy;
-import org.apache.beam.sdk.values.PValue;
-
-public class BatchPipelineTranslator extends PipelineTranslator {
-
-
- @Override public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
- return super.enterCompositeTransform(node);
- }
-
-
- @Override public void visitPrimitiveTransform(TransformHierarchy.Node node) {
- super.visitPrimitiveTransform(node);
- }
-
-
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
new file mode 100644
index 0000000..2f7ac23
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
@@ -0,0 +1,66 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.runners.TransformHierarchy;
+import org.apache.beam.sdk.transforms.PTransform;
+
+/** {@link Pipeline.PipelineVisitor} for executing a {@link Pipeline} as a Spark batch job. */
+
+public class BatchPipelineTranslator extends PipelineTranslator {
+
+
+ // --------------------------------------------------------------------------------------------
+ // Transform Translator Registry
+ // --------------------------------------------------------------------------------------------
+
+ @SuppressWarnings("rawtypes")
+ private static final Map<String, BatchTransformTranslator>
+ TRANSLATORS = new HashMap<>();
+
+ static {
+ TRANSLATORS.put(PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN,
+ new CombinePerKeyTranslatorBatch());
+ TRANSLATORS
+ .put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new GroupByKeyTranslatorBatch());
+ TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new ReshuffleTranslatorBatch());
+
+ TRANSLATORS
+ .put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionTranslatorBatch());
+
+ TRANSLATORS
+ .put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowAssignTranslatorBatch());
+
+ TRANSLATORS.put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoTranslatorBatch());
+
+ TRANSLATORS.put(PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
+ }
+
+ /** Returns a translator for the given node, if it is possible, otherwise null. */
+ private static BatchTransformTranslator<?> getTranslator(TransformHierarchy.Node node) {
+ @Nullable PTransform<?, ?> transform = node.getTransform();
+ // Root of the graph is null
+ if (transform == null) {
+ return null;
+ }
+ @Nullable String urn = PTransformTranslation.urnForTransformOrNull(transform);
+ return (urn == null) ? null : TRANSLATORS.get(urn);
+ }
+
+
+ @Override public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
+ return super.enterCompositeTransform(node);
+ //TODO impl
+ }
+
+
+ @Override public void visitPrimitiveTransform(TransformHierarchy.Node node) {
+ super.visitPrimitiveTransform(node);
+ //TODO impl
+ }
+
+ }
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTransformTranslator.java
new file mode 100644
index 0000000..ab0cf68
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTransformTranslator.java
@@ -0,0 +1,11 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.PTransform;
+
+public interface BatchTransformTranslator<TransformT extends PTransform> {
+
+ /** A translator of a {@link PTransform} in batch mode. */
+
+ void translateNode(TransformT transform, BatchTranslationContext context);
+ }
+
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
new file mode 100644
index 0000000..554beea
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
@@ -0,0 +1,36 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.sdk.runners.AppliedPTransform;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.SparkSession;
+
+/**
+ * Keeps track of the {@link Dataset} and the step the translation is in.
+ */
+public class BatchTranslationContext {
+ private final Map<PValue, Dataset<?>> datasets;
+
+ /**
+ * For keeping track of which DataSets don't have a successor. We need to terminate these with
+ * a discarding sink because the Beam model allows dangling operations.
+ */
+ private final Map<PValue, Dataset<?>> danglingDataSets;
+
+ private final SparkSession sparkSession;
+ private final SparkPipelineOptions options;
+
+ private AppliedPTransform<?, ?, ?> currentTransform;
+
+
+ public BatchTranslationContext(SparkSession sparkSession, SparkPipelineOptions options) {
+ this.sparkSession = sparkSession;
+ this.options = options;
+ this.datasets = new HashMap<>();
+ this.danglingDataSets = new HashMap<>();
+ }
+
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
new file mode 100644
index 0000000..6099fbc
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
@@ -0,0 +1,14 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+
+class CombinePerKeyTranslatorBatch<K, InputT, AccumT, OutputT> implements BatchTransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>>> {
+
+ @Override public void translateNode(
+ PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>> transform,
+ BatchTranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
new file mode 100644
index 0000000..281eda9
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
@@ -0,0 +1,13 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+
+class FlattenPCollectionTranslatorBatch<T> implements BatchTransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
+
+ @Override public void translateNode(PTransform<PCollectionList<T>, PCollection<T>> transform,
+ BatchTranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
new file mode 100644
index 0000000..bb0ccc1
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
@@ -0,0 +1,14 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+
+class GroupByKeyTranslatorBatch<K, InputT> implements BatchTransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
+
+ @Override public void translateNode(
+ PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> transform,
+ BatchTranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
new file mode 100644
index 0000000..4477853
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
@@ -0,0 +1,13 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionTuple;
+
+class ParDoTranslatorBatch<InputT, OutputT> implements BatchTransformTranslator<PTransform<PCollection<InputT>, PCollectionTuple>> {
+
+ @Override public void translateNode(PTransform<PCollection<InputT>, PCollectionTuple> transform,
+ BatchTranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
new file mode 100644
index 0000000..a30fa70
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -0,0 +1,12 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+
+class ReadSourceTranslatorBatch<T> implements BatchTransformTranslator<PTransform<PBegin, PCollection<T>>> {
+
+ @Override public void translateNode(PTransform<PBegin, PCollection<T>> transform, BatchTranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
new file mode 100644
index 0000000..6283fdb
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
@@ -0,0 +1,11 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.Reshuffle;
+
+class ReshuffleTranslatorBatch<K, InputT> implements BatchTransformTranslator<Reshuffle<K, InputT>> {
+
+ @Override public void translateNode(Reshuffle<K, InputT> transform,
+ BatchTranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
new file mode 100644
index 0000000..21b71b9
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
@@ -0,0 +1,12 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+
+class WindowAssignTranslatorBatch<T> implements BatchTransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {
+
+ @Override public void translateNode(PTransform<PCollection<T>, PCollection<T>> transform,
+ BatchTranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/StreamingPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
similarity index 53%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/StreamingPipelineTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
index 2058b37..547083c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/StreamingPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
@@ -1,5 +1,7 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation;
+package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
-public class StreamingPipelineTranslator extends PipelineTranslator {
+import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
+public class StreamingPipelineTranslator extends PipelineTranslator {
+//TODO impl
}
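The TRANSLATORS map above keys translators by Beam transform URN. A minimal
standalone sketch of the lookup-and-dispatch flow; the Translator interface and the
second URN literal are stand-ins for BatchTransformTranslator and the
PTransformTranslation constants:

import java.util.HashMap;
import java.util.Map;

class TranslatorRegistrySketch {
  interface Translator {
    void translate();
  }

  private static final Map<String, Translator> TRANSLATORS = new HashMap<>();

  static {
    // Real keys are URN constants such as PTransformTranslation.PAR_DO_TRANSFORM_URN.
    TRANSLATORS.put("beam:transform:pardo:v1", () -> System.out.println("translate ParDo"));
  }

  static void visit(String urn) {
    Translator translator = (urn == null) ? null : TRANSLATORS.get(urn);
    if (translator == null) {
      // Mirrors getTranslator() returning null: composites without a direct
      // translator are entered and their sub-transforms visited instead.
      System.out.println("no direct translator for " + urn + "; recursing");
    } else {
      translator.translate();
    }
  }

  public static void main(String[] args) {
    visit("beam:transform:pardo:v1");
    visit("beam:transform:custom:v1");
  }
}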
[beam] 28/50: Implement read transform
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 4f150da0d1b01cd8a2a0d52142e3f441d40004df
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Mon Dec 3 09:28:11 2018 +0100
Implement read transform
---
.../translation/TranslationContext.java | 19 +++
.../batch/ReadSourceTranslatorBatch.java | 32 +++-
.../translation/io/DatasetSource.java | 163 +++++++++++++++++++++
3 files changed, 213 insertions(+), 1 deletion(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index e66bc90..52ed11f 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -33,11 +33,18 @@ import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PValue;
import org.apache.beam.sdk.values.TupleTag;
+import org.apache.beam.sdk.values.WindowingStrategy;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.ForeachWriter;
+import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.execution.datasources.DataSource;
+import org.apache.spark.sql.execution.streaming.Source;
+import org.apache.spark.sql.sources.v2.DataSourceOptions;
+import org.apache.spark.sql.sources.v2.ReadSupport;
+import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.streaming.StreamingQueryException;
/**
@@ -73,6 +80,14 @@ public class TranslationContext {
this.leaves = new HashSet<>();
}
+ public SparkSession getSparkSession() {
+ return sparkSession;
+ }
+
+ public SparkPipelineOptions getOptions() {
+ return options;
+ }
+
// --------------------------------------------------------------------------------------------
// Transforms methods
// --------------------------------------------------------------------------------------------
@@ -80,6 +95,10 @@ public class TranslationContext {
this.currentTransform = currentTransform;
}
+ public AppliedPTransform<?, ?, ?> getCurrentTransform() {
+ return currentTransform;
+ }
+
// --------------------------------------------------------------------------------------------
// Datasets methods
// --------------------------------------------------------------------------------------------
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index d18eb2e..05dc374 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -17,16 +17,46 @@
*/
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+import java.io.IOException;
+import org.apache.beam.runners.core.construction.ReadTranslation;
+import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSource;
+import org.apache.beam.sdk.io.BoundedSource;
+import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
class ReadSourceTranslatorBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
+ @SuppressWarnings("unchecked")
@Override
public void translateTransform(
- PTransform<PBegin, PCollection<T>> transform, TranslationContext context) {}
+ PTransform<PBegin, PCollection<T>> transform, TranslationContext context) {
+ AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>> rootTransform =
+ (AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>)
+ context.getCurrentTransform();
+ BoundedSource<T> source;
+ try {
+ source = ReadTranslation.boundedSourceFromTransform(rootTransform);
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ PCollection<T> output = (PCollection<T>) context.getOutput();
+
+ SparkSession sparkSession = context.getSparkSession();
+ DatasetSource datasetSource = new DatasetSource(context, source);
+ Dataset<Row> dataset = sparkSession.readStream().format("DatasetSource").load();
+
+ context.putDataset(output, dataset);
+ }
+
+
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
new file mode 100644
index 0000000..d9d283e
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
@@ -0,0 +1,163 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.io;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static scala.collection.JavaConversions.asScalaBuffer;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.io.BoundedSource;
+import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.sources.v2.ContinuousReadSupport;
+import org.apache.spark.sql.sources.v2.DataSourceOptions;
+import org.apache.spark.sql.sources.v2.DataSourceV2;
+import org.apache.spark.sql.sources.v2.MicroBatchReadSupport;
+import org.apache.spark.sql.sources.v2.reader.InputPartition;
+import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;
+import org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader;
+import org.apache.spark.sql.sources.v2.reader.streaming.Offset;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * This is a Spark structured streaming {@link DataSourceV2} implementation. As continuous streaming
+ * is tagged experimental in Spark, this class does not implement {@link ContinuousReadSupport}.
+ * This class is just a mix-in.
+ */
+public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
+
+ private final int numPartitions;
+ private final Long bundleSize;
+ private TranslationContext context;
+ private BoundedSource<T> source;
+
+ public DatasetSource(TranslationContext context, BoundedSource<T> source) {
+ this.context = context;
+ this.source = source;
+ this.numPartitions = context.getSparkSession().sparkContext().defaultParallelism();
+ checkArgument(this.numPartitions > 0, "Number of partitions must be greater than zero.");
+ this.bundleSize = context.getOptions().getBundleSize();
+
+ }
+
+ @Override public MicroBatchReader createMicroBatchReader(Optional<StructType> schema,
+ String checkpointLocation, DataSourceOptions options) {
+ return new DatasetMicroBatchReader(schema, checkpointLocation, options);
+ }
+
+ /**
+ * This class can be mapped to Beam {@link BoundedSource}.
+ */
+ private class DatasetMicroBatchReader implements MicroBatchReader {
+
+ private Optional<StructType> schema;
+ private String checkpointLocation;
+ private DataSourceOptions options;
+
+ private DatasetMicroBatchReader(Optional<StructType> schema, String checkpointLocation,
+ DataSourceOptions options) {
+ //TODO start reading from the source here, inc offset at each element read
+ }
+
+ @Override public void setOffsetRange(Optional<Offset> start, Optional<Offset> end) {
+ //TODO extension point for SDF
+ }
+
+ @Override public Offset getStartOffset() {
+ //TODO extension point for SDF
+ return null;
+ }
+
+ @Override public Offset getEndOffset() {
+ //TODO extension point for SDF
+ return null;
+ }
+
+ @Override public Offset deserializeOffset(String json) {
+ //TODO extension point for SDF
+ return null;
+ }
+
+ @Override public void commit(Offset end) {
+ //TODO no more to read after end Offset
+ }
+
+ @Override public void stop() {
+ }
+
+ @Override public StructType readSchema() {
+ return null;
+ }
+
+ @Override public List<InputPartition<InternalRow>> planInputPartitions() {
+ List<InputPartition<InternalRow>> result = new ArrayList<>();
+ long desiredSizeBytes;
+ SparkPipelineOptions options = context.getOptions();
+ try {
+ desiredSizeBytes = (bundleSize == null) ?
+ source.getEstimatedSizeBytes(options) / numPartitions :
+ bundleSize;
+ List<? extends BoundedSource<T>> sources = source.split(desiredSizeBytes, options);
+ for (BoundedSource<T> source : sources) {
+ result.add(new InputPartition<InternalRow>() {
+
+ @Override public InputPartitionReader<InternalRow> createPartitionReader() {
+ BoundedReader<T> reader = null;
+ try {
+ reader = source.createReader(options);
+ } catch (IOException e) {
+ }
+ return new DatasetMicroBatchPartitionReader(reader);
+ }
+ });
+ }
+ return result;
+
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
+ return result;
+ }
+
+ }
+
+ /**
+ * This class can be mapped to Beam {@link BoundedReader}.
+ */
+ private class DatasetMicroBatchPartitionReader implements InputPartitionReader<InternalRow> {
+
+ BoundedReader<T> reader;
+ private boolean started;
+ private boolean closed;
+
+ DatasetMicroBatchPartitionReader(BoundedReader<T> reader) {
+ this.reader = reader;
+ this.started = false;
+ this.closed = false;
+ }
+
+ @Override public boolean next() throws IOException {
+ if (!started) {
+ started = true;
+ return reader.start();
+ } else {
+ return !closed && reader.advance();
+ }
+ }
+
+ @Override public InternalRow get() {
+ List<Object> list = new ArrayList<>();
+ list.add(WindowedValue.timestampedValueInGlobalWindow(reader.getCurrent(), reader.getCurrentTimestamp()));
+ return InternalRow.apply(asScalaBuffer(list).toList());
+ }
+
+ @Override public void close() throws IOException {
+ closed = true;
+ reader.close();
+ }
+ }
+}
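To make the partition-sizing arithmetic in planInputPartitions() concrete, a small
worked sketch with illustrative numbers (a 64 MB source, parallelism 4, no explicit
bundleSize); the comments name the values each variable stands in for:

class BundleSizingSketch {
  public static void main(String[] args) {
    long estimatedSizeBytes = 64L * 1024 * 1024; // source.getEstimatedSizeBytes(options)
    int numPartitions = 4;                       // sparkContext.defaultParallelism()
    Long bundleSize = null;                      // options.getBundleSize()
    long desiredSizeBytes =
        (bundleSize == null) ? estimatedSizeBytes / numPartitions : bundleSize;
    // Prints 16777216: each of the 4 splits targets roughly 16 MB.
    System.out.println(desiredSizeBytes);
  }
}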
[beam] 42/50: Use raw Encoder<WindowedValue> also in regular ReadSourceTranslatorBatch
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 4e0f7a0eeb56791f9d3f66873573eab946f5cbf5
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Dec 28 10:16:01 2018 +0100
Use raw Encoder<WindowedValue> also in regular ReadSourceTranslatorBatch
---
.../translation/TranslationContext.java | 1 -
.../batch/ReadSourceTranslatorBatch.java | 22 ++++++++++------------
.../batch/ReadSourceTranslatorMockBatch.java | 2 ++
3 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 82aa80b..acc49f4 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -115,7 +115,6 @@ public class TranslationContext {
}
}
- //TODO: remove. It is just for testing
public void putDatasetRaw(PValue value, Dataset<WindowedValue> dataset) {
if (!datasets.containsKey(value)) {
datasets.put(value, dataset);
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index 370e3f4..d980a52 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -21,7 +21,6 @@ import java.io.IOException;
import org.apache.beam.runners.core.construction.ReadTranslation;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
-import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.DatasetStreamingSource;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
@@ -30,9 +29,9 @@ import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
-import org.apache.spark.sql.streaming.DataStreamReader;
class ReadSourceTranslatorBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
@@ -47,7 +46,6 @@ class ReadSourceTranslatorBatch<T>
(AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>)
context.getCurrentTransform();
- String providerClassName = SOURCE_PROVIDER_CLASS.substring(0, SOURCE_PROVIDER_CLASS.indexOf("$"));
BoundedSource<T> source;
try {
source = ReadTranslation.boundedSourceFromTransform(rootTransform);
@@ -56,20 +54,20 @@ class ReadSourceTranslatorBatch<T>
}
SparkSession sparkSession = context.getSparkSession();
- Dataset<Row> rowDataset = sparkSession.read().format(providerClassName).load();
+ Dataset<Row> rowDataset = sparkSession.read().format(SOURCE_PROVIDER_CLASS).load();
- //TODO initialize source : how, to get a reference to the DatasetStreamingSource instance that spark
- // instantiates to be able to call DatasetStreamingSource.initialize(). How to pass in a DatasetCatalog?
- MapFunction<Row, WindowedValue<T>> func = new MapFunction<Row, WindowedValue<T>>() {
- @Override public WindowedValue<T> call(Row value) throws Exception {
+ //TODO pass the source and the translation context serialized as string to the DatasetSource
+ MapFunction<Row, WindowedValue> func = new MapFunction<Row, WindowedValue>() {
+ @Override public WindowedValue call(Row value) throws Exception {
//there is only one value put in each Row by the InputPartitionReader
- return value.<WindowedValue<T>>getAs(0);
+ return value.<WindowedValue>getAs(0);
}
};
- //TODO fix encoder: how to get an Encoder<WindowedValue<T>>
- Dataset<WindowedValue<T>> dataset = rowDataset.map(func, null);
+ //TODO: is there a better way than using the raw WindowedValue? Can an Encoder<WindowedValue<T>>
+ // be created?
+ Dataset<WindowedValue> dataset = rowDataset.map(func, Encoders.kryo(WindowedValue.class));
PCollection<T> output = (PCollection<T>) context.getOutput();
- context.putDataset(output, dataset);
+ context.putDatasetRaw(output, dataset);
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
index 758ff1d..d7b9175 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
@@ -52,6 +52,8 @@ class ReadSourceTranslatorMockBatch<T>
return value.<WindowedValue>getAs(0);
}
};
+ //TODO: is there a better way than using the raw WindowedValue? Can an Encoder<WindowedValue<T>>
+ // be created?
Dataset<WindowedValue> dataset = rowDataset.map(func, Encoders.kryo(WindowedValue.class));
PCollection<T> output = (PCollection<T>) context.getOutput();
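A short sketch of why the raw type is used here: Encoders.kryo takes a Class token,
and Java erasure leaves no WindowedValue<T> class to hand it, so the typed encoder
asked about in the TODO cannot be built this way. The class name is illustrative:

import org.apache.beam.sdk.util.WindowedValue;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;

class RawEncoderSketch {
  @SuppressWarnings("rawtypes")
  static Encoder<WindowedValue> rawEncoder() {
    // Compiles: the raw class token exists.
    return Encoders.kryo(WindowedValue.class);
    // Does not compile: there is no WindowedValue<String>.class after erasure.
    // Encoder<WindowedValue<String>> typed = Encoders.kryo(WindowedValue<String>.class);
  }
}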
[beam] 17/50: Make codestyle and firebug happy
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 91f9ef55503cc565c82605e7459e9baa4e6d8ff8
Author: Alexey Romanenko <ar...@gmail.com>
AuthorDate: Fri Nov 23 16:10:11 2018 +0100
Make codestyle and firebug happy
---
.../runners/spark/structuredstreaming/SparkPipelineResult.java | 7 +++++++
.../TransformTranslator.java => package-info.java} | 10 ++--------
.../structuredstreaming/translation/PipelineTranslator.java | 2 +-
.../structuredstreaming/translation/TransformTranslator.java | 1 +
.../structuredstreaming/translation/TranslationContext.java | 6 ++++++
.../{TransformTranslator.java => batch/package-info.java} | 10 ++--------
.../{TransformTranslator.java => package-info.java} | 10 ++--------
.../{TransformTranslator.java => streaming/package-info.java} | 10 ++--------
8 files changed, 23 insertions(+), 33 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
index a8b3640..c55526f 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
@@ -18,32 +18,39 @@
package org.apache.beam.runners.spark.structuredstreaming;
import java.io.IOException;
+import javax.annotation.Nullable;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricResults;
import org.joda.time.Duration;
+/** Represents a Spark pipeline execution result. */
public class SparkPipelineResult implements PipelineResult {
+ @Nullable // TODO: remove once the method is implemented
@Override
public State getState() {
return null;
}
+ @Nullable // TODO: remove once the method is implemented
@Override
public State cancel() throws IOException {
return null;
}
+ @Nullable // TODO: remove once the method is implemented
@Override
public State waitUntilFinish(Duration duration) {
return null;
}
+ @Nullable // TODO: remove once the method is implemented
@Override
public State waitUntilFinish() {
return null;
}
+ @Nullable // TODO: remove once the method is implemented
@Override
public MetricResults metrics() {
return null;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/package-info.java
similarity index 70%
copy from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
copy to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/package-info.java
index fc55a9e..aefeb28 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/package-info.java
@@ -15,12 +15,6 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation;
-import org.apache.beam.sdk.transforms.PTransform;
-
-public interface TransformTranslator<TransformT extends PTransform> {
-
- /** Base class for translators of {@link PTransform}. */
- void translateTransform(TransformT transform, TranslationContext context);
-}
+/** Internal implementation of the Beam runner for Apache Spark. */
+package org.apache.beam.runners.spark.structuredstreaming;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index bb40631..c771915 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -133,7 +133,7 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
}
/**
- * get a {@link TransformTranslator} for the given {@link TransformHierarchy.Node}
+ * Get a {@link TransformTranslator} for the given {@link TransformHierarchy.Node}.
*
* @param node
* @return
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
index fc55a9e..f9558c9 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
@@ -19,6 +19,7 @@ package org.apache.beam.runners.spark.structuredstreaming.translation;
import org.apache.beam.sdk.transforms.PTransform;
+/** Supports translation between a Beam transform and Spark's operations on Datasets. */
public interface TransformTranslator<TransformT extends PTransform> {
/** Base class for translators of {@link PTransform}. */
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 8f61d0c..aa831ed 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -17,6 +17,7 @@
*/
package org.apache.beam.runners.spark.structuredstreaming.translation;
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
@@ -33,9 +34,14 @@ import org.apache.spark.sql.SparkSession;
*/
public class TranslationContext {
+ @SuppressFBWarnings("URF_UNREAD_FIELD") // make findbug happy
private AppliedPTransform<?, ?, ?> currentTransform;
+
private final Map<PValue, Dataset<?>> datasets;
+
+ @SuppressFBWarnings("URF_UNREAD_FIELD") // make findbug happy
private SparkSession sparkSession;
+
private final SparkPipelineOptions options;
public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/package-info.java
similarity index 76%
copy from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
copy to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/package-info.java
index fc55a9e..6d3ce5a 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/package-info.java
@@ -15,12 +15,6 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation;
-import org.apache.beam.sdk.transforms.PTransform;
-
-public interface TransformTranslator<TransformT extends PTransform> {
-
- /** Base class for translators of {@link PTransform}. */
- void translateTransform(TransformT transform, TranslationContext context);
-}
+/** Internal utilities to translate Beam pipelines to Spark batching. */
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/package-info.java
similarity index 77%
copy from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
copy to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/package-info.java
index fc55a9e..2754ac5 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/package-info.java
@@ -15,12 +15,6 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation;
-
-import org.apache.beam.sdk.transforms.PTransform;
-public interface TransformTranslator<TransformT extends PTransform> {
-
- /** Base class for translators of {@link PTransform}. */
- void translateTransform(TransformT transform, TranslationContext context);
-}
+/** Internal translators for running Beam pipelines on Spark. */
+package org.apache.beam.runners.spark.structuredstreaming.translation;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/package-info.java
similarity index 76%
copy from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
copy to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/package-info.java
index fc55a9e..67f3613 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/package-info.java
@@ -15,12 +15,6 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation;
-import org.apache.beam.sdk.transforms.PTransform;
-
-public interface TransformTranslator<TransformT extends PTransform> {
-
- /** Base class for translators of {@link PTransform}. */
- void translateTransform(TransformT transform, TranslationContext context);
-}
+/** Internal utilities to translate Beam pipelines to Spark streaming. */
+package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
[beam] 41/50: Split batch and streaming sources and translators
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 1ca4192fadaa80d327655e80e0a8bf3eb22ea932
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Dec 27 17:20:21 2018 +0100
Split batch and streaming sources and translators
---
.../translation/batch/DatasetSourceBatch.java | 148 +++++++++++++++++++++
.../DatasetSourceMockBatch.java} | 4 +-
.../batch/ReadSourceTranslatorBatch.java | 20 +--
.../batch/ReadSourceTranslatorMockBatch.java | 5 +-
.../DatasetStreamingSource.java} | 4 +-
5 files changed, 158 insertions(+), 23 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
new file mode 100644
index 0000000..1ad16eb
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static scala.collection.JavaConversions.asScalaBuffer;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.io.BoundedSource;
+import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.sources.v2.ContinuousReadSupport;
+import org.apache.spark.sql.sources.v2.DataSourceOptions;
+import org.apache.spark.sql.sources.v2.DataSourceV2;
+import org.apache.spark.sql.sources.v2.ReadSupport;
+import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
+import org.apache.spark.sql.sources.v2.reader.InputPartition;
+import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * This is a Spark structured streaming {@link DataSourceV2} implementation used in batch mode. As
+ * continuous streaming is tagged experimental in Spark, this class does not implement
+ * {@link ContinuousReadSupport}; {@link DataSourceV2} is just a marker mix-in interface.
+ */
+public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
+
+ private int numPartitions;
+ private Long bundleSize;
+ private TranslationContext context;
+ private BoundedSource<T> source;
+
+
+ @Override public DataSourceReader createReader(DataSourceOptions options) {
+ this.numPartitions = context.getSparkSession().sparkContext().defaultParallelism();
+ checkArgument(this.numPartitions > 0, "Number of partitions must be greater than zero.");
+ this.bundleSize = context.getOptions().getBundleSize();
+ return new DatasetReader();
+ }
+
+ /** This class can be mapped to Beam {@link BoundedSource}. */
+ private class DatasetReader implements DataSourceReader {
+
+ private Optional<StructType> schema;
+ private String checkpointLocation;
+ private DataSourceOptions options;
+
+ @Override
+ public StructType readSchema() {
+ return new StructType();
+ }
+
+ @Override
+ public List<InputPartition<InternalRow>> planInputPartitions() {
+ List<InputPartition<InternalRow>> result = new ArrayList<>();
+ long desiredSizeBytes;
+ SparkPipelineOptions options = context.getOptions();
+ try {
+ desiredSizeBytes =
+ (bundleSize == null)
+ ? source.getEstimatedSizeBytes(options) / numPartitions
+ : bundleSize;
+ List<? extends BoundedSource<T>> sources = source.split(desiredSizeBytes, options);
+ for (BoundedSource<T> source : sources) {
+ result.add(
+ new InputPartition<InternalRow>() {
+
+ @Override
+ public InputPartitionReader<InternalRow> createPartitionReader() {
+ BoundedReader<T> reader = null;
+ try {
+ reader = source.createReader(options);
+ } catch (IOException e) {
+ throw new RuntimeException(
+ "Error creating BoundedReader " + reader.getClass().getCanonicalName(), e);
+ }
+ return new DatasetPartitionReader(reader);
+ }
+ });
+ }
+ return result;
+
+ } catch (Exception e) {
+ throw new RuntimeException(
+ "Error in splitting BoundedSource " + source.getClass().getCanonicalName(), e);
+ }
+ }
+ }
+
+ /** This class can be mapped to Beam {@link BoundedReader}. */
+ private class DatasetPartitionReader implements InputPartitionReader<InternalRow> {
+
+ BoundedReader<T> reader;
+ private boolean started;
+ private boolean closed;
+
+ DatasetPartitionReader(BoundedReader<T> reader) {
+ this.reader = reader;
+ this.started = false;
+ this.closed = false;
+ }
+
+ @Override
+ public boolean next() throws IOException {
+ if (!started) {
+ started = true;
+ return reader.start();
+ } else {
+ return !closed && reader.advance();
+ }
+ }
+
+ @Override
+ public InternalRow get() {
+ List<Object> list = new ArrayList<>();
+ list.add(
+ WindowedValue.timestampedValueInGlobalWindow(
+ reader.getCurrent(), reader.getCurrentTimestamp()));
+ return InternalRow.apply(asScalaBuffer(list).toList());
+ }
+
+ @Override
+ public void close() throws IOException {
+ closed = true;
+ reader.close();
+ }
+ }
+}
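To see how the pieces above fit together: Spark calls createReader() on the driver, planInputPartitions() produces one InputPartition per Beam source split, and each partition's reader is then driven on an executor through next()/get()/close(). A simplified sketch of that consumption loop (the real loop lives inside Spark; names and structure here are illustrative only):

    // Illustrative only: how Spark drives an InputPartitionReader, assuming a
    // DataSourceReader instance named "reader".
    for (InputPartition<InternalRow> partition : reader.planInputPartitions()) {
      try (InputPartitionReader<InternalRow> partitionReader =
          partition.createPartitionReader()) {
        while (partitionReader.next()) { // delegates to BoundedReader.start()/advance()
          InternalRow row = partitionReader.get(); // one WindowedValue per row
          // ... hand the row to the downstream query plan ...
        }
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }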
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceMockBatch.java
similarity index 97%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceMockBatch.java
index f722377..b616a6f 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceMockBatch.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation.io;
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import static scala.collection.JavaConversions.asScalaBuffer;
@@ -37,7 +37,7 @@ import org.joda.time.Instant;
/**
* This is a mock source that gives values between 0 and 999.
*/
-public class DatasetSourceMock implements DataSourceV2, ReadSupport {
+public class DatasetSourceMockBatch implements DataSourceV2, ReadSupport {
@Override public DataSourceReader createReader(DataSourceOptions options) {
return new DatasetReader();
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index aed016a..370e3f4 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -21,32 +21,23 @@ import java.io.IOException;
import org.apache.beam.runners.core.construction.ReadTranslation;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
-import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSource;
-import org.apache.beam.sdk.coders.SerializableCoder;
+import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.DatasetStreamingSource;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
-import org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetCache;
import org.apache.spark.api.java.function.MapFunction;
-import org.apache.spark.scheduler.SparkListener;
-import org.apache.spark.scheduler.SparkListenerApplicationStart;
import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
-import org.apache.spark.sql.catalog.Catalog;
-import org.apache.spark.sql.catalyst.catalog.CatalogTable;
-import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation;
import org.apache.spark.sql.streaming.DataStreamReader;
class ReadSourceTranslatorBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
- private String SOURCE_PROVIDER_CLASS = DatasetSource.class.getCanonicalName();
+ private String SOURCE_PROVIDER_CLASS = DatasetSourceBatch.class.getCanonicalName();
@SuppressWarnings("unchecked")
@Override
@@ -64,12 +55,11 @@ class ReadSourceTranslatorBatch<T>
throw new RuntimeException(e);
}
SparkSession sparkSession = context.getSparkSession();
- DataStreamReader dataStreamReader = sparkSession.readStream().format(providerClassName);
- Dataset<Row> rowDataset = dataStreamReader.load();
+ Dataset<Row> rowDataset = sparkSession.read().format(providerClassName).load();
- //TODO initialize source : how, to get a reference to the DatasetSource instance that spark
- // instantiates to be able to call DatasetSource.initialize(). How to pass in a DatasetCatalog?
+ //TODO initialize source: how to get a reference to the DatasetStreamingSource instance that Spark
+ // instantiates, in order to call DatasetStreamingSource.initialize(). How to pass in a DatasetCatalog?
MapFunction<Row, WindowedValue<T>> func = new MapFunction<Row, WindowedValue<T>>() {
@Override public WindowedValue<T> call(Row value) throws Exception {
//there is only one value put in each Row by the InputPartitionReader
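The hunk is truncated here by the diff; the complete mapping, as it appears later in the branch (commit 46 below, where the generic parameter is dropped to a raw WindowedValue so that a Kryo encoder can be used), reads:

    MapFunction<Row, WindowedValue> func = new MapFunction<Row, WindowedValue>() {
      @Override
      public WindowedValue call(Row value) throws Exception {
        // there is only one value put in each Row by the InputPartitionReader
        return value.<WindowedValue>getAs(0);
      }
    };
    Dataset<WindowedValue> dataset = rowDataset.map(func, Encoders.kryo(WindowedValue.class));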
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
index 184d24c..758ff1d 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
@@ -19,7 +19,6 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
-import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSourceMock;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
@@ -29,8 +28,6 @@ import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
-import org.apache.spark.sql.streaming.DataStreamReader;
-
/**
* Mock translator for a source that generates values 0 to 999 and prints them.
@@ -39,7 +36,7 @@ import org.apache.spark.sql.streaming.DataStreamReader;
class ReadSourceTranslatorMockBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
- private String SOURCE_PROVIDER_CLASS = DatasetSourceMock.class.getCanonicalName();
+ private String SOURCE_PROVIDER_CLASS = DatasetSourceMockBatch.class.getCanonicalName();
@SuppressWarnings("unchecked")
@Override
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
similarity index 99%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
index deacdf4..8701a83 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetStreamingSource.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation.io;
+package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
import static com.google.common.base.Preconditions.checkArgument;
import static scala.collection.JavaConversions.asScalaBuffer;
@@ -56,7 +56,7 @@ import scala.collection.immutable.Map;
* is tagged experimental in Spark, this class does not implement {@link ContinuousReadSupport}. This
* class is just a mix-in.
*/
-public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport{
+public class DatasetStreamingSource<T> implements DataSourceV2, MicroBatchReadSupport{
private int numPartitions;
private Long bundleSize;
[beam] 21/50: Added SparkRunnerRegistrar
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit bbf583ca0ed9dc7641566cad65a05c61b388c4cd
Author: Alexey Romanenko <ar...@gmail.com>
AuthorDate: Tue Nov 27 18:19:46 2018 +0100
Added SparkRunnerRegistrar
---
.../structuredstreaming/SparkRunnerRegistrar.java | 54 ++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunnerRegistrar.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunnerRegistrar.java
new file mode 100644
index 0000000..e1f930b
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunnerRegistrar.java
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.structuredstreaming;
+
+import com.google.auto.service.AutoService;
+import com.google.common.collect.ImmutableList;
+import org.apache.beam.sdk.PipelineRunner;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsRegistrar;
+import org.apache.beam.sdk.runners.PipelineRunnerRegistrar;
+
+/**
+ * Contains the {@link PipelineRunnerRegistrar} and {@link PipelineOptionsRegistrar} for the {@link
+ * SparkRunner}.
+ *
+ * <p>{@link AutoService} will register Spark's implementations of the {@link PipelineRunner} and
+ * {@link PipelineOptions} as available pipeline runner services.
+ */
+public final class SparkRunnerRegistrar {
+ private SparkRunnerRegistrar() {}
+
+ /** Registers the {@link SparkRunner}. */
+ @AutoService(PipelineRunnerRegistrar.class)
+ public static class Runner implements PipelineRunnerRegistrar {
+ @Override
+ public Iterable<Class<? extends PipelineRunner<?>>> getPipelineRunners() {
+ return ImmutableList.of(SparkRunner.class);
+ }
+ }
+
+ /** Registers the {@link SparkPipelineOptions}. */
+ @AutoService(PipelineOptionsRegistrar.class)
+ public static class Options implements PipelineOptionsRegistrar {
+ @Override
+ public Iterable<Class<? extends PipelineOptions>> getPipelineOptions() {
+ return ImmutableList.of(SparkPipelineOptions.class);
+ }
+ }
+}
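With both registrars on the classpath, the runner can be selected like any other Beam runner. A minimal usage sketch (a hypothetical pipeline author's main(), not code from this branch):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class Main {
      public static void main(String[] args) {
        SparkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
        // or rely on the registrar and pass --runner=SparkRunner on the command line
        options.setRunner(SparkRunner.class);
        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms ...
        pipeline.run();
      }
    }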
[beam] 46/50: Pass Beam Source and PipelineOptions to the spark
DataSource as serialized strings
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 1cea29df3702ef438d8eb5964450a7bafea3c7d5
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Jan 2 15:52:46 2019 +0100
Pass Beam Source and PipelineOptions to the spark DataSource as serialized strings
---
.../translation/batch/DatasetSourceBatch.java | 41 ++++++++++++++++------
.../batch/ReadSourceTranslatorBatch.java | 16 +++++++--
2 files changed, 45 insertions(+), 12 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
index f4cd885..331e397 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
@@ -24,8 +24,9 @@ import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
+import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
+import org.apache.beam.runners.core.serialization.Base64Serializer;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
-import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
import org.apache.beam.sdk.util.WindowedValue;
@@ -45,16 +46,38 @@ import org.apache.spark.sql.types.StructType;
*/
public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
+ static final String BEAM_SOURCE_OPTION = "beam-source";
+ static final String DEFAULT_PARALLELISM = "default-parallelism";
+ static final String PIPELINE_OPTIONS = "pipeline-options";
private int numPartitions;
private Long bundleSize;
- private TranslationContext context;
private BoundedSource<T> source;
+ private SparkPipelineOptions sparkPipelineOptions;
- @Override public DataSourceReader createReader(DataSourceOptions options) {
- this.numPartitions = context.getSparkSession().sparkContext().defaultParallelism();
+ @SuppressWarnings("unchecked")
+ @Override
+ public DataSourceReader createReader(DataSourceOptions options) {
+ if (!options.get(BEAM_SOURCE_OPTION).isPresent()){
+ throw new RuntimeException("Beam source was not set in DataSource options");
+ }
+ this.source = Base64Serializer
+ .deserializeUnchecked(options.get(BEAM_SOURCE_OPTION).get(), BoundedSource.class);
+
+ if (!options.get(DEFAULT_PARALLELISM).isPresent()){
+ throw new RuntimeException("Spark default parallelism was not set in DataSource options");
+ }
+ this.numPartitions = Integer.valueOf(options.get(DEFAULT_PARALLELISM).get());
checkArgument(this.numPartitions > 0, "Number of partitions must be greater than zero.");
- this.bundleSize = context.getOptions().getBundleSize();
+ if (!options.get(PIPELINE_OPTIONS).isPresent()){
+ throw new RuntimeException("Beam pipelineOptions were not set in DataSource options");
+ }
+ this.sparkPipelineOptions = SerializablePipelineOptions
+ .deserializeFromJson(options.get(PIPELINE_OPTIONS).get()).as(SparkPipelineOptions.class);
+ this.bundleSize = sparkPipelineOptions.getBundleSize();
return new DatasetReader();
}
/** This class can be mapped to Beam {@link BoundedSource}. */
@@ -62,7 +85,6 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
private Optional<StructType> schema;
private String checkpointLocation;
- private DataSourceOptions options;
@Override
public StructType readSchema() {
@@ -73,13 +95,12 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
public List<InputPartition<InternalRow>> planInputPartitions() {
List<InputPartition<InternalRow>> result = new ArrayList<>();
long desiredSizeBytes;
- SparkPipelineOptions options = context.getOptions();
try {
desiredSizeBytes =
(bundleSize == null)
- ? source.getEstimatedSizeBytes(options) / numPartitions
+ ? source.getEstimatedSizeBytes(sparkPipelineOptions) / numPartitions
: bundleSize;
- List<? extends BoundedSource<T>> sources = source.split(desiredSizeBytes, options);
+ List<? extends BoundedSource<T>> sources = source.split(desiredSizeBytes, sparkPipelineOptions);
for (BoundedSource<T> source : sources) {
result.add(
new InputPartition<InternalRow>() {
@@ -88,7 +109,7 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
public InputPartitionReader<InternalRow> createPartitionReader() {
BoundedReader<T> reader = null;
try {
- reader = source.createReader(options);
+ reader = source.createReader(sparkPipelineOptions);
} catch (IOException e) {
throw new RuntimeException(
"Error creating BoundedReader " + reader.getClass().getCanonicalName(), e);
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index d980a52..50f4915 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -18,7 +18,11 @@
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
import org.apache.beam.runners.core.construction.ReadTranslation;
+import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
+import org.apache.beam.runners.core.serialization.Base64Serializer;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.sdk.io.BoundedSource;
@@ -54,7 +58,15 @@ class ReadSourceTranslatorBatch<T>
}
SparkSession sparkSession = context.getSparkSession();
- Dataset<Row> rowDataset = sparkSession.read().format(SOURCE_PROVIDER_CLASS).load();
+ String serializedSource = Base64Serializer.serializeUnchecked(source);
+ Map<String, String> datasetSourceOptions = new HashMap<>();
+ datasetSourceOptions.put(DatasetSourceBatch.BEAM_SOURCE_OPTION, serializedSource);
+ datasetSourceOptions.put(DatasetSourceBatch.DEFAULT_PARALLELISM,
+ String.valueOf(context.getSparkSession().sparkContext().defaultParallelism()));
+ datasetSourceOptions.put(DatasetSourceBatch.PIPELINE_OPTIONS,
+ SerializablePipelineOptions.serializeToJson(context.getOptions()));
+ Dataset<Row> rowDataset = sparkSession.read().format(SOURCE_PROVIDER_CLASS).options(datasetSourceOptions)
+ .load();
//TODO pass the source and the translation context serialized as string to the DatasetSource
MapFunction<Row, WindowedValue> func = new MapFunction<Row, WindowedValue>() {
@@ -63,7 +75,7 @@ class ReadSourceTranslatorBatch<T>
return value.<WindowedValue>getAs(0);
}
};
- //TODO: is there a better way than using the raw WindowedValue? Can an Encoder<WindowedVAlue<T>>
+ //TODO: is there a better way than using the raw WindowedValue? Can an Encoder<WindowedValue<T>>
// be created ?
Dataset<WindowedValue> dataset = rowDataset.map(func, Encoders.kryo(WindowedValue.class));
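The hand-off works because both sides agree on two encodings: Java serialization wrapped in Base64 for the BoundedSource, and JSON for the pipeline options. Condensed from the diff above into a single round-trip (variable names are illustrative):

    // Driver side (ReadSourceTranslatorBatch): encode into DataSourceOptions values.
    String serializedSource = Base64Serializer.serializeUnchecked(source);
    String serializedOptions =
        SerializablePipelineOptions.serializeToJson(context.getOptions());

    // DataSource side (DatasetSourceBatch.createReader): decode from the same options.
    BoundedSource<?> decodedSource =
        Base64Serializer.deserializeUnchecked(serializedSource, BoundedSource.class);
    SparkPipelineOptions decodedOptions =
        SerializablePipelineOptions.deserializeFromJson(serializedOptions)
            .as(SparkPipelineOptions.class);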
[beam] 22/50: Add basic pipeline execution. Refactor
translatePipeline() to return the translationContext on which we can run
startPipeline()
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 866ef13df71f5042e0fbd8e33a13ac1e0308d487
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 28 14:52:20 2018 +0100
Add basic pipeline execution.
Refactor translatePipeline() to return the translationContext on which we can run startPipeline()
---
.../spark/structuredstreaming/SparkRunner.java | 12 +++---
.../translation/PipelineTranslator.java | 4 ++
.../translation/TranslationContext.java | 50 ++++++++++++++++++----
3 files changed, 53 insertions(+), 13 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index e3fd6b4..8e0cf25 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -20,6 +20,7 @@ package org.apache.beam.runners.spark.structuredstreaming;
import static org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.runners.spark.structuredstreaming.translation.batch.PipelineTranslatorBatch;
import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
import org.apache.beam.sdk.Pipeline;
@@ -53,6 +54,8 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
/** Options used in this pipeline runner. */
private final SparkPipelineOptions options;
+ private TranslationContext translationContext;
+
/**
* Creates and returns a new SparkRunner with default options. In particular, against a spark
* instance running in local mode.
@@ -109,13 +112,13 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
@Override
public SparkPipelineResult run(final Pipeline pipeline) {
- translatePipeline(pipeline);
+ translationContext = translatePipeline(pipeline);
//TODO initialise other services: checkpointing, metrics system, listeners, ...
- executePipeline(pipeline);
+ translationContext.startPipeline();
return new SparkPipelineResult();
}
- private void translatePipeline(Pipeline pipeline) {
+ private TranslationContext translatePipeline(Pipeline pipeline) {
PipelineTranslator.detectTranslationMode(pipeline, options);
PipelineTranslator.replaceTransforms(pipeline, options);
PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options);
@@ -124,7 +127,6 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
? new StreamingPipelineTranslator(options)
: new PipelineTranslatorBatch(options);
pipelineTranslator.translate(pipeline);
+ return pipelineTranslator.getTranslationContext();
}
-
- private void executePipeline(Pipeline pipeline) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index d64b8b1..e0924e3 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -210,4 +210,8 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
}
applyTransformTranslator(node, transformTranslator);
}
+
+ public TranslationContext getTranslationContext() {
+ return translationContext;
+ }
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index aa831ed..71ae276 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -19,14 +19,18 @@ package org.apache.beam.runners.spark.structuredstreaming.translation;
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
import java.util.HashMap;
+import java.util.LinkedHashSet;
import java.util.Map;
+import java.util.Set;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PValue;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.ForeachWriter;
import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.streaming.StreamingQueryException;
/**
* Base class that gives a context for {@link PTransform} translation: keeping track of the
@@ -34,20 +38,16 @@ import org.apache.spark.sql.SparkSession;
*/
public class TranslationContext {
+ private final Map<PValue, Dataset<?>> datasets;
+ private final Set<Dataset<?>> leaves;
+ private final SparkPipelineOptions options;
+
@SuppressFBWarnings("URF_UNREAD_FIELD") // make findbug happy
private AppliedPTransform<?, ?, ?> currentTransform;
- private final Map<PValue, Dataset<?>> datasets;
-
@SuppressFBWarnings("URF_UNREAD_FIELD") // make findbug happy
private SparkSession sparkSession;
- private final SparkPipelineOptions options;
-
- public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
- this.currentTransform = currentTransform;
- }
-
public TranslationContext(SparkPipelineOptions options) {
SparkConf sparkConf = new SparkConf();
sparkConf.setMaster(options.getSparkMaster());
@@ -59,5 +59,39 @@ public class TranslationContext {
this.sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
this.options = options;
this.datasets = new HashMap<>();
+ this.leaves = new LinkedHashSet<>();
+ }
+
+ public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
+ this.currentTransform = currentTransform;
+ }
+
+ public void startPipeline() {
+ try {
+ // to start the pipeline we need to start a DataStreamWriter on each leaf dataset
+ for (Dataset<?> dataset : leaves) {
+ dataset.writeStream().foreach(new NoOpForeachWriter<>()).start().awaitTermination();
+ }
+ } catch (StreamingQueryException e) {
+ throw new RuntimeException("Pipeline execution failed: " + e);
+ }
+ }
+
+ private static class NoOpForeachWriter<T> extends ForeachWriter<T> {
+
+ @Override
+ public boolean open(long partitionId, long epochId) {
+ return false;
+ }
+
+ @Override
+ public void process(T value) {
+ // do nothing
+ }
+
+ @Override
+ public void close(Throwable errorOrNull) {
+ // do nothing
+ }
}
}
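One detail worth noting in startPipeline() above: awaitTermination() is called inside the loop, so the leaf Datasets are evaluated one after another rather than concurrently. Condensed, the control flow introduced by this commit is (a sketch, not a verbatim copy):

    // run(): translate first, then execute.
    TranslationContext context = translatePipeline(pipeline); // fills datasets and leaves
    // startPipeline(): attach a no-op sink to each leaf to force evaluation.
    for (Dataset<?> leaf : leaves) {
      leaf.writeStream().foreach(new NoOpForeachWriter<>()).start().awaitTermination();
    }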
[beam] 07/50: Wire node translators with pipeline translator
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 3a743c21a6f6586d65c99625852fa9ebb2a61af5
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 21 10:43:19 2018 +0100
Wire node translators with pipeline translator
---
.../translation/PipelineTranslator.java | 15 ++++-
.../translation/batch/BatchPipelineTranslator.java | 66 ++++++++++++++++++++--
.../translation/batch/BatchTranslationContext.java | 3 +
3 files changed, 77 insertions(+), 7 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index 99621f6..db5c354 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -21,7 +21,6 @@ import org.slf4j.LoggerFactory;
public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
-
/**
* Local configurations work in the same JVM and have no problems with improperly formatted files
* on classpath (eg. directories with .class files or empty directories). Prepare files for
@@ -51,6 +50,20 @@ public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
}
/**
+ * Utility formatting method.
+ *
+ * @param n number of indentation levels to generate
+ * @return String of "| " repeated n times
+ */
+ protected static String genSpaces(int n) {
+ StringBuilder builder = new StringBuilder();
+ for (int i = 0; i < n; i++) {
+ builder.append("| ");
+ }
+ return builder.toString();
+ }
+
+ /**
* Translates the pipeline by passing this class as a visitor.
*
* @param pipeline The pipeline to be translated
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
index 2f7ac23..e20e4c0 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
@@ -8,6 +8,8 @@ import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTra
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.PTransform;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
/** {@link Pipeline.PipelineVisitor} for executing a {@link Pipeline} as a Spark batch job. */
@@ -18,6 +20,9 @@ public class BatchPipelineTranslator extends PipelineTranslator {
// Transform Translator Registry
// --------------------------------------------------------------------------------------------
+ private BatchTranslationContext translationContext;
+ private int depth = 0;
+
@SuppressWarnings("rawtypes")
private static final Map<String, BatchTransformTranslator>
TRANSLATORS = new HashMap<>();
@@ -39,6 +44,9 @@ public class BatchPipelineTranslator extends PipelineTranslator {
TRANSLATORS.put(PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
}
+ private static final Logger LOG = LoggerFactory.getLogger(BatchPipelineTranslator.class);
+
+
/** Returns a translator for the given node, if it is possible, otherwise null. */
private static BatchTransformTranslator<?> getTranslator(TransformHierarchy.Node node) {
@@ -52,15 +60,61 @@ public class BatchPipelineTranslator extends PipelineTranslator {
}
- @Override public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
- return super.enterCompositeTransform(node);
- //TODO impl
+ // --------------------------------------------------------------------------------------------
+ // Pipeline Visitor Methods
+ // --------------------------------------------------------------------------------------------
+
+ @Override
+ public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
+ LOG.info("{} enterCompositeTransform- {}", genSpaces(depth), node.getFullName());
+ depth++;
+
+ BatchTransformTranslator<?> translator = getTranslator(node);
+
+ if (translator != null) {
+ translateNode(node, translator);
+ LOG.info("{} translated- {}", genSpaces(depth), node.getFullName());
+ return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
+ } else {
+ return CompositeBehavior.ENTER_TRANSFORM;
+ }
}
+ @Override
+ public void leaveCompositeTransform(TransformHierarchy.Node node) {
+ depth--;
+ LOG.info("{} leaveCompositeTransform- {}", genSpaces(depth), node.getFullName());
+ }
- @Override public void visitPrimitiveTransform(TransformHierarchy.Node node) {
- super.visitPrimitiveTransform(node);
- //TODO impl
+ @Override
+ public void visitPrimitiveTransform(TransformHierarchy.Node node) {
+ LOG.info("{} visitPrimitiveTransform- {}", genSpaces(depth), node.getFullName());
+
+ // get the transformation corresponding to the node we are
+ // currently visiting and translate it into its Spark alternative.
+ BatchTransformTranslator<?> translator = getTranslator(node);
+ if (translator == null) {
+ String transformUrn = PTransformTranslation.urnForTransform(node.getTransform());
+ throw new UnsupportedOperationException(
+ "The transform " + transformUrn + " is currently not supported.");
+ }
+ translateNode(node, translator);
}
+ private <T extends PTransform<?, ?>> void translateNode(
+ TransformHierarchy.Node node,
+ BatchTransformTranslator<?> translator) {
+
+ @SuppressWarnings("unchecked")
+ T typedTransform = (T) node.getTransform();
+
+ @SuppressWarnings("unchecked")
+ BatchTransformTranslator<T> typedTranslator = (BatchTransformTranslator<T>) translator;
+
+ // create the applied PTransform on the translationContext
+ translationContext.setCurrentTransform(node.toAppliedPTransform(getPipeline()));
+ typedTranslator.translateNode(typedTransform, translationContext);
}
+
+
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
index 554beea..1d991f1 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
@@ -33,4 +33,7 @@ public class BatchTranslationContext {
this.danglingDataSets = new HashMap<>();
}
+ public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
+ this.currentTransform = currentTransform;
+ }
}
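Given the LOG.info() calls and genSpaces() indentation above, a traversal produces logs of roughly this shape (transform names are hypothetical, and the exact whitespace depends on genSpaces(), which repeats "| " once per depth level):

    enterCompositeTransform- TranslatedComposite
    |  translated- TranslatedComposite
    leaveCompositeTransform- TranslatedComposite
    enterCompositeTransform- OtherComposite
    |  visitPrimitiveTransform- OtherComposite/SomePrimitive
    leaveCompositeTransform- OtherComposite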
[beam] 11/50: Refactoring: -move batch/streaming common translation
visitor and utility methods to PipelineTranslator -rename batch dedicated
classes to Batch* to differentiate with their streaming counterparts
-Introduce TranslationContext for common batch/streaming components
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 476cae8ae424c4a271adff73ffb32b604d483361
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 21 16:51:40 2018 +0100
Refactoring:
-move batch/streaming common translation visitor and utility methods to PipelineTranslator
-rename batch dedicated classes to Batch* to differentiate with their streaming counterparts
-Introduce TranslationContext for common batch/streaming components
---
.../translation/PipelineTranslator.java | 73 ++++++++++++++++++-
.../translation/TransformTranslator.java | 11 +++
.../translation/TranslationContext.java | 13 ++++
.../batch/BatchCombinePerKeyTranslator.java | 17 +++++
.../batch/BatchFlattenPCollectionTranslator.java | 16 ++++
.../batch/BatchGroupByKeyTranslator.java | 17 +++++
.../translation/batch/BatchParDoTranslator.java | 16 ++++
.../translation/batch/BatchPipelineTranslator.java | 85 +++-------------------
.../batch/BatchReadSourceTranslator.java | 15 ++++
.../batch/BatchReshuffleTranslator.java | 12 +++
.../batch/BatchTransformTranslator.java | 11 ---
.../translation/batch/BatchTranslationContext.java | 12 +--
.../batch/BatchWindowAssignTranslator.java | 14 ++++
.../batch/CombinePerKeyTranslatorBatch.java | 14 ----
.../batch/FlattenPCollectionTranslatorBatch.java | 13 ----
.../batch/GroupByKeyTranslatorBatch.java | 14 ----
.../translation/batch/ParDoTranslatorBatch.java | 13 ----
.../batch/ReadSourceTranslatorBatch.java | 12 ---
.../batch/ReshuffleTranslatorBatch.java | 11 ---
.../batch/WindowAssignTranslatorBatch.java | 12 ---
.../streaming/StreamingPipelineTranslator.java | 6 ++
.../streaming/StreamingTranslationContext.java | 7 ++
22 files changed, 227 insertions(+), 187 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index 8eb1fb6..62e87f2 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -1,5 +1,6 @@
package org.apache.beam.runners.spark.structuredstreaming.translation;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
import org.apache.beam.runners.core.construction.PipelineResources;
import org.apache.beam.runners.spark.SparkTransformOverrides;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
@@ -7,11 +8,11 @@ import org.apache.beam.runners.spark.structuredstreaming.translation.batch.Batch
import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
+import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PValue;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
-
/**
* The role of this class is to detect the pipeline mode and to translate the Beam operators to their Spark counterparts. If we have
@@ -19,7 +20,11 @@ import org.slf4j.LoggerFactory;
* case, i.e. for a batch job, a {@link BatchPipelineTranslator} is created. Correspondingly,
*/
-public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
+public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
+ private int depth = 0;
+ private static final Logger LOG = LoggerFactory.getLogger(PipelineTranslator.class);
+ protected TranslationContext translationContext;
+
// --------------------------------------------------------------------------------------------
// Pipeline preparation methods
@@ -103,7 +108,7 @@ public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
* @param n number of indentation levels to generate
* @return String of "| " repeated n times
*/
- protected static String genSpaces(int n) {
+ private static String genSpaces(int n) {
StringBuilder builder = new StringBuilder();
for (int i = 0; i < n; i++) {
builder.append("| ");
@@ -111,8 +116,31 @@ public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
return builder.toString();
}
+ /**
+ * Returns the {@link TransformTranslator} to apply to the given {@link TransformHierarchy.Node},
+ * or null if the node's transform is not supported.
+ */
+ protected abstract TransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node);
+
+ private <T extends PTransform<?, ?>> void translateNode(
+ TransformHierarchy.Node node,
+ TransformTranslator<?> transformTranslator) {
+
+ @SuppressWarnings("unchecked")
+ T typedTransform = (T) node.getTransform();
+
+ @SuppressWarnings("unchecked")
+ TransformTranslator<T> typedTransformTranslator = (TransformTranslator<T>) transformTranslator;
+
+ // create the applied PTransform on the translationContext
+ translationContext.setCurrentTransform(node.toAppliedPTransform(getPipeline()));
+ typedTransformTranslator.translateNode(typedTransform, translationContext);
+ }
+
+
// --------------------------------------------------------------------------------------------
- // Pipeline visitor methods
+ // Pipeline visitor entry point
// --------------------------------------------------------------------------------------------
/**
@@ -121,11 +149,48 @@ public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
* @param pipeline The pipeline to be translated
*/
public void translate(Pipeline pipeline) {
+ LOG.info("starting translation of the pipeline using {}", getClass().getName());
pipeline.traverseTopologically(this);
}
+ // --------------------------------------------------------------------------------------------
+ // Pipeline Visitor Methods
+ // --------------------------------------------------------------------------------------------
+ @Override
+ public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
+ LOG.info("{} enterCompositeTransform- {}", genSpaces(depth), node.getFullName());
+ depth++;
+ TransformTranslator<?> transformTranslator = getTransformTranslator(node);
+ if (transformTranslator != null) {
+ translateNode(node, transformTranslator);
+ LOG.info("{} translated- {}", genSpaces(depth), node.getFullName());
+ return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
+ } else {
+ return CompositeBehavior.ENTER_TRANSFORM;
+ }
+ }
+ @Override
+ public void leaveCompositeTransform(TransformHierarchy.Node node) {
+ depth--;
+ LOG.info("{} leaveCompositeTransform- {}", genSpaces(depth), node.getFullName());
+ }
+
+ @Override
+ public void visitPrimitiveTransform(TransformHierarchy.Node node) {
+ LOG.info("{} visitPrimitiveTransform- {}", genSpaces(depth), node.getFullName());
+
+ // get the transformation corresponding to the node we are
+ // currently visiting and translate it into its Spark alternative.
+ TransformTranslator<?> transformTranslator = getTransformTranslator(node);
+ if (transformTranslator == null) {
+ String transformUrn = PTransformTranslation.urnForTransform(node.getTransform());
+ throw new UnsupportedOperationException(
+ "The transform " + transformUrn + " is currently not supported.");
+ }
+ translateNode(node, transformTranslator);
+ }
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
new file mode 100644
index 0000000..51cdd99
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
@@ -0,0 +1,11 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation;
+
+import org.apache.beam.sdk.transforms.PTransform;
+
+public interface TransformTranslator<TransformT extends PTransform> {
+
+ /** Translates a {@link PTransform} into its Spark counterpart. */
+ void translateNode(TransformT transform, TranslationContext context);
+}
+
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
new file mode 100644
index 0000000..341ed49
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -0,0 +1,13 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation;
+
+import org.apache.beam.sdk.runners.AppliedPTransform;
+
+public class TranslationContext {
+
+ private AppliedPTransform<?, ?, ?> currentTransform;
+
+ public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
+ this.currentTransform = currentTransform;
+ }
+
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
new file mode 100644
index 0000000..c9cae47
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
@@ -0,0 +1,17 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+
+class BatchCombinePerKeyTranslator<K, InputT, AccumT, OutputT> implements
+ TransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>>> {
+
+ @Override public void translateNode(
+ PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>> transform,
+ TranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
new file mode 100644
index 0000000..77f6fdb
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
@@ -0,0 +1,16 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+
+class BatchFlattenPCollectionTranslator<T> implements
+ TransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
+
+ @Override public void translateNode(PTransform<PCollectionList<T>, PCollection<T>> transform,
+ TranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
new file mode 100644
index 0000000..1bd42f5
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
@@ -0,0 +1,17 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+
+class BatchGroupByKeyTranslator<K, InputT> implements
+ TransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
+
+ @Override public void translateNode(
+ PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> transform,
+ TranslationContext context) {
+
+ }
+}
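
The GroupByKey body above is still empty. As a rough idea of where it is
heading, here is a self-contained sketch of a Beam-style GBK expressed directly
on the Spark Dataset API; the kryo encoders and the helper class are assumptions
of this sketch, not code from this branch:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.beam.sdk.values.KV;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.api.java.function.MapGroupsFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoder;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.KeyValueGroupedDataset;

    /** Sketch only: a Beam-style GroupByKey on the Spark Dataset API. */
    class GroupByKeySketch {

      @SuppressWarnings("unchecked")
      static <K, V> Dataset<KV<K, Iterable<V>>> groupByKey(Dataset<KV<K, V>> input) {
        // Kryo encoders side-step Spark's need for a schema on Beam's KV type.
        Encoder<K> keyEncoder = (Encoder<K>) (Encoder<?>) Encoders.kryo(Object.class);
        Encoder<KV<K, Iterable<V>>> outputEncoder =
            (Encoder<KV<K, Iterable<V>>>) (Encoder<?>) Encoders.kryo(KV.class);

        KeyValueGroupedDataset<K, KV<K, V>> grouped =
            input.groupByKey((MapFunction<KV<K, V>, K>) KV::getKey, keyEncoder);

        // Materialize each key's values into an Iterable, mirroring Beam's
        // PCollection<KV<K, Iterable<V>>> output type.
        return grouped.mapGroups(
            (MapGroupsFunction<K, KV<K, V>, KV<K, Iterable<V>>>)
                (key, values) -> {
                  List<V> buffer = new ArrayList<>();
                  values.forEachRemaining(kv -> buffer.add(kv.getValue()));
                  return KV.of(key, buffer);
                },
            outputEncoder);
      }
    }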
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
new file mode 100644
index 0000000..cf8c896
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
@@ -0,0 +1,16 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionTuple;
+
+class BatchParDoTranslator<InputT, OutputT> implements
+ TransformTranslator<PTransform<PCollection<InputT>, PCollectionTuple>> {
+
+ @Override public void translateNode(PTransform<PCollection<InputT>, PCollectionTuple> transform,
+ TranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
index 1bf660f..ff92d89 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
@@ -6,13 +6,10 @@ import javax.annotation.Nullable;
import org.apache.beam.runners.core.construction.PTransformTranslation;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.spark.SparkConf;
-import org.apache.spark.sql.SparkSession;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
/** {@link Pipeline.PipelineVisitor} for executing a {@link Pipeline} as a Spark batch job. */
@@ -23,37 +20,34 @@ public class BatchPipelineTranslator extends PipelineTranslator {
// Transform Translator Registry
// --------------------------------------------------------------------------------------------
- private BatchTranslationContext translationContext;
- private int depth = 0;
-
@SuppressWarnings("rawtypes")
- private static final Map<String, BatchTransformTranslator> TRANSFORM_TRANSLATORS = new HashMap<>();
+ private static final Map<String, TransformTranslator> TRANSFORM_TRANSLATORS = new HashMap<>();
static {
TRANSFORM_TRANSLATORS.put(PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN,
- new CombinePerKeyTranslatorBatch());
+ new BatchCombinePerKeyTranslator());
TRANSFORM_TRANSLATORS
- .put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new GroupByKeyTranslatorBatch());
- TRANSFORM_TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new ReshuffleTranslatorBatch());
+ .put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new BatchGroupByKeyTranslator());
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new BatchReshuffleTranslator());
TRANSFORM_TRANSLATORS
- .put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionTranslatorBatch());
+ .put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new BatchFlattenPCollectionTranslator());
TRANSFORM_TRANSLATORS
- .put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowAssignTranslatorBatch());
+ .put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new BatchWindowAssignTranslator());
- TRANSFORM_TRANSLATORS.put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoTranslatorBatch());
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new BatchParDoTranslator());
- TRANSFORM_TRANSLATORS.put(PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.READ_TRANSFORM_URN, new BatchReadSourceTranslator());
}
- private static final Logger LOG = LoggerFactory.getLogger(BatchPipelineTranslator.class);
public BatchPipelineTranslator(SparkPipelineOptions options) {
translationContext = new BatchTranslationContext(options);
}
/** Returns a translator for the given node, if it is possible, otherwise null. */
- private static BatchTransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node) {
+ @Override
+ protected TransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node) {
@Nullable PTransform<?, ?> transform = node.getTransform();
// Root of the graph is null
if (transform == null) {
@@ -64,61 +58,4 @@ public class BatchPipelineTranslator extends PipelineTranslator {
}
- // --------------------------------------------------------------------------------------------
- // Pipeline Visitor Methods
- // --------------------------------------------------------------------------------------------
-
- @Override
- public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
- LOG.info("{} enterCompositeTransform- {}", genSpaces(depth), node.getFullName());
- depth++;
-
- BatchTransformTranslator<?> transformTranslator = getTransformTranslator(node);
-
- if (transformTranslator != null) {
- translateNode(node, transformTranslator);
- LOG.info("{} translated- {}", genSpaces(depth), node.getFullName());
- return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
- } else {
- return CompositeBehavior.ENTER_TRANSFORM;
- }
- }
-
- @Override
- public void leaveCompositeTransform(TransformHierarchy.Node node) {
- depth--;
- LOG.info("{} leaveCompositeTransform- {}", genSpaces(depth), node.getFullName());
- }
-
- @Override
- public void visitPrimitiveTransform(TransformHierarchy.Node node) {
- LOG.info("{} visitPrimitiveTransform- {}", genSpaces(depth), node.getFullName());
-
- // get the transformation corresponding to the node we are
- // currently visiting and translate it into its Spark alternative.
- BatchTransformTranslator<?> transformTranslator = getTransformTranslator(node);
- if (transformTranslator == null) {
- String transformUrn = PTransformTranslation.urnForTransform(node.getTransform());
- throw new UnsupportedOperationException(
- "The transform " + transformUrn + " is currently not supported.");
- }
- translateNode(node, transformTranslator);
- }
-
- private <T extends PTransform<?, ?>> void translateNode(
- TransformHierarchy.Node node,
- BatchTransformTranslator<?> transformTranslator) {
-
- @SuppressWarnings("unchecked")
- T typedTransform = (T) node.getTransform();
-
- @SuppressWarnings("unchecked")
- BatchTransformTranslator<T> typedTransformTranslator = (BatchTransformTranslator<T>) transformTranslator;
-
- // create the applied PTransform on the translationContext
- translationContext.setCurrentTransform(node.toAppliedPTransform(getPipeline()));
- typedTransformTranslator.translateNode(typedTransform, translationContext);
- }
-
-
}
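
The visitor methods deleted above are hoisted into the abstract
PipelineTranslator so that batch and streaming share a single dispatch loop,
each subclass supplying only its registry through getTransformTranslator(). A
reconstruction of that shared loop, following the names of the deleted code (a
sketch, not a quote of the committed file):

    import javax.annotation.Nullable;
    import org.apache.beam.runners.core.construction.PTransformTranslation;
    import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.runners.TransformHierarchy;

    /** Reconstruction of the visitor logic hoisted into PipelineTranslator. */
    abstract class PipelineVisitorSketch extends Pipeline.PipelineVisitor.Defaults {

      /** Batch and streaming subclasses plug their own registry in here. */
      @Nullable
      protected abstract TransformTranslator<?> getTransformTranslator(
          TransformHierarchy.Node node);

      protected abstract void applyTransformTranslator(
          TransformHierarchy.Node node, TransformTranslator<?> translator);

      @Override
      public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
        TransformTranslator<?> translator = getTransformTranslator(node);
        if (translator != null) {
          // The composite is translated as a whole; do not visit its parts.
          applyTransformTranslator(node, translator);
          return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
        }
        return CompositeBehavior.ENTER_TRANSFORM;
      }

      @Override
      public void visitPrimitiveTransform(TransformHierarchy.Node node) {
        TransformTranslator<?> translator = getTransformTranslator(node);
        if (translator == null) {
          String urn = PTransformTranslation.urnForTransform(node.getTransform());
          throw new UnsupportedOperationException(
              "The transform " + urn + " is currently not supported.");
        }
        applyTransformTranslator(node, translator);
      }
    }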
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
new file mode 100644
index 0000000..f5f0351
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
@@ -0,0 +1,15 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+
+class BatchReadSourceTranslator<T> implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
+
+ @Override public void translateNode(PTransform<PBegin, PCollection<T>> transform,
+ TranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
new file mode 100644
index 0000000..5fab1c8
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
@@ -0,0 +1,12 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.transforms.Reshuffle;
+
+class BatchReshuffleTranslator<K, InputT> implements TransformTranslator<Reshuffle<K, InputT>> {
+
+ @Override public void translateNode(Reshuffle<K, InputT> transform, TranslationContext context) {
+
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTransformTranslator.java
deleted file mode 100644
index ab0cf68..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTransformTranslator.java
+++ /dev/null
@@ -1,11 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.PTransform;
-
-public interface BatchTransformTranslator<TransformT extends PTransform> {
-
- /** A translator of a {@link PTransform} in batch mode. */
-
- void translateNode(TransformT transform, BatchTranslationContext context);
- }
-
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
index b53aa19..71ef315 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
@@ -3,6 +3,7 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.values.PValue;
import org.apache.spark.SparkConf;
@@ -12,7 +13,7 @@ import org.apache.spark.sql.SparkSession;
/**
* Keeps track of the {@link Dataset} and the step the translation is in.
*/
-public class BatchTranslationContext {
+public class BatchTranslationContext extends TranslationContext {
private final Map<PValue, Dataset<?>> datasets;
/**
@@ -24,9 +25,6 @@ public class BatchTranslationContext {
private SparkSession sparkSession;
private final SparkPipelineOptions options;
- private AppliedPTransform<?, ?, ?> currentTransform;
-
-
public BatchTranslationContext(SparkPipelineOptions options) {
SparkConf sparkConf = new SparkConf();
sparkConf.setMaster(options.getSparkMaster());
@@ -35,7 +33,7 @@ public class BatchTranslationContext {
sparkConf.setJars(options.getFilesToStage().toArray(new String[0]));
}
- SparkSession sparkSession = SparkSession
+ this.sparkSession = SparkSession
.builder()
.config(sparkConf)
.getOrCreate();
@@ -43,8 +41,4 @@ public class BatchTranslationContext {
this.datasets = new HashMap<>();
this.danglingDataSets = new HashMap<>();
}
-
- public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
- this.currentTransform = currentTransform;
- }
}
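
The `this.sparkSession = ...` change above fixes a field-shadowing bug: the
constructor previously declared a local `SparkSession sparkSession`, so the
field was never assigned and stayed null. A distilled illustration of the
pitfall:

    /** Distilled illustration of the field-shadowing bug fixed above. */
    class ShadowingExample {

      private String value;

      ShadowingExample(boolean buggy) {
        if (buggy) {
          String value = "computed"; // local shadows the field; the field stays null
        } else {
          this.value = "computed"; // assigns the field, as the fix above does
        }
      }

      public static void main(String[] args) {
        System.out.println(new ShadowingExample(true).value);  // prints: null
        System.out.println(new ShadowingExample(false).value); // prints: computed
      }
    }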
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
new file mode 100644
index 0000000..fbbced5
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
@@ -0,0 +1,14 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PCollection;
+
+class BatchWindowAssignTranslator<T> implements
+ TransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {
+
+ @Override public void translateNode(PTransform<PCollection<T>, PCollection<T>> transform,
+ TranslationContext context) {
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
deleted file mode 100644
index 6099fbc..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
+++ /dev/null
@@ -1,14 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.KV;
-import org.apache.beam.sdk.values.PCollection;
-
-class CombinePerKeyTranslatorBatch<K, InputT, AccumT, OutputT> implements BatchTransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>>> {
-
- @Override public void translateNode(
- PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>> transform,
- BatchTranslationContext context) {
-
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
deleted file mode 100644
index 281eda9..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
+++ /dev/null
@@ -1,13 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.PCollection;
-import org.apache.beam.sdk.values.PCollectionList;
-
-class FlattenPCollectionTranslatorBatch<T> implements BatchTransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
-
- @Override public void translateNode(PTransform<PCollectionList<T>, PCollection<T>> transform,
- BatchTranslationContext context) {
-
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
deleted file mode 100644
index bb0ccc1..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
+++ /dev/null
@@ -1,14 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.KV;
-import org.apache.beam.sdk.values.PCollection;
-
-class GroupByKeyTranslatorBatch<K, InputT> implements BatchTransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
-
- @Override public void translateNode(
- PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> transform,
- BatchTranslationContext context) {
-
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
deleted file mode 100644
index 4477853..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
+++ /dev/null
@@ -1,13 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.PCollection;
-import org.apache.beam.sdk.values.PCollectionTuple;
-
-class ParDoTranslatorBatch<InputT, OutputT> implements BatchTransformTranslator<PTransform<PCollection<InputT>, PCollectionTuple>> {
-
- @Override public void translateNode(PTransform<PCollection<InputT>, PCollectionTuple> transform,
- BatchTranslationContext context) {
-
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
deleted file mode 100644
index a30fa70..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ /dev/null
@@ -1,12 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.PBegin;
-import org.apache.beam.sdk.values.PCollection;
-
-class ReadSourceTranslatorBatch<T> implements BatchTransformTranslator<PTransform<PBegin, PCollection<T>>> {
-
- @Override public void translateNode(PTransform<PBegin, PCollection<T>> transform, BatchTranslationContext context) {
-
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
deleted file mode 100644
index 6283fdb..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
+++ /dev/null
@@ -1,11 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.Reshuffle;
-
-class ReshuffleTranslatorBatch<K, InputT> implements BatchTransformTranslator<Reshuffle<K, InputT>> {
-
- @Override public void translateNode(Reshuffle<K, InputT> transform,
- BatchTranslationContext context) {
-
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
deleted file mode 100644
index 21b71b9..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
+++ /dev/null
@@ -1,12 +0,0 @@
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.values.PCollection;
-
-class WindowAssignTranslatorBatch<T> implements BatchTransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {
-
- @Override public void translateNode(PTransform<PCollection<T>, PCollection<T>> transform,
- BatchTranslationContext context) {
-
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
index 7bed930..9303d59 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
@@ -2,9 +2,15 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.sdk.runners.TransformHierarchy;
public class StreamingPipelineTranslator extends PipelineTranslator {
public StreamingPipelineTranslator(SparkPipelineOptions options) {
}
+
+ @Override protected TransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node) {
+ return null;
+ }
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
new file mode 100644
index 0000000..460dbf6
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
@@ -0,0 +1,7 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+
+public class StreamingTranslationContext extends TranslationContext {
+
+}
[beam] 01/50: Add an empty spark-structured-streaming runner project targeting spark 2.4.0
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit ce39f9371dc2bb5929e806f3cb240ae72ab79e57
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Tue Nov 13 17:07:02 2018 +0100
Add an empty spark-structured-streaming runner project targeting spark 2.4.0
---
.../org/apache/beam/gradle/BeamModulePlugin.groovy | 2 +
runners/spark-structured-streaming/build.gradle | 93 ++++++++++++++++++++++
settings.gradle | 2 +
3 files changed, 97 insertions(+)
diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
index 8662c4a..e6cfed1 100644
--- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
+++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -329,6 +329,7 @@ class BeamModulePlugin implements Plugin<Project> {
def hadoop_version = "2.7.3"
def jackson_version = "2.9.8"
def spark_version = "2.3.2"
+ def spark_structured_streaming_version = "2.4.0"
def apex_core_version = "3.7.0"
def apex_malhar_version = "3.4.0"
def postgres_version = "42.2.2"
@@ -440,6 +441,7 @@ class BeamModulePlugin implements Plugin<Project> {
slf4j_jdk14 : "org.slf4j:slf4j-jdk14:1.7.25",
slf4j_log4j12 : "org.slf4j:slf4j-log4j12:1.7.25",
snappy_java : "org.xerial.snappy:snappy-java:1.1.4",
+ spark_sql : "org.apache.spark:spark-sql_2.11:$spark_structured_streaming_version",
spark_core : "org.apache.spark:spark-core_2.11:$spark_version",
spark_network_common : "org.apache.spark:spark-network-common_2.11:$spark_version",
spark_streaming : "org.apache.spark:spark-streaming_2.11:$spark_version",
diff --git a/runners/spark-structured-streaming/build.gradle b/runners/spark-structured-streaming/build.gradle
new file mode 100644
index 0000000..b33a2b6
--- /dev/null
+++ b/runners/spark-structured-streaming/build.gradle
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import groovy.json.JsonOutput
+
+apply plugin: org.apache.beam.gradle.BeamModulePlugin
+applyJavaNature()
+
+description = "Apache Beam :: Runners :: Spark-Structured-Streaming"
+
+/*
+ * We need to rely on manually specifying these evaluationDependsOn to ensure that
+ * the following projects are evaluated before we evaluate this project. This is because
+ * we are attempting to reference the "sourceSets.test.output" directly.
+ */
+evaluationDependsOn(":beam-sdks-java-core")
+
+configurations {
+ validatesRunner
+}
+
+test {
+ systemProperty "spark.ui.enabled", "false"
+ systemProperty "spark.ui.showConsoleProgress", "false"
+ forkEvery 1
+ maxParallelForks 4
+ useJUnit {
+ //TODO add test excludes
+ }
+}
+
+dependencies {
+ shadow project(path: ":beam-model-pipeline", configuration: "shadow")
+ shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
+ shadow project(path: ":beam-runners-core-construction-java", configuration: "shadow")
+ shadow project(path: ":beam-runners-core-java", configuration: "shadow")
+ shadow library.java.guava
+ shadow library.java.jackson_annotations
+ shadow library.java.slf4j_api
+ shadow library.java.joda_time
+ shadow "io.dropwizard.metrics:metrics-core:3.1.2"
+ shadow library.java.jackson_module_scala
+ provided library.java.spark_sql
+ provided library.java.hadoop_common
+ provided library.java.hadoop_mapreduce_client_core
+ provided library.java.commons_compress
+ provided library.java.commons_lang3
+ provided library.java.commons_io_2x
+ provided library.java.hamcrest_core
+ provided library.java.hamcrest_library
+ provided "org.apache.zookeeper:zookeeper:3.4.11"
+ provided "org.scala-lang:scala-library:2.11.8"
+ provided "com.esotericsoftware.kryo:kryo:2.21"
+ shadowTest project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")
+ shadowTest project(path: ":beam-sdks-java-core", configuration: "shadowTest")
+ shadowTest project(path: ":beam-runners-core-java", configuration: "shadowTest")
+ shadowTest library.java.avro
+ shadowTest library.java.kafka_clients
+ shadowTest library.java.junit
+ shadowTest library.java.mockito_core
+ shadowTest library.java.jackson_dataformat_yaml
+ shadowTest "org.apache.kafka:kafka_2.11:0.11.0.1"
+ validatesRunner project(path: ":beam-sdks-java-core", configuration: "shadowTest")
+ validatesRunner project(path: project.path, configuration: "shadowTest")
+ validatesRunner project(path: project.path, configuration: "shadow")
+ validatesRunner project(path: project.path, configuration: "provided")
+}
+
+configurations.testRuntimeClasspath {
+ // Testing the Spark runner causes a StackOverflowError if slf4j-jdk14 is on the classpath
+ exclude group: "org.slf4j", module: "slf4j-jdk14"
+}
+
+configurations.validatesRunner {
+ // Testing the Spark runner causes a StackOverflowError if slf4j-jdk14 is on the classpath
+ exclude group: "org.slf4j", module: "slf4j-jdk14"
+}
+
diff --git a/settings.gradle b/settings.gradle
index aac5bf9..6e70016 100644
--- a/settings.gradle
+++ b/settings.gradle
@@ -76,6 +76,8 @@ include "beam-runners-reference-job-server"
project(":beam-runners-reference-job-server").dir = file("runners/reference/job-server")
include "beam-runners-spark"
project(":beam-runners-spark").dir = file("runners/spark")
+include "beam-runners-spark-structured-streaming"
+project(":beam-runners-spark-structured-streaming").dir = file("runners/spark-structured-streaming")
include "beam-runners-samza"
project(":beam-runners-samza").dir = file("runners/samza")
include "beam-sdks-go"
[beam] 14/50: Move SparkTransformOverrides to correct package
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 26f2e4bdb257b7be7ef341e7e36ec8d758d27e24
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 22 11:50:45 2018 +0100
Move SparkTransformOverrides to correct package
---
.../spark/structuredstreaming/translation/PipelineTranslator.java | 1 -
.../structuredstreaming/translation/SparkTransformOverrides.java | 4 ++--
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index 51d65ff..c05fc92 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -2,7 +2,6 @@ package org.apache.beam.runners.spark.structuredstreaming.translation;
import org.apache.beam.runners.core.construction.PTransformTranslation;
import org.apache.beam.runners.core.construction.PipelineResources;
-import org.apache.beam.runners.spark.SparkTransformOverrides;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.batch.BatchPipelineTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java
index 897ac01..8e250bd 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark;
+package org.apache.beam.runners.spark.structuredstreaming.translation;
import com.google.common.collect.ImmutableList;
import java.util.List;
@@ -27,7 +27,7 @@ import org.apache.beam.runners.core.construction.UnsupportedOverrideFactory;
import org.apache.beam.sdk.runners.PTransformOverride;
import org.apache.beam.sdk.transforms.PTransform;
-/** {@link PTransform} overrides for Flink runner. */
+/** {@link PTransform} overrides for Spark runner. */
public class SparkTransformOverrides {
public static List<PTransformOverride> getDefaultOverrides(boolean streaming) {
ImmutableList.Builder<PTransformOverride> builder = ImmutableList.builder();
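
For reference, the override-registration pattern this file uses looks roughly
like the following; the matcher and message below are illustrative assumptions,
not a quote of the moved file:

    import com.google.common.collect.ImmutableList;
    import java.util.List;
    import org.apache.beam.runners.core.construction.PTransformMatchers;
    import org.apache.beam.runners.core.construction.PTransformTranslation;
    import org.apache.beam.runners.core.construction.UnsupportedOverrideFactory;
    import org.apache.beam.sdk.runners.PTransformOverride;

    /** Sketch of the override-registration pattern; matcher chosen for illustration. */
    class TransformOverridesSketch {

      static List<PTransformOverride> getDefaultOverrides(boolean streaming) {
        ImmutableList.Builder<PTransformOverride> builder = ImmutableList.builder();
        if (streaming) {
          // Illustrative: reject a transform the streaming translator cannot
          // handle yet, failing fast at construction time rather than deep
          // inside translation.
          builder.add(
              PTransformOverride.of(
                  PTransformMatchers.urnEqualTo(PTransformTranslation.READ_TRANSFORM_URN),
                  UnsupportedOverrideFactory.withMessage(
                      "This transform is not supported in streaming mode yet")));
        }
        return builder.build();
      }
    }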
[beam] 05/50: Add global pipeline translation structure
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit abf4b46a8a547a343b240af8ad895f0ec6975423
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 21 09:36:32 2018 +0100
Add global pipeline translation structure
---
.../runners/spark/structuredstreaming/SparkRunner.java | 9 ++++-----
.../translation/PipelineTranslator.java | 16 ++++++++++++++--
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index 62cd7d3..59c08f7 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -2,15 +2,14 @@ package org.apache.beam.runners.spark.structuredstreaming;
import static org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage;
-import org.apache.beam.runners.spark.structuredstreaming.translation.BatchPipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.batch.BatchPipelineTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
-import org.apache.beam.runners.spark.structuredstreaming.translation.StreamingPipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineRunner;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PipelineOptionsValidator;
-import org.apache.spark.sql.SparkSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -99,10 +98,10 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
PipelineTranslator.detectTranslationMode(pipeline, options);
PipelineTranslator.replaceTransforms(pipeline, options);
PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options);
- PipelineTranslator translator = options.isStreaming() ? new StreamingPipelineTranslator() : new BatchPipelineTranslator()
+ PipelineTranslator translator = options.isStreaming() ? new StreamingPipelineTranslator() : new BatchPipelineTranslator();
//init translator with subclass based on mode and env
translator.translate(pipeline);
}
- public void executePipeline(Pipeline pipeline) {}
+ private void executePipeline(Pipeline pipeline) {}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index f0ce1e5..99621f6 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -3,6 +3,8 @@ package org.apache.beam.runners.spark.structuredstreaming.translation;
import org.apache.beam.runners.core.construction.PipelineResources;
import org.apache.beam.runners.spark.SparkTransformOverrides;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.runners.spark.structuredstreaming.translation.batch.BatchPipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.values.PCollection;
@@ -11,7 +13,10 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
- * Does all the translation work: mode detection, nodes translation.
+ /**
+ * The role of this class is to detect the pipeline mode and to translate the Beam operators to their Spark counterparts. If we have
+ * a streaming job, this is instantiated as a {@link StreamingPipelineTranslator}. In other
+ * case, i.e. for a batch job, a {@link BatchPipelineTranslator} is created. Correspondingly,
*/
public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
@@ -41,11 +46,18 @@ public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
TranslationModeDetector detector = new TranslationModeDetector();
pipeline.traverseTopologically(detector);
if (detector.getTranslationMode().equals(TranslationMode.STREAMING)) {
- // set streaming mode if it's a streaming pipeline
options.setStreaming(true);
}
}
+ /**
+ * Translates the pipeline by passing this class as a visitor.
+ *
+ * @param pipeline The pipeline to be translated
+ */
+ public void translate(Pipeline pipeline) {
+ pipeline.traverseTopologically(this);
+ }
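
TranslationModeDetector is used above but its body is not part of this diff. A
plausible minimal form, assuming (consistently with the classic Spark runner)
that a single unbounded PCollection flips the whole pipeline to streaming:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.runners.TransformHierarchy;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PValue;

    /** Plausible minimal form of the TranslationModeDetector used above. */
    class TranslationModeDetectorSketch extends Pipeline.PipelineVisitor.Defaults {

      enum TranslationMode { BATCH, STREAMING }

      private TranslationMode translationMode = TranslationMode.BATCH;

      TranslationMode getTranslationMode() {
        return translationMode;
      }

      @Override
      public void visitValue(PValue value, TransformHierarchy.Node producer) {
        // One unbounded PCollection anywhere in the graph means the pipeline
        // must be executed in streaming mode.
        if (value instanceof PCollection
            && ((PCollection<?>) value).isBounded() == PCollection.IsBounded.UNBOUNDED) {
          translationMode = TranslationMode.STREAMING;
        }
      }
    }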
[beam] 13/50: Improve javadocs
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 0033f898efca4068e8e730363e133b08df9bedf3
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 22 11:50:17 2018 +0100
Improve javadocs
---
.../spark/structuredstreaming/translation/PipelineTranslator.java | 8 ++++----
.../structuredstreaming/translation/TransformTranslator.java | 2 +-
.../spark/structuredstreaming/translation/TranslationContext.java | 4 ++++
.../translation/batch/BatchPipelineTranslator.java | 4 +++-
.../translation/batch/BatchTranslationContext.java | 2 +-
.../translation/streaming/StreamingPipelineTranslator.java | 5 +++++
.../translation/streaming/StreamingTranslationContext.java | 3 +++
7 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index 185879b..51d65ff 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -14,10 +14,10 @@ import org.apache.beam.sdk.values.PValue;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
- /**
- * The role of this class is to detect the pipeline mode and to translate the Beam operators to their Spark counterparts. If we have
- * a streaming job, this is instantiated as a {@link StreamingPipelineTranslator}. In other
- * case, i.e. for a batch job, a {@link BatchPipelineTranslator} is created. Correspondingly,
+ * {@link Pipeline.PipelineVisitor} that translates the Beam operators to their Spark counterparts.
+ * It also does the pipeline preparation: mode detection, transforms replacement, classpath preparation.
+ * If we have a streaming job, it is instantiated as a {@link StreamingPipelineTranslator}.
+ * If we have a batch job, it is instantiated as a {@link BatchPipelineTranslator}.
*/
public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults{
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
index ebb8bf8..54b0a85 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
@@ -4,7 +4,7 @@ import org.apache.beam.sdk.transforms.PTransform;
public interface TransformTranslator<TransformT extends PTransform> {
- /** A translator of a {@link PTransform}. */
+ /** Base class for translators of {@link PTransform}. */
void translateTransform(TransformT transform, TranslationContext context);
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 341ed49..3dacde4 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -1,7 +1,11 @@
package org.apache.beam.runners.spark.structuredstreaming.translation;
import org.apache.beam.sdk.runners.AppliedPTransform;
+import org.apache.beam.sdk.transforms.PTransform;
+/**
+ * Base class that gives a context for {@link PTransform} translation.
+ */
public class TranslationContext {
private AppliedPTransform<?, ?, ?> currentTransform;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
index ff92d89..38324c0 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
@@ -11,7 +11,9 @@ import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.PTransform;
-/** {@link Pipeline.PipelineVisitor} for executing a {@link Pipeline} as a Spark batch job. */
+/** {@link PipelineTranslator} for executing a {@link Pipeline} in Spark in batch mode.
+ * This contains only the components specific to batch: {@link BatchTranslationContext},
+ * registry of batch {@link TransformTranslator} and registry lookup code. */
public class BatchPipelineTranslator extends PipelineTranslator {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
index 71ef315..f08e33c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
@@ -11,7 +11,7 @@ import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
/**
- * Keeps track of the {@link Dataset} and the step the translation is in.
+ * Keeps track of the context of the translation.
*/
public class BatchTranslationContext extends TranslationContext {
private final Map<PValue, Dataset<?>> datasets;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
index 9303d59..9cbfbed 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingPipelineTranslator.java
@@ -3,8 +3,13 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
+/** {@link PipelineTranslator} for executing a {@link Pipeline} in Spark in streaming mode.
+ * This contains only the components specific to streaming: {@link StreamingTranslationContext},
+ * registry of streaming {@link TransformTranslator} and registry lookup code. */
+
public class StreamingPipelineTranslator extends PipelineTranslator {
public StreamingPipelineTranslator(SparkPipelineOptions options) {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
index 460dbf6..f2ee34b 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
@@ -2,6 +2,9 @@ package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+/**
+ * Keeps track of the context of the translation.
+ */
public class StreamingTranslationContext extends TranslationContext {
}
[beam] 18/50: Add TODOs
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 901a1acc34abe3e74d5636fd4d30dbcb9918793d
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 22 17:10:25 2018 +0100
Add TODOs
---
.../org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java | 1 +
1 file changed, 1 insertion(+)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index 3a530f0..b76a530 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -110,6 +110,7 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
@Override
public SparkPipelineResult run(final Pipeline pipeline) {
translatePipeline(pipeline);
+ //TODO initialise other services: checkpointing, metrics system, listeners, ...
executePipeline(pipeline);
return new SparkPipelineResult();
}
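
For orientation, launching a pipeline against this runner would look roughly as
follows once execution is wired up; setSparkMaster() is assumed from the
getter/setter convention of PipelineOptions, and the pipeline body is a
placeholder:

    import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
    import org.apache.beam.runners.spark.structuredstreaming.SparkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    class RunnerUsageSketch {
      public static void main(String[] args) {
        SparkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
        options.setRunner(SparkRunner.class);
        options.setSparkMaster("local[2]"); // assumed setter paired with getSparkMaster()

        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(Create.of(1, 2, 3)); // placeholder: translators are still empty
        pipeline.run(); // translatePipeline(), then the (still no-op) executePipeline()
      }
    }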
[beam] 29/50: update TODO
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 1ec9356e9e57d1ac89427fb61fc87de6458ae26d
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Dec 6 17:28:57 2018 +0100
update TODO
---
.../runners/spark/structuredstreaming/translation/io/DatasetSource.java | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
index d9d283e..60bdab6 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
@@ -60,7 +60,7 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
private DatasetMicroBatchReader(Optional<StructType> schema, String checkpointLocation,
DataSourceOptions options) {
- //TODO start reading from the source here, inc offset at each element read
+ //TODO deal with schema and options
}
@Override public void setOffsetRange(Optional<Offset> start, Optional<Offset> end) {
[beam] 08/50: Renames: better differentiate pipeline translator from transform translator
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 051e8dc47ce952606bce2fec93f05fd877b2cdcd
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 21 10:52:43 2018 +0100
Renames: better differentiate pipeline translator from transform translator
---
.../spark/structuredstreaming/SparkRunner.java | 6 ++--
.../translation/batch/BatchPipelineTranslator.java | 39 +++++++++++-----------
2 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index 59c08f7..3e3b112 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -98,9 +98,9 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
PipelineTranslator.detectTranslationMode(pipeline, options);
PipelineTranslator.replaceTransforms(pipeline, options);
PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options);
- PipelineTranslator translator = options.isStreaming() ? new StreamingPipelineTranslator() : new BatchPipelineTranslator();
- //init translator with subclass based on mode and env
- translator.translate(pipeline);
+ PipelineTranslator pipelineTranslator = options.isStreaming() ? new StreamingPipelineTranslator() : new BatchPipelineTranslator();
+ //init pipelineTranslator with subclass based on mode and env
+ pipelineTranslator.translate(pipeline);
}
private void executePipeline(Pipeline pipeline) {}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
index e20e4c0..2459372 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
@@ -24,39 +24,38 @@ public class BatchPipelineTranslator extends PipelineTranslator {
private int depth = 0;
@SuppressWarnings("rawtypes")
- private static final Map<String, BatchTransformTranslator>
- TRANSLATORS = new HashMap<>();
+ private static final Map<String, BatchTransformTranslator> TRANSFORM_TRANSLATORS = new HashMap<>();
static {
- TRANSLATORS.put(PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN,
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN,
new CombinePerKeyTranslatorBatch());
- TRANSLATORS
+ TRANSFORM_TRANSLATORS
.put(PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new GroupByKeyTranslatorBatch());
- TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new ReshuffleTranslatorBatch());
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new ReshuffleTranslatorBatch());
- TRANSLATORS
+ TRANSFORM_TRANSLATORS
.put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionTranslatorBatch());
- TRANSLATORS
+ TRANSFORM_TRANSLATORS
.put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowAssignTranslatorBatch());
- TRANSLATORS.put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoTranslatorBatch());
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoTranslatorBatch());
- TRANSLATORS.put(PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
}
private static final Logger LOG = LoggerFactory.getLogger(BatchPipelineTranslator.class);
/** Returns a translator for the given node, if it is possible, otherwise null. */
- private static BatchTransformTranslator<?> getTranslator(TransformHierarchy.Node node) {
+ private static BatchTransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node) {
@Nullable PTransform<?, ?> transform = node.getTransform();
// Root of the graph is null
if (transform == null) {
return null;
}
@Nullable String urn = PTransformTranslation.urnForTransformOrNull(transform);
- return (urn == null) ? null : TRANSLATORS.get(urn);
+ return (urn == null) ? null : TRANSFORM_TRANSLATORS.get(urn);
}
@@ -69,10 +68,10 @@ public class BatchPipelineTranslator extends PipelineTranslator {
LOG.info("{} enterCompositeTransform- {}", genSpaces(depth), node.getFullName());
depth++;
- BatchTransformTranslator<?> translator = getTranslator(node);
+ BatchTransformTranslator<?> transformTranslator = getTransformTranslator(node);
- if (translator != null) {
- translateNode(node, translator);
+ if (transformTranslator != null) {
+ translateNode(node, transformTranslator);
LOG.info("{} translated- {}", genSpaces(depth), node.getFullName());
return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
} else {
@@ -92,28 +91,28 @@ public class BatchPipelineTranslator extends PipelineTranslator {
// get the transformation corresponding to the node we are
// currently visiting and translate it into its Spark alternative.
- BatchTransformTranslator<?> translator = getTranslator(node);
- if (translator == null) {
+ BatchTransformTranslator<?> transformTranslator = getTransformTranslator(node);
+ if (transformTranslator == null) {
String transformUrn = PTransformTranslation.urnForTransform(node.getTransform());
throw new UnsupportedOperationException(
"The transform " + transformUrn + " is currently not supported.");
}
- translateNode(node, translator);
+ translateNode(node, transformTranslator);
}
private <T extends PTransform<?, ?>> void translateNode(
TransformHierarchy.Node node,
- BatchTransformTranslator<?> translator) {
+ BatchTransformTranslator<?> transformTranslator) {
@SuppressWarnings("unchecked")
T typedTransform = (T) node.getTransform();
@SuppressWarnings("unchecked")
- BatchTransformTranslator<T> typedTranslator = (BatchTransformTranslator<T>) translator;
+ BatchTransformTranslator<T> typedTransformTranslator = (BatchTransformTranslator<T>) transformTranslator;
// create the applied PTransform on the translationContext
translationContext.setCurrentTransform(node.toAppliedPTransform(getPipeline()));
- typedTranslator.translateNode(typedTransform, translationContext);
+ typedTransformTranslator.translateNode(typedTransform, translationContext);
}
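Note: the registry above maps Beam transform URNs to translators, so supporting one more primitive is a one-line registration. A hedged sketch (ImpulseTranslatorBatch is hypothetical and does not exist on this branch):

  // Hypothetical: wiring an additional URN into the registry.
  TRANSFORM_TRANSLATORS.put(
      PTransformTranslation.IMPULSE_TRANSFORM_URN, new ImpulseTranslatorBatch());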
[beam] 38/50: clean deps
commit 1184022cca5d485f0aaa9f51adc5322372ff31af
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Dec 21 16:24:23 2018 +0100
clean deps
---
runners/spark-structured-streaming/build.gradle | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/runners/spark-structured-streaming/build.gradle b/runners/spark-structured-streaming/build.gradle
index b33a2b6..803058f 100644
--- a/runners/spark-structured-streaming/build.gradle
+++ b/runners/spark-structured-streaming/build.gradle
@@ -45,36 +45,27 @@ test {
}
dependencies {
+ compile('org.apache.spark:spark-core_2.11:2.4.0'){
+ exclude group: 'org.json4s', module: 'json4s-jackson_2.11'
+ }
+ shadow "org.json4s:json4s-jackson_2.11:3.6.3"
shadow project(path: ":beam-model-pipeline", configuration: "shadow")
shadow project(path: ":beam-sdks-java-core", configuration: "shadow")
shadow project(path: ":beam-runners-core-construction-java", configuration: "shadow")
shadow project(path: ":beam-runners-core-java", configuration: "shadow")
shadow library.java.guava
- shadow library.java.jackson_annotations
shadow library.java.slf4j_api
shadow library.java.joda_time
- shadow "io.dropwizard.metrics:metrics-core:3.1.2"
- shadow library.java.jackson_module_scala
provided library.java.spark_sql
- provided library.java.hadoop_common
- provided library.java.hadoop_mapreduce_client_core
provided library.java.commons_compress
provided library.java.commons_lang3
provided library.java.commons_io_2x
provided library.java.hamcrest_core
provided library.java.hamcrest_library
- provided "org.apache.zookeeper:zookeeper:3.4.11"
- provided "org.scala-lang:scala-library:2.11.8"
- provided "com.esotericsoftware.kryo:kryo:2.21"
- shadowTest project(path: ":beam-sdks-java-io-kafka", configuration: "shadow")
shadowTest project(path: ":beam-sdks-java-core", configuration: "shadowTest")
shadowTest project(path: ":beam-runners-core-java", configuration: "shadowTest")
- shadowTest library.java.avro
- shadowTest library.java.kafka_clients
shadowTest library.java.junit
shadowTest library.java.mockito_core
- shadowTest library.java.jackson_dataformat_yaml
- shadowTest "org.apache.kafka:kafka_2.11:0.11.0.1"
validatesRunner project(path: ":beam-sdks-java-core", configuration: "shadowTest")
validatesRunner project(path: project.path, configuration: "shadowTest")
validatesRunner project(path: project.path, configuration: "shadow")
[beam] 03/50: Add SparkPipelineOptions
commit 018c77329fcfc96276c9a98249c9ddb3fab10278
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 14 15:24:28 2018 +0100
Add SparkPipelineOptions
---
.../structuredstreaming/SparkPipelineOptions.java | 106 +++++++++++++++++++++
1 file changed, 106 insertions(+)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineOptions.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineOptions.java
new file mode 100644
index 0000000..d381b5f
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineOptions.java
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.structuredstreaming;
+
+import java.util.List;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.options.ApplicationNameOptions;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.StreamingOptions;
+
+/**
+ * Spark runner {@link PipelineOptions} handles Spark execution-related configurations, such as the
+ * master address, and other user-related knobs.
+ */
+public interface SparkPipelineOptions
+ extends PipelineOptions, StreamingOptions, ApplicationNameOptions {
+
+ @Description("The url of the spark master to connect to, (e.g. spark://host:port, local[4]).")
+ @Default.String("local[4]")
+ String getSparkMaster();
+
+ void setSparkMaster(String master);
+
+ @Description("Batch default storage level")
+ @Default.String("MEMORY_ONLY")
+ String getStorageLevel();
+
+ void setStorageLevel(String storageLevel);
+
+ @Description(
+ "A checkpoint directory for streaming resilience, ignored in batch. "
+ + "For durability, a reliable filesystem such as HDFS/S3/GS is necessary.")
+ @Default.InstanceFactory(TmpCheckpointDirFactory.class)
+ String getCheckpointDir();
+
+ void setCheckpointDir(String checkpointDir);
+
+ /**
+ * Returns the default checkpoint directory of /tmp/${job.name}. For testing purposes only.
+ * Production applications should use a reliable filesystem such as HDFS/S3/GS.
+ */
+ class TmpCheckpointDirFactory implements DefaultValueFactory<String> {
+ @Override
+ public String create(PipelineOptions options) {
+ return "/tmp/" + options.as(SparkPipelineOptions.class).getJobName();
+ }
+ }
+
+ @Description(
+ "The period to checkpoint (in Millis). If not set, Spark will default "
+ + "to Max(slideDuration, Seconds(10)). This PipelineOptions default (-1) will end-up "
+ + "with the described Spark default.")
+ @Default.Long(-1)
+ Long getCheckpointDurationMillis();
+
+ void setCheckpointDurationMillis(Long durationMillis);
+
+ @Description(
+ "If set bundleSize will be used for splitting BoundedSources, otherwise default to "
+ + "splitting BoundedSources on Spark defaultParallelism. Most effective when used with "
+ + "Spark dynamicAllocation.")
+ @Default.Long(0)
+ Long getBundleSize();
+
+ @Experimental
+ void setBundleSize(Long value);
+
+ @Description("Enable/disable sending aggregator values to Spark's metric sinks")
+ @Default.Boolean(true)
+ Boolean getEnableSparkMetricSinks();
+
+ void setEnableSparkMetricSinks(Boolean enableSparkMetricSinks);
+
+
+ /**
+ * List of local files to make available to workers.
+ *
+ * <p>Jars are placed on the worker's classpath.
+ *
+ * <p>The default value is the list of jars from the main program's classpath.
+ */
+ @Description(
+ "Jar-Files to send to all workers and put on the classpath. "
+ + "The default value is all files from the classpath.")
+ List<String> getFilesToStage();
+
+ void setFilesToStage(List<String> value);
+}
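As a usage sketch (not part of the commit), these options are created through PipelineOptionsFactory, which honors the @Default annotations declared above:

  // Hedged example: parse args, validate, and view them as SparkPipelineOptions.
  SparkPipelineOptions options =
      PipelineOptionsFactory.fromArgs(args).withValidation().as(SparkPipelineOptions.class);
  options.setSparkMaster("local[4]"); // redundant here: already the declared default
  Pipeline pipeline = Pipeline.create(options);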
[beam] 45/50: Move Source and translator mocks to a mock package.
commit 2f5bdd36f0ef6e6f5689806351cc2176763c14c0
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Jan 2 11:56:45 2019 +0100
Move Source and translator mocks to a mock package.
---
.../translation/batch/PipelineTranslatorBatch.java | 1 +
.../translation/batch/{ => mocks}/DatasetSourceMockBatch.java | 2 +-
.../translation/batch/{ => mocks}/ReadSourceTranslatorMockBatch.java | 4 ++--
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
index 9ccc712..3b9a7d6 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
@@ -24,6 +24,7 @@ import org.apache.beam.runners.core.construction.PTransformTranslation;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.batch.mocks.ReadSourceTranslatorMockBatch;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.PTransform;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/mocks/DatasetSourceMockBatch.java
similarity index 99%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceMockBatch.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/mocks/DatasetSourceMockBatch.java
index b616a6f..914eed0 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceMockBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/mocks/DatasetSourceMockBatch.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch.mocks;
import static scala.collection.JavaConversions.asScalaBuffer;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/mocks/ReadSourceTranslatorMockBatch.java
similarity index 97%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/mocks/ReadSourceTranslatorMockBatch.java
index d7b9175..17c7f62 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/mocks/ReadSourceTranslatorMockBatch.java
@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch.mocks;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
@@ -33,7 +33,7 @@ import org.apache.spark.sql.SparkSession;
* Mock translator that generates a source of 0 to 999 and prints it.
* @param <T>
*/
-class ReadSourceTranslatorMockBatch<T>
+public class ReadSourceTranslatorMockBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
private String SOURCE_PROVIDER_CLASS = DatasetSourceMockBatch.class.getCanonicalName();
[beam] 31/50: start source instantiation
commit d531bb5fd34589d5f16cee212338f6ee6118595e
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Mon Dec 10 15:27:49 2018 +0100
start source instantiation
---
.../batch/ReadSourceTranslatorBatch.java | 27 ++++++++++++++++++----
.../translation/io/DatasetSource.java | 10 ++++----
2 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index 63f2fdf..a75730a 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -22,18 +22,25 @@ import org.apache.beam.runners.core.construction.ReadTranslation;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSource;
+import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
+import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.streaming.DataStreamReader;
class ReadSourceTranslatorBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
+ private String SOURCE_PROVIDER_CLASS = DatasetSource.class.getCanonicalName();
+
@SuppressWarnings("unchecked")
@Override
public void translateTransform(
@@ -41,18 +48,28 @@ class ReadSourceTranslatorBatch<T>
AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>> rootTransform =
(AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>)
context.getCurrentTransform();
- BoundedSource<T> source;
+
+ String providerClassName = SOURCE_PROVIDER_CLASS.substring(0, SOURCE_PROVIDER_CLASS.indexOf("$"));
+ BoundedSource<T> source;
try {
source = ReadTranslation.boundedSourceFromTransform(rootTransform);
} catch (IOException e) {
throw new RuntimeException(e);
}
- PCollection<T> output = (PCollection<T>) context.getOutput();
-
SparkSession sparkSession = context.getSparkSession();
- DatasetSource datasetSource = new DatasetSource(context, source);
- Dataset<Row> dataset = sparkSession.readStream().format("DatasetSource").load();
+ Dataset<Row> rowDataset = sparkSession.readStream().format(providerClassName).load();
+ //TODO initialize source : how, to get a reference to the DatasetSource instance that spark
+ // instantiates to be able to call DatasetSource.initialize()
+ MapFunction<Row, WindowedValue<T>> func = new MapFunction<Row, WindowedValue<T>>() {
+ @Override public WindowedValue<T> call(Row value) throws Exception {
+ //TODO fix row content extraction: I guess cast is not enough
+ return (WindowedValue<T>) value.get(0);
+ }
+ };
+ //TODO fix encoder
+ Dataset<WindowedValue<T>> dataset = rowDataset.map(func, null);
+ PCollection<T> output = (PCollection<T>) context.getOutput();
context.putDataset(output, dataset);
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
index f230a70..75cdd5d 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
@@ -30,6 +30,7 @@ import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.sources.DataSourceRegister;
import org.apache.spark.sql.sources.v2.ContinuousReadSupport;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
@@ -45,14 +46,15 @@ import org.apache.spark.sql.types.StructType;
* is tagged experimental in spark, this class does not implement {@link ContinuousReadSupport}. This
* class is just a mix-in.
*/
-public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport {
+public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport{
- private final int numPartitions;
- private final Long bundleSize;
+ private int numPartitions;
+ private Long bundleSize;
private TranslationContext context;
private BoundedSource<T> source;
- public DatasetSource(TranslationContext context, BoundedSource<T> source) {
+
+ public void initialize(TranslationContext context, BoundedSource<T> source){
this.context = context;
this.source = source;
this.numPartitions = context.getSparkSession().sparkContext().defaultParallelism();
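The TODO above is the crux of this commit: Spark instantiates the DataSourceV2 provider reflectively, so the translator never holds the instance that initialize() would have to be called on. A hedged sketch of the approach later commits on this branch converge on, passing state through DataSourceOptions instead (the option keys mirror the constants introduced in DatasetSourceBatch; serializeToJson is assumed here to be the public mirror of the deserializeFromJson call used there):

  Dataset<Row> rowDataset =
      sparkSession
          .read()
          .format(providerClassName)
          .option("beam-source", Base64Serializer.serializeUnchecked(source))
          .option(
              "default-parallelism",
              String.valueOf(sparkSession.sparkContext().defaultParallelism()))
          .option("pipeline-options", SerializablePipelineOptions.serializeToJson(options))
          .load();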
[beam] 47/50: Refactor DatasetSource fields
commit 92c94b123c11eab6e4cfd2441a64463253f2afa2
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Jan 2 16:08:31 2019 +0100
Refactor DatasetSource fields
---
.../translation/batch/DatasetSourceBatch.java | 40 ++++++++++++----------
1 file changed, 22 insertions(+), 18 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
index 331e397..e19bbdb 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
@@ -49,10 +49,6 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
static final String BEAM_SOURCE_OPTION = "beam-source";
static final String DEFAULT_PARALLELISM = "default-parallelism";
static final String PIPELINE_OPTIONS = "pipeline-options";
- private int numPartitions;
- private Long bundleSize;
- private BoundedSource<T> source;
- private SparkPipelineOptions sparkPipelineOptions;
@SuppressWarnings("unchecked")
@@ -61,31 +57,39 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
if (!options.get(BEAM_SOURCE_OPTION).isPresent()){
throw new RuntimeException("Beam source was not set in DataSource options");
}
- this.source = Base64Serializer
+ BoundedSource<T> source = Base64Serializer
.deserializeUnchecked(options.get(BEAM_SOURCE_OPTION).get(), BoundedSource.class);
if (!options.get(DEFAULT_PARALLELISM).isPresent()){
throw new RuntimeException("Spark default parallelism was not set in DataSource options");
}
- if (!options.get(BEAM_SOURCE_OPTION).isPresent()){
- throw new RuntimeException("Beam source was not set in DataSource options");
- }
- this.numPartitions = Integer.valueOf(options.get(DEFAULT_PARALLELISM).get());
- checkArgument(this.numPartitions > 0, "Number of partitions must be greater than zero.");
+ int numPartitions = Integer.valueOf(options.get(DEFAULT_PARALLELISM).get());
+ checkArgument(numPartitions > 0, "Number of partitions must be greater than zero.");
+
if (!options.get(PIPELINE_OPTIONS).isPresent()){
throw new RuntimeException("Beam pipelineOptions were not set in DataSource options");
}
- this.sparkPipelineOptions = SerializablePipelineOptions
+ SparkPipelineOptions sparkPipelineOptions = SerializablePipelineOptions
.deserializeFromJson(options.get(PIPELINE_OPTIONS).get()).as(SparkPipelineOptions.class);
- this.bundleSize = sparkPipelineOptions.getBundleSize();
- return new DatasetReader(); }
+ return new DatasetReader(numPartitions, source, sparkPipelineOptions);
+ }
/** This class can be mapped to Beam {@link BoundedSource}. */
private class DatasetReader implements DataSourceReader {
+ private int numPartitions;
+ private BoundedSource<T> source;
+ private SparkPipelineOptions sparkPipelineOptions;
private Optional<StructType> schema;
private String checkpointLocation;
+ private DatasetReader(int numPartitions, BoundedSource<T> source,
+ SparkPipelineOptions sparkPipelineOptions) {
+ this.numPartitions = numPartitions;
+ this.source = source;
+ this.sparkPipelineOptions = sparkPipelineOptions;
+ }
+
@Override
public StructType readSchema() {
return new StructType();
@@ -97,11 +101,11 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
long desiredSizeBytes;
try {
desiredSizeBytes =
- (bundleSize == null)
+ (sparkPipelineOptions.getBundleSize() == null)
? source.getEstimatedSizeBytes(sparkPipelineOptions) / numPartitions
- : bundleSize;
- List<? extends BoundedSource<T>> sources = source.split(desiredSizeBytes, sparkPipelineOptions);
- for (BoundedSource<T> source : sources) {
+ : sparkPipelineOptions.getBundleSize();
+ List<? extends BoundedSource<T>> splits = source.split(desiredSizeBytes, sparkPipelineOptions);
+ for (BoundedSource<T> split : splits) {
result.add(
new InputPartition<InternalRow>() {
@@ -109,7 +113,7 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
public InputPartitionReader<InternalRow> createPartitionReader() {
BoundedReader<T> reader = null;
try {
- reader = source.createReader(sparkPipelineOptions);
+ reader = split.createReader(sparkPipelineOptions);
} catch (IOException e) {
throw new RuntimeException(
"Error creating BoundedReader " + reader.getClass().getCanonicalName(), e);
[beam] 15/50: Move common translation context components to superclass
commit 4777e22592bbc8a92eccae30250bd9998bfd11da
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 22 12:01:28 2018 +0100
Move common translation context components to superclass
---
.../translation/TranslationContext.java | 29 +++++++++++++++++++++-
.../translation/batch/BatchTranslationContext.java | 20 ++-------------
.../streaming/StreamingTranslationContext.java | 6 ++++-
3 files changed, 35 insertions(+), 20 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 3dacde4..e651e70 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -1,17 +1,44 @@
package org.apache.beam.runners.spark.structuredstreaming.translation;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.spark.SparkConf;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.SparkSession;
/**
- * Base class that gives a context for {@link PTransform} translation.
+ * Base class that gives a context for {@link PTransform} translation: keeping track of the datasets,
+ * the {@link SparkSession}, the current transform being translated.
*/
public class TranslationContext {
private AppliedPTransform<?, ?, ?> currentTransform;
+ private final Map<PValue, Dataset<?>> datasets;
+ private SparkSession sparkSession;
+ private final SparkPipelineOptions options;
public void setCurrentTransform(AppliedPTransform<?, ?, ?> currentTransform) {
this.currentTransform = currentTransform;
}
+ public TranslationContext(SparkPipelineOptions options) {
+ SparkConf sparkConf = new SparkConf();
+ sparkConf.setMaster(options.getSparkMaster());
+ sparkConf.setAppName(options.getAppName());
+ if (options.getFilesToStage() != null && !options.getFilesToStage().isEmpty()) {
+ sparkConf.setJars(options.getFilesToStage().toArray(new String[0]));
+ }
+
+ this.sparkSession = SparkSession
+ .builder()
+ .config(sparkConf)
+ .getOrCreate();
+ this.options = options;
+ this.datasets = new HashMap<>();
+ }
+
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
index f08e33c..02aad71 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
@@ -11,10 +11,9 @@ import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
/**
- * Keeps track of context of the translation.
+ * This class contains only batch specific context components.
*/
public class BatchTranslationContext extends TranslationContext {
- private final Map<PValue, Dataset<?>> datasets;
/**
* For keeping track about which DataSets don't have a successor. We need to terminate these with
@@ -22,23 +21,8 @@ public class BatchTranslationContext extends TranslationContext {
*/
private final Map<PValue, Dataset<?>> danglingDataSets;
- private SparkSession sparkSession;
- private final SparkPipelineOptions options;
-
public BatchTranslationContext(SparkPipelineOptions options) {
- SparkConf sparkConf = new SparkConf();
- sparkConf.setMaster(options.getSparkMaster());
- sparkConf.setAppName(options.getAppName());
- if (options.getFilesToStage() != null && !options.getFilesToStage().isEmpty()) {
- sparkConf.setJars(options.getFilesToStage().toArray(new String[0]));
- }
-
- this.sparkSession = SparkSession
- .builder()
- .config(sparkConf)
- .getOrCreate();
- this.options = options;
- this.datasets = new HashMap<>();
+ super(options);
this.danglingDataSets = new HashMap<>();
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
index f2ee34b..ebccfa7 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/StreamingTranslationContext.java
@@ -1,10 +1,14 @@
package org.apache.beam.runners.spark.structuredstreaming.translation.streaming;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
/**
- * * Keeps track of context of the translation.
+ * This class contains only streaming specific context components.
*/
public class StreamingTranslationContext extends TranslationContext {
+ public StreamingTranslationContext(SparkPipelineOptions options) {
+ super(options);
+ }
}
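The datasets map moved into TranslationContext above backs the putDataset()/getDataset() calls made by the translators; a minimal sketch of the accessors implied by that usage (the exact bodies live elsewhere on this branch):

  public void putDataset(PValue value, Dataset<?> dataset) {
    if (!datasets.containsKey(value)) {
      datasets.put(value, dataset);
    }
  }

  @SuppressWarnings("unchecked")
  public <T> Dataset<T> getDataset(PValue value) {
    return (Dataset<T>) datasets.get(value);
  }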
[beam] 34/50: Experiment over using spark Catalog to pass in Beam Source through spark Table
commit e9ac3c36ca414f5cf014cf787eb67045ce3d4b2a
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Dec 19 17:08:58 2018 +0100
Experiment over using spark Catalog to pass in Beam Source through spark Table
---
.../batch/ReadSourceTranslatorBatch.java | 12 +-
.../translation/io/DatasetSource.java | 191 ++++++++++++++++++++-
2 files changed, 193 insertions(+), 10 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index 2c1aa93..0b828fb 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -30,11 +30,16 @@ import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
+import org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetCache;
import org.apache.spark.api.java.function.MapFunction;
+import org.apache.spark.scheduler.SparkListener;
+import org.apache.spark.scheduler.SparkListenerApplicationStart;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.catalog.Catalog;
+import org.apache.spark.sql.catalyst.catalog.CatalogTable;
import org.apache.spark.sql.streaming.DataStreamReader;
class ReadSourceTranslatorBatch<T>
@@ -58,9 +63,12 @@ class ReadSourceTranslatorBatch<T>
throw new RuntimeException(e);
}
SparkSession sparkSession = context.getSparkSession();
- Dataset<Row> rowDataset = sparkSession.readStream().format(providerClassName).load();
+
+ DataStreamReader dataStreamReader = sparkSession.readStream().format(providerClassName);
+ Dataset<Row> rowDataset = dataStreamReader.load();
+
//TODO initialize source : how, to get a reference to the DatasetSource instance that spark
- // instantiates to be able to call DatasetSource.initialize()
+ // instantiates to be able to call DatasetSource.initialize(). How to pass in a DatasetCatalog?
MapFunction<Row, WindowedValue<T>> func = new MapFunction<Row, WindowedValue<T>>() {
@Override public WindowedValue<T> call(Row value) throws Exception {
//there is only one value put in each Row by the InputPartitionReader
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
index d23ecf3..deacdf4 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSource.java
@@ -28,7 +28,16 @@ import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.io.BoundedSource.BoundedReader;
+import org.apache.beam.sdk.io.Source;
import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.spark.sql.AnalysisException;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalog.Catalog;
+import org.apache.spark.sql.catalog.Column;
+import org.apache.spark.sql.catalog.Database;
+import org.apache.spark.sql.catalog.Function;
+import org.apache.spark.sql.catalog.Table;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.sources.v2.ContinuousReadSupport;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
@@ -39,6 +48,8 @@ import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;
import org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader;
import org.apache.spark.sql.sources.v2.reader.streaming.Offset;
import org.apache.spark.sql.types.StructType;
+import org.apache.spark.storage.StorageLevel;
+import scala.collection.immutable.Map;
/**
* This is a spark structured streaming {@link DataSourceV2} implementation. As Continuous streaming
@@ -53,17 +64,12 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport{
private BoundedSource<T> source;
- public void initialize(TranslationContext context, BoundedSource<T> source){
- this.context = context;
- this.source = source;
- this.numPartitions = context.getSparkSession().sparkContext().defaultParallelism();
- checkArgument(this.numPartitions > 0, "Number of partitions must be greater than zero.");
- this.bundleSize = context.getOptions().getBundleSize();
- }
-
@Override
public MicroBatchReader createMicroBatchReader(
Optional<StructType> schema, String checkpointLocation, DataSourceOptions options) {
+ this.numPartitions = context.getSparkSession().sparkContext().defaultParallelism();
+ checkArgument(this.numPartitions > 0, "Number of partitions must be greater than zero.");
+ this.bundleSize = context.getOptions().getBundleSize();
return new DatasetMicroBatchReader(schema, checkpointLocation, options);
}
@@ -190,4 +196,173 @@ public class DatasetSource<T> implements DataSourceV2, MicroBatchReadSupport{
reader.close();
}
}
+
+ private static class DatasetCatalog<T> extends Catalog {
+
+ TranslationContext context;
+ Source<T> source;
+
+ private DatasetCatalog(TranslationContext context, Source<T> source) {
+ this.context = context;
+ this.source = source;
+ }
+
+ @Override public String currentDatabase() {
+ return null;
+ }
+
+ @Override public void setCurrentDatabase(String dbName) {
+
+ }
+
+ @Override public Dataset<Database> listDatabases() {
+ return null;
+ }
+
+ @Override public Dataset<Table> listTables() {
+ return null;
+ }
+
+ @Override public Dataset<Table> listTables(String dbName) throws AnalysisException {
+ return null;
+ }
+
+ @Override public Dataset<Function> listFunctions() {
+ return null;
+ }
+
+ @Override public Dataset<Function> listFunctions(String dbName) throws AnalysisException {
+ return null;
+ }
+
+ @Override public Dataset<Column> listColumns(String tableName) throws AnalysisException {
+ return null;
+ }
+
+ @Override public Dataset<Column> listColumns(String dbName, String tableName)
+ throws AnalysisException {
+ return null;
+ }
+
+ @Override public Database getDatabase(String dbName) throws AnalysisException {
+ return null;
+ }
+
+ @Override public Table getTable(String tableName) throws AnalysisException {
+ return new DatasetTable<>("beam", "beaam", "beam fake table to wire up with Beam sources",
+ null, true, source, context);
+ }
+
+ @Override public Table getTable(String dbName, String tableName) throws AnalysisException {
+ return null;
+ }
+
+ @Override public Function getFunction(String functionName) throws AnalysisException {
+ return null;
+ }
+
+ @Override public Function getFunction(String dbName, String functionName)
+ throws AnalysisException {
+ return null;
+ }
+
+ @Override public boolean databaseExists(String dbName) {
+ return false;
+ }
+
+ @Override public boolean tableExists(String tableName) {
+ return false;
+ }
+
+ @Override public boolean tableExists(String dbName, String tableName) {
+ return false;
+ }
+
+ @Override public boolean functionExists(String functionName) {
+ return false;
+ }
+
+ @Override public boolean functionExists(String dbName, String functionName) {
+ return false;
+ }
+
+ @Override public Dataset<Row> createTable(String tableName, String path) {
+ return null;
+ }
+
+ @Override public Dataset<Row> createTable(String tableName, String path, String source) {
+ return null;
+ }
+
+ @Override public Dataset<Row> createTable(String tableName, String source,
+ Map<String, String> options) {
+ return null;
+ }
+
+ @Override public Dataset<Row> createTable(String tableName, String source, StructType schema,
+ Map<String, String> options) {
+ return null;
+ }
+
+ @Override public boolean dropTempView(String viewName) {
+ return false;
+ }
+
+ @Override public boolean dropGlobalTempView(String viewName) {
+ return false;
+ }
+
+ @Override public void recoverPartitions(String tableName) {
+
+ }
+
+ @Override public boolean isCached(String tableName) {
+ return false;
+ }
+
+ @Override public void cacheTable(String tableName) {
+
+ }
+
+ @Override public void cacheTable(String tableName, StorageLevel storageLevel) {
+
+ }
+
+ @Override public void uncacheTable(String tableName) {
+
+ }
+
+ @Override public void clearCache() {
+
+ }
+
+ @Override public void refreshTable(String tableName) {
+
+ }
+
+ @Override public void refreshByPath(String path) {
+
+ }
+
+ private static class DatasetTable<T> extends Table {
+
+ private Source<T> source;
+ private TranslationContext context;
+
+ public DatasetTable(String name, String database, String description, String tableType,
+ boolean isTemporary, Source<T> source, TranslationContext context) {
+ super(name, database, description, tableType, isTemporary);
+ this.source = source;
+ this.context = context;
+ }
+
+ private Source<T> getSource() {
+ return source;
+ }
+
+ private TranslationContext getContext() {
+ return context;
+ }
+ }
+ }
}
[beam] 49/50: Add missing 0-arg public constructor
commit 878ff4e9fc66c5377031995d1ba67a548863c7c9
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Jan 3 15:56:11 2019 +0100
Add missing 0-arg public constructor
---
.../spark/structuredstreaming/translation/batch/DatasetSourceBatch.java | 2 ++
1 file changed, 2 insertions(+)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
index e19bbdb..496b95a 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
@@ -50,6 +50,8 @@ public class DatasetSourceBatch<T> implements DataSourceV2, ReadSupport {
static final String DEFAULT_PARALLELISM = "default-parallelism";
static final String PIPELINE_OPTIONS = "pipeline-options";
+ public DatasetSourceBatch() {
+ }
@SuppressWarnings("unchecked")
@Override
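The constructor is required because Spark resolves the class name handed to format() and instantiates the provider reflectively; roughly (a sketch, exception handling elided):

  Class<?> providerClass = Class.forName(providerClassName);
  DataSourceV2 provider = (DataSourceV2) providerClass.getConstructor().newInstance();

Without a public 0-arg constructor, that newInstance() call fails when load() is invoked.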
[beam] 48/50: Wire real SourceTransform and not mock and update the test
commit d1b549ebc3341c576184961b4a32ba91c42c3c9b
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Jan 2 16:11:10 2019 +0100
Wire real SourceTransform and not mock and update the test
---
.../structuredstreaming/translation/batch/PipelineTranslatorBatch.java | 2 +-
.../org/apache/beam/runners/spark/structuredstreaming/SourceTest.java | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
index 3b9a7d6..c7e9167 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
@@ -66,7 +66,7 @@ public class PipelineTranslatorBatch extends PipelineTranslator {
PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorMockBatch());
+ PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
}
public PipelineTranslatorBatch(SparkPipelineOptions options) {
diff --git a/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SourceTest.java b/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SourceTest.java
index eea9769..79a85f3 100644
--- a/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SourceTest.java
+++ b/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SourceTest.java
@@ -9,7 +9,7 @@ public class SourceTest {
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Pipeline pipeline = Pipeline.create(options);
- pipeline.apply(Create.of(1));
+ pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
pipeline.run();
}
[beam] 36/50: fix mock, wire mock in translators and create a main test.
commit 8cdc20f7e0a53de18afb70afd31da374dcf6d93e
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Dec 20 17:18:54 2018 +0100
fix mock, wire mock in translators and create a main test.
---
.../translation/batch/PipelineTranslatorBatch.java | 2 +-
.../batch/ReadSourceTranslatorBatch.java | 3 ++-
.../batch/ReadSourceTranslatorMockBatch.java | 21 +++++++--------------
.../translation/io/DatasetSourceMock.java | 6 +++---
.../spark/structuredstreaming/SourceTest.java | 16 ++++++++++++++++
5 files changed, 29 insertions(+), 19 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
index 26f1b9c..9ccc712 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
@@ -65,7 +65,7 @@ public class PipelineTranslatorBatch extends PipelineTranslator {
PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
+ PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorMockBatch());
}
public PipelineTranslatorBatch(SparkPipelineOptions options) {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index 0b828fb..aed016a 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -40,6 +40,7 @@ import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalog.Catalog;
import org.apache.spark.sql.catalyst.catalog.CatalogTable;
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation;
import org.apache.spark.sql.streaming.DataStreamReader;
class ReadSourceTranslatorBatch<T>
@@ -63,8 +64,8 @@ class ReadSourceTranslatorBatch<T>
throw new RuntimeException(e);
}
SparkSession sparkSession = context.getSparkSession();
-
DataStreamReader dataStreamReader = sparkSession.readStream().format(providerClassName);
+
Dataset<Row> rowDataset = dataStreamReader.load();
//TODO initialize source : how, to get a reference to the DatasetSource instance that spark
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
index 5b1bada..504a64d 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
@@ -17,28 +17,25 @@
*/
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-import java.io.IOException;
-import org.apache.beam.runners.core.construction.ReadTranslation;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
-import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSource;
import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSourceMock;
-import org.apache.beam.sdk.io.BoundedSource;
-import org.apache.beam.sdk.runners.AppliedPTransform;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.DataStreamReader;
-import org.apache.spark.sql.types.StructType;
-import scala.reflect.ClassTag;
+
+/**
+ * Mock translator that generates a source of 0 to 999 and prints it.
+ * @param <T>
+ */
class ReadSourceTranslatorMockBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
@@ -48,13 +45,8 @@ class ReadSourceTranslatorMockBatch<T>
@Override
public void translateTransform(
PTransform<PBegin, PCollection<T>> transform, TranslationContext context) {
- AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>> rootTransform =
- (AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>)
- context.getCurrentTransform();
-
- String providerClassName = SOURCE_PROVIDER_CLASS.substring(0, SOURCE_PROVIDER_CLASS.indexOf("$"));
SparkSession sparkSession = context.getSparkSession();
- DataStreamReader dataStreamReader = sparkSession.readStream().format(providerClassName);
+ DataStreamReader dataStreamReader = sparkSession.readStream().format(SOURCE_PROVIDER_CLASS);
Dataset<Row> rowDataset = dataStreamReader.load();
@@ -77,5 +69,6 @@ class ReadSourceTranslatorMockBatch<T>
PCollection<T> output = (PCollection<T>) context.getOutput();
context.putDataset(output, dataset);
+ dataset.show();
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
index fa42fdf..ec88364 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
@@ -46,7 +46,7 @@ public class DatasetSourceMock implements DataSourceV2, MicroBatchReadSupport {
}
/** This class can be mapped to Beam {@link BoundedSource}. */
- private class DatasetMicroBatchReader implements MicroBatchReader {
+ private static class DatasetMicroBatchReader implements MicroBatchReader {
@Override public void setOffsetRange(Optional<Offset> start, Optional<Offset> end) {
}
@@ -70,7 +70,7 @@ public class DatasetSourceMock implements DataSourceV2, MicroBatchReadSupport {
}
@Override public StructType readSchema() {
- return null;
+ return new StructType();
}
@Override public List<InputPartition<InternalRow>> planInputPartitions() {
@@ -86,7 +86,7 @@ public class DatasetSourceMock implements DataSourceV2, MicroBatchReadSupport {
}
/** This class is a mocked reader*/
- private class DatasetMicroBatchPartitionReaderMock implements InputPartitionReader<InternalRow> {
+ private static class DatasetMicroBatchPartitionReaderMock implements InputPartitionReader<InternalRow> {
private ArrayList<Integer> values;
private int currentIndex = 0;
diff --git a/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SourceTest.java b/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SourceTest.java
new file mode 100644
index 0000000..eea9769
--- /dev/null
+++ b/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/SourceTest.java
@@ -0,0 +1,16 @@
+package org.apache.beam.runners.spark.structuredstreaming;
+
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.transforms.Create;
+
+public class SourceTest {
+ public static void main(String[] args) {
+ PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
+ Pipeline pipeline = Pipeline.create(options);
+ pipeline.apply(Create.of(1));
+ pipeline.run();
+ }
+
+}
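For context, the mocked partition reader referenced above reduces to serving a fixed list of integers as single-column rows. A hedged sketch consistent with the fragments shown (the class name is simplified; GenericInternalRow is used here for brevity, while the branch builds rows via scala collection conversions):

  private static class MockPartitionReader implements InputPartitionReader<InternalRow> {
    private final List<Integer> values;
    private int currentIndex = 0;

    MockPartitionReader(List<Integer> values) {
      this.values = values;
    }

    @Override
    public boolean next() {
      currentIndex++;
      return currentIndex <= values.size();
    }

    @Override
    public InternalRow get() {
      // One integer per row, matching the single-value extraction done in
      // the translator's MapFunction.
      return new GenericInternalRow(new Object[] {values.get(currentIndex - 1)});
    }

    @Override
    public void close() {}
  }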
[beam] 27/50: Use Iterators.transform() to return Iterable
commit 57ce2d180f94de28f718f355a8a5bc69f940b3be
Author: Alexey Romanenko <ar...@gmail.com>
AuthorDate: Mon Dec 10 10:52:19 2018 +0100
Use Iterators.transform() to return Iterable
---
.../translation/batch/GroupByKeyTranslatorBatch.java | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
index 7f2d7fa..0ff0750 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
@@ -17,9 +17,7 @@
*/
package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
-import com.google.common.collect.Iterables;
-import com.google.common.collect.Lists;
-import java.util.List;
+import com.google.common.collect.Iterators;
import org.apache.beam.runners.spark.structuredstreaming.translation.EncoderHelpers;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
@@ -54,14 +52,7 @@ class GroupByKeyTranslatorBatch<K, V>
Dataset<KV<K, Iterable<V>>> materialized =
grouped.mapGroups(
(MapGroupsFunction<K, KV<K, V>, KV<K, Iterable<V>>>)
- (key, iterator) -> {
- // TODO: can we use here just "Iterable<V> iterable = () -> iterator;" ?
- List<V> values = Lists.newArrayList();
- while (iterator.hasNext()) {
- values.add(iterator.next().getValue());
- }
- return KV.of(key, Iterables.unmodifiableIterable(values));
- },
+ (key, iterator) -> KV.of(key, () -> Iterators.transform(iterator, KV::getValue)),
EncoderHelpers.encoder());
Dataset<WindowedValue<KV<K, Iterable<V>>>> output =
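One subtlety in the Iterators.transform() change: the lambda wraps a single-pass iterator, so the resulting Iterable can be traversed only once, unlike the Lists.newArrayList() copy it replaces. A standalone illustration (not from the commit):

  Iterator<Integer> it = Arrays.asList(1, 2, 3).iterator();
  Iterable<Integer> oneShot = () -> Iterators.transform(it, v -> v * 2);
  oneShot.forEach(System.out::println); // prints 2, 4, 6
  oneShot.forEach(System.out::println); // prints nothing: the iterator is exhausted

This is fine for mapGroups(), which consumes the values once, but any downstream consumer that re-iterates the grouped values would need a materialized copy.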
[beam] 04/50: Start pipeline translation
commit 1c977888ed4a3572a7af8d476fbcaa48cc36e5b4
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Nov 16 19:06:04 2018 +0100
Start pipeline translation
---
.../structuredstreaming/SparkPipelineResult.java | 29 +++
.../spark/structuredstreaming/SparkRunner.java | 108 +++++++++
.../translation/BatchPipelineTranslator.java | 20 ++
.../translation/EvaluationContext.java | 261 ---------------------
.../translation/PipelineTranslator.java | 94 ++++++++
.../translation/SparkTransformOverrides.java | 52 ++++
.../translation/StreamingPipelineTranslator.java | 5 +
7 files changed, 308 insertions(+), 261 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
new file mode 100644
index 0000000..82d1b90
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkPipelineResult.java
@@ -0,0 +1,29 @@
+package org.apache.beam.runners.spark.structuredstreaming;
+
+import java.io.IOException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.joda.time.Duration;
+
+public class SparkPipelineResult implements PipelineResult {
+
+ @Override public State getState() {
+ return null;
+ }
+
+ @Override public State cancel() throws IOException {
+ return null;
+ }
+
+ @Override public State waitUntilFinish(Duration duration) {
+ return null;
+ }
+
+ @Override public State waitUntilFinish() {
+ return null;
+ }
+
+ @Override public MetricResults metrics() {
+ return null;
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
new file mode 100644
index 0000000..62cd7d3
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -0,0 +1,108 @@
+package org.apache.beam.runners.spark.structuredstreaming;
+
+import static org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage;
+
+import org.apache.beam.runners.spark.structuredstreaming.translation.BatchPipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.StreamingPipelineTranslator;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.PipelineRunner;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.options.PipelineOptionsValidator;
+import org.apache.spark.sql.SparkSession;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The SparkRunner translates operations defined on a pipeline into a representation executable
+ * by Spark, and then submits the job to Spark for execution. To run a Beam pipeline with the
+ * default options (a single-threaded Spark instance in local mode), do the following:
+ *
+ * <p>{@code Pipeline p = [logic for pipeline creation] SparkPipelineResult result =
+ * (SparkPipelineResult) p.run(); }
+ *
+ * <p>To create a pipeline runner that runs against a different Spark cluster, with a custom
+ * master URL, do the following:
+ *
+ * <p>{@code Pipeline p = [logic for pipeline creation] SparkPipelineOptions options =
+ * SparkPipelineOptionsFactory.create(); options.setSparkMaster("spark://host:port");
+ * SparkPipelineResult result = (SparkPipelineResult) p.run(); }
+ */
+public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
+
+ private static final Logger LOG = LoggerFactory.getLogger(SparkRunner.class);
+
+ /** Options used in this pipeline runner. */
+ private final SparkPipelineOptions options;
+
+ /**
+ * Creates and returns a new SparkRunner with default options. In particular, against a spark
+ * instance running in local mode.
+ *
+ * @return A pipeline runner with default options.
+ */
+ public static SparkRunner create() {
+ SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
+ options.setRunner(SparkRunner.class);
+ return new SparkRunner(options);
+ }
+
+ /**
+ * Creates and returns a new SparkRunner with specified options.
+ *
+ * @param options The SparkPipelineOptions to use when executing the job.
+ * @return A pipeline runner that will execute with specified options.
+ */
+ public static SparkRunner create(SparkPipelineOptions options) {
+ return new SparkRunner(options);
+ }
+
+ /**
+ * Creates and returns a new SparkRunner with specified options.
+ *
+ * @param options The PipelineOptions to use when executing the job.
+ * @return A pipeline runner that will execute with specified options.
+ */
+ public static SparkRunner fromOptions(PipelineOptions options) {
+ SparkPipelineOptions sparkOptions = PipelineOptionsValidator
+ .validate(SparkPipelineOptions.class, options);
+
+ if (sparkOptions.getFilesToStage() == null) {
+ sparkOptions.setFilesToStage(detectClassPathResourcesToStage(SparkRunner.class.getClassLoader()));
+ LOG.info("PipelineOptions.filesToStage was not specified. "
+ + "Defaulting to files from the classpath: will stage {} files. "
+ + "Enable logging at DEBUG level to see which files will be staged.",
+ sparkOptions.getFilesToStage().size());
+ LOG.debug("Classpath elements: {}", sparkOptions.getFilesToStage());
+ }
+
+ return new SparkRunner(sparkOptions);
+ }
+
+ /**
+ * Creates a runner configured by the given options. With the default options the pipeline runs
+ * in Spark's local mode, in a single thread.
+ */
+ private SparkRunner(SparkPipelineOptions options) {
+ this.options = options;
+ }
+
+ @Override public SparkPipelineResult run(final Pipeline pipeline) {
+ translatePipeline(pipeline);
+ executePipeline(pipeline);
+ return new SparkPipelineResult();
+ }
+
+ private void translatePipeline(Pipeline pipeline){
+ PipelineTranslator.detectTranslationMode(pipeline, options);
+ PipelineTranslator.replaceTransforms(pipeline, options);
+ PipelineTranslator.prepareFilesToStageForRemoteClusterExecution(options);
+ // init the translator subclass based on the execution mode
+ PipelineTranslator translator =
+ options.isStreaming() ? new StreamingPipelineTranslator() : new BatchPipelineTranslator();
+ translator.translate(pipeline);
+ }
+ public void executePipeline(Pipeline pipeline) {}
+
+}
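Following the class javadoc above, a minimal usage sketch against a custom master (assuming the SparkPipelineOptions interface used on this branch; host and port are placeholders):

  SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
  options.setRunner(SparkRunner.class);
  options.setSparkMaster("spark://host:port"); // omit to keep the default local mode
  Pipeline p = Pipeline.create(options);
  // ... apply transforms ...
  SparkPipelineResult result = (SparkPipelineResult) p.run();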
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/BatchPipelineTranslator.java
new file mode 100644
index 0000000..e66555c
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/BatchPipelineTranslator.java
@@ -0,0 +1,20 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation;
+
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.runners.TransformHierarchy;
+import org.apache.beam.sdk.values.PValue;
+
+public class BatchPipelineTranslator extends PipelineTranslator {
+
+
+ @Override public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
+ return super.enterCompositeTransform(node);
+ }
+
+
+ @Override public void visitPrimitiveTransform(TransformHierarchy.Node node) {
+ super.visitPrimitiveTransform(node);
+ }
+
+
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EvaluationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EvaluationContext.java
deleted file mode 100644
index 47a3098..0000000
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EvaluationContext.java
+++ /dev/null
@@ -1,261 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.runners.spark.structuredstreaming.translation;
-
-import static com.google.common.base.Preconditions.checkArgument;
-
-import com.google.common.collect.Iterables;
-import java.util.HashMap;
-import java.util.LinkedHashMap;
-import java.util.LinkedHashSet;
-import java.util.Map;
-import java.util.Set;
-import java.util.stream.Collectors;
-import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
-import org.apache.beam.runners.core.construction.TransformInputs;
-import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
-import org.apache.beam.sdk.Pipeline;
-import org.apache.beam.sdk.coders.Coder;
-import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.runners.AppliedPTransform;
-import org.apache.beam.sdk.transforms.PTransform;
-import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
-import org.apache.beam.sdk.util.WindowedValue;
-import org.apache.beam.sdk.values.PCollection;
-import org.apache.beam.sdk.values.PCollectionView;
-import org.apache.beam.sdk.values.PValue;
-import org.apache.beam.sdk.values.TupleTag;
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.spark.sql.SparkSession;
-import org.apache.spark.streaming.api.java.JavaStreamingContext;
-
-/**
- * The EvaluationContext allows us to define pipeline instructions and translate between {@code
- * PObject<T>}s or {@code PCollection<T>}s and Ts or DStreams/RDDs of Ts.
- */
-public class EvaluationContext {
- private SparkSession sparkSession;
- private JavaSparkContext jsc;
- private JavaStreamingContext jssc;
- private final Pipeline pipeline;
- private final Map<PValue, Dataset> datasets = new LinkedHashMap<>();
- private final Map<PValue, Dataset> pcollections = new LinkedHashMap<>();
- private final Set<Dataset> leaves = new LinkedHashSet<>();
- private final Map<PValue, Object> pobjects = new LinkedHashMap<>();
- private AppliedPTransform<?, ?, ?> currentTransform;
- private final SparkPCollectionView pviews = new SparkPCollectionView();
- private final Map<PCollection, Long> cacheCandidates = new HashMap<>();
- private final PipelineOptions options;
- private final SerializablePipelineOptions serializableOptions;
-
- public EvaluationContext(JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options) {
- this.jsc = jsc;
- this.pipeline = pipeline;
- this.options = options;
- this.serializableOptions = new SerializablePipelineOptions(options);
- }
-
- public EvaluationContext(
- JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options, JavaStreamingContext jssc) {
- this(jsc, pipeline, options);
- this.jssc = jssc;
- }
-
- public JavaSparkContext getSparkContext() {
- return jsc;
- }
-
- public JavaStreamingContext getStreamingContext() {
- return jssc;
- }
-
- public Pipeline getPipeline() {
- return pipeline;
- }
-
- public PipelineOptions getOptions() {
- return options;
- }
-
- public SerializablePipelineOptions getSerializableOptions() {
- return serializableOptions;
- }
-
- public void setCurrentTransform(AppliedPTransform<?, ?, ?> transform) {
- this.currentTransform = transform;
- }
-
- public AppliedPTransform<?, ?, ?> getCurrentTransform() {
- return currentTransform;
- }
-
- public <T extends PValue> T getInput(PTransform<T, ?> transform) {
- @SuppressWarnings("unchecked")
- T input =
- (T) Iterables.getOnlyElement(TransformInputs.nonAdditionalInputs(getCurrentTransform()));
- return input;
- }
-
- public <T> Map<TupleTag<?>, PValue> getInputs(PTransform<?, ?> transform) {
- checkArgument(currentTransform != null, "can only be called with non-null currentTransform");
- checkArgument(
- currentTransform.getTransform() == transform, "can only be called with current transform");
- return currentTransform.getInputs();
- }
-
- public <T extends PValue> T getOutput(PTransform<?, T> transform) {
- @SuppressWarnings("unchecked")
- T output = (T) Iterables.getOnlyElement(getOutputs(transform).values());
- return output;
- }
-
- public Map<TupleTag<?>, PValue> getOutputs(PTransform<?, ?> transform) {
- checkArgument(currentTransform != null, "can only be called with non-null currentTransform");
- checkArgument(
- currentTransform.getTransform() == transform, "can only be called with current transform");
- return currentTransform.getOutputs();
- }
-
- public Map<TupleTag<?>, Coder<?>> getOutputCoders() {
- return currentTransform
- .getOutputs()
- .entrySet()
- .stream()
- .filter(e -> e.getValue() instanceof PCollection)
- .collect(Collectors.toMap(e -> e.getKey(), e -> ((PCollection) e.getValue()).getCoder()));
- }
-
- private boolean shouldCache(PValue pvalue) {
- if ((pvalue instanceof PCollection)
- && cacheCandidates.containsKey(pvalue)
- && cacheCandidates.get(pvalue) > 1) {
- return true;
- }
- return false;
- }
-
- public void putDataset(
- PTransform<?, ? extends PValue> transform, Dataset dataset, boolean forceCache) {
- putDataset(getOutput(transform), dataset, forceCache);
- }
-
- public void putDataset(PTransform<?, ? extends PValue> transform, Dataset dataset) {
- putDataset(transform, dataset, false);
- }
-
- public void putDataset(PValue pvalue, Dataset dataset, boolean forceCache) {
- try {
- dataset.setName(pvalue.getName());
- } catch (IllegalStateException e) {
- // name not set, ignore
- }
- if ((forceCache || shouldCache(pvalue)) && pvalue instanceof PCollection) {
- // we cache only PCollection
- Coder<?> coder = ((PCollection<?>) pvalue).getCoder();
- Coder<? extends BoundedWindow> wCoder =
- ((PCollection<?>) pvalue).getWindowingStrategy().getWindowFn().windowCoder();
- dataset.cache(storageLevel(), WindowedValue.getFullCoder(coder, wCoder));
- }
- datasets.put(pvalue, dataset);
- leaves.add(dataset);
- }
-
- public Dataset borrowDataset(PTransform<? extends PValue, ?> transform) {
- return borrowDataset(getInput(transform));
- }
-
- public Dataset borrowDataset(PValue pvalue) {
- Dataset dataset = datasets.get(pvalue);
- leaves.remove(dataset);
- return dataset;
- }
-
- /**
- * Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like
- * saving to a file) registered on them (i.e. they are performed for side effects).
- */
- public void computeOutputs() {
- for (Dataset dataset : leaves) {
- dataset.action(); // force computation.
- }
- }
-
- /**
- * Retrieve an object of Type T associated with the PValue passed in.
- *
- * @param value PValue to retrieve associated data for.
- * @param <T> Type of object to return.
- * @return Native object.
- */
- @SuppressWarnings("TypeParameterUnusedInFormals")
- public <T> T get(PValue value) {
- if (pobjects.containsKey(value)) {
- T result = (T) pobjects.get(value);
- return result;
- }
- if (pcollections.containsKey(value)) {
- JavaRDD<?> rdd = ((BoundedDataset) pcollections.get(value)).getRDD();
- T res = (T) Iterables.getOnlyElement(rdd.collect());
- pobjects.put(value, res);
- return res;
- }
- throw new IllegalStateException("Cannot resolve un-known PObject: " + value);
- }
-
- /**
- * Return the current views created in the pipeline.
- *
- * @return SparkPCollectionView
- */
- public SparkPCollectionView getPViews() {
- return pviews;
- }
-
- /**
- * Adds/Replaces a view among the current views created in the pipeline.
- *
- * @param view - Identifier of the view
- * @param value - Actual value of the view
- * @param coder - Coder of the value
- */
- public void putPView(
- PCollectionView<?> view,
- Iterable<WindowedValue<?>> value,
- Coder<Iterable<WindowedValue<?>>> coder) {
- pviews.putPView(view, value, coder);
- }
-
- /**
- * Get the map of cache candidates held by the evaluation context.
- *
- * @return The current {@link Map} of cache candidates.
- */
- public Map<PCollection, Long> getCacheCandidates() {
- return this.cacheCandidates;
- }
-
- <T> Iterable<WindowedValue<T>> getWindowedValues(PCollection<T> pcollection) {
- @SuppressWarnings("unchecked")
- BoundedDataset<T> boundedDataset = (BoundedDataset<T>) datasets.get(pcollection);
- leaves.remove(boundedDataset);
- return boundedDataset.getValues(pcollection);
- }
-
- public String storageLevel() {
- return serializableOptions.get().as(SparkPipelineOptions.class).getStorageLevel();
- }
-}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
new file mode 100644
index 0000000..f0ce1e5
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -0,0 +1,94 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation;
+
+import org.apache.beam.runners.core.construction.PipelineResources;
+import org.apache.beam.runners.spark.SparkTransformOverrides;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.runners.TransformHierarchy;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PValue;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Does all the translation work: mode detection, node translation.
+ */
+public class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults {
+
+
+ /**
+ * Local configurations work in the same JVM and have no problems with improperly formatted files
+ * on classpath (eg. directories with .class files or empty directories). Prepare files for
+ * staging only when using remote cluster (passing the master address explicitly).
+ */
+ public static void prepareFilesToStageForRemoteClusterExecution(SparkPipelineOptions options) {
+ if (!options.getSparkMaster().matches("local\\[?\\d*\\]?")) {
+ options.setFilesToStage(
+ PipelineResources.prepareFilesForStaging(
+ options.getFilesToStage(), options.getTempLocation()));
+ }
+ }
+
+ public static void replaceTransforms(Pipeline pipeline, SparkPipelineOptions options) {
+ pipeline.replaceAll(SparkTransformOverrides.getDefaultOverrides(options.isStreaming()));
+ }
+
+ /** Visit the pipeline to determine the translation mode (batch/streaming) and update options accordingly. */
+ public static void detectTranslationMode(Pipeline pipeline, SparkPipelineOptions options) {
+ TranslationModeDetector detector = new TranslationModeDetector();
+ pipeline.traverseTopologically(detector);
+ if (detector.getTranslationMode().equals(TranslationMode.STREAMING)) {
+ // set streaming mode if it's a streaming pipeline
+ options.setStreaming(true);
+ }
+ }
+
+ /** The translation mode of the Beam Pipeline. */
+ private enum TranslationMode {
+
+ /** Uses the batch mode. */
+ BATCH,
+
+ /** Uses the streaming mode. */
+ STREAMING
+ }
+
+ /** Traverses the Pipeline to determine the {@link TranslationMode} for this pipeline. */
+ private static class TranslationModeDetector extends Pipeline.PipelineVisitor.Defaults {
+ private static final Logger LOG = LoggerFactory.getLogger(TranslationModeDetector.class);
+
+ private TranslationMode translationMode;
+
+ TranslationModeDetector(TranslationMode defaultMode) {
+ this.translationMode = defaultMode;
+ }
+
+ TranslationModeDetector() {
+ this(TranslationMode.BATCH);
+ }
+
+ TranslationMode getTranslationMode() {
+ return translationMode;
+ }
+
+ @Override
+ public void visitValue(PValue value, TransformHierarchy.Node producer) {
+ if (translationMode.equals(TranslationMode.BATCH)) {
+ if (value instanceof PCollection
+ && ((PCollection) value).isBounded() == PCollection.IsBounded.UNBOUNDED) {
+ LOG.info(
+ "Found unbounded PCollection {}. Switching to streaming execution.", value.getName());
+ translationMode = TranslationMode.STREAMING;
+ }
+ }
+ }
+ }
+
+}
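A minimal sketch of how the detector above flips the execution mode (assuming Beam's GenerateSequence, which yields an unbounded PCollection when no .to() limit is given; names as in the file above):

  SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
  Pipeline pipeline = Pipeline.create(options);
  pipeline.apply(GenerateSequence.from(0)); // unbounded: no upper bound
  PipelineTranslator.detectTranslationMode(pipeline, options);
  // options.isStreaming() is now true, so the streaming translator is chosen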
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java
new file mode 100644
index 0000000..897ac01
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark;
+
+import com.google.common.collect.ImmutableList;
+import java.util.List;
+import org.apache.beam.runners.core.construction.PTransformMatchers;
+import org.apache.beam.runners.core.construction.PTransformTranslation;
+import org.apache.beam.runners.core.construction.SplittableParDo;
+import org.apache.beam.runners.core.construction.SplittableParDoNaiveBounded;
+import org.apache.beam.runners.core.construction.UnsupportedOverrideFactory;
+import org.apache.beam.sdk.runners.PTransformOverride;
+import org.apache.beam.sdk.transforms.PTransform;
+
+/** {@link PTransform} overrides for the Spark runner. */
+public class SparkTransformOverrides {
+ public static List<PTransformOverride> getDefaultOverrides(boolean streaming) {
+ ImmutableList.Builder<PTransformOverride> builder = ImmutableList.builder();
+ // TODO: [BEAM-5358] Support @RequiresStableInput on Spark runner
+ builder.add(
+ PTransformOverride.of(
+ PTransformMatchers.requiresStableInputParDoMulti(),
+ UnsupportedOverrideFactory.withMessage(
+ "Spark runner currently doesn't support @RequiresStableInput annotation.")));
+ if (!streaming) {
+ builder
+ .add(
+ PTransformOverride.of(
+ PTransformMatchers.splittableParDo(), new SplittableParDo.OverrideFactory()))
+ .add(
+ PTransformOverride.of(
+ PTransformMatchers.urnEqualTo(PTransformTranslation.SPLITTABLE_PROCESS_KEYED_URN),
+ new SplittableParDoNaiveBounded.OverrideFactory()));
+ }
+ return builder.build();
+ }
+}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/StreamingPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/StreamingPipelineTranslator.java
new file mode 100644
index 0000000..2058b37
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/StreamingPipelineTranslator.java
@@ -0,0 +1,5 @@
+package org.apache.beam.runners.spark.structuredstreaming.translation;
+
+public class StreamingPipelineTranslator extends PipelineTranslator {
+
+}
[beam] 35/50: Add source mocks
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 045273333a2a7b017f76cac8e20c570f68fd0ce5
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Dec 20 15:38:11 2018 +0100
Add source mocks
---
.../batch/ReadSourceTranslatorMockBatch.java | 81 +++++++++++++++
.../translation/io/DatasetSourceMock.java | 114 +++++++++++++++++++++
2 files changed, 195 insertions(+)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
new file mode 100644
index 0000000..5b1bada
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.structuredstreaming.translation.batch;
+
+import java.io.IOException;
+import org.apache.beam.runners.core.construction.ReadTranslation;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
+import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSource;
+import org.apache.beam.runners.spark.structuredstreaming.translation.io.DatasetSourceMock;
+import org.apache.beam.sdk.io.BoundedSource;
+import org.apache.beam.sdk.runners.AppliedPTransform;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.spark.api.java.function.MapFunction;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Encoder;
+import org.apache.spark.sql.Encoders;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.sql.streaming.DataStreamReader;
+import org.apache.spark.sql.types.StructType;
+import scala.reflect.ClassTag;
+
+class ReadSourceTranslatorMockBatch<T>
+ implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
+
+ private static final String SOURCE_PROVIDER_CLASS = DatasetSourceMock.class.getCanonicalName();
+
+ @SuppressWarnings("unchecked")
+ @Override
+ public void translateTransform(
+ PTransform<PBegin, PCollection<T>> transform, TranslationContext context) {
+ AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>> rootTransform =
+ (AppliedPTransform<PBegin, PCollection<T>, PTransform<PBegin, PCollection<T>>>)
+ context.getCurrentTransform();
+
+ // DatasetSourceMock is a top-level class, so its canonical name is already the provider name
+ String providerClassName = SOURCE_PROVIDER_CLASS;
+ SparkSession sparkSession = context.getSparkSession();
+ DataStreamReader dataStreamReader = sparkSession.readStream().format(providerClassName);
+
+ Dataset<Row> rowDataset = dataStreamReader.load();
+
+ MapFunction<Row, WindowedValue<Integer>> func = new MapFunction<Row, WindowedValue<Integer>>() {
+ @Override public WindowedValue<Integer> call(Row value) throws Exception {
+ //there is only one value put in each Row by the InputPartitionReader
+ return value.<WindowedValue<Integer>>getAs(0);
+ }
+ };
+ Dataset<WindowedValue<Integer>> dataset = rowDataset.map(func, new Encoder<WindowedValue<Integer>>() {
+
+ @Override public StructType schema() {
+ return null;
+ }
+
+ @Override public ClassTag<WindowedValue<Integer>> clsTag() {
+ return scala.reflect.ClassTag$.MODULE$.<WindowedValue<Integer>>apply(WindowedValue.class);
+ }
+ });
+
+ PCollection<T> output = (PCollection<T>) context.getOutput();
+ context.putDataset(output, dataset);
+ }
+}
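The MapFunction above relies on each Row carrying exactly one object, the WindowedValue put at ordinal 0 by the InputPartitionReader. A standalone sketch of the extraction (RowFactory is Spark's helper for building generic rows; the literal 42 is illustrative):

  import org.apache.beam.sdk.util.WindowedValue;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.RowFactory;

  Row row = RowFactory.create(WindowedValue.valueInGlobalWindow(42));
  WindowedValue<Integer> value = row.<WindowedValue<Integer>>getAs(0);
  // value.getValue() == 42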
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
new file mode 100644
index 0000000..fa42fdf
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.structuredstreaming.translation.io;
+
+import static scala.collection.JavaConversions.asScalaBuffer;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.beam.sdk.io.BoundedSource;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.sources.v2.DataSourceOptions;
+import org.apache.spark.sql.sources.v2.DataSourceV2;
+import org.apache.spark.sql.sources.v2.MicroBatchReadSupport;
+import org.apache.spark.sql.sources.v2.reader.InputPartition;
+import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;
+import org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader;
+import org.apache.spark.sql.sources.v2.reader.streaming.Offset;
+import org.apache.spark.sql.types.StructType;
+import org.joda.time.Instant;
+
+/**
+ * This is a mock source that gives values between 0 and 999.
+ */
+public class DatasetSourceMock implements DataSourceV2, MicroBatchReadSupport {
+
+ @Override public MicroBatchReader createMicroBatchReader(Optional<StructType> schema, String checkpointLocation, DataSourceOptions options) {
+ return new DatasetMicroBatchReader();
+ }
+
+ /** This class can be mapped to Beam {@link BoundedSource}. */
+ private class DatasetMicroBatchReader implements MicroBatchReader {
+
+ @Override public void setOffsetRange(Optional<Offset> start, Optional<Offset> end) {
+ }
+
+ @Override public Offset getStartOffset() {
+ return null;
+ }
+
+ @Override public Offset getEndOffset() {
+ return null;
+ }
+
+ @Override public Offset deserializeOffset(String json) {
+ return null;
+ }
+
+ @Override public void commit(Offset end) {
+ }
+
+ @Override public void stop() {
+ }
+
+ @Override public StructType readSchema() {
+ return null;
+ }
+
+ @Override public List<InputPartition<InternalRow>> planInputPartitions() {
+ List<InputPartition<InternalRow>> result = new ArrayList<>();
+ result.add(new InputPartition<InternalRow>() {
+
+ @Override public InputPartitionReader<InternalRow> createPartitionReader() {
+ return new DatasetMicroBatchPartitionReaderMock();
+ }
+ });
+ return result;
+ }
+ }
+
+ /** This class is a mocked reader. */
+ private class DatasetMicroBatchPartitionReaderMock implements InputPartitionReader<InternalRow> {
+
+ private final ArrayList<Integer> values = new ArrayList<>();
+ private int currentIndex = -1;
+
+ private DatasetMicroBatchPartitionReaderMock() {
+ for (int i = 0; i < 1000; i++) {
+ values.add(i);
+ }
+ }
+
+ @Override public boolean next() throws IOException {
+ // advance first; get() then reads values.get(currentIndex)
+ currentIndex++;
+ return currentIndex < values.size();
+ }
+
+ @Override public void close() throws IOException {
+ }
+
+ @Override public InternalRow get() {
+ List<Object> list = new ArrayList<>();
+ list.add(WindowedValue.timestampedValueInGlobalWindow(values.get(currentIndex), new Instant()));
+ return InternalRow.apply(asScalaBuffer(list).toList());
+ }
+ }
+}
\ No newline at end of file
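For reference, Spark drives an InputPartitionReader with the next()/get() contract sketched below, which is why the index handling in the mock matters (reader obtained from a planned partition; loop shape per the DataSourceV2 reader API):

  try (InputPartitionReader<InternalRow> reader = partition.createPartitionReader()) {
    while (reader.next()) {           // advance; false once the data is exhausted
      InternalRow row = reader.get(); // only valid after next() returned true
      // consume row ...
    }
  }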
[beam] 39/50: Move DatasetSourceMock to proper batch mode
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 49ee25994ceaf766bca17f5600e72df48583e3f0
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Dec 27 16:33:00 2018 +0100
Move DatasetSourceMock to proper batch mode
---
.../batch/ReadSourceTranslatorMockBatch.java | 3 +-
.../translation/io/DatasetSourceMock.java | 41 +++++-----------------
2 files changed, 10 insertions(+), 34 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
index 4a509de..184d24c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
@@ -46,9 +46,8 @@ class ReadSourceTranslatorMockBatch<T>
public void translateTransform(
PTransform<PBegin, PCollection<T>> transform, TranslationContext context) {
SparkSession sparkSession = context.getSparkSession();
- DataStreamReader dataStreamReader = sparkSession.readStream().format(SOURCE_PROVIDER_CLASS);
- Dataset<Row> rowDataset = dataStreamReader.load();
+ Dataset<Row> rowDataset = sparkSession.read().format(SOURCE_PROVIDER_CLASS).load();
MapFunction<Row, WindowedValue> func = new MapFunction<Row, WindowedValue>() {
@Override public WindowedValue call(Row value) throws Exception {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
index ec88364..f722377 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/io/DatasetSourceMock.java
@@ -22,52 +22,29 @@ import static scala.collection.JavaConversions.asScalaBuffer;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
-import java.util.Optional;
import org.apache.beam.sdk.io.BoundedSource;
import org.apache.beam.sdk.util.WindowedValue;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
-import org.apache.spark.sql.sources.v2.MicroBatchReadSupport;
+import org.apache.spark.sql.sources.v2.ReadSupport;
+import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;
-import org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader;
-import org.apache.spark.sql.sources.v2.reader.streaming.Offset;
import org.apache.spark.sql.types.StructType;
import org.joda.time.Instant;
/**
* This is a mock source that gives values between 0 and 999.
*/
-public class DatasetSourceMock implements DataSourceV2, MicroBatchReadSupport {
+public class DatasetSourceMock implements DataSourceV2, ReadSupport {
- @Override public MicroBatchReader createMicroBatchReader(Optional<StructType> schema, String checkpointLocation, DataSourceOptions options) {
- return new DatasetMicroBatchReader();
+ @Override public DataSourceReader createReader(DataSourceOptions options) {
+ return new DatasetReader();
}
/** This class can be mapped to Beam {@link BoundedSource}. */
- private static class DatasetMicroBatchReader implements MicroBatchReader {
-
- @Override public void setOffsetRange(Optional<Offset> start, Optional<Offset> end) {
- }
-
- @Override public Offset getStartOffset() {
- return null;
- }
-
- @Override public Offset getEndOffset() {
- return null;
- }
-
- @Override public Offset deserializeOffset(String json) {
- return null;
- }
-
- @Override public void commit(Offset end) {
- }
-
- @Override public void stop() {
- }
+ private static class DatasetReader implements DataSourceReader {
@Override public StructType readSchema() {
return new StructType();
@@ -78,7 +55,7 @@ public class DatasetSourceMock implements DataSourceV2, MicroBatchReadSupport {
result.add(new InputPartition<InternalRow>() {
@Override public InputPartitionReader<InternalRow> createPartitionReader() {
- return new DatasetMicroBatchPartitionReaderMock();
+ return new DatasetPartitionReaderMock();
}
});
return result;
@@ -86,12 +63,12 @@ public class DatasetSourceMock implements DataSourceV2, MicroBatchReadSupport {
}
/** This class is a mocked reader. */
- private static class DatasetMicroBatchPartitionReaderMock implements InputPartitionReader<InternalRow> {
+ private static class DatasetPartitionReaderMock implements InputPartitionReader<InternalRow> {
private final ArrayList<Integer> values = new ArrayList<>();
private int currentIndex = -1;
- private DatasetMicroBatchPartitionReaderMock() {
+ private DatasetPartitionReaderMock() {
for (int i = 0; i < 1000; i++){
values.add(i);
}
[beam] 02/50: Fix missing dep
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 737af2f32b74ea1c067851ae5efb376ad31f392b
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Wed Nov 14 14:35:08 2018 +0100
Fix missing dep
---
.../org/apache/beam/gradle/BeamModulePlugin.groovy | 2 +-
.../translation/EvaluationContext.java | 261 +++++++++++++++++++++
2 files changed, 262 insertions(+), 1 deletion(-)
diff --git a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
index e6cfed1..f171445 100644
--- a/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
+++ b/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -441,7 +441,7 @@ class BeamModulePlugin implements Plugin<Project> {
slf4j_jdk14 : "org.slf4j:slf4j-jdk14:1.7.25",
slf4j_log4j12 : "org.slf4j:slf4j-log4j12:1.7.25",
snappy_java : "org.xerial.snappy:snappy-java:1.1.4",
- spark_sql : "org.apache.spark:spark-core_2.11:$spark_structured_streaming_version",
+ spark_sql : "org.apache.spark:spark-sql_2.11:$spark_structured_streaming_version",
spark_core : "org.apache.spark:spark-core_2.11:$spark_version",
spark_network_common : "org.apache.spark:spark-network-common_2.11:$spark_version",
spark_streaming : "org.apache.spark:spark-streaming_2.11:$spark_version",
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EvaluationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EvaluationContext.java
new file mode 100644
index 0000000..47a3098
--- /dev/null
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/EvaluationContext.java
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.spark.structuredstreaming.translation;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+import com.google.common.collect.Iterables;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.LinkedHashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
+import org.apache.beam.runners.core.construction.TransformInputs;
+import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.runners.AppliedPTransform;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.util.WindowedValue;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.apache.beam.sdk.values.PValue;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SparkSession;
+import org.apache.spark.streaming.api.java.JavaStreamingContext;
+
+/**
+ * The EvaluationContext allows us to define pipeline instructions and translate between {@code
+ * PObject<T>}s or {@code PCollection<T>}s and Ts or DStreams/RDDs of Ts.
+ */
+public class EvaluationContext {
+ private SparkSession sparkSession;
+ private JavaSparkContext jsc;
+ private JavaStreamingContext jssc;
+ private final Pipeline pipeline;
+ private final Map<PValue, Dataset> datasets = new LinkedHashMap<>();
+ private final Map<PValue, Dataset> pcollections = new LinkedHashMap<>();
+ private final Set<Dataset> leaves = new LinkedHashSet<>();
+ private final Map<PValue, Object> pobjects = new LinkedHashMap<>();
+ private AppliedPTransform<?, ?, ?> currentTransform;
+ private final SparkPCollectionView pviews = new SparkPCollectionView();
+ private final Map<PCollection, Long> cacheCandidates = new HashMap<>();
+ private final PipelineOptions options;
+ private final SerializablePipelineOptions serializableOptions;
+
+ public EvaluationContext(JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options) {
+ this.jsc = jsc;
+ this.pipeline = pipeline;
+ this.options = options;
+ this.serializableOptions = new SerializablePipelineOptions(options);
+ }
+
+ public EvaluationContext(
+ JavaSparkContext jsc, Pipeline pipeline, PipelineOptions options, JavaStreamingContext jssc) {
+ this(jsc, pipeline, options);
+ this.jssc = jssc;
+ }
+
+ public JavaSparkContext getSparkContext() {
+ return jsc;
+ }
+
+ public JavaStreamingContext getStreamingContext() {
+ return jssc;
+ }
+
+ public Pipeline getPipeline() {
+ return pipeline;
+ }
+
+ public PipelineOptions getOptions() {
+ return options;
+ }
+
+ public SerializablePipelineOptions getSerializableOptions() {
+ return serializableOptions;
+ }
+
+ public void setCurrentTransform(AppliedPTransform<?, ?, ?> transform) {
+ this.currentTransform = transform;
+ }
+
+ public AppliedPTransform<?, ?, ?> getCurrentTransform() {
+ return currentTransform;
+ }
+
+ public <T extends PValue> T getInput(PTransform<T, ?> transform) {
+ @SuppressWarnings("unchecked")
+ T input =
+ (T) Iterables.getOnlyElement(TransformInputs.nonAdditionalInputs(getCurrentTransform()));
+ return input;
+ }
+
+ public <T> Map<TupleTag<?>, PValue> getInputs(PTransform<?, ?> transform) {
+ checkArgument(currentTransform != null, "can only be called with non-null currentTransform");
+ checkArgument(
+ currentTransform.getTransform() == transform, "can only be called with current transform");
+ return currentTransform.getInputs();
+ }
+
+ public <T extends PValue> T getOutput(PTransform<?, T> transform) {
+ @SuppressWarnings("unchecked")
+ T output = (T) Iterables.getOnlyElement(getOutputs(transform).values());
+ return output;
+ }
+
+ public Map<TupleTag<?>, PValue> getOutputs(PTransform<?, ?> transform) {
+ checkArgument(currentTransform != null, "can only be called with non-null currentTransform");
+ checkArgument(
+ currentTransform.getTransform() == transform, "can only be called with current transform");
+ return currentTransform.getOutputs();
+ }
+
+ public Map<TupleTag<?>, Coder<?>> getOutputCoders() {
+ return currentTransform
+ .getOutputs()
+ .entrySet()
+ .stream()
+ .filter(e -> e.getValue() instanceof PCollection)
+ .collect(Collectors.toMap(e -> e.getKey(), e -> ((PCollection) e.getValue()).getCoder()));
+ }
+
+ private boolean shouldCache(PValue pvalue) {
+ if ((pvalue instanceof PCollection)
+ && cacheCandidates.containsKey(pvalue)
+ && cacheCandidates.get(pvalue) > 1) {
+ return true;
+ }
+ return false;
+ }
+
+ public void putDataset(
+ PTransform<?, ? extends PValue> transform, Dataset dataset, boolean forceCache) {
+ putDataset(getOutput(transform), dataset, forceCache);
+ }
+
+ public void putDataset(PTransform<?, ? extends PValue> transform, Dataset dataset) {
+ putDataset(transform, dataset, false);
+ }
+
+ public void putDataset(PValue pvalue, Dataset dataset, boolean forceCache) {
+ try {
+ dataset.setName(pvalue.getName());
+ } catch (IllegalStateException e) {
+ // name not set, ignore
+ }
+ if ((forceCache || shouldCache(pvalue)) && pvalue instanceof PCollection) {
+ // we cache only PCollection
+ Coder<?> coder = ((PCollection<?>) pvalue).getCoder();
+ Coder<? extends BoundedWindow> wCoder =
+ ((PCollection<?>) pvalue).getWindowingStrategy().getWindowFn().windowCoder();
+ dataset.cache(storageLevel(), WindowedValue.getFullCoder(coder, wCoder));
+ }
+ datasets.put(pvalue, dataset);
+ leaves.add(dataset);
+ }
+
+ public Dataset borrowDataset(PTransform<? extends PValue, ?> transform) {
+ return borrowDataset(getInput(transform));
+ }
+
+ public Dataset borrowDataset(PValue pvalue) {
+ Dataset dataset = datasets.get(pvalue);
+ leaves.remove(dataset);
+ return dataset;
+ }
+
+ /**
+ * Computes the outputs for all RDDs that are leaves in the DAG and do not have any actions (like
+ * saving to a file) registered on them (i.e. they are performed for side effects).
+ */
+ public void computeOutputs() {
+ for (Dataset dataset : leaves) {
+ dataset.action(); // force computation.
+ }
+ }
+
+ /**
+ * Retrieve an object of Type T associated with the PValue passed in.
+ *
+ * @param value PValue to retrieve associated data for.
+ * @param <T> Type of object to return.
+ * @return Native object.
+ */
+ @SuppressWarnings("TypeParameterUnusedInFormals")
+ public <T> T get(PValue value) {
+ if (pobjects.containsKey(value)) {
+ T result = (T) pobjects.get(value);
+ return result;
+ }
+ if (pcollections.containsKey(value)) {
+ JavaRDD<?> rdd = ((BoundedDataset) pcollections.get(value)).getRDD();
+ T res = (T) Iterables.getOnlyElement(rdd.collect());
+ pobjects.put(value, res);
+ return res;
+ }
+ throw new IllegalStateException("Cannot resolve un-known PObject: " + value);
+ }
+
+ /**
+ * Return the current views created in the pipeline.
+ *
+ * @return SparkPCollectionView
+ */
+ public SparkPCollectionView getPViews() {
+ return pviews;
+ }
+
+ /**
+ * Adds/Replaces a view among the current views created in the pipeline.
+ *
+ * @param view - Identifier of the view
+ * @param value - Actual value of the view
+ * @param coder - Coder of the value
+ */
+ public void putPView(
+ PCollectionView<?> view,
+ Iterable<WindowedValue<?>> value,
+ Coder<Iterable<WindowedValue<?>>> coder) {
+ pviews.putPView(view, value, coder);
+ }
+
+ /**
+ * Get the map of cache candidates held by the evaluation context.
+ *
+ * @return The current {@link Map} of cache candidates.
+ */
+ public Map<PCollection, Long> getCacheCandidates() {
+ return this.cacheCandidates;
+ }
+
+ <T> Iterable<WindowedValue<T>> getWindowedValues(PCollection<T> pcollection) {
+ @SuppressWarnings("unchecked")
+ BoundedDataset<T> boundedDataset = (BoundedDataset<T>) datasets.get(pcollection);
+ leaves.remove(boundedDataset);
+ return boundedDataset.getValues(pcollection);
+ }
+
+ public String storageLevel() {
+ return serializableOptions.get().as(SparkPipelineOptions.class).getStorageLevel();
+ }
+}
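The caching heuristic above (shouldCache) caches a PCollection only when it appears in cacheCandidates with more than one consumer. A hypothetical illustration of that wiring, assuming jsc, pipeline, options, collection, and dataset are already in scope:

  EvaluationContext context = new EvaluationContext(jsc, pipeline, options);
  context.getCacheCandidates().put(collection, 2L); // two downstream consumers
  context.putDataset(collection, dataset, false);   // shouldCache(...) is true, so it is cached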
[beam] 19/50: Post-pone batch qualifier in all classes names for
readability
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 2ccccdde5eee8620fc85ced9dddfa2281bf6fbe5
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Mon Nov 26 11:14:55 2018 +0100
Post-pone batch qualifier in all classes names for readability
---
.../spark/structuredstreaming/SparkRunner.java | 4 ++--
.../translation/PipelineTranslator.java | 4 ++--
...ator.java => CombinePerKeyTranslatorBatch.java} | 2 +-
...java => FlattenPCollectionTranslatorBatch.java} | 2 +-
...nslator.java => GroupByKeyTranslatorBatch.java} | 2 +-
...DoTranslator.java => ParDoTranslatorBatch.java} | 2 +-
...ranslator.java => PipelineTranslatorBatch.java} | 22 +++++++++++-----------
...nslator.java => ReadSourceTranslatorBatch.java} | 2 +-
...anslator.java => ReshuffleTranslatorBatch.java} | 2 +-
...onContext.java => TranslationContextBatch.java} | 4 ++--
...lator.java => WindowAssignTranslatorBatch.java} | 2 +-
11 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
index b76a530..e3fd6b4 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkRunner.java
@@ -20,7 +20,7 @@ package org.apache.beam.runners.spark.structuredstreaming;
import static org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
-import org.apache.beam.runners.spark.structuredstreaming.translation.batch.BatchPipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.batch.PipelineTranslatorBatch;
import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineRunner;
@@ -122,7 +122,7 @@ public final class SparkRunner extends PipelineRunner<SparkPipelineResult> {
PipelineTranslator pipelineTranslator =
options.isStreaming()
? new StreamingPipelineTranslator(options)
- : new BatchPipelineTranslator(options);
+ : new PipelineTranslatorBatch(options);
pipelineTranslator.translate(pipeline);
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index c771915..d64b8b1 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -20,7 +20,7 @@ package org.apache.beam.runners.spark.structuredstreaming.translation;
import org.apache.beam.runners.core.construction.PTransformTranslation;
import org.apache.beam.runners.core.construction.PipelineResources;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
-import org.apache.beam.runners.spark.structuredstreaming.translation.batch.BatchPipelineTranslator;
+import org.apache.beam.runners.spark.structuredstreaming.translation.batch.PipelineTranslatorBatch;
import org.apache.beam.runners.spark.structuredstreaming.translation.streaming.StreamingPipelineTranslator;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
@@ -34,7 +34,7 @@ import org.slf4j.LoggerFactory;
* It also does the pipeline preparation: mode detection, transforms replacement, classpath
* preparation. If we have a streaming job, it is instantiated as a {@link
* StreamingPipelineTranslator}. If we have a batch job, it is instantiated as a {@link
- * BatchPipelineTranslator}.
+ * PipelineTranslatorBatch}.
*/
public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaults {
private int depth = 0;
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
similarity index 95%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
index 4a10329..c8946d9 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
@@ -23,7 +23,7 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
-class BatchCombinePerKeyTranslator<K, InputT, AccumT, OutputT>
+class CombinePerKeyTranslatorBatch<K, InputT, AccumT, OutputT>
implements TransformTranslator<
PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>>> {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
similarity index 97%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
index d24f60c..87a250e 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenPCollectionTranslatorBatch.java
@@ -23,7 +23,7 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
-class BatchFlattenPCollectionTranslator<T>
+class FlattenPCollectionTranslatorBatch<T>
implements TransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
@Override
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
similarity index 97%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
index 829ba8a..4ee77fb 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
@@ -23,7 +23,7 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
-class BatchGroupByKeyTranslator<K, InputT>
+class GroupByKeyTranslatorBatch<K, InputT>
implements TransformTranslator<
PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
similarity index 97%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
index 56aa504..1e57098 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
@@ -23,7 +23,7 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
-class BatchParDoTranslator<InputT, OutputT>
+class ParDoTranslatorBatch<InputT, OutputT>
implements TransformTranslator<PTransform<PCollection<InputT>, PCollectionTuple>> {
@Override
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
similarity index 79%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
index 6648539..e883131 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchPipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
@@ -30,10 +30,10 @@ import org.apache.beam.sdk.transforms.PTransform;
/**
* {@link PipelineTranslator} for executing a {@link Pipeline} in Spark in batch mode. This contains
- * only the components specific to batch: {@link BatchTranslationContext}, registry of batch {@link
+ * only the components specific to batch: {@link TranslationContextBatch}, registry of batch {@link
* TransformTranslator} and registry lookup code.
*/
-public class BatchPipelineTranslator extends PipelineTranslator {
+public class PipelineTranslatorBatch extends PipelineTranslator {
// --------------------------------------------------------------------------------------------
// Transform Translator Registry
@@ -44,26 +44,26 @@ public class BatchPipelineTranslator extends PipelineTranslator {
static {
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN, new BatchCombinePerKeyTranslator());
+ PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN, new CombinePerKeyTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new BatchGroupByKeyTranslator());
- TRANSFORM_TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new BatchReshuffleTranslator());
+ PTransformTranslation.GROUP_BY_KEY_TRANSFORM_URN, new GroupByKeyTranslatorBatch());
+ TRANSFORM_TRANSLATORS.put(PTransformTranslation.RESHUFFLE_URN, new ReshuffleTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.FLATTEN_TRANSFORM_URN, new BatchFlattenPCollectionTranslator());
+ PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenPCollectionTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new BatchWindowAssignTranslator());
+ PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowAssignTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.PAR_DO_TRANSFORM_URN, new BatchParDoTranslator());
+ PTransformTranslation.PAR_DO_TRANSFORM_URN, new ParDoTranslatorBatch());
TRANSFORM_TRANSLATORS.put(
- PTransformTranslation.READ_TRANSFORM_URN, new BatchReadSourceTranslator());
+ PTransformTranslation.READ_TRANSFORM_URN, new ReadSourceTranslatorBatch());
}
- public BatchPipelineTranslator(SparkPipelineOptions options) {
- translationContext = new BatchTranslationContext(options);
+ public PipelineTranslatorBatch(SparkPipelineOptions options) {
+ translationContext = new TranslationContextBatch(options);
}
/** Returns a translator for the given node, if it is possible, otherwise null. */
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
similarity index 97%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
index d9fcfbb..d18eb2e 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
@@ -23,7 +23,7 @@ import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
-class BatchReadSourceTranslator<T>
+class ReadSourceTranslatorBatch<T>
implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
@Override
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
similarity index 95%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
index 1423308..17589ef 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
@@ -21,7 +21,7 @@ import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTr
import org.apache.beam.runners.spark.structuredstreaming.translation.TranslationContext;
import org.apache.beam.sdk.transforms.Reshuffle;
-class BatchReshuffleTranslator<K, InputT> implements TransformTranslator<Reshuffle<K, InputT>> {
+class ReshuffleTranslatorBatch<K, InputT> implements TransformTranslator<Reshuffle<K, InputT>> {
@Override
public void translateTransform(Reshuffle<K, InputT> transform, TranslationContext context) {}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/TranslationContextBatch.java
similarity index 92%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/TranslationContextBatch.java
index 6f50895..e849471 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchTranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/TranslationContextBatch.java
@@ -25,7 +25,7 @@ import org.apache.beam.sdk.values.PValue;
import org.apache.spark.sql.Dataset;
/** This class contains only batch specific context components. */
-public class BatchTranslationContext extends TranslationContext {
+public class TranslationContextBatch extends TranslationContext {
/**
* For keeping track about which DataSets don't have a successor. We need to terminate these with
@@ -33,7 +33,7 @@ public class BatchTranslationContext extends TranslationContext {
*/
private final Map<PValue, Dataset<?>> danglingDataSets;
- public BatchTranslationContext(SparkPipelineOptions options) {
+ public TranslationContextBatch(SparkPipelineOptions options) {
super(options);
this.danglingDataSets = new HashMap<>();
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
similarity index 97%
rename from runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
rename to runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
index 65a7cae..51e21c2 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
@@ -22,7 +22,7 @@ import org.apache.beam.runners.spark.structuredstreaming.translation.Translation
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
-class BatchWindowAssignTranslator<T>
+class WindowAssignTranslatorBatch<T>
implements TransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {
@Override
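
The renames above change no behavior; the mechanism they reorganize is a static URN-to-translator registry that PipelineTranslatorBatch consults during pipeline traversal. A minimal self-contained Java sketch of that registry pattern, with simplified stand-in types and an illustrative URN string rather than the actual Beam classes and PTransformTranslation constants:

import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Beam's TransformTranslator interface.
interface TransformTranslator<T> {
  void translateTransform(T transform, Object context);
}

final class TranslatorRegistrySketch {
  @SuppressWarnings("rawtypes")
  private static final Map<String, TransformTranslator> TRANSFORM_TRANSLATORS = new HashMap<>();

  static {
    // Keyed by transform URN; the value is the matching *Batch translator.
    TRANSFORM_TRANSLATORS.put(
        "beam:transform:group_by_key:v1",
        new TransformTranslator<Object>() {
          @Override
          public void translateTransform(Object transform, Object context) {
            // translation logic goes here
          }
        });
  }

  // Returns the translator registered for the given URN, or null if the
  // transform is unsupported.
  @SuppressWarnings("rawtypes")
  static TransformTranslator getTransformTranslator(String urn) {
    return TRANSFORM_TRANSLATORS.get(urn);
  }
}

Keeping the registry static means the supported-transform set is fixed at class-load time, which matches how the real PipelineTranslatorBatch fills its map in a static initializer.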
[beam] 12/50: Make transform translation clearer: renaming, comments
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit ce484e9efeffeba3678958521dc61958a18449fd
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Thu Nov 22 11:06:15 2018 +0100
Make transform translation clearer: renaming, comments
---
.../translation/PipelineTranslator.java | 18 +++++++++++-------
.../translation/TransformTranslator.java | 2 +-
.../batch/BatchCombinePerKeyTranslator.java | 2 +-
.../batch/BatchFlattenPCollectionTranslator.java | 2 +-
.../translation/batch/BatchGroupByKeyTranslator.java | 2 +-
.../translation/batch/BatchParDoTranslator.java | 2 +-
.../translation/batch/BatchReadSourceTranslator.java | 2 +-
.../translation/batch/BatchReshuffleTranslator.java | 2 +-
.../translation/batch/BatchWindowAssignTranslator.java | 2 +-
9 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
index 62e87f2..185879b 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
@@ -123,19 +123,23 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
*/
protected abstract TransformTranslator<?> getTransformTranslator(TransformHierarchy.Node node);
- private <T extends PTransform<?, ?>> void translateNode(
+ /**
+ * Apply the given TransformTranslator to the given node.
+ */
+ private <T extends PTransform<?, ?>> void applyTransformTranslator(
TransformHierarchy.Node node,
TransformTranslator<?> transformTranslator) {
+ // create the applied PTransform on the translationContext
+ translationContext.setCurrentTransform(node.toAppliedPTransform(getPipeline()));
+ // avoid type capture
@SuppressWarnings("unchecked")
T typedTransform = (T) node.getTransform();
-
@SuppressWarnings("unchecked")
TransformTranslator<T> typedTransformTranslator = (TransformTranslator<T>) transformTranslator;
- // create the applied PTransform on the translationContext
- translationContext.setCurrentTransform(node.toAppliedPTransform(getPipeline()));
- typedTransformTranslator.translateNode(typedTransform, translationContext);
+ // apply the transformTranslator
+ typedTransformTranslator.translateTransform(typedTransform, translationContext);
}
@@ -165,7 +169,7 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
TransformTranslator<?> transformTranslator = getTransformTranslator(node);
if (transformTranslator != null) {
- translateNode(node, transformTranslator);
+ applyTransformTranslator(node, transformTranslator);
LOG.info("{} translated- {}", genSpaces(depth), node.getFullName());
return CompositeBehavior.DO_NOT_ENTER_TRANSFORM;
} else {
@@ -191,6 +195,6 @@ public abstract class PipelineTranslator extends Pipeline.PipelineVisitor.Defaul
throw new UnsupportedOperationException(
"The transform " + transformUrn + " is currently not supported.");
}
- translateNode(node, transformTranslator);
+ applyTransformTranslator(node, transformTranslator);
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
index 51cdd99..ebb8bf8 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
@@ -6,6 +6,6 @@ public interface TransformTranslator<TransformT extends PTransform> {
/** A translator of a {@link PTransform}. */
- void translateNode(TransformT transform, TranslationContext context);
+ void translateTransform(TransformT transform, TranslationContext context);
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
index c9cae47..858df18 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchCombinePerKeyTranslator.java
@@ -9,7 +9,7 @@ import org.apache.beam.sdk.values.PCollection;
class BatchCombinePerKeyTranslator<K, InputT, AccumT, OutputT> implements
TransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>>> {
- @Override public void translateNode(
+ @Override public void translateTransform(
PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, OutputT>>> transform,
TranslationContext context) {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
index 77f6fdb..90c487a 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchFlattenPCollectionTranslator.java
@@ -9,7 +9,7 @@ import org.apache.beam.sdk.values.PCollectionList;
class BatchFlattenPCollectionTranslator<T> implements
TransformTranslator<PTransform<PCollectionList<T>, PCollection<T>>> {
- @Override public void translateNode(PTransform<PCollectionList<T>, PCollection<T>> transform,
+ @Override public void translateTransform(PTransform<PCollectionList<T>, PCollection<T>> transform,
TranslationContext context) {
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
index 1bd42f5..52a3c39 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchGroupByKeyTranslator.java
@@ -9,7 +9,7 @@ import org.apache.beam.sdk.values.PCollection;
class BatchGroupByKeyTranslator<K, InputT> implements
TransformTranslator<PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>>> {
- @Override public void translateNode(
+ @Override public void translateTransform(
PTransform<PCollection<KV<K, InputT>>, PCollection<KV<K, Iterable<InputT>>>> transform,
TranslationContext context) {
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
index cf8c896..6e7f342 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchParDoTranslator.java
@@ -9,7 +9,7 @@ import org.apache.beam.sdk.values.PCollectionTuple;
class BatchParDoTranslator<InputT, OutputT> implements
TransformTranslator<PTransform<PCollection<InputT>, PCollectionTuple>> {
- @Override public void translateNode(PTransform<PCollection<InputT>, PCollectionTuple> transform,
+ @Override public void translateTransform(PTransform<PCollection<InputT>, PCollectionTuple> transform,
TranslationContext context) {
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
index f5f0351..4236b1c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReadSourceTranslator.java
@@ -8,7 +8,7 @@ import org.apache.beam.sdk.values.PCollection;
class BatchReadSourceTranslator<T> implements TransformTranslator<PTransform<PBegin, PCollection<T>>> {
- @Override public void translateNode(PTransform<PBegin, PCollection<T>> transform,
+ @Override public void translateTransform(PTransform<PBegin, PCollection<T>> transform,
TranslationContext context) {
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
index 5fab1c8..5baa331 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchReshuffleTranslator.java
@@ -6,7 +6,7 @@ import org.apache.beam.sdk.transforms.Reshuffle;
class BatchReshuffleTranslator<K, InputT> implements TransformTranslator<Reshuffle<K, InputT>> {
- @Override public void translateNode(Reshuffle<K, InputT> transform, TranslationContext context) {
+ @Override public void translateTransform(Reshuffle<K, InputT> transform, TranslationContext context) {
}
}
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
index fbbced5..1a8f68b 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/BatchWindowAssignTranslator.java
@@ -8,7 +8,7 @@ import org.apache.beam.sdk.values.PCollection;
class BatchWindowAssignTranslator<T> implements
TransformTranslator<PTransform<PCollection<T>, PCollection<T>>> {
- @Override public void translateNode(PTransform<PCollection<T>, PCollection<T>> transform,
+ @Override public void translateTransform(PTransform<PCollection<T>, PCollection<T>> transform,
TranslationContext context) {
}
}
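
The renamed applyTransformTranslator also centralizes the two unchecked casts needed to bridge a wildcard-typed registry entry to a typed translateTransform call. A minimal sketch of that type-capture trick, using hypothetical placeholder types (Object here stands in for Beam's node transform and TranslationContext):

final class ApplyTranslatorSketch {
  interface TransformTranslator<T> {
    void translateTransform(T transform, Object context);
  }

  static <T> void applyTransformTranslator(
      Object transform, TransformTranslator<?> translator, Object context) {
    // Both casts are unchecked; the registry is trusted by construction to
    // pair each URN with a translator of the matching transform type.
    @SuppressWarnings("unchecked")
    T typedTransform = (T) transform;
    @SuppressWarnings("unchecked")
    TransformTranslator<T> typedTranslator = (TransformTranslator<T>) translator;
    typedTranslator.translateTransform(typedTransform, context);
  }
}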
[beam] 20/50: Add precise TODO for multiple TransformTranslator per transform URN
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit b37da3e80a8f678463fb5e3cf2c991ff93de5b23
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Mon Nov 26 11:31:36 2018 +0100
Add precise TODO for multiple TransformTranslator per transform URN
---
.../translation/batch/PipelineTranslatorBatch.java | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
index e883131..318d74c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
@@ -42,6 +42,12 @@ public class PipelineTranslatorBatch extends PipelineTranslator {
@SuppressWarnings("rawtypes")
private static final Map<String, TransformTranslator> TRANSFORM_TRANSLATORS = new HashMap<>();
+  //TODO: add the ability to have more than one TransformTranslator per URN
+  // that could be dynamically chosen by a predicate that evaluates based on the PCollection
+  // obtainable through node.getInputs.getValue()
+ // See https://github.com/seznam/euphoria/blob/master/euphoria-spark/src/main/java/cz/seznam/euphoria/spark/SparkFlowTranslator.java#L83
+ // And https://github.com/seznam/euphoria/blob/master/euphoria-spark/src/main/java/cz/seznam/euphoria/spark/SparkFlowTranslator.java#L106
+
static {
TRANSFORM_TRANSLATORS.put(
PTransformTranslation.COMBINE_PER_KEY_TRANSFORM_URN, new CombinePerKeyTranslatorBatch());
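
The linked euphoria code suggests one way the TODO could be realized: map each URN to several candidate translators, each guarded by a predicate over the node, and pick the first candidate that accepts. A speculative sketch of that shape, with all names hypothetical (this is neither Beam's nor euphoria's actual code):

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class PredicatedTranslatorRegistrySketch<NodeT> {

  private static final class Candidate<N> {
    final Predicate<N> accepts;
    final Object translator;

    Candidate(Predicate<N> accepts, Object translator) {
      this.accepts = accepts;
      this.translator = translator;
    }
  }

  private final Map<String, List<Candidate<NodeT>>> candidates = new HashMap<>();

  void register(String urn, Predicate<NodeT> accepts, Object translator) {
    candidates
        .computeIfAbsent(urn, k -> new ArrayList<>())
        .add(new Candidate<>(accepts, translator));
  }

  // Returns the first translator whose predicate accepts the node, or null if
  // none matches (the caller would then report an unsupported transform).
  Object lookup(String urn, NodeT node) {
    for (Candidate<NodeT> candidate :
        candidates.getOrDefault(urn, Collections.emptyList())) {
      if (candidate.accepts.test(node)) {
        return candidate.translator;
      }
    }
    return null;
  }
}

The predicate would typically inspect the node's input PCollections (as the TODO notes) to choose, for example, a bounded-specific versus unbounded-specific translator for the same URN.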
[beam] 37/50: Use raw WindowedValue so that spark Encoders could work (temporary)
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 1060121a4356b6c0d01227aed7631821df4394e1
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Dec 21 16:05:31 2018 +0100
Use raw WindowedValue so that spark Encoders could work (temporary)
---
.../translation/TranslationContext.java | 8 ++++++++
.../batch/ReadSourceTranslatorMockBatch.java | 20 +++++---------------
2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
index 0f2493d..fb36b37 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
@@ -115,6 +115,14 @@ public class TranslationContext {
}
}
+ //TODO: remove. It is just for testing
+ public void putDatasetRaw(PValue value, Dataset<WindowedValue> dataset) {
+ if (!datasets.containsKey(value)) {
+ datasets.put(value, dataset);
+ leaves.add(dataset);
+ }
+ }
+
// --------------------------------------------------------------------------------------------
// PCollections methods
// --------------------------------------------------------------------------------------------
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
index 504a64d..4a509de 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorMockBatch.java
@@ -50,25 +50,15 @@ class ReadSourceTranslatorMockBatch<T>
Dataset<Row> rowDataset = dataStreamReader.load();
- MapFunction<Row, WindowedValue<Integer>> func = new MapFunction<Row, WindowedValue<Integer>>() {
- @Override public WindowedValue<Integer> call(Row value) throws Exception {
+ MapFunction<Row, WindowedValue> func = new MapFunction<Row, WindowedValue>() {
+ @Override public WindowedValue call(Row value) throws Exception {
//there is only one value put in each Row by the InputPartitionReader
- return value.<WindowedValue<Integer>>getAs(0);
+ return value.<WindowedValue>getAs(0);
}
};
- Dataset<WindowedValue<Integer>> dataset = rowDataset.map(func, new Encoder<WindowedValue<Integer>>() {
-
- @Override public StructType schema() {
- return null;
- }
-
- @Override public ClassTag<WindowedValue<Integer>> clsTag() {
- return scala.reflect.ClassTag$.MODULE$.<WindowedValue<Integer>>apply(WindowedValue.class);
- }
- });
+ Dataset<WindowedValue> dataset = rowDataset.map(func, Encoders.kryo(WindowedValue.class));
PCollection<T> output = (PCollection<T>) context.getOutput();
- context.putDataset(output, dataset);
- dataset.show();
+ context.putDatasetRaw(output, dataset);
}
}
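
The key move in this commit is Encoders.kryo(WindowedValue.class): Kryo serializes each object into a single opaque binary column, so Spark needs no structured schema for the generic WindowedValue type. A standalone sketch of the same mapping, assuming WindowedValue lives at org.apache.beam.sdk.util.WindowedValue and with an illustrative method name:

import org.apache.beam.sdk.util.WindowedValue;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

final class KryoEncoderSketch {

  @SuppressWarnings("rawtypes")
  static Dataset<WindowedValue> toRawWindowedValues(Dataset<Row> rowDataset) {
    // The mock source puts a single WindowedValue in column 0 of each Row.
    MapFunction<Row, WindowedValue> func =
        new MapFunction<Row, WindowedValue>() {
          @Override
          public WindowedValue call(Row row) {
            return row.getAs(0);
          }
        };
    // Kryo avoids deriving a structured encoder for the generic
    // WindowedValue<T>, at the cost of a column Spark cannot optimize over.
    return rowDataset.map(func, Encoders.kryo(WindowedValue.class));
  }
}

That opaque binary column blocks Catalyst optimizations, which is presumably why the commit message flags the raw WindowedValue approach as temporary.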
[beam] 50/50: Apply spotless
Posted by ec...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a commit to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git
commit 639217915221a11d413f2c5bdac9a4f0fc6a0c7e
Author: Etienne Chauchot <ec...@apache.org>
AuthorDate: Fri Jan 4 11:36:49 2019 +0100
Apply spotless
---
.../structuredstreaming/translation/batch/PipelineTranslatorBatch.java | 1 -
1 file changed, 1 deletion(-)
diff --git a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
index c7e9167..26f1b9c 100644
--- a/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
+++ b/runners/spark-structured-streaming/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
@@ -24,7 +24,6 @@ import org.apache.beam.runners.core.construction.PTransformTranslation;
import org.apache.beam.runners.spark.structuredstreaming.SparkPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator;
import org.apache.beam.runners.spark.structuredstreaming.translation.TransformTranslator;
-import org.apache.beam.runners.spark.structuredstreaming.translation.batch.mocks.ReadSourceTranslatorMockBatch;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.PTransform;