Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/14 07:13:39 UTC

[GitHub] [incubator-hudi] zhedoubushishi opened a new pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

zhedoubushishi opened a new pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *This pull request makes Hudi build and run with Scala 2.12, in addition to the existing Scala 2.11 support.*
   
   ## Brief change log
   
   Most of the ideas in this PR are similar to https://github.com/apache/incubator-hudi/pull/1109.
   
   This PR makes Hudi compatible with Scala 2.12. You can build for 2.12 with:
   ```mvn clean install -Dscala.version=2.12.10 -Dscala.binary.version=2.12```
   
   Here are the major differences from https://github.com/apache/incubator-hudi/pull/1109:
   - Updated kafka-source.properties.
   - The ```ConsumerConfig.GROUP_ID_CONFIG``` parameter is set in ```TestKafkaSource.java``` rather than in ```KafkaOffsetGen.java```, because this config should be decided by the client side, not by Hudi.
   - For ```AvroKafkaSource.java```, ```KafkaAvroDeserializer.class``` needs to be set:
   ```
       // keys stay plain strings
       props.put("key.deserializer", StringDeserializer.class);
       // values are Avro records, so use the Confluent Avro deserializer
       props.put("value.deserializer", KafkaAvroDeserializer.class);
   ```
   ## Verify this pull request
   
   This pull request is already covered by existing tests.
   
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


[GitHub] [incubator-hudi] bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-575802938
 
 
   This is the same set of tests with spark-2.4.4, Scala 2.11 and the Hudi 2.11 bundle:
   
   (base) varadarb-C02SH0P1G8WL:zhedoubushishi_hudi varadarb$ ~/spark-2.4.4-bin-hadoop2.7/bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-SNAPSHOT,org.apache.spark:spark-avro_2.11:2.4.2 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   Ivy Default Cache set to: /Users/varadarb/.ivy2/cache
   The jars for the packages stored in: /Users/varadarb/.ivy2/jars
   :: loading settings :: url = jar:file:/Users/varadarb/spark-2.4.4-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
   org.apache.hudi#hudi-spark-bundle_2.11 added as a dependency
   org.apache.spark#spark-avro_2.11 added as a dependency
   :: resolving dependencies :: org.apache.spark#spark-submit-parent-db5734f8-665a-44e6-a55c-d389e9c9b0d4;1.0
   	confs: [default]
   	found org.apache.hudi#hudi-spark-bundle_2.11;0.5.1-SNAPSHOT in local-m2-cache
   	found org.apache.spark#spark-avro_2.11;2.4.2 in central
   	found org.spark-project.spark#unused;1.0.0 in spark-list
   :: resolution report :: resolve 233ms :: artifacts dl 6ms
   	:: modules in use:
   	org.apache.hudi#hudi-spark-bundle_2.11;0.5.1-SNAPSHOT from local-m2-cache in [default]
   	org.apache.spark#spark-avro_2.11;2.4.2 from central in [default]
   	org.spark-project.spark#unused;1.0.0 from spark-list in [default]
   	---------------------------------------------------------------------
   	|                  |            modules            ||   artifacts   |
   	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
   	---------------------------------------------------------------------
   	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
   	---------------------------------------------------------------------
   :: retrieving :: org.apache.spark#spark-submit-parent-db5734f8-665a-44e6-a55c-d389e9c9b0d4
   	confs: [default]
   	0 artifacts copied, 3 already retrieved (0kB/5ms)
   20/01/17 13:30:05 WARN Utils: Your hostname, varadarb-C02SH0P1G8WL resolves to a loopback address: 127.0.0.1; using 172.26.16.136 instead (on interface en0)
   20/01/17 13:30:05 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
   20/01/17 13:30:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   Spark context Web UI available at http://varadarb-c02sh0p1g8wl.corp.uber.com:4040
   Spark context available as 'sc' (master = local[*], app id = local-1579296611401).
   Spark session available as 'spark'.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
         /_/
            
   Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
   Type in expressions to have them evaluated.
   Type :help for more information.
   
   scala> import org.apache.hudi.QuickstartUtils._
   import org.apache.hudi.QuickstartUtils._
   
   scala> import scala.collection.JavaConversions._
   import scala.collection.JavaConversions._
   
   scala> import org.apache.spark.sql.SaveMode._
   import org.apache.spark.sql.SaveMode._
   
   scala> import org.apache.hudi.DataSourceReadOptions._
   import org.apache.hudi.DataSourceReadOptions._
   
   scala> import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.DataSourceWriteOptions._
   
   scala> import org.apache.hudi.config.HoodieWriteConfig._
   import org.apache.hudi.config.HoodieWriteConfig._
   
   scala> 
   
   scala> val tableName = "hudi_cow_table"
   tableName: String = hudi_cow_table
   
   scala> val basePath = "file:///tmp/hudi_cow_table"
   basePath: String = file:///tmp/hudi_cow_table
   
   scala> val dataGen = new DataGenerator
   dataGen: org.apache.hudi.QuickstartUtils.DataGenerator = org.apache.hudi.QuickstartUtils$DataGenerator@22c4151b
   
   scala> val inserts = convertToStringList(dataGen.generateInserts(10))
   inserts: java.util.List[String] = [{"ts": 0.0, "uuid": "d41c911e-41b5-4c54-ac4f-492623c5a8fe", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.4726905879569653, "begin_lon": 0.46157858450465483, "end_lat": 0.754803407008858, "end_lon": 0.9671159942018241, "fare": 34.158284716382845, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "c59f096d-ec05-4a18-aac4-67866c007925", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.6100070562136587, "begin_lon": 0.8779402295427752, "end_lat": 0.3407870505929602, "end_lon": 0.5030798142293655, "fare": 43.4923811219014, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "6e340dc7-0992-454c-a069-816b8de93a22", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.5731835407930634, "begin_...
   scala> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
   warning: there was one deprecation warning; re-run with -deprecation for details
   df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields]
   
   scala> df.write.format("org.apache.hudi").
        |     options(getQuickstartWriteConfigs).
        |     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
        |     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
        |     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
        |     option(TABLE_NAME, tableName).
        |     mode(Overwrite).
        |     save(basePath);
                                                                                   
   scala> val roViewDF = spark.
        |     read.
        |     format("org.apache.hudi").
        |     load(basePath + "/*/*/*/*")
   roViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
   
   scala> roViewDF.createOrReplaceTempView("hudi_ro_table")
   
   scala> spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_ro_table where fare > 20.0").show()
   20/01/17 13:30:53 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
   20/01/17 13:30:53 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
   +------------------+-------------------+-------------------+---+
   |              fare|          begin_lon|          begin_lat| ts|
   +------------------+-------------------+-------------------+---+
   | 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0|
   | 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0|
   | 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0|
   | 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0|
   |  43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0|
   | 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0|
   |34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0|
   | 41.06290929046368| 0.8192868687714224|  0.651058505660742|0.0|
   +------------------+-------------------+-------------------+---+
   
   
   scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from  hudi_ro_table").show()
   +-------------------+--------------------+----------------------+---------+----------+------------------+
   |_hoodie_commit_time|  _hoodie_record_key|_hoodie_partition_path|    rider|    driver|              fare|
   +-------------------+--------------------+----------------------+---------+----------+------------------+
   |     20200117133031|6e340dc7-0992-454...|  americas/united_s...|rider-213|driver-213| 64.27696295884016|
   |     20200117133031|be18b898-c04c-4bf...|  americas/united_s...|rider-213|driver-213| 93.56018115236618|
   |     20200117133031|73ba1d25-2cc5-407...|  americas/united_s...|rider-213|driver-213| 27.79478688582596|
   |     20200117133031|82dbd63c-a6b6-4f7...|  americas/united_s...|rider-213|driver-213| 33.92216483948643|
   |     20200117133031|da517676-c854-495...|  americas/united_s...|rider-213|driver-213|19.179139106643607|
   |     20200117133031|c59f096d-ec05-4a1...|  americas/brazil/s...|rider-213|driver-213|  43.4923811219014|
   |     20200117133031|cd150bb8-7f29-4e5...|  americas/brazil/s...|rider-213|driver-213| 66.62084366450246|
   |     20200117133031|d41c911e-41b5-4c5...|  americas/brazil/s...|rider-213|driver-213|34.158284716382845|
   |     20200117133031|31a5958c-2fac-4bf...|    asia/india/chennai|rider-213|driver-213|17.851135255091155|
   |     20200117133031|95cbb4b3-118c-4d6...|    asia/india/chennai|rider-213|driver-213| 41.06290929046368|
   +-------------------+--------------------+----------------------+---------+----------+------------------+
   
   
   scala> val updates = convertToStringList(dataGen.generateUpdates(10))
   updates: java.util.List[String] = [{"ts": 0.0, "uuid": "6e340dc7-0992-454c-a069-816b8de93a22", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.7340133901254792, "begin_lon": 0.5142184937933181, "end_lat": 0.7814655558162802, "end_lon": 0.6592596683641996, "fare": 49.527694252432056, "partitionpath": "americas/united_states/san_francisco"}, {"ts": 0.0, "uuid": "d41c911e-41b5-4c54-ac4f-492623c5a8fe", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.1593867607188556, "begin_lon": 0.010872312870502165, "end_lat": 0.9808530350038475, "end_lon": 0.7963756520507014, "fare": 29.47661370147079, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "d41c911e-41b5-4c54-ac4f-492623c5a8fe", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.71801964677...
   scala> val df = spark.read.json(spark.sparkContext.parallelize(updates, 2));
   warning: there was one deprecation warning; re-run with -deprecation for details
   df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields]
   
   scala> df.write.format("org.apache.hudi").
        |     options(getQuickstartWriteConfigs).
        |     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
        |     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
        |     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
        |     option(TABLE_NAME, tableName).
        |     mode(Append).
        |     save(basePath);
                                                                                   
   scala> // reload data
   
   scala> spark.
        |     read.
        |     format("org.apache.hudi").
        |     load(basePath + "/*/*/*/*").
        |     createOrReplaceTempView("hudi_ro_table")
   
   scala> 
   
   scala> val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from  hudi_ro_table order by commitTime").map(k => k.getString(0)).take(50)
   commits: Array[String] = Array(20200117133031, 20200117133058)
   
   scala> val beginTime = commits(commits.length - 2) // commit time we are interested in
   beginTime: String = 20200117133031
   
   scala> 
   
   scala> // incrementally query data
   
   scala> val incViewDF = spark.
        |     read.
        |     format("org.apache.hudi").
        |     option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
        |     option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
        |     load(basePath);
   incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
   
   scala> incViewDF.registerTempTable("hudi_incr_table")
   warning: there was one deprecation warning; re-run with -deprecation for details
   
   scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_incr_table where fare > 20.0").show()
   +-------------------+------------------+--------------------+-------------------+---+
   |_hoodie_commit_time|              fare|           begin_lon|          begin_lat| ts|
   +-------------------+------------------+--------------------+-------------------+---+
   |     20200117133058|49.527694252432056|  0.5142184937933181| 0.7340133901254792|0.0|
   |     20200117133058|  98.3428192817987|  0.3349917833248327| 0.4777395067707303|0.0|
   |     20200117133058|  90.9053809533154| 0.19949323322922063|0.18294079059016366|0.0|
   |     20200117133058| 90.25710109008239|  0.4006983139989222|0.08528650347654165|0.0|
   |     20200117133058| 63.72504913279929|   0.888493603696927| 0.6570857443423376|0.0|
   |     20200117133058| 29.47661370147079|0.010872312870502165| 0.1593867607188556|0.0|
   +-------------------+------------------+--------------------+-------------------+---+
   
   
   scala> val beginTime = "000" // Represents all commits > this time.
   beginTime: String = 000
   
   scala> val endTime = commits(commits.length - 2) // commit time we are interested in
   endTime: String = 20200117133031
   
   scala> 
   
   scala> //incrementally query data
   
   scala> val incViewDF = spark.read.format("org.apache.hudi").
        |     option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
        |     option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
        |     option(END_INSTANTTIME_OPT_KEY, endTime).
        |     load(basePath);
   incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
   
   scala> incViewDF.registerTempTable("hudi_incr_table")
   warning: there was one deprecation warning; re-run with -deprecation for details
   
   scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_incr_table where fare > 20.0").show()
   +-------------------+------------------+-------------------+-------------------+---+
   |_hoodie_commit_time|              fare|          begin_lon|          begin_lat| ts|
   +-------------------+------------------+-------------------+-------------------+---+
   |     20200117133031| 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0|
   |     20200117133031| 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0|
   |     20200117133031| 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0|
   |     20200117133031| 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0|
   |     20200117133031|  43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0|
   |     20200117133031| 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0|
   |     20200117133031|34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0|
   |     20200117133031| 41.06290929046368| 0.8192868687714224|  0.651058505660742|0.0|
   +-------------------+------------------+-------------------+-------------------+---+
   
   
   scala>


[GitHub] [incubator-hudi] bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-574986730
 
 
   @leesf : FYI: this diff will have implications on how we release packages. We will be changing the names of the packages hudi-spark, hudi-spark-bundle and hudi-utilities-bundle to include the Scala version. When building jars for a release, you would have to run "mvn clean install xxx" twice: once without any additional settings to build the Scala 2.11 versions, and then run
   
   dev/change-scala-version.sh 2.12
   mvn -Pscala-2.12 clean install
   
   for 2.12
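   
   Putting the two passes together, the release build would look roughly like this (a sketch, assuming the helper script lives at dev/change-scala-version.sh, as referenced elsewhere in this thread):
   ```
   # first pass: the default build produces the Scala 2.11 artifacts
   mvn clean install
   # second pass: rewrite the POMs for 2.12 and rebuild with the scala-2.12 profile
   dev/change-scala-version.sh 2.12
   mvn -Pscala-2.12 clean install
   ```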
   
   cc @vinothchandar 


[GitHub] [incubator-hudi] leesf commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
leesf commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-576623890
 
 
   Could we remove the _2.11 suffix from the artifactId in hudi-spark, hudi-utilities, packaging/hudi-spark-bundle and packaging/hudi-utilities-bundle? cc @bvaradar @zhedoubushishi


[GitHub] [incubator-hudi] bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-575785278
 
 
   I tested @zhedoubushishi's changes by building with Scala 2.12 and running the Hudi quickstart with the resulting packages on spark-3.0.0-preview2 (which is built with Scala 2.12). It ran successfully!
   
   (base) varadarb-C02SH0P1G8WL:zhedoubushishi_hudi varadarb$ ~/spark-3.0.0-preview2-bin-hadoop2.7/bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.12:0.5.1-SNAPSHOT,org.apache.spark:spark-avro_2.12:3.0.0-preview2  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   20/01/17 12:22:40 WARN Utils: Your hostname, varadarb-C02SH0P1G8WL resolves to a loopback address: 127.0.0.1; using 172.26.16.136 instead (on interface en0)
   20/01/17 12:22:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
   Ivy Default Cache set to: /Users/varadarb/.ivy2/cache
   The jars for the packages stored in: /Users/varadarb/.ivy2/jars
   :: loading settings :: url = jar:file:/Users/varadarb/spark-3.0.0-preview2-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
   org.apache.hudi#hudi-spark-bundle_2.12 added as a dependency
   org.apache.spark#spark-avro_2.12 added as a dependency
   :: resolving dependencies :: org.apache.spark#spark-submit-parent-e8c0c949-bada-47b9-a444-5b32205e1ff3;1.0
   	confs: [default]
   	found org.apache.hudi#hudi-spark-bundle_2.12;0.5.1-SNAPSHOT in local-m2-cache
   	found org.apache.spark#spark-avro_2.12;3.0.0-preview2 in central
   	found org.spark-project.spark#unused;1.0.0 in spark-list
   downloading https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.12/3.0.0-preview2/spark-avro_2.12-3.0.0-preview2.jar ...
   	[SUCCESSFUL ] org.apache.spark#spark-avro_2.12;3.0.0-preview2!spark-avro_2.12.jar (44ms)
   :: resolution report :: resolve 1173ms :: artifacts dl 48ms
   	:: modules in use:
   	org.apache.hudi#hudi-spark-bundle_2.12;0.5.1-SNAPSHOT from local-m2-cache in [default]
   	org.apache.spark#spark-avro_2.12;3.0.0-preview2 from central in [default]
   	org.spark-project.spark#unused;1.0.0 from spark-list in [default]
   	---------------------------------------------------------------------
   	|                  |            modules            ||   artifacts   |
   	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
   	---------------------------------------------------------------------
   	|      default     |   3   |   1   |   1   |   0   ||   3   |   1   |
   	---------------------------------------------------------------------
   
   :: problems summary ::
   :::: ERRORS
   	unknown resolver null
   
   
   :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
   :: retrieving :: org.apache.spark#spark-submit-parent-e8c0c949-bada-47b9-a444-5b32205e1ff3
   	confs: [default]
   	1 artifacts copied, 2 already retrieved (146kB/6ms)
   20/01/17 12:22:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   Spark context Web UI available at http://varadarb-c02sh0p1g8wl.corp.uber.com:4040
   Spark context available as 'sc' (master = local[*], app id = local-1579292567995).
   Spark session available as 'spark'.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
         /_/
            
   Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
   Type in expressions to have them evaluated.
   Type :help for more information.
   
   scala> import org.apache.hudi.QuickstartUtils._
   import org.apache.hudi.QuickstartUtils._
   
   scala> import scala.collection.JavaConversions._
   import scala.collection.JavaConversions._
   
   scala> import org.apache.spark.sql.SaveMode._
   import org.apache.spark.sql.SaveMode._
   
   scala> import org.apache.hudi.DataSourceReadOptions._
   import org.apache.hudi.DataSourceReadOptions._
   
   scala> import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.DataSourceWriteOptions._
   
   scala> import org.apache.hudi.config.HoodieWriteConfig._
   import org.apache.hudi.config.HoodieWriteConfig._
   
   scala> 
   
   scala> val tableName = "hudi_cow_table"
   tableName: String = hudi_cow_table
   
   scala> val basePath = "file:///tmp/hudi_cow_table"
   basePath: String = file:///tmp/hudi_cow_table
   
   scala> val dataGen = new DataGenerator
   dataGen: org.apache.hudi.QuickstartUtils.DataGenerator = org.apache.hudi.QuickstartUtils$DataGenerator@1c00809b
   
   scala> val inserts = convertToStringList(dataGen.generateInserts(10))
   inserts: java.util.List[String] = [{"ts": 0.0, "uuid": "44e96c81-a406-4429-a56e-d6f6121a1182", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.4726905879569653, "begin_lon": 0.46157858450465483, "end_lat": 0.754803407008858, "end_lon": 0.9671159942018241, "fare": 34.158284716382845, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "4e31e5af-348a-4b39-ada5-e20213e8c307", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0.6100070562136587, "begin_lon": 0.8779402295427752, "end_lat": 0.3407870505929602, "end_lon": 0.5030798142293655, "fare": 43.4923811219014, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "6f24f083-eff9-4328-b1ee-a54889cfd6e2", "rider": "rider-213", "driver": "driver-213", "begin_lat": 0...
   
   scala> val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
   warning: there was one deprecation warning (since 2.12.0)
   warning: there was one deprecation warning (since 2.2.0)
   warning: there were two deprecation warnings in total; for details, enable `:setting -deprecation' or `:replay -deprecation'
   df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields]
   
   scala> df.write.format("org.apache.hudi").
        |     options(getQuickstartWriteConfigs).
        |     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
        |     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
        |     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
        |     option(TABLE_NAME, tableName).
        |     mode(Overwrite).
        |     save(basePath);
                                                                                   
   scala> val roViewDF = spark.
        |     read.
        |     format("org.apache.hudi").
        |     load(basePath + "/*/*/*/*")
   roViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
   
   scala> roViewDF.createOrReplaceTempView("hudi_ro_table")
   
   scala> spark.sql("select fare, begin_lon, begin_lat, ts from  hudi_ro_table where fare > 20.0").show()
   +------------------+-------------------+-------------------+---+
   |              fare|          begin_lon|          begin_lat| ts|
   +------------------+-------------------+-------------------+---+
   | 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0|
   | 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0|
   | 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0|
   | 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0|
   | 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0|
   |  43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0|
   |34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0|
   | 41.06290929046368| 0.8192868687714224|  0.651058505660742|0.0|
   +------------------+-------------------+-------------------+---+
   
   
   scala> spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from  hudi_ro_table").show()
   +-------------------+--------------------+----------------------+---------+----------+------------------+
   |_hoodie_commit_time|  _hoodie_record_key|_hoodie_partition_path|    rider|    driver|              fare|
   +-------------------+--------------------+----------------------+---------+----------+------------------+
   |     20200117122356|91e41b3e-1c25-403...|  americas/united_s...|rider-213|driver-213|19.179139106643607|
   |     20200117122356|17ee8945-22d2-496...|  americas/united_s...|rider-213|driver-213| 93.56018115236618|
   |     20200117122356|129345e8-a885-428...|  americas/united_s...|rider-213|driver-213| 33.92216483948643|
   |     20200117122356|420f2771-23b3-4df...|  americas/united_s...|rider-213|driver-213| 27.79478688582596|
   |     20200117122356|6f24f083-eff9-432...|  americas/united_s...|rider-213|driver-213| 64.27696295884016|
   |     20200117122356|c2a0969f-3381-492...|  americas/brazil/s...|rider-213|driver-213| 66.62084366450246|
   |     20200117122356|4e31e5af-348a-4b3...|  americas/brazil/s...|rider-213|driver-213|  43.4923811219014|
   |     20200117122356|44e96c81-a406-442...|  americas/brazil/s...|rider-213|driver-213|34.158284716382845|
   |     20200117122356|8cd77166-8a33-47c...|    asia/india/chennai|rider-213|driver-213| 41.06290929046368|
   |     20200117122356|7efe1602-e219-419...|    asia/india/chennai|rider-213|driver-213|17.851135255091155|
   +-------------------+--------------------+----------------------+---------+----------+------------------+
   
   
   scala> val updates = convertToStringList(dataGen.generateUpdates(10))
   updates: java.util.List[String] = [{"ts": 0.0, "uuid": "6f24f083-eff9-4328-b1ee-a54889cfd6e2", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.7340133901254792, "begin_lon": 0.5142184937933181, "end_lat": 0.7814655558162802, "end_lon": 0.6592596683641996, "fare": 49.527694252432056, "partitionpath": "americas/united_states/san_francisco"}, {"ts": 0.0, "uuid": "44e96c81-a406-4429-a56e-d6f6121a1182", "rider": "rider-284", "driver": "driver-284", "begin_lat": 0.1593867607188556, "begin_lon": 0.010872312870502165, "end_lat": 0.9808530350038475, "end_lon": 0.7963756520507014, "fare": 29.47661370147079, "partitionpath": "americas/brazil/sao_paulo"}, {"ts": 0.0, "uuid": "44e96c81-a406-4429-a56e-d6f6121a1182", "rider": "rider-284", "driver": "driver-284", ...
   
   scala> val df = spark.read.json(spark.sparkContext.parallelize(updates, 2));
   warning: there was one deprecation warning (since 2.12.0)
   warning: there was one deprecation warning (since 2.2.0)
   warning: there were two deprecation warnings in total; for details, enable `:setting -deprecation' or `:replay -deprecation'
   df: org.apache.spark.sql.DataFrame = [begin_lat: double, begin_lon: double ... 8 more fields]
   
   scala> df.write.format("org.apache.hudi").
        |     options(getQuickstartWriteConfigs).
        |     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
        |     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
        |     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
        |     option(TABLE_NAME, tableName).
        |     mode(Append).
        |     save(basePath);
                                                                                   
   scala> spark.
        |     read.
        |     format("org.apache.hudi").
        |     load(basePath + "/*/*/*/*").
        |     createOrReplaceTempView("hudi_ro_table")
   
   scala> 
   
   scala> val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from  hudi_ro_table order by commitTime").map(k => k.getString(0)).take(50)
   commits: Array[String] = Array(20200117122356, 20200117122430)                  
   
   scala> val beginTime = commits(commits.length - 2) // commit time we are interested in
   beginTime: String = 20200117122356
   
   scala> 
   
   scala> // incrementally query data
   
   scala> val incViewDF = spark.
        |     read.
        |     format("org.apache.hudi").
        |     option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
        |     option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
        |     load(basePath);
   incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
   
   scala> incViewDF.createOrReplaceTempView("hudi_incr_table")
   
   scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_incr_table where fare > 20.0").show()
   +-------------------+------------------+--------------------+-------------------+---+
   |_hoodie_commit_time|              fare|           begin_lon|          begin_lat| ts|
   +-------------------+------------------+--------------------+-------------------+---+
   |     20200117122430|  90.9053809533154| 0.19949323322922063|0.18294079059016366|0.0|
   |     20200117122430|  98.3428192817987|  0.3349917833248327| 0.4777395067707303|0.0|
   |     20200117122430|49.527694252432056|  0.5142184937933181| 0.7340133901254792|0.0|
   |     20200117122430| 90.25710109008239|  0.4006983139989222|0.08528650347654165|0.0|
   |     20200117122430| 63.72504913279929|   0.888493603696927| 0.6570857443423376|0.0|
   |     20200117122430| 29.47661370147079|0.010872312870502165| 0.1593867607188556|0.0|
   +-------------------+------------------+--------------------+-------------------+---+
   
   
   scala> val beginTime = "000" // Represents all commits > this time.
   beginTime: String = 000
   
   scala> val endTime = commits(commits.length - 2) // commit time we are interested in
   endTime: String = 20200117122356
   
   scala> 
   
   scala> //incrementally query data
   
   scala> val incViewDF = spark.read.format("org.apache.hudi").
        |     option(VIEW_TYPE_OPT_KEY, VIEW_TYPE_INCREMENTAL_OPT_VAL).
        |     option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
        |     option(END_INSTANTTIME_OPT_KEY, endTime).
        |     load(basePath);
   incViewDF: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 13 more fields]
   
   scala> incViewDF.createOrReplaceTempView("hudi_incr_table")
   
   scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_incr_table where fare > 20.0").show()
   +-------------------+------------------+-------------------+-------------------+---+
   |_hoodie_commit_time|              fare|          begin_lon|          begin_lat| ts|
   +-------------------+------------------+-------------------+-------------------+---+
   |     20200117122356| 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0|
   |     20200117122356| 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0|
   |     20200117122356| 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0|
   |     20200117122356| 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0|
   |     20200117122356| 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0|
   |     20200117122356|  43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0|
   |     20200117122356|34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0|
   |     20200117122356| 41.06290929046368| 0.8192868687714224|  0.651058505660742|0.0|
   +-------------------+------------------+-------------------+-------------------+---+
   
   
   scala> spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from  hudi_incr_table where fare > 20.0").show()
   +-------------------+------------------+-------------------+-------------------+---+
   |_hoodie_commit_time|              fare|          begin_lon|          begin_lat| ts|
   +-------------------+------------------+-------------------+-------------------+---+
   |     20200117122356| 93.56018115236618|0.14285051259466197|0.21624150367601136|0.0|
   |     20200117122356| 33.92216483948643| 0.9694586417848392| 0.1856488085068272|0.0|
   |     20200117122356| 27.79478688582596| 0.6273212202489661|0.11488393157088261|0.0|
   |     20200117122356| 64.27696295884016| 0.4923479652912024| 0.5731835407930634|0.0|
   |     20200117122356| 66.62084366450246|0.03844104444445928| 0.0750588760043035|0.0|
   |     20200117122356|  43.4923811219014| 0.8779402295427752| 0.6100070562136587|0.0|
   |     20200117122356|34.158284716382845|0.46157858450465483| 0.4726905879569653|0.0|
   |     20200117122356| 41.06290929046368| 0.8192868687714224|  0.651058505660742|0.0|
   +-------------------+------------------+-------------------+-------------------+---+
   
   
   scala> 
   


[GitHub] [incubator-hudi] zhedoubushishi commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#discussion_r367250686
 
 

 ##########
 File path: docker/hoodie/hadoop/hive_base/pom.xml
 ##########
 @@ -57,10 +57,10 @@
               <tasks>
                 <copy file="${project.basedir}/../../../../packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-${project.version}.jar" tofile="target/hoodie-hadoop-mr-bundle.jar" />
                 <copy file="${project.basedir}/../../../../packaging/hudi-hive-bundle/target/hudi-hive-bundle-${project.version}.jar" tofile="target/hoodie-hive-bundle.jar" />
-                <copy file="${project.basedir}/../../../../packaging/hudi-spark-bundle/target/hudi-spark-bundle-${project.version}.jar" tofile="target/hoodie-spark-bundle.jar" />
+                <copy file="${project.basedir}/../../../../packaging/hudi-spark-bundle/target/hudi-spark-bundle_${scala.binary.version}-${project.version}.jar" tofile="target/hoodie-spark-bundle_${scala.binary.version}.jar" />
 
 Review comment:
   > @zhedoubushishi : You are on the right track. Don't add the Scala version to the target jars inside docker; the docker setup has hard-wired this configuration. Removing the version from the "tofile" entry should hopefully make the tests pass.
   
   Thanks for your advice. I'll give it a try.


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#discussion_r368925697
 
 

 ##########
 File path: hudi-spark/pom.xml
 ##########
 @@ -23,7 +23,7 @@
   </parent>
   <modelVersion>4.0.0</modelVersion>
 
-  <artifactId>hudi-spark</artifactId>
+  <artifactId>hudi-spark_2.11</artifactId>
 
 Review comment:
   A little weird when using Scala 2.12 to build: a _2.11 suffix and a 2.12 build do not match.


[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-574917561
 
 
   I am following the approach Spark uses to support Scala 2.12 (https://spark.apache.org/docs/latest/building-spark.html).
   
   To build Hudi with Scala 2.12, first switch the Scala version in the POMs:
   ```
   dev/change-scala-version.sh 2.12
   ```
   Then build with Maven:
   ```
   mvn -Pscala-2.12 clean install
   ```
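   
   As a quick sanity check after the 2.12 build, the resulting bundle can be loaded into a Scala 2.12 Spark distribution; a sketch based on the test run posted elsewhere in this thread (the 0.5.1-SNAPSHOT coordinates assume a local install):
   ```
   # spark-shell from a Spark build that itself uses Scala 2.12 (e.g. 3.0.0-preview2)
   spark-shell \
     --packages org.apache.hudi:hudi-spark-bundle_2.12:0.5.1-SNAPSHOT,org.apache.spark:spark-avro_2.12:3.0.0-preview2 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   ```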


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#discussion_r368926048
 
 

 ##########
 File path: packaging/hudi-utilities-bundle/pom.xml
 ##########
 @@ -24,7 +24,7 @@
     <relativePath>../../pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
-  <artifactId>hudi-utilities-bundle</artifactId>
+  <artifactId>hudi-utilities-bundle_2.11</artifactId>
 
 Review comment:
   ditto


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#discussion_r368926125
 
 

 ##########
 File path: packaging/hudi-spark-bundle/pom.xml
 ##########
 @@ -23,7 +23,7 @@
     <relativePath>../../pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
-  <artifactId>hudi-spark-bundle</artifactId>
+  <artifactId>hudi-spark-bundle_2.11</artifactId>
 
 Review comment:
   ditto


[GitHub] [incubator-hudi] leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#discussion_r368925801
 
 

 ##########
 File path: hudi-utilities/pom.xml
 ##########
 @@ -23,7 +23,7 @@
   </parent>
   <modelVersion>4.0.0</modelVersion>
 
-  <artifactId>hudi-utilities</artifactId>
+  <artifactId>hudi-utilities_2.11</artifactId>
 
 Review comment:
   ditto.


[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-575513618
 
 
   > > Also, there is another way to do this: I can rename the artifactIds, e.g. `hudi-spark` to `hudi-spark_${scala.binary.version}`, and thus avoid using `dev/change-scala-version.sh 2.12`. I am not sure why Spark did not choose this approach.
   > 
   > @zhedoubushishi : If renaming the artifactId to hudi-spark_${scala.binary.version} works, please go for it and make the change. I will take another look at this PR tomorrow morning PST.
   
   Sorry, I just found that ```hudi-spark_${scala.binary.version}``` might not work.
   See the dependency tree of hudi-cli (hudi-cli depends on hudi-utilities, and hudi-utilities depends on hudi-spark and other spark libraries):
   ```
   $ mvn dependency:tree -Pscala-2.12 -pl hudi-cli
   
   [INFO] ----------------------< org.apache.hudi:hudi-cli >----------------------
   [INFO] Building hudi-cli 0.5.1-SNAPSHOT
   [INFO] --------------------------------[ jar ]---------------------------------
   [INFO] 
   [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ hudi-cli ---
   [INFO] org.apache.hudi:hudi-cli:jar:0.5.1-SNAPSHOT
   [INFO] +- org.scala-lang:scala-library:jar:2.12.10:compile
   [INFO] +- org.apache.hudi:hudi-hive:jar:0.5.1-SNAPSHOT:compile
   [INFO] |  +- org.apache.hudi:hudi-hadoop-mr:jar:0.5.1-SNAPSHOT:compile
   [INFO] |  \- com.beust:jcommander:jar:1.72:compile
   ...
   [INFO] +- org.apache.hudi:hudi-utilities_2.12:jar:0.5.1-SNAPSHOT:compile
   [INFO] |  +- org.apache.hudi:hudi-spark_2.11:jar:0.5.1-SNAPSHOT:compile
   [INFO] |  +- com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.6.7.1:compile
   [INFO] |  |  +- org.scala-lang:scala-reflect:jar:2.11.8:compile
   [INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.7.9:compile
   [INFO] |  +- org.apache.spark:spark-streaming_2.11:jar:2.4.4:compile
   [INFO] |  |  +- org.apache.spark:spark-core_2.11:jar:2.4.4:compile
   [INFO] |  |  |  +- com.twitter:chill_2.11:jar:0.9.3:compile
   [INFO] |  |  |  +- org.apache.spark:spark-launcher_2.11:jar:2.4.4:compile
   [INFO] |  |  |  +- org.apache.spark:spark-kvstore_2.11:jar:2.4.4:compile
   [INFO] |  |  |  +- org.apache.spark:spark-network-common_2.11:jar:2.4.4:compile
   [INFO] |  |  |  +- org.apache.spark:spark-network-shuffle_2.11:jar:2.4.4:compile
   [INFO] |  |  |  +- org.apache.spark:spark-unsafe_2.11:jar:2.4.4:compile
   [INFO] |  |  |  \- org.json4s:json4s-jackson_2.11:jar:3.5.3:compile
   [INFO] |  |  |     \- org.json4s:json4s-core_2.11:jar:3.5.3:compile
   [INFO] |  |  |        +- org.json4s:json4s-ast_2.11:jar:3.5.3:compile
   [INFO] |  |  |        +- org.json4s:json4s-scalap_2.11:jar:3.5.3:compile
   [INFO] |  |  |        \- org.scala-lang.modules:scala-xml_2.11:jar:1.0.6:compile
   [INFO] |  |  \- org.apache.spark:spark-tags_2.11:jar:2.4.4:compile
   [INFO] |  +- org.apache.spark:spark-streaming-kafka-0-10_2.11:jar:2.4.4:compile
   [INFO] |  |  \- org.apache.kafka:kafka-clients:jar:2.0.0:compile
   [INFO] |  +- org.apache.spark:spark-streaming-kafka-0-10_2.11:jar:tests:2.4.4:compile
   [INFO] |  +- org.antlr:stringtemplate:jar:4.0.2:compile
   [INFO] |  |  \- org.antlr:antlr-runtime:jar:3.3:compile
   [INFO] |  +- com.twitter:bijection-avro_2.11:jar:0.9.3:compile
   [INFO] |  |  +- com.twitter:bijection-core_2.11:jar:0.9.3:compile
   [INFO] |  |  \- org.scoverage:scalac-scoverage-runtime_2.11:jar:1.3.0:compile
   [INFO] |  +- io.confluent:kafka-avro-serializer:jar:3.0.0:compile
   [INFO] |  +- io.confluent:common-config:jar:3.0.0:compile
   [INFO] |  +- io.confluent:common-utils:jar:3.0.0:compile
   [INFO] |  |  \- com.101tec:zkclient:jar:0.5:compile
   [INFO] |  +- io.confluent:kafka-schema-registry-client:jar:3.0.0:compile
   [INFO] |  \- org.apache.httpcomponents:httpcore:jar:4.3.2:compile
   ```
   
   Although the scala-2.12 profile overrides ```hudi-utilities_${scala.binary.version}``` to ```hudi-utilities_2.12```, when Maven resolves the dependencies of hudi-utilities_2.12 it only uses the default ```scala.binary.version``` declared inside ```hudi-utilities_${scala.binary.version}```, which is 2.11. Thus all the dependencies of ```hudi-utilities_${scala.binary.version}``` resolve to ```xxx_2.11```; you can see above that hudi-utilities_2.12 still pulls in hudi-spark_2.11 and the _2.11 Spark artifacts.


[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-575296997
 
 
   > @leesf : FYI: this diff will have implications on how we release packages. We will be changing the names of the packages hudi-spark, hudi-spark-bundle and hudi-utilities-bundle to include the Scala version. When building jars for a release, you would have to run "mvn clean install xxx" twice: once without any additional settings to build the Scala 2.11 versions, and then run
   > 
   > dev/change-scala-version.sh 2.12
   > mvn -Pscala-2.12 clean install
   > 
   > for 2.12
   > 
   > cc @vinothchandar
   
   Also, there is another way to do this: I can rename the artifactIds, e.g. ```hudi-spark``` to ```hudi-spark_${scala.binary.version}```, and thus avoid using ```dev/change-scala-version.sh 2.12```. I am not sure why Spark did not choose this approach.


[GitHub] [incubator-hudi] leesf removed a comment on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
leesf removed a comment on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-576623890
 
 
   Could we remove the _2.11 suffix from the artifactId in hudi-spark, hudi-utilities, packaging/hudi-spark-bundle and packaging/hudi-utilities-bundle? cc @bvaradar @zhedoubushishi


[GitHub] [incubator-hudi] bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-575469290
 
 
   > Also, there is another way to do this: I can rename the artifactIds, e.g. `hudi-spark` to `hudi-spark_${scala.binary.version}`, and thus avoid using `dev/change-scala-version.sh 2.12`. I am not sure why Spark did not choose this approach.
   
   @zhedoubushishi : If renaming the artifactId to hudi-spark_${scala.binary.version} works, please go for it and make the change. I will take another look at this PR tomorrow morning PST.
   
   


[GitHub] [incubator-hudi] bvaradar merged pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
bvaradar merged pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226
 
 
   


[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#discussion_r367232920
 
 

 ##########
 File path: docker/hoodie/hadoop/hive_base/pom.xml
 ##########
 @@ -57,10 +57,10 @@
               <tasks>
                 <copy file="${project.basedir}/../../../../packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-${project.version}.jar" tofile="target/hoodie-hadoop-mr-bundle.jar" />
                 <copy file="${project.basedir}/../../../../packaging/hudi-hive-bundle/target/hudi-hive-bundle-${project.version}.jar" tofile="target/hoodie-hive-bundle.jar" />
-                <copy file="${project.basedir}/../../../../packaging/hudi-spark-bundle/target/hudi-spark-bundle-${project.version}.jar" tofile="target/hoodie-spark-bundle.jar" />
+                <copy file="${project.basedir}/../../../../packaging/hudi-spark-bundle/target/hudi-spark-bundle_${scala.binary.version}-${project.version}.jar" tofile="target/hoodie-spark-bundle_${scala.binary.version}.jar" />
 
 Review comment:
   @zhedoubushishi : You are on the right track. Don't add the Scala version to the target jars inside docker; the docker setup has hard-wired this configuration. Removing the version from the "tofile" entry should hopefully make the tests pass.


[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12

Posted by GitBox <gi...@apache.org>.
zhedoubushishi commented on issue #1226: [HUDI-238] Make Hudi support Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1226#issuecomment-575854401
 
 
   > Looks good. I have added JIRAs for updating documentation and release notes (cc @leesf @bhasudha)
   > Thanks to both @ezhux and @zhedoubushishi for helping with this upgrade.
   
   Thanks @bvaradar, also thanks @ezhux!
