Posted to commits@hudi.apache.org by "王范明 (Jira)" <ji...@apache.org> on 2021/11/25 08:46:00 UTC

[jira] [Created] (HUDI-2857) HoodieTableMetaClient.TEMPFOLDER_NAME causes IllegalArgumentException in windows environment

王范明 created HUDI-2857:
-------------------------

             Summary: HoodieTableMetaClient.TEMPFOLDER_NAME causes IllegalArgumentException in windows environment
                 Key: HUDI-2857
                 URL: https://issues.apache.org/jira/browse/HUDI-2857
             Project: Apache Hudi
          Issue Type: Bug
          Components: Spark Integration
    Affects Versions: 0.9.0
         Environment: Windows 10, Spark 2.4.4, Hudi 0.9.0
            Reporter: 王范明


// imports used by the Hudi quickstart example (omitted in the original snippet)
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.SaveMode._
import scala.collection.JavaConversions._

val tableName = "cow_prices"
val basePath = "hdfs://10.38.23.2:9000//tmp//cow_prices//"
val dataGen = new DataGenerator

// spark-shell
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD.key(), "ts").
  option(RECORDKEY_FIELD.key(), "uuid").
  option(PARTITIONPATH_FIELD.key(), "partitionpath").
  option(TBL_NAME.key(), tableName).
  mode(Overwrite).
  save(basePath)

The above is the sample code from Hudi's official quickstart. I ran the Spark program directly on Windows 10, writing the data to a remote HDFS cluster, and the following exception occurred:
{code:java}
Caused by: java.lang.IllegalArgumentException: Not in marker dir. Marker Path=hdfs://10.38.23.2:9000/tmp/cow_prices/.hoodie\.temp/20211125163531/asia/india/chennai/c9218a3b-f248-436b-b41f-4a0b968dfff2-0_2-27-29_20211125163531.parquet.marker.CREATE, Expected Marker Root=/tmp/cow_prices/.hoodie/.temp/20211125163531
    at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
    at org.apache.hudi.common.util.MarkerUtils.stripMarkerFolderPrefix(MarkerUtils.java:87)
    at org.apache.hudi.common.util.MarkerUtils.stripMarkerFolderPrefix(MarkerUtils.java:75)
    at org.apache.hudi.table.marker.DirectWriteMarkers.translateMarkerToDataPath(DirectWriteMarkers.java:153)
    at org.apache.hudi.table.marker.DirectWriteMarkers.lambda$createdAndMergedDataPaths$69cdea3b$1(DirectWriteMarkers.java:142)
    at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$flatMap$7d470b86$1(HoodieSparkEngineContext.java:78)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
After investigation, the root cause of the exception is that
{code:java}
HoodieTableMetaClient.TEMPFOLDER_NAME {code}
is constructed incorrectly during class initialization.

 
{code:java}
public static final String TEMPFOLDER_NAME = METAFOLDER_NAME + File.separator + ".temp"; {code}
File.separator is {color:#FF0000}"\"{color} on Windows, so the constant expands to ".hoodie\.temp". The marker path built from it then mixes "\" and "/" separators and no longer starts with the expected marker root ".hoodie/.temp", which is exactly the mismatch reported by MarkerUtils.stripMarkerFolderPrefix in the stack trace above.
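
For illustration, here is a minimal, self-contained sketch (hypothetical class and variable names, not actual Hudi code) of how the separator difference produces the mismatch, and how building the constant with the literal "/" that HDFS paths always use would keep the two paths consistent on every OS:
{code:java}
import java.io.File;

public class TempFolderSeparatorDemo {
    static final String METAFOLDER_NAME = ".hoodie";

    public static void main(String[] args) {
        // Current construction: on Windows, File.separator is "\",
        // so the constant evaluates to ".hoodie\.temp".
        String current = METAFOLDER_NAME + File.separator + ".temp";

        // Hypothetical alternative: HDFS paths always use "/",
        // so concatenating the literal "/" yields ".hoodie/.temp" on every OS.
        String portable = METAFOLDER_NAME + "/" + ".temp";

        String markerRoot   = "/tmp/cow_prices/" + current  + "/20211125163531";
        String expectedRoot = "/tmp/cow_prices/" + portable + "/20211125163531";

        // On Windows this prints false; the marker paths no longer start with the
        // expected marker root, analogous to the check that fails in
        // MarkerUtils.stripMarkerFolderPrefix and raises the IllegalArgumentException.
        System.out.println(markerRoot.equals(expectedRoot));
    }
}
{code}
A possible fix (subject to how the maintainers decide to address it) would be to build TEMPFOLDER_NAME with the literal "/" or with org.apache.hadoop.fs.Path.SEPARATOR instead of File.separator, since the table base path is a file-system URI that uses "/" regardless of the client OS.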



--
This message was sent by Atlassian Jira
(v8.20.1#820001)