Posted to user@spark.apache.org by lostrain A <do...@gmail.com> on 2015/08/23 10:01:59 UTC

Error when saving a dataframe as ORC file

Hi,
  I'm trying to save a simple dataframe to S3 in ORC format. The code is as
follows:


     val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
     import sqlContext.implicits._
     val df = sc.parallelize(1 to 1000).toDF()
     df.write.format("orc").save("s3://logs/dummy")


I ran the above code in spark-shell and only the _SUCCESS file was saved
under the directory.
The last part of the spark-shell log said:

15/08/23 07:38:23 task-result-getter-1 INFO TaskSetManager: Finished task 95.0 in stage 2.0 (TID 295) in 801 ms on ip-*-*-*-*.ec2.internal (100/100)
15/08/23 07:38:23 dag-scheduler-event-loop INFO DAGScheduler: ResultStage 2 (save at <console>:29) finished in 0.834 s
15/08/23 07:38:23 task-result-getter-1 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
15/08/23 07:38:23 main INFO DAGScheduler: Job 2 finished: save at <console>:29, took 0.895912 s
15/08/23 07:38:24 main INFO LocalDirAllocator$AllocatorPerContext$DirSelector: Returning directory: /media/ephemeral0/s3/output-
15/08/23 07:38:24 main ERROR NativeS3FileSystem: md5Hash for dummy/_SUCCESS is [-44, 29, -128, -39, -113, 0, -78, 4, -23, -103, 9, -104, -20, -8, 66, 126]
15/08/23 07:38:24 main INFO DefaultWriterContainer: Job job_****_**** committed.


Has anyone experienced this before?
Thanks!
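
One way to double-check what actually landed under the target path is to list it with the Hadoop FileSystem API from the same spark-shell session; a minimal sketch, reusing the s3://logs/dummy placeholder path from the snippet above:

     // List the output directory to see which files were actually committed
     // (in the failing case, only _SUCCESS shows up).
     import java.net.URI
     import org.apache.hadoop.fs.{FileSystem, Path}

     val fs = FileSystem.get(new URI("s3://logs/"), sc.hadoopConfiguration)
     fs.listStatus(new Path("s3://logs/dummy")).foreach(s => println(s.getPath + "  " + s.getLen + " bytes"))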

Re: Error when saving a dataframe as ORC file

Posted by Ted Yu <yu...@gmail.com>.
SPARK-8458 is in the 1.4.1 release.

You can upgrade to 1.4.1 or wait for the upcoming 1.5.0 release.
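
For reference, a quick way to confirm which version a spark-shell session is actually running, before and after the upgrade (just a sketch):

     // Print the version reported by the running SparkContext; after the upgrade
     // this should read 1.4.1 or later, which includes the SPARK-8458 fix.
     println(sc.version)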


Re: Error when saving a dataframe as ORC file

Posted by lostrain A <do...@gmail.com>.
Hi Zhan,
  Thanks for the pointer. Yes, I'm using a cluster with spark-1.4.0, and it
looks like this is most likely the reason. I'll verify this again once we
make the upgrade.

Best,
los


Re: Error when saving a dataframe as ORC file

Posted by Zhan Zhang <zz...@hortonworks.com>.
If you are using spark-1.4.0, it is probably caused by SPARK-8458 <https://issues.apache.org/jira/browse/SPARK-8458>.

Thanks.

Zhan Zhang


Re: Error when saving a dataframe as ORC file

Posted by lostrain A <do...@gmail.com>.
Ted,
  Thanks for the suggestions. Actually, I tried both s3n and s3 and the
result remains the same.



Re: Error when saving a dataframe as ORC file

Posted by Ted Yu <yu...@gmail.com>.
In your case, I would specify "fs.s3.awsAccessKeyId" /
"fs.s3.awsSecretAccessKey", since you use the s3 protocol.


Re: Error when saving a dataframe as ORC file

Posted by lostrain A <do...@gmail.com>.
Hi Ted,
  Thanks for the reply. I tried setting both the key ID and the secret access key via

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "**")


However, the error still occurs for the ORC format.

If I change the format to JSON, the error message still appears, but the
JSON files are saved successfully.
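
For what it's worth, the comparison boils down to something like this (a sketch; the output paths are placeholders):

     // Same DataFrame, same bucket, only the format differs:
     // the JSON files show up, while the ORC write leaves only _SUCCESS.
     df.write.format("json").save("s3://logs/dummy_json")
     df.write.format("orc").save("s3://logs/dummy_orc")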





Re: Error when saving a dataframe as ORC file

Posted by Ted Yu <yu...@gmail.com>.
You may have seen this:
http://search-hadoop.com/m/q3RTtdSyM52urAyI


