Posted to dev@spark.apache.org by "Triones,Deng(vip.com)" <tr...@vipshop.com> on 2016/12/20 12:35:50 UTC

Reply: How to deal with string column data for spark mlib?

Hi spark dev,

         I am using Spark 2 to write ORC files to HDFS, and I have a question about the save mode.

         My use case is this: when I write data into HDFS and a task fails, I would like the files that the failed task created to be deleted, so that the retried task can write all of its data again. That is to say,
if this task holds the data 1 to 100 and the first attempt to write 1 to 100 fails, then after the task scheduler reschedules the partition task, HDFS should contain the data 1 to 100 exactly once, with nothing duplicated.

If so, which SaveMode should I use? In FileFormatWriter.scala the output file name includes a UUID, so I am confused about how duplicates are avoided.
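(For context, the concern above is essentially a task-commit question: a failed attempt's partial output should never become visible, so a retry cannot double-count rows. Below is a minimal sketch of that write-then-commit-on-success idea in plain Python. It is NOT Spark's actual implementation, just an illustration of the pattern, and all names in it are hypothetical: each attempt writes to a uniquely named temporary file, and only a successful attempt renames its file into place.)

```python
# Hedged sketch (not Spark code) of the task-commit idea: each task
# attempt writes to a uniquely named temporary file, and the file is
# only renamed into place when the attempt succeeds. A failed attempt
# discards its partial output, so a retry cannot duplicate rows.
import os
import tempfile
import uuid


def run_task_attempt(out_dir, partition_id, rows, fail=False):
    """Write rows for one partition; commit only on success."""
    attempt_id = uuid.uuid4().hex  # unique per attempt, like the UUID in the file names
    tmp_path = os.path.join(out_dir, f"_temporary-{partition_id}-{attempt_id}")
    try:
        with open(tmp_path, "w") as f:
            for r in rows:
                f.write(f"{r}\n")
                if fail and r == rows[len(rows) // 2]:
                    raise RuntimeError("task attempt failed")
    except RuntimeError:
        os.remove(tmp_path)  # abort: discard the partial output
        raise
    # commit: an atomic rename makes the output visible exactly once
    final_path = os.path.join(out_dir, f"part-{partition_id:05d}")
    os.rename(tmp_path, final_path)
    return final_path


out_dir = tempfile.mkdtemp()
rows = list(range(1, 101))
try:
    run_task_attempt(out_dir, 0, rows, fail=True)  # first attempt fails midway
except RuntimeError:
    pass
run_task_attempt(out_dir, 0, rows)  # the retry succeeds and commits

committed = [p for p in os.listdir(out_dir) if not p.startswith("_temporary")]
row_count = sum(1 for _ in open(os.path.join(out_dir, committed[0])))
print(committed)   # a single committed part file
print(row_count)   # 100 rows, none duplicated
```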


Thanks


Triones

This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are hereby notified that, if you are not an intended recipient listed above, or an authorized employee or agent of an addressee of this communication responsible for delivering e-mail messages to an intended recipient, any dissemination, distribution or reproduction of this communication (including any attachments hereto) is strictly prohibited. If you have received this communication in error, please notify us immediately by a reply e-mail addressed to the sender and permanently delete the original e-mail communication and any attachments from all storage devices without making or otherwise retaining a copy.