Posted to user@spark.apache.org by Lian Jiang <ji...@gmail.com> on 2018/04/14 00:02:46 UTC

Avoid duplicate records when appending new data to a Parquet dataset

I have a Parquet dataset with an id field that is a hash of the composite
key fields. Is it possible to keep the id field unique when appending new
data that may duplicate records already in the dataset? Thanks!
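Parquet itself does not enforce uniqueness, so appending blindly will keep both copies. One common pattern is to drop any incoming row whose id already exists before appending. In Spark this is roughly a `left_anti` join of the new DataFrame against the existing ids on `id`, plus `dropDuplicates(["id"])` to remove repeats inside the new batch, followed by `write.mode("append")`. The plain-Python sketch below illustrates only that filtering logic. It assumes the id is a SHA-256 hash of the key fields joined with a separator, and the key names `customer_id` and `order_date` are hypothetical placeholders for the real composite key.

```python
import hashlib
from typing import Iterable

# Hypothetical composite key fields; substitute the real key columns.
KEY_FIELDS = ("customer_id", "order_date")


def make_id(record: dict) -> str:
    """Hash the composite key fields into one id (SHA-256 hex digest)."""
    # Joining with "\x1f" (ASCII unit separator) keeps ("ab", "c") and
    # ("a", "bc") distinct, assuming key values never contain "\x1f".
    raw = "\x1f".join(str(record[f]) for f in KEY_FIELDS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def records_to_append(existing_ids: set, new_records: Iterable[dict]) -> list:
    """Keep only new records whose id is not already stored (a left anti join).

    Duplicates inside the new batch are also dropped; the first one wins.
    """
    seen = set(existing_ids)  # copy so the caller's set is not modified
    out = []
    for rec in new_records:
        rec = {**rec, "id": make_id(rec)}
        if rec["id"] not in seen:
            seen.add(rec["id"])
            out.append(rec)
    return out
```

For example, with one stored key (1, "2018-04-01") and a new batch containing that key once plus two rows for (2, "2018-04-02"), only the first row for the second key is returned. Note that this check-then-append is not atomic: two jobs appending at the same time could still both write the same id.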