Posted to dev@spark.apache.org by andreiL <le...@rogers.com> on 2017/05/09 18:04:42 UTC

Parquet vectorized reader DELTA_BYTE_ARRAY

Hi, I am getting an exception in Spark 2.1 when reading Parquet files in which
some columns are DELTA_BYTE_ARRAY encoded.

java.lang.UnsupportedOperationException: Unsupported encoding: DELTA_BYTE_ARRAY

Is this exception by design, or am I missing something?

If I turn off the vectorized reader, reading these files works fine.
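
For reference, turning off the vectorized reader corresponds to setting the
Spark SQL option spark.sql.parquet.enableVectorizedReader to false. Below is a
minimal sketch, assuming the option is passed on the spark-submit command line.
The helper name conf_args and the script name my_job.py are hypothetical
placeholders and do not come from this thread.

```python
# Spark SQL option that controls the vectorized Parquet reader.
VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"


def conf_args(settings):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


# Fall back to the non-vectorized Parquet reader.
fallback = {VECTORIZED_READER_KEY: "false"}

# my_job.py is a placeholder for the application being submitted.
command = ["spark-submit"] + conf_args(fallback) + ["my_job.py"]
print(" ".join(command))
```

Inside an already running session, the same option can be changed with
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false") before the
files are read.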

AndreiL



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Parquet-vectorized-reader-DELTA-BYTE-ARRAY-tp21538.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Parquet vectorized reader DELTA_BYTE_ARRAY

Posted by andreiL <le...@rogers.com>.
I took a closer look and, yes, the files were written with Parquet v2.

For some reason Parquet v2 was set as the default, so I set it back to Parquet
v1.
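
The thread does not say where the default was changed. As a rough sketch, and
assuming the files are written through parquet-mr from Spark, the writer
version is usually controlled by the Hadoop property parquet.writer.version
(short values v1 and v2), which Spark picks up when it is given the
spark.hadoop. prefix. The helper names below are hypothetical.

```python
# parquet-mr property that selects the writer format version.
WRITER_VERSION_KEY = "parquet.writer.version"

# Short names parquet-mr uses for the two writer versions.
PARQUET_V1 = "v1"
PARQUET_V2 = "v2"


def as_spark_property(hadoop_key):
    """Prefix a Hadoop property so Spark copies it into its Hadoop configuration."""
    return "spark.hadoop." + hadoop_key


def spark_defaults_line(hadoop_key, value):
    """Format one whitespace-separated line for spark-defaults.conf."""
    return f"{as_spark_property(hadoop_key)} {value}"


# Line that pins the writer to Parquet v1 (assumed setup, not from the thread).
line = spark_defaults_line(WRITER_VERSION_KEY, PARQUET_V1)
print(line)
```

Files that were already written with v2 are not changed by this setting; only
new writes are affected.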

Thanks Michael and Ryan for the info.

Andrei.





Re: Parquet vectorized reader DELTA_BYTE_ARRAY

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Michael is right: the delta byte array encoding is a Parquet v2 feature.
Parquet v2 isn't finished yet, though some features are in releases and
those features will be supported in future releases. In other words,
Parquet will maintain backward-compatibility for any released v2 features.

I don't recommend using Parquet v2 yet because Parquet doesn't guarantee
forward-compatibility for those features. For v1, old readers should be
able to read the data written by newer versions, but we won't make that
guarantee for v2 until the spec is considered finished.

rb

On Mon, May 22, 2017 at 10:16 AM, Michael Allman <mi...@videoamp.com>
wrote:

> Hi AndreiL,
>
> Were these files written with the Parquet V2 writer? The Spark 2.1
> vectorized reader does not appear to support that format.
>
> Michael


-- 
Ryan Blue
Software Engineer
Netflix

Re: Parquet vectorized reader DELTA_BYTE_ARRAY

Posted by Michael Allman <mi...@videoamp.com>.
Hi AndreiL,

Were these files written with the Parquet V2 writer? The Spark 2.1 vectorized reader does not appear to support that format.

Michael

