Posted to issues@nifi.apache.org by "Bilal (Jira)" <ji...@apache.org> on 2021/11/16 11:07:00 UTC

[jira] [Created] (NIFI-9380) PutParquet - Compression Type: SNAPPY (Not Working)

Bilal created NIFI-9380:
---------------------------

             Summary: PutParquet - Compression Type: SNAPPY (Not Working)
                 Key: NIFI-9380
                 URL: https://issues.apache.org/jira/browse/NIFI-9380
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 1.15.0, 1.14.0
         Environment: CentOS 7.4, RedHat 7.9
            Reporter: Bilal


I have tested the compression types offered by the _PutParquet_ and _ConvertAvroToParquet_ Processors on different NiFi versions.

 

Summary information:
 * Compression types (UNCOMPRESSED, GZIP, {*}SNAPPY{*}) of the _PutParquet_ Processor work correctly on NiFi 1.12.1 and 1.13.2.
 * Compression types (UNCOMPRESSED, GZIP) of the _PutParquet_ Processor work correctly on NiFi 1.14.0 and 1.15.0; *SNAPPY* gives an error.
 * Compression types (UNCOMPRESSED, GZIP, {*}SNAPPY{*}) of the _ConvertAvroToParquet_ Processor work correctly on NiFi 1.12.1, 1.13.2, 1.14.0 and 1.15.0.

_PutParquet_ – Properties:
 * Hadoop Configuration Resources: File locations
 * Kerberos Credentials Service: Keytab service
 * Record Reader: AvroReader Service (Embedded Avro Schema)
 * Overwrite Files: True
 * Compression Type: SNAPPY
 * Other Properties: Default

 

To keep the test lean, the default configuration was used wherever possible:
 * nifi-env.sh file has the default configuration.
 * bootstrap.conf file has the default configuration.
 * nifi.properties file has the default configuration except security configuration.
 * _PutParquet_ Processor has the default configuration apart from the properties listed above. (SNAPPY compression is not working)
 * _ConvertAvroToParquet_ Processor has the default configuration. (SNAPPY compression is working correctly)
 * There is no custom processor in our NiFi environment.
 * There is no custom lib location in the NiFi properties.

 

Error Log (nifi-app.log):
{noformat}
ERROR [Timer-Driven Process Thread-12] o.a.nifi.processors.parquet.PutParquet PutParquet[id=6caab337-68e8-3834-b64a-1d2cbd93aba8] Failed to write due to java.lang.IncompatibleClassChangeError: Class org.xerial.snappy.SnappyNative does not implement the requested interface org.xerial.snappy.SnappyApi: java.lang.IncompatibleClassChangeError: Class org.xerial.snappy.SnappyNative does not implement the requested interface org.xerial.snappy.SnappyApi
java.lang.IncompatibleClassChangeError: Class org.xerial.snappy.SnappyNative does not implement the requested interface org.xerial.snappy.SnappyApi
        at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:380)
        at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
        at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
        at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:167)
        at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:168)
        at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:59)
        at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:387)
        at org.apache.parquet.column.impl.ColumnWriteStoreBase.flush(ColumnWriteStoreBase.java:186)
        at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:29)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:185)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:124)
        at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:319)
        at org.apache.nifi.parquet.hadoop.AvroParquetHDFSRecordWriter.close(AvroParquetHDFSRecordWriter.java:49)
        at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:534)
        at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:466)
        at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:326)
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2466)
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2434)
        at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:303)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1822)
        at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:271)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1202)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:103)
        at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
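
An IncompatibleClassChangeError of this form generally means that a class was compiled against one version of a type and linked against a different one at run time. One possible cause, not confirmed by this report, is that more than one snappy-java jar is visible to the class loader used by _PutParquet_. The following is a minimal diagnostic sketch under that assumption: it prints where the two classes named in the stack trace are loaded from and whether one implements the other. The class name ClassOriginCheck and its methods are hypothetical, and the check is only meaningful when run with the same class loader the processor uses, which a standalone run does not reproduce.

```java
import java.security.CodeSource;

public class ClassOriginCheck {

    /** Returns the location a class was loaded from, or a marker if it is unknown. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /**
     * Loads both types without initializing them (so no native library is loaded)
     * and reports their origins and whether className implements interfaceName.
     */
    static String describe(String className, String interfaceName, ClassLoader loader) {
        try {
            Class<?> impl = Class.forName(className, false, loader);
            Class<?> api = Class.forName(interfaceName, false, loader);
            return className + " from " + originOf(impl)
                    + "; " + interfaceName + " from " + originOf(api)
                    + "; assignable=" + api.isAssignableFrom(impl);
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumption: the context class loader is the one whose classpath is under suspicion.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(describe(
                "org.xerial.snappy.SnappyNative",
                "org.xerial.snappy.SnappyApi",
                loader));
    }
}
```

If the two classes are reported from different jars, or assignable is false, that would be consistent with a snappy-java version conflict; it would not by itself identify which dependency introduced it.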



--
This message was sent by Atlassian Jira
(v8.20.1#820001)