Posted to issues@nifi.apache.org by "Bryan Bende (Jira)" <ji...@apache.org> on 2021/11/16 15:09:00 UTC

[jira] [Updated] (NIFI-9380) PutParquet - Compression Type: SNAPPY (Not Working)

     [ https://issues.apache.org/jira/browse/NIFI-9380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Bende updated NIFI-9380:
------------------------------
    Status: Patch Available  (was: Open)

> PutParquet - Compression Type: SNAPPY (Not Working)
> ---------------------------------------------------
>
>                 Key: NIFI-9380
>                 URL: https://issues.apache.org/jira/browse/NIFI-9380
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.15.0, 1.14.0
>         Environment: CentOS 7.4, RedHat 7.9
>            Reporter: Bilal
>            Assignee: Bryan Bende
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have tested the different compression types supported by the _PutParquet_ and _ConvertAvroToParquet_ Processors on different NiFi versions.
>  
> Summary information:
>  * Compression types (UNCOMPRESSED, GZIP, *SNAPPY*) of the _PutParquet_ Processor work correctly on NiFi 1.12.1 and 1.13.2.
>  * Compression types (UNCOMPRESSED, GZIP) of the _PutParquet_ Processor work correctly on NiFi 1.14.0 and 1.15.0; *SNAPPY* gives an error.
>  * Compression types (UNCOMPRESSED, GZIP, *SNAPPY*) of the _ConvertAvroToParquet_ Processor work correctly on NiFi 1.12.1, 1.13.2, 1.14.0 and 1.15.0.
>
> _PutParquet_ – Properties:
>  * Hadoop Configuration Resources: File locations
>  * Kerberos Credentials Service: Keytab service
>  * Record Reader: AvroReader Service (Embedded Avro Schema)
>  * Overwrite Files: True
>  * Compression Type: SNAPPY
>  * Other Properties: Default
>  
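> _PutParquet_ writes through org.apache.parquet.hadoop.ParquetWriter (see the stack trace below), so the write path these properties configure should be reproducible outside NiFi with a minimal standalone sketch like the following. This is a hypothetical test, not NiFi code: the output path and the one-field schema are placeholders.
> {code:java}
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.hadoop.fs.Path;
> import org.apache.parquet.avro.AvroParquetWriter;
> import org.apache.parquet.hadoop.ParquetWriter;
> import org.apache.parquet.hadoop.metadata.CompressionCodecName;
>
> public class SnappyParquetRepro {
>     public static void main(String[] args) throws Exception {
>         // Placeholder one-field Avro schema (the real flow uses the embedded Avro schema)
>         Schema schema = new Schema.Parser().parse(
>                 "{\"type\":\"record\",\"name\":\"r\",\"fields\":"
>                 + "[{\"name\":\"f\",\"type\":\"string\"}]}");
>         GenericRecord record = new GenericData.Record(schema);
>         record.put("f", "value");
>
>         // Same writer/codec combination the processor uses via AvroParquetHDFSRecordWriter
>         try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
>                 .<GenericRecord>builder(new Path("/tmp/snappy-test.parquet"))
>                 .withSchema(schema)
>                 .withCompressionCodec(CompressionCodecName.SNAPPY)
>                 .build()) {
>             writer.write(record);
>         } // close() flushes the row group; the error below surfaces here
>     }
> }
> {code}
>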
> To keep the testing lean, the default configuration was generally used:
>  * nifi-env.sh file has the default configuration.
>  * bootstrap.conf file has the default configuration.
>  * nifi.properties file has the default configuration, except for the security configuration.
>  * _PutParquet_ Processor has the default configuration (but SNAPPY compression is not working).
>  * _ConvertAvroToParquet_ Processor has the default configuration (SNAPPY compression works correctly).
>  * There are no custom processors in our NiFi environment.
>  * There is no custom lib location in the NiFi properties.
>  
> Error Log (nifi-app.log):
> {noformat}
> ERROR [Timer-Driven Process Thread-12] o.a.nifi.processors.parquet.PutParquet PutParquet[id=6caab337-68e8-3834-b64a-1d2cbd93aba8] Failed to write due to java.lang.IncompatibleClassChangeError: Class org.xerial.snappy.SnappyNative does not implement the requested interface org.xerial.snappy.SnappyApi: java.lang.IncompatibleClassChangeError: Class org.xerial.snappy.SnappyNative does not implement the requested interface org.xerial.snappy.SnappyApi
> java.lang.IncompatibleClassChangeError: Class org.xerial.snappy.SnappyNative does not implement the requested interface org.xerial.snappy.SnappyApi
>         at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:380)
>         at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)
>         at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
>         at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
>         at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:167)
>         at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:168)
>         at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:59)
>         at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:387)
>         at org.apache.parquet.column.impl.ColumnWriteStoreBase.flush(ColumnWriteStoreBase.java:186)
>         at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:29)
>         at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:185)
>         at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:124)
>         at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:319)
>         at org.apache.nifi.parquet.hadoop.AvroParquetHDFSRecordWriter.close(AvroParquetHDFSRecordWriter.java:49)
>         at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:534)
>         at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:466)
>         at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:326)
>         at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2466)
>         at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2434)
>         at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:303)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1822)
>         at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:271)
>         at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>         at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1202)
>         at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
>         at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:103)
>         at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
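>
> The java.lang.IncompatibleClassChangeError above typically means that org.xerial.snappy.SnappyNative and org.xerial.snappy.SnappyApi are being resolved from two different snappy-java versions visible to the same classloader. As a diagnostic sketch (this is my assumption about the cause, not a confirmed analysis), a check like the following, run with the same classpath as the failing processor, prints which jar each class is actually loaded from:
> {code:java}
> public class SnappyClasspathCheck {
>     public static void main(String[] args) throws Exception {
>         Class<?> nativeCls = Class.forName("org.xerial.snappy.SnappyNative");
>         Class<?> apiCls = Class.forName("org.xerial.snappy.SnappyApi");
>         // getCodeSource() can return null for bootstrap classes; both of these live in jars
>         System.out.println("SnappyNative from: "
>                 + nativeCls.getProtectionDomain().getCodeSource().getLocation());
>         System.out.println("SnappyApi from: "
>                 + apiCls.getProtectionDomain().getCodeSource().getLocation());
>         System.out.println("SnappyNative implements SnappyApi: "
>                 + apiCls.isAssignableFrom(nativeCls));
>     }
> }
> {code}
> If the two locations differ, aligning the bundled snappy-java jars on a single version should remove the mismatch.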


