Posted to users@nifi.apache.org by Dmitry Goldenberg <dg...@hexastax.com> on 2016/03/30 14:03:12 UTC

Having trouble with PutHDFS as Avro, possibly due to codec

Hi,

I've got a dataflow that successfully writes Avro into a directory in
HDFS.  Avro-tools is able to read those Avro files, so that part seems
fine.

Now I'm trying to create a table in Hive over the imported files using my
schema, but the CREATE statement just hangs in the Hive query editor (and
the same happens via the 'hive' command line).

Comparing against another import, I can see that the Avro differs in its
compression codec: the ConvertJSONToAvro processor in my NiFi dataflow
wrote the files with the codec set to 'snappy'.
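
To confirm what actually got written, the codec is recorded in each Avro
file's header metadata.  Here's a minimal sketch using the plain Avro Java
API to print it (the local file path is just a placeholder for one of the
files copied out of HDFS):

    import java.io.File;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class CheckAvroCodec {
        public static void main(String[] args) throws Exception {
            // Placeholder: one of the Avro files NiFi wrote
            File avroFile = new File("activations-part-0.avro");
            try (DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                     avroFile, new GenericDatumReader<GenericRecord>())) {
                // "avro.codec" lives in the file header metadata
                System.out.println("codec: " + reader.getMetaString("avro.codec"));
            }
        }
    }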

Here's the create statement I'm trying:

    CREATE EXTERNAL TABLE activations
    STORED AS AVRO
    LOCATION '/demo/xml-etl/activations'
    TBLPROPERTIES ('avro.schema.url'=
        'hdfs:/demo/xml-etl/activations-schema/activations-1.avsc',
        'avro.output.codec'='snappy');

I've also tried setting the codec as

    set avro.output.codec=snappy;

(originally I had tried running the CREATE statement with no codec
specification).

Has anyone else encountered this issue?  Is there any way to set a
different codec when converting to Avro in NiFi?

I see that ConvertJSONToAvro simply sets the codec like so:

    writer.setCodec(CodecFactory.snappyCodec());

On the Hive/Hadoop side, I'm running Cloudera 2.6.0-cdh5.4.3.
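
If the setCodec() call in ConvertJSONToAvro were configurable, presumably
any codec from Avro's CodecFactory could be plugged in.  Purely as an
illustration of the underlying Avro API (not an existing NiFi property),
writing the same data with deflate instead of snappy would look roughly
like this:

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.CodecFactory;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class WriteAvroWithDeflate {
        public static void main(String[] args) throws Exception {
            // Placeholder: the same .avsc the Hive table points at
            Schema schema = new Schema.Parser().parse(new File("activations-1.avsc"));
            DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
            // Any CodecFactory works: deflateCodec(level), nullCodec(), snappyCodec(), ...
            writer.setCodec(CodecFactory.deflateCodec(6));
            writer.create(schema, new File("activations-deflate.avro"));
            // records would be appended here with writer.append(record)
            writer.close();
        }
    }

So it seems the codec would have to be exposed as a processor property (or
the conversion done outside ConvertJSONToAvro) to end up with anything
other than snappy.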

Thanks,
- Dmitry