Posted to user@beam.apache.org by "Gershi, Noam " <no...@citi.com> on 2020/04/27 13:26:36 UTC

HCatalogIO - Trying to read table metadata (column names and indexes)

Hi
Using HCatalogIO as a source, I am trying to read a table's columns (names and indexes).

Code:

PCollection<HCatRecord> hcatRecords = input
                .apply(HCatalogIO.read()
                        .withConfigProperties(configProperties)
                        .withDatabase("db-name")
                        .withTable("my-table-name"));
...
HCatalogBeamSchema hcatSchema = HCatalogBeamSchema.create(ImmutableMap.of("table", "my-table-name"));
Schema schema = hcatSchema.getTableSchema("db-name", "my-table-name").get();
List<Schema.Field> fields = schema.getFields();
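
For clarity, the goal once getTableSchema() succeeds is roughly the following (illustrative only; Schema.indexOf is the standard Beam accessor for a field's position):

// Illustrative: enumerate column names and their indexes from the Beam Schema.
for (Schema.Field field : fields) {
    int index = schema.indexOf(field.getName());
    System.out.println(field.getName() + " -> " + index);
}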


I get:

20/04/27 09:12:16 INFO LineBufferedStream: Caused by: java.lang.UnsupportedOperationException: The type 'decimal(30,16)' of field 'amount' is not supported.
20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.SchemaUtils.toBeamField(SchemaUtils.java:60)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.SchemaUtils.toBeamSchema(SchemaUtils.java:53)
20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.HCatalogBeamSchema.getTableSchema(HCatalogBeamSchema.java:83)

Thanks in advance,
Noam


RE: HCatalogIO - Trying to read table metadata (column names and indexes)

Posted by "Gershi, Noam " <no...@citi.com>.
Thank you Rahul


From: rahul patwari <ra...@gmail.com>
Sent: Tuesday, April 28, 2020 4:21 PM
To: user
Subject: Re: HCatalogIO - Trying to read table metadata (column names and indexes)

Hi Noam,

Currently, Beam doesn't support converting HCatRecords to Rows, or, in your case, creating a Beam Schema from the Hive table schema, when the Hive table has parameterized types.

We can use HCatFieldSchema[1] to create the Beam Schema from the Hive table Schema.
I have created a JIRA ticket to track this issue: https://issues.apache.org/jira/browse/BEAM-9840

[1]: https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/schema/HCatFieldSchema.java#L34

PS: I am working on this feature; it should be supported in a future release of Apache Beam.

Regards,
Rahul

On Mon, Apr 27, 2020 at 6:57 PM Gershi, Noam <no...@citi.com> wrote:
Hi
Using HCatalogIO as a source, I am trying to read a table's columns (names and indexes).

Code:

PCollection<HCatRecord> hcatRecords = input
                .apply(HCatalogIO.read()
                        .withConfigProperties(configProperties)
                        .withDatabase("db-name")
                        .withTable("my-table-name"));
...
HCatalogBeamSchema hcatSchema = HCatalogBeamSchema.create(ImmutableMap.of("table", "my-table-name"));
Schema schema = hcatSchema.getTableSchema("db-name", "my-table-name").get();
List<Schema.Field> fields = schema.getFields();


I get:

20/04/27 09:12:16 INFO LineBufferedStream: Caused by: java.lang.UnsupportedOperationException: The type 'decimal(30,16)' of field 'amount' is not supported.
20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.SchemaUtils.toBeamField(SchemaUtils.java:60)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.SchemaUtils.toBeamSchema(SchemaUtils.java:53)
20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.HCatalogBeamSchema.getTableSchema(HCatalogBeamSchema.java:83)

Thanks in advance,
Noam


Re: HCatalogIO - Trying to read table metadata (column names and indexes)

Posted by rahul patwari <ra...@gmail.com>.
Hi Noam,

Currently, Beam doesn't support converting HCatRecords to Rows, or, in
your case, creating a Beam Schema from the Hive table schema, when the
Hive table has parameterized types.

We can use HCatFieldSchema[1] to create the Beam Schema from the Hive table
Schema.
I have created a JIRA ticket to track this issue:
https://issues.apache.org/jira/browse/BEAM-9840

[1]:
https://github.com/apache/hive/blob/f37c5de6c32b9395d1b34fa3c02ed06d1bfbf6eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/schema/HCatFieldSchema.java#L34
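
To illustrate the idea, here is a rough sketch (not the eventual Beam
implementation) of deriving a Beam Schema from HCatFieldSchema. It
assumes getTypeInfo() returns the field's PrimitiveTypeInfo for
primitive columns, so parameterized types like decimal(30,16) can be
matched by category instead of by their raw type string:

import org.apache.beam.sdk.schemas.Schema;
import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;

class HCatToBeamSketch {
  // Build a Beam Schema from an HCatSchema, field by field.
  static Schema toBeamSchema(HCatSchema hCatSchema) {
    Schema.Builder builder = Schema.builder();
    for (HCatFieldSchema field : hCatSchema.getFields()) {
      builder.addField(field.getName(), toBeamType(field));
    }
    return builder.build();
  }

  // Switch on the TypeInfo category, which carries precision/scale for
  // decimals and length for char/varchar, rather than on the type
  // string that SchemaUtils currently fails on.
  static Schema.FieldType toBeamType(HCatFieldSchema field) {
    PrimitiveTypeInfo typeInfo = (PrimitiveTypeInfo) field.getTypeInfo();
    switch (typeInfo.getPrimitiveCategory()) {
      case DECIMAL:
        return Schema.FieldType.DECIMAL; // e.g. decimal(30,16)
      case CHAR:
      case VARCHAR:
        return Schema.FieldType.STRING; // char(n)/varchar(n)
      case INT:
        return Schema.FieldType.INT32;
      case LONG:
        return Schema.FieldType.INT64;
      case DOUBLE:
        return Schema.FieldType.DOUBLE;
      case STRING:
        return Schema.FieldType.STRING;
      default:
        throw new UnsupportedOperationException(
            "Type " + typeInfo.getTypeName() + " is not handled in this sketch");
    }
  }
}

The real implementation would also need to handle complex types (struct,
array, map) and nullability; the sketch only covers primitive columns.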

PS: I am working on this feature; it should be supported in a future
release of Apache Beam.

Regards,
Rahul

On Mon, Apr 27, 2020 at 6:57 PM Gershi, Noam <no...@citi.com> wrote:

> Hi
>
> Using HCatalogIO as a source, I am trying to read a table's columns
> (names and indexes).
>
> Code:
>
> PCollection<HCatRecord> hcatRecords = input
>                 .apply(HCatalogIO.read()
>                         .withConfigProperties(configProperties)
>                         .withDatabase("db-name")
>                         .withTable("my-table-name"));
> ...
> HCatalogBeamSchema hcatSchema =
>     HCatalogBeamSchema.create(ImmutableMap.of("table", "my-table-name"));
> Schema schema =
>     hcatSchema.getTableSchema("db-name", "my-table-name").get();
> List<Schema.Field> fields = schema.getFields();
>
> I get:
>
> 20/04/27 09:12:16 INFO LineBufferedStream: Caused by: java.lang.UnsupportedOperationException: The type 'decimal(30,16)' of field 'amount' is not supported.
> 20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.SchemaUtils.toBeamField(SchemaUtils.java:60)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.SchemaUtils.toBeamSchema(SchemaUtils.java:53)
> 20/04/27 09:12:16 INFO LineBufferedStream:      at org.apache.beam.sdk.io.hcatalog.HCatalogBeamSchema.getTableSchema(HCatalogBeamSchema.java:83)
>
> Thanks in advance,
>
> Noam