Posted to issues@beam.apache.org by "Alexey Romanenko (JIRA)" <ji...@apache.org> on 2019/03/25 17:49:00 UTC

[jira] [Assigned] (BEAM-6874) HCatalogTableProvider always reads all rows

     [ https://issues.apache.org/jira/browse/BEAM-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Romanenko reassigned BEAM-6874:
--------------------------------------

    Assignee: Alexey Romanenko

> HCatalogTableProvider always reads all rows
> -------------------------------------------
>
>                 Key: BEAM-6874
>                 URL: https://issues.apache.org/jira/browse/BEAM-6874
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql, io-java-hcatalog
>    Affects Versions: 2.11.0
>            Reporter: Near
>            Assignee: Alexey Romanenko
>            Priority: Major
>         Attachments: limit.png
>
>
> Hi,
> I'm using HCatalogTableProvider with SqlTransform.query. The query is something like "select * from `hive`.`table_name` limit 10". Despite the LIMIT clause, the source still reads many more rows than the limit (the Hive table's data are files on S3), even more than the number of rows in a single file (or partition). This suggests the limit is not pushed down into the read.
>  
> Some more details:
>  # It is running on Flink.
>  # I implemented my own HiveTableProvider because HCatalogBeamSchema only supports primitive types. That custom provider does work, though, when I query a small table with ~1k rows.
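> The behavior described above is consistent with the LIMIT being applied downstream of a full table scan, rather than being pushed into the source so it can stop reading early. A minimal, self-contained Java sketch of that difference (hypothetical names only; this is an illustration of the concept, not Beam's actual SQL or HCatalog API):
>
> ```java
> import java.util.ArrayList;
> import java.util.List;
>
> /** Illustrative simulation: why a LIMIT does not reduce I/O without pushdown. */
> public class LimitPushdownSketch {
>
>     /** Counts how many rows the simulated "source" actually read from storage. */
>     static int rowsRead;
>
>     /** No limit pushdown: the source scans every row; LIMIT is applied afterwards. */
>     static List<Integer> scanThenLimit(List<Integer> table, int limit) {
>         rowsRead = 0;
>         List<Integer> all = new ArrayList<>();
>         for (int row : table) {
>             rowsRead++;          // every row is read, regardless of the limit
>             all.add(row);
>         }
>         // LIMIT applied downstream, after the full scan has already happened
>         return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
>     }
>
>     /** With limit pushdown: the source itself stops after `limit` rows. */
>     static List<Integer> limitPushedDown(List<Integer> table, int limit) {
>         rowsRead = 0;
>         List<Integer> out = new ArrayList<>();
>         for (int row : table) {
>             if (out.size() >= limit) {
>                 break;           // the source honors the limit and reads no further
>             }
>             rowsRead++;
>             out.add(row);
>         }
>         return out;
>     }
>
>     public static void main(String[] args) {
>         List<Integer> table = new ArrayList<>();
>         for (int i = 0; i < 1000; i++) {
>             table.add(i);
>         }
>
>         scanThenLimit(table, 10);
>         System.out.println("no pushdown, rows read:   " + rowsRead);
>
>         limitPushedDown(table, 10);
>         System.out.println("with pushdown, rows read: " + rowsRead);
>     }
> }
> ```
>
> Both variants return the same 10 rows, but without pushdown the source reads all 1000, which matches the symptom reported here: the query result respects the LIMIT while the read still scans far more data.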



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)