Posted to commits@hudi.apache.org by "David Palmer (Jira)" <ji...@apache.org> on 2023/01/30 02:27:00 UTC

[jira] [Created] (HUDI-5648) Deltastreamer Transformer output cannot be used in partitioning

David Palmer created HUDI-5648:
----------------------------------

             Summary: Deltastreamer Transformer output cannot be used in partitioning 
                 Key: HUDI-5648
                 URL: https://issues.apache.org/jira/browse/HUDI-5648
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: David Palmer


When using a Deltastreamer Transformer, the columns produced by the Transformer cannot be used as partition path fields.

In a test, I used the following configs:

 
{noformat}
hoodie.deltastreamer.transformer.sql=SELECT a.*, from_unixtime(timestamp, 'yyyy') as year, from_unixtime(timestamp, 'MM') as month, from_unixtime(timestamp, 'dd') as day, from_unixtime(timestamp, 'HH') as hour FROM <SRC> a

hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.partitionpath.field=year,month,day,hour
{noformat}
I expect the data files in the output DFS to be laid out like this:
{noformat}
/path/to/dfs/table/<year>/<month>/<day>/<hour>/
e.g.:
s3://test-bucket/table/2023/01/30/15/
{noformat}
Instead, I get the following structure:
{noformat}
/path/to/dfs/table/__HIVE_DEFAULT_PARTITION__/__HIVE_DEFAULT_PARTITION__/__HIVE_DEFAULT_PARTITION__/__HIVE_DEFAULT_PARTITION__/
{noformat}
I would expect the output of Transformers to be available for partitioning just like any other column in the dataset.
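
To make the expected layout concrete, here is a minimal Python sketch of how the partition path would be derived from the timestamp column if the Transformer output were honored. It is only an illustration, not Hudi code: the helper name expected_partition_path and the sample epoch value are hypothetical, and UTC is assumed, whereas Spark's from_unixtime uses the session time zone.

```python
from datetime import datetime, timezone


def expected_partition_path(epoch_seconds: int) -> str:
    """Build the <year>/<month>/<day>/<hour> path described in this report."""
    # Assumption: UTC. Spark's from_unixtime uses the session time zone instead.
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # %Y, %m, %d, %H correspond to the 'yyyy', 'MM', 'dd', 'HH' patterns
    # used in the transformer SQL above.
    return ts.strftime("%Y/%m/%d/%H")


# Hypothetical sample value: 2023-01-30 15:00:00 UTC
print(expected_partition_path(1675090800))  # prints 2023/01/30/15
```

A record with that timestamp would therefore be expected under table/2023/01/30/15/ rather than under the default partition.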

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)