Posted to dev@gobblin.apache.org by "Jay Sen (Jira)" <ji...@apache.org> on 2021/03/12 23:17:00 UTC

[jira] [Updated] (GOBBLIN-1399) provide a way to specify writer path and file name format via config

     [ https://issues.apache.org/jira/browse/GOBBLIN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Sen updated GOBBLIN-1399:
-----------------------------
    Summary: provide a way to specify writer path and file name format via config  (was: provide a way to speficy writer path and file name format via config)

> provide a way to specify writer path and file name format via config
> --------------------------------------------------------------------
>
>                 Key: GOBBLIN-1399
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1399
>             Project: Apache Gobblin
>          Issue Type: New Feature
>          Components: gobblin-api
>    Affects Versions: 0.15.0
>            Reporter: Jay Sen
>            Assignee: Hung Tran
>            Priority: Major
>             Fix For: 0.16.0
>
>
> Currently, Gobblin hard-codes the specification of the writer's path and file name.
> It offers only three ways to build them: namespace plus table name, table name only, and a default.
> {code:java}
> switch (getWriterFilePathType(state)) {
>   case NAMESPACE_TABLE:
>     // writer.file.path.format = <extract.namespace>/<extract.table.name>/
>     return getNamespaceTableWriterFilePath(state);
>   case TABLENAME:
>     // <extract.table.name>
>     return WriterUtils.getTableNameWriterFilePath(state);
>   default:
>     return WriterUtils.getDefaultWriterFilePath(state, numBranches, branchId);
> }
> {code}
>  
>  Filename:
> {code:java}
> namespace.replaceAll("\\.", "/") + "/" + table + "/" + extractId + "_"
>     + (isFull ? "full" : "append");
> {code}
>  
> There is no way for a user to add other parameters, such as version or batchId.
> It would also be useful to let any configuration value be part of the writer path, defined by a format like this:
>  
> {code}
> extract.type = increments
> writer.file.path.format="<extract.table.name>/<extract.extract.id>/<extract.type>"
> writer.file.name.format="part.<writer_id>_batch_<dataset.batch_id>.<branch_id>.<format_extension>"
> {code}
> Note that the values (like "dataset.batch_id") come from the runtime config (state.getProp()), so this allows flexible paths and file names based on your use case.
> This will be enabled by a feature flag, so existing functionality remains the same.
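
The placeholder expansion described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Gobblin's implementation: the class WriterFormatResolver and its resolve method are hypothetical names, a plain Map stands in for the runtime state (state.getProp()), and failing on an unknown key is one possible policy among several.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Gobblin): expands <key> placeholders from a config map. */
final class WriterFormatResolver {
  // Matches <key>, where key is made of letters, digits, dots, underscores, or hyphens.
  private static final Pattern PLACEHOLDER = Pattern.compile("<([A-Za-z0-9_.\\-]+)>");

  private WriterFormatResolver() {}

  /**
   * Replaces every <key> in the format with props.get(key).
   * Assumption: a missing key is treated as a configuration error.
   */
  static String resolve(String format, Map<String, String> props) {
    Matcher m = PLACEHOLDER.matcher(format);
    StringBuilder out = new StringBuilder();
    int last = 0;
    while (m.find()) {
      String key = m.group(1);
      String value = props.get(key);
      if (value == null) {
        throw new IllegalArgumentException("No value configured for placeholder <" + key + ">");
      }
      // Copy the literal text before the placeholder, then the resolved value.
      out.append(format, last, m.start()).append(value);
      last = m.end();
    }
    // Copy whatever literal text follows the last placeholder.
    out.append(format, last, format.length());
    return out.toString();
  }
}
```

With a map holding extract.table.name, extract.extract.id and extract.type, calling WriterFormatResolver.resolve on the writer.file.path.format value above would yield the three values joined by "/". Text outside angle brackets is copied through unchanged, so literal prefixes such as "part." stay as written.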



--
This message was sent by Atlassian Jira
(v8.3.4#803005)