Posted to user@sqoop.apache.org by Marcelo Valle <ma...@ktech.com> on 2019/05/15 16:13:21 UTC

import from 2 DB tables to 1 nested parquet file

Hi,

I have 2 tables in Oracle, TABLE_MASTER and TABLE_CHILD. Each has its own
ID column, and TABLE_CHILD has a MASTER_ID column with an FK to the master
table.

I would like to import these 2 tables incrementally every day to parquet
files on HDFS. However, I don't want a separate parquet file for each
table. I would like a single parquet file (meaning a single metadata
schema) where the child rows are a list inside the master rows.

For example, if the tables are:

MASTER
-------
ID: INT
M_FIELD1: String

CHILD
-------
ID: INT
MASTER_ID: INT
C_FIELD1: INT

I would expect the target parquet file schema to be something like this:

MasterType
-------
ID : INT
M_FIELD1: String
CHILDREN: ListOf<ChildType>

ChildType
-------
ID: INT
C_FIELD1: INT
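
To make the target shape concrete, here is a minimal Python sketch of the
nesting I have in mind. It is not Sqoop code and it does not write parquet;
the function name nest_children and the sample rows are made up purely for
illustration, and I am assuming MASTER_ID is only needed as the join key.

```python
from collections import defaultdict


def nest_children(master_rows, child_rows):
    """Group child rows under their master row using MASTER_ID."""
    children_by_master = defaultdict(list)
    for child in child_rows:
        # MASTER_ID is used only as the join key and is dropped from the
        # nested child record, matching ChildType above.
        children_by_master[child["MASTER_ID"]].append(
            {"ID": child["ID"], "C_FIELD1": child["C_FIELD1"]}
        )
    return [
        {
            "ID": master["ID"],
            "M_FIELD1": master["M_FIELD1"],
            # A master with no children gets an empty list.
            "CHILDREN": children_by_master.get(master["ID"], []),
        }
        for master in master_rows
    ]


# Made-up sample rows standing in for MASTER and CHILD.
masters = [{"ID": 1, "M_FIELD1": "a"}, {"ID": 2, "M_FIELD1": "b"}]
children = [
    {"ID": 10, "MASTER_ID": 1, "C_FIELD1": 100},
    {"ID": 11, "MASTER_ID": 1, "C_FIELD1": 101},
]
nested = nest_children(masters, children)
```

Each element of nested corresponds to one MasterType record, with its
ChildType rows collected under CHILDREN.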

The closest thing I found in the docs was free-form query imports, but the
docs say nothing about the schema generated in the result file.

Can anyone confirm whether this is possible, and whether free-form queries
are the way to go?

Thanks,
Marcelo.
