Posted to dev@sqoop.apache.org by Harish Mandala <mv...@gmail.com> on 2013/01/24 00:21:24 UTC

Exporting some fairly complex json to sql server

Hello,

I've been playing with Sqoop, and it seems to fit my use case (to export
some log data from HDFS to Microsoft SQL Server).
A look at the documentation shows that Sqoop exports and imports between
HDFS files and database tables with matching schemas. However, my data
export is more complicated. Allow me to describe it. I have JSON strings
stored in Hadoop SequenceFiles, with each string keyed by timestamp. Each
JSON string is similar to the following:

[
    {
        "Unique_Key": "123",
        "Timestamp": "8948024",
        "Inner_Array": [
            {
                "Name": "XYZ1",
                "Value": "abc1"
            },
            {
                "Name": "XYZ2",
                "Value": "abc2"
            }
        ]
    },
    {
        "Unique_Key": "456",
        "Timestamp": "89489802",
        "Inner_Array": [
            {
                "Name": "JDJ1",
                "Value": "sfj"
            }
        ]
    }
]

Each string represents an array of objects; the "Unique_Key" and
"Timestamp" of each object correspond to a row in one SQL table (let's
call it Table A). Each object also contains an "Inner_Array" - each
element of this Inner_Array needs to go into another SQL table (Table B)
and will be associated with Table A using the Unique_Key as a foreign key.

So, the schema of the two SQL tables will be:

Table A:
Unique_Key (Primary Key) | Timestamp

Table B:
Unique_Key (Foreign Key) | Name | Value
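
To make the split concrete, here is a rough, untested sketch (assuming
Jackson purely for illustration - nothing in my setup dictates the
library, and the class and field names are made up) that flattens one
such JSON string into tab-delimited rows matching the two tables:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;

public class JsonSplitter {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Flattened output: one tab-separated line per row.
    public static class SplitResult {
        public final List<String> tableARows = new ArrayList<String>(); // Unique_Key \t Timestamp
        public final List<String> tableBRows = new ArrayList<String>(); // Unique_Key \t Name \t Value
    }

    public static SplitResult split(String json) throws Exception {
        SplitResult result = new SplitResult();
        JsonNode root = MAPPER.readTree(json);   // top level is an array of objects
        for (JsonNode obj : root) {
            String key = obj.get("Unique_Key").asText();
            result.tableARows.add(key + "\t" + obj.get("Timestamp").asText());
            for (JsonNode inner : obj.get("Inner_Array")) {
                result.tableBRows.add(key + "\t"
                        + inner.get("Name").asText() + "\t"
                        + inner.get("Value").asText());
            }
        }
        return result;
    }
}

The two row lists could be written out as delimited text files in HDFS
and then pushed to Table A and Table B with ordinary Sqoop exports.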

If I wanted to implement this functionality in Sqoop (placing nested JSON
into multiple tables), it seems I would first need to implement a "JSON
parser" in lib and add schema-mapping specifications to the configuration.
We would also need to provide an option for parser selection. Is there
anything I am missing? Any comments? Is anyone already implementing this
functionality?

Thanks for your patient reading,
Harish

RE: Exporting some fairly complex json to sql server

Posted by Venkatesan Ranganathan <n....@live.com>.
Hi Harish
Maybe you can do it in two steps: an MR job to split your files into two separate files, and then load them with Sqoop? I don't think Sqoop supports this today. Also, Sqoop can only export into one table at a time.
Thanks
Venkat
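
Roughly, the split step could be a map-only job with two named outputs.
Just a sketch, untested; it assumes the SequenceFile values are Text JSON
strings and reuses the illustrative JsonSplitter helper from your earlier
sketch:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Map-only job: reads the SequenceFile (key type assumed to be a
// LongWritable timestamp; adjust to whatever was actually written) and
// fans each record out to two named outputs, one per target table.
// JsonSplitter is the illustrative helper from the earlier sketch.
public class SplitMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            JsonSplitter.SplitResult r = JsonSplitter.split(value.toString());
            for (String row : r.tableARows) {
                mos.write("tableA", NullWritable.get(), new Text(row));
            }
            for (String row : r.tableBRows) {
                mos.write("tableB", NullWritable.get(), new Text(row));
            }
        } catch (Exception e) {
            throw new IOException("Bad JSON record at key " + key, e);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}

The named outputs would be registered in the driver with
MultipleOutputs.addNamedOutput(job, "tableA", TextOutputFormat.class,
NullWritable.class, Text.class) (and likewise for "tableB"), with the
number of reduce tasks set to 0. Each output directory could then be
exported on its own, e.g. sqoop export --connect <jdbc-url> --table TableA
--export-dir <dir-for-tableA> --input-fields-terminated-by '\t', one table
per invocation.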
