You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apex.apache.org by Bhupesh Chawda <bh...@datatorrent.com> on 2015/12/09 14:08:26 UTC

Changes to JdbcOutputOperator

Hi All,

Would it be a good idea to introduce the update functionality to the JDBC
output operator in Apache Apex Malhar library.

The following are possible approaches:

   1. Accept a update query from the user with place holders for values.
   Example: *update tableName set a = ?, b = ? where c = ? and d > ?*. Here
   "?" will be provided by the user as java expressions which will be
   evaluated from the incoming tuple.
   2. Another option is to accept in some configuration format (json / xml)
   the following and formulate the query in the operator. This can become
   arbitrarily complex.
   1. update clause columns
      2. update clause expressions
      3. where clause columns
      4. where clause expressions

I am thinking about going ahead with 1. Please let me know if any other
option is possible and whether such a functionality already exists in some
other class.

Thanks.

-Bhupesh

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi All,

In the pull request review, Sandeep pointed out that it would be good to
have an error port which can be used to redirect error tuples which cause
some SQL exception from the database.

Here is the design considered for implementing this:

   - Add an error port to *AbstractTransactionableStoreOutputOperator*
   - In *AbstractJdbcTransactionableOutputOperator*, when executing
   *executeBatch*() for a prepared statement do the following:
      - Catch the *BatchUpdateException*
      - Identify whether the JDBC driver continued executing the batch in
      spite of failures*, *or whether it stopped after the first failure.
      Please see
      https://docs.oracle.com/javase/7/docs/api/java/sql/BatchUpdateException.html
      for more details. In each case, pick out the erroneous records
and redirect
      them to the error port.
      - In case the JDBC driver decided to stop after the first failure,
      update the batch start index to skip the error record and resume
processing
      other records in the batch.

I will add these changes as another commit in the same pull request.

Thanks.

~Bhupesh

On Thu, Mar 17, 2016 at 7:22 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi,
>
> I have made the changes as discussed in previous emails.
> Here is the pull request:
> https://github.com/apache/incubator-apex-malhar/pull/215
>
> Please help review the changes.
>
> Thanks
> ~Bhupesh
>
> On Tue, Mar 15, 2016 at 6:57 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
>> Hi,
>>
>> Here are some of the details on what happens when automatic mapping is
>> done from POJO fields to DB columns ( i.e. when FieldInfo is not provided)
>>
>>    - The reference list of columns is *all* the columns in the database
>>    table.
>>    - If the same field is found in POJO as in DB table, then we include
>>    it in the columns to be inserted and create the corresponding getter from
>>    POJO.
>>    - If the same field is not found in the POJO, but the field is a
>>    *nullable* field in the DB, then we drop it from the insert column
>>    list. This means we essentially insert a *null* value for that field.
>>    - If the same field is not found in the POJO and the field is a
>>    *non-nullable* field in the DB, then we throw an exception.
>>
>> ~Bhupesh
>>
>> On Tue, Mar 15, 2016 at 2:57 PM, Bhupesh Chawda <bh...@datatorrent.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Here is the design I am considering for including the additional feature
>>> as mentioned in the previous email. Please note that this includes the
>>> previous feedback received on the design.
>>>
>>>    1. Rename JdbcPOJOOutputOperator -> *AbstractJdbcPOJOOutputOperator*
>>>    2. Add *JdbcPOJOInsertOutputOperator* extending from
>>>    *AbstractJdbcPOJOOutputOperator*. This will be used for "*insert*"
>>>    output query.
>>>       1. If FieldInfo (Mapping of POJO fields to DB columns) is
>>>       provided by the user - then use only these columns to build the insert SQL
>>>       string. Return this insert query as the update command.
>>>       2. If FieldInfo is not provided by the user - then assume a
>>>       direct mapping between POJO fields and the DB columns. Extract list of
>>>       columns from DB and populate all columns from POJO fields. The POJO fields
>>>       will be assumed to be the same as DB column names.
>>>    3. Add *JdbcPOJONonInsertOutputOperator *extending from
>>>    *AbstractJdbcPOJOOutputOperator*. This will be used for "*update /
>>>    merge / delete*" output queries. [This design stays the same as
>>>    discussed previously]
>>>    1. Accept a parametrized SQL string
>>>       2. Use the FieldInfo provided by the user to populate the
>>>       parameters in the query.
>>>       3. Execute the query.
>>>
>>> Please provide your thoughts on this.
>>>
>>> Thanks.
>>> ~Bhupesh
>>>
>>>
>>> On Mon, Mar 14, 2016 at 6:29 PM, Priyanka Gugale <
>>> priyanka@datatorrent.com> wrote:
>>>
>>>> +1, this is simplifying operator configuration, we should do same for
>>>> other
>>>> DB operators as well.
>>>>
>>>> -Priyanka
>>>>
>>>> On Thu, Mar 10, 2016 at 5:34 PM, Bhupesh Chawda <
>>>> bhupesh@datatorrent.com>
>>>> wrote:
>>>>
>>>> > Hi All,
>>>> >
>>>> > Apart from extending the JDBC Output operator (for update, merge and
>>>> > delete), I am planning the following change to the "*insert*"
>>>> > functionality:
>>>> >
>>>> >
>>>> >    - It was pointed out that - if the incoming POJO has fields which
>>>> are to
>>>> >    be inserted into a database table, the user should not have to
>>>> specify
>>>> > any
>>>> >    kind of mapping between POJO columns and database table column
>>>> names.
>>>> > This
>>>> >    is possible if the POJO field names are same as the db column
>>>> names.
>>>> >
>>>> >
>>>> >    - Here is what I am planning to do to address this:
>>>> >       - Have a flag which indicates "*No mapping required. Assume
>>>> same POJO
>>>> >       field names as the DB column names.*"
>>>> >       - If this flag is true, extract DB column names (in addition to
>>>> the
>>>> >       types, which is done currently) and create getters from POJO
>>>> based
>>>> > on the
>>>> >       DB column types.
>>>> >       - Use these getters to populate the parametrized "*insert*"
>>>> query.
>>>> >
>>>> >
>>>> > Note - Once this is done, the user would not have to specify any field
>>>> > descriptions. Just the DB table name would be sufficient. This would
>>>> only
>>>> > be possible if the DB column names are the same as the field names in
>>>> the
>>>> > POJO.
>>>> > Without this change, "*insert*" will be considered as just another DDL
>>>> > query like "update" / "delete" / "merge". We accept a parametrized
>>>> query
>>>> > and a mapping from the user for POJO field names to DB column names +
>>>> DB
>>>> > column types.
>>>> >
>>>> > Thanks.
>>>> > Bhupesh
>>>> >
>>>> >
>>>> > On Wed, Dec 30, 2015 at 12:56 PM, Bhupesh Chawda <
>>>> bhupesh@datatorrent.com>
>>>> > wrote:
>>>> >
>>>> > > Also, since we are not creating separate operators for insert and
>>>> > > update/merge, it seems we don't need an abstract class. We can
>>>> directly
>>>> > > modify JdbcPOJOOutputOperator.
>>>> > >
>>>> > > -Bhupesh
>>>> > >
>>>> > > On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <
>>>> > bhupesh@datatorrent.com>
>>>> > > wrote:
>>>> > >
>>>> > >> Also, a note regarding accepting fieldInfo objects:
>>>> > >>
>>>> > >> FieldInfo has fields for java type of the column, but not the SQL
>>>> data
>>>> > >> type which is needed to set the parameters in the SQL statement.
>>>> This
>>>> > still
>>>> > >> needs to be derived from the database (as in insert).
>>>> > >> In more complex scenarios, as mentioned earlier, this may not
>>>> always be
>>>> > >> possible. In this case we have the following options:
>>>> > >>
>>>> > >>    1. Accept the SQL data types from the user for the parameters
>>>> ("?")
>>>> > >>    in a new data structure.
>>>> > >>    2. Accept the JSON structure as specified earlier.
>>>> > >>    3. Modify FieldInfo to have an additional optional field which
>>>> > >>    indicates the SQL data type of the parameter ("?")
>>>> > >>
>>>> > >> I am planning to go ahead with  option 3.
>>>> > >>
>>>> > >> Comments?
>>>> > >>
>>>> > >> -Bhupesh
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <
>>>> > chandni@datatorrent.com>
>>>> > >> wrote:
>>>> > >>
>>>> > >>> Yeah that sounds good.
>>>> > >>>
>>>> > >>> Chandni
>>>> > >>>
>>>> > >>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <
>>>> > bhupesh@datatorrent.com
>>>> > >>> >
>>>> > >>> wrote:
>>>> > >>>
>>>> > >>> > Thanks Chandni for the comments.
>>>> > >>> >
>>>> > >>> > Additionally I think, we should do away with constructing the
>>>> SQL
>>>> > >>> query in
>>>> > >>> > the operator. This is because it is only possible in case of
>>>> simple
>>>> > >>> insert
>>>> > >>> > statements. In case a where clause is needed in the insert
>>>> statement,
>>>> > >>> we
>>>> > >>> > cannot construct the SQL easily. If we are accepting a
>>>> parametrized
>>>> > >>> query
>>>> > >>> > from the user for update/merge, why not do the same for insert
>>>> > >>> statements
>>>> > >>> > as well? Then the hierarchy would look like:
>>>> > >>> > - AbstractJdbcPOJOOutputOperator (As suggested)
>>>> > >>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert,
>>>> > update,
>>>> > >>> > merge, delete)
>>>> > >>> >
>>>> > >>> > We don't need a separate operator for inserts and update/merge.
>>>> > >>> >
>>>> > >>> > Thanks.
>>>> > >>> > -Bhupesh
>>>> > >>> >
>>>> > >>> >
>>>> > >>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <
>>>> > >>> chandni@datatorrent.com>
>>>> > >>> > wrote:
>>>> > >>> >
>>>> > >>> > >   1. FieldInfo, is ultimately a custom object and any user
>>>> who uses
>>>> > >>> this
>>>> > >>> > >    operator has to construct an object, populate it and then
>>>> use
>>>> > it.
>>>> > >>> We
>>>> > >>> > are
>>>> > >>> > >    trying to avoid using any custom object and allow any user
>>>> to
>>>> > use
>>>> > >>> the
>>>> > >>> > >    operator without writing any extra code; just
>>>> configuration.
>>>> > >>> > > FieldInfo is a way to provide configuration in a UI friendly
>>>> way.
>>>> > >>> > Providing
>>>> > >>> > > configuration as JSON is not UI friendly.
>>>> > >>> > >
>>>> > >>> > >    2. In case of update / merge, we need what SQL data types
>>>> that
>>>> > the
>>>> > >>> > >    expression provided by the user would evaluate to. In case
>>>> of
>>>> > >>> insert
>>>> > >>> > we
>>>> > >>> > > can
>>>> > >>> > >    go and fetch the data types of the columns directly from
>>>> DB.
>>>> > >>> However,
>>>> > >>> > > the
>>>> > >>> > >    same is not possible for custom expressions; "avg(salary)"
>>>> for
>>>> > >>> > instance.
>>>> > >>> > > Ok so here is where you can make a change.
>>>> > >>> > > - JDBCPojoOutput can be renamed to
>>>> AbstractJDBCPojoOutpuOperator.
>>>> > >>> > > - Abstraction is to fetch the type of column.
>>>> > >>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types
>>>> of
>>>> > >>> columns
>>>> > >>> > > directly from DB. Please note that FieldInfo can also provide
>>>> type
>>>> > of
>>>> > >>> > > derived column.
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > > You mentioned "We are trying to avoid using any custom object
>>>> and
>>>> > >>> allow
>>>> > >>> > any
>>>> > >>> > > user to use the
>>>> > >>> > >    operator without writing any extra code".
>>>> > >>> > > This I think is specific to your use case. You can create an
>>>> > >>> extension of
>>>> > >>> > > the above which takes JSON blob and creates FieldInfos from
>>>> it.
>>>> > >>> > >
>>>> > >>> > > Chandni
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > >
>>>> > >>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
>>>> > >>> bhupesh@datatorrent.com
>>>> > >>> > >
>>>> > >>> > > wrote:
>>>> > >>> > >
>>>> > >>> > > > Hi Chandni,
>>>> > >>> > > >
>>>> > >>> > > > Following are the issues:
>>>> > >>> > > >
>>>> > >>> > > >    1. FieldInfo, is ultimately a custom object and any user
>>>> who
>>>> > >>> uses
>>>> > >>> > this
>>>> > >>> > > >    operator has to construct an object, populate it and
>>>> then use
>>>> > >>> it. We
>>>> > >>> > > are
>>>> > >>> > > >    trying to avoid using any custom object and allow any
>>>> user to
>>>> > >>> use
>>>> > >>> > the
>>>> > >>> > > >    operator without writing any extra code; just
>>>> configuration.
>>>> > >>> > > >    2. In case of update / merge, we need what SQL data
>>>> types that
>>>> > >>> the
>>>> > >>> > > >    expression provided by the user would evaluate to. In
>>>> case of
>>>> > >>> insert
>>>> > >>> > > we
>>>> > >>> > > > can
>>>> > >>> > > >    go and fetch the data types of the columns directly from
>>>> DB.
>>>> > >>> > However,
>>>> > >>> > > > the
>>>> > >>> > > >    same is not possible for custom expressions;
>>>> "avg(salary)" for
>>>> > >>> > > instance.
>>>> > >>> > > >
>>>> > >>> > > > Thanks.
>>>> > >>> > > >
>>>> > >>> > > > -Bhupesh
>>>> > >>> > > >
>>>> > >>> > > >
>>>> > >>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
>>>> > >>> > chandni@datatorrent.com>
>>>> > >>> > > > wrote:
>>>> > >>> > > >
>>>> > >>> > > > > Hi Bhupesh,
>>>> > >>> > > > >
>>>> > >>> > > > > JDBCPojoOutputOperator was written for a demo and
>>>> therefore it
>>>> > >>> was
>>>> > >>> > > marked
>>>> > >>> > > > > Evolving which is why I had mentioned that you should
>>>> feel free
>>>> > >>> to
>>>> > >>> > > modify
>>>> > >>> > > > > it.
>>>> > >>> > > > >
>>>> > >>> > > > > I think an insert query can be as complex as any other
>>>> query.
>>>> > It
>>>> > >>> uses
>>>> > >>> > > > > FieldInfo because in the app builder it is easy for the
>>>> user to
>>>> > >>> > provide
>>>> > >>> > > > > that instead of JSON String.
>>>> > >>> > > > >
>>>> > >>> > > > > Can you please provide specifics about what it is that
>>>> you find
>>>> > >>> > > difficult
>>>> > >>> > > > > to change/implement for providing update/merge/delete
>>>> support
>>>> > in
>>>> > >>> > > > > JDBCPojoOutputOperator?
>>>> > >>> > > > >
>>>> > >>> > > > > Chandni
>>>> > >>> > > > >
>>>> > >>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
>>>> > >>> > > bhupesh@datatorrent.com
>>>> > >>> > > > >
>>>> > >>> > > > > wrote:
>>>> > >>> > > > >
>>>> > >>> > > > > > Hi All,
>>>> > >>> > > > > >
>>>> > >>> > > > > > It has been pointed out that adding another class for
>>>> > handling
>>>> > >>> > > update /
>>>> > >>> > > > > > merge queries would not be a good option.
>>>> > >>> > > > > >
>>>> > >>> > > > > > Here are the current implementation details:
>>>> > >>> > > > > >
>>>> > >>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator
>>>> which
>>>> > >>> > accepts
>>>> > >>> > > a
>>>> > >>> > > > > list
>>>> > >>> > > > > >    of FieldInfo objects. Each element of this list
>>>> indicates
>>>> > >>> the
>>>> > >>> > > > details
>>>> > >>> > > > > >    (column name, pojo field expression and datatype) of
>>>> > fields
>>>> > >>> that
>>>> > >>> > > > need
>>>> > >>> > > > > > to be
>>>> > >>> > > > > >    inserted. Using this, the operator formulates the
>>>> insert
>>>> > >>> query
>>>> > >>> > in
>>>> > >>> > > > the
>>>> > >>> > > > > > setup
>>>> > >>> > > > > >    method and identifies the sql datatypes of these
>>>> columns
>>>> > >>> from
>>>> > >>> > the
>>>> > >>> > > > > > database
>>>> > >>> > > > > >    using the table name.
>>>> > >>> > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > > >    - Now, coming to the update / merge feature, it is
>>>> > >>> difficult to
>>>> > >>> > > > > >    formulate the update / merge query in the operator
>>>> logic
>>>> > >>> due to
>>>> > >>> > > the
>>>> > >>> > > > > > complex
>>>> > >>> > > > > >    structure of these statements. For this reason, we
>>>> plan to
>>>> > >>> take
>>>> > >>> > a
>>>> > >>> > > > > >    parametrized SQL query from the user. This may look
>>>> like:
>>>> > >>> > *"update
>>>> > >>> > > > > table
>>>> > >>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
>>>> > >>> statements
>>>> > >>> > > can
>>>> > >>> > > > be
>>>> > >>> > > > > >    accepted from the user in addition to a json string
>>>> which
>>>> > >>> > > indicates
>>>> > >>> > > > > the
>>>> > >>> > > > > >    details for the parameters: *column name, the pojo
>>>> > >>> expression
>>>> > >>> > and
>>>> > >>> > > > the
>>>> > >>> > > > > >    sql data type* of the expression. Note that this
>>>> > >>> information is
>>>> > >>> > > > > similar
>>>> > >>> > > > > >    to the FieldInfo object, but is a string which can be
>>>> > >>> configured
>>>> > >>> > > > > easily
>>>> > >>> > > > > > by
>>>> > >>> > > > > >    the user. Also note that the data type which is
>>>> accepted
>>>> > is
>>>> > >>> the
>>>> > >>> > > SQL
>>>> > >>> > > > > data
>>>> > >>> > > > > >    type. Using this info we can populate the
>>>> parametrized
>>>> > >>> query and
>>>> > >>> > > run
>>>> > >>> > > > > it
>>>> > >>> > > > > >    based on the incoming POJO.
>>>> > >>> > > > > >
>>>> > >>> > > > > > The second approach is able to handle all kinds of
>>>> queries
>>>> > >>> (insert
>>>> > >>> > /
>>>> > >>> > > > > update
>>>> > >>> > > > > > / merge / delete). However, since we already have the
>>>> > >>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
>>>> > >>> > functionality
>>>> > >>> > > > into
>>>> > >>> > > > > > the same class.
>>>> > >>> > > > > >
>>>> > >>> > > > > > Here we have the following options:
>>>> > >>> > > > > >
>>>> > >>> > > > > >    1. Change the existing class
>>>> (JdbcPOJOOutputOperator) to
>>>> > the
>>>> > >>> > > second
>>>> > >>> > > > > >    approach which is more generic and also handles
>>>> inserts.
>>>> > >>> > > > > >    2. Add the update/ merge functionality to the
>>>> existing
>>>> > class
>>>> > >>> > > without
>>>> > >>> > > > > >    changing the existing functionality. This will have
>>>> two
>>>> > >>> > different
>>>> > >>> > > > ways
>>>> > >>> > > > > > that
>>>> > >>> > > > > >    insert queries may be handled in the operator.
>>>> > >>> > > > > >    3. Add another class which extends from
>>>> > >>> JdbcPOJOOutputOperator
>>>> > >>> > and
>>>> > >>> > > > > have
>>>> > >>> > > > > >    the update/merge functionality there. (This is not
>>>> > >>> recommended.)
>>>> > >>> > > > > >    4. Any other approach.
>>>> > >>> > > > > >
>>>> > >>> > > > > > Please suggest.
>>>> > >>> > > > > >
>>>> > >>> > > > > > Thanks.
>>>> > >>> > > > > >
>>>> > >>> > > > > > -Bhupesh.
>>>> > >>> > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
>>>> > >>> > > > chandni@datatorrent.com
>>>> > >>> > > > > >
>>>> > >>> > > > > > wrote:
>>>> > >>> > > > > >
>>>> > >>> > > > > > > No I don't think we are restricting Malhar to just
>>>> abstract
>>>> > >>> > > classes.
>>>> > >>> > > > > > > Whenever they are couple of use cases that we see
>>>> quite
>>>> > >>> often, we
>>>> > >>> > > add
>>>> > >>> > > > > > > concrete implementations.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > For eg. FileLineInputOperator which is a concrete
>>>> > >>> implementation
>>>> > >>> > of
>>>> > >>> > > > > > > AbstractFileInputOperator. FSSliceReader is an
>>>> example as
>>>> > >>> well.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > In AbstractJdbcOutputOperator case there hasn't been
>>>> such
>>>> > >>> common
>>>> > >>> > > > > > > insert/update query.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Also if you look at the example I provided, it is very
>>>> > >>> simple to
>>>> > >>> > > > > provide
>>>> > >>> > > > > > a
>>>> > >>> > > > > > > concrete implementation.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > If you would like to change JdbcPOJOOutputOperator to
>>>> work
>>>> > >>> for
>>>> > >>> > > > > > > "UPDATE/MERGE" then please go ahead.
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > Chandni
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
>>>> > >>> > > > > bhupesh@datatorrent.com
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > wrote:
>>>> > >>> > > > > > >
>>>> > >>> > > > > > > > I see. So, just to understand more, do we plan to
>>>> keep
>>>> > >>> Malhar
>>>> > >>> > > > > > restricted
>>>> > >>> > > > > > > to
>>>> > >>> > > > > > > > the base functionality (as in abstract classes)?
>>>> And put
>>>> > >>> the
>>>> > >>> > > > > > > configuration
>>>> > >>> > > > > > > > aspect / concrete implementations in apps that use
>>>> these
>>>> > >>> > > operators?
>>>> > >>> > > > > > > >
>>>> > >>> > > > > > > > Thanks.
>>>> > >>> > > > > > > > Bhupesh
>>>> > >>> > > > > > > >
>>>> > >>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
>>>> > >>> > > > > > chandni@datatorrent.com>
>>>> > >>> > > > > > > > wrote:
>>>> > >>> > > > > > > >
>>>> > >>> > > > > > > > > Hi,
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > > > Here is an example of doing Upsert with JDBC:
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > >
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > > > Thanks,
>>>> > >>> > > > > > > > > Chandni
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
>>>> > >>> > > > > > > chandni@datatorrent.com
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > > > wrote:
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > > Here is one of them:
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > >
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> >
>>>> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > > They work with any kind PreparedStatement -
>>>> insert or
>>>> > >>> > update
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > > Chandni
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh
>>>> Chawda <
>>>> > >>> > > > > > > > > bhupesh@datatorrent.com>
>>>> > >>> > > > > > > > > > wrote:
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > >> Hi Chandni,
>>>> > >>> > > > > > > > > >>
>>>> > >>> > > > > > > > > >> I don't see an update query being handled in
>>>> the
>>>> > >>> operator.
>>>> > >>> > > > Could
>>>> > >>> > > > > > you
>>>> > >>> > > > > > > > > >> please
>>>> > >>> > > > > > > > > >> point me to the appropriate class?
>>>> > >>> > > > > > > > > >> Or did you mean that handling a update query is
>>>> > just a
>>>> > >>> > > matter
>>>> > >>> > > > of
>>>> > >>> > > > > > > > > extending
>>>> > >>> > > > > > > > > >> the class and providing a concrete
>>>> implementation?
>>>> > >>> > > > > > > > > >>
>>>> > >>> > > > > > > > > >> Thanks.
>>>> > >>> > > > > > > > > >> -Bhupesh
>>>> > >>> > > > > > > > > >>
>>>> > >>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni
>>>> Singh <
>>>> > >>> > > > > > > > > chandni@datatorrent.com>
>>>> > >>> > > > > > > > > >> wrote:
>>>> > >>> > > > > > > > > >>
>>>> > >>> > > > > > > > > >> > Hi Bhupesh,
>>>> > >>> > > > > > > > > >> >
>>>> > >>> > > > > > > > > >> > The current abstract JDBC Output Operators
>>>> in
>>>> > >>> library
>>>> > >>> > are
>>>> > >>> > > > > > generic
>>>> > >>> > > > > > > > and
>>>> > >>> > > > > > > > > >> have
>>>> > >>> > > > > > > > > >> > already been used in multiple POCs and
>>>> > >>> applications. In
>>>> > >>> > > fact
>>>> > >>> > > > > > this
>>>> > >>> > > > > > > > > >> operator
>>>> > >>> > > > > > > > > >> > has matured through customer use cases. It
>>>> is not
>>>> > >>> just
>>>> > >>> > an
>>>> > >>> > > > > insert
>>>> > >>> > > > > > > > > >> operator.
>>>> > >>> > > > > > > > > >> > We have used it to perform update and
>>>> inserts.
>>>> > >>> > > > > > > > > >> >
>>>> > >>> > > > > > > > > >> > That said, I don't think it is a good idea to
>>>> > >>> introduce
>>>> > >>> > > > input
>>>> > >>> > > > > > > format
>>>> > >>> > > > > > > > > in
>>>> > >>> > > > > > > > > >> >  these abstract implementations. It is
>>>> written to
>>>> > >>> handle
>>>> > >>> > > any
>>>> > >>> > > > > > type
>>>> > >>> > > > > > > of
>>>> > >>> > > > > > > > > >> query,
>>>> > >>> > > > > > > > > >> > be it a procedure call (that was an actual
>>>> > customer
>>>> > >>> use
>>>> > >>> > > > case).
>>>> > >>> > > > > > > > > >> >
>>>> > >>> > > > > > > > > >> > Chandni
>>>> > >>> > > > > > > > > >> >
>>>> > >>> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh
>>>> Chawda <
>>>> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
>>>> > >>> > > > > > > > > >> > wrote:
>>>> > >>> > > > > > > > > >> >
>>>> > >>> > > > > > > > > >> > > Hi All,
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > > We are planning to proceed with the
>>>> following
>>>> > >>> approach
>>>> > >>> > > for
>>>> > >>> > > > > > JDBC
>>>> > >>> > > > > > > > > >> *update*
>>>> > >>> > > > > > > > > >> > > operator:
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >    - *Update Query Configuration*
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >    - Example Update Query: *update
>>>> tableName set
>>>> > >>> a =
>>>> > >>> > ?**
>>>> > >>> > > > > > where b
>>>> > >>> > > > > > > > = ?
>>>> > >>> > > > > > > > > >> and
>>>> > >>> > > > > > > > > >> > c
>>>> > >>> > > > > > > > > >> > >    > ?;*
>>>> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
>>>> > >>> > > > instantiations:
>>>> > >>> > > > > > > *[{a,
>>>> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>>>> > >>> VARCHAR}, {c,
>>>> > >>> > > > > > > expression,
>>>> > >>> > > > > > > > > >> > DATE}]*
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > > We are also planning to change the JDBC
>>>> Output
>>>> > >>> > Operator
>>>> > >>> > > in
>>>> > >>> > > > > > > Malhar
>>>> > >>> > > > > > > > > >> Library
>>>> > >>> > > > > > > > > >> > > which currently does just insert. We plan
>>>> to
>>>> > make
>>>> > >>> the
>>>> > >>> > > > input
>>>> > >>> > > > > > > format
>>>> > >>> > > > > > > > > >> > > consistent for both insert and update and
>>>> hence
>>>> > >>> the
>>>> > >>> > > change
>>>> > >>> > > > > to
>>>> > >>> > > > > > > the
>>>> > >>> > > > > > > > > >> current
>>>> > >>> > > > > > > > > >> > > way of configuration using JSON. Following
>>>> would
>>>> > >>> be
>>>> > >>> > the
>>>> > >>> > > > > config
>>>> > >>> > > > > > > for
>>>> > >>> > > > > > > > > >> > inserts:
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >    - *Insert Query Configuration*
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >    - Example Insert Query: *insert into
>>>> > tableName
>>>> > >>> > values
>>>> > >>> > > > (?,
>>>> > >>> > > > > > ?,
>>>> > >>> > > > > > > > .. ,
>>>> > >>> > > > > > > > > >> ?);*
>>>> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
>>>> > >>> > > > instantiations:
>>>> > >>> > > > > > > *[{a,
>>>> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>>>> > >>> VARCHAR}, ..
>>>> > >>> > ,
>>>> > >>> > > > {c,
>>>> > >>> > > > > > > > > >> expression,
>>>> > >>> > > > > > > > > >> > > DATE}]*
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > > Please let us know your thoughts.
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > > Thanks.
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > > -Bhupesh
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh
>>>> Chawda <
>>>> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
>>>> > >>> > > > > > > > > >> > > wrote:
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> > > > Hi All,
>>>> > >>> > > > > > > > > >> > > >
>>>> > >>> > > > > > > > > >> > > > Would it be a good idea to introduce the
>>>> > update
>>>> > >>> > > > > > functionality
>>>> > >>> > > > > > > to
>>>> > >>> > > > > > > > > the
>>>> > >>> > > > > > > > > >> > JDBC
>>>> > >>> > > > > > > > > >> > > > output operator in Apache Apex Malhar
>>>> library.
>>>> > >>> > > > > > > > > >> > > >
>>>> > >>> > > > > > > > > >> > > > The following are possible approaches:
>>>> > >>> > > > > > > > > >> > > >
>>>> > >>> > > > > > > > > >> > > >    1. Accept a update query from the
>>>> user with
>>>> > >>> place
>>>> > >>> > > > > holders
>>>> > >>> > > > > > > for
>>>> > >>> > > > > > > > > >> > values.
>>>> > >>> > > > > > > > > >> > > >    Example: *update tableName set a = ?,
>>>> b = ?
>>>> > >>> > where c
>>>> > >>> > > > = ?
>>>> > >>> > > > > > and
>>>> > >>> > > > > > > > d >
>>>> > >>> > > > > > > > > >> ?*.
>>>> > >>> > > > > > > > > >> > > >    Here "?" will be provided by the user
>>>> as
>>>> > java
>>>> > >>> > > > > expressions
>>>> > >>> > > > > > > > which
>>>> > >>> > > > > > > > > >> will
>>>> > >>> > > > > > > > > >> > > be
>>>> > >>> > > > > > > > > >> > > >    evaluated from the incoming tuple.
>>>> > >>> > > > > > > > > >> > > >    2. Another option is to accept in some
>>>> > >>> > > configuration
>>>> > >>> > > > > > format
>>>> > >>> > > > > > > > > >> (json /
>>>> > >>> > > > > > > > > >> > > >    xml) the following and formulate the
>>>> query
>>>> > >>> in the
>>>> > >>> > > > > > operator.
>>>> > >>> > > > > > > > > This
>>>> > >>> > > > > > > > > >> can
>>>> > >>> > > > > > > > > >> > > become
>>>> > >>> > > > > > > > > >> > > >    arbitrarily complex.
>>>> > >>> > > > > > > > > >> > > >    1. update clause columns
>>>> > >>> > > > > > > > > >> > > >       2. update clause expressions
>>>> > >>> > > > > > > > > >> > > >       3. where clause columns
>>>> > >>> > > > > > > > > >> > > >       4. where clause expressions
>>>> > >>> > > > > > > > > >> > > >
>>>> > >>> > > > > > > > > >> > > > I am thinking about going ahead with 1.
>>>> Please
>>>> > >>> let
>>>> > >>> > me
>>>> > >>> > > > know
>>>> > >>> > > > > > if
>>>> > >>> > > > > > > > any
>>>> > >>> > > > > > > > > >> other
>>>> > >>> > > > > > > > > >> > > > option is possible and whether such a
>>>> > >>> functionality
>>>> > >>> > > > > already
>>>> > >>> > > > > > > > exists
>>>> > >>> > > > > > > > > >> in
>>>> > >>> > > > > > > > > >> > > some
>>>> > >>> > > > > > > > > >> > > > other class.
>>>> > >>> > > > > > > > > >> > > >
>>>> > >>> > > > > > > > > >> > > > Thanks.
>>>> > >>> > > > > > > > > >> > > >
>>>> > >>> > > > > > > > > >> > > > -Bhupesh
>>>> > >>> > > > > > > > > >> > > >
>>>> > >>> > > > > > > > > >> > >
>>>> > >>> > > > > > > > > >> >
>>>> > >>> > > > > > > > > >>
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > > >
>>>> > >>> > > > > > > > >
>>>> > >>> > > > > > > >
>>>> > >>> > > > > > >
>>>> > >>> > > > > >
>>>> > >>> > > > >
>>>> > >>> > > >
>>>> > >>> > >
>>>> > >>> >
>>>> > >>>
>>>> > >>
>>>> > >>
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi,

I have made the changes as discussed in previous emails.
Here is the pull request:
https://github.com/apache/incubator-apex-malhar/pull/215

Please help review the changes.

Thanks
~Bhupesh

On Tue, Mar 15, 2016 at 6:57 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi,
>
> Here are some of the details on what happens when automatic mapping is
> done from POJO fields to DB columns ( i.e. when FieldInfo is not provided)
>
>    - The reference list of columns is *all* the columns in the database
>    table.
>    - If the same field is found in POJO as in DB table, then we include
>    it in the columns to be inserted and create the corresponding getter from
>    POJO.
>    - If the same field is not found in the POJO, but the field is a
>    *nullable* field in the DB, then we drop it from the insert column
>    list. This means we essentially insert a *null* value for that field.
>    - If the same field is not found in the POJO and the field is a
>    *non-nullable* field in the DB, then we throw an exception.
>
> ~Bhupesh
>
> On Tue, Mar 15, 2016 at 2:57 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
>> Hi All,
>>
>> Here is the design I am considering for including the additional feature
>> as mentioned in the previous email. Please note that this includes the
>> previous feedback received on the design.
>>
>>    1. Rename JdbcPOJOOutputOperator -> *AbstractJdbcPOJOOutputOperator*
>>    2. Add *JdbcPOJOInsertOutputOperator* extending from
>>    *AbstractJdbcPOJOOutputOperator*. This will be used for "*insert*"
>>    output query.
>>       1. If FieldInfo (Mapping of POJO fields to DB columns) is provided
>>       by the user - then use only these columns to build the insert SQL string.
>>       Return this insert query as the update command.
>>       2. If FieldInfo is not provided by the user - then assume a direct
>>       mapping between POJO fields and the DB columns. Extract list of columns
>>       from DB and populate all columns from POJO fields. The POJO fields will be
>>       assumed to be the same as DB column names.
>>    3. Add *JdbcPOJONonInsertOutputOperator *extending from
>>    *AbstractJdbcPOJOOutputOperator*. This will be used for "*update /
>>    merge / delete*" output queries. [This design stays the same as
>>    discussed previously]
>>    1. Accept a parametrized SQL string
>>       2. Use the FieldInfo provided by the user to populate the
>>       parameters in the query.
>>       3. Execute the query.
>>
>> Please provide your thoughts on this.
>>
>> Thanks.
>> ~Bhupesh
>>
>>
>> On Mon, Mar 14, 2016 at 6:29 PM, Priyanka Gugale <
>> priyanka@datatorrent.com> wrote:
>>
>>> +1, this is simplifying operator configuration, we should do same for
>>> other
>>> DB operators as well.
>>>
>>> -Priyanka
>>>
>>> On Thu, Mar 10, 2016 at 5:34 PM, Bhupesh Chawda <bhupesh@datatorrent.com
>>> >
>>> wrote:
>>>
>>> > Hi All,
>>> >
>>> > Apart from extending the JDBC Output operator (for update, merge and
>>> > delete), I am planning the following change to the "*insert*"
>>> > functionality:
>>> >
>>> >
>>> >    - It was pointed out that - if the incoming POJO has fields which
>>> are to
>>> >    be inserted into a database table, the user should not have to
>>> specify
>>> > any
>>> >    kind of mapping between POJO columns and database table column
>>> names.
>>> > This
>>> >    is possible if the POJO field names are same as the db column names.
>>> >
>>> >
>>> >    - Here is what I am planning to do to address this:
>>> >       - Have a flag which indicates "*No mapping required. Assume same
>>> POJO
>>> >       field names as the DB column names.*"
>>> >       - If this flag is true, extract DB column names (in addition to
>>> the
>>> >       types, which is done currently) and create getters from POJO
>>> based
>>> > on the
>>> >       DB column types.
>>> >       - Use these getters to populate the parametrized "*insert*"
>>> query.
>>> >
>>> >
>>> > Note - Once this is done, the user would not have to specify any field
>>> > descriptions. Just the DB table name would be sufficient. This would
>>> only
>>> > be possible if the DB column names are the same as the field names in
>>> the
>>> > POJO.
>>> > Without this change, "*insert*" will be considered as just another DDL
>>> > query like "update" / "delete" / "merge". We accept a parametrized
>>> query
>>> > and a mapping from the user for POJO field names to DB column names +
>>> DB
>>> > column types.
>>> >
>>> > Thanks.
>>> > Bhupesh
>>> >
>>> >
>>> > On Wed, Dec 30, 2015 at 12:56 PM, Bhupesh Chawda <
>>> bhupesh@datatorrent.com>
>>> > wrote:
>>> >
>>> > > Also, since we are not creating separate operators for insert and
>>> > > update/merge, it seems we don't need an abstract class. We can
>>> directly
>>> > > modify JdbcPOJOOutputOperator.
>>> > >
>>> > > -Bhupesh
>>> > >
>>> > > On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <
>>> > bhupesh@datatorrent.com>
>>> > > wrote:
>>> > >
>>> > >> Also, a note regarding accepting fieldInfo objects:
>>> > >>
>>> > >> FieldInfo has fields for java type of the column, but not the SQL
>>> data
>>> > >> type which is needed to set the parameters in the SQL statement.
>>> This
>>> > still
>>> > >> needs to be derived from the database (as in insert).
>>> > >> In more complex scenarios, as mentioned earlier, this may not
>>> always be
>>> > >> possible. In this case we have the following options:
>>> > >>
>>> > >>    1. Accept the SQL data types from the user for the parameters
>>> ("?")
>>> > >>    in a new data structure.
>>> > >>    2. Accept the JSON structure as specified earlier.
>>> > >>    3. Modify FieldInfo to have an additional optional field which
>>> > >>    indicates the SQL data type of the parameter ("?")
>>> > >>
>>> > >> I am planning to go ahead with  option 3.
>>> > >>
>>> > >> Comments?
>>> > >>
>>> > >> -Bhupesh
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <
>>> > chandni@datatorrent.com>
>>> > >> wrote:
>>> > >>
>>> > >>> Yeah that sounds good.
>>> > >>>
>>> > >>> Chandni
>>> > >>>
>>> > >>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <
>>> > bhupesh@datatorrent.com
>>> > >>> >
>>> > >>> wrote:
>>> > >>>
>>> > >>> > Thanks Chandni for the comments.
>>> > >>> >
>>> > >>> > Additionally I think, we should do away with constructing the SQL
>>> > >>> query in
>>> > >>> > the operator. This is because it is only possible in case of
>>> simple
>>> > >>> insert
>>> > >>> > statements. In case a where clause is needed in the insert
>>> statement,
>>> > >>> we
>>> > >>> > cannot construct the SQL easily. If we are accepting a
>>> parametrized
>>> > >>> query
>>> > >>> > from the user for update/merge, why not do the same for insert
>>> > >>> statements
>>> > >>> > as well? Then the hierarchy would look like:
>>> > >>> > - AbstractJdbcPOJOOutputOperator (As suggested)
>>> > >>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert,
>>> > update,
>>> > >>> > merge, delete)
>>> > >>> >
>>> > >>> > We don't need a separate operator for inserts and update/merge.
>>> > >>> >
>>> > >>> > Thanks.
>>> > >>> > -Bhupesh
>>> > >>> >
>>> > >>> >
>>> > >>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <
>>> > >>> chandni@datatorrent.com>
>>> > >>> > wrote:
>>> > >>> >
>>> > >>> > >   1. FieldInfo, is ultimately a custom object and any user who
>>> uses
>>> > >>> this
>>> > >>> > >    operator has to construct an object, populate it and then
>>> use
>>> > it.
>>> > >>> We
>>> > >>> > are
>>> > >>> > >    trying to avoid using any custom object and allow any user
>>> to
>>> > use
>>> > >>> the
>>> > >>> > >    operator without writing any extra code; just configuration.
>>> > >>> > > FieldInfo is a way to provide configuration in a UI friendly
>>> way.
>>> > >>> > Providing
>>> > >>> > > configuration as JSON is not UI friendly.
>>> > >>> > >
>>> > >>> > >    2. In case of update / merge, we need what SQL data types
>>> that
>>> > the
>>> > >>> > >    expression provided by the user would evaluate to. In case
>>> of
>>> > >>> insert
>>> > >>> > we
>>> > >>> > > can
>>> > >>> > >    go and fetch the data types of the columns directly from DB.
>>> > >>> However,
>>> > >>> > > the
>>> > >>> > >    same is not possible for custom expressions; "avg(salary)"
>>> for
>>> > >>> > instance.
>>> > >>> > > Ok so here is where you can make a change.
>>> > >>> > > - JDBCPojoOutput can be renamed to
>>> AbstractJDBCPojoOutpuOperator.
>>> > >>> > > - Abstraction is to fetch the type of column.
>>> > >>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types
>>> of
>>> > >>> columns
>>> > >>> > > directly from DB. Please note that FieldInfo can also provide
>>> type
>>> > of
>>> > >>> > > derived column.
>>> > >>> > >
>>> > >>> > >
>>> > >>> > > You mentioned "We are trying to avoid using any custom object
>>> and
>>> > >>> allow
>>> > >>> > any
>>> > >>> > > user to use the
>>> > >>> > >    operator without writing any extra code".
>>> > >>> > > This I think is specific to your use case. You can create an
>>> > >>> extension of
>>> > >>> > > the above which takes JSON blob and creates FieldInfos from it.
>>> > >>> > >
>>> > >>> > > Chandni
>>> > >>> > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> > >
>>> > >>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
>>> > >>> bhupesh@datatorrent.com
>>> > >>> > >
>>> > >>> > > wrote:
>>> > >>> > >
>>> > >>> > > > Hi Chandni,
>>> > >>> > > >
>>> > >>> > > > Following are the issues:
>>> > >>> > > >
>>> > >>> > > >    1. FieldInfo, is ultimately a custom object and any user
>>> who
>>> > >>> uses
>>> > >>> > this
>>> > >>> > > >    operator has to construct an object, populate it and then
>>> use
>>> > >>> it. We
>>> > >>> > > are
>>> > >>> > > >    trying to avoid using any custom object and allow any
>>> user to
>>> > >>> use
>>> > >>> > the
>>> > >>> > > >    operator without writing any extra code; just
>>> configuration.
>>> > >>> > > >    2. In case of update / merge, we need what SQL data types
>>> that
>>> > >>> the
>>> > >>> > > >    expression provided by the user would evaluate to. In
>>> case of
>>> > >>> insert
>>> > >>> > > we
>>> > >>> > > > can
>>> > >>> > > >    go and fetch the data types of the columns directly from
>>> DB.
>>> > >>> > However,
>>> > >>> > > > the
>>> > >>> > > >    same is not possible for custom expressions;
>>> "avg(salary)" for
>>> > >>> > > instance.
>>> > >>> > > >
>>> > >>> > > > Thanks.
>>> > >>> > > >
>>> > >>> > > > -Bhupesh
>>> > >>> > > >
>>> > >>> > > >
>>> > >>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
>>> > >>> > chandni@datatorrent.com>
>>> > >>> > > > wrote:
>>> > >>> > > >
>>> > >>> > > > > Hi Bhupesh,
>>> > >>> > > > >
>>> > >>> > > > > JDBCPojoOutputOperator was written for a demo and
>>> therefore it
>>> > >>> was
>>> > >>> > > marked
>>> > >>> > > > > Evolving which is why I had mentioned that you should feel
>>> free
>>> > >>> to
>>> > >>> > > modify
>>> > >>> > > > > it.
>>> > >>> > > > >
>>> > >>> > > > > I think an insert query can be as complex as any other
>>> query.
>>> > It
>>> > >>> uses
>>> > >>> > > > > FieldInfo because in the app builder it is easy for the
>>> user to
>>> > >>> > provide
>>> > >>> > > > > that instead of JSON String.
>>> > >>> > > > >
>>> > >>> > > > > Can you please provide specifics about what it is that you
>>> find
>>> > >>> > > difficult
>>> > >>> > > > > to change/implement for providing update/merge/delete
>>> support
>>> > in
>>> > >>> > > > > JDBCPojoOutputOperator?
>>> > >>> > > > >
>>> > >>> > > > > Chandni
>>> > >>> > > > >
>>> > >>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
>>> > >>> > > bhupesh@datatorrent.com
>>> > >>> > > > >
>>> > >>> > > > > wrote:
>>> > >>> > > > >
>>> > >>> > > > > > Hi All,
>>> > >>> > > > > >
>>> > >>> > > > > > It has been pointed out that adding another class for
>>> > handling
>>> > >>> > > update /
>>> > >>> > > > > > merge queries would not be a good option.
>>> > >>> > > > > >
>>> > >>> > > > > > Here are the current implementation details:
>>> > >>> > > > > >
>>> > >>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator
>>> which
>>> > >>> > accepts
>>> > >>> > > a
>>> > >>> > > > > list
>>> > >>> > > > > >    of FieldInfo objects. Each element of this list
>>> indicates
>>> > >>> the
>>> > >>> > > > details
>>> > >>> > > > > >    (column name, pojo field expression and datatype) of
>>> > fields
>>> > >>> that
>>> > >>> > > > need
>>> > >>> > > > > > to be
>>> > >>> > > > > >    inserted. Using this, the operator formulates the
>>> insert
>>> > >>> query
>>> > >>> > in
>>> > >>> > > > the
>>> > >>> > > > > > setup
>>> > >>> > > > > >    method and identifies the sql datatypes of these
>>> columns
>>> > >>> from
>>> > >>> > the
>>> > >>> > > > > > database
>>> > >>> > > > > >    using the table name.
>>> > >>> > > > > >
>>> > >>> > > > > >
>>> > >>> > > > > >    - Now, coming to the update / merge feature, it is
>>> > >>> difficult to
>>> > >>> > > > > >    formulate the update / merge query in the operator
>>> logic
>>> > >>> due to
>>> > >>> > > the
>>> > >>> > > > > > complex
>>> > >>> > > > > >    structure of these statements. For this reason, we
>>> plan to
>>> > >>> take
>>> > >>> > a
>>> > >>> > > > > >    parametrized SQL query from the user. This may look
>>> like:
>>> > >>> > *"update
>>> > >>> > > > > table
>>> > >>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
>>> > >>> statements
>>> > >>> > > can
>>> > >>> > > > be
>>> > >>> > > > > >    accepted from the user in addition to a json string
>>> which
>>> > >>> > > indicates
>>> > >>> > > > > the
>>> > >>> > > > > >    details for the parameters: *column name, the pojo
>>> > >>> expression
>>> > >>> > and
>>> > >>> > > > the
>>> > >>> > > > > >    sql data type* of the expression. Note that this
>>> > >>> information is
>>> > >>> > > > > similar
>>> > >>> > > > > >    to the FieldInfo object, but is a string which can be
>>> > >>> configured
>>> > >>> > > > > easily
>>> > >>> > > > > > by
>>> > >>> > > > > >    the user. Also note that the data type which is
>>> accepted
>>> > is
>>> > >>> the
>>> > >>> > > SQL
>>> > >>> > > > > data
>>> > >>> > > > > >    type. Using this info we can populate the parametrized
>>> > >>> query and
>>> > >>> > > run
>>> > >>> > > > > it
>>> > >>> > > > > >    based on the incoming POJO.
>>> > >>> > > > > >
>>> > >>> > > > > > The second approach is able to handle all kinds of
>>> queries
>>> > >>> (insert
>>> > >>> > /
>>> > >>> > > > > update
>>> > >>> > > > > > / merge / delete). However, since we already have the
>>> > >>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
>>> > >>> > functionality
>>> > >>> > > > into
>>> > >>> > > > > > the same class.
>>> > >>> > > > > >
>>> > >>> > > > > > Here we have the following options:
>>> > >>> > > > > >
>>> > >>> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator)
>>> to
>>> > the
>>> > >>> > > second
>>> > >>> > > > > >    approach which is more generic and also handles
>>> inserts.
>>> > >>> > > > > >    2. Add the update/ merge functionality to the existing
>>> > class
>>> > >>> > > without
>>> > >>> > > > > >    changing the existing functionality. This will have
>>> two
>>> > >>> > different
>>> > >>> > > > ways
>>> > >>> > > > > > that
>>> > >>> > > > > >    insert queries may be handled in the operator.
>>> > >>> > > > > >    3. Add another class which extends from
>>> > >>> JdbcPOJOOutputOperator
>>> > >>> > and
>>> > >>> > > > > have
>>> > >>> > > > > >    the update/merge functionality there. (This is not
>>> > >>> recommended.)
>>> > >>> > > > > >    4. Any other approach.
>>> > >>> > > > > >
>>> > >>> > > > > > Please suggest.
>>> > >>> > > > > >
>>> > >>> > > > > > Thanks.
>>> > >>> > > > > >
>>> > >>> > > > > > -Bhupesh.
>>> > >>> > > > > >
>>> > >>> > > > > >
>>> > >>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
>>> > >>> > > > chandni@datatorrent.com
>>> > >>> > > > > >
>>> > >>> > > > > > wrote:
>>> > >>> > > > > >
>>> > >>> > > > > > > No I don't think we are restricting Malhar to just
>>> abstract
>>> > >>> > > classes.
>>> > >>> > > > > > > Whenever they are couple of use cases that we see quite
>>> > >>> often, we
>>> > >>> > > add
>>> > >>> > > > > > > concrete implementations.
>>> > >>> > > > > > >
>>> > >>> > > > > > > For eg. FileLineInputOperator which is a concrete
>>> > >>> implementation
>>> > >>> > of
>>> > >>> > > > > > > AbstractFileInputOperator. FSSliceReader is an example
>>> as
>>> > >>> well.
>>> > >>> > > > > > >
>>> > >>> > > > > > > In AbstractJdbcOutputOperator case there hasn't been
>>> such
>>> > >>> common
>>> > >>> > > > > > > insert/update query.
>>> > >>> > > > > > >
>>> > >>> > > > > > > Also if you look at the example I provided, it is very
>>> > >>> simple to
>>> > >>> > > > > provide
>>> > >>> > > > > > a
>>> > >>> > > > > > > concrete implementation.
>>> > >>> > > > > > >
>>> > >>> > > > > > > If you would like to change JdbcPOJOOutputOperator to
>>> work
>>> > >>> for
>>> > >>> > > > > > > "UPDATE/MERGE" then please go ahead.
>>> > >>> > > > > > >
>>> > >>> > > > > > > Chandni
>>> > >>> > > > > > >
>>> > >>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
>>> > >>> > > > > bhupesh@datatorrent.com
>>> > >>> > > > > > >
>>> > >>> > > > > > > wrote:
>>> > >>> > > > > > >
>>> > >>> > > > > > > > I see. So, just to understand more, do we plan to
>>> keep
>>> > >>> Malhar
>>> > >>> > > > > > restricted
>>> > >>> > > > > > > to
>>> > >>> > > > > > > > the base functionality (as in abstract classes)? And
>>> put
>>> > >>> the
>>> > >>> > > > > > > configuration
>>> > >>> > > > > > > > aspect / concrete implementations in apps that use
>>> these
>>> > >>> > > operators?
>>> > >>> > > > > > > >
>>> > >>> > > > > > > > Thanks.
>>> > >>> > > > > > > > Bhupesh
>>> > >>> > > > > > > >
>>> > >>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
>>> > >>> > > > > > chandni@datatorrent.com>
>>> > >>> > > > > > > > wrote:
>>> > >>> > > > > > > >
>>> > >>> > > > > > > > > Hi,
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > > > Here is an example of doing Upsert with JDBC:
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > >
>>> > >>> > > > > > >
>>> > >>> > > > > >
>>> > >>> > > > >
>>> > >>> > > >
>>> > >>> > >
>>> > >>> >
>>> > >>>
>>> >
>>> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > > > Thanks,
>>> > >>> > > > > > > > > Chandni
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
>>> > >>> > > > > > > chandni@datatorrent.com
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > > > wrote:
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > > Here is one of them:
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > >
>>> > >>> > > > > > >
>>> > >>> > > > > >
>>> > >>> > > > >
>>> > >>> > > >
>>> > >>> > >
>>> > >>> >
>>> > >>>
>>> >
>>> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > > They work with any kind PreparedStatement -
>>> insert or
>>> > >>> > update
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > > Chandni
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda
>>> <
>>> > >>> > > > > > > > > bhupesh@datatorrent.com>
>>> > >>> > > > > > > > > > wrote:
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > >> Hi Chandni,
>>> > >>> > > > > > > > > >>
>>> > >>> > > > > > > > > >> I don't see an update query being handled in the
>>> > >>> operator.
>>> > >>> > > > Could
>>> > >>> > > > > > you
>>> > >>> > > > > > > > > >> please
>>> > >>> > > > > > > > > >> point me to the appropriate class?
>>> > >>> > > > > > > > > >> Or did you mean that handling a update query is
>>> > just a
>>> > >>> > > matter
>>> > >>> > > > of
>>> > >>> > > > > > > > > extending
>>> > >>> > > > > > > > > >> the class and providing a concrete
>>> implementation?
>>> > >>> > > > > > > > > >>
>>> > >>> > > > > > > > > >> Thanks.
>>> > >>> > > > > > > > > >> -Bhupesh
>>> > >>> > > > > > > > > >>
>>> > >>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh
>>> <
>>> > >>> > > > > > > > > chandni@datatorrent.com>
>>> > >>> > > > > > > > > >> wrote:
>>> > >>> > > > > > > > > >>
>>> > >>> > > > > > > > > >> > Hi Bhupesh,
>>> > >>> > > > > > > > > >> >
>>> > >>> > > > > > > > > >> > The current abstract JDBC Output Operators  in
>>> > >>> library
>>> > >>> > are
>>> > >>> > > > > > generic
>>> > >>> > > > > > > > and
>>> > >>> > > > > > > > > >> have
>>> > >>> > > > > > > > > >> > already been used in multiple POCs and
>>> > >>> applications. In
>>> > >>> > > fact
>>> > >>> > > > > > this
>>> > >>> > > > > > > > > >> operator
>>> > >>> > > > > > > > > >> > has matured through customer use cases. It is
>>> not
>>> > >>> just
>>> > >>> > an
>>> > >>> > > > > insert
>>> > >>> > > > > > > > > >> operator.
>>> > >>> > > > > > > > > >> > We have used it to perform update and inserts.
>>> > >>> > > > > > > > > >> >
>>> > >>> > > > > > > > > >> > That said, I don't think it is a good idea to
>>> > >>> introduce
>>> > >>> > > > input
>>> > >>> > > > > > > format
>>> > >>> > > > > > > > > in
>>> > >>> > > > > > > > > >> >  these abstract implementations. It is
>>> written to
>>> > >>> handle
>>> > >>> > > any
>>> > >>> > > > > > type
>>> > >>> > > > > > > of
>>> > >>> > > > > > > > > >> query,
>>> > >>> > > > > > > > > >> > be it a procedure call (that was an actual
>>> > customer
>>> > >>> use
>>> > >>> > > > case).
>>> > >>> > > > > > > > > >> >
>>> > >>> > > > > > > > > >> > Chandni
>>> > >>> > > > > > > > > >> >
>>> > >>> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh
>>> Chawda <
>>> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
>>> > >>> > > > > > > > > >> > wrote:
>>> > >>> > > > > > > > > >> >
>>> > >>> > > > > > > > > >> > > Hi All,
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > > We are planning to proceed with the
>>> following
>>> > >>> approach
>>> > >>> > > for
>>> > >>> > > > > > JDBC
>>> > >>> > > > > > > > > >> *update*
>>> > >>> > > > > > > > > >> > > operator:
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >    - *Update Query Configuration*
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >    - Example Update Query: *update
>>> tableName set
>>> > >>> a =
>>> > >>> > ?**
>>> > >>> > > > > > where b
>>> > >>> > > > > > > > = ?
>>> > >>> > > > > > > > > >> and
>>> > >>> > > > > > > > > >> > c
>>> > >>> > > > > > > > > >> > >    > ?;*
>>> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
>>> > >>> > > > instantiations:
>>> > >>> > > > > > > *[{a,
>>> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>>> > >>> VARCHAR}, {c,
>>> > >>> > > > > > > expression,
>>> > >>> > > > > > > > > >> > DATE}]*
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > > We are also planning to change the JDBC
>>> Output
>>> > >>> > Operator
>>> > >>> > > in
>>> > >>> > > > > > > Malhar
>>> > >>> > > > > > > > > >> Library
>>> > >>> > > > > > > > > >> > > which currently does just insert. We plan to
>>> > make
>>> > >>> the
>>> > >>> > > > input
>>> > >>> > > > > > > format
>>> > >>> > > > > > > > > >> > > consistent for both insert and update and
>>> hence
>>> > >>> the
>>> > >>> > > change
>>> > >>> > > > > to
>>> > >>> > > > > > > the
>>> > >>> > > > > > > > > >> current
>>> > >>> > > > > > > > > >> > > way of configuration using JSON. Following
>>> would
>>> > >>> be
>>> > >>> > the
>>> > >>> > > > > config
>>> > >>> > > > > > > for
>>> > >>> > > > > > > > > >> > inserts:
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >    - *Insert Query Configuration*
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >    - Example Insert Query: *insert into
>>> > tableName
>>> > >>> > values
>>> > >>> > > > (?,
>>> > >>> > > > > > ?,
>>> > >>> > > > > > > > .. ,
>>> > >>> > > > > > > > > >> ?);*
>>> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
>>> > >>> > > > instantiations:
>>> > >>> > > > > > > *[{a,
>>> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>>> > >>> VARCHAR}, ..
>>> > >>> > ,
>>> > >>> > > > {c,
>>> > >>> > > > > > > > > >> expression,
>>> > >>> > > > > > > > > >> > > DATE}]*
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > > Please let us know your thoughts.
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > > Thanks.
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > > -Bhupesh
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh
>>> Chawda <
>>> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
>>> > >>> > > > > > > > > >> > > wrote:
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> > > > Hi All,
>>> > >>> > > > > > > > > >> > > >
>>> > >>> > > > > > > > > >> > > > Would it be a good idea to introduce the
>>> > update
>>> > >>> > > > > > functionality
>>> > >>> > > > > > > to
>>> > >>> > > > > > > > > the
>>> > >>> > > > > > > > > >> > JDBC
>>> > >>> > > > > > > > > >> > > > output operator in Apache Apex Malhar
>>> library.
>>> > >>> > > > > > > > > >> > > >
>>> > >>> > > > > > > > > >> > > > The following are possible approaches:
>>> > >>> > > > > > > > > >> > > >
>>> > >>> > > > > > > > > >> > > >    1. Accept a update query from the user
>>> with
>>> > >>> place
>>> > >>> > > > > holders
>>> > >>> > > > > > > for
>>> > >>> > > > > > > > > >> > values.
>>> > >>> > > > > > > > > >> > > >    Example: *update tableName set a = ?,
>>> b = ?
>>> > >>> > where c
>>> > >>> > > > = ?
>>> > >>> > > > > > and
>>> > >>> > > > > > > > d >
>>> > >>> > > > > > > > > >> ?*.
>>> > >>> > > > > > > > > >> > > >    Here "?" will be provided by the user
>>> as
>>> > java
>>> > >>> > > > > expressions
>>> > >>> > > > > > > > which
>>> > >>> > > > > > > > > >> will
>>> > >>> > > > > > > > > >> > > be
>>> > >>> > > > > > > > > >> > > >    evaluated from the incoming tuple.
>>> > >>> > > > > > > > > >> > > >    2. Another option is to accept in some
>>> > >>> > > configuration
>>> > >>> > > > > > format
>>> > >>> > > > > > > > > >> (json /
>>> > >>> > > > > > > > > >> > > >    xml) the following and formulate the
>>> query
>>> > >>> in the
>>> > >>> > > > > > operator.
>>> > >>> > > > > > > > > This
>>> > >>> > > > > > > > > >> can
>>> > >>> > > > > > > > > >> > > become
>>> > >>> > > > > > > > > >> > > >    arbitrarily complex.
>>> > >>> > > > > > > > > >> > > >    1. update clause columns
>>> > >>> > > > > > > > > >> > > >       2. update clause expressions
>>> > >>> > > > > > > > > >> > > >       3. where clause columns
>>> > >>> > > > > > > > > >> > > >       4. where clause expressions
>>> > >>> > > > > > > > > >> > > >
>>> > >>> > > > > > > > > >> > > > I am thinking about going ahead with 1.
>>> Please
>>> > >>> let
>>> > >>> > me
>>> > >>> > > > know
>>> > >>> > > > > > if
>>> > >>> > > > > > > > any
>>> > >>> > > > > > > > > >> other
>>> > >>> > > > > > > > > >> > > > option is possible and whether such a
>>> > >>> functionality
>>> > >>> > > > > already
>>> > >>> > > > > > > > exists
>>> > >>> > > > > > > > > >> in
>>> > >>> > > > > > > > > >> > > some
>>> > >>> > > > > > > > > >> > > > other class.
>>> > >>> > > > > > > > > >> > > >
>>> > >>> > > > > > > > > >> > > > Thanks.
>>> > >>> > > > > > > > > >> > > >
>>> > >>> > > > > > > > > >> > > > -Bhupesh
>>> > >>> > > > > > > > > >> > > >
>>> > >>> > > > > > > > > >> > >
>>> > >>> > > > > > > > > >> >
>>> > >>> > > > > > > > > >>
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > > >
>>> > >>> > > > > > > > >
>>> > >>> > > > > > > >
>>> > >>> > > > > > >
>>> > >>> > > > > >
>>> > >>> > > > >
>>> > >>> > > >
>>> > >>> > >
>>> > >>> >
>>> > >>>
>>> > >>
>>> > >>
>>> > >
>>> >
>>>
>>
>>
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi,

Here are some of the details on what happens when automatic mapping is done
from POJO fields to DB columns ( i.e. when FieldInfo is not provided)

   - The reference list of columns is *all* the columns in the database
   table.
   - If the same field is found in POJO as in DB table, then we include it
   in the columns to be inserted and create the corresponding getter from POJO.
   - If the same field is not found in the POJO, but the field is a
   *nullable* field in the DB, then we drop it from the insert column list.
   This means we essentially insert a *null* value for that field.
   - If the same field is not found in the POJO and the field is a
   *non-nullable* field in the DB, then we throw an exception.

~Bhupesh

On Tue, Mar 15, 2016 at 2:57 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi All,
>
> Here is the design I am considering for including the additional feature
> as mentioned in the previous email. Please note that this includes the
> previous feedback received on the design.
>
>    1. Rename JdbcPOJOOutputOperator -> *AbstractJdbcPOJOOutputOperator*
>    2. Add *JdbcPOJOInsertOutputOperator* extending from
>    *AbstractJdbcPOJOOutputOperator*. This will be used for "*insert*"
>    output query.
>       1. If FieldInfo (Mapping of POJO fields to DB columns) is provided
>       by the user - then use only these columns to build the insert SQL string.
>       Return this insert query as the update command.
>       2. If FieldInfo is not provided by the user - then assume a direct
>       mapping between POJO fields and the DB columns. Extract list of columns
>       from DB and populate all columns from POJO fields. The POJO fields will be
>       assumed to be the same as DB column names.
>    3. Add *JdbcPOJONonInsertOutputOperator *extending from
>    *AbstractJdbcPOJOOutputOperator*. This will be used for "*update /
>    merge / delete*" output queries. [This design stays the same as
>    discussed previously]
>    1. Accept a parametrized SQL string
>       2. Use the FieldInfo provided by the user to populate the
>       parameters in the query.
>       3. Execute the query.
>
> Please provide your thoughts on this.
>
> Thanks.
> ~Bhupesh
>
>
> On Mon, Mar 14, 2016 at 6:29 PM, Priyanka Gugale <priyanka@datatorrent.com
> > wrote:
>
>> +1, this is simplifying operator configuration, we should do same for
>> other
>> DB operators as well.
>>
>> -Priyanka
>>
>> On Thu, Mar 10, 2016 at 5:34 PM, Bhupesh Chawda <bh...@datatorrent.com>
>> wrote:
>>
>> > Hi All,
>> >
>> > Apart from extending the JDBC Output operator (for update, merge and
>> > delete), I am planning the following change to the "*insert*"
>> > functionality:
>> >
>> >
>> >    - It was pointed out that - if the incoming POJO has fields which
>> are to
>> >    be inserted into a database table, the user should not have to
>> specify
>> > any
>> >    kind of mapping between POJO columns and database table column names.
>> > This
>> >    is possible if the POJO field names are same as the db column names.
>> >
>> >
>> >    - Here is what I am planning to do to address this:
>> >       - Have a flag which indicates "*No mapping required. Assume same
>> POJO
>> >       field names as the DB column names.*"
>> >       - If this flag is true, extract DB column names (in addition to
>> the
>> >       types, which is done currently) and create getters from POJO based
>> > on the
>> >       DB column types.
>> >       - Use these getters to populate the parametrized "*insert*" query.
>> >
>> >
>> > Note - Once this is done, the user would not have to specify any field
>> > descriptions. Just the DB table name would be sufficient. This would
>> only
>> > be possible if the DB column names are the same as the field names in
>> the
>> > POJO.
>> > Without this change, "*insert*" will be considered as just another DDL
>> > query like "update" / "delete" / "merge". We accept a parametrized query
>> > and a mapping from the user for POJO field names to DB column names + DB
>> > column types.
>> >
>> > Thanks.
>> > Bhupesh
>> >
>> >
>> > On Wed, Dec 30, 2015 at 12:56 PM, Bhupesh Chawda <
>> bhupesh@datatorrent.com>
>> > wrote:
>> >
>> > > Also, since we are not creating separate operators for insert and
>> > > update/merge, it seems we don't need an abstract class. We can
>> directly
>> > > modify JdbcPOJOOutputOperator.
>> > >
>> > > -Bhupesh
>> > >
>> > > On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <
>> > bhupesh@datatorrent.com>
>> > > wrote:
>> > >
>> > >> Also, a note regarding accepting fieldInfo objects:
>> > >>
>> > >> FieldInfo has fields for java type of the column, but not the SQL
>> data
>> > >> type which is needed to set the parameters in the SQL statement. This
>> > still
>> > >> needs to be derived from the database (as in insert).
>> > >> In more complex scenarios, as mentioned earlier, this may not always
>> be
>> > >> possible. In this case we have the following options:
>> > >>
>> > >>    1. Accept the SQL data types from the user for the parameters
>> ("?")
>> > >>    in a new data structure.
>> > >>    2. Accept the JSON structure as specified earlier.
>> > >>    3. Modify FieldInfo to have an additional optional field which
>> > >>    indicates the SQL data type of the parameter ("?")
>> > >>
>> > >> I am planning to go ahead with  option 3.
>> > >>
>> > >> Comments?
>> > >>
>> > >> -Bhupesh
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <
>> > chandni@datatorrent.com>
>> > >> wrote:
>> > >>
>> > >>> Yeah that sounds good.
>> > >>>
>> > >>> Chandni
>> > >>>
>> > >>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <
>> > bhupesh@datatorrent.com
>> > >>> >
>> > >>> wrote:
>> > >>>
>> > >>> > Thanks Chandni for the comments.
>> > >>> >
>> > >>> > Additionally I think, we should do away with constructing the SQL
>> > >>> query in
>> > >>> > the operator. This is because it is only possible in case of
>> simple
>> > >>> insert
>> > >>> > statements. In case a where clause is needed in the insert
>> statement,
>> > >>> we
>> > >>> > cannot construct the SQL easily. If we are accepting a
>> parametrized
>> > >>> query
>> > >>> > from the user for update/merge, why not do the same for insert
>> > >>> statements
>> > >>> > as well? Then the hierarchy would look like:
>> > >>> > - AbstractJdbcPOJOOutputOperator (As suggested)
>> > >>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert,
>> > update,
>> > >>> > merge, delete)
>> > >>> >
>> > >>> > We don't need a separate operator for inserts and update/merge.
>> > >>> >
>> > >>> > Thanks.
>> > >>> > -Bhupesh
>> > >>> >
>> > >>> >
>> > >>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <
>> > >>> chandni@datatorrent.com>
>> > >>> > wrote:
>> > >>> >
>> > >>> > >   1. FieldInfo, is ultimately a custom object and any user who
>> uses
>> > >>> this
>> > >>> > >    operator has to construct an object, populate it and then use
>> > it.
>> > >>> We
>> > >>> > are
>> > >>> > >    trying to avoid using any custom object and allow any user to
>> > use
>> > >>> the
>> > >>> > >    operator without writing any extra code; just configuration.
>> > >>> > > FieldInfo is a way to provide configuration in a UI friendly
>> way.
>> > >>> > Providing
>> > >>> > > configuration as JSON is not UI friendly.
>> > >>> > >
>> > >>> > >    2. In case of update / merge, we need what SQL data types
>> that
>> > the
>> > >>> > >    expression provided by the user would evaluate to. In case of
>> > >>> insert
>> > >>> > we
>> > >>> > > can
>> > >>> > >    go and fetch the data types of the columns directly from DB.
>> > >>> However,
>> > >>> > > the
>> > >>> > >    same is not possible for custom expressions; "avg(salary)"
>> for
>> > >>> > instance.
>> > >>> > > Ok so here is where you can make a change.
>> > >>> > > - JDBCPojoOutput can be renamed to
>> AbstractJDBCPojoOutpuOperator.
>> > >>> > > - Abstraction is to fetch the type of column.
>> > >>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types of
>> > >>> columns
>> > >>> > > directly from DB. Please note that FieldInfo can also provide
>> type
>> > of
>> > >>> > > derived column.
>> > >>> > >
>> > >>> > >
>> > >>> > > You mentioned "We are trying to avoid using any custom object
>> and
>> > >>> allow
>> > >>> > any
>> > >>> > > user to use the
>> > >>> > >    operator without writing any extra code".
>> > >>> > > This I think is specific to your use case. You can create an
>> > >>> extension of
>> > >>> > > the above which takes JSON blob and creates FieldInfos from it.
>> > >>> > >
>> > >>> > > Chandni
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > >
>> > >>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
>> > >>> bhupesh@datatorrent.com
>> > >>> > >
>> > >>> > > wrote:
>> > >>> > >
>> > >>> > > > Hi Chandni,
>> > >>> > > >
>> > >>> > > > Following are the issues:
>> > >>> > > >
>> > >>> > > >    1. FieldInfo, is ultimately a custom object and any user
>> who
>> > >>> uses
>> > >>> > this
>> > >>> > > >    operator has to construct an object, populate it and then
>> use
>> > >>> it. We
>> > >>> > > are
>> > >>> > > >    trying to avoid using any custom object and allow any user
>> to
>> > >>> use
>> > >>> > the
>> > >>> > > >    operator without writing any extra code; just
>> configuration.
>> > >>> > > >    2. In case of update / merge, we need what SQL data types
>> that
>> > >>> the
>> > >>> > > >    expression provided by the user would evaluate to. In case
>> of
>> > >>> insert
>> > >>> > > we
>> > >>> > > > can
>> > >>> > > >    go and fetch the data types of the columns directly from
>> DB.
>> > >>> > However,
>> > >>> > > > the
>> > >>> > > >    same is not possible for custom expressions; "avg(salary)"
>> for
>> > >>> > > instance.
>> > >>> > > >
>> > >>> > > > Thanks.
>> > >>> > > >
>> > >>> > > > -Bhupesh
>> > >>> > > >
>> > >>> > > >
>> > >>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
>> > >>> > chandni@datatorrent.com>
>> > >>> > > > wrote:
>> > >>> > > >
>> > >>> > > > > Hi Bhupesh,
>> > >>> > > > >
>> > >>> > > > > JDBCPojoOutputOperator was written for a demo and therefore
>> it
>> > >>> was
>> > >>> > > marked
>> > >>> > > > > Evolving which is why I had mentioned that you should feel
>> free
>> > >>> to
>> > >>> > > modify
>> > >>> > > > > it.
>> > >>> > > > >
>> > >>> > > > > I think an insert query can be as complex as any other
>> query.
>> > It
>> > >>> uses
>> > >>> > > > > FieldInfo because in the app builder it is easy for the
>> user to
>> > >>> > provide
>> > >>> > > > > that instead of JSON String.
>> > >>> > > > >
>> > >>> > > > > Can you please provide specifics about what it is that you
>> find
>> > >>> > > difficult
>> > >>> > > > > to change/implement for providing update/merge/delete
>> support
>> > in
>> > >>> > > > > JDBCPojoOutputOperator?
>> > >>> > > > >
>> > >>> > > > > Chandni
>> > >>> > > > >
>> > >>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
>> > >>> > > bhupesh@datatorrent.com
>> > >>> > > > >
>> > >>> > > > > wrote:
>> > >>> > > > >
>> > >>> > > > > > Hi All,
>> > >>> > > > > >
>> > >>> > > > > > It has been pointed out that adding another class for
>> > handling
>> > >>> > > update /
>> > >>> > > > > > merge queries would not be a good option.
>> > >>> > > > > >
>> > >>> > > > > > Here are the current implementation details:
>> > >>> > > > > >
>> > >>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator
>> which
>> > >>> > accepts
>> > >>> > > a
>> > >>> > > > > list
>> > >>> > > > > >    of FieldInfo objects. Each element of this list
>> indicates
>> > >>> the
>> > >>> > > > details
>> > >>> > > > > >    (column name, pojo field expression and datatype) of
>> > fields
>> > >>> that
>> > >>> > > > need
>> > >>> > > > > > to be
>> > >>> > > > > >    inserted. Using this, the operator formulates the
>> insert
>> > >>> query
>> > >>> > in
>> > >>> > > > the
>> > >>> > > > > > setup
>> > >>> > > > > >    method and identifies the sql datatypes of these
>> columns
>> > >>> from
>> > >>> > the
>> > >>> > > > > > database
>> > >>> > > > > >    using the table name.
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > > >    - Now, coming to the update / merge feature, it is
>> > >>> difficult to
>> > >>> > > > > >    formulate the update / merge query in the operator
>> logic
>> > >>> due to
>> > >>> > > the
>> > >>> > > > > > complex
>> > >>> > > > > >    structure of these statements. For this reason, we
>> plan to
>> > >>> take
>> > >>> > a
>> > >>> > > > > >    parametrized SQL query from the user. This may look
>> like:
>> > >>> > *"update
>> > >>> > > > > table
>> > >>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
>> > >>> statements
>> > >>> > > can
>> > >>> > > > be
>> > >>> > > > > >    accepted from the user in addition to a json string
>> which
>> > >>> > > indicates
>> > >>> > > > > the
>> > >>> > > > > >    details for the parameters: *column name, the pojo
>> > >>> expression
>> > >>> > and
>> > >>> > > > the
>> > >>> > > > > >    sql data type* of the expression. Note that this
>> > >>> information is
>> > >>> > > > > similar
>> > >>> > > > > >    to the FieldInfo object, but is a string which can be
>> > >>> configured
>> > >>> > > > > easily
>> > >>> > > > > > by
>> > >>> > > > > >    the user. Also note that the data type which is
>> accepted
>> > is
>> > >>> the
>> > >>> > > SQL
>> > >>> > > > > data
>> > >>> > > > > >    type. Using this info we can populate the parametrized
>> > >>> query and
>> > >>> > > run
>> > >>> > > > > it
>> > >>> > > > > >    based on the incoming POJO.
>> > >>> > > > > >
>> > >>> > > > > > The second approach is able to handle all kinds of queries
>> > >>> (insert
>> > >>> > /
>> > >>> > > > > update
>> > >>> > > > > > / merge / delete). However, since we already have the
>> > >>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
>> > >>> > functionality
>> > >>> > > > into
>> > >>> > > > > > the same class.
>> > >>> > > > > >
>> > >>> > > > > > Here we have the following options:
>> > >>> > > > > >
>> > >>> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator)
>> to
>> > the
>> > >>> > > second
>> > >>> > > > > >    approach which is more generic and also handles
>> inserts.
>> > >>> > > > > >    2. Add the update/ merge functionality to the existing
>> > class
>> > >>> > > without
>> > >>> > > > > >    changing the existing functionality. This will have two
>> > >>> > different
>> > >>> > > > ways
>> > >>> > > > > > that
>> > >>> > > > > >    insert queries may be handled in the operator.
>> > >>> > > > > >    3. Add another class which extends from
>> > >>> JdbcPOJOOutputOperator
>> > >>> > and
>> > >>> > > > > have
>> > >>> > > > > >    the update/merge functionality there. (This is not
>> > >>> recommended.)
>> > >>> > > > > >    4. Any other approach.
>> > >>> > > > > >
>> > >>> > > > > > Please suggest.
>> > >>> > > > > >
>> > >>> > > > > > Thanks.
>> > >>> > > > > >
>> > >>> > > > > > -Bhupesh.
>> > >>> > > > > >
>> > >>> > > > > >
>> > >>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
>> > >>> > > > chandni@datatorrent.com
>> > >>> > > > > >
>> > >>> > > > > > wrote:
>> > >>> > > > > >
>> > >>> > > > > > > No I don't think we are restricting Malhar to just
>> abstract
>> > >>> > > classes.
>> > >>> > > > > > > Whenever they are couple of use cases that we see quite
>> > >>> often, we
>> > >>> > > add
>> > >>> > > > > > > concrete implementations.
>> > >>> > > > > > >
>> > >>> > > > > > > For eg. FileLineInputOperator which is a concrete
>> > >>> implementation
>> > >>> > of
>> > >>> > > > > > > AbstractFileInputOperator. FSSliceReader is an example
>> as
>> > >>> well.
>> > >>> > > > > > >
>> > >>> > > > > > > In AbstractJdbcOutputOperator case there hasn't been
>> such
>> > >>> common
>> > >>> > > > > > > insert/update query.
>> > >>> > > > > > >
>> > >>> > > > > > > Also if you look at the example I provided, it is very
>> > >>> simple to
>> > >>> > > > > provide
>> > >>> > > > > > a
>> > >>> > > > > > > concrete implementation.
>> > >>> > > > > > >
>> > >>> > > > > > > If you would like to change JdbcPOJOOutputOperator to
>> work
>> > >>> for
>> > >>> > > > > > > "UPDATE/MERGE" then please go ahead.
>> > >>> > > > > > >
>> > >>> > > > > > > Chandni
>> > >>> > > > > > >
>> > >>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
>> > >>> > > > > bhupesh@datatorrent.com
>> > >>> > > > > > >
>> > >>> > > > > > > wrote:
>> > >>> > > > > > >
>> > >>> > > > > > > > I see. So, just to understand more, do we plan to keep
>> > >>> Malhar
>> > >>> > > > > > restricted
>> > >>> > > > > > > to
>> > >>> > > > > > > > the base functionality (as in abstract classes)? And
>> put
>> > >>> the
>> > >>> > > > > > > configuration
>> > >>> > > > > > > > aspect / concrete implementations in apps that use
>> these
>> > >>> > > operators?
>> > >>> > > > > > > >
>> > >>> > > > > > > > Thanks.
>> > >>> > > > > > > > Bhupesh
>> > >>> > > > > > > >
>> > >>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
>> > >>> > > > > > chandni@datatorrent.com>
>> > >>> > > > > > > > wrote:
>> > >>> > > > > > > >
>> > >>> > > > > > > > > Hi,
>> > >>> > > > > > > > >
>> > >>> > > > > > > > > Here is an example of doing Upsert with JDBC:
>> > >>> > > > > > > > >
>> > >>> > > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> >
>> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
>> > >>> > > > > > > > >
>> > >>> > > > > > > > > Thanks,
>> > >>> > > > > > > > > Chandni
>> > >>> > > > > > > > >
>> > >>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
>> > >>> > > > > > > chandni@datatorrent.com
>> > >>> > > > > > > > >
>> > >>> > > > > > > > > wrote:
>> > >>> > > > > > > > >
>> > >>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > > Here is one of them:
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> >
>> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > > They work with any kind PreparedStatement -
>> insert or
>> > >>> > update
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > > Chandni
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
>> > >>> > > > > > > > > bhupesh@datatorrent.com>
>> > >>> > > > > > > > > > wrote:
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > >> Hi Chandni,
>> > >>> > > > > > > > > >>
>> > >>> > > > > > > > > >> I don't see an update query being handled in the
>> > >>> operator.
>> > >>> > > > Could
>> > >>> > > > > > you
>> > >>> > > > > > > > > >> please
>> > >>> > > > > > > > > >> point me to the appropriate class?
>> > >>> > > > > > > > > >> Or did you mean that handling a update query is
>> > just a
>> > >>> > > matter
>> > >>> > > > of
>> > >>> > > > > > > > > extending
>> > >>> > > > > > > > > >> the class and providing a concrete
>> implementation?
>> > >>> > > > > > > > > >>
>> > >>> > > > > > > > > >> Thanks.
>> > >>> > > > > > > > > >> -Bhupesh
>> > >>> > > > > > > > > >>
>> > >>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
>> > >>> > > > > > > > > chandni@datatorrent.com>
>> > >>> > > > > > > > > >> wrote:
>> > >>> > > > > > > > > >>
>> > >>> > > > > > > > > >> > Hi Bhupesh,
>> > >>> > > > > > > > > >> >
>> > >>> > > > > > > > > >> > The current abstract JDBC Output Operators  in
>> > >>> library
>> > >>> > are
>> > >>> > > > > > generic
>> > >>> > > > > > > > and
>> > >>> > > > > > > > > >> have
>> > >>> > > > > > > > > >> > already been used in multiple POCs and
>> > >>> applications. In
>> > >>> > > fact
>> > >>> > > > > > this
>> > >>> > > > > > > > > >> operator
>> > >>> > > > > > > > > >> > has matured through customer use cases. It is
>> not
>> > >>> just
>> > >>> > an
>> > >>> > > > > insert
>> > >>> > > > > > > > > >> operator.
>> > >>> > > > > > > > > >> > We have used it to perform update and inserts.
>> > >>> > > > > > > > > >> >
>> > >>> > > > > > > > > >> > That said, I don't think it is a good idea to
>> > >>> introduce
>> > >>> > > > input
>> > >>> > > > > > > format
>> > >>> > > > > > > > > in
>> > >>> > > > > > > > > >> >  these abstract implementations. It is written
>> to
>> > >>> handle
>> > >>> > > any
>> > >>> > > > > > type
>> > >>> > > > > > > of
>> > >>> > > > > > > > > >> query,
>> > >>> > > > > > > > > >> > be it a procedure call (that was an actual
>> > customer
>> > >>> use
>> > >>> > > > case).
>> > >>> > > > > > > > > >> >
>> > >>> > > > > > > > > >> > Chandni
>> > >>> > > > > > > > > >> >
>> > >>> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh
>> Chawda <
>> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
>> > >>> > > > > > > > > >> > wrote:
>> > >>> > > > > > > > > >> >
>> > >>> > > > > > > > > >> > > Hi All,
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > > We are planning to proceed with the following
>> > >>> approach
>> > >>> > > for
>> > >>> > > > > > JDBC
>> > >>> > > > > > > > > >> *update*
>> > >>> > > > > > > > > >> > > operator:
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >    - *Update Query Configuration*
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >    - Example Update Query: *update tableName
>> set
>> > >>> a =
>> > >>> > ?**
>> > >>> > > > > > where b
>> > >>> > > > > > > > = ?
>> > >>> > > > > > > > > >> and
>> > >>> > > > > > > > > >> > c
>> > >>> > > > > > > > > >> > >    > ?;*
>> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
>> > >>> > > > instantiations:
>> > >>> > > > > > > *[{a,
>> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>> > >>> VARCHAR}, {c,
>> > >>> > > > > > > expression,
>> > >>> > > > > > > > > >> > DATE}]*
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > > We are also planning to change the JDBC
>> Output
>> > >>> > Operator
>> > >>> > > in
>> > >>> > > > > > > Malhar
>> > >>> > > > > > > > > >> Library
>> > >>> > > > > > > > > >> > > which currently does just insert. We plan to
>> > make
>> > >>> the
>> > >>> > > > input
>> > >>> > > > > > > format
>> > >>> > > > > > > > > >> > > consistent for both insert and update and
>> hence
>> > >>> the
>> > >>> > > change
>> > >>> > > > > to
>> > >>> > > > > > > the
>> > >>> > > > > > > > > >> current
>> > >>> > > > > > > > > >> > > way of configuration using JSON. Following
>> would
>> > >>> be
>> > >>> > the
>> > >>> > > > > config
>> > >>> > > > > > > for
>> > >>> > > > > > > > > >> > inserts:
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >    - *Insert Query Configuration*
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >    - Example Insert Query: *insert into
>> > tableName
>> > >>> > values
>> > >>> > > > (?,
>> > >>> > > > > > ?,
>> > >>> > > > > > > > .. ,
>> > >>> > > > > > > > > >> ?);*
>> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
>> > >>> > > > instantiations:
>> > >>> > > > > > > *[{a,
>> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>> > >>> VARCHAR}, ..
>> > >>> > ,
>> > >>> > > > {c,
>> > >>> > > > > > > > > >> expression,
>> > >>> > > > > > > > > >> > > DATE}]*
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > > Please let us know your thoughts.
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > > Thanks.
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > > -Bhupesh
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh
>> Chawda <
>> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
>> > >>> > > > > > > > > >> > > wrote:
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> > > > Hi All,
>> > >>> > > > > > > > > >> > > >
>> > >>> > > > > > > > > >> > > > Would it be a good idea to introduce the
>> > update
>> > >>> > > > > > functionality
>> > >>> > > > > > > to
>> > >>> > > > > > > > > the
>> > >>> > > > > > > > > >> > JDBC
>> > >>> > > > > > > > > >> > > > output operator in Apache Apex Malhar
>> library.
>> > >>> > > > > > > > > >> > > >
>> > >>> > > > > > > > > >> > > > The following are possible approaches:
>> > >>> > > > > > > > > >> > > >
>> > >>> > > > > > > > > >> > > >    1. Accept a update query from the user
>> with
>> > >>> place
>> > >>> > > > > holders
>> > >>> > > > > > > for
>> > >>> > > > > > > > > >> > values.
>> > >>> > > > > > > > > >> > > >    Example: *update tableName set a = ?, b
>> = ?
>> > >>> > where c
>> > >>> > > > = ?
>> > >>> > > > > > and
>> > >>> > > > > > > > d >
>> > >>> > > > > > > > > >> ?*.
>> > >>> > > > > > > > > >> > > >    Here "?" will be provided by the user as
>> > java
>> > >>> > > > > expressions
>> > >>> > > > > > > > which
>> > >>> > > > > > > > > >> will
>> > >>> > > > > > > > > >> > > be
>> > >>> > > > > > > > > >> > > >    evaluated from the incoming tuple.
>> > >>> > > > > > > > > >> > > >    2. Another option is to accept in some
>> > >>> > > configuration
>> > >>> > > > > > format
>> > >>> > > > > > > > > >> (json /
>> > >>> > > > > > > > > >> > > >    xml) the following and formulate the
>> query
>> > >>> in the
>> > >>> > > > > > operator.
>> > >>> > > > > > > > > This
>> > >>> > > > > > > > > >> can
>> > >>> > > > > > > > > >> > > become
>> > >>> > > > > > > > > >> > > >    arbitrarily complex.
>> > >>> > > > > > > > > >> > > >    1. update clause columns
>> > >>> > > > > > > > > >> > > >       2. update clause expressions
>> > >>> > > > > > > > > >> > > >       3. where clause columns
>> > >>> > > > > > > > > >> > > >       4. where clause expressions
>> > >>> > > > > > > > > >> > > >
>> > >>> > > > > > > > > >> > > > I am thinking about going ahead with 1.
>> Please
>> > >>> let
>> > >>> > me
>> > >>> > > > know
>> > >>> > > > > > if
>> > >>> > > > > > > > any
>> > >>> > > > > > > > > >> other
>> > >>> > > > > > > > > >> > > > option is possible and whether such a
>> > >>> functionality
>> > >>> > > > > already
>> > >>> > > > > > > > exists
>> > >>> > > > > > > > > >> in
>> > >>> > > > > > > > > >> > > some
>> > >>> > > > > > > > > >> > > > other class.
>> > >>> > > > > > > > > >> > > >
>> > >>> > > > > > > > > >> > > > Thanks.
>> > >>> > > > > > > > > >> > > >
>> > >>> > > > > > > > > >> > > > -Bhupesh
>> > >>> > > > > > > > > >> > > >
>> > >>> > > > > > > > > >> > >
>> > >>> > > > > > > > > >> >
>> > >>> > > > > > > > > >>
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > > >
>> > >>> > > > > > > > >
>> > >>> > > > > > > >
>> > >>> > > > > > >
>> > >>> > > > > >
>> > >>> > > > >
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi All,

Here is the design I am considering for including the additional feature as
mentioned in the previous email. Please note that this includes the
previous feedback received on the design.

   1. Rename JdbcPOJOOutputOperator -> *AbstractJdbcPOJOOutputOperator*
   2. Add *JdbcPOJOInsertOutputOperator* extending from
   *AbstractJdbcPOJOOutputOperator*. This will be used for "*insert*"
   output query.
      1. If FieldInfo (Mapping of POJO fields to DB columns) is provided by
      the user - then use only these columns to build the insert SQL string.
      Return this insert query as the update command.
      2. If FieldInfo is not provided by the user - then assume a direct
      mapping between POJO fields and the DB columns. Extract list of columns
      from DB and populate all columns from POJO fields. The POJO
fields will be
      assumed to be the same as DB column names.
   3. Add *JdbcPOJONonInsertOutputOperator *extending from
   *AbstractJdbcPOJOOutputOperator*. This will be used for "*update / merge
   / delete*" output queries. [This design stays the same as discussed
   previously]
   1. Accept a parametrized SQL string
      2. Use the FieldInfo provided by the user to populate the parameters
      in the query.
      3. Execute the query.

Please provide your thoughts on this.

Thanks.
~Bhupesh


On Mon, Mar 14, 2016 at 6:29 PM, Priyanka Gugale <pr...@datatorrent.com>
wrote:

> +1, this is simplifying operator configuration, we should do same for other
> DB operators as well.
>
> -Priyanka
>
> On Thu, Mar 10, 2016 at 5:34 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > Hi All,
> >
> > Apart from extending the JDBC Output operator (for update, merge and
> > delete), I am planning the following change to the "*insert*"
> > functionality:
> >
> >
> >    - It was pointed out that - if the incoming POJO has fields which are
> to
> >    be inserted into a database table, the user should not have to specify
> > any
> >    kind of mapping between POJO columns and database table column names.
> > This
> >    is possible if the POJO field names are same as the db column names.
> >
> >
> >    - Here is what I am planning to do to address this:
> >       - Have a flag which indicates "*No mapping required. Assume same
> POJO
> >       field names as the DB column names.*"
> >       - If this flag is true, extract DB column names (in addition to the
> >       types, which is done currently) and create getters from POJO based
> > on the
> >       DB column types.
> >       - Use these getters to populate the parametrized "*insert*" query.
> >
> >
> > Note - Once this is done, the user would not have to specify any field
> > descriptions. Just the DB table name would be sufficient. This would only
> > be possible if the DB column names are the same as the field names in the
> > POJO.
> > Without this change, "*insert*" will be considered as just another DDL
> > query like "update" / "delete" / "merge". We accept a parametrized query
> > and a mapping from the user for POJO field names to DB column names + DB
> > column types.
> >
> > Thanks.
> > Bhupesh
> >
> >
> > On Wed, Dec 30, 2015 at 12:56 PM, Bhupesh Chawda <
> bhupesh@datatorrent.com>
> > wrote:
> >
> > > Also, since we are not creating separate operators for insert and
> > > update/merge, it seems we don't need an abstract class. We can directly
> > > modify JdbcPOJOOutputOperator.
> > >
> > > -Bhupesh
> > >
> > > On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <
> > bhupesh@datatorrent.com>
> > > wrote:
> > >
> > >> Also, a note regarding accepting fieldInfo objects:
> > >>
> > >> FieldInfo has fields for java type of the column, but not the SQL data
> > >> type which is needed to set the parameters in the SQL statement. This
> > still
> > >> needs to be derived from the database (as in insert).
> > >> In more complex scenarios, as mentioned earlier, this may not always
> be
> > >> possible. In this case we have the following options:
> > >>
> > >>    1. Accept the SQL data types from the user for the parameters ("?")
> > >>    in a new data structure.
> > >>    2. Accept the JSON structure as specified earlier.
> > >>    3. Modify FieldInfo to have an additional optional field which
> > >>    indicates the SQL data type of the parameter ("?")
> > >>
> > >> I am planning to go ahead with  option 3.
> > >>
> > >> Comments?
> > >>
> > >> -Bhupesh
> > >>
> > >>
> > >>
> > >> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <
> > chandni@datatorrent.com>
> > >> wrote:
> > >>
> > >>> Yeah that sounds good.
> > >>>
> > >>> Chandni
> > >>>
> > >>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <
> > bhupesh@datatorrent.com
> > >>> >
> > >>> wrote:
> > >>>
> > >>> > Thanks Chandni for the comments.
> > >>> >
> > >>> > Additionally I think, we should do away with constructing the SQL
> > >>> query in
> > >>> > the operator. This is because it is only possible in case of simple
> > >>> insert
> > >>> > statements. In case a where clause is needed in the insert
> statement,
> > >>> we
> > >>> > cannot construct the SQL easily. If we are accepting a parametrized
> > >>> query
> > >>> > from the user for update/merge, why not do the same for insert
> > >>> statements
> > >>> > as well? Then the hierarchy would look like:
> > >>> > - AbstractJdbcPOJOOutputOperator (As suggested)
> > >>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert,
> > update,
> > >>> > merge, delete)
> > >>> >
> > >>> > We don't need a separate operator for inserts and update/merge.
> > >>> >
> > >>> > Thanks.
> > >>> > -Bhupesh
> > >>> >
> > >>> >
> > >>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <
> > >>> chandni@datatorrent.com>
> > >>> > wrote:
> > >>> >
> > >>> > >   1. FieldInfo, is ultimately a custom object and any user who
> uses
> > >>> this
> > >>> > >    operator has to construct an object, populate it and then use
> > it.
> > >>> We
> > >>> > are
> > >>> > >    trying to avoid using any custom object and allow any user to
> > use
> > >>> the
> > >>> > >    operator without writing any extra code; just configuration.
> > >>> > > FieldInfo is a way to provide configuration in a UI friendly way.
> > >>> > Providing
> > >>> > > configuration as JSON is not UI friendly.
> > >>> > >
> > >>> > >    2. In case of update / merge, we need what SQL data types that
> > the
> > >>> > >    expression provided by the user would evaluate to. In case of
> > >>> insert
> > >>> > we
> > >>> > > can
> > >>> > >    go and fetch the data types of the columns directly from DB.
> > >>> However,
> > >>> > > the
> > >>> > >    same is not possible for custom expressions; "avg(salary)" for
> > >>> > instance.
> > >>> > > Ok so here is where you can make a change.
> > >>> > > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
> > >>> > > - Abstraction is to fetch the type of column.
> > >>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types of
> > >>> columns
> > >>> > > directly from DB. Please note that FieldInfo can also provide
> type
> > of
> > >>> > > derived column.
> > >>> > >
> > >>> > >
> > >>> > > You mentioned "We are trying to avoid using any custom object and
> > >>> allow
> > >>> > any
> > >>> > > user to use the
> > >>> > >    operator without writing any extra code".
> > >>> > > This I think is specific to your use case. You can create an
> > >>> extension of
> > >>> > > the above which takes JSON blob and creates FieldInfos from it.
> > >>> > >
> > >>> > > Chandni
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
> > >>> bhupesh@datatorrent.com
> > >>> > >
> > >>> > > wrote:
> > >>> > >
> > >>> > > > Hi Chandni,
> > >>> > > >
> > >>> > > > Following are the issues:
> > >>> > > >
> > >>> > > >    1. FieldInfo, is ultimately a custom object and any user who
> > >>> uses
> > >>> > this
> > >>> > > >    operator has to construct an object, populate it and then
> use
> > >>> it. We
> > >>> > > are
> > >>> > > >    trying to avoid using any custom object and allow any user
> to
> > >>> use
> > >>> > the
> > >>> > > >    operator without writing any extra code; just configuration.
> > >>> > > >    2. In case of update / merge, we need what SQL data types
> that
> > >>> the
> > >>> > > >    expression provided by the user would evaluate to. In case
> of
> > >>> insert
> > >>> > > we
> > >>> > > > can
> > >>> > > >    go and fetch the data types of the columns directly from DB.
> > >>> > However,
> > >>> > > > the
> > >>> > > >    same is not possible for custom expressions; "avg(salary)"
> for
> > >>> > > instance.
> > >>> > > >
> > >>> > > > Thanks.
> > >>> > > >
> > >>> > > > -Bhupesh
> > >>> > > >
> > >>> > > >
> > >>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
> > >>> > chandni@datatorrent.com>
> > >>> > > > wrote:
> > >>> > > >
> > >>> > > > > Hi Bhupesh,
> > >>> > > > >
> > >>> > > > > JDBCPojoOutputOperator was written for a demo and therefore
> it
> > >>> was
> > >>> > > marked
> > >>> > > > > Evolving which is why I had mentioned that you should feel
> free
> > >>> to
> > >>> > > modify
> > >>> > > > > it.
> > >>> > > > >
> > >>> > > > > I think an insert query can be as complex as any other query.
> > It
> > >>> uses
> > >>> > > > > FieldInfo because in the app builder it is easy for the user
> to
> > >>> > provide
> > >>> > > > > that instead of JSON String.
> > >>> > > > >
> > >>> > > > > Can you please provide specifics about what it is that you
> find
> > >>> > > difficult
> > >>> > > > > to change/implement for providing update/merge/delete support
> > in
> > >>> > > > > JDBCPojoOutputOperator?
> > >>> > > > >
> > >>> > > > > Chandni
> > >>> > > > >
> > >>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
> > >>> > > bhupesh@datatorrent.com
> > >>> > > > >
> > >>> > > > > wrote:
> > >>> > > > >
> > >>> > > > > > Hi All,
> > >>> > > > > >
> > >>> > > > > > It has been pointed out that adding another class for
> > handling
> > >>> > > update /
> > >>> > > > > > merge queries would not be a good option.
> > >>> > > > > >
> > >>> > > > > > Here are the current implementation details:
> > >>> > > > > >
> > >>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator
> which
> > >>> > accepts
> > >>> > > a
> > >>> > > > > list
> > >>> > > > > >    of FieldInfo objects. Each element of this list
> indicates
> > >>> the
> > >>> > > > details
> > >>> > > > > >    (column name, pojo field expression and datatype) of
> > fields
> > >>> that
> > >>> > > > need
> > >>> > > > > > to be
> > >>> > > > > >    inserted. Using this, the operator formulates the insert
> > >>> query
> > >>> > in
> > >>> > > > the
> > >>> > > > > > setup
> > >>> > > > > >    method and identifies the sql datatypes of these columns
> > >>> from
> > >>> > the
> > >>> > > > > > database
> > >>> > > > > >    using the table name.
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >    - Now, coming to the update / merge feature, it is
> > >>> difficult to
> > >>> > > > > >    formulate the update / merge query in the operator logic
> > >>> due to
> > >>> > > the
> > >>> > > > > > complex
> > >>> > > > > >    structure of these statements. For this reason, we plan
> to
> > >>> take
> > >>> > a
> > >>> > > > > >    parametrized SQL query from the user. This may look
> like:
> > >>> > *"update
> > >>> > > > > table
> > >>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
> > >>> statements
> > >>> > > can
> > >>> > > > be
> > >>> > > > > >    accepted from the user in addition to a json string
> which
> > >>> > > indicates
> > >>> > > > > the
> > >>> > > > > >    details for the parameters: *column name, the pojo
> > >>> expression
> > >>> > and
> > >>> > > > the
> > >>> > > > > >    sql data type* of the expression. Note that this
> > >>> information is
> > >>> > > > > similar
> > >>> > > > > >    to the FieldInfo object, but is a string which can be
> > >>> configured
> > >>> > > > > easily
> > >>> > > > > > by
> > >>> > > > > >    the user. Also note that the data type which is accepted
> > is
> > >>> the
> > >>> > > SQL
> > >>> > > > > data
> > >>> > > > > >    type. Using this info we can populate the parametrized
> > >>> query and
> > >>> > > run
> > >>> > > > > it
> > >>> > > > > >    based on the incoming POJO.
> > >>> > > > > >
> > >>> > > > > > The second approach is able to handle all kinds of queries
> > >>> (insert
> > >>> > /
> > >>> > > > > update
> > >>> > > > > > / merge / delete). However, since we already have the
> > >>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
> > >>> > functionality
> > >>> > > > into
> > >>> > > > > > the same class.
> > >>> > > > > >
> > >>> > > > > > Here we have the following options:
> > >>> > > > > >
> > >>> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator) to
> > the
> > >>> > > second
> > >>> > > > > >    approach which is more generic and also handles inserts.
> > >>> > > > > >    2. Add the update/ merge functionality to the existing
> > class
> > >>> > > without
> > >>> > > > > >    changing the existing functionality. This will have two
> > >>> > different
> > >>> > > > ways
> > >>> > > > > > that
> > >>> > > > > >    insert queries may be handled in the operator.
> > >>> > > > > >    3. Add another class which extends from
> > >>> JdbcPOJOOutputOperator
> > >>> > and
> > >>> > > > > have
> > >>> > > > > >    the update/merge functionality there. (This is not
> > >>> recommended.)
> > >>> > > > > >    4. Any other approach.
> > >>> > > > > >
> > >>> > > > > > Please suggest.
> > >>> > > > > >
> > >>> > > > > > Thanks.
> > >>> > > > > >
> > >>> > > > > > -Bhupesh.
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> > >>> > > > chandni@datatorrent.com
> > >>> > > > > >
> > >>> > > > > > wrote:
> > >>> > > > > >
> > >>> > > > > > > No I don't think we are restricting Malhar to just
> abstract
> > >>> > > classes.
> > >>> > > > > > > Whenever they are couple of use cases that we see quite
> > >>> often, we
> > >>> > > add
> > >>> > > > > > > concrete implementations.
> > >>> > > > > > >
> > >>> > > > > > > For eg. FileLineInputOperator which is a concrete
> > >>> implementation
> > >>> > of
> > >>> > > > > > > AbstractFileInputOperator. FSSliceReader is an example as
> > >>> well.
> > >>> > > > > > >
> > >>> > > > > > > In AbstractJdbcOutputOperator case there hasn't been such
> > >>> common
> > >>> > > > > > > insert/update query.
> > >>> > > > > > >
> > >>> > > > > > > Also if you look at the example I provided, it is very
> > >>> simple to
> > >>> > > > > provide
> > >>> > > > > > a
> > >>> > > > > > > concrete implementation.
> > >>> > > > > > >
> > >>> > > > > > > If you would like to change JdbcPOJOOutputOperator to
> work
> > >>> for
> > >>> > > > > > > "UPDATE/MERGE" then please go ahead.
> > >>> > > > > > >
> > >>> > > > > > > Chandni
> > >>> > > > > > >
> > >>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> > >>> > > > > bhupesh@datatorrent.com
> > >>> > > > > > >
> > >>> > > > > > > wrote:
> > >>> > > > > > >
> > >>> > > > > > > > I see. So, just to understand more, do we plan to keep
> > >>> Malhar
> > >>> > > > > > restricted
> > >>> > > > > > > to
> > >>> > > > > > > > the base functionality (as in abstract classes)? And
> put
> > >>> the
> > >>> > > > > > > configuration
> > >>> > > > > > > > aspect / concrete implementations in apps that use
> these
> > >>> > > operators?
> > >>> > > > > > > >
> > >>> > > > > > > > Thanks.
> > >>> > > > > > > > Bhupesh
> > >>> > > > > > > >
> > >>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > >>> > > > > > chandni@datatorrent.com>
> > >>> > > > > > > > wrote:
> > >>> > > > > > > >
> > >>> > > > > > > > > Hi,
> > >>> > > > > > > > >
> > >>> > > > > > > > > Here is an example of doing Upsert with JDBC:
> > >>> > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > >>> > > > > > > > >
> > >>> > > > > > > > > Thanks,
> > >>> > > > > > > > > Chandni
> > >>> > > > > > > > >
> > >>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > >>> > > > > > > chandni@datatorrent.com
> > >>> > > > > > > > >
> > >>> > > > > > > > > wrote:
> > >>> > > > > > > > >
> > >>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > Here is one of them:
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > They work with any kind PreparedStatement - insert
> or
> > >>> > update
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > Chandni
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > >>> > > > > > > > > bhupesh@datatorrent.com>
> > >>> > > > > > > > > > wrote:
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >> Hi Chandni,
> > >>> > > > > > > > > >>
> > >>> > > > > > > > > >> I don't see an update query being handled in the
> > >>> operator.
> > >>> > > > Could
> > >>> > > > > > you
> > >>> > > > > > > > > >> please
> > >>> > > > > > > > > >> point me to the appropriate class?
> > >>> > > > > > > > > >> Or did you mean that handling a update query is
> > just a
> > >>> > > matter
> > >>> > > > of
> > >>> > > > > > > > > extending
> > >>> > > > > > > > > >> the class and providing a concrete implementation?
> > >>> > > > > > > > > >>
> > >>> > > > > > > > > >> Thanks.
> > >>> > > > > > > > > >> -Bhupesh
> > >>> > > > > > > > > >>
> > >>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > >>> > > > > > > > > chandni@datatorrent.com>
> > >>> > > > > > > > > >> wrote:
> > >>> > > > > > > > > >>
> > >>> > > > > > > > > >> > Hi Bhupesh,
> > >>> > > > > > > > > >> >
> > >>> > > > > > > > > >> > The current abstract JDBC Output Operators  in
> > >>> library
> > >>> > are
> > >>> > > > > > generic
> > >>> > > > > > > > and
> > >>> > > > > > > > > >> have
> > >>> > > > > > > > > >> > already been used in multiple POCs and
> > >>> applications. In
> > >>> > > fact
> > >>> > > > > > this
> > >>> > > > > > > > > >> operator
> > >>> > > > > > > > > >> > has matured through customer use cases. It is
> not
> > >>> just
> > >>> > an
> > >>> > > > > insert
> > >>> > > > > > > > > >> operator.
> > >>> > > > > > > > > >> > We have used it to perform update and inserts.
> > >>> > > > > > > > > >> >
> > >>> > > > > > > > > >> > That said, I don't think it is a good idea to
> > >>> introduce
> > >>> > > > input
> > >>> > > > > > > format
> > >>> > > > > > > > > in
> > >>> > > > > > > > > >> >  these abstract implementations. It is written
> to
> > >>> handle
> > >>> > > any
> > >>> > > > > > type
> > >>> > > > > > > of
> > >>> > > > > > > > > >> query,
> > >>> > > > > > > > > >> > be it a procedure call (that was an actual
> > customer
> > >>> use
> > >>> > > > case).
> > >>> > > > > > > > > >> >
> > >>> > > > > > > > > >> > Chandni
> > >>> > > > > > > > > >> >
> > >>> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda
> <
> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
> > >>> > > > > > > > > >> > wrote:
> > >>> > > > > > > > > >> >
> > >>> > > > > > > > > >> > > Hi All,
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > > We are planning to proceed with the following
> > >>> approach
> > >>> > > for
> > >>> > > > > > JDBC
> > >>> > > > > > > > > >> *update*
> > >>> > > > > > > > > >> > > operator:
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >    - *Update Query Configuration*
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >    - Example Update Query: *update tableName
> set
> > >>> a =
> > >>> > ?**
> > >>> > > > > > where b
> > >>> > > > > > > > = ?
> > >>> > > > > > > > > >> and
> > >>> > > > > > > > > >> > c
> > >>> > > > > > > > > >> > >    > ?;*
> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
> > >>> > > > instantiations:
> > >>> > > > > > > *[{a,
> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
> > >>> VARCHAR}, {c,
> > >>> > > > > > > expression,
> > >>> > > > > > > > > >> > DATE}]*
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > > We are also planning to change the JDBC Output
> > >>> > Operator
> > >>> > > in
> > >>> > > > > > > Malhar
> > >>> > > > > > > > > >> Library
> > >>> > > > > > > > > >> > > which currently does just insert. We plan to
> > make
> > >>> the
> > >>> > > > input
> > >>> > > > > > > format
> > >>> > > > > > > > > >> > > consistent for both insert and update and
> hence
> > >>> the
> > >>> > > change
> > >>> > > > > to
> > >>> > > > > > > the
> > >>> > > > > > > > > >> current
> > >>> > > > > > > > > >> > > way of configuration using JSON. Following
> would
> > >>> be
> > >>> > the
> > >>> > > > > config
> > >>> > > > > > > for
> > >>> > > > > > > > > >> > inserts:
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >    - *Insert Query Configuration*
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >    - Example Insert Query: *insert into
> > tableName
> > >>> > values
> > >>> > > > (?,
> > >>> > > > > > ?,
> > >>> > > > > > > > .. ,
> > >>> > > > > > > > > >> ?);*
> > >>> > > > > > > > > >> > >    - Example JSON input array for parameter
> > >>> > > > instantiations:
> > >>> > > > > > > *[{a,
> > >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
> > >>> VARCHAR}, ..
> > >>> > ,
> > >>> > > > {c,
> > >>> > > > > > > > > >> expression,
> > >>> > > > > > > > > >> > > DATE}]*
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > > Please let us know your thoughts.
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > > Thanks.
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > > -Bhupesh
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh
> Chawda <
> > >>> > > > > > > > > >> bhupesh@datatorrent.com>
> > >>> > > > > > > > > >> > > wrote:
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> > > > Hi All,
> > >>> > > > > > > > > >> > > >
> > >>> > > > > > > > > >> > > > Would it be a good idea to introduce the
> > update
> > >>> > > > > > functionality
> > >>> > > > > > > to
> > >>> > > > > > > > > the
> > >>> > > > > > > > > >> > JDBC
> > >>> > > > > > > > > >> > > > output operator in Apache Apex Malhar
> library.
> > >>> > > > > > > > > >> > > >
> > >>> > > > > > > > > >> > > > The following are possible approaches:
> > >>> > > > > > > > > >> > > >
> > >>> > > > > > > > > >> > > >    1. Accept a update query from the user
> with
> > >>> place
> > >>> > > > > holders
> > >>> > > > > > > for
> > >>> > > > > > > > > >> > values.
> > >>> > > > > > > > > >> > > >    Example: *update tableName set a = ?, b
> = ?
> > >>> > where c
> > >>> > > > = ?
> > >>> > > > > > and
> > >>> > > > > > > > d >
> > >>> > > > > > > > > >> ?*.
> > >>> > > > > > > > > >> > > >    Here "?" will be provided by the user as
> > java
> > >>> > > > > expressions
> > >>> > > > > > > > which
> > >>> > > > > > > > > >> will
> > >>> > > > > > > > > >> > > be
> > >>> > > > > > > > > >> > > >    evaluated from the incoming tuple.
> > >>> > > > > > > > > >> > > >    2. Another option is to accept in some
> > >>> > > configuration
> > >>> > > > > > format
> > >>> > > > > > > > > >> (json /
> > >>> > > > > > > > > >> > > >    xml) the following and formulate the
> query
> > >>> in the
> > >>> > > > > > operator.
> > >>> > > > > > > > > This
> > >>> > > > > > > > > >> can
> > >>> > > > > > > > > >> > > become
> > >>> > > > > > > > > >> > > >    arbitrarily complex.
> > >>> > > > > > > > > >> > > >    1. update clause columns
> > >>> > > > > > > > > >> > > >       2. update clause expressions
> > >>> > > > > > > > > >> > > >       3. where clause columns
> > >>> > > > > > > > > >> > > >       4. where clause expressions
> > >>> > > > > > > > > >> > > >
> > >>> > > > > > > > > >> > > > I am thinking about going ahead with 1.
> Please
> > >>> let
> > >>> > me
> > >>> > > > know
> > >>> > > > > > if
> > >>> > > > > > > > any
> > >>> > > > > > > > > >> other
> > >>> > > > > > > > > >> > > > option is possible and whether such a
> > >>> functionality
> > >>> > > > > already
> > >>> > > > > > > > exists
> > >>> > > > > > > > > >> in
> > >>> > > > > > > > > >> > > some
> > >>> > > > > > > > > >> > > > other class.
> > >>> > > > > > > > > >> > > >
> > >>> > > > > > > > > >> > > > Thanks.
> > >>> > > > > > > > > >> > > >
> > >>> > > > > > > > > >> > > > -Bhupesh
> > >>> > > > > > > > > >> > > >
> > >>> > > > > > > > > >> > >
> > >>> > > > > > > > > >> >
> > >>> > > > > > > > > >>
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Priyanka Gugale <pr...@datatorrent.com>.
+1, this is simplifying operator configuration, we should do same for other
DB operators as well.

-Priyanka

On Thu, Mar 10, 2016 at 5:34 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi All,
>
> Apart from extending the JDBC Output operator (for update, merge and
> delete), I am planning the following change to the "*insert*"
> functionality:
>
>
>    - It was pointed out that - if the incoming POJO has fields which are to
>    be inserted into a database table, the user should not have to specify
> any
>    kind of mapping between POJO columns and database table column names.
> This
>    is possible if the POJO field names are same as the db column names.
>
>
>    - Here is what I am planning to do to address this:
>       - Have a flag which indicates "*No mapping required. Assume same POJO
>       field names as the DB column names.*"
>       - If this flag is true, extract DB column names (in addition to the
>       types, which is done currently) and create getters from POJO based
> on the
>       DB column types.
>       - Use these getters to populate the parametrized "*insert*" query.
>
>
> Note - Once this is done, the user would not have to specify any field
> descriptions. Just the DB table name would be sufficient. This would only
> be possible if the DB column names are the same as the field names in the
> POJO.
> Without this change, "*insert*" will be considered as just another DDL
> query like "update" / "delete" / "merge". We accept a parametrized query
> and a mapping from the user for POJO field names to DB column names + DB
> column types.
>
> Thanks.
> Bhupesh
>
>
> On Wed, Dec 30, 2015 at 12:56 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > Also, since we are not creating separate operators for insert and
> > update/merge, it seems we don't need an abstract class. We can directly
> > modify JdbcPOJOOutputOperator.
> >
> > -Bhupesh
> >
> > On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <
> bhupesh@datatorrent.com>
> > wrote:
> >
> >> Also, a note regarding accepting fieldInfo objects:
> >>
> >> FieldInfo has fields for java type of the column, but not the SQL data
> >> type which is needed to set the parameters in the SQL statement. This
> still
> >> needs to be derived from the database (as in insert).
> >> In more complex scenarios, as mentioned earlier, this may not always be
> >> possible. In this case we have the following options:
> >>
> >>    1. Accept the SQL data types from the user for the parameters ("?")
> >>    in a new data structure.
> >>    2. Accept the JSON structure as specified earlier.
> >>    3. Modify FieldInfo to have an additional optional field which
> >>    indicates the SQL data type of the parameter ("?")
> >>
> >> I am planning to go ahead with  option 3.
> >>
> >> Comments?
> >>
> >> -Bhupesh
> >>
> >>
> >>
> >> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <
> chandni@datatorrent.com>
> >> wrote:
> >>
> >>> Yeah that sounds good.
> >>>
> >>> Chandni
> >>>
> >>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <
> bhupesh@datatorrent.com
> >>> >
> >>> wrote:
> >>>
> >>> > Thanks Chandni for the comments.
> >>> >
> >>> > Additionally I think, we should do away with constructing the SQL
> >>> query in
> >>> > the operator. This is because it is only possible in case of simple
> >>> insert
> >>> > statements. In case a where clause is needed in the insert statement,
> >>> we
> >>> > cannot construct the SQL easily. If we are accepting a parametrized
> >>> query
> >>> > from the user for update/merge, why not do the same for insert
> >>> statements
> >>> > as well? Then the hierarchy would look like:
> >>> > - AbstractJdbcPOJOOutputOperator (As suggested)
> >>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert,
> update,
> >>> > merge, delete)
> >>> >
> >>> > We don't need a separate operator for inserts and update/merge.
> >>> >
> >>> > Thanks.
> >>> > -Bhupesh
> >>> >
> >>> >
> >>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <
> >>> chandni@datatorrent.com>
> >>> > wrote:
> >>> >
> >>> > >   1. FieldInfo, is ultimately a custom object and any user who uses
> >>> this
> >>> > >    operator has to construct an object, populate it and then use
> it.
> >>> We
> >>> > are
> >>> > >    trying to avoid using any custom object and allow any user to
> use
> >>> the
> >>> > >    operator without writing any extra code; just configuration.
> >>> > > FieldInfo is a way to provide configuration in a UI friendly way.
> >>> > Providing
> >>> > > configuration as JSON is not UI friendly.
> >>> > >
> >>> > >    2. In case of update / merge, we need what SQL data types that
> the
> >>> > >    expression provided by the user would evaluate to. In case of
> >>> insert
> >>> > we
> >>> > > can
> >>> > >    go and fetch the data types of the columns directly from DB.
> >>> However,
> >>> > > the
> >>> > >    same is not possible for custom expressions; "avg(salary)" for
> >>> > instance.
> >>> > > Ok so here is where you can make a change.
> >>> > > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
> >>> > > - Abstraction is to fetch the type of column.
> >>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types of
> >>> columns
> >>> > > directly from DB. Please note that FieldInfo can also provide type
> of
> >>> > > derived column.
> >>> > >
> >>> > >
> >>> > > You mentioned "We are trying to avoid using any custom object and
> >>> allow
> >>> > any
> >>> > > user to use the
> >>> > >    operator without writing any extra code".
> >>> > > This I think is specific to your use case. You can create an
> >>> extension of
> >>> > > the above which takes JSON blob and creates FieldInfos from it.
> >>> > >
> >>> > > Chandni
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
> >>> bhupesh@datatorrent.com
> >>> > >
> >>> > > wrote:
> >>> > >
> >>> > > > Hi Chandni,
> >>> > > >
> >>> > > > Following are the issues:
> >>> > > >
> >>> > > >    1. FieldInfo, is ultimately a custom object and any user who
> >>> uses
> >>> > this
> >>> > > >    operator has to construct an object, populate it and then use
> >>> it. We
> >>> > > are
> >>> > > >    trying to avoid using any custom object and allow any user to
> >>> use
> >>> > the
> >>> > > >    operator without writing any extra code; just configuration.
> >>> > > >    2. In case of update / merge, we need what SQL data types that
> >>> the
> >>> > > >    expression provided by the user would evaluate to. In case of
> >>> insert
> >>> > > we
> >>> > > > can
> >>> > > >    go and fetch the data types of the columns directly from DB.
> >>> > However,
> >>> > > > the
> >>> > > >    same is not possible for custom expressions; "avg(salary)" for
> >>> > > instance.
> >>> > > >
> >>> > > > Thanks.
> >>> > > >
> >>> > > > -Bhupesh
> >>> > > >
> >>> > > >
> >>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
> >>> > chandni@datatorrent.com>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > Hi Bhupesh,
> >>> > > > >
> >>> > > > > JDBCPojoOutputOperator was written for a demo and therefore it
> >>> was
> >>> > > marked
> >>> > > > > Evolving which is why I had mentioned that you should feel free
> >>> to
> >>> > > modify
> >>> > > > > it.
> >>> > > > >
> >>> > > > > I think an insert query can be as complex as any other query.
> It
> >>> uses
> >>> > > > > FieldInfo because in the app builder it is easy for the user to
> >>> > provide
> >>> > > > > that instead of JSON String.
> >>> > > > >
> >>> > > > > Can you please provide specifics about what it is that you find
> >>> > > difficult
> >>> > > > > to change/implement for providing update/merge/delete support
> in
> >>> > > > > JDBCPojoOutputOperator?
> >>> > > > >
> >>> > > > > Chandni
> >>> > > > >
> >>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
> >>> > > bhupesh@datatorrent.com
> >>> > > > >
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > > > Hi All,
> >>> > > > > >
> >>> > > > > > It has been pointed out that adding another class for
> handling
> >>> > > update /
> >>> > > > > > merge queries would not be a good option.
> >>> > > > > >
> >>> > > > > > Here are the current implementation details:
> >>> > > > > >
> >>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator which
> >>> > accepts
> >>> > > a
> >>> > > > > list
> >>> > > > > >    of FieldInfo objects. Each element of this list indicates
> >>> the
> >>> > > > details
> >>> > > > > >    (column name, pojo field expression and datatype) of
> fields
> >>> that
> >>> > > > need
> >>> > > > > > to be
> >>> > > > > >    inserted. Using this, the operator formulates the insert
> >>> query
> >>> > in
> >>> > > > the
> >>> > > > > > setup
> >>> > > > > >    method and identifies the sql datatypes of these columns
> >>> from
> >>> > the
> >>> > > > > > database
> >>> > > > > >    using the table name.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > >    - Now, coming to the update / merge feature, it is
> >>> difficult to
> >>> > > > > >    formulate the update / merge query in the operator logic
> >>> due to
> >>> > > the
> >>> > > > > > complex
> >>> > > > > >    structure of these statements. For this reason, we plan to
> >>> take
> >>> > a
> >>> > > > > >    parametrized SQL query from the user. This may look like:
> >>> > *"update
> >>> > > > > table
> >>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
> >>> statements
> >>> > > can
> >>> > > > be
> >>> > > > > >    accepted from the user in addition to a json string which
> >>> > > indicates
> >>> > > > > the
> >>> > > > > >    details for the parameters: *column name, the pojo
> >>> expression
> >>> > and
> >>> > > > the
> >>> > > > > >    sql data type* of the expression. Note that this
> >>> information is
> >>> > > > > similar
> >>> > > > > >    to the FieldInfo object, but is a string which can be
> >>> configured
> >>> > > > > easily
> >>> > > > > > by
> >>> > > > > >    the user. Also note that the data type which is accepted
> is
> >>> the
> >>> > > SQL
> >>> > > > > data
> >>> > > > > >    type. Using this info we can populate the parametrized
> >>> query and
> >>> > > run
> >>> > > > > it
> >>> > > > > >    based on the incoming POJO.
> >>> > > > > >
> >>> > > > > > The second approach is able to handle all kinds of queries
> >>> (insert
> >>> > /
> >>> > > > > update
> >>> > > > > > / merge / delete). However, since we already have the
> >>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
> >>> > functionality
> >>> > > > into
> >>> > > > > > the same class.
> >>> > > > > >
> >>> > > > > > Here we have the following options:
> >>> > > > > >
> >>> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator) to
> the
> >>> > > second
> >>> > > > > >    approach which is more generic and also handles inserts.
> >>> > > > > >    2. Add the update/ merge functionality to the existing
> class
> >>> > > without
> >>> > > > > >    changing the existing functionality. This will have two
> >>> > different
> >>> > > > ways
> >>> > > > > > that
> >>> > > > > >    insert queries may be handled in the operator.
> >>> > > > > >    3. Add another class which extends from
> >>> JdbcPOJOOutputOperator
> >>> > and
> >>> > > > > have
> >>> > > > > >    the update/merge functionality there. (This is not
> >>> recommended.)
> >>> > > > > >    4. Any other approach.
> >>> > > > > >
> >>> > > > > > Please suggest.
> >>> > > > > >
> >>> > > > > > Thanks.
> >>> > > > > >
> >>> > > > > > -Bhupesh.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> >>> > > > chandni@datatorrent.com
> >>> > > > > >
> >>> > > > > > wrote:
> >>> > > > > >
> >>> > > > > > > No I don't think we are restricting Malhar to just abstract
> >>> > > classes.
> >>> > > > > > > Whenever they are couple of use cases that we see quite
> >>> often, we
> >>> > > add
> >>> > > > > > > concrete implementations.
> >>> > > > > > >
> >>> > > > > > > For eg. FileLineInputOperator which is a concrete
> >>> implementation
> >>> > of
> >>> > > > > > > AbstractFileInputOperator. FSSliceReader is an example as
> >>> well.
> >>> > > > > > >
> >>> > > > > > > In AbstractJdbcOutputOperator case there hasn't been such
> >>> common
> >>> > > > > > > insert/update query.
> >>> > > > > > >
> >>> > > > > > > Also if you look at the example I provided, it is very
> >>> simple to
> >>> > > > > provide
> >>> > > > > > a
> >>> > > > > > > concrete implementation.
> >>> > > > > > >
> >>> > > > > > > If you would like to change JdbcPOJOOutputOperator to work
> >>> for
> >>> > > > > > > "UPDATE/MERGE" then please go ahead.
> >>> > > > > > >
> >>> > > > > > > Chandni
> >>> > > > > > >
> >>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> >>> > > > > bhupesh@datatorrent.com
> >>> > > > > > >
> >>> > > > > > > wrote:
> >>> > > > > > >
> >>> > > > > > > > I see. So, just to understand more, do we plan to keep
> >>> Malhar
> >>> > > > > > restricted
> >>> > > > > > > to
> >>> > > > > > > > the base functionality (as in abstract classes)? And put
> >>> the
> >>> > > > > > > configuration
> >>> > > > > > > > aspect / concrete implementations in apps that use these
> >>> > > operators?
> >>> > > > > > > >
> >>> > > > > > > > Thanks.
> >>> > > > > > > > Bhupesh
> >>> > > > > > > >
> >>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> >>> > > > > > chandni@datatorrent.com>
> >>> > > > > > > > wrote:
> >>> > > > > > > >
> >>> > > > > > > > > Hi,
> >>> > > > > > > > >
> >>> > > > > > > > > Here is an example of doing Upsert with JDBC:
> >>> > > > > > > > >
> >>> > > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> >>> > > > > > > > >
> >>> > > > > > > > > Thanks,
> >>> > > > > > > > > Chandni
> >>> > > > > > > > >
> >>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> >>> > > > > > > chandni@datatorrent.com
> >>> > > > > > > > >
> >>> > > > > > > > > wrote:
> >>> > > > > > > > >
> >>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
> >>> > > > > > > > > >
> >>> > > > > > > > > > Here is one of them:
> >>> > > > > > > > > >
> >>> > > > > > > > > >
> >>> > > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> >>> > > > > > > > > >
> >>> > > > > > > > > > They work with any kind PreparedStatement - insert or
> >>> > update
> >>> > > > > > > > > >
> >>> > > > > > > > > > Chandni
> >>> > > > > > > > > >
> >>> > > > > > > > > >
> >>> > > > > > > > > >
> >>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> >>> > > > > > > > > bhupesh@datatorrent.com>
> >>> > > > > > > > > > wrote:
> >>> > > > > > > > > >
> >>> > > > > > > > > >> Hi Chandni,
> >>> > > > > > > > > >>
> >>> > > > > > > > > >> I don't see an update query being handled in the
> >>> operator.
> >>> > > > Could
> >>> > > > > > you
> >>> > > > > > > > > >> please
> >>> > > > > > > > > >> point me to the appropriate class?
> >>> > > > > > > > > >> Or did you mean that handling a update query is
> just a
> >>> > > matter
> >>> > > > of
> >>> > > > > > > > > extending
> >>> > > > > > > > > >> the class and providing a concrete implementation?
> >>> > > > > > > > > >>
> >>> > > > > > > > > >> Thanks.
> >>> > > > > > > > > >> -Bhupesh
> >>> > > > > > > > > >>
> >>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> >>> > > > > > > > > chandni@datatorrent.com>
> >>> > > > > > > > > >> wrote:
> >>> > > > > > > > > >>
> >>> > > > > > > > > >> > Hi Bhupesh,
> >>> > > > > > > > > >> >
> >>> > > > > > > > > >> > The current abstract JDBC Output Operators  in
> >>> library
> >>> > are
> >>> > > > > > generic
> >>> > > > > > > > and
> >>> > > > > > > > > >> have
> >>> > > > > > > > > >> > already been used in multiple POCs and
> >>> applications. In
> >>> > > fact
> >>> > > > > > this
> >>> > > > > > > > > >> operator
> >>> > > > > > > > > >> > has matured through customer use cases. It is not
> >>> just
> >>> > an
> >>> > > > > insert
> >>> > > > > > > > > >> operator.
> >>> > > > > > > > > >> > We have used it to perform update and inserts.
> >>> > > > > > > > > >> >
> >>> > > > > > > > > >> > That said, I don't think it is a good idea to
> >>> introduce
> >>> > > > input
> >>> > > > > > > format
> >>> > > > > > > > > in
> >>> > > > > > > > > >> >  these abstract implementations. It is written to
> >>> handle
> >>> > > any
> >>> > > > > > type
> >>> > > > > > > of
> >>> > > > > > > > > >> query,
> >>> > > > > > > > > >> > be it a procedure call (that was an actual
> customer
> >>> use
> >>> > > > case).
> >>> > > > > > > > > >> >
> >>> > > > > > > > > >> > Chandni
> >>> > > > > > > > > >> >
> >>> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> >>> > > > > > > > > >> bhupesh@datatorrent.com>
> >>> > > > > > > > > >> > wrote:
> >>> > > > > > > > > >> >
> >>> > > > > > > > > >> > > Hi All,
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > > We are planning to proceed with the following
> >>> approach
> >>> > > for
> >>> > > > > > JDBC
> >>> > > > > > > > > >> *update*
> >>> > > > > > > > > >> > > operator:
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >    - *Update Query Configuration*
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >    - Example Update Query: *update tableName set
> >>> a =
> >>> > ?**
> >>> > > > > > where b
> >>> > > > > > > > = ?
> >>> > > > > > > > > >> and
> >>> > > > > > > > > >> > c
> >>> > > > > > > > > >> > >    > ?;*
> >>> > > > > > > > > >> > >    - Example JSON input array for parameter
> >>> > > > instantiations:
> >>> > > > > > > *[{a,
> >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
> >>> VARCHAR}, {c,
> >>> > > > > > > expression,
> >>> > > > > > > > > >> > DATE}]*
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > > We are also planning to change the JDBC Output
> >>> > Operator
> >>> > > in
> >>> > > > > > > Malhar
> >>> > > > > > > > > >> Library
> >>> > > > > > > > > >> > > which currently does just insert. We plan to
> make
> >>> the
> >>> > > > input
> >>> > > > > > > format
> >>> > > > > > > > > >> > > consistent for both insert and update and hence
> >>> the
> >>> > > change
> >>> > > > > to
> >>> > > > > > > the
> >>> > > > > > > > > >> current
> >>> > > > > > > > > >> > > way of configuration using JSON. Following would
> >>> be
> >>> > the
> >>> > > > > config
> >>> > > > > > > for
> >>> > > > > > > > > >> > inserts:
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >    - *Insert Query Configuration*
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >    - Example Insert Query: *insert into
> tableName
> >>> > values
> >>> > > > (?,
> >>> > > > > > ?,
> >>> > > > > > > > .. ,
> >>> > > > > > > > > >> ?);*
> >>> > > > > > > > > >> > >    - Example JSON input array for parameter
> >>> > > > instantiations:
> >>> > > > > > > *[{a,
> >>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
> >>> VARCHAR}, ..
> >>> > ,
> >>> > > > {c,
> >>> > > > > > > > > >> expression,
> >>> > > > > > > > > >> > > DATE}]*
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > > Please let us know your thoughts.
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > > Thanks.
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > > -Bhupesh
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> >>> > > > > > > > > >> bhupesh@datatorrent.com>
> >>> > > > > > > > > >> > > wrote:
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> > > > Hi All,
> >>> > > > > > > > > >> > > >
> >>> > > > > > > > > >> > > > Would it be a good idea to introduce the
> update
> >>> > > > > > functionality
> >>> > > > > > > to
> >>> > > > > > > > > the
> >>> > > > > > > > > >> > JDBC
> >>> > > > > > > > > >> > > > output operator in Apache Apex Malhar library.
> >>> > > > > > > > > >> > > >
> >>> > > > > > > > > >> > > > The following are possible approaches:
> >>> > > > > > > > > >> > > >
> >>> > > > > > > > > >> > > >    1. Accept a update query from the user with
> >>> place
> >>> > > > > holders
> >>> > > > > > > for
> >>> > > > > > > > > >> > values.
> >>> > > > > > > > > >> > > >    Example: *update tableName set a = ?, b = ?
> >>> > where c
> >>> > > > = ?
> >>> > > > > > and
> >>> > > > > > > > d >
> >>> > > > > > > > > >> ?*.
> >>> > > > > > > > > >> > > >    Here "?" will be provided by the user as
> java
> >>> > > > > expressions
> >>> > > > > > > > which
> >>> > > > > > > > > >> will
> >>> > > > > > > > > >> > > be
> >>> > > > > > > > > >> > > >    evaluated from the incoming tuple.
> >>> > > > > > > > > >> > > >    2. Another option is to accept in some
> >>> > > configuration
> >>> > > > > > format
> >>> > > > > > > > > >> (json /
> >>> > > > > > > > > >> > > >    xml) the following and formulate the query
> >>> in the
> >>> > > > > > operator.
> >>> > > > > > > > > This
> >>> > > > > > > > > >> can
> >>> > > > > > > > > >> > > become
> >>> > > > > > > > > >> > > >    arbitrarily complex.
> >>> > > > > > > > > >> > > >    1. update clause columns
> >>> > > > > > > > > >> > > >       2. update clause expressions
> >>> > > > > > > > > >> > > >       3. where clause columns
> >>> > > > > > > > > >> > > >       4. where clause expressions
> >>> > > > > > > > > >> > > >
> >>> > > > > > > > > >> > > > I am thinking about going ahead with 1. Please
> >>> let
> >>> > me
> >>> > > > know
> >>> > > > > > if
> >>> > > > > > > > any
> >>> > > > > > > > > >> other
> >>> > > > > > > > > >> > > > option is possible and whether such a
> >>> functionality
> >>> > > > > already
> >>> > > > > > > > exists
> >>> > > > > > > > > >> in
> >>> > > > > > > > > >> > > some
> >>> > > > > > > > > >> > > > other class.
> >>> > > > > > > > > >> > > >
> >>> > > > > > > > > >> > > > Thanks.
> >>> > > > > > > > > >> > > >
> >>> > > > > > > > > >> > > > -Bhupesh
> >>> > > > > > > > > >> > > >
> >>> > > > > > > > > >> > >
> >>> > > > > > > > > >> >
> >>> > > > > > > > > >>
> >>> > > > > > > > > >
> >>> > > > > > > > > >
> >>> > > > > > > > >
> >>> > > > > > > >
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi All,

Apart from extending the JDBC Output operator (for update, merge and
delete), I am planning the following change to the "*insert*" functionality:


   - It was pointed out that - if the incoming POJO has fields which are to
   be inserted into a database table, the user should not have to specify any
   kind of mapping between POJO columns and database table column names. This
   is possible if the POJO field names are same as the db column names.


   - Here is what I am planning to do to address this:
      - Have a flag which indicates "*No mapping required. Assume same POJO
      field names as the DB column names.*"
      - If this flag is true, extract DB column names (in addition to the
      types, which is done currently) and create getters from POJO based on the
      DB column types.
      - Use these getters to populate the parametrized "*insert*" query.


Note - Once this is done, the user would not have to specify any field
descriptions. Just the DB table name would be sufficient. This would only
be possible if the DB column names are the same as the field names in the
POJO.
Without this change, "*insert*" will be considered as just another DDL
query like "update" / "delete" / "merge". We accept a parametrized query
and a mapping from the user for POJO field names to DB column names + DB
column types.

Thanks.
Bhupesh


On Wed, Dec 30, 2015 at 12:56 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Also, since we are not creating separate operators for insert and
> update/merge, it seems we don't need an abstract class. We can directly
> modify JdbcPOJOOutputOperator.
>
> -Bhupesh
>
> On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
>> Also, a note regarding accepting fieldInfo objects:
>>
>> FieldInfo has fields for java type of the column, but not the SQL data
>> type which is needed to set the parameters in the SQL statement. This still
>> needs to be derived from the database (as in insert).
>> In more complex scenarios, as mentioned earlier, this may not always be
>> possible. In this case we have the following options:
>>
>>    1. Accept the SQL data types from the user for the parameters ("?")
>>    in a new data structure.
>>    2. Accept the JSON structure as specified earlier.
>>    3. Modify FieldInfo to have an additional optional field which
>>    indicates the SQL data type of the parameter ("?")
>>
>> I am planning to go ahead with  option 3.
>>
>> Comments?
>>
>> -Bhupesh
>>
>>
>>
>> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <ch...@datatorrent.com>
>> wrote:
>>
>>> Yeah that sounds good.
>>>
>>> Chandni
>>>
>>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <bhupesh@datatorrent.com
>>> >
>>> wrote:
>>>
>>> > Thanks Chandni for the comments.
>>> >
>>> > Additionally I think, we should do away with constructing the SQL
>>> query in
>>> > the operator. This is because it is only possible in case of simple
>>> insert
>>> > statements. In case a where clause is needed in the insert statement,
>>> we
>>> > cannot construct the SQL easily. If we are accepting a parametrized
>>> query
>>> > from the user for update/merge, why not do the same for insert
>>> statements
>>> > as well? Then the hierarchy would look like:
>>> > - AbstractJdbcPOJOOutputOperator (As suggested)
>>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
>>> > merge, delete)
>>> >
>>> > We don't need a separate operator for inserts and update/merge.
>>> >
>>> > Thanks.
>>> > -Bhupesh
>>> >
>>> >
>>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <
>>> chandni@datatorrent.com>
>>> > wrote:
>>> >
>>> > >   1. FieldInfo, is ultimately a custom object and any user who uses
>>> this
>>> > >    operator has to construct an object, populate it and then use it.
>>> We
>>> > are
>>> > >    trying to avoid using any custom object and allow any user to use
>>> the
>>> > >    operator without writing any extra code; just configuration.
>>> > > FieldInfo is a way to provide configuration in a UI friendly way.
>>> > Providing
>>> > > configuration as JSON is not UI friendly.
>>> > >
>>> > >    2. In case of update / merge, we need what SQL data types that the
>>> > >    expression provided by the user would evaluate to. In case of
>>> insert
>>> > we
>>> > > can
>>> > >    go and fetch the data types of the columns directly from DB.
>>> However,
>>> > > the
>>> > >    same is not possible for custom expressions; "avg(salary)" for
>>> > instance.
>>> > > Ok so here is where you can make a change.
>>> > > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
>>> > > - Abstraction is to fetch the type of column.
>>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types of
>>> columns
>>> > > directly from DB. Please note that FieldInfo can also provide type of
>>> > > derived column.
>>> > >
>>> > >
>>> > > You mentioned "We are trying to avoid using any custom object and
>>> allow
>>> > any
>>> > > user to use the
>>> > >    operator without writing any extra code".
>>> > > This I think is specific to your use case. You can create an
>>> extension of
>>> > > the above which takes JSON blob and creates FieldInfos from it.
>>> > >
>>> > > Chandni
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
>>> bhupesh@datatorrent.com
>>> > >
>>> > > wrote:
>>> > >
>>> > > > Hi Chandni,
>>> > > >
>>> > > > Following are the issues:
>>> > > >
>>> > > >    1. FieldInfo, is ultimately a custom object and any user who
>>> uses
>>> > this
>>> > > >    operator has to construct an object, populate it and then use
>>> it. We
>>> > > are
>>> > > >    trying to avoid using any custom object and allow any user to
>>> use
>>> > the
>>> > > >    operator without writing any extra code; just configuration.
>>> > > >    2. In case of update / merge, we need what SQL data types that
>>> the
>>> > > >    expression provided by the user would evaluate to. In case of
>>> insert
>>> > > we
>>> > > > can
>>> > > >    go and fetch the data types of the columns directly from DB.
>>> > However,
>>> > > > the
>>> > > >    same is not possible for custom expressions; "avg(salary)" for
>>> > > instance.
>>> > > >
>>> > > > Thanks.
>>> > > >
>>> > > > -Bhupesh
>>> > > >
>>> > > >
>>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
>>> > chandni@datatorrent.com>
>>> > > > wrote:
>>> > > >
>>> > > > > Hi Bhupesh,
>>> > > > >
>>> > > > > JDBCPojoOutputOperator was written for a demo and therefore it
>>> was
>>> > > marked
>>> > > > > Evolving which is why I had mentioned that you should feel free
>>> to
>>> > > modify
>>> > > > > it.
>>> > > > >
>>> > > > > I think an insert query can be as complex as any other query. It
>>> uses
>>> > > > > FieldInfo because in the app builder it is easy for the user to
>>> > provide
>>> > > > > that instead of JSON String.
>>> > > > >
>>> > > > > Can you please provide specifics about what it is that you find
>>> > > difficult
>>> > > > > to change/implement for providing update/merge/delete support in
>>> > > > > JDBCPojoOutputOperator?
>>> > > > >
>>> > > > > Chandni
>>> > > > >
>>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
>>> > > bhupesh@datatorrent.com
>>> > > > >
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Hi All,
>>> > > > > >
>>> > > > > > It has been pointed out that adding another class for handling
>>> > > update /
>>> > > > > > merge queries would not be a good option.
>>> > > > > >
>>> > > > > > Here are the current implementation details:
>>> > > > > >
>>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator which
>>> > accepts
>>> > > a
>>> > > > > list
>>> > > > > >    of FieldInfo objects. Each element of this list indicates
>>> the
>>> > > > details
>>> > > > > >    (column name, pojo field expression and datatype) of fields
>>> that
>>> > > > need
>>> > > > > > to be
>>> > > > > >    inserted. Using this, the operator formulates the insert
>>> query
>>> > in
>>> > > > the
>>> > > > > > setup
>>> > > > > >    method and identifies the sql datatypes of these columns
>>> from
>>> > the
>>> > > > > > database
>>> > > > > >    using the table name.
>>> > > > > >
>>> > > > > >
>>> > > > > >    - Now, coming to the update / merge feature, it is
>>> difficult to
>>> > > > > >    formulate the update / merge query in the operator logic
>>> due to
>>> > > the
>>> > > > > > complex
>>> > > > > >    structure of these statements. For this reason, we plan to
>>> take
>>> > a
>>> > > > > >    parametrized SQL query from the user. This may look like:
>>> > *"update
>>> > > > > table
>>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
>>> statements
>>> > > can
>>> > > > be
>>> > > > > >    accepted from the user in addition to a json string which
>>> > > indicates
>>> > > > > the
>>> > > > > >    details for the parameters: *column name, the pojo
>>> expression
>>> > and
>>> > > > the
>>> > > > > >    sql data type* of the expression. Note that this
>>> information is
>>> > > > > similar
>>> > > > > >    to the FieldInfo object, but is a string which can be
>>> configured
>>> > > > > easily
>>> > > > > > by
>>> > > > > >    the user. Also note that the data type which is accepted is
>>> the
>>> > > SQL
>>> > > > > data
>>> > > > > >    type. Using this info we can populate the parametrized
>>> query and
>>> > > run
>>> > > > > it
>>> > > > > >    based on the incoming POJO.
>>> > > > > >
>>> > > > > > The second approach is able to handle all kinds of queries
>>> (insert
>>> > /
>>> > > > > update
>>> > > > > > / merge / delete). However, since we already have the
>>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
>>> > functionality
>>> > > > into
>>> > > > > > the same class.
>>> > > > > >
>>> > > > > > Here we have the following options:
>>> > > > > >
>>> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator) to the
>>> > > second
>>> > > > > >    approach which is more generic and also handles inserts.
>>> > > > > >    2. Add the update/ merge functionality to the existing class
>>> > > without
>>> > > > > >    changing the existing functionality. This will have two
>>> > different
>>> > > > ways
>>> > > > > > that
>>> > > > > >    insert queries may be handled in the operator.
>>> > > > > >    3. Add another class which extends from
>>> JdbcPOJOOutputOperator
>>> > and
>>> > > > > have
>>> > > > > >    the update/merge functionality there. (This is not
>>> recommended.)
>>> > > > > >    4. Any other approach.
>>> > > > > >
>>> > > > > > Please suggest.
>>> > > > > >
>>> > > > > > Thanks.
>>> > > > > >
>>> > > > > > -Bhupesh.
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
>>> > > > chandni@datatorrent.com
>>> > > > > >
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > No I don't think we are restricting Malhar to just abstract
>>> > > classes.
>>> > > > > > > Whenever they are couple of use cases that we see quite
>>> often, we
>>> > > add
>>> > > > > > > concrete implementations.
>>> > > > > > >
>>> > > > > > > For eg. FileLineInputOperator which is a concrete
>>> implementation
>>> > of
>>> > > > > > > AbstractFileInputOperator. FSSliceReader is an example as
>>> well.
>>> > > > > > >
>>> > > > > > > In AbstractJdbcOutputOperator case there hasn't been such
>>> common
>>> > > > > > > insert/update query.
>>> > > > > > >
>>> > > > > > > Also if you look at the example I provided, it is very
>>> simple to
>>> > > > > provide
>>> > > > > > a
>>> > > > > > > concrete implementation.
>>> > > > > > >
>>> > > > > > > If you would like to change JdbcPOJOOutputOperator to work
>>> for
>>> > > > > > > "UPDATE/MERGE" then please go ahead.
>>> > > > > > >
>>> > > > > > > Chandni
>>> > > > > > >
>>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
>>> > > > > bhupesh@datatorrent.com
>>> > > > > > >
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > I see. So, just to understand more, do we plan to keep
>>> Malhar
>>> > > > > > restricted
>>> > > > > > > to
>>> > > > > > > > the base functionality (as in abstract classes)? And put
>>> the
>>> > > > > > > configuration
>>> > > > > > > > aspect / concrete implementations in apps that use these
>>> > > operators?
>>> > > > > > > >
>>> > > > > > > > Thanks.
>>> > > > > > > > Bhupesh
>>> > > > > > > >
>>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
>>> > > > > > chandni@datatorrent.com>
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > Hi,
>>> > > > > > > > >
>>> > > > > > > > > Here is an example of doing Upsert with JDBC:
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
>>> > > > > > > > >
>>> > > > > > > > > Thanks,
>>> > > > > > > > > Chandni
>>> > > > > > > > >
>>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
>>> > > > > > > chandni@datatorrent.com
>>> > > > > > > > >
>>> > > > > > > > > wrote:
>>> > > > > > > > >
>>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
>>> > > > > > > > > >
>>> > > > > > > > > > Here is one of them:
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
>>> > > > > > > > > >
>>> > > > > > > > > > They work with any kind PreparedStatement - insert or
>>> > update
>>> > > > > > > > > >
>>> > > > > > > > > > Chandni
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
>>> > > > > > > > > bhupesh@datatorrent.com>
>>> > > > > > > > > > wrote:
>>> > > > > > > > > >
>>> > > > > > > > > >> Hi Chandni,
>>> > > > > > > > > >>
>>> > > > > > > > > >> I don't see an update query being handled in the
>>> operator.
>>> > > > Could
>>> > > > > > you
>>> > > > > > > > > >> please
>>> > > > > > > > > >> point me to the appropriate class?
>>> > > > > > > > > >> Or did you mean that handling a update query is just a
>>> > > matter
>>> > > > of
>>> > > > > > > > > extending
>>> > > > > > > > > >> the class and providing a concrete implementation?
>>> > > > > > > > > >>
>>> > > > > > > > > >> Thanks.
>>> > > > > > > > > >> -Bhupesh
>>> > > > > > > > > >>
>>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
>>> > > > > > > > > chandni@datatorrent.com>
>>> > > > > > > > > >> wrote:
>>> > > > > > > > > >>
>>> > > > > > > > > >> > Hi Bhupesh,
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > The current abstract JDBC Output Operators  in
>>> library
>>> > are
>>> > > > > > generic
>>> > > > > > > > and
>>> > > > > > > > > >> have
>>> > > > > > > > > >> > already been used in multiple POCs and
>>> applications. In
>>> > > fact
>>> > > > > > this
>>> > > > > > > > > >> operator
>>> > > > > > > > > >> > has matured through customer use cases. It is not
>>> just
>>> > an
>>> > > > > insert
>>> > > > > > > > > >> operator.
>>> > > > > > > > > >> > We have used it to perform update and inserts.
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > That said, I don't think it is a good idea to
>>> introduce
>>> > > > input
>>> > > > > > > format
>>> > > > > > > > > in
>>> > > > > > > > > >> >  these abstract implementations. It is written to
>>> handle
>>> > > any
>>> > > > > > type
>>> > > > > > > of
>>> > > > > > > > > >> query,
>>> > > > > > > > > >> > be it a procedure call (that was an actual customer
>>> use
>>> > > > case).
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > Chandni
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
>>> > > > > > > > > >> bhupesh@datatorrent.com>
>>> > > > > > > > > >> > wrote:
>>> > > > > > > > > >> >
>>> > > > > > > > > >> > > Hi All,
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > We are planning to proceed with the following
>>> approach
>>> > > for
>>> > > > > > JDBC
>>> > > > > > > > > >> *update*
>>> > > > > > > > > >> > > operator:
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - *Update Query Configuration*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - Example Update Query: *update tableName set
>>> a =
>>> > ?**
>>> > > > > > where b
>>> > > > > > > > = ?
>>> > > > > > > > > >> and
>>> > > > > > > > > >> > c
>>> > > > > > > > > >> > >    > ?;*
>>> > > > > > > > > >> > >    - Example JSON input array for parameter
>>> > > > instantiations:
>>> > > > > > > *[{a,
>>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>>> VARCHAR}, {c,
>>> > > > > > > expression,
>>> > > > > > > > > >> > DATE}]*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > We are also planning to change the JDBC Output
>>> > Operator
>>> > > in
>>> > > > > > > Malhar
>>> > > > > > > > > >> Library
>>> > > > > > > > > >> > > which currently does just insert. We plan to make
>>> the
>>> > > > input
>>> > > > > > > format
>>> > > > > > > > > >> > > consistent for both insert and update and hence
>>> the
>>> > > change
>>> > > > > to
>>> > > > > > > the
>>> > > > > > > > > >> current
>>> > > > > > > > > >> > > way of configuration using JSON. Following would
>>> be
>>> > the
>>> > > > > config
>>> > > > > > > for
>>> > > > > > > > > >> > inserts:
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - *Insert Query Configuration*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >    - Example Insert Query: *insert into tableName
>>> > values
>>> > > > (?,
>>> > > > > > ?,
>>> > > > > > > > .. ,
>>> > > > > > > > > >> ?);*
>>> > > > > > > > > >> > >    - Example JSON input array for parameter
>>> > > > instantiations:
>>> > > > > > > *[{a,
>>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression,
>>> VARCHAR}, ..
>>> > ,
>>> > > > {c,
>>> > > > > > > > > >> expression,
>>> > > > > > > > > >> > > DATE}]*
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > Please let us know your thoughts.
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > Thanks.
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > -Bhupesh
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
>>> > > > > > > > > >> bhupesh@datatorrent.com>
>>> > > > > > > > > >> > > wrote:
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> > > > Hi All,
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > Would it be a good idea to introduce the update
>>> > > > > > functionality
>>> > > > > > > to
>>> > > > > > > > > the
>>> > > > > > > > > >> > JDBC
>>> > > > > > > > > >> > > > output operator in Apache Apex Malhar library.
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > The following are possible approaches:
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > >    1. Accept a update query from the user with
>>> place
>>> > > > > holders
>>> > > > > > > for
>>> > > > > > > > > >> > values.
>>> > > > > > > > > >> > > >    Example: *update tableName set a = ?, b = ?
>>> > where c
>>> > > > = ?
>>> > > > > > and
>>> > > > > > > > d >
>>> > > > > > > > > >> ?*.
>>> > > > > > > > > >> > > >    Here "?" will be provided by the user as java
>>> > > > > expressions
>>> > > > > > > > which
>>> > > > > > > > > >> will
>>> > > > > > > > > >> > > be
>>> > > > > > > > > >> > > >    evaluated from the incoming tuple.
>>> > > > > > > > > >> > > >    2. Another option is to accept in some
>>> > > configuration
>>> > > > > > format
>>> > > > > > > > > >> (json /
>>> > > > > > > > > >> > > >    xml) the following and formulate the query
>>> in the
>>> > > > > > operator.
>>> > > > > > > > > This
>>> > > > > > > > > >> can
>>> > > > > > > > > >> > > become
>>> > > > > > > > > >> > > >    arbitrarily complex.
>>> > > > > > > > > >> > > >    1. update clause columns
>>> > > > > > > > > >> > > >       2. update clause expressions
>>> > > > > > > > > >> > > >       3. where clause columns
>>> > > > > > > > > >> > > >       4. where clause expressions
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > I am thinking about going ahead with 1. Please
>>> let
>>> > me
>>> > > > know
>>> > > > > > if
>>> > > > > > > > any
>>> > > > > > > > > >> other
>>> > > > > > > > > >> > > > option is possible and whether such a
>>> functionality
>>> > > > > already
>>> > > > > > > > exists
>>> > > > > > > > > >> in
>>> > > > > > > > > >> > > some
>>> > > > > > > > > >> > > > other class.
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > Thanks.
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > > > -Bhupesh
>>> > > > > > > > > >> > > >
>>> > > > > > > > > >> > >
>>> > > > > > > > > >> >
>>> > > > > > > > > >>
>>> > > > > > > > > >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Also, since we are not creating separate operators for insert and
update/merge, it seems we don't need an abstract class. We can directly
modify JdbcPOJOOutputOperator.

-Bhupesh

On Wed, Dec 30, 2015 at 11:17 AM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Also, a note regarding accepting fieldInfo objects:
>
> FieldInfo has fields for java type of the column, but not the SQL data
> type which is needed to set the parameters in the SQL statement. This still
> needs to be derived from the database (as in insert).
> In more complex scenarios, as mentioned earlier, this may not always be
> possible. In this case we have the following options:
>
>    1. Accept the SQL data types from the user for the parameters ("?") in
>    a new data structure.
>    2. Accept the JSON structure as specified earlier.
>    3. Modify FieldInfo to have an additional optional field which
>    indicates the SQL data type of the parameter ("?")
>
> I am planning to go ahead with  option 3.
>
> Comments?
>
> -Bhupesh
>
>
>
> On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <ch...@datatorrent.com>
> wrote:
>
>> Yeah that sounds good.
>>
>> Chandni
>>
>> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <bh...@datatorrent.com>
>> wrote:
>>
>> > Thanks Chandni for the comments.
>> >
>> > Additionally I think, we should do away with constructing the SQL query
>> in
>> > the operator. This is because it is only possible in case of simple
>> insert
>> > statements. In case a where clause is needed in the insert statement, we
>> > cannot construct the SQL easily. If we are accepting a parametrized
>> query
>> > from the user for update/merge, why not do the same for insert
>> statements
>> > as well? Then the hierarchy would look like:
>> > - AbstractJdbcPOJOOutputOperator (As suggested)
>> > -- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
>> > merge, delete)
>> >
>> > We don't need a separate operator for inserts and update/merge.
>> >
>> > Thanks.
>> > -Bhupesh
>> >
>> >
>> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <chandni@datatorrent.com
>> >
>> > wrote:
>> >
>> > >   1. FieldInfo, is ultimately a custom object and any user who uses
>> this
>> > >    operator has to construct an object, populate it and then use it.
>> We
>> > are
>> > >    trying to avoid using any custom object and allow any user to use
>> the
>> > >    operator without writing any extra code; just configuration.
>> > > FieldInfo is a way to provide configuration in a UI friendly way.
>> > Providing
>> > > configuration as JSON is not UI friendly.
>> > >
>> > >    2. In case of update / merge, we need what SQL data types that the
>> > >    expression provided by the user would evaluate to. In case of
>> insert
>> > we
>> > > can
>> > >    go and fetch the data types of the columns directly from DB.
>> However,
>> > > the
>> > >    same is not possible for custom expressions; "avg(salary)" for
>> > instance.
>> > > Ok so here is where you can make a change.
>> > > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
>> > > - Abstraction is to fetch the type of column.
>> > > - Add a concrete JDBCPojoInsertOutput  that derives the types of
>> columns
>> > > directly from DB. Please note that FieldInfo can also provide type of
>> > > derived column.
>> > >
>> > >
>> > > You mentioned "We are trying to avoid using any custom object and
>> allow
>> > any
>> > > user to use the
>> > >    operator without writing any extra code".
>> > > This I think is specific to your use case. You can create an
>> extension of
>> > > the above which takes JSON blob and creates FieldInfos from it.
>> > >
>> > > Chandni
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
>> bhupesh@datatorrent.com
>> > >
>> > > wrote:
>> > >
>> > > > Hi Chandni,
>> > > >
>> > > > Following are the issues:
>> > > >
>> > > >    1. FieldInfo, is ultimately a custom object and any user who uses
>> > this
>> > > >    operator has to construct an object, populate it and then use
>> it. We
>> > > are
>> > > >    trying to avoid using any custom object and allow any user to use
>> > the
>> > > >    operator without writing any extra code; just configuration.
>> > > >    2. In case of update / merge, we need what SQL data types that
>> the
>> > > >    expression provided by the user would evaluate to. In case of
>> insert
>> > > we
>> > > > can
>> > > >    go and fetch the data types of the columns directly from DB.
>> > However,
>> > > > the
>> > > >    same is not possible for custom expressions; "avg(salary)" for
>> > > instance.
>> > > >
>> > > > Thanks.
>> > > >
>> > > > -Bhupesh
>> > > >
>> > > >
>> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
>> > chandni@datatorrent.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Bhupesh,
>> > > > >
>> > > > > JDBCPojoOutputOperator was written for a demo and therefore it was
>> > > marked
>> > > > > Evolving which is why I had mentioned that you should feel free to
>> > > modify
>> > > > > it.
>> > > > >
>> > > > > I think an insert query can be as complex as any other query. It
>> uses
>> > > > > FieldInfo because in the app builder it is easy for the user to
>> > provide
>> > > > > that instead of JSON String.
>> > > > >
>> > > > > Can you please provide specifics about what it is that you find
>> > > difficult
>> > > > > to change/implement for providing update/merge/delete support in
>> > > > > JDBCPojoOutputOperator?
>> > > > >
>> > > > > Chandni
>> > > > >
>> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
>> > > bhupesh@datatorrent.com
>> > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > It has been pointed out that adding another class for handling
>> > > update /
>> > > > > > merge queries would not be a good option.
>> > > > > >
>> > > > > > Here are the current implementation details:
>> > > > > >
>> > > > > >    - We have an existing class: JdbcPOJOOutputOperator which
>> > accepts
>> > > a
>> > > > > list
>> > > > > >    of FieldInfo objects. Each element of this list indicates the
>> > > > details
>> > > > > >    (column name, pojo field expression and datatype) of fields
>> that
>> > > > need
>> > > > > > to be
>> > > > > >    inserted. Using this, the operator formulates the insert
>> query
>> > in
>> > > > the
>> > > > > > setup
>> > > > > >    method and identifies the sql datatypes of these columns from
>> > the
>> > > > > > database
>> > > > > >    using the table name.
>> > > > > >
>> > > > > >
>> > > > > >    - Now, coming to the update / merge feature, it is difficult
>> to
>> > > > > >    formulate the update / merge query in the operator logic due
>> to
>> > > the
>> > > > > > complex
>> > > > > >    structure of these statements. For this reason, we plan to
>> take
>> > a
>> > > > > >    parametrized SQL query from the user. This may look like:
>> > *"update
>> > > > > table
>> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
>> statements
>> > > can
>> > > > be
>> > > > > >    accepted from the user in addition to a json string which
>> > > indicates
>> > > > > the
>> > > > > >    details for the parameters: *column name, the pojo expression
>> > and
>> > > > the
>> > > > > >    sql data type* of the expression. Note that this information
>> is
>> > > > > similar
>> > > > > >    to the FieldInfo object, but is a string which can be
>> configured
>> > > > > easily
>> > > > > > by
>> > > > > >    the user. Also note that the data type which is accepted is
>> the
>> > > SQL
>> > > > > data
>> > > > > >    type. Using this info we can populate the parametrized query
>> and
>> > > run
>> > > > > it
>> > > > > >    based on the incoming POJO.
>> > > > > >
>> > > > > > The second approach is able to handle all kinds of queries
>> (insert
>> > /
>> > > > > update
>> > > > > > / merge / delete). However, since we already have the
>> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
>> > functionality
>> > > > into
>> > > > > > the same class.
>> > > > > >
>> > > > > > Here we have the following options:
>> > > > > >
>> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator) to the
>> > > second
>> > > > > >    approach which is more generic and also handles inserts.
>> > > > > >    2. Add the update/ merge functionality to the existing class
>> > > without
>> > > > > >    changing the existing functionality. This will have two
>> > different
>> > > > ways
>> > > > > > that
>> > > > > >    insert queries may be handled in the operator.
>> > > > > >    3. Add another class which extends from
>> JdbcPOJOOutputOperator
>> > and
>> > > > > have
>> > > > > >    the update/merge functionality there. (This is not
>> recommended.)
>> > > > > >    4. Any other approach.
>> > > > > >
>> > > > > > Please suggest.
>> > > > > >
>> > > > > > Thanks.
>> > > > > >
>> > > > > > -Bhupesh.
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
>> > > > chandni@datatorrent.com
>> > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > No I don't think we are restricting Malhar to just abstract
>> > > classes.
>> > > > > > > Whenever they are couple of use cases that we see quite
>> often, we
>> > > add
>> > > > > > > concrete implementations.
>> > > > > > >
>> > > > > > > For eg. FileLineInputOperator which is a concrete
>> implementation
>> > of
>> > > > > > > AbstractFileInputOperator. FSSliceReader is an example as
>> well.
>> > > > > > >
>> > > > > > > In AbstractJdbcOutputOperator case there hasn't been such
>> common
>> > > > > > > insert/update query.
>> > > > > > >
>> > > > > > > Also if you look at the example I provided, it is very simple
>> to
>> > > > > provide
>> > > > > > a
>> > > > > > > concrete implementation.
>> > > > > > >
>> > > > > > > If you would like to change JdbcPOJOOutputOperator to work for
>> > > > > > > "UPDATE/MERGE" then please go ahead.
>> > > > > > >
>> > > > > > > Chandni
>> > > > > > >
>> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
>> > > > > bhupesh@datatorrent.com
>> > > > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I see. So, just to understand more, do we plan to keep
>> Malhar
>> > > > > > restricted
>> > > > > > > to
>> > > > > > > > the base functionality (as in abstract classes)? And put the
>> > > > > > > configuration
>> > > > > > > > aspect / concrete implementations in apps that use these
>> > > operators?
>> > > > > > > >
>> > > > > > > > Thanks.
>> > > > > > > > Bhupesh
>> > > > > > > >
>> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
>> > > > > > chandni@datatorrent.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi,
>> > > > > > > > >
>> > > > > > > > > Here is an example of doing Upsert with JDBC:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > Chandni
>> > > > > > > > >
>> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
>> > > > > > > chandni@datatorrent.com
>> > > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
>> > > > > > > > > >
>> > > > > > > > > > Here is one of them:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
>> > > > > > > > > >
>> > > > > > > > > > They work with any kind PreparedStatement - insert or
>> > update
>> > > > > > > > > >
>> > > > > > > > > > Chandni
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
>> > > > > > > > > bhupesh@datatorrent.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > >> Hi Chandni,
>> > > > > > > > > >>
>> > > > > > > > > >> I don't see an update query being handled in the
>> operator.
>> > > > Could
>> > > > > > you
>> > > > > > > > > >> please
>> > > > > > > > > >> point me to the appropriate class?
>> > > > > > > > > >> Or did you mean that handling a update query is just a
>> > > matter
>> > > > of
>> > > > > > > > > extending
>> > > > > > > > > >> the class and providing a concrete implementation?
>> > > > > > > > > >>
>> > > > > > > > > >> Thanks.
>> > > > > > > > > >> -Bhupesh
>> > > > > > > > > >>
>> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
>> > > > > > > > > chandni@datatorrent.com>
>> > > > > > > > > >> wrote:
>> > > > > > > > > >>
>> > > > > > > > > >> > Hi Bhupesh,
>> > > > > > > > > >> >
>> > > > > > > > > >> > The current abstract JDBC Output Operators  in
>> library
>> > are
>> > > > > > generic
>> > > > > > > > and
>> > > > > > > > > >> have
>> > > > > > > > > >> > already been used in multiple POCs and applications.
>> In
>> > > fact
>> > > > > > this
>> > > > > > > > > >> operator
>> > > > > > > > > >> > has matured through customer use cases. It is not
>> just
>> > an
>> > > > > insert
>> > > > > > > > > >> operator.
>> > > > > > > > > >> > We have used it to perform update and inserts.
>> > > > > > > > > >> >
>> > > > > > > > > >> > That said, I don't think it is a good idea to
>> introduce
>> > > > input
>> > > > > > > format
>> > > > > > > > > in
>> > > > > > > > > >> >  these abstract implementations. It is written to
>> handle
>> > > any
>> > > > > > type
>> > > > > > > of
>> > > > > > > > > >> query,
>> > > > > > > > > >> > be it a procedure call (that was an actual customer
>> use
>> > > > case).
>> > > > > > > > > >> >
>> > > > > > > > > >> > Chandni
>> > > > > > > > > >> >
>> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
>> > > > > > > > > >> bhupesh@datatorrent.com>
>> > > > > > > > > >> > wrote:
>> > > > > > > > > >> >
>> > > > > > > > > >> > > Hi All,
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > We are planning to proceed with the following
>> approach
>> > > for
>> > > > > > JDBC
>> > > > > > > > > >> *update*
>> > > > > > > > > >> > > operator:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >    - *Update Query Configuration*
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >    - Example Update Query: *update tableName set a
>> =
>> > ?**
>> > > > > > where b
>> > > > > > > > = ?
>> > > > > > > > > >> and
>> > > > > > > > > >> > c
>> > > > > > > > > >> > >    > ?;*
>> > > > > > > > > >> > >    - Example JSON input array for parameter
>> > > > instantiations:
>> > > > > > > *[{a,
>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR},
>> {c,
>> > > > > > > expression,
>> > > > > > > > > >> > DATE}]*
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > We are also planning to change the JDBC Output
>> > Operator
>> > > in
>> > > > > > > Malhar
>> > > > > > > > > >> Library
>> > > > > > > > > >> > > which currently does just insert. We plan to make
>> the
>> > > > input
>> > > > > > > format
>> > > > > > > > > >> > > consistent for both insert and update and hence the
>> > > change
>> > > > > to
>> > > > > > > the
>> > > > > > > > > >> current
>> > > > > > > > > >> > > way of configuration using JSON. Following would be
>> > the
>> > > > > config
>> > > > > > > for
>> > > > > > > > > >> > inserts:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >    - *Insert Query Configuration*
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >    - Example Insert Query: *insert into tableName
>> > values
>> > > > (?,
>> > > > > > ?,
>> > > > > > > > .. ,
>> > > > > > > > > >> ?);*
>> > > > > > > > > >> > >    - Example JSON input array for parameter
>> > > > instantiations:
>> > > > > > > *[{a,
>> > > > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR},
>> ..
>> > ,
>> > > > {c,
>> > > > > > > > > >> expression,
>> > > > > > > > > >> > > DATE}]*
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Please let us know your thoughts.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Thanks.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > -Bhupesh
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
>> > > > > > > > > >> bhupesh@datatorrent.com>
>> > > > > > > > > >> > > wrote:
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > > Hi All,
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Would it be a good idea to introduce the update
>> > > > > > functionality
>> > > > > > > to
>> > > > > > > > > the
>> > > > > > > > > >> > JDBC
>> > > > > > > > > >> > > > output operator in Apache Apex Malhar library.
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > The following are possible approaches:
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > >    1. Accept a update query from the user with
>> place
>> > > > > holders
>> > > > > > > for
>> > > > > > > > > >> > values.
>> > > > > > > > > >> > > >    Example: *update tableName set a = ?, b = ?
>> > where c
>> > > > = ?
>> > > > > > and
>> > > > > > > > d >
>> > > > > > > > > >> ?*.
>> > > > > > > > > >> > > >    Here "?" will be provided by the user as java
>> > > > > expressions
>> > > > > > > > which
>> > > > > > > > > >> will
>> > > > > > > > > >> > > be
>> > > > > > > > > >> > > >    evaluated from the incoming tuple.
>> > > > > > > > > >> > > >    2. Another option is to accept in some
>> > > configuration
>> > > > > > format
>> > > > > > > > > >> (json /
>> > > > > > > > > >> > > >    xml) the following and formulate the query in
>> the
>> > > > > > operator.
>> > > > > > > > > This
>> > > > > > > > > >> can
>> > > > > > > > > >> > > become
>> > > > > > > > > >> > > >    arbitrarily complex.
>> > > > > > > > > >> > > >    1. update clause columns
>> > > > > > > > > >> > > >       2. update clause expressions
>> > > > > > > > > >> > > >       3. where clause columns
>> > > > > > > > > >> > > >       4. where clause expressions
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > I am thinking about going ahead with 1. Please
>> let
>> > me
>> > > > know
>> > > > > > if
>> > > > > > > > any
>> > > > > > > > > >> other
>> > > > > > > > > >> > > > option is possible and whether such a
>> functionality
>> > > > > already
>> > > > > > > > exists
>> > > > > > > > > >> in
>> > > > > > > > > >> > > some
>> > > > > > > > > >> > > > other class.
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > Thanks.
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > > > -Bhupesh
>> > > > > > > > > >> > > >
>> > > > > > > > > >> > >
>> > > > > > > > > >> >
>> > > > > > > > > >>
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Also, a note regarding accepting fieldInfo objects:

FieldInfo has fields for java type of the column, but not the SQL data type
which is needed to set the parameters in the SQL statement. This still
needs to be derived from the database (as in insert).
In more complex scenarios, as mentioned earlier, this may not always be
possible. In this case we have the following options:

   1. Accept the SQL data types from the user for the parameters ("?") in a
   new data structure.
   2. Accept the JSON structure as specified earlier.
   3. Modify FieldInfo to have an additional optional field which indicates
   the SQL data type of the parameter ("?")

I am planning to go ahead with  option 3.

Comments?

-Bhupesh



On Mon, Dec 28, 2015 at 11:26 PM, Chandni Singh <ch...@datatorrent.com>
wrote:

> Yeah that sounds good.
>
> Chandni
>
> On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > Thanks Chandni for the comments.
> >
> > Additionally I think, we should do away with constructing the SQL query
> in
> > the operator. This is because it is only possible in case of simple
> insert
> > statements. In case a where clause is needed in the insert statement, we
> > cannot construct the SQL easily. If we are accepting a parametrized query
> > from the user for update/merge, why not do the same for insert statements
> > as well? Then the hierarchy would look like:
> > - AbstractJdbcPOJOOutputOperator (As suggested)
> > -- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
> > merge, delete)
> >
> > We don't need a separate operator for inserts and update/merge.
> >
> > Thanks.
> > -Bhupesh
> >
> >
> > On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <ch...@datatorrent.com>
> > wrote:
> >
> > >   1. FieldInfo, is ultimately a custom object and any user who uses
> this
> > >    operator has to construct an object, populate it and then use it. We
> > are
> > >    trying to avoid using any custom object and allow any user to use
> the
> > >    operator without writing any extra code; just configuration.
> > > FieldInfo is a way to provide configuration in a UI friendly way.
> > Providing
> > > configuration as JSON is not UI friendly.
> > >
> > >    2. In case of update / merge, we need what SQL data types that the
> > >    expression provided by the user would evaluate to. In case of insert
> > we
> > > can
> > >    go and fetch the data types of the columns directly from DB.
> However,
> > > the
> > >    same is not possible for custom expressions; "avg(salary)" for
> > instance.
> > > Ok so here is where you can make a change.
> > > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
> > > - Abstraction is to fetch the type of column.
> > > - Add a concrete JDBCPojoInsertOutput  that derives the types of
> columns
> > > directly from DB. Please note that FieldInfo can also provide type of
> > > derived column.
> > >
> > >
> > > You mentioned "We are trying to avoid using any custom object and allow
> > any
> > > user to use the
> > >    operator without writing any extra code".
> > > This I think is specific to your use case. You can create an extension
> of
> > > the above which takes JSON blob and creates FieldInfos from it.
> > >
> > > Chandni
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <
> bhupesh@datatorrent.com
> > >
> > > wrote:
> > >
> > > > Hi Chandni,
> > > >
> > > > Following are the issues:
> > > >
> > > >    1. FieldInfo, is ultimately a custom object and any user who uses
> > this
> > > >    operator has to construct an object, populate it and then use it.
> We
> > > are
> > > >    trying to avoid using any custom object and allow any user to use
> > the
> > > >    operator without writing any extra code; just configuration.
> > > >    2. In case of update / merge, we need what SQL data types that the
> > > >    expression provided by the user would evaluate to. In case of
> insert
> > > we
> > > > can
> > > >    go and fetch the data types of the columns directly from DB.
> > However,
> > > > the
> > > >    same is not possible for custom expressions; "avg(salary)" for
> > > instance.
> > > >
> > > > Thanks.
> > > >
> > > > -Bhupesh
> > > >
> > > >
> > > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
> > chandni@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Hi Bhupesh,
> > > > >
> > > > > JDBCPojoOutputOperator was written for a demo and therefore it was
> > > marked
> > > > > Evolving which is why I had mentioned that you should feel free to
> > > modify
> > > > > it.
> > > > >
> > > > > I think an insert query can be as complex as any other query. It
> uses
> > > > > FieldInfo because in the app builder it is easy for the user to
> > provide
> > > > > that instead of JSON String.
> > > > >
> > > > > Can you please provide specifics about what it is that you find
> > > difficult
> > > > > to change/implement for providing update/merge/delete support in
> > > > > JDBCPojoOutputOperator?
> > > > >
> > > > > Chandni
> > > > >
> > > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
> > > bhupesh@datatorrent.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > It has been pointed out that adding another class for handling
> > > update /
> > > > > > merge queries would not be a good option.
> > > > > >
> > > > > > Here are the current implementation details:
> > > > > >
> > > > > >    - We have an existing class: JdbcPOJOOutputOperator which
> > accepts
> > > a
> > > > > list
> > > > > >    of FieldInfo objects. Each element of this list indicates the
> > > > details
> > > > > >    (column name, pojo field expression and datatype) of fields
> that
> > > > need
> > > > > > to be
> > > > > >    inserted. Using this, the operator formulates the insert query
> > in
> > > > the
> > > > > > setup
> > > > > >    method and identifies the sql datatypes of these columns from
> > the
> > > > > > database
> > > > > >    using the table name.
> > > > > >
> > > > > >
> > > > > >    - Now, coming to the update / merge feature, it is difficult
> to
> > > > > >    formulate the update / merge query in the operator logic due
> to
> > > the
> > > > > > complex
> > > > > >    structure of these statements. For this reason, we plan to
> take
> > a
> > > > > >    parametrized SQL query from the user. This may look like:
> > *"update
> > > > > table
> > > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such
> statements
> > > can
> > > > be
> > > > > >    accepted from the user in addition to a json string which
> > > indicates
> > > > > the
> > > > > >    details for the parameters: *column name, the pojo expression
> > and
> > > > the
> > > > > >    sql data type* of the expression. Note that this information
> is
> > > > > similar
> > > > > >    to the FieldInfo object, but is a string which can be
> configured
> > > > > easily
> > > > > > by
> > > > > >    the user. Also note that the data type which is accepted is
> the
> > > SQL
> > > > > data
> > > > > >    type. Using this info we can populate the parametrized query
> and
> > > run
> > > > > it
> > > > > >    based on the incoming POJO.
> > > > > >
> > > > > > The second approach is able to handle all kinds of queries
> (insert
> > /
> > > > > update
> > > > > > / merge / delete). However, since we already have the
> > > > > > JdbcPOJOOutputOperator, we would like to merge the new
> > functionality
> > > > into
> > > > > > the same class.
> > > > > >
> > > > > > Here we have the following options:
> > > > > >
> > > > > >    1. Change the existing class (JdbcPOJOOutputOperator) to the
> > > second
> > > > > >    approach which is more generic and also handles inserts.
> > > > > >    2. Add the update/ merge functionality to the existing class
> > > without
> > > > > >    changing the existing functionality. This will have two
> > different
> > > > ways
> > > > > > that
> > > > > >    insert queries may be handled in the operator.
> > > > > >    3. Add another class which extends from JdbcPOJOOutputOperator
> > and
> > > > > have
> > > > > >    the update/merge functionality there. (This is not
> recommended.)
> > > > > >    4. Any other approach.
> > > > > >
> > > > > > Please suggest.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > -Bhupesh.
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> > > > chandni@datatorrent.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > No I don't think we are restricting Malhar to just abstract
> > > classes.
> > > > > > > Whenever they are couple of use cases that we see quite often,
> we
> > > add
> > > > > > > concrete implementations.
> > > > > > >
> > > > > > > For eg. FileLineInputOperator which is a concrete
> implementation
> > of
> > > > > > > AbstractFileInputOperator. FSSliceReader is an example as well.
> > > > > > >
> > > > > > > In AbstractJdbcOutputOperator case there hasn't been such
> common
> > > > > > > insert/update query.
> > > > > > >
> > > > > > > Also if you look at the example I provided, it is very simple
> to
> > > > > provide
> > > > > > a
> > > > > > > concrete implementation.
> > > > > > >
> > > > > > > If you would like to change JdbcPOJOOutputOperator to work for
> > > > > > > "UPDATE/MERGE" then please go ahead.
> > > > > > >
> > > > > > > Chandni
> > > > > > >
> > > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> > > > > bhupesh@datatorrent.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I see. So, just to understand more, do we plan to keep Malhar
> > > > > > restricted
> > > > > > > to
> > > > > > > > the base functionality (as in abstract classes)? And put the
> > > > > > > configuration
> > > > > > > > aspect / concrete implementations in apps that use these
> > > operators?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > Bhupesh
> > > > > > > >
> > > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > > > > > chandni@datatorrent.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Here is an example of doing Upsert with JDBC:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Chandni
> > > > > > > > >
> > > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > > > > > > chandni@datatorrent.com
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > The operators are under Malhar/lib/db/jdbc.
> > > > > > > > > >
> > > > > > > > > > Here is one of them:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > > > > > > >
> > > > > > > > > > They work with any kind PreparedStatement - insert or
> > update
> > > > > > > > > >
> > > > > > > > > > Chandni
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > > > > > > bhupesh@datatorrent.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Chandni,
> > > > > > > > > >>
> > > > > > > > > >> I don't see an update query being handled in the
> operator.
> > > > Could
> > > > > > you
> > > > > > > > > >> please
> > > > > > > > > >> point me to the appropriate class?
> > > > > > > > > >> Or did you mean that handling a update query is just a
> > > matter
> > > > of
> > > > > > > > > extending
> > > > > > > > > >> the class and providing a concrete implementation?
> > > > > > > > > >>
> > > > > > > > > >> Thanks.
> > > > > > > > > >> -Bhupesh
> > > > > > > > > >>
> > > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > > > > > > > chandni@datatorrent.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Bhupesh,
> > > > > > > > > >> >
> > > > > > > > > >> > The current abstract JDBC Output Operators  in library
> > are
> > > > > > generic
> > > > > > > > and
> > > > > > > > > >> have
> > > > > > > > > >> > already been used in multiple POCs and applications.
> In
> > > fact
> > > > > > this
> > > > > > > > > >> operator
> > > > > > > > > >> > has matured through customer use cases. It is not just
> > an
> > > > > insert
> > > > > > > > > >> operator.
> > > > > > > > > >> > We have used it to perform update and inserts.
> > > > > > > > > >> >
> > > > > > > > > >> > That said, I don't think it is a good idea to
> introduce
> > > > input
> > > > > > > format
> > > > > > > > > in
> > > > > > > > > >> >  these abstract implementations. It is written to
> handle
> > > any
> > > > > > type
> > > > > > > of
> > > > > > > > > >> query,
> > > > > > > > > >> > be it a procedure call (that was an actual customer
> use
> > > > case).
> > > > > > > > > >> >
> > > > > > > > > >> > Chandni
> > > > > > > > > >> >
> > > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Hi All,
> > > > > > > > > >> > >
> > > > > > > > > >> > > We are planning to proceed with the following
> approach
> > > for
> > > > > > JDBC
> > > > > > > > > >> *update*
> > > > > > > > > >> > > operator:
> > > > > > > > > >> > >
> > > > > > > > > >> > >    - *Update Query Configuration*
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > >    - Example Update Query: *update tableName set a =
> > ?**
> > > > > > where b
> > > > > > > > = ?
> > > > > > > > > >> and
> > > > > > > > > >> > c
> > > > > > > > > >> > >    > ?;*
> > > > > > > > > >> > >    - Example JSON input array for parameter
> > > > instantiations:
> > > > > > > *[{a,
> > > > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR},
> {c,
> > > > > > > expression,
> > > > > > > > > >> > DATE}]*
> > > > > > > > > >> > >
> > > > > > > > > >> > > We are also planning to change the JDBC Output
> > Operator
> > > in
> > > > > > > Malhar
> > > > > > > > > >> Library
> > > > > > > > > >> > > which currently does just insert. We plan to make
> the
> > > > input
> > > > > > > format
> > > > > > > > > >> > > consistent for both insert and update and hence the
> > > change
> > > > > to
> > > > > > > the
> > > > > > > > > >> current
> > > > > > > > > >> > > way of configuration using JSON. Following would be
> > the
> > > > > config
> > > > > > > for
> > > > > > > > > >> > inserts:
> > > > > > > > > >> > >
> > > > > > > > > >> > >    - *Insert Query Configuration*
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > >    - Example Insert Query: *insert into tableName
> > values
> > > > (?,
> > > > > > ?,
> > > > > > > > .. ,
> > > > > > > > > >> ?);*
> > > > > > > > > >> > >    - Example JSON input array for parameter
> > > > instantiations:
> > > > > > > *[{a,
> > > > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR},
> ..
> > ,
> > > > {c,
> > > > > > > > > >> expression,
> > > > > > > > > >> > > DATE}]*
> > > > > > > > > >> > >
> > > > > > > > > >> > > Please let us know your thoughts.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Thanks.
> > > > > > > > > >> > >
> > > > > > > > > >> > > -Bhupesh
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Hi All,
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Would it be a good idea to introduce the update
> > > > > > functionality
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > >> > JDBC
> > > > > > > > > >> > > > output operator in Apache Apex Malhar library.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > The following are possible approaches:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > >    1. Accept a update query from the user with
> place
> > > > > holders
> > > > > > > for
> > > > > > > > > >> > values.
> > > > > > > > > >> > > >    Example: *update tableName set a = ?, b = ?
> > where c
> > > > = ?
> > > > > > and
> > > > > > > > d >
> > > > > > > > > >> ?*.
> > > > > > > > > >> > > >    Here "?" will be provided by the user as java
> > > > > expressions
> > > > > > > > which
> > > > > > > > > >> will
> > > > > > > > > >> > > be
> > > > > > > > > >> > > >    evaluated from the incoming tuple.
> > > > > > > > > >> > > >    2. Another option is to accept in some
> > > configuration
> > > > > > format
> > > > > > > > > >> (json /
> > > > > > > > > >> > > >    xml) the following and formulate the query in
> the
> > > > > > operator.
> > > > > > > > > This
> > > > > > > > > >> can
> > > > > > > > > >> > > become
> > > > > > > > > >> > > >    arbitrarily complex.
> > > > > > > > > >> > > >    1. update clause columns
> > > > > > > > > >> > > >       2. update clause expressions
> > > > > > > > > >> > > >       3. where clause columns
> > > > > > > > > >> > > >       4. where clause expressions
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > I am thinking about going ahead with 1. Please let
> > me
> > > > know
> > > > > > if
> > > > > > > > any
> > > > > > > > > >> other
> > > > > > > > > >> > > > option is possible and whether such a
> functionality
> > > > > already
> > > > > > > > exists
> > > > > > > > > >> in
> > > > > > > > > >> > > some
> > > > > > > > > >> > > > other class.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Thanks.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > -Bhupesh
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Chandni Singh <ch...@datatorrent.com>.
Yeah that sounds good.

Chandni

On Mon, Dec 28, 2015 at 2:46 AM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Thanks Chandni for the comments.
>
> Additionally I think, we should do away with constructing the SQL query in
> the operator. This is because it is only possible in case of simple insert
> statements. In case a where clause is needed in the insert statement, we
> cannot construct the SQL easily. If we are accepting a parametrized query
> from the user for update/merge, why not do the same for insert statements
> as well? Then the hierarchy would look like:
> - AbstractJdbcPOJOOutputOperator (As suggested)
> -- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
> merge, delete)
>
> We don't need a separate operator for inserts and update/merge.
>
> Thanks.
> -Bhupesh
>
>
> On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <ch...@datatorrent.com>
> wrote:
>
> >   1. FieldInfo, is ultimately a custom object and any user who uses this
> >    operator has to construct an object, populate it and then use it. We
> are
> >    trying to avoid using any custom object and allow any user to use the
> >    operator without writing any extra code; just configuration.
> > FieldInfo is a way to provide configuration in a UI friendly way.
> Providing
> > configuration as JSON is not UI friendly.
> >
> >    2. In case of update / merge, we need what SQL data types that the
> >    expression provided by the user would evaluate to. In case of insert
> we
> > can
> >    go and fetch the data types of the columns directly from DB. However,
> > the
> >    same is not possible for custom expressions; "avg(salary)" for
> instance.
> > Ok so here is where you can make a change.
> > - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
> > - Abstraction is to fetch the type of column.
> > - Add a concrete JDBCPojoInsertOutput  that derives the types of columns
> > directly from DB. Please note that FieldInfo can also provide type of
> > derived column.
> >
> >
> > You mentioned "We are trying to avoid using any custom object and allow
> any
> > user to use the
> >    operator without writing any extra code".
> > This I think is specific to your use case. You can create an extension of
> > the above which takes JSON blob and creates FieldInfos from it.
> >
> > Chandni
> >
> >
> >
> >
> >
> > On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <bhupesh@datatorrent.com
> >
> > wrote:
> >
> > > Hi Chandni,
> > >
> > > Following are the issues:
> > >
> > >    1. FieldInfo, is ultimately a custom object and any user who uses
> this
> > >    operator has to construct an object, populate it and then use it. We
> > are
> > >    trying to avoid using any custom object and allow any user to use
> the
> > >    operator without writing any extra code; just configuration.
> > >    2. In case of update / merge, we need what SQL data types that the
> > >    expression provided by the user would evaluate to. In case of insert
> > we
> > > can
> > >    go and fetch the data types of the columns directly from DB.
> However,
> > > the
> > >    same is not possible for custom expressions; "avg(salary)" for
> > instance.
> > >
> > > Thanks.
> > >
> > > -Bhupesh
> > >
> > >
> > > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <
> chandni@datatorrent.com>
> > > wrote:
> > >
> > > > Hi Bhupesh,
> > > >
> > > > JDBCPojoOutputOperator was written for a demo and therefore it was
> > marked
> > > > Evolving which is why I had mentioned that you should feel free to
> > modify
> > > > it.
> > > >
> > > > I think an insert query can be as complex as any other query. It uses
> > > > FieldInfo because in the app builder it is easy for the user to
> provide
> > > > that instead of JSON String.
> > > >
> > > > Can you please provide specifics about what it is that you find
> > difficult
> > > > to change/implement for providing update/merge/delete support in
> > > > JDBCPojoOutputOperator?
> > > >
> > > > Chandni
> > > >
> > > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
> > bhupesh@datatorrent.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > It has been pointed out that adding another class for handling
> > update /
> > > > > merge queries would not be a good option.
> > > > >
> > > > > Here are the current implementation details:
> > > > >
> > > > >    - We have an existing class: JdbcPOJOOutputOperator which
> accepts
> > a
> > > > list
> > > > >    of FieldInfo objects. Each element of this list indicates the
> > > details
> > > > >    (column name, pojo field expression and datatype) of fields that
> > > need
> > > > > to be
> > > > >    inserted. Using this, the operator formulates the insert query
> in
> > > the
> > > > > setup
> > > > >    method and identifies the sql datatypes of these columns from
> the
> > > > > database
> > > > >    using the table name.
> > > > >
> > > > >
> > > > >    - Now, coming to the update / merge feature, it is difficult to
> > > > >    formulate the update / merge query in the operator logic due to
> > the
> > > > > complex
> > > > >    structure of these statements. For this reason, we plan to take
> a
> > > > >    parametrized SQL query from the user. This may look like:
> *"update
> > > > table
> > > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements
> > can
> > > be
> > > > >    accepted from the user in addition to a json string which
> > indicates
> > > > the
> > > > >    details for the parameters: *column name, the pojo expression
> and
> > > the
> > > > >    sql data type* of the expression. Note that this information is
> > > > similar
> > > > >    to the FieldInfo object, but is a string which can be configured
> > > > easily
> > > > > by
> > > > >    the user. Also note that the data type which is accepted is the
> > SQL
> > > > data
> > > > >    type. Using this info we can populate the parametrized query and
> > run
> > > > it
> > > > >    based on the incoming POJO.
> > > > >
> > > > > The second approach is able to handle all kinds of queries (insert
> /
> > > > update
> > > > > / merge / delete). However, since we already have the
> > > > > JdbcPOJOOutputOperator, we would like to merge the new
> functionality
> > > into
> > > > > the same class.
> > > > >
> > > > > Here we have the following options:
> > > > >
> > > > >    1. Change the existing class (JdbcPOJOOutputOperator) to the
> > second
> > > > >    approach which is more generic and also handles inserts.
> > > > >    2. Add the update/ merge functionality to the existing class
> > without
> > > > >    changing the existing functionality. This will have two
> different
> > > ways
> > > > > that
> > > > >    insert queries may be handled in the operator.
> > > > >    3. Add another class which extends from JdbcPOJOOutputOperator
> and
> > > > have
> > > > >    the update/merge functionality there. (This is not recommended.)
> > > > >    4. Any other approach.
> > > > >
> > > > > Please suggest.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > -Bhupesh.
> > > > >
> > > > >
> > > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> > > chandni@datatorrent.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > No I don't think we are restricting Malhar to just abstract
> > classes.
> > > > > > Whenever they are couple of use cases that we see quite often, we
> > add
> > > > > > concrete implementations.
> > > > > >
> > > > > > For eg. FileLineInputOperator which is a concrete implementation
> of
> > > > > > AbstractFileInputOperator. FSSliceReader is an example as well.
> > > > > >
> > > > > > In AbstractJdbcOutputOperator case there hasn't been such common
> > > > > > insert/update query.
> > > > > >
> > > > > > Also if you look at the example I provided, it is very simple to
> > > > provide
> > > > > a
> > > > > > concrete implementation.
> > > > > >
> > > > > > If you would like to change JdbcPOJOOutputOperator to work for
> > > > > > "UPDATE/MERGE" then please go ahead.
> > > > > >
> > > > > > Chandni
> > > > > >
> > > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> > > > bhupesh@datatorrent.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I see. So, just to understand more, do we plan to keep Malhar
> > > > > restricted
> > > > > > to
> > > > > > > the base functionality (as in abstract classes)? And put the
> > > > > > configuration
> > > > > > > aspect / concrete implementations in apps that use these
> > operators?
> > > > > > >
> > > > > > > Thanks.
> > > > > > > Bhupesh
> > > > > > >
> > > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > > > > chandni@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Here is an example of doing Upsert with JDBC:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Chandni
> > > > > > > >
> > > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > > > > > chandni@datatorrent.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > The operators are under Malhar/lib/db/jdbc.
> > > > > > > > >
> > > > > > > > > Here is one of them:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > > > > > >
> > > > > > > > > They work with any kind PreparedStatement - insert or
> update
> > > > > > > > >
> > > > > > > > > Chandni
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > > > > > bhupesh@datatorrent.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi Chandni,
> > > > > > > > >>
> > > > > > > > >> I don't see an update query being handled in the operator.
> > > Could
> > > > > you
> > > > > > > > >> please
> > > > > > > > >> point me to the appropriate class?
> > > > > > > > >> Or did you mean that handling a update query is just a
> > matter
> > > of
> > > > > > > > extending
> > > > > > > > >> the class and providing a concrete implementation?
> > > > > > > > >>
> > > > > > > > >> Thanks.
> > > > > > > > >> -Bhupesh
> > > > > > > > >>
> > > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > > > > > > chandni@datatorrent.com>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Bhupesh,
> > > > > > > > >> >
> > > > > > > > >> > The current abstract JDBC Output Operators  in library
> are
> > > > > generic
> > > > > > > and
> > > > > > > > >> have
> > > > > > > > >> > already been used in multiple POCs and applications. In
> > fact
> > > > > this
> > > > > > > > >> operator
> > > > > > > > >> > has matured through customer use cases. It is not just
> an
> > > > insert
> > > > > > > > >> operator.
> > > > > > > > >> > We have used it to perform update and inserts.
> > > > > > > > >> >
> > > > > > > > >> > That said, I don't think it is a good idea to introduce
> > > input
> > > > > > format
> > > > > > > > in
> > > > > > > > >> >  these abstract implementations. It is written to handle
> > any
> > > > > type
> > > > > > of
> > > > > > > > >> query,
> > > > > > > > >> > be it a procedure call (that was an actual customer use
> > > case).
> > > > > > > > >> >
> > > > > > > > >> > Chandni
> > > > > > > > >> >
> > > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > > >> > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Hi All,
> > > > > > > > >> > >
> > > > > > > > >> > > We are planning to proceed with the following approach
> > for
> > > > > JDBC
> > > > > > > > >> *update*
> > > > > > > > >> > > operator:
> > > > > > > > >> > >
> > > > > > > > >> > >    - *Update Query Configuration*
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > >    - Example Update Query: *update tableName set a =
> ?**
> > > > > where b
> > > > > > > = ?
> > > > > > > > >> and
> > > > > > > > >> > c
> > > > > > > > >> > >    > ?;*
> > > > > > > > >> > >    - Example JSON input array for parameter
> > > instantiations:
> > > > > > *[{a,
> > > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c,
> > > > > > expression,
> > > > > > > > >> > DATE}]*
> > > > > > > > >> > >
> > > > > > > > >> > > We are also planning to change the JDBC Output
> Operator
> > in
> > > > > > Malhar
> > > > > > > > >> Library
> > > > > > > > >> > > which currently does just insert. We plan to make the
> > > input
> > > > > > format
> > > > > > > > >> > > consistent for both insert and update and hence the
> > change
> > > > to
> > > > > > the
> > > > > > > > >> current
> > > > > > > > >> > > way of configuration using JSON. Following would be
> the
> > > > config
> > > > > > for
> > > > > > > > >> > inserts:
> > > > > > > > >> > >
> > > > > > > > >> > >    - *Insert Query Configuration*
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > >    - Example Insert Query: *insert into tableName
> values
> > > (?,
> > > > > ?,
> > > > > > > .. ,
> > > > > > > > >> ?);*
> > > > > > > > >> > >    - Example JSON input array for parameter
> > > instantiations:
> > > > > > *[{a,
> > > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, ..
> ,
> > > {c,
> > > > > > > > >> expression,
> > > > > > > > >> > > DATE}]*
> > > > > > > > >> > >
> > > > > > > > >> > > Please let us know your thoughts.
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks.
> > > > > > > > >> > >
> > > > > > > > >> > > -Bhupesh
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Hi All,
> > > > > > > > >> > > >
> > > > > > > > >> > > > Would it be a good idea to introduce the update
> > > > > functionality
> > > > > > to
> > > > > > > > the
> > > > > > > > >> > JDBC
> > > > > > > > >> > > > output operator in Apache Apex Malhar library.
> > > > > > > > >> > > >
> > > > > > > > >> > > > The following are possible approaches:
> > > > > > > > >> > > >
> > > > > > > > >> > > >    1. Accept a update query from the user with place
> > > > holders
> > > > > > for
> > > > > > > > >> > values.
> > > > > > > > >> > > >    Example: *update tableName set a = ?, b = ?
> where c
> > > = ?
> > > > > and
> > > > > > > d >
> > > > > > > > >> ?*.
> > > > > > > > >> > > >    Here "?" will be provided by the user as java
> > > > expressions
> > > > > > > which
> > > > > > > > >> will
> > > > > > > > >> > > be
> > > > > > > > >> > > >    evaluated from the incoming tuple.
> > > > > > > > >> > > >    2. Another option is to accept in some
> > configuration
> > > > > format
> > > > > > > > >> (json /
> > > > > > > > >> > > >    xml) the following and formulate the query in the
> > > > > operator.
> > > > > > > > This
> > > > > > > > >> can
> > > > > > > > >> > > become
> > > > > > > > >> > > >    arbitrarily complex.
> > > > > > > > >> > > >    1. update clause columns
> > > > > > > > >> > > >       2. update clause expressions
> > > > > > > > >> > > >       3. where clause columns
> > > > > > > > >> > > >       4. where clause expressions
> > > > > > > > >> > > >
> > > > > > > > >> > > > I am thinking about going ahead with 1. Please let
> me
> > > know
> > > > > if
> > > > > > > any
> > > > > > > > >> other
> > > > > > > > >> > > > option is possible and whether such a functionality
> > > > already
> > > > > > > exists
> > > > > > > > >> in
> > > > > > > > >> > > some
> > > > > > > > >> > > > other class.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks.
> > > > > > > > >> > > >
> > > > > > > > >> > > > -Bhupesh
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Thanks Chandni for the comments.

Additionally I think, we should do away with constructing the SQL query in
the operator. This is because it is only possible in case of simple insert
statements. In case a where clause is needed in the insert statement, we
cannot construct the SQL easily. If we are accepting a parametrized query
from the user for update/merge, why not do the same for insert statements
as well? Then the hierarchy would look like:
- AbstractJdbcPOJOOutputOperator (As suggested)
-- JdbcPOJOOutputOperator (Takes care of all statements insert, update,
merge, delete)

We don't need a separate operator for inserts and update/merge.

Thanks.
-Bhupesh


On Mon, Dec 28, 2015 at 3:32 PM, Chandni Singh <ch...@datatorrent.com>
wrote:

>   1. FieldInfo, is ultimately a custom object and any user who uses this
>    operator has to construct an object, populate it and then use it. We are
>    trying to avoid using any custom object and allow any user to use the
>    operator without writing any extra code; just configuration.
> FieldInfo is a way to provide configuration in a UI friendly way. Providing
> configuration as JSON is not UI friendly.
>
>    2. In case of update / merge, we need what SQL data types that the
>    expression provided by the user would evaluate to. In case of insert we
> can
>    go and fetch the data types of the columns directly from DB. However,
> the
>    same is not possible for custom expressions; "avg(salary)" for instance.
> Ok so here is where you can make a change.
> - JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
> - Abstraction is to fetch the type of column.
> - Add a concrete JDBCPojoInsertOutput  that derives the types of columns
> directly from DB. Please note that FieldInfo can also provide type of
> derived column.
>
>
> You mentioned "We are trying to avoid using any custom object and allow any
> user to use the
>    operator without writing any extra code".
> This I think is specific to your use case. You can create an extension of
> the above which takes JSON blob and creates FieldInfos from it.
>
> Chandni
>
>
>
>
>
> On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > Hi Chandni,
> >
> > Following are the issues:
> >
> >    1. FieldInfo, is ultimately a custom object and any user who uses this
> >    operator has to construct an object, populate it and then use it. We
> are
> >    trying to avoid using any custom object and allow any user to use the
> >    operator without writing any extra code; just configuration.
> >    2. In case of update / merge, we need what SQL data types that the
> >    expression provided by the user would evaluate to. In case of insert
> we
> > can
> >    go and fetch the data types of the columns directly from DB. However,
> > the
> >    same is not possible for custom expressions; "avg(salary)" for
> instance.
> >
> > Thanks.
> >
> > -Bhupesh
> >
> >
> > On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <ch...@datatorrent.com>
> > wrote:
> >
> > > Hi Bhupesh,
> > >
> > > JDBCPojoOutputOperator was written for a demo and therefore it was
> marked
> > > Evolving which is why I had mentioned that you should feel free to
> modify
> > > it.
> > >
> > > I think an insert query can be as complex as any other query. It uses
> > > FieldInfo because in the app builder it is easy for the user to provide
> > > that instead of JSON String.
> > >
> > > Can you please provide specifics about what it is that you find
> difficult
> > > to change/implement for providing update/merge/delete support in
> > > JDBCPojoOutputOperator?
> > >
> > > Chandni
> > >
> > > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <
> bhupesh@datatorrent.com
> > >
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > It has been pointed out that adding another class for handling
> update /
> > > > merge queries would not be a good option.
> > > >
> > > > Here are the current implementation details:
> > > >
> > > >    - We have an existing class: JdbcPOJOOutputOperator which accepts
> a
> > > list
> > > >    of FieldInfo objects. Each element of this list indicates the
> > details
> > > >    (column name, pojo field expression and datatype) of fields that
> > need
> > > > to be
> > > >    inserted. Using this, the operator formulates the insert query in
> > the
> > > > setup
> > > >    method and identifies the sql datatypes of these columns from the
> > > > database
> > > >    using the table name.
> > > >
> > > >
> > > >    - Now, coming to the update / merge feature, it is difficult to
> > > >    formulate the update / merge query in the operator logic due to
> the
> > > > complex
> > > >    structure of these statements. For this reason, we plan to take a
> > > >    parametrized SQL query from the user. This may look like: *"update
> > > table
> > > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements
> can
> > be
> > > >    accepted from the user in addition to a json string which
> indicates
> > > the
> > > >    details for the parameters: *column name, the pojo expression and
> > the
> > > >    sql data type* of the expression. Note that this information is
> > > similar
> > > >    to the FieldInfo object, but is a string which can be configured
> > > easily
> > > > by
> > > >    the user. Also note that the data type which is accepted is the
> SQL
> > > data
> > > >    type. Using this info we can populate the parametrized query and
> run
> > > it
> > > >    based on the incoming POJO.
> > > >
> > > > The second approach is able to handle all kinds of queries (insert /
> > > update
> > > > / merge / delete). However, since we already have the
> > > > JdbcPOJOOutputOperator, we would like to merge the new functionality
> > into
> > > > the same class.
> > > >
> > > > Here we have the following options:
> > > >
> > > >    1. Change the existing class (JdbcPOJOOutputOperator) to the
> second
> > > >    approach which is more generic and also handles inserts.
> > > >    2. Add the update/ merge functionality to the existing class
> without
> > > >    changing the existing functionality. This will have two different
> > ways
> > > > that
> > > >    insert queries may be handled in the operator.
> > > >    3. Add another class which extends from JdbcPOJOOutputOperator and
> > > have
> > > >    the update/merge functionality there. (This is not recommended.)
> > > >    4. Any other approach.
> > > >
> > > > Please suggest.
> > > >
> > > > Thanks.
> > > >
> > > > -Bhupesh.
> > > >
> > > >
> > > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> > chandni@datatorrent.com
> > > >
> > > > wrote:
> > > >
> > > > > No I don't think we are restricting Malhar to just abstract
> classes.
> > > > > Whenever they are couple of use cases that we see quite often, we
> add
> > > > > concrete implementations.
> > > > >
> > > > > For eg. FileLineInputOperator which is a concrete implementation of
> > > > > AbstractFileInputOperator. FSSliceReader is an example as well.
> > > > >
> > > > > In AbstractJdbcOutputOperator case there hasn't been such common
> > > > > insert/update query.
> > > > >
> > > > > Also if you look at the example I provided, it is very simple to
> > > provide
> > > > a
> > > > > concrete implementation.
> > > > >
> > > > > If you would like to change JdbcPOJOOutputOperator to work for
> > > > > "UPDATE/MERGE" then please go ahead.
> > > > >
> > > > > Chandni
> > > > >
> > > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> > > bhupesh@datatorrent.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > I see. So, just to understand more, do we plan to keep Malhar
> > > > restricted
> > > > > to
> > > > > > the base functionality (as in abstract classes)? And put the
> > > > > configuration
> > > > > > aspect / concrete implementations in apps that use these
> operators?
> > > > > >
> > > > > > Thanks.
> > > > > > Bhupesh
> > > > > >
> > > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > > > chandni@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Here is an example of doing Upsert with JDBC:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Chandni
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > > > > chandni@datatorrent.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The operators are under Malhar/lib/db/jdbc.
> > > > > > > >
> > > > > > > > Here is one of them:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > > > > >
> > > > > > > > They work with any kind PreparedStatement - insert or update
> > > > > > > >
> > > > > > > > Chandni
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > > > > bhupesh@datatorrent.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Chandni,
> > > > > > > >>
> > > > > > > >> I don't see an update query being handled in the operator.
> > Could
> > > > you
> > > > > > > >> please
> > > > > > > >> point me to the appropriate class?
> > > > > > > >> Or did you mean that handling a update query is just a
> matter
> > of
> > > > > > > extending
> > > > > > > >> the class and providing a concrete implementation?
> > > > > > > >>
> > > > > > > >> Thanks.
> > > > > > > >> -Bhupesh
> > > > > > > >>
> > > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > > > > > chandni@datatorrent.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Bhupesh,
> > > > > > > >> >
> > > > > > > >> > The current abstract JDBC Output Operators  in library are
> > > > generic
> > > > > > and
> > > > > > > >> have
> > > > > > > >> > already been used in multiple POCs and applications. In
> fact
> > > > this
> > > > > > > >> operator
> > > > > > > >> > has matured through customer use cases. It is not just an
> > > insert
> > > > > > > >> operator.
> > > > > > > >> > We have used it to perform update and inserts.
> > > > > > > >> >
> > > > > > > >> > That said, I don't think it is a good idea to introduce
> > input
> > > > > format
> > > > > > > in
> > > > > > > >> >  these abstract implementations. It is written to handle
> any
> > > > type
> > > > > of
> > > > > > > >> query,
> > > > > > > >> > be it a procedure call (that was an actual customer use
> > case).
> > > > > > > >> >
> > > > > > > >> > Chandni
> > > > > > > >> >
> > > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi All,
> > > > > > > >> > >
> > > > > > > >> > > We are planning to proceed with the following approach
> for
> > > > JDBC
> > > > > > > >> *update*
> > > > > > > >> > > operator:
> > > > > > > >> > >
> > > > > > > >> > >    - *Update Query Configuration*
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >    - Example Update Query: *update tableName set a = ?**
> > > > where b
> > > > > > = ?
> > > > > > > >> and
> > > > > > > >> > c
> > > > > > > >> > >    > ?;*
> > > > > > > >> > >    - Example JSON input array for parameter
> > instantiations:
> > > > > *[{a,
> > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c,
> > > > > expression,
> > > > > > > >> > DATE}]*
> > > > > > > >> > >
> > > > > > > >> > > We are also planning to change the JDBC Output Operator
> in
> > > > > Malhar
> > > > > > > >> Library
> > > > > > > >> > > which currently does just insert. We plan to make the
> > input
> > > > > format
> > > > > > > >> > > consistent for both insert and update and hence the
> change
> > > to
> > > > > the
> > > > > > > >> current
> > > > > > > >> > > way of configuration using JSON. Following would be the
> > > config
> > > > > for
> > > > > > > >> > inserts:
> > > > > > > >> > >
> > > > > > > >> > >    - *Insert Query Configuration*
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >    - Example Insert Query: *insert into tableName values
> > (?,
> > > > ?,
> > > > > > .. ,
> > > > > > > >> ?);*
> > > > > > > >> > >    - Example JSON input array for parameter
> > instantiations:
> > > > > *[{a,
> > > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. ,
> > {c,
> > > > > > > >> expression,
> > > > > > > >> > > DATE}]*
> > > > > > > >> > >
> > > > > > > >> > > Please let us know your thoughts.
> > > > > > > >> > >
> > > > > > > >> > > Thanks.
> > > > > > > >> > >
> > > > > > > >> > > -Bhupesh
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > > > > > >> bhupesh@datatorrent.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi All,
> > > > > > > >> > > >
> > > > > > > >> > > > Would it be a good idea to introduce the update
> > > > functionality
> > > > > to
> > > > > > > the
> > > > > > > >> > JDBC
> > > > > > > >> > > > output operator in Apache Apex Malhar library.
> > > > > > > >> > > >
> > > > > > > >> > > > The following are possible approaches:
> > > > > > > >> > > >
> > > > > > > >> > > >    1. Accept a update query from the user with place
> > > holders
> > > > > for
> > > > > > > >> > values.
> > > > > > > >> > > >    Example: *update tableName set a = ?, b = ? where c
> > = ?
> > > > and
> > > > > > d >
> > > > > > > >> ?*.
> > > > > > > >> > > >    Here "?" will be provided by the user as java
> > > expressions
> > > > > > which
> > > > > > > >> will
> > > > > > > >> > > be
> > > > > > > >> > > >    evaluated from the incoming tuple.
> > > > > > > >> > > >    2. Another option is to accept in some
> configuration
> > > > format
> > > > > > > >> (json /
> > > > > > > >> > > >    xml) the following and formulate the query in the
> > > > operator.
> > > > > > > This
> > > > > > > >> can
> > > > > > > >> > > become
> > > > > > > >> > > >    arbitrarily complex.
> > > > > > > >> > > >    1. update clause columns
> > > > > > > >> > > >       2. update clause expressions
> > > > > > > >> > > >       3. where clause columns
> > > > > > > >> > > >       4. where clause expressions
> > > > > > > >> > > >
> > > > > > > >> > > > I am thinking about going ahead with 1. Please let me
> > know
> > > > if
> > > > > > any
> > > > > > > >> other
> > > > > > > >> > > > option is possible and whether such a functionality
> > > already
> > > > > > exists
> > > > > > > >> in
> > > > > > > >> > > some
> > > > > > > >> > > > other class.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks.
> > > > > > > >> > > >
> > > > > > > >> > > > -Bhupesh
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Chandni Singh <ch...@datatorrent.com>.
  1. FieldInfo, is ultimately a custom object and any user who uses this
   operator has to construct an object, populate it and then use it. We are
   trying to avoid using any custom object and allow any user to use the
   operator without writing any extra code; just configuration.
FieldInfo is a way to provide configuration in a UI friendly way. Providing
configuration as JSON is not UI friendly.

   2. In case of update / merge, we need what SQL data types that the
   expression provided by the user would evaluate to. In case of insert we
can
   go and fetch the data types of the columns directly from DB. However, the
   same is not possible for custom expressions; "avg(salary)" for instance.
Ok so here is where you can make a change.
- JDBCPojoOutput can be renamed to AbstractJDBCPojoOutpuOperator.
- Abstraction is to fetch the type of column.
- Add a concrete JDBCPojoInsertOutput  that derives the types of columns
directly from DB. Please note that FieldInfo can also provide type of
derived column.


You mentioned "We are trying to avoid using any custom object and allow any
user to use the
   operator without writing any extra code".
This I think is specific to your use case. You can create an extension of
the above which takes JSON blob and creates FieldInfos from it.

Chandni





On Mon, Dec 28, 2015 at 1:48 AM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi Chandni,
>
> Following are the issues:
>
>    1. FieldInfo, is ultimately a custom object and any user who uses this
>    operator has to construct an object, populate it and then use it. We are
>    trying to avoid using any custom object and allow any user to use the
>    operator without writing any extra code; just configuration.
>    2. In case of update / merge, we need what SQL data types that the
>    expression provided by the user would evaluate to. In case of insert we
> can
>    go and fetch the data types of the columns directly from DB. However,
> the
>    same is not possible for custom expressions; "avg(salary)" for instance.
>
> Thanks.
>
> -Bhupesh
>
>
> On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <ch...@datatorrent.com>
> wrote:
>
> > Hi Bhupesh,
> >
> > JDBCPojoOutputOperator was written for a demo and therefore it was marked
> > Evolving which is why I had mentioned that you should feel free to modify
> > it.
> >
> > I think an insert query can be as complex as any other query. It uses
> > FieldInfo because in the app builder it is easy for the user to provide
> > that instead of JSON String.
> >
> > Can you please provide specifics about what it is that you find difficult
> > to change/implement for providing update/merge/delete support in
> > JDBCPojoOutputOperator?
> >
> > Chandni
> >
> > On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <bhupesh@datatorrent.com
> >
> > wrote:
> >
> > > Hi All,
> > >
> > > It has been pointed out that adding another class for handling update /
> > > merge queries would not be a good option.
> > >
> > > Here are the current implementation details:
> > >
> > >    - We have an existing class: JdbcPOJOOutputOperator which accepts a
> > list
> > >    of FieldInfo objects. Each element of this list indicates the
> details
> > >    (column name, pojo field expression and datatype) of fields that
> need
> > > to be
> > >    inserted. Using this, the operator formulates the insert query in
> the
> > > setup
> > >    method and identifies the sql datatypes of these columns from the
> > > database
> > >    using the table name.
> > >
> > >
> > >    - Now, coming to the update / merge feature, it is difficult to
> > >    formulate the update / merge query in the operator logic due to the
> > > complex
> > >    structure of these statements. For this reason, we plan to take a
> > >    parametrized SQL query from the user. This may look like: *"update
> > table
> > >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements can
> be
> > >    accepted from the user in addition to a json string which indicates
> > the
> > >    details for the parameters: *column name, the pojo expression and
> the
> > >    sql data type* of the expression. Note that this information is
> > similar
> > >    to the FieldInfo object, but is a string which can be configured
> > easily
> > > by
> > >    the user. Also note that the data type which is accepted is the SQL
> > data
> > >    type. Using this info we can populate the parametrized query and run
> > it
> > >    based on the incoming POJO.
> > >
> > > The second approach is able to handle all kinds of queries (insert /
> > update
> > > / merge / delete). However, since we already have the
> > > JdbcPOJOOutputOperator, we would like to merge the new functionality
> into
> > > the same class.
> > >
> > > Here we have the following options:
> > >
> > >    1. Change the existing class (JdbcPOJOOutputOperator) to the second
> > >    approach which is more generic and also handles inserts.
> > >    2. Add the update/ merge functionality to the existing class without
> > >    changing the existing functionality. This will have two different
> ways
> > > that
> > >    insert queries may be handled in the operator.
> > >    3. Add another class which extends from JdbcPOJOOutputOperator and
> > have
> > >    the update/merge functionality there. (This is not recommended.)
> > >    4. Any other approach.
> > >
> > > Please suggest.
> > >
> > > Thanks.
> > >
> > > -Bhupesh.
> > >
> > >
> > > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <
> chandni@datatorrent.com
> > >
> > > wrote:
> > >
> > > > No I don't think we are restricting Malhar to just abstract classes.
> > > > Whenever they are couple of use cases that we see quite often, we add
> > > > concrete implementations.
> > > >
> > > > For eg. FileLineInputOperator which is a concrete implementation of
> > > > AbstractFileInputOperator. FSSliceReader is an example as well.
> > > >
> > > > In AbstractJdbcOutputOperator case there hasn't been such common
> > > > insert/update query.
> > > >
> > > > Also if you look at the example I provided, it is very simple to
> > provide
> > > a
> > > > concrete implementation.
> > > >
> > > > If you would like to change JdbcPOJOOutputOperator to work for
> > > > "UPDATE/MERGE" then please go ahead.
> > > >
> > > > Chandni
> > > >
> > > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> > bhupesh@datatorrent.com
> > > >
> > > > wrote:
> > > >
> > > > > I see. So, just to understand more, do we plan to keep Malhar
> > > restricted
> > > > to
> > > > > the base functionality (as in abstract classes)? And put the
> > > > configuration
> > > > > aspect / concrete implementations in apps that use these operators?
> > > > >
> > > > > Thanks.
> > > > > Bhupesh
> > > > >
> > > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > > chandni@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Here is an example of doing Upsert with JDBC:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > > > >
> > > > > > Thanks,
> > > > > > Chandni
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > > > chandni@datatorrent.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > The operators are under Malhar/lib/db/jdbc.
> > > > > > >
> > > > > > > Here is one of them:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > > > >
> > > > > > > They work with any kind PreparedStatement - insert or update
> > > > > > >
> > > > > > > Chandni
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > > > bhupesh@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Chandni,
> > > > > > >>
> > > > > > >> I don't see an update query being handled in the operator.
> Could
> > > you
> > > > > > >> please
> > > > > > >> point me to the appropriate class?
> > > > > > >> Or did you mean that handling a update query is just a matter
> of
> > > > > > extending
> > > > > > >> the class and providing a concrete implementation?
> > > > > > >>
> > > > > > >> Thanks.
> > > > > > >> -Bhupesh
> > > > > > >>
> > > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > > > > chandni@datatorrent.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi Bhupesh,
> > > > > > >> >
> > > > > > >> > The current abstract JDBC Output Operators  in library are
> > > generic
> > > > > and
> > > > > > >> have
> > > > > > >> > already been used in multiple POCs and applications. In fact
> > > this
> > > > > > >> operator
> > > > > > >> > has matured through customer use cases. It is not just an
> > insert
> > > > > > >> operator.
> > > > > > >> > We have used it to perform update and inserts.
> > > > > > >> >
> > > > > > >> > That said, I don't think it is a good idea to introduce
> input
> > > > format
> > > > > > in
> > > > > > >> >  these abstract implementations. It is written to handle any
> > > type
> > > > of
> > > > > > >> query,
> > > > > > >> > be it a procedure call (that was an actual customer use
> case).
> > > > > > >> >
> > > > > > >> > Chandni
> > > > > > >> >
> > > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > > > > >> bhupesh@datatorrent.com>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Hi All,
> > > > > > >> > >
> > > > > > >> > > We are planning to proceed with the following approach for
> > > JDBC
> > > > > > >> *update*
> > > > > > >> > > operator:
> > > > > > >> > >
> > > > > > >> > >    - *Update Query Configuration*
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >    - Example Update Query: *update tableName set a = ?**
> > > where b
> > > > > = ?
> > > > > > >> and
> > > > > > >> > c
> > > > > > >> > >    > ?;*
> > > > > > >> > >    - Example JSON input array for parameter
> instantiations:
> > > > *[{a,
> > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c,
> > > > expression,
> > > > > > >> > DATE}]*
> > > > > > >> > >
> > > > > > >> > > We are also planning to change the JDBC Output Operator in
> > > > Malhar
> > > > > > >> Library
> > > > > > >> > > which currently does just insert. We plan to make the
> input
> > > > format
> > > > > > >> > > consistent for both insert and update and hence the change
> > to
> > > > the
> > > > > > >> current
> > > > > > >> > > way of configuration using JSON. Following would be the
> > config
> > > > for
> > > > > > >> > inserts:
> > > > > > >> > >
> > > > > > >> > >    - *Insert Query Configuration*
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >    - Example Insert Query: *insert into tableName values
> (?,
> > > ?,
> > > > > .. ,
> > > > > > >> ?);*
> > > > > > >> > >    - Example JSON input array for parameter
> instantiations:
> > > > *[{a,
> > > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. ,
> {c,
> > > > > > >> expression,
> > > > > > >> > > DATE}]*
> > > > > > >> > >
> > > > > > >> > > Please let us know your thoughts.
> > > > > > >> > >
> > > > > > >> > > Thanks.
> > > > > > >> > >
> > > > > > >> > > -Bhupesh
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > > > > >> bhupesh@datatorrent.com>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi All,
> > > > > > >> > > >
> > > > > > >> > > > Would it be a good idea to introduce the update
> > > functionality
> > > > to
> > > > > > the
> > > > > > >> > JDBC
> > > > > > >> > > > output operator in Apache Apex Malhar library.
> > > > > > >> > > >
> > > > > > >> > > > The following are possible approaches:
> > > > > > >> > > >
> > > > > > >> > > >    1. Accept a update query from the user with place
> > holders
> > > > for
> > > > > > >> > values.
> > > > > > >> > > >    Example: *update tableName set a = ?, b = ? where c
> = ?
> > > and
> > > > > d >
> > > > > > >> ?*.
> > > > > > >> > > >    Here "?" will be provided by the user as java
> > expressions
> > > > > which
> > > > > > >> will
> > > > > > >> > > be
> > > > > > >> > > >    evaluated from the incoming tuple.
> > > > > > >> > > >    2. Another option is to accept in some configuration
> > > format
> > > > > > >> (json /
> > > > > > >> > > >    xml) the following and formulate the query in the
> > > operator.
> > > > > > This
> > > > > > >> can
> > > > > > >> > > become
> > > > > > >> > > >    arbitrarily complex.
> > > > > > >> > > >    1. update clause columns
> > > > > > >> > > >       2. update clause expressions
> > > > > > >> > > >       3. where clause columns
> > > > > > >> > > >       4. where clause expressions
> > > > > > >> > > >
> > > > > > >> > > > I am thinking about going ahead with 1. Please let me
> know
> > > if
> > > > > any
> > > > > > >> other
> > > > > > >> > > > option is possible and whether such a functionality
> > already
> > > > > exists
> > > > > > >> in
> > > > > > >> > > some
> > > > > > >> > > > other class.
> > > > > > >> > > >
> > > > > > >> > > > Thanks.
> > > > > > >> > > >
> > > > > > >> > > > -Bhupesh
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi Chandni,

Following are the issues:

   1. FieldInfo, is ultimately a custom object and any user who uses this
   operator has to construct an object, populate it and then use it. We are
   trying to avoid using any custom object and allow any user to use the
   operator without writing any extra code; just configuration.
   2. In case of update / merge, we need what SQL data types that the
   expression provided by the user would evaluate to. In case of insert we can
   go and fetch the data types of the columns directly from DB. However, the
   same is not possible for custom expressions; "avg(salary)" for instance.

Thanks.

-Bhupesh


On Mon, Dec 28, 2015 at 2:57 PM, Chandni Singh <ch...@datatorrent.com>
wrote:

> Hi Bhupesh,
>
> JDBCPojoOutputOperator was written for a demo and therefore it was marked
> Evolving which is why I had mentioned that you should feel free to modify
> it.
>
> I think an insert query can be as complex as any other query. It uses
> FieldInfo because in the app builder it is easy for the user to provide
> that instead of JSON String.
>
> Can you please provide specifics about what it is that you find difficult
> to change/implement for providing update/merge/delete support in
> JDBCPojoOutputOperator?
>
> Chandni
>
> On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > Hi All,
> >
> > It has been pointed out that adding another class for handling update /
> > merge queries would not be a good option.
> >
> > Here are the current implementation details:
> >
> >    - We have an existing class: JdbcPOJOOutputOperator which accepts a
> list
> >    of FieldInfo objects. Each element of this list indicates the details
> >    (column name, pojo field expression and datatype) of fields that need
> > to be
> >    inserted. Using this, the operator formulates the insert query in the
> > setup
> >    method and identifies the sql datatypes of these columns from the
> > database
> >    using the table name.
> >
> >
> >    - Now, coming to the update / merge feature, it is difficult to
> >    formulate the update / merge query in the operator logic due to the
> > complex
> >    structure of these statements. For this reason, we plan to take a
> >    parametrized SQL query from the user. This may look like: *"update
> table
> >    set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements can be
> >    accepted from the user in addition to a json string which indicates
> the
> >    details for the parameters: *column name, the pojo expression and the
> >    sql data type* of the expression. Note that this information is
> similar
> >    to the FieldInfo object, but is a string which can be configured
> easily
> > by
> >    the user. Also note that the data type which is accepted is the SQL
> data
> >    type. Using this info we can populate the parametrized query and run
> it
> >    based on the incoming POJO.
> >
> > The second approach is able to handle all kinds of queries (insert /
> update
> > / merge / delete). However, since we already have the
> > JdbcPOJOOutputOperator, we would like to merge the new functionality into
> > the same class.
> >
> > Here we have the following options:
> >
> >    1. Change the existing class (JdbcPOJOOutputOperator) to the second
> >    approach which is more generic and also handles inserts.
> >    2. Add the update/ merge functionality to the existing class without
> >    changing the existing functionality. This will have two different ways
> > that
> >    insert queries may be handled in the operator.
> >    3. Add another class which extends from JdbcPOJOOutputOperator and
> have
> >    the update/merge functionality there. (This is not recommended.)
> >    4. Any other approach.
> >
> > Please suggest.
> >
> > Thanks.
> >
> > -Bhupesh.
> >
> >
> > On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <chandni@datatorrent.com
> >
> > wrote:
> >
> > > No I don't think we are restricting Malhar to just abstract classes.
> > > Whenever they are couple of use cases that we see quite often, we add
> > > concrete implementations.
> > >
> > > For eg. FileLineInputOperator which is a concrete implementation of
> > > AbstractFileInputOperator. FSSliceReader is an example as well.
> > >
> > > In AbstractJdbcOutputOperator case there hasn't been such common
> > > insert/update query.
> > >
> > > Also if you look at the example I provided, it is very simple to
> provide
> > a
> > > concrete implementation.
> > >
> > > If you would like to change JdbcPOJOOutputOperator to work for
> > > "UPDATE/MERGE" then please go ahead.
> > >
> > > Chandni
> > >
> > > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <
> bhupesh@datatorrent.com
> > >
> > > wrote:
> > >
> > > > I see. So, just to understand more, do we plan to keep Malhar
> > restricted
> > > to
> > > > the base functionality (as in abstract classes)? And put the
> > > configuration
> > > > aspect / concrete implementations in apps that use these operators?
> > > >
> > > > Thanks.
> > > > Bhupesh
> > > >
> > > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> > chandni@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Here is an example of doing Upsert with JDBC:
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > > >
> > > > > Thanks,
> > > > > Chandni
> > > > >
> > > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > > chandni@datatorrent.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > The operators are under Malhar/lib/db/jdbc.
> > > > > >
> > > > > > Here is one of them:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > > >
> > > > > > They work with any kind PreparedStatement - insert or update
> > > > > >
> > > > > > Chandni
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > > bhupesh@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Chandni,
> > > > > >>
> > > > > >> I don't see an update query being handled in the operator. Could
> > you
> > > > > >> please
> > > > > >> point me to the appropriate class?
> > > > > >> Or did you mean that handling a update query is just a matter of
> > > > > extending
> > > > > >> the class and providing a concrete implementation?
> > > > > >>
> > > > > >> Thanks.
> > > > > >> -Bhupesh
> > > > > >>
> > > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > > > chandni@datatorrent.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Bhupesh,
> > > > > >> >
> > > > > >> > The current abstract JDBC Output Operators  in library are
> > generic
> > > > and
> > > > > >> have
> > > > > >> > already been used in multiple POCs and applications. In fact
> > this
> > > > > >> operator
> > > > > >> > has matured through customer use cases. It is not just an
> insert
> > > > > >> operator.
> > > > > >> > We have used it to perform update and inserts.
> > > > > >> >
> > > > > >> > That said, I don't think it is a good idea to introduce input
> > > format
> > > > > in
> > > > > >> >  these abstract implementations. It is written to handle any
> > type
> > > of
> > > > > >> query,
> > > > > >> > be it a procedure call (that was an actual customer use case).
> > > > > >> >
> > > > > >> > Chandni
> > > > > >> >
> > > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > > > >> bhupesh@datatorrent.com>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Hi All,
> > > > > >> > >
> > > > > >> > > We are planning to proceed with the following approach for
> > JDBC
> > > > > >> *update*
> > > > > >> > > operator:
> > > > > >> > >
> > > > > >> > >    - *Update Query Configuration*
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >    - Example Update Query: *update tableName set a = ?**
> > where b
> > > > = ?
> > > > > >> and
> > > > > >> > c
> > > > > >> > >    > ?;*
> > > > > >> > >    - Example JSON input array for parameter instantiations:
> > > *[{a,
> > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c,
> > > expression,
> > > > > >> > DATE}]*
> > > > > >> > >
> > > > > >> > > We are also planning to change the JDBC Output Operator in
> > > Malhar
> > > > > >> Library
> > > > > >> > > which currently does just insert. We plan to make the input
> > > format
> > > > > >> > > consistent for both insert and update and hence the change
> to
> > > the
> > > > > >> current
> > > > > >> > > way of configuration using JSON. Following would be the
> config
> > > for
> > > > > >> > inserts:
> > > > > >> > >
> > > > > >> > >    - *Insert Query Configuration*
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >    - Example Insert Query: *insert into tableName values (?,
> > ?,
> > > > .. ,
> > > > > >> ?);*
> > > > > >> > >    - Example JSON input array for parameter instantiations:
> > > *[{a,
> > > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c,
> > > > > >> expression,
> > > > > >> > > DATE}]*
> > > > > >> > >
> > > > > >> > > Please let us know your thoughts.
> > > > > >> > >
> > > > > >> > > Thanks.
> > > > > >> > >
> > > > > >> > > -Bhupesh
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > > > >> bhupesh@datatorrent.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi All,
> > > > > >> > > >
> > > > > >> > > > Would it be a good idea to introduce the update
> > functionality
> > > to
> > > > > the
> > > > > >> > JDBC
> > > > > >> > > > output operator in Apache Apex Malhar library.
> > > > > >> > > >
> > > > > >> > > > The following are possible approaches:
> > > > > >> > > >
> > > > > >> > > >    1. Accept a update query from the user with place
> holders
> > > for
> > > > > >> > values.
> > > > > >> > > >    Example: *update tableName set a = ?, b = ? where c = ?
> > and
> > > > d >
> > > > > >> ?*.
> > > > > >> > > >    Here "?" will be provided by the user as java
> expressions
> > > > which
> > > > > >> will
> > > > > >> > > be
> > > > > >> > > >    evaluated from the incoming tuple.
> > > > > >> > > >    2. Another option is to accept in some configuration
> > format
> > > > > >> (json /
> > > > > >> > > >    xml) the following and formulate the query in the
> > operator.
> > > > > This
> > > > > >> can
> > > > > >> > > become
> > > > > >> > > >    arbitrarily complex.
> > > > > >> > > >    1. update clause columns
> > > > > >> > > >       2. update clause expressions
> > > > > >> > > >       3. where clause columns
> > > > > >> > > >       4. where clause expressions
> > > > > >> > > >
> > > > > >> > > > I am thinking about going ahead with 1. Please let me know
> > if
> > > > any
> > > > > >> other
> > > > > >> > > > option is possible and whether such a functionality
> already
> > > > exists
> > > > > >> in
> > > > > >> > > some
> > > > > >> > > > other class.
> > > > > >> > > >
> > > > > >> > > > Thanks.
> > > > > >> > > >
> > > > > >> > > > -Bhupesh
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Chandni Singh <ch...@datatorrent.com>.
Hi Bhupesh,

JDBCPojoOutputOperator was written for a demo and therefore it was marked
Evolving which is why I had mentioned that you should feel free to modify
it.

I think an insert query can be as complex as any other query. It uses
FieldInfo because in the app builder it is easy for the user to provide
that instead of JSON String.

Can you please provide specifics about what it is that you find difficult
to change/implement for providing update/merge/delete support in
JDBCPojoOutputOperator?

Chandni

On Mon, Dec 28, 2015 at 1:09 AM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi All,
>
> It has been pointed out that adding another class for handling update /
> merge queries would not be a good option.
>
> Here are the current implementation details:
>
>    - We have an existing class: JdbcPOJOOutputOperator which accepts a list
>    of FieldInfo objects. Each element of this list indicates the details
>    (column name, pojo field expression and datatype) of fields that need
> to be
>    inserted. Using this, the operator formulates the insert query in the
> setup
>    method and identifies the sql datatypes of these columns from the
> database
>    using the table name.
>
>
>    - Now, coming to the update / merge feature, it is difficult to
>    formulate the update / merge query in the operator logic due to the
> complex
>    structure of these statements. For this reason, we plan to take a
>    parametrized SQL query from the user. This may look like: *"update table
>    set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements can be
>    accepted from the user in addition to a json string which indicates the
>    details for the parameters: *column name, the pojo expression and the
>    sql data type* of the expression. Note that this information is similar
>    to the FieldInfo object, but is a string which can be configured easily
> by
>    the user. Also note that the data type which is accepted is the SQL data
>    type. Using this info we can populate the parametrized query and run it
>    based on the incoming POJO.
>
> The second approach is able to handle all kinds of queries (insert / update
> / merge / delete). However, since we already have the
> JdbcPOJOOutputOperator, we would like to merge the new functionality into
> the same class.
>
> Here we have the following options:
>
>    1. Change the existing class (JdbcPOJOOutputOperator) to the second
>    approach which is more generic and also handles inserts.
>    2. Add the update/ merge functionality to the existing class without
>    changing the existing functionality. This will have two different ways
> that
>    insert queries may be handled in the operator.
>    3. Add another class which extends from JdbcPOJOOutputOperator and have
>    the update/merge functionality there. (This is not recommended.)
>    4. Any other approach.
>
> Please suggest.
>
> Thanks.
>
> -Bhupesh.
>
>
> On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <ch...@datatorrent.com>
> wrote:
>
> > No I don't think we are restricting Malhar to just abstract classes.
> > Whenever they are couple of use cases that we see quite often, we add
> > concrete implementations.
> >
> > For eg. FileLineInputOperator which is a concrete implementation of
> > AbstractFileInputOperator. FSSliceReader is an example as well.
> >
> > In AbstractJdbcOutputOperator case there hasn't been such common
> > insert/update query.
> >
> > Also if you look at the example I provided, it is very simple to provide
> a
> > concrete implementation.
> >
> > If you would like to change JdbcPOJOOutputOperator to work for
> > "UPDATE/MERGE" then please go ahead.
> >
> > Chandni
> >
> > On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <bhupesh@datatorrent.com
> >
> > wrote:
> >
> > > I see. So, just to understand more, do we plan to keep Malhar
> restricted
> > to
> > > the base functionality (as in abstract classes)? And put the
> > configuration
> > > aspect / concrete implementations in apps that use these operators?
> > >
> > > Thanks.
> > > Bhupesh
> > >
> > > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <
> chandni@datatorrent.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Here is an example of doing Upsert with JDBC:
> > > >
> > > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > > >
> > > > Thanks,
> > > > Chandni
> > > >
> > > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> > chandni@datatorrent.com
> > > >
> > > > wrote:
> > > >
> > > > > The operators are under Malhar/lib/db/jdbc.
> > > > >
> > > > > Here is one of them:
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > > >
> > > > > They work with any kind PreparedStatement - insert or update
> > > > >
> > > > > Chandni
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > > bhupesh@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > >> Hi Chandni,
> > > > >>
> > > > >> I don't see an update query being handled in the operator. Could
> you
> > > > >> please
> > > > >> point me to the appropriate class?
> > > > >> Or did you mean that handling a update query is just a matter of
> > > > extending
> > > > >> the class and providing a concrete implementation?
> > > > >>
> > > > >> Thanks.
> > > > >> -Bhupesh
> > > > >>
> > > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > > chandni@datatorrent.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Bhupesh,
> > > > >> >
> > > > >> > The current abstract JDBC Output Operators  in library are
> generic
> > > and
> > > > >> have
> > > > >> > already been used in multiple POCs and applications. In fact
> this
> > > > >> operator
> > > > >> > has matured through customer use cases. It is not just an insert
> > > > >> operator.
> > > > >> > We have used it to perform update and inserts.
> > > > >> >
> > > > >> > That said, I don't think it is a good idea to introduce input
> > format
> > > > in
> > > > >> >  these abstract implementations. It is written to handle any
> type
> > of
> > > > >> query,
> > > > >> > be it a procedure call (that was an actual customer use case).
> > > > >> >
> > > > >> > Chandni
> > > > >> >
> > > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > > >> bhupesh@datatorrent.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi All,
> > > > >> > >
> > > > >> > > We are planning to proceed with the following approach for
> JDBC
> > > > >> *update*
> > > > >> > > operator:
> > > > >> > >
> > > > >> > >    - *Update Query Configuration*
> > > > >> > >
> > > > >> > >
> > > > >> > >    - Example Update Query: *update tableName set a = ?**
> where b
> > > = ?
> > > > >> and
> > > > >> > c
> > > > >> > >    > ?;*
> > > > >> > >    - Example JSON input array for parameter instantiations:
> > *[{a,
> > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c,
> > expression,
> > > > >> > DATE}]*
> > > > >> > >
> > > > >> > > We are also planning to change the JDBC Output Operator in
> > Malhar
> > > > >> Library
> > > > >> > > which currently does just insert. We plan to make the input
> > format
> > > > >> > > consistent for both insert and update and hence the change to
> > the
> > > > >> current
> > > > >> > > way of configuration using JSON. Following would be the config
> > for
> > > > >> > inserts:
> > > > >> > >
> > > > >> > >    - *Insert Query Configuration*
> > > > >> > >
> > > > >> > >
> > > > >> > >    - Example Insert Query: *insert into tableName values (?,
> ?,
> > > .. ,
> > > > >> ?);*
> > > > >> > >    - Example JSON input array for parameter instantiations:
> > *[{a,
> > > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c,
> > > > >> expression,
> > > > >> > > DATE}]*
> > > > >> > >
> > > > >> > > Please let us know your thoughts.
> > > > >> > >
> > > > >> > > Thanks.
> > > > >> > >
> > > > >> > > -Bhupesh
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > > >> bhupesh@datatorrent.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Hi All,
> > > > >> > > >
> > > > >> > > > Would it be a good idea to introduce the update
> functionality
> > to
> > > > the
> > > > >> > JDBC
> > > > >> > > > output operator in Apache Apex Malhar library.
> > > > >> > > >
> > > > >> > > > The following are possible approaches:
> > > > >> > > >
> > > > >> > > >    1. Accept a update query from the user with place holders
> > for
> > > > >> > values.
> > > > >> > > >    Example: *update tableName set a = ?, b = ? where c = ?
> and
> > > d >
> > > > >> ?*.
> > > > >> > > >    Here "?" will be provided by the user as java expressions
> > > which
> > > > >> will
> > > > >> > > be
> > > > >> > > >    evaluated from the incoming tuple.
> > > > >> > > >    2. Another option is to accept in some configuration
> format
> > > > >> (json /
> > > > >> > > >    xml) the following and formulate the query in the
> operator.
> > > > This
> > > > >> can
> > > > >> > > become
> > > > >> > > >    arbitrarily complex.
> > > > >> > > >    1. update clause columns
> > > > >> > > >       2. update clause expressions
> > > > >> > > >       3. where clause columns
> > > > >> > > >       4. where clause expressions
> > > > >> > > >
> > > > >> > > > I am thinking about going ahead with 1. Please let me know
> if
> > > any
> > > > >> other
> > > > >> > > > option is possible and whether such a functionality already
> > > exists
> > > > >> in
> > > > >> > > some
> > > > >> > > > other class.
> > > > >> > > >
> > > > >> > > > Thanks.
> > > > >> > > >
> > > > >> > > > -Bhupesh
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi All,

It has been pointed out that adding another class for handling update /
merge queries would not be a good option.

Here are the current implementation details:

   - We have an existing class: JdbcPOJOOutputOperator which accepts a list
   of FieldInfo objects. Each element of this list indicates the details
   (column name, pojo field expression and datatype) of fields that need to be
   inserted. Using this, the operator formulates the insert query in the setup
   method and identifies the sql datatypes of these columns from the database
   using the table name.


   - Now, coming to the update / merge feature, it is difficult to
   formulate the update / merge query in the operator logic due to the complex
   structure of these statements. For this reason, we plan to take a
   parametrized SQL query from the user. This may look like: *"update table
   set x = ?, y = ? where z + w > ? and a == 1;"*. Such statements can be
   accepted from the user in addition to a json string which indicates the
   details for the parameters: *column name, the pojo expression and the
   sql data type* of the expression. Note that this information is similar
   to the FieldInfo object, but is a string which can be configured easily by
   the user. Also note that the data type which is accepted is the SQL data
   type. Using this info we can populate the parametrized query and run it
   based on the incoming POJO.

The second approach is able to handle all kinds of queries (insert / update
/ merge / delete). However, since we already have the
JdbcPOJOOutputOperator, we would like to merge the new functionality into
the same class.

Here we have the following options:

   1. Change the existing class (JdbcPOJOOutputOperator) to the second
   approach which is more generic and also handles inserts.
   2. Add the update/ merge functionality to the existing class without
   changing the existing functionality. This will have two different ways that
   insert queries may be handled in the operator.
   3. Add another class which extends from JdbcPOJOOutputOperator and have
   the update/merge functionality there. (This is not recommended.)
   4. Any other approach.

Please suggest.

Thanks.

-Bhupesh.


On Mon, Dec 14, 2015 at 11:04 AM, Chandni Singh <ch...@datatorrent.com>
wrote:

> No I don't think we are restricting Malhar to just abstract classes.
> Whenever they are couple of use cases that we see quite often, we add
> concrete implementations.
>
> For eg. FileLineInputOperator which is a concrete implementation of
> AbstractFileInputOperator. FSSliceReader is an example as well.
>
> In AbstractJdbcOutputOperator case there hasn't been such common
> insert/update query.
>
> Also if you look at the example I provided, it is very simple to provide a
> concrete implementation.
>
> If you would like to change JdbcPOJOOutputOperator to work for
> "UPDATE/MERGE" then please go ahead.
>
> Chandni
>
> On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > I see. So, just to understand more, do we plan to keep Malhar restricted
> to
> > the base functionality (as in abstract classes)? And put the
> configuration
> > aspect / concrete implementations in apps that use these operators?
> >
> > Thanks.
> > Bhupesh
> >
> > On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <ch...@datatorrent.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Here is an example of doing Upsert with JDBC:
> > >
> > >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> > >
> > > Thanks,
> > > Chandni
> > >
> > > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <
> chandni@datatorrent.com
> > >
> > > wrote:
> > >
> > > > The operators are under Malhar/lib/db/jdbc.
> > > >
> > > > Here is one of them:
> > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > > >
> > > > They work with any kind PreparedStatement - insert or update
> > > >
> > > > Chandni
> > > >
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > > bhupesh@datatorrent.com>
> > > > wrote:
> > > >
> > > >> Hi Chandni,
> > > >>
> > > >> I don't see an update query being handled in the operator. Could you
> > > >> please
> > > >> point me to the appropriate class?
> > > >> Or did you mean that handling a update query is just a matter of
> > > extending
> > > >> the class and providing a concrete implementation?
> > > >>
> > > >> Thanks.
> > > >> -Bhupesh
> > > >>
> > > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > > chandni@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >> > Hi Bhupesh,
> > > >> >
> > > >> > The current abstract JDBC Output Operators  in library are generic
> > and
> > > >> have
> > > >> > already been used in multiple POCs and applications. In fact this
> > > >> operator
> > > >> > has matured through customer use cases. It is not just an insert
> > > >> operator.
> > > >> > We have used it to perform update and inserts.
> > > >> >
> > > >> > That said, I don't think it is a good idea to introduce input
> format
> > > in
> > > >> >  these abstract implementations. It is written to handle any type
> of
> > > >> query,
> > > >> > be it a procedure call (that was an actual customer use case).
> > > >> >
> > > >> > Chandni
> > > >> >
> > > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > > >> bhupesh@datatorrent.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi All,
> > > >> > >
> > > >> > > We are planning to proceed with the following approach for JDBC
> > > >> *update*
> > > >> > > operator:
> > > >> > >
> > > >> > >    - *Update Query Configuration*
> > > >> > >
> > > >> > >
> > > >> > >    - Example Update Query: *update tableName set a = ?** where b
> > = ?
> > > >> and
> > > >> > c
> > > >> > >    > ?;*
> > > >> > >    - Example JSON input array for parameter instantiations:
> *[{a,
> > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c,
> expression,
> > > >> > DATE}]*
> > > >> > >
> > > >> > > We are also planning to change the JDBC Output Operator in
> Malhar
> > > >> Library
> > > >> > > which currently does just insert. We plan to make the input
> format
> > > >> > > consistent for both insert and update and hence the change to
> the
> > > >> current
> > > >> > > way of configuration using JSON. Following would be the config
> for
> > > >> > inserts:
> > > >> > >
> > > >> > >    - *Insert Query Configuration*
> > > >> > >
> > > >> > >
> > > >> > >    - Example Insert Query: *insert into tableName values (?, ?,
> > .. ,
> > > >> ?);*
> > > >> > >    - Example JSON input array for parameter instantiations:
> *[{a,
> > > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c,
> > > >> expression,
> > > >> > > DATE}]*
> > > >> > >
> > > >> > > Please let us know your thoughts.
> > > >> > >
> > > >> > > Thanks.
> > > >> > >
> > > >> > > -Bhupesh
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > > >> bhupesh@datatorrent.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi All,
> > > >> > > >
> > > >> > > > Would it be a good idea to introduce the update functionality
> to
> > > the
> > > >> > JDBC
> > > >> > > > output operator in Apache Apex Malhar library.
> > > >> > > >
> > > >> > > > The following are possible approaches:
> > > >> > > >
> > > >> > > >    1. Accept a update query from the user with place holders
> for
> > > >> > values.
> > > >> > > >    Example: *update tableName set a = ?, b = ? where c = ? and
> > d >
> > > >> ?*.
> > > >> > > >    Here "?" will be provided by the user as java expressions
> > which
> > > >> will
> > > >> > > be
> > > >> > > >    evaluated from the incoming tuple.
> > > >> > > >    2. Another option is to accept in some configuration format
> > > >> (json /
> > > >> > > >    xml) the following and formulate the query in the operator.
> > > This
> > > >> can
> > > >> > > become
> > > >> > > >    arbitrarily complex.
> > > >> > > >    1. update clause columns
> > > >> > > >       2. update clause expressions
> > > >> > > >       3. where clause columns
> > > >> > > >       4. where clause expressions
> > > >> > > >
> > > >> > > > I am thinking about going ahead with 1. Please let me know if
> > any
> > > >> other
> > > >> > > > option is possible and whether such a functionality already
> > exists
> > > >> in
> > > >> > > some
> > > >> > > > other class.
> > > >> > > >
> > > >> > > > Thanks.
> > > >> > > >
> > > >> > > > -Bhupesh
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Chandni Singh <ch...@datatorrent.com>.
No I don't think we are restricting Malhar to just abstract classes.
Whenever they are couple of use cases that we see quite often, we add
concrete implementations.

For eg. FileLineInputOperator which is a concrete implementation of
AbstractFileInputOperator. FSSliceReader is an example as well.

In AbstractJdbcOutputOperator case there hasn't been such common
insert/update query.

Also if you look at the example I provided, it is very simple to provide a
concrete implementation.

If you would like to change JdbcPOJOOutputOperator to work for
"UPDATE/MERGE" then please go ahead.

Chandni

On Sun, Dec 13, 2015 at 8:47 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> I see. So, just to understand more, do we plan to keep Malhar restricted to
> the base functionality (as in abstract classes)? And put the configuration
> aspect / concrete implementations in apps that use these operators?
>
> Thanks.
> Bhupesh
>
> On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <ch...@datatorrent.com>
> wrote:
>
> > Hi,
> >
> > Here is an example of doing Upsert with JDBC:
> >
> >
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
> >
> > Thanks,
> > Chandni
> >
> > On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <chandni@datatorrent.com
> >
> > wrote:
> >
> > > The operators are under Malhar/lib/db/jdbc.
> > >
> > > Here is one of them:
> > >
> > >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> > >
> > > They work with any kind PreparedStatement - insert or update
> > >
> > > Chandni
> > >
> > >
> > >
> > > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> > bhupesh@datatorrent.com>
> > > wrote:
> > >
> > >> Hi Chandni,
> > >>
> > >> I don't see an update query being handled in the operator. Could you
> > >> please
> > >> point me to the appropriate class?
> > >> Or did you mean that handling a update query is just a matter of
> > extending
> > >> the class and providing a concrete implementation?
> > >>
> > >> Thanks.
> > >> -Bhupesh
> > >>
> > >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> > chandni@datatorrent.com>
> > >> wrote:
> > >>
> > >> > Hi Bhupesh,
> > >> >
> > >> > The current abstract JDBC Output Operators  in library are generic
> and
> > >> have
> > >> > already been used in multiple POCs and applications. In fact this
> > >> operator
> > >> > has matured through customer use cases. It is not just an insert
> > >> operator.
> > >> > We have used it to perform update and inserts.
> > >> >
> > >> > That said, I don't think it is a good idea to introduce input format
> > in
> > >> >  these abstract implementations. It is written to handle any type of
> > >> query,
> > >> > be it a procedure call (that was an actual customer use case).
> > >> >
> > >> > Chandni
> > >> >
> > >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> > >> bhupesh@datatorrent.com>
> > >> > wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > We are planning to proceed with the following approach for JDBC
> > >> *update*
> > >> > > operator:
> > >> > >
> > >> > >    - *Update Query Configuration*
> > >> > >
> > >> > >
> > >> > >    - Example Update Query: *update tableName set a = ?** where b
> = ?
> > >> and
> > >> > c
> > >> > >    > ?;*
> > >> > >    - Example JSON input array for parameter instantiations: *[{a,
> > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c, expression,
> > >> > DATE}]*
> > >> > >
> > >> > > We are also planning to change the JDBC Output Operator in Malhar
> > >> Library
> > >> > > which currently does just insert. We plan to make the input format
> > >> > > consistent for both insert and update and hence the change to the
> > >> current
> > >> > > way of configuration using JSON. Following would be the config for
> > >> > inserts:
> > >> > >
> > >> > >    - *Insert Query Configuration*
> > >> > >
> > >> > >
> > >> > >    - Example Insert Query: *insert into tableName values (?, ?,
> .. ,
> > >> ?);*
> > >> > >    - Example JSON input array for parameter instantiations: *[{a,
> > >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c,
> > >> expression,
> > >> > > DATE}]*
> > >> > >
> > >> > > Please let us know your thoughts.
> > >> > >
> > >> > > Thanks.
> > >> > >
> > >> > > -Bhupesh
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> > >> bhupesh@datatorrent.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi All,
> > >> > > >
> > >> > > > Would it be a good idea to introduce the update functionality to
> > the
> > >> > JDBC
> > >> > > > output operator in Apache Apex Malhar library.
> > >> > > >
> > >> > > > The following are possible approaches:
> > >> > > >
> > >> > > >    1. Accept a update query from the user with place holders for
> > >> > values.
> > >> > > >    Example: *update tableName set a = ?, b = ? where c = ? and
> d >
> > >> ?*.
> > >> > > >    Here "?" will be provided by the user as java expressions
> which
> > >> will
> > >> > > be
> > >> > > >    evaluated from the incoming tuple.
> > >> > > >    2. Another option is to accept in some configuration format
> > >> (json /
> > >> > > >    xml) the following and formulate the query in the operator.
> > This
> > >> can
> > >> > > become
> > >> > > >    arbitrarily complex.
> > >> > > >    1. update clause columns
> > >> > > >       2. update clause expressions
> > >> > > >       3. where clause columns
> > >> > > >       4. where clause expressions
> > >> > > >
> > >> > > > I am thinking about going ahead with 1. Please let me know if
> any
> > >> other
> > >> > > > option is possible and whether such a functionality already
> exists
> > >> in
> > >> > > some
> > >> > > > other class.
> > >> > > >
> > >> > > > Thanks.
> > >> > > >
> > >> > > > -Bhupesh
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
I see. So, just to understand more, do we plan to keep Malhar restricted to
the base functionality (as in abstract classes)? And put the configuration
aspect / concrete implementations in apps that use these operators?

Thanks.
Bhupesh

On Sat, Dec 12, 2015 at 5:43 AM, Chandni Singh <ch...@datatorrent.com>
wrote:

> Hi,
>
> Here is an example of doing Upsert with JDBC:
>
> https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java
>
> Thanks,
> Chandni
>
> On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <ch...@datatorrent.com>
> wrote:
>
> > The operators are under Malhar/lib/db/jdbc.
> >
> > Here is one of them:
> >
> >
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
> >
> > They work with any kind PreparedStatement - insert or update
> >
> > Chandni
> >
> >
> >
> > On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <
> bhupesh@datatorrent.com>
> > wrote:
> >
> >> Hi Chandni,
> >>
> >> I don't see an update query being handled in the operator. Could you
> >> please
> >> point me to the appropriate class?
> >> Or did you mean that handling a update query is just a matter of
> extending
> >> the class and providing a concrete implementation?
> >>
> >> Thanks.
> >> -Bhupesh
> >>
> >> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <
> chandni@datatorrent.com>
> >> wrote:
> >>
> >> > Hi Bhupesh,
> >> >
> >> > The current abstract JDBC Output Operators  in library are generic and
> >> have
> >> > already been used in multiple POCs and applications. In fact this
> >> operator
> >> > has matured through customer use cases. It is not just an insert
> >> operator.
> >> > We have used it to perform update and inserts.
> >> >
> >> > That said, I don't think it is a good idea to introduce input format
> in
> >> >  these abstract implementations. It is written to handle any type of
> >> query,
> >> > be it a procedure call (that was an actual customer use case).
> >> >
> >> > Chandni
> >> >
> >> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
> >> bhupesh@datatorrent.com>
> >> > wrote:
> >> >
> >> > > Hi All,
> >> > >
> >> > > We are planning to proceed with the following approach for JDBC
> >> *update*
> >> > > operator:
> >> > >
> >> > >    - *Update Query Configuration*
> >> > >
> >> > >
> >> > >    - Example Update Query: *update tableName set a = ?** where b = ?
> >> and
> >> > c
> >> > >    > ?;*
> >> > >    - Example JSON input array for parameter instantiations: *[{a,
> >> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c, expression,
> >> > DATE}]*
> >> > >
> >> > > We are also planning to change the JDBC Output Operator in Malhar
> >> Library
> >> > > which currently does just insert. We plan to make the input format
> >> > > consistent for both insert and update and hence the change to the
> >> current
> >> > > way of configuration using JSON. Following would be the config for
> >> > inserts:
> >> > >
> >> > >    - *Insert Query Configuration*
> >> > >
> >> > >
> >> > >    - Example Insert Query: *insert into tableName values (?, ?, .. ,
> >> ?);*
> >> > >    - Example JSON input array for parameter instantiations: *[{a,
> >> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c,
> >> expression,
> >> > > DATE}]*
> >> > >
> >> > > Please let us know your thoughts.
> >> > >
> >> > > Thanks.
> >> > >
> >> > > -Bhupesh
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> >> bhupesh@datatorrent.com>
> >> > > wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > Would it be a good idea to introduce the update functionality to
> the
> >> > JDBC
> >> > > > output operator in Apache Apex Malhar library.
> >> > > >
> >> > > > The following are possible approaches:
> >> > > >
> >> > > >    1. Accept a update query from the user with place holders for
> >> > values.
> >> > > >    Example: *update tableName set a = ?, b = ? where c = ? and d >
> >> ?*.
> >> > > >    Here "?" will be provided by the user as java expressions which
> >> will
> >> > > be
> >> > > >    evaluated from the incoming tuple.
> >> > > >    2. Another option is to accept in some configuration format
> >> (json /
> >> > > >    xml) the following and formulate the query in the operator.
> This
> >> can
> >> > > become
> >> > > >    arbitrarily complex.
> >> > > >    1. update clause columns
> >> > > >       2. update clause expressions
> >> > > >       3. where clause columns
> >> > > >       4. where clause expressions
> >> > > >
> >> > > > I am thinking about going ahead with 1. Please let me know if any
> >> other
> >> > > > option is possible and whether such a functionality already exists
> >> in
> >> > > some
> >> > > > other class.
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > > -Bhupesh
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Chandni Singh <ch...@datatorrent.com>.
Hi,

Here is an example of doing Upsert with JDBC:
https://github.com/chandnisingh/Malhar/blob/examples/apps/jdbc/src/main/java/com/datatorrent/jdbc/JdbcWriter.java

Thanks,
Chandni

On Fri, Dec 11, 2015 at 11:19 AM, Chandni Singh <ch...@datatorrent.com>
wrote:

> The operators are under Malhar/lib/db/jdbc.
>
> Here is one of them:
>
> https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java
>
> They work with any kind PreparedStatement - insert or update
>
> Chandni
>
>
>
> On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
>> Hi Chandni,
>>
>> I don't see an update query being handled in the operator. Could you
>> please
>> point me to the appropriate class?
>> Or did you mean that handling a update query is just a matter of extending
>> the class and providing a concrete implementation?
>>
>> Thanks.
>> -Bhupesh
>>
>> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <ch...@datatorrent.com>
>> wrote:
>>
>> > Hi Bhupesh,
>> >
>> > The current abstract JDBC Output Operators  in library are generic and
>> have
>> > already been used in multiple POCs and applications. In fact this
>> operator
>> > has matured through customer use cases. It is not just an insert
>> operator.
>> > We have used it to perform update and inserts.
>> >
>> > That said, I don't think it is a good idea to introduce input format in
>> >  these abstract implementations. It is written to handle any type of
>> query,
>> > be it a procedure call (that was an actual customer use case).
>> >
>> > Chandni
>> >
>> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <
>> bhupesh@datatorrent.com>
>> > wrote:
>> >
>> > > Hi All,
>> > >
>> > > We are planning to proceed with the following approach for JDBC
>> *update*
>> > > operator:
>> > >
>> > >    - *Update Query Configuration*
>> > >
>> > >
>> > >    - Example Update Query: *update tableName set a = ?** where b = ?
>> and
>> > c
>> > >    > ?;*
>> > >    - Example JSON input array for parameter instantiations: *[{a,
>> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c, expression,
>> > DATE}]*
>> > >
>> > > We are also planning to change the JDBC Output Operator in Malhar
>> Library
>> > > which currently does just insert. We plan to make the input format
>> > > consistent for both insert and update and hence the change to the
>> current
>> > > way of configuration using JSON. Following would be the config for
>> > inserts:
>> > >
>> > >    - *Insert Query Configuration*
>> > >
>> > >
>> > >    - Example Insert Query: *insert into tableName values (?, ?, .. ,
>> ?);*
>> > >    - Example JSON input array for parameter instantiations: *[{a,
>> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c,
>> expression,
>> > > DATE}]*
>> > >
>> > > Please let us know your thoughts.
>> > >
>> > > Thanks.
>> > >
>> > > -Bhupesh
>> > >
>> > >
>> > >
>> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
>> bhupesh@datatorrent.com>
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > Would it be a good idea to introduce the update functionality to the
>> > JDBC
>> > > > output operator in Apache Apex Malhar library.
>> > > >
>> > > > The following are possible approaches:
>> > > >
>> > > >    1. Accept a update query from the user with place holders for
>> > values.
>> > > >    Example: *update tableName set a = ?, b = ? where c = ? and d >
>> ?*.
>> > > >    Here "?" will be provided by the user as java expressions which
>> will
>> > > be
>> > > >    evaluated from the incoming tuple.
>> > > >    2. Another option is to accept in some configuration format
>> (json /
>> > > >    xml) the following and formulate the query in the operator. This
>> can
>> > > become
>> > > >    arbitrarily complex.
>> > > >    1. update clause columns
>> > > >       2. update clause expressions
>> > > >       3. where clause columns
>> > > >       4. where clause expressions
>> > > >
>> > > > I am thinking about going ahead with 1. Please let me know if any
>> other
>> > > > option is possible and whether such a functionality already exists
>> in
>> > > some
>> > > > other class.
>> > > >
>> > > > Thanks.
>> > > >
>> > > > -Bhupesh
>> > > >
>> > >
>> >
>>
>
>

Re: Changes to JdbcOutputOperator

Posted by Chandni Singh <ch...@datatorrent.com>.
The operators are under Malhar/lib/db/jdbc.

Here is one of them:
https://github.com/apache/incubator-apex-malhar/blob/devel-3/library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcTransactionableOutputOperator.java

They work with any kind PreparedStatement - insert or update

Chandni



On Fri, Dec 11, 2015 at 10:59 AM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi Chandni,
>
> I don't see an update query being handled in the operator. Could you please
> point me to the appropriate class?
> Or did you mean that handling a update query is just a matter of extending
> the class and providing a concrete implementation?
>
> Thanks.
> -Bhupesh
>
> On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <ch...@datatorrent.com>
> wrote:
>
> > Hi Bhupesh,
> >
> > The current abstract JDBC Output Operators  in library are generic and
> have
> > already been used in multiple POCs and applications. In fact this
> operator
> > has matured through customer use cases. It is not just an insert
> operator.
> > We have used it to perform update and inserts.
> >
> > That said, I don't think it is a good idea to introduce input format in
> >  these abstract implementations. It is written to handle any type of
> query,
> > be it a procedure call (that was an actual customer use case).
> >
> > Chandni
> >
> > On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <bhupesh@datatorrent.com
> >
> > wrote:
> >
> > > Hi All,
> > >
> > > We are planning to proceed with the following approach for JDBC
> *update*
> > > operator:
> > >
> > >    - *Update Query Configuration*
> > >
> > >
> > >    - Example Update Query: *update tableName set a = ?** where b = ?
> and
> > c
> > >    > ?;*
> > >    - Example JSON input array for parameter instantiations: *[{a,
> > >    expression, INTEGER}, {b, expression, VARCHAR}, {c, expression,
> > DATE}]*
> > >
> > > We are also planning to change the JDBC Output Operator in Malhar
> Library
> > > which currently does just insert. We plan to make the input format
> > > consistent for both insert and update and hence the change to the
> current
> > > way of configuration using JSON. Following would be the config for
> > inserts:
> > >
> > >    - *Insert Query Configuration*
> > >
> > >
> > >    - Example Insert Query: *insert into tableName values (?, ?, .. ,
> ?);*
> > >    - Example JSON input array for parameter instantiations: *[{a,
> > >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c, expression,
> > > DATE}]*
> > >
> > > Please let us know your thoughts.
> > >
> > > Thanks.
> > >
> > > -Bhupesh
> > >
> > >
> > >
> > > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <
> bhupesh@datatorrent.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Would it be a good idea to introduce the update functionality to the
> > JDBC
> > > > output operator in Apache Apex Malhar library.
> > > >
> > > > The following are possible approaches:
> > > >
> > > >    1. Accept a update query from the user with place holders for
> > values.
> > > >    Example: *update tableName set a = ?, b = ? where c = ? and d >
> ?*.
> > > >    Here "?" will be provided by the user as java expressions which
> will
> > > be
> > > >    evaluated from the incoming tuple.
> > > >    2. Another option is to accept in some configuration format (json
> /
> > > >    xml) the following and formulate the query in the operator. This
> can
> > > become
> > > >    arbitrarily complex.
> > > >    1. update clause columns
> > > >       2. update clause expressions
> > > >       3. where clause columns
> > > >       4. where clause expressions
> > > >
> > > > I am thinking about going ahead with 1. Please let me know if any
> other
> > > > option is possible and whether such a functionality already exists in
> > > some
> > > > other class.
> > > >
> > > > Thanks.
> > > >
> > > > -Bhupesh
> > > >
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi Chandni,

I don't see an update query being handled in the operator. Could you please
point me to the appropriate class?
Or did you mean that handling a update query is just a matter of extending
the class and providing a concrete implementation?

Thanks.
-Bhupesh

On Fri, Dec 11, 2015 at 10:42 PM, Chandni Singh <ch...@datatorrent.com>
wrote:

> Hi Bhupesh,
>
> The current abstract JDBC Output Operators  in library are generic and have
> already been used in multiple POCs and applications. In fact this operator
> has matured through customer use cases. It is not just an insert operator.
> We have used it to perform update and inserts.
>
> That said, I don't think it is a good idea to introduce input format in
>  these abstract implementations. It is written to handle any type of query,
> be it a procedure call (that was an actual customer use case).
>
> Chandni
>
> On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > Hi All,
> >
> > We are planning to proceed with the following approach for JDBC *update*
> > operator:
> >
> >    - *Update Query Configuration*
> >
> >
> >    - Example Update Query: *update tableName set a = ?** where b = ? and
> c
> >    > ?;*
> >    - Example JSON input array for parameter instantiations: *[{a,
> >    expression, INTEGER}, {b, expression, VARCHAR}, {c, expression,
> DATE}]*
> >
> > We are also planning to change the JDBC Output Operator in Malhar Library
> > which currently does just insert. We plan to make the input format
> > consistent for both insert and update and hence the change to the current
> > way of configuration using JSON. Following would be the config for
> inserts:
> >
> >    - *Insert Query Configuration*
> >
> >
> >    - Example Insert Query: *insert into tableName values (?, ?, .. , ?);*
> >    - Example JSON input array for parameter instantiations: *[{a,
> >    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c, expression,
> > DATE}]*
> >
> > Please let us know your thoughts.
> >
> > Thanks.
> >
> > -Bhupesh
> >
> >
> >
> > On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <bh...@datatorrent.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > Would it be a good idea to introduce the update functionality to the
> JDBC
> > > output operator in Apache Apex Malhar library.
> > >
> > > The following are possible approaches:
> > >
> > >    1. Accept a update query from the user with place holders for
> values.
> > >    Example: *update tableName set a = ?, b = ? where c = ? and d > ?*.
> > >    Here "?" will be provided by the user as java expressions which will
> > be
> > >    evaluated from the incoming tuple.
> > >    2. Another option is to accept in some configuration format (json /
> > >    xml) the following and formulate the query in the operator. This can
> > become
> > >    arbitrarily complex.
> > >    1. update clause columns
> > >       2. update clause expressions
> > >       3. where clause columns
> > >       4. where clause expressions
> > >
> > > I am thinking about going ahead with 1. Please let me know if any other
> > > option is possible and whether such a functionality already exists in
> > some
> > > other class.
> > >
> > > Thanks.
> > >
> > > -Bhupesh
> > >
> >
>

Re: Changes to JdbcOutputOperator

Posted by Chandni Singh <ch...@datatorrent.com>.
Hi Bhupesh,

The current abstract JDBC Output Operators  in library are generic and have
already been used in multiple POCs and applications. In fact this operator
has matured through customer use cases. It is not just an insert operator.
We have used it to perform update and inserts.

That said, I don't think it is a good idea to introduce input format in
 these abstract implementations. It is written to handle any type of query,
be it a procedure call (that was an actual customer use case).

Chandni

On Fri, Dec 11, 2015 at 2:50 AM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi All,
>
> We are planning to proceed with the following approach for JDBC *update*
> operator:
>
>    - *Update Query Configuration*
>
>
>    - Example Update Query: *update tableName set a = ?** where b = ? and c
>    > ?;*
>    - Example JSON input array for parameter instantiations: *[{a,
>    expression, INTEGER}, {b, expression, VARCHAR}, {c, expression, DATE}]*
>
> We are also planning to change the JDBC Output Operator in Malhar Library
> which currently does just insert. We plan to make the input format
> consistent for both insert and update and hence the change to the current
> way of configuration using JSON. Following would be the config for inserts:
>
>    - *Insert Query Configuration*
>
>
>    - Example Insert Query: *insert into tableName values (?, ?, .. , ?);*
>    - Example JSON input array for parameter instantiations: *[{a,
>    expression, INTEGER}, {b, expression, VARCHAR}, .. , {c, expression,
> DATE}]*
>
> Please let us know your thoughts.
>
> Thanks.
>
> -Bhupesh
>
>
>
> On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <bh...@datatorrent.com>
> wrote:
>
> > Hi All,
> >
> > Would it be a good idea to introduce the update functionality to the JDBC
> > output operator in Apache Apex Malhar library.
> >
> > The following are possible approaches:
> >
> >    1. Accept a update query from the user with place holders for values.
> >    Example: *update tableName set a = ?, b = ? where c = ? and d > ?*.
> >    Here "?" will be provided by the user as java expressions which will
> be
> >    evaluated from the incoming tuple.
> >    2. Another option is to accept in some configuration format (json /
> >    xml) the following and formulate the query in the operator. This can
> become
> >    arbitrarily complex.
> >    1. update clause columns
> >       2. update clause expressions
> >       3. where clause columns
> >       4. where clause expressions
> >
> > I am thinking about going ahead with 1. Please let me know if any other
> > option is possible and whether such a functionality already exists in
> some
> > other class.
> >
> > Thanks.
> >
> > -Bhupesh
> >
>

Re: Changes to JdbcOutputOperator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Hi All,

We are planning to proceed with the following approach for JDBC *update*
operator:

   - *Update Query Configuration*


   - Example Update Query: *update tableName set a = ?** where b = ? and c
   > ?;*
   - Example JSON input array for parameter instantiations: *[{a,
   expression, INTEGER}, {b, expression, VARCHAR}, {c, expression, DATE}]*

We are also planning to change the JDBC Output Operator in Malhar Library
which currently does just insert. We plan to make the input format
consistent for both insert and update and hence the change to the current
way of configuration using JSON. Following would be the config for inserts:

   - *Insert Query Configuration*


   - Example Insert Query: *insert into tableName values (?, ?, .. , ?);*
   - Example JSON input array for parameter instantiations: *[{a,
   expression, INTEGER}, {b, expression, VARCHAR}, .. , {c, expression, DATE}]*

Please let us know your thoughts.

Thanks.

-Bhupesh



On Wed, Dec 9, 2015 at 6:38 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Hi All,
>
> Would it be a good idea to introduce the update functionality to the JDBC
> output operator in Apache Apex Malhar library.
>
> The following are possible approaches:
>
>    1. Accept a update query from the user with place holders for values.
>    Example: *update tableName set a = ?, b = ? where c = ? and d > ?*.
>    Here "?" will be provided by the user as java expressions which will be
>    evaluated from the incoming tuple.
>    2. Another option is to accept in some configuration format (json /
>    xml) the following and formulate the query in the operator. This can become
>    arbitrarily complex.
>    1. update clause columns
>       2. update clause expressions
>       3. where clause columns
>       4. where clause expressions
>
> I am thinking about going ahead with 1. Please let me know if any other
> option is possible and whether such a functionality already exists in some
> other class.
>
> Thanks.
>
> -Bhupesh
>