Posted to dev@sqoop.apache.org by Jilani Shaik <ji...@gmail.com> on 2017/02/06 04:41:25 UTC

Re: sqoop hbase incremental import - Sqoop 1.4.6

Hi Liz,

Let's say we inserted data into a table with an initial import; it looks like
this in the HBase shell:

 1    column=pay:amount, timestamp=1485129654025, value=4.99
 1    column=pay:customer_id, timestamp=1485129654025, value=1
 1    column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
 1    column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
 1    column=pay:rental_id, timestamp=1485129654025, value=573
 1    column=pay:staff_id, timestamp=1485129654025, value=1
 10   column=pay:amount, timestamp=1485129504390, value=5.99
 10   column=pay:customer_id, timestamp=1485129504390, value=1
 10   column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
 10   column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
 10   column=pay:rental_id, timestamp=1485129504390, value=4526
 10   column=pay:staff_id, timestamp=1485129504390, value=2


Now assume that in the source, rental_id becomes NULL for row key "1", and then
we run an incremental import into HBase. With the current import, the final
HBase data after the incremental import looks like this:

 1    column=pay:amount, timestamp=1485129654025, value=4.99
 1    column=pay:customer_id, timestamp=1485129654025, value=1
 1    column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
 1    column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
 1    column=pay:rental_id, timestamp=1485129654025, value=573
 1    column=pay:staff_id, timestamp=1485129654025, value=1
 10   column=pay:amount, timestamp=1485129504390, value=5.99
 10   column=pay:customer_id, timestamp=1485129504390, value=1
 10   column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
 10   column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
 10   column=pay:rental_id, timestamp=1485129504390, value=126
 10   column=pay:staff_id, timestamp=1485129504390, value=2



Because the source column "rental_id" became NULL for row key "1", the final
HBase table should not contain "rental_id" for row key "1". I expect the
following data for these row keys:


 1    column=pay:amount, timestamp=1485129654025, value=4.99
 1    column=pay:customer_id, timestamp=1485129654025, value=1
 1    column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
 1    column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
 1    column=pay:staff_id, timestamp=1485129654025, value=1
 10   column=pay:amount, timestamp=1485129504390, value=5.99
 10   column=pay:customer_id, timestamp=1485129504390, value=1
 10   column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
 10   column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
 10   column=pay:rental_id, timestamp=1485129504390, value=126
 10   column=pay:staff_id, timestamp=1485129504390, value=2


Please let me know if anything further is required.


Thanks,
Jilani
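
To make the requested behaviour concrete, here is a small, dependency-free Java sketch. It does not use the real HBase client or any Sqoop code: a stored row is modelled as a map from "family:qualifier" to value, and the class and method names (NullAwareUpsert, putOnly, putAndDelete) are hypothetical. putOnly mirrors what the current import appears to do (a NULL source value is skipped, so the old cell survives), while putAndDelete also removes the stored cell when the source value is NULL, which gives the expected result above. In a real HBase client this would presumably mean sending a Delete for the NULL columns alongside the Put for the non-NULL ones; that mapping is an assumption, not a description of Sqoop's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, dependency-free model of the NULL-handling question above.
 * A stored HBase row is modelled as a map "family:qualifier" -> value;
 * no real HBase or Sqoop API is used.
 */
public class NullAwareUpsert {

    /** Observed behaviour: non-NULL source values are written, NULLs are skipped. */
    static void putOnly(Map<String, String> row, Map<String, String> source) {
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue() != null) {
                row.put(e.getKey(), e.getValue());
            }
        }
    }

    /**
     * Requested behaviour: non-NULL values are written, and a NULL value removes
     * the stored cell. Returns the columns that were actually removed.
     */
    static List<String> putAndDelete(Map<String, String> row, Map<String, String> source) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue() != null) {
                row.put(e.getKey(), e.getValue());
            } else if (row.remove(e.getKey()) != null) {
                removed.add(e.getKey());
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // Subset of row key "1" after the initial import.
        Map<String, String> stored = new TreeMap<>();
        stored.put("pay:amount", "4.99");
        stored.put("pay:rental_id", "573");
        stored.put("pay:staff_id", "1");

        // The same row in the source after rental_id was set to NULL.
        Map<String, String> source = new LinkedHashMap<>();
        source.put("pay:amount", "4.99");
        source.put("pay:rental_id", null);
        source.put("pay:staff_id", "1");

        Map<String, String> observed = new TreeMap<>(stored);
        putOnly(observed, source);

        Map<String, String> requested = new TreeMap<>(stored);
        List<String> removed = putAndDelete(requested, source);

        System.out.println("put only:       " + observed);
        System.out.println("put and delete: " + requested + " removed=" + removed);
    }
}
```

Running main shows pay:rental_id=573 still present after putOnly and gone after putAndDelete, which is the difference between the second and third HBase listings above.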

On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <liz.szilagyi@cloudera.com> wrote:

> Hi Jilani,
> I'm not sure I completely understand what you are trying to do. Could you
> give us some examples with e.g. 4 columns and 2 rows of example data
> showing the changes that happen compared to the changes you'd like to see?
> Thanks,
> Liz
>
> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <ji...@gmail.com> wrote:
>
> >
> > Please help me resolve this issue. Going through the source code, it
> > seems the required behavior is missing, but I am not sure whether it was
> > left out deliberately for some reason.
> >
> > Please suggest how to handle this scenario.
> >
> > Thanks,
> > Jilani
> >
> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <ji...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> We have a scenario where we are importing data into HBase with a Sqoop
> >> incremental import.
> >>
> >> Let's say we imported a table, and later some columns in the source table
> >> were updated to NULL for some rows. When we then run the incremental
> >> import, in HBase terms these columns should no longer exist in the HBase
> >> table. But right now these columns remain with their previous values.
> >>
> >> Is there any fix for this issue?
> >>
> >>
> >> Thanks,
> >> Jilani
> >>
> >
> >
>

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.

Hi Bogi,

I have updated the patch and submitted a review request for it, so the
changes are now attached as a patch in both places (the JIRA and the review
board).

The changes are applied on the trunk version.

Thanks,
Jilani

On Fri, Mar 10, 2017 at 2:10 PM, Boglarka Egyed <bo...@cloudera.com> wrote:

> Hi Jilani,
>
> Notifications about JIRA creation/modification are sent to dev@, so you
> don't need to share the JIRA here again. Also, if you have set "Sqoop" as
> the review group on ReviewBoard, people in that group will receive a
> notification about it.
>
> I can see that you have created a JIRA
> <https://issues.apache.org/jira/browse/SQOOP-3149>, but I don't see the
> patch file attached or a link to the review board. Could you please add
> them as I described in one of my previous emails?
>
> After you have attached the patch file and filed a review request (linked
> to the JIRA so that people can find your change to review), the community
> will review it, which may take a while.
>
> Thanks,
> Bogi
>
> On Fri, Mar 10, 2017 at 7:08 PM, Jilani Shaik <ji...@gmail.com> wrote:
>
>> Hi Bogi,
>>
>> Do I need to share the JIRA and review request here again, or, once I
>> create the JIRA and review request and submit the details, will the
>> workflow (validation, comments/suggestions, etc.) follow from there?
>>
>> Thanks,
>> Jilani
>>
>> On Fri, Mar 10, 2017 at 7:57 AM, Jilani Shaik <ji...@gmail.com> wrote:
>>
>>> Yes, you are correct: I am running from Eclipse. I will run from the
>>> command line.
>>>
>>> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>>
>>> Hi Jilani,
>>>
>>> Please try to execute "ant compile" and then "ant test" from the command
>>> line; this will run the unit tests. If I understood you correctly, you
>>> tried to run the tests from Eclipse, which won't work.
>>>
>>> Thanks,
>>> Bogi
>>>
>>> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com> wrote:
>>>
>>>> Hi Bogi,
>>>>
>>>> Thanks for providing direction.
>>>>
>>>> As you suggested, I explored further, resolved the issue, and was able
>>>> to test the fix based on the trunk code changes in my Hadoop cluster.
>>>>
>>>> Root cause of my issue:
>>>> The 1.4.6 code base uses the same Avro version that is in my Hadoop
>>>> cluster, so there is no problem with that jar, whereas the trunk code
>>>> base uses the avro-1.8.1 jar, which is not available in my Hadoop
>>>> cluster.
>>>>
>>>> Can you suggest how to run unit tests for this component?
>>>>
>>>> I tried the "test" target, and tests are failing as shown below:
>>>>
>>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>>>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>>>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>>>     [junit] Running com.cloudera.sqoop.TestExport
>>>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>>>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>>>>
>>>> Do I need to make any changes? I am running the "test" target from
>>>> Eclipse.
>>>>
>>>> Thanks,
>>>> Jilani
>>>>
>>>>
>>>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:
>>>>
>>>> > Hi Bogi,
>>>> >
>>>> > - Built the jar from trunk with the "jar-all" target
>>>> >
>>>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>>>> >
>>>> > - Moved the existing jar to another location
>>>> >
>>>> > - Then executed the command below to run the import:
>>>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose \
>>>> >   --username test --password test123$ --table payment -m 2 \
>>>> >   --hbase-table /database/demoapp/hbase/payment --column-family pay \
>>>> >   --hbase-row-key payment_id --incremental lastmodified \
>>>> >   --merge-key payment_id --check-column last_update \
>>>> >   --last-value '2017-01-08 08:02:05.0'
>>>> >
>>>> >
>>>> > I followed the same steps for both the jar built from trunk and the
>>>> > jar built from the 1.4.6 branch.
>>>> >
>>>> > Where do you suspect the multiple Avro jars are: at build time when
>>>> > preparing the jar, or at run time when running the command with the
>>>> > jar?
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Jilani
>>>> >
>>>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>>> >
>>>> >> Hi Jilani,
>>>> >>
>>>> >> I suspect that you have an old version of Avro, or even multiple Avro
>>>> >> versions, on your classpath, and thus Sqoop uses an older one.
>>>> >>
>>>> >> Could you please provide a list of the exact commands you have
>>>> >> performed so that I can reproduce the issue?
>>>> >>
>>>> >> Thanks,
>>>> >> Bogi
>>>> >>
>>>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:
>>>> >>
>>>> >>> Can someone give me pointers on what I am missing with the trunk vs.
>>>> >>> 1.4.6 builds? The trunk build gives the error mentioned in the mail
>>>> >>> chain below.
>>>> >>>
>>>> >>> I used the same ant target to prepare the jar for both branches, but
>>>> >>> the 1.4.6 jar still differs from the 1.4.7 jar created from trunk.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Jilani
>>>> >>>
>>>> >>>
>>>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com> wrote:
>>>> >>>
>>>> >>> > Hi Bogi,
>>>> >>> >
>>>> >>> > I get the exception below when I build the jar from trunk and run a
>>>> >>> > Sqoop import from a MySQL database table, whereas similar changes
>>>> >>> > work with branch 1.4.6.
>>>> >>> >
>>>> >>> >
>>>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>>>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>>>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>>>> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>>>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>>>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>>>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>>>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>>>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>>>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>>>> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>>>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>> >>> >         ... 11 more
>>>> >>> >
>>>> >>> > Please let me know what is missing and how to resolve this
>>>> >>> > exception. Let me know if you need further details.
>>>> >>> >
>>>> >>> > Thanks,
>>>> >>> > Jilani
>>>> >>> >
>>>> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bogi@cloudera.com> wrote:
>>>> >>> >
>>>> >>> >> Hi Jilani,
>>>> >>> >>
>>>> >>> >> This is an example: SQOOP-3053
>>>> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>>>> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your
>>>> >>> >> changes on trunk, as it will be used to cut the future release, so
>>>> >>> >> your patch definitely needs to apply on it.
>>>> >>> >>
>>>> >>> >> Thanks,
>>>> >>> >> Bogi
>>>> >>> >>
>>>> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>>> >>> >>
>>>> >>> >> > Hi Bogi,
>>>> >>> >> >
>>>> >>> >> > Can you provide me with sample JIRA tickets and review requests
>>>> >>> >> > similar to this, so I can proceed further?
>>>> >>> >> >
>>>> >>> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch
>>>> >>> >> > from the Sqoop git repository. If you suggest the right branch, I
>>>> >>> >> > will take the code from there and apply the changes before
>>>> >>> >> > submitting the review request.
>>>> >>> >> >
>>>> >>> >> > Thanks,
>>>> >>> >> > Jilani
>>>> >>> >> >
>>>> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com> wrote:
>>>> >>> >> >
>>>> >>> >> >> Hi Jilani,
>>>> >>> >> >>
>>>> >>> >> >> To get your change committed, please do the following:
>>>> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>>>> >>> >> >>   <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>>>> >>> >> >> * Create a review request at Apache's review board
>>>> >>> >> >>   <https://reviews.apache.org/r/> for project Sqoop and link it to
>>>> >>> >> >>   the JIRA ticket
>>>> >>> >> >>
>>>> >>> >> >> Please consider the guidelines below:
>>>> >>> >> >>
>>>> >>> >> >> Review board
>>>> >>> >> >> * Summary: generate your summary using the issue's JIRA key + JIRA
>>>> >>> >> >>   title
>>>> >>> >> >> * Groups: add the relevant group (Sqoop) so everyone on the
>>>> >>> >> >>   project will know about your patch
>>>> >>> >> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the
>>>> >>> >> >>   JIRA side
>>>> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>>>> >>> >> >> * As soon as the patch gets committed, it's very useful for the
>>>> >>> >> >>   community if you close the review and mark it as "Submitted" on
>>>> >>> >> >>   the review board. The button to do this is at the top right of
>>>> >>> >> >>   your own tickets, right next to the Download Diff button.
>>>> >>> >> >>
>>>> >>> >> >> Jira
>>>> >>> >> >> * Link: please add the link to the review as an external/web link
>>>> >>> >> >>   so it's easy to navigate to the review side
>>>> >>> >> >> * Status: mark it as "patch available"
>>>> >>> >> >>
>>>> >>> >> >> The Sqoop community will receive emails about your new ticket and
>>>> >>> >> >> review request and will review your change.
>>>> >>> >> >>
>>>> >>> >> >> Thanks,
>>>> >>> >> >> Bogi
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>>> >>> >> >>
>>>> >>> >> >> > Do we have any update?
>>>> >>> >> >> >
>>>> >>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this,
>>>> >>> >> >> > and tested them in a cluster; it works as expected. Is there a
>>>> >>> >> >> > way I can contribute this as a patch so that the committers can
>>>> >>> >> >> > validate it further and suggest any changes required to move
>>>> >>> >> >> > forward? Please suggest the approach.
>>>> >>> >> >> >
>>>> >>> >> >> > Thanks,
>>>> >>> >> >> > Jilani
>>>> >>> >> >> >
>>>> >>> >> >>
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >>
>>>> >>> >
>>>> >>> >
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>
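
The NoClassDefFoundError for org/apache/avro/LogicalType quoted in this thread is consistent with the root cause found later: trunk was built against avro-1.8.1, while the cluster only had the older Avro jar that 1.4.6 uses (to my knowledge, LogicalType only exists from Avro 1.8 onwards). As a hedged diagnostic sketch, the hypothetical helper below (ClasspathCheck, not part of Sqoop) reports which jar, if any, a class is loaded from, which can help spot a missing or duplicated Avro jar. It uses only standard JDK reflection (Class.forName and getProtectionDomain().getCodeSource()); running it with the same classpath Sqoop uses is an assumption about how you would apply it.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not part of Sqoop): reports where a class
 * is loaded from, so a missing or duplicated Avro jar becomes visible.
 */
public class ClasspathCheck {

    /**
     * Returns the jar or directory that className was loaded from,
     * "<bootstrap>" when the JDK reports no code source (e.g. java.lang.String),
     * or null when the class cannot be loaded at all.
     */
    static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run static initializers.
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: not on the classpath.
            // LinkageError (e.g. NoClassDefFoundError): one of its dependencies is missing.
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to two Avro classes of interest; pass other class names as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.Schema", "org.apache.avro.LogicalType"};
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

If only a pre-1.8 Avro jar is on the classpath, the expected result is that org.apache.avro.Schema resolves to that jar while org.apache.avro.LogicalType prints NOT FOUND; if Schema resolves to an unexpected jar, that points to the multiple-Avro-versions problem suspected in the thread.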

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

I have updated the patch and submitted review request for the same with
changes as a patch at both the places.

Changes applied on trunk version.

Thanks,
Jilani

On Fri, Mar 10, 2017 at 2:10 PM, Boglarka Egyed <bo...@cloudera.com> wrote:

> Hi Jilani,
>
> Information of a JIRA creation/modification are sent to dev@ so you don't
> need to share the JIRA here again. Also, if you have set "Sqoop" at the
> ReviewBoard as a group to review people in that group will receive a
> notification about it.
>
> I can see that you have created a JIRA
> <https://issues.apache.org/jira/browse/SQOOP-3149> but I don't see the
> patch file attached nor a link to the review board. Could you please add
> them as I have described it in one of my previous email?
>
> After you have attached the patch file and filed a review request (linked
> to the JIRA so that people can find your change to review) the community
> will review it which will take a while.
>
> Thanks,
> Bogi
>
> On Fri, Mar 10, 2017 at 7:08 PM, Jilani Shaik <ji...@gmail.com>
> wrote:
>
>> Hi Bogi,
>>
>> Do I need to share the JIRA and review ticket etc here again
>> or
>> Once I create a JIRA and review ticket and submit the detail, workflow
>> will follows from there like validating, comment/suggestion etc.
>>
>> Thanks,
>> Jilani
>>
>> On Fri, Mar 10, 2017 at 7:57 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>>> Yes you are correct, I am running from eclipse. Will run from command
>>> line.
>>>
>>> Sent from my iPhone
>>>
>>> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>>
>>> Hi Jilani,
>>>
>>> Please try to execute "ant compile" and then "ant test" from command
>>> line, it will run unit tests. If I understood you well, you tried run tests
>>> from Eclipse which won't work.
>>>
>>> Thanks,
>>> Bogi
>>>
>>> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com>
>>> wrote:
>>>
>>>> Hi Bogi,
>>>>
>>>> Thanks for the providing direction.
>>>>
>>>> As you suggested I explored further and resolved the issue and able to
>>>> test
>>>> the fix on trunk based code changes in my hadoop cluster.
>>>>
>>>> Root cause for my issue:
>>>> 1.4.6 code base using the same avro version which is there in my hadoop
>>>> cluster so there is no issue for that jar component, whereas trunk code
>>>> base using the avro-1.8.1 jar files, which is not available in my hadoop
>>>> cluster.
>>>>
>>>> Can you suggest how to do unit test etc for this component.
>>>>
>>>> I tried with "test" target, I am getting all as failed as below.
>>>>
>>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415
>>>> sec
>>>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time
>>>> elapsed:
>>>> 13.705 sec
>>>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>>>     [junit] Running com.cloudera.sqoop.TestExport
>>>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time
>>>> elapsed: 22.564 sec
>>>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>>>>
>>>> Do I need to do any changes? I am running from eclipse with "test"
>>>> target.
>>>>
>>>> Thanks,
>>>> Jilani
>>>>
>>>>
>>>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Bogi,
>>>> >
>>>> > - Prepared jar using trunk with "jar-all" target
>>>> >
>>>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>>>> >
>>>> > - Moved out existing jar to some other location
>>>> >
>>>> > - then execute the below command to do import
>>>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
>>>> > --username test --password test123$ --table payment -m 2 --hbase-table
>>>> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
>>>> > payment_id --incremental lastmodified --merge-key payment_id
>>>> --check-column
>>>> > last_update --last-value '2017-01-08 08:02:05.0'
>>>> >
>>>> >
>>>> > The same steps I followed for both the jar from trunk code vs 1.4.6
>>>> branch
>>>> > code.
>>>> >
>>>> > Where are you suggesting the multiple avro jars, is it at the time of
>>>> jar
>>>> > preparation or running the command using the jar.
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Jilani
>>>> >
>>>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com>
>>>> wrote:
>>>> >
>>>> >> Hi Jilani,
>>>> >>
>>>> >> I suspect that you have an old version of Avro or even multiple Avro
>>>> >> versions on your classpath and thus Sqoop uses an older one.
>>>> >>
>>>> >> Could you please provide a list of the exact commands you have
>>>> performed
>>>> >> so that I can reproduce the issue?
>>>> >>
>>>> >> Thanks,
>>>> >> Bogi
>>>> >>
>>>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
>>>> >> wrote:
>>>> >>
>>>> >>> Can some one provide me the pointers what am I missing with trunk vs
>>>> >>> 1.4.6
>>>> >>> builds, which is giving some error as mentioned in below mail chain.
>>>> >>>
>>>> >>> I did followed the same ant target to prepare jar for both
>>>> branches, but
>>>> >>> even though 1.4.6 jar is different to 1.4.7 which is created form
>>>> trunk.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Jilani
>>>> >>>
>>>> >>>
>>>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>>>> >>> wrote:
>>>> >>>
>>>> >>> > Hi Bogi,
>>>> >>> >
>>>> >>> > When I build the jar from trunk and run a Sqoop import from a MySQL
>>>> >>> > table, I get the exception below, whereas similar changes work on the
>>>> >>> > 1.4.6 branch.
>>>> >>> >
>>>> >>> >
>>>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>>>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>>>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>>>> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>>>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>>>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>>>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>>>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>>>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>>>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>>>> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>>>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>> >>> >         ... 11 more
>>>> >>> >
>>>> >>> > Please let me know what is missing and how to resolve this exception.
>>>> >>> > Let me know if you need further details.
>>>> >>> >
>>>> >>> > Thanks,
>>>> >>> > Jilani
>>>> >>> >
>>>> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bogi@cloudera.com> wrote:
>>>> >>> >
>>>> >>> >> Hi Jilani,
>>>> >>> >>
>>>> >>> >> This is an example: SQOOP-3053
>>>> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>>>> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes
>>>> >>> >> on trunk, as it will be used to cut the future release, so your patch
>>>> >>> >> definitely needs to be able to apply on it.
>>>> >>> >>
>>>> >>> >> Thanks,
>>>> >>> >> Bogi
>>>> >>> >>
>>>> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>>> >>> >>
>>>> >>> >> > Hi Bogi,
>>>> >>> >> >
>>>> >>> >> > Can you point me to sample JIRA tickets and review requests similar
>>>> >>> >> > to this, so I can proceed further?
>>>> >>> >> >
>>>> >>> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch of
>>>> >>> >> > the Sqoop git repository. If you suggest the right branch, I will take
>>>> >>> >> > the code from there and apply the changes before submitting a review
>>>> >>> >> > request.
>>>> >>> >> >
>>>> >>> >> > Thanks,
>>>> >>> >> > Jilani
>>>> >>> >> >
>>>> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com> wrote:
>>>> >>> >> >
>>>> >>> >> >> Hi Jilani,
>>>> >>> >> >>
>>>> >>> >> >> To get your change committed, please do the following:
>>>> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>>>> >>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>>>> >>> >> >> * Create a review request at Apache's review board
>>>> >>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>>>> >>> >> >> JIRA ticket
>>>> >>> >> >>
>>>> >>> >> >> Please consider the guidelines below:
>>>> >>> >> >>
>>>> >>> >> >> Review board
>>>> >>> >> >> * Summary: generate your summary using the issue's JIRA key + JIRA title
>>>> >>> >> >> * Groups: add the relevant group (Sqoop) so everyone on the project will
>>>> >>> >> >> know about your patch
>>>> >>> >> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
>>>> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>>>> >>> >> >> * As soon as the patch gets committed, it's very useful for the
>>>> >>> >> >> community if you close the review and mark it as "Submitted" on the
>>>> >>> >> >> Review board. The button to do this is at the top right of your own
>>>> >>> >> >> tickets, right next to the Download Diff button.
>>>> >>> >> >>
>>>> >>> >> >> Jira
>>>> >>> >> >> * Link: please add the link of the review as an external/web link so
>>>> >>> >> >> it's easy to navigate to the reviews side
>>>> >>> >> >> * Status: mark it as "patch available"
>>>> >>> >> >>
>>>> >>> >> >> The Sqoop community will receive emails about your new ticket and
>>>> >>> >> >> review request and will review your change.
>>>> >>> >> >>
>>>> >>> >> >> Thanks,
>>>> >>> >> >> Bogi
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>>> >>> >> >>
>>>> >>> >> >> > Do we have any update?
>>>> >>> >> >> >
>>>> >>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>>>> >>> >> >> > tested them in a cluster; it works as expected. Is there a way I can
>>>> >>> >> >> > contribute this as a patch, so that the committers can validate it
>>>> >>> >> >> > further and suggest any changes required to move forward? Please
>>>> >>> >> >> > suggest the approach.
>>>> >>> >> >> >
>>>> >>> >> >> > Thanks,
>>>> >>> >> >> > Jilani
>>>> >>> >> >> >
>>>> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>>> >>> >> >> >
>>>> >>> >> >> > > Hi Liz,
>>>> >>> >> >> > >
>>>> >>> >> >> > > Let's say we inserted data into a table with an initial import; it looks
>>>> >>> >> >> > > like this in the HBase shell:
>>>> >>> >> >> > >
>>>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>>>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>>> >>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>>>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>>>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
>>>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>>> >>> >> >> > >
>>>> >>> >> >> > >
>>>> >>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1", and
>>>> >>> >> >> > > then we run an incremental import into HBase. With the current import,
>>>> >>> >> >> > > the final HBase data after the incremental import looks like this:
>>>> >>> >> >> > >
>>>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>>> >>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>>>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>>>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>>> >>> >> >> > >
>>>> >>> >> >> > >
>>>> >>> >> >> > >
>>>> >>> >> >> > > Since the source column "rental_id" became NULL for rowkey "1", the final
>>>> >>> >> >> > > HBase table should not contain "rental_id" for rowkey "1". I expect the
>>>> >>> >> >> > > data below for these rowkeys:
>>>> >>> >> >> > >
>>>> >>> >> >> > >
>>>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>>>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>>> >>> >> >> > >
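The difference between the current and the expected result can be modeled in a few lines. The Java sketch below is a hypothetical simulation, not Sqoop or HBase code. A TreeMap stands in for one HBase row in family "pay". A null source value is handled in one of two ways. Under the current behavior it is skipped, so rental_id=573 survives. Under the expected behavior it removes the cell.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the behaviour discussed in this thread; a TreeMap stands
// in for one HBase row in column family "pay". Not Sqoop or HBase code.
class NullColumnSync {

    // Current behaviour: null source values are skipped, so the old cell survives.
    static void applyPutsOnly(Map<String, String> row, Map<String, String> source) {
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue() != null) {
                row.put(e.getKey(), e.getValue());
            }
        }
    }

    // Requested behaviour: a null source value removes the corresponding cell.
    static void applyPutsAndDeletes(Map<String, String> row, Map<String, String> source) {
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue() != null) {
                row.put(e.getKey(), e.getValue());
            } else {
                row.remove(e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        // Row "1" after the initial import (subset of the columns shown above).
        Map<String, String> initial = new TreeMap<>();
        initial.put("amount", "4.99");
        initial.put("last_update", "2017-01-23 05:29:09.0");
        initial.put("rental_id", "573");

        // Updated source row: rental_id is now NULL.
        Map<String, String> source = new LinkedHashMap<>();
        source.put("amount", "4.99");
        source.put("last_update", "2017-02-05 05:29:09.0");
        source.put("rental_id", null);

        Map<String, String> current = new TreeMap<>(initial);
        applyPutsOnly(current, source);
        Map<String, String> expected = new TreeMap<>(initial);
        applyPutsAndDeletes(expected, source);

        System.out.println("puts only:      " + current);  // still has rental_id=573
        System.out.println("puts + deletes: " + expected); // rental_id removed
    }
}
```

In a real fix, the delete branch would presumably become an HBase Delete for that column, sent alongside the Put for the non-null columns. For example, the HBase 1.x+ client has Delete#addColumns(family, qualifier), which removes all versions of a column. That mapping is an assumption; this thread does not confirm it.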
>>>> >>> >> >> > >
>>>> >>> >> >> > > Please let me know if anything further is required.
>>>> >>> >> >> > >
>>>> >>> >> >> > >
>>>> >>> >> >> > > Thanks,
>>>> >>> >> >> > > Jilani
>>>> >>> >> >> > >
>>>> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>>>> >>> >> >> > > liz.szilagyi@cloudera.com> wrote:
>>>> >>> >> >> > >
>>>> >>> >> >> > >> Hi Jilani,
>>>> >>> >> >> > >> I'm not sure I completely understand what you are trying to do. Could
>>>> >>> >> >> > >> you give us some examples with e.g. 4 columns and 2 rows of example
>>>> >>> >> >> > >> data showing the changes that happen compared to the changes you'd
>>>> >>> >> >> > >> like to see?
>>>> >>> >> >> > >> Thanks,
>>>> >>> >> >> > >> Liz
>>>> >>> >> >> > >>
>>>> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>>> >>> >> >> > >>
>>>> >>> >> >> > >> >
>>>> >>> >> >> > >> > Please help in resolving the issue. Going through the source code, it
>>>> >>> >> >> > >> > seems the required behavior is missing, but I am not sure whether it
>>>> >>> >> >> > >> > was left out for some reason.
>>>> >>> >> >> > >> >
>>>> >>> >> >> > >> > Please give me some suggestions on how to handle this scenario.
>>>> >>> >> >> > >> >
>>>> >>> >> >> > >> > Thanks,
>>>> >>> >> >> > >> > Jilani
>>>> >>> >> >> > >> >
>>>> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>>> >>> >> >> > >> >
>>>> >>> >> >> > >> >> Hi,
>>>> >>> >> >> > >> >>
>>>> >>> >> >> > >> >> We have a scenario where we are importing data into HBase with Sqoop
>>>> >>> >> >> > >> >> incremental import.
>>>> >>> >> >> > >> >>
>>>> >>> >> >> > >> >> Let's say we imported a table, and later some columns in the source
>>>> >>> >> >> > >> >> table were updated to NULL for some rows. Then, when doing the
>>>> >>> >> >> > >> >> incremental import, these columns should no longer be present in the
>>>> >>> >> >> > >> >> HBase table. But right now these columns remain as they are, with
>>>> >>> >> >> > >> >> their previous values.
>>>> >>> >> >> > >> >>
>>>> >>> >> >> > >> >> Is there any fix to overcome this issue?
>>>> >>> >> >> > >> >>
>>>> >>> >> >> > >> >>
>>>> >>> >> >> > >> >> Thanks,
>>>> >>> >> >> > >> >> Jilani
>>>> >>> >> >> > >> >>
>>>> >>> >> >> > >> >
>>>> >>> >> >> > >> >
>>>> >>> >> >> > >>
>>>> >>> >> >> > >
>>>> >>> >> >> > >
>>>> >>> >> >> >
>>>> >>> >> >>
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >>
>>>> >>> >
>>>> >>> >
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

Notifications about JIRA creation/modification are sent to dev@, so you don't
need to share the JIRA here again. Also, if you have set "Sqoop" as the review
group on ReviewBoard, people in that group will receive a notification about it.

I can see that you have created a JIRA
<https://issues.apache.org/jira/browse/SQOOP-3149>, but I see neither an attached
patch file nor a link to the review board. Could you please add them as I
described in one of my previous emails?

After you have attached the patch file and filed a review request (linked to
the JIRA so that people can find your change to review), the community will
review it, which will take a while.

Thanks,
Bogi

On Fri, Mar 10, 2017 at 7:08 PM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> Do I need to share the JIRA and review ticket etc. here again? Or, once I
> create a JIRA and review ticket and submit the details, will the workflow
> (validation, comments/suggestions, etc.) follow from there?
>
> Thanks,
> Jilani
>
> On Fri, Mar 10, 2017 at 7:57 AM, Jilani Shaik <ji...@gmail.com>
> wrote:
>
>> Yes, you are correct: I am running from Eclipse. I will run from the command
>> line.
>>
>>
>> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>
>> Hi Jilani,
>>
>> Please try executing "ant compile" and then "ant test" from the command
>> line; this runs the unit tests. If I understood you correctly, you tried to
>> run the tests from Eclipse, which won't work.
>>
>> Thanks,
>> Bogi
>>
>> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>>> Hi Bogi,
>>>
>>> Thanks for providing direction.
>>>
>>> As you suggested, I explored further, resolved the issue, and was able to
>>> test the fix based on the trunk code changes in my Hadoop cluster.
>>>
>>> Root cause of my issue:
>>> The 1.4.6 code base uses the same Avro version that is in my Hadoop
>>> cluster, so there is no issue with that jar, whereas the trunk code base
>>> uses the avro-1.8.1 jar files, which are not available in my Hadoop
>>> cluster.
>>>
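The root cause above can be confirmed without running a full import. Check whether org.apache.avro.LogicalType is loadable, and from which jar. This is the class missing from the stack trace in this thread, and to my knowledge it first shipped with Avro 1.8. Below is a minimal, hypothetical Java diagnostic using only JDK APIs. The class name ClassOrigin is illustrative.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: can a class be loaded, and from where?
class ClassOrigin {

    // Returns the jar/directory URL the class was loaded from, "bootstrap" for JDK
    // core classes (which have no code source), or null if the class is not found.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class reported missing in the stack trace in this thread.
        String name = args.length > 0 ? args[0] : "org.apache.avro.LogicalType";
        String where = locate(name);
        System.out.println(where == null ? name + " NOT FOUND" : name + " loaded from " + where);
    }
}
```

Run it with the same classpath Sqoop uses. "NOT FOUND" corresponds to the NoClassDefFoundError above. A printed jar location shows which Avro jar the class loader actually picked.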
>>> Can you suggest how to run the unit tests etc. for this component?
>>>
>>> I tried the "test" target, and I am getting failures as below.
>>>
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>>     [junit] Running com.cloudera.sqoop.TestExport
>>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>>>
>>> Do I need to make any changes? I am running the "test" target from Eclipse.
>>>
>>> Thanks,
>>> Jilani
>>>
>>>
>>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com>
>>> wrote:
>>>
>>> > Hi Bogi,
>>> >
>>> > - Built the jar from trunk with the "jar-all" target
>>> >
>>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>>> >
>>> > - Moved the existing jar to another location
>>> >
>>> > - Then executed the command below to do the import:
>>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
>>> > --username test --password test123$ --table payment -m 2 --hbase-table
>>> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
>>> > payment_id --incremental lastmodified --merge-key payment_id
>>> > --check-column last_update --last-value '2017-01-08 08:02:05.0'
>>> >
>>> >
>>> > I followed the same steps for both the jar built from trunk and the jar
>>> > built from the 1.4.6 branch.
>>> >
>>> > Where do you suspect the multiple Avro jars come in: when building the
>>> > jar, or when running the command with the jar?
>>> >
>>> >
>>> > Thanks,
>>> > Jilani
>>> >
>>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>> >
>>> >> Hi Jilani,
>>> >>
>>> >> I suspect that you have an old version of Avro or even multiple Avro
>>> >> versions on your classpath and thus Sqoop uses an older one.
>>> >>
>>> >> Could you please provide a list of the exact commands you have performed
>>> >> so that I can reproduce the issue?
>>> >>
>>> >> Thanks,
>>> >> Bogi
>>> >>
>>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Can someone give me pointers on what I am missing with the trunk vs.
>>> >>> 1.4.6 builds? The trunk build gives the error described in the mail
>>> >>> chain below.
>>> >>>
>>> >>> I used the same ant target to build the jar for both branches, but the
>>> >>> 1.4.6 jar still differs from the 1.4.7 jar built from trunk.
>>> >>>
>>> >>> Thanks,
>>> >>> Jilani
>>> >>>
>>> >>>
>>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> > Hi Bogi,
>>> >>> >
>>> >>> > When I build the jar from trunk and run a Sqoop import from a MySQL
>>> >>> > table, I get the exception below, whereas similar changes work on the
>>> >>> > 1.4.6 branch.
>>> >>> >
>>> >>> >
>>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>>> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>>> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> >>> >         ... 11 more
>>> >>> >
>>> >>> > Please let me know what is missing and how to resolve this exception.
>>> >>> > Let me know if you need further details.
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Jilani
>>> >>> >
>>> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>> >>> >
>>> >>> >> Hi Jilani,
>>> >>> >>
>>> >>> >> This is an example: SQOOP-3053
>>> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>>> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes
>>> >>> >> on trunk, as it will be used to cut the future release, so your patch
>>> >>> >> definitely needs to be able to apply on it.
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> Bogi
>>> >>> >>
>>> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >>> >>
>>> >>> >> > Hi Bogi,
>>> >>> >> >
>>> >>> >> > Can you point me to sample JIRA tickets and review requests similar
>>> >>> >> > to this, so I can proceed further?
>>> >>> >> >
>>> >>> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch of
>>> >>> >> > the Sqoop git repository. If you suggest the right branch, I will take
>>> >>> >> > the code from there and apply the changes before submitting a review
>>> >>> >> > request.
>>> >>> >> >
>>> >>> >> > Thanks,
>>> >>> >> > Jilani
>>> >>> >> >
>>> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com> wrote:
>>> >>> >> >
>>> >>> >> >> Hi Jilani,
>>> >>> >> >>
>>> >>> >> >> To get your change committed, please do the following:
>>> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>>> >>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>>> >>> >> >> * Create a review request at Apache's review board
>>> >>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>>> >>> >> >> JIRA ticket
>>> >>> >> >>
>>> >>> >> >> Please consider the guidelines below:
>>> >>> >> >>
>>> >>> >> >> Review board
>>> >>> >> >> * Summary: generate your summary using the issue's JIRA key + JIRA title
>>> >>> >> >> * Groups: add the relevant group (Sqoop) so everyone on the project will
>>> >>> >> >> know about your patch
>>> >>> >> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
>>> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>>> >>> >> >> * As soon as the patch gets committed, it's very useful for the
>>> >>> >> >> community if you close the review and mark it as "Submitted" on the
>>> >>> >> >> Review board. The button to do this is at the top right of your own
>>> >>> >> >> tickets, right next to the Download Diff button.
>>> >>> >> >>
>>> >>> >> >> Jira
>>> >>> >> >> * Link: please add the link of the review as an external/web link so
>>> >>> >> >> it's easy to navigate to the reviews side
>>> >>> >> >> * Status: mark it as "patch available"
>>> >>> >> >>
>>> >>> >> >> The Sqoop community will receive emails about your new ticket and
>>> >>> >> >> review request and will review your change.
>>> >>> >> >>
>>> >>> >> >> Thanks,
>>> >>> >> >> Bogi
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >>> >> >>
>>> >>> >> >> > Do we have any update?
>>> >>> >> >> >
>>> >>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>>> >>> >> >> > tested them in a cluster; it works as expected. Is there a way I can
>>> >>> >> >> > contribute this as a patch, so that the committers can validate it
>>> >>> >> >> > further and suggest any changes required to move forward? Please
>>> >>> >> >> > suggest the approach.
>>> >>> >> >> >
>>> >>> >> >> > Thanks,
>>> >>> >> >> > Jilani
>>> >>> >> >> >
>>> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >>> >> >> >
>>> >>> >> >> > > Hi Liz,
>>> >>> >> >> > >
>>> >>> >> >> > > Let's say we inserted data into a table with an initial import; it looks
>>> >>> >> >> > > like this in the HBase shell:
>>> >>> >> >> > >
>>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
>>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1", and
>>> >>> >> >> > > then we run an incremental import into HBase. With the current import,
>>> >>> >> >> > > the final HBase data after the incremental import looks like this:
>>> >>> >> >> > >
>>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > Since the source column "rental_id" became NULL for rowkey "1", the final
>>> >>> >> >> > > HBase table should not contain "rental_id" for rowkey "1". I expect the
>>> >>> >> >> > > data below for these rowkeys:
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > Please let me know if anything required further.
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > Thanks,
>>> >>> >> >> > > Jilani
>>> >>> >> >> > >
>>> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>>> >>> >> >> > > liz.szilagyi@cloudera.com> wrote:
>>> >>> >> >> > >
>>> >>> >> >> > >> Hi Jilani,
>>> >>> >> >> > >> I'm not sure I completely understand what you are trying to do.
>>> >>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
>>> >>> >> >> > >> example data showing the changes that happen compared to the
>>> >>> >> >> > >> changes you'd like to see?
>>> >>> >> >> > >> Thanks,
>>> >>> >> >> > >> Liz
>>> >>> >> >> > >>
>>> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >>> >> >> > >> wrote:
>>> >>> >> >> > >>
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > Please help in resolving the issue, I am going through source code
>>> >>> >> >> > >> > some how the required nature is missing, But not sure is it for
>>> >>> >> >> > >> > some reason we avoided this nature.
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > Provide me some suggestions how to go with this scenario.
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > Thanks,
>>> >>> >> >> > >> > Jilani
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>>> >>> >> >> > >> > wrote:
>>> >>> >> >> > >> >
>>> >>> >> >> > >> >> Hi,
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> We have a scenario where we are importing data into HBase with sqoop
>>> >>> >> >> > >> >> incremental import.
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> Lets say we imported a table and later source table got updated for
>>> >>> >> >> > >> >> some columns as null values for some rows. Then while doing
>>> >>> >> >> > >> >> incremental import as per HBase these columns should not be there in
>>> >>> >> >> > >> >> HBase table. But right now these columns will be as it is available
>>> >>> >> >> > >> >> with previous values.
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> Is there any fix to overcome this issue?
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> Thanks,
>>> >>> >> >> > >> >> Jilani
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >
>>> >>> >> >> > >> >
>>> >>> >> >> > >>
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> >
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>
>>
>
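The behaviour requested above can be sketched in a few lines. Per the thread, an incremental import into HBase currently writes only the non-NULL columns of an updated row, so a cell whose source value has become NULL (pay:rental_id for row key "1") keeps its old value. In HBase, removing an existing cell takes a Delete on that row key; a Put for the other columns leaves it untouched. The Java below is a minimal, self-contained model of that idea using plain maps instead of the HBase client, so it runs on its own. The class name NullAwareRowPlan and its methods are hypothetical; this is not Sqoop's implementation and not the change discussed for SQOOP-3149.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Sqoop's code): split one imported source row into
 * the cells an HBase Put would write and the columns an HBase Delete would
 * have to remove because their source value is now NULL.
 */
public final class NullAwareRowPlan {
    /** family:qualifier -> value for every non-NULL source column. */
    public final Map<String, String> puts = new LinkedHashMap<>();
    /** family:qualifier of every column that is NULL in the source row. */
    public final List<String> deletes = new ArrayList<>();

    public static NullAwareRowPlan plan(String family, Map<String, Object> sourceRow) {
        NullAwareRowPlan p = new NullAwareRowPlan();
        for (Map.Entry<String, Object> e : sourceRow.entrySet()) {
            String column = family + ":" + e.getKey();
            if (e.getValue() == null) {
                p.deletes.add(column); // skipping it would leave the old cell in place
            } else {
                p.puts.put(column, e.getValue().toString());
            }
        }
        return p;
    }

    /** Applies a plan to an in-memory stand-in for one stored HBase row. */
    public static void apply(Map<String, String> storedRow, NullAwareRowPlan p) {
        storedRow.putAll(p.puts);
        for (String column : p.deletes) {
            storedRow.remove(column);
        }
    }

    public static void main(String[] args) {
        // Row key "1" as stored after the initial import (values from the thread).
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("pay:amount", "4.99");
        stored.put("pay:customer_id", "1");
        stored.put("pay:last_update", "2017-01-23 05:29:09.0");
        stored.put("pay:payment_date", "2005-05-25 11:30:37.0");
        stored.put("pay:rental_id", "573");
        stored.put("pay:staff_id", "1");

        // Same row in the source after the update: rental_id is now NULL.
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("amount", "4.99");
        source.put("customer_id", 1);
        source.put("last_update", "2017-02-05 05:29:09.0");
        source.put("payment_date", "2005-05-25 11:30:37.0");
        source.put("rental_id", null);
        source.put("staff_id", 1);

        NullAwareRowPlan p = plan("pay", source);
        apply(stored, p);
        System.out.println(p.deletes);                           // [pay:rental_id]
        System.out.println(stored.containsKey("pay:rental_id")); // false
    }
}
```

Running main prints [pay:rental_id] and then false: row "1" ends up without pay:rental_id, matching the expected hbase shell output above. With the real HBase client, the deletes list would presumably map to an org.apache.hadoop.hbase.client.Delete for the same row key, issued alongside the Put.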

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

Notifications about JIRA creation and modification are sent to dev@, so you
don't need to share the JIRA here again. Also, if you have set "Sqoop" as the
review group on ReviewBoard, the people in that group will receive a
notification about it.

I can see that you have created a JIRA
<https://issues.apache.org/jira/browse/SQOOP-3149>, but I don't see a patch
file attached or a link to the review board. Could you please add them as I
described in one of my previous emails?

After you have attached the patch file and filed a review request (linked
to the JIRA so that people can find your change to review), the community
will review it, which may take a while.

Thanks,
Bogi

On Fri, Mar 10, 2017 at 7:08 PM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> Do I need to share the JIRA and review ticket etc here again
> or
> Once I create a JIRA and review ticket and submit the detail, workflow
> will follows from there like validating, comment/suggestion etc.
>
> Thanks,
> Jilani
>
> On Fri, Mar 10, 2017 at 7:57 AM, Jilani Shaik <ji...@gmail.com>
> wrote:
>
>> Yes you are correct, I am running from eclipse. Will run from command
>> line.
>>
>> Sent from my iPhone
>>
>> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>
>> Hi Jilani,
>>
>> Please try to execute "ant compile" and then "ant test" from command
>> line, it will run unit tests. If I understood you well, you tried run tests
>> from Eclipse which won't work.
>>
>> Thanks,
>> Bogi
>>
>> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>>> Hi Bogi,
>>>
>>> Thanks for the providing direction.
>>>
>>> As you suggested I explored further and resolved the issue and able to
>>> test the fix on trunk based code changes in my hadoop cluster.
>>>
>>> Root cause for my issue:
>>> 1.4.6 code base using the same avro version which is there in my hadoop
>>> cluster so there is no issue for that jar component, whereas trunk code
>>> base using the avro-1.8.1 jar files, which is not available in my hadoop
>>> cluster.
>>>
>>> Can you suggest how to do unit test etc for this component.
>>>
>>> I tried with "test" target, I am getting all as failed as below.
>>>
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>>     [junit] Running com.cloudera.sqoop.TestExport
>>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>>>
>>> Do I need to do any changes? I am running from eclipse with "test"
>>> target.
>>>
>>> Thanks,
>>> Jilani
>>>
>>>
>>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com>
>>> wrote:
>>>
>>> > Hi Bogi,
>>> >
>>> > - Prepared jar using trunk with "jar-all" target
>>> >
>>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>>> >
>>> > - Moved out existing jar to some other location
>>> >
>>> > - then execute the below command to do import
>>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
>>> > --username test --password test123$ --table payment -m 2 --hbase-table
>>> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
>>> > payment_id --incremental lastmodified --merge-key payment_id
>>> > --check-column last_update --last-value '2017-01-08 08:02:05.0'
>>> >
>>> >
>>> > The same steps I followed for both the jar from trunk code vs 1.4.6
>>> > branch code.
>>> >
>>> > Where are you suggesting the multiple avro jars, is it at the time of
>>> > jar preparation or running the command using the jar.
>>> >
>>> >
>>> > Thanks,
>>> > Jilani
>>> >
>>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com>
>>> > wrote:
>>> >
>>> >> Hi Jilani,
>>> >>
>>> >> I suspect that you have an old version of Avro or even multiple Avro
>>> >> versions on your classpath and thus Sqoop uses an older one.
>>> >>
>>> >> Could you please provide a list of the exact commands you have performed
>>> >> so that I can reproduce the issue?
>>> >>
>>> >> Thanks,
>>> >> Bogi
>>> >>
>>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Can some one provide me the pointers what am I missing with trunk vs
>>> >>> 1.4.6 builds, which is giving some error as mentioned in below mail chain.
>>> >>>
>>> >>> I did followed the same ant target to prepare jar for both branches, but
>>> >>> even though 1.4.6 jar is different to 1.4.7 which is created form trunk.
>>> >>>
>>> >>> Thanks,
>>> >>> Jilani
>>> >>>
>>> >>>
>>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> > Hi Bogi,
>>> >>> >
>>> >>> > I am getting below error, when I have prepared jar from trunk and try to
>>> >>> > do sqoop import with mysql database table and got below exception, where as
>>> >>> > similar changes are working with branch 1.4.6.
>>> >>> >
>>> >>> >
>>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>>> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>>> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> >>> >         ... 11 more
>>> >>> >
>>> >>> > Please let me know what is missing and how to resolve this exception, Let
>>> >>> > me know if you need further details.
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Jilani
>>> >>> >
>>> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
>>> >>> > wrote:
>>> >>> >
>>> >>> >> Hi Jilani,
>>> >>> >>
>>> >>> >> This is an example: SQOOP-3053
>>> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>>> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
>>> >>> >> trunk as it will be used to cut the future release so your patch
>>> >>> >> definitely needs to be be able to apply on it.
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> Bogi
>>> >>> >>
>>> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >>> >> wrote:
>>> >>> >>
>>> >>> >> > Hi Bogi,
>>> >>> >> >
>>> >>> >> > Can you provide me sample Jira tickets and Review requests similar to
>>> >>> >> > this, to proceed further.
>>> >>> >> >
>>> >>> >> > I applied the code changes from sqoop git from this branch
>>> >>> >> > "sqoop-release-1.4.6-rc0", If you suggest right branch I will take the
>>> >>> >> > code from there and apply the changes before submit review for request.
>>> >>> >> >
>>> >>> >> > Thanks,
>>> >>> >> > Jilani
>>> >>> >> >
>>> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com>
>>> >>> >> > wrote:
>>> >>> >> >
>>> >>> >> >> Hi Jilani,
>>> >>> >> >>
>>> >>> >> >> To get your change committed please do the following:
>>> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>>> >>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>>> >>> >> >> * Create a review request at Apache's review board
>>> >>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>>> >>> >> >> JIRA ticket
>>> >>> >> >>
>>> >>> >> >> Please consider the guidelines below:
>>> >>> >> >>
>>> >>> >> >> Review board
>>> >>> >> >> * Summary: generate your summary using the issue's jira key + jira title
>>> >>> >> >> * Groups: add the relevant group so everyone on the project will know
>>> >>> >> >> about your patch (Sqoop)
>>> >>> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
>>> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>>> >>> >> >> * And as soon as the patch gets committed, it's very useful for the
>>> >>> >> >> community if you close the review and mark it as "Submitted" at the
>>> >>> >> >> Review board. The button to do this is top right at your own tickets,
>>> >>> >> >> right next to the Download Diff button.
>>> >>> >> >>
>>> >>> >> >> Jira
>>> >>> >> >> * Link: please add the link of the review as an external/web link so
>>> >>> >> >> it's easy to navigate to the reviews side
>>> >>> >> >> * Status: mark it as "patch available"
>>> >>> >> >>
>>> >>> >> >> Sqoop community will receive emails about your new ticket and review
>>> >>> >> >> request and will review your change.
>>> >>> >> >>
>>> >>> >> >> Thanks,
>>> >>> >> >> Bogi
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >>> >> >> wrote:
>>> >>> >> >>
>>> >>> >> >> > Do we have any update?
>>> >>> >> >> >
>>> >>> >> >> > I did checkout of the 1.4.6 code and done code changes to achieve this
>>> >>> >> >> > and tested in cluster and it is working as expected. Is there a way I
>>> >>> >> >> > can contribute this as a patch and then the committers can validate
>>> >>> >> >> > further and suggest if any changes required to move further. Please
>>> >>> >> >> > suggest the approach.
>>> >>> >> >> >
>>> >>> >> >> > Thanks,
>>> >>> >> >> > Jilani
>>> >>> >> >> >
>>> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
>>> >>> >> >> > wrote:
>>> >>> >> >> >
>>> >>> >> >> > > Hi Liz,
>>> >>> >> >> > >
>>> >>> >> >> > > lets say we inserted data in a table with initial import, that looks
>>> >>> >> >> > > like this in hbase shell
>>> >>> >> >> > >
>>> >>> >> >> > >  1                                     column=pay:amount,
>>> >>> >> >> > > timestamp=1485129654025, value=4.99
>>> >>> >> >> > >  1                                     column=pay:customer_id,
>>> >>> >> >> > > timestamp=1485129654025, value=1
>>> >>> >> >> > >  1                                     column=pay:last_update,
>>> >>> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
>>> >>> >> >> > >  1                                     column=pay:payment_date,
>>> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >>> >> >> > >  1                                     column=pay:rental_id,
>>> >>> >> >> > > timestamp=1485129654025, value=573
>>> >>> >> >> > >  1                                     column=pay:staff_id,
>>> >>> >> >> > > timestamp=1485129654025, value=1
>>> >>> >> >> > >  10                                    column=pay:amount,
>>> >>> >> >> > > timestamp=1485129504390, value=5.99
>>> >>> >> >> > >  10                                    column=pay:customer_id,
>>> >>> >> >> > > timestamp=1485129504390, value=1
>>> >>> >> >> > >  10                                    column=pay:last_update,
>>> >>> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
>>> >>> >> >> > >  10                                    column=pay:payment_date,
>>> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >>> >> >> > >  10                                    column=pay:rental_id,
>>> >>> >> >> > > timestamp=1485129504390, value=4526
>>> >>> >> >> > >  10                                    column=pay:staff_id,
>>> >>> >> >> > > timestamp=1485129504390, value=2
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > now assume that in source rental_id becomes NULL for rowkey "1", and
>>> >>> >> >> > > then we are doing incremental import into HBase. With current import
>>> >>> >> >> > > the final HBase data after incremental import will look like this.
>>> >>> >> >> > >
>>> >>> >> >> > >  1                                     column=pay:amount,
>>> >>> >> >> > > timestamp=1485129654025, value=4.99
>>> >>> >> >> > >  1                                     column=pay:customer_id,
>>> >>> >> >> > > timestamp=1485129654025, value=1
>>> >>> >> >> > >  1                                     column=pay:last_update,
>>> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >>> >> >> > >  1                                     column=pay:payment_date,
>>> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >>> >> >> > >  1                                     column=pay:rental_id,
>>> >>> >> >> > > timestamp=1485129654025, value=573
>>> >>> >> >> > >  1                                     column=pay:staff_id,
>>> >>> >> >> > > timestamp=1485129654025, value=1
>>> >>> >> >> > >  10                                    column=pay:amount,
>>> >>> >> >> > > timestamp=1485129504390, value=5.99
>>> >>> >> >> > >  10                                    column=pay:customer_id,
>>> >>> >> >> > > timestamp=1485129504390, value=1
>>> >>> >> >> > >  10                                    column=pay:last_update,
>>> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >>> >> >> > >  10                                    column=pay:payment_date,
>>> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >>> >> >> > >  10                                    column=pay:rental_id,
>>> >>> >> >> > > timestamp=1485129504390, value=126
>>> >>> >> >> > >  10                                    column=pay:staff_id,
>>> >>> >> >> > > timestamp=1485129504390, value=2
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > As source column "rental_id" becomes NULL for rowkey "1", the final
>>> >>> >> >> > > HBase should not have the "rental_id" for this rowkey "1". I am
>>> >>> >> >> > > expecting below data for these rowkeys.
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > >  1                                     column=pay:amount,
>>> >>> >> >> > > timestamp=1485129654025, value=4.99
>>> >>> >> >> > >  1                                     column=pay:customer_id,
>>> >>> >> >> > > timestamp=1485129654025, value=1
>>> >>> >> >> > >  1                                     column=pay:last_update,
>>> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >>> >> >> > >  1                                     column=pay:payment_date,
>>> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >>> >> >> > >  1                                     column=pay:staff_id,
>>> >>> >> >> > > timestamp=1485129654025, value=1
>>> >>> >> >> > >  10                                    column=pay:amount,
>>> >>> >> >> > > timestamp=1485129504390, value=5.99
>>> >>> >> >> > >  10                                    column=pay:customer_id,
>>> >>> >> >> > > timestamp=1485129504390, value=1
>>> >>> >> >> > >  10                                    column=pay:last_update,
>>> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >>> >> >> > >  10                                    column=pay:payment_date,
>>> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >>> >> >> > >  10                                    column=pay:rental_id,
>>> >>> >> >> > > timestamp=1485129504390, value=126
>>> >>> >> >> > >  10                                    column=pay:staff_id,
>>> >>> >> >> > > timestamp=1485129504390, value=2
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > Please let me know if anything required further.
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> > > Thanks,
>>> >>> >> >> > > Jilani
>>> >>> >> >> > >
>>> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>>> >>> >> >> > > liz.szilagyi@cloudera.com> wrote:
>>> >>> >> >> > >
>>> >>> >> >> > >> Hi Jilani,
>>> >>> >> >> > >> I'm not sure I completely understand what you are trying to do.
>>> >>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
>>> >>> >> >> > >> example data showing the changes that happen compared to the
>>> >>> >> >> > >> changes you'd like to see?
>>> >>> >> >> > >> Thanks,
>>> >>> >> >> > >> Liz
>>> >>> >> >> > >>
>>> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >>> >> >> > >> wrote:
>>> >>> >> >> > >>
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > Please help in resolving the issue, I am going through source code
>>> >>> >> >> > >> > some how the required nature is missing, But not sure is it for
>>> >>> >> >> > >> > some reason we avoided this nature.
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > Provide me some suggestions how to go with this scenario.
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > Thanks,
>>> >>> >> >> > >> > Jilani
>>> >>> >> >> > >> >
>>> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>>> >>> >> >> > >> > wrote:
>>> >>> >> >> > >> >
>>> >>> >> >> > >> >> Hi,
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> We have a scenario where we are importing data into HBase with sqoop
>>> >>> >> >> > >> >> incremental import.
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> Lets say we imported a table and later source table got updated for
>>> >>> >> >> > >> >> some columns as null values for some rows. Then while doing
>>> >>> >> >> > >> >> incremental import as per HBase these columns should not be there in
>>> >>> >> >> > >> >> HBase table. But right now these columns will be as it is available
>>> >>> >> >> > >> >> with previous values.
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> Is there any fix to overcome this issue?
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >> Thanks,
>>> >>> >> >> > >> >> Jilani
>>> >>> >> >> > >> >>
>>> >>> >> >> > >> >
>>> >>> >> >> > >> >
>>> >>> >> >> > >>
>>> >>> >> >> > >
>>> >>> >> >> > >
>>> >>> >> >> >
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>> >
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>
>>
>
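The NoClassDefFoundError for org/apache/avro/LogicalType quoted above fits both Bogi's suspicion of an older Avro on the classpath and Jilani's finding that the trunk build expects avro-1.8.1, which his cluster did not have. One way to confirm which Avro jar a given classpath actually provides is to load the class named in the error and print where it came from. The sketch below does this with standard JDK reflection only (Class.forName with initialize=false, then the class's ProtectionDomain and CodeSource). The class name ClasspathCheck and its default class list are illustrative choices, and the result is only meaningful when it runs with the same classpath as the failing sqoop command, which you would need to reproduce for your own installation.

```java
import java.net.URL;
import java.security.CodeSource;

/** Illustrative classpath probe: can a class be loaded, and from which jar? */
public final class ClasspathCheck {

    /** Returns "NOT FOUND", "found (no code source)" or "found in <location>". */
    public static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK classes such as java.lang.String usually have no code source
            if (source == null || source.getLocation() == null) {
                return "found (no code source)";
            }
            URL location = source.getLocation();
            return "found in " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Default: the class from the stack trace plus Avro's core Schema class
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.LogicalType", "org.apache.avro.Schema"};
        for (String name : names) {
            System.out.println(name + ": " + locate(name));
        }
    }
}
```

If org.apache.avro.Schema is reported in an older avro jar while org.apache.avro.LogicalType is NOT FOUND, that older jar is most likely what the JVM picks up instead of avro-1.8.1, matching the diagnosis in the thread.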

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

Do I need to share the JIRA and review ticket etc. here again, or, once I
create a JIRA and review ticket and submit the details, will the workflow
(validation, comments/suggestions, etc.) follow from there?

Thanks,
Jilani

On Fri, Mar 10, 2017 at 7:57 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Yes you are correct, I am running from eclipse. Will run from command line.
>
> Sent from my iPhone
>
> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>
> Hi Jilani,
>
> Please try to execute "ant compile" and then "ant test" from command line,
> it will run unit tests. If I understood you well, you tried run tests from
> Eclipse which won't work.
>
> Thanks,
> Bogi
>
> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com>
> wrote:
>
>> Hi Bogi,
>>
>> Thanks for the providing direction.
>>
>> As you suggested I explored further and resolved the issue and able to
>> test the fix on trunk based code changes in my hadoop cluster.
>>
>> Root cause for my issue:
>> 1.4.6 code base using the same avro version which is there in my hadoop
>> cluster so there is no issue for that jar component, whereas trunk code
>> base using the avro-1.8.1 jar files, which is not available in my hadoop
>> cluster.
>>
>> Can you suggest how to do unit test etc for this component.
>>
>> I tried with "test" target, I am getting all as failed as below.
>>
>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExport
>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>>
>> Do I need to do any changes? I am running from eclipse with "test" target.
>>
>> Thanks,
>> Jilani
>>
>>
>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>> > Hi Bogi,
>> >
>> > - Prepared jar using trunk with "jar-all" target
>> >
>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>> >
>> > - Moved out existing jar to some other location
>> >
>> > - then execute the below command to do import
>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
>> > --username test --password test123$ --table payment -m 2 --hbase-table
>> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
>> > payment_id --incremental lastmodified --merge-key payment_id
>> > --check-column last_update --last-value '2017-01-08 08:02:05.0'
>> >
>> >
>> > The same steps I followed for both the jar from trunk code vs 1.4.6
>> > branch code.
>> >
>> > Where are you suggesting the multiple avro jars, is it at the time of
>> > jar preparation or running the command using the jar.
>> >
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com>
>> > wrote:
>> >
>> >> Hi Jilani,
>> >>
>> >> I suspect that you have an old version of Avro or even multiple Avro
>> >> versions on your classpath and thus Sqoop uses an older one.
>> >>
>> >> Could you please provide a list of the exact commands you have performed
>> >> so that I can reproduce the issue?
>> >>
>> >> Thanks,
>> >> Bogi
>> >>
>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
>> >> wrote:
>> >>
>> >>> Can some one provide me the pointers what am I missing with trunk vs
>> >>> 1.4.6 builds, which is giving some error as mentioned in below mail chain.
>> >>>
>> >>> I did followed the same ant target to prepare jar for both branches, but
>> >>> even though 1.4.6 jar is different to 1.4.7 which is created form trunk.
>> >>>
>> >>> Thanks,
>> >>> Jilani
>> >>>
>> >>>
>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hi Bogi,
>> >>> >
>> >>> > I am getting below error, when I have prepared jar from trunk and try to
>> >>> > do sqoop import with mysql database table and got below exception, where as
>> >>> > similar changes are working with branch 1.4.6.
>> >>> >
>> >>> >
>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >>> >         ... 11 more
>> >>> >
>> >>> > Please let me know what is missing and how to resolve this exception, Let
>> >>> > me know if you need further details.
>> >>> >
>> >>> > Thanks,
>> >>> > Jilani
>> >>> >
>> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
>> >>> > wrote:
>> >>> >
>> >>> >> Hi Jilani,
>> >>> >>
>> >>> >> This is an example: SQOOP-3053
>> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
>> >>> >> trunk as it will be used to cut the future release so your patch
>> >>> >> definitely needs to be be able to apply on it.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Bogi
>> >>> >>
>> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <jilani2423@gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >> > Hi Bogi,
>> >>> >> >
>> >>> >> > Can you provide me sample Jira tickets and Review requests similar to
>> >>> >> > this, to proceed further.
>> >>> >> >
>> >>> >> > I applied the code changes from sqoop git from this branch
>> >>> >> > "sqoop-release-1.4.6-rc0", If you suggest right branch I will take the
>> >>> >> > code from there and apply the changes before submit review for request.
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> > Jilani
>> >>> >> >
>> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com>
>> >>> >> > wrote:
>> >>> >> >
>> >>> >> >> Hi Jilani,
>> >>> >> >>
>> >>> >> >> To get your change committed please do the following:
>> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>> >>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>> >>> >> >> * Create a review request at Apache's review board
>> >>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>> >>> >> >> JIRA ticket
>> >>> >> >>
>> >>> >> >> Please consider the guidelines below:
>> >>> >> >>
>> >>> >> >> Review board
>> >>> >> >> * Summary: generate your summary using the issue's jira key + jira title
>> >>> >> >> * Groups: add the relevant group so everyone on the project will know
>> >>> >> >> about your patch (Sqoop)
>> >>> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
>> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>> >>> >> >> * And as soon as the patch gets committed, it's very useful for the
>> >>> >> >> community if you close the review and mark it as "Submitted" at the
>> >>> >> >> Review board. The button to do this is top right at your own tickets,
>> >>> >> >> right next to the Download Diff button.
>> >>> >> >>
>> >>> >> >> Jira
>> >>> >> >> * Link: please add the link of the review as an external/web link so
>> >>> >> >> it's easy to navigate to the reviews side
>> >>> >> >> * Status: mark it as "patch available"
>> >>> >> >>
>> >>> >> >> Sqoop community will receive emails about your new ticket and review
>> >>> >> >> request and will review your change.
>> >>> >> >>
>> >>> >> >> Thanks,
>> >>> >> >> Bogi
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >>
>> >>> >> >> > Do we have any update?
>> >>> >> >> >
>> >>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this,
>> >>> >> >> > and tested them in a cluster; it is working as expected. Is there a
>> >>> >> >> > way I can contribute this as a patch, so that the committers can
>> >>> >> >> > validate it further and suggest any changes required to move
>> >>> >> >> > forward? Please suggest the approach.
>> >>> >> >> >
>> >>> >> >> > Thanks,
>> >>> >> >> > Jilani
>> >>> >> >> >
>> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> >
>> >>> >> >> > > Hi Liz,
>> >>> >> >> > >
>> >>> >> >> > > Let's say we inserted data into a table with an initial import; it
>> >>> >> >> > > looks like this in the HBase shell:
>> >>> >> >> > >
>> >>> >> >> > >  1    column=pay:amount, timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1    column=pay:customer_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  1    column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>> >>> >> >> > >  1    column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1    column=pay:rental_id, timestamp=1485129654025, value=573
>> >>> >> >> > >  1    column=pay:staff_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  10   column=pay:amount, timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10   column=pay:customer_id, timestamp=1485129504390, value=1
>> >>> >> >> > >  10   column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>> >>> >> >> > >  10   column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10   column=pay:rental_id, timestamp=1485129504390, value=4526
>> >>> >> >> > >  10   column=pay:staff_id, timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
>> >>> >> >> > > and then we do an incremental import into HBase. With the current
>> >>> >> >> > > import, the final HBase data after the incremental import will look
>> >>> >> >> > > like this:
>> >>> >> >> > >
>> >>> >> >> > >  1    column=pay:amount, timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1    column=pay:customer_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  1    column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >>> >> >> > >  1    column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1    column=pay:rental_id, timestamp=1485129654025, value=573
>> >>> >> >> > >  1    column=pay:staff_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  10   column=pay:amount, timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10   column=pay:customer_id, timestamp=1485129504390, value=1
>> >>> >> >> > >  10   column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >>> >> >> > >  10   column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10   column=pay:rental_id, timestamp=1485129504390, value=126
>> >>> >> >> > >  10   column=pay:staff_id, timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > As the source column "rental_id" became NULL for rowkey "1", the final
>> >>> >> >> > > HBase table should not have "rental_id" for rowkey "1". I expect the
>> >>> >> >> > > data below for these rowkeys:
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > >  1    column=pay:amount, timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1    column=pay:customer_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  1    column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >>> >> >> > >  1    column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1    column=pay:staff_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  10   column=pay:amount, timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10   column=pay:customer_id, timestamp=1485129504390, value=1
>> >>> >> >> > >  10   column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >>> >> >> > >  10   column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10   column=pay:rental_id, timestamp=1485129504390, value=126
>> >>> >> >> > >  10   column=pay:staff_id, timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Please let me know if anything further is required.
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Thanks,
>> >>> >> >> > > Jilani
>> >>> >> >> > >
>> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <liz.szilagyi@cloudera.com> wrote:
>> >>> >> >> > >
>> >>> >> >> > >> Hi Jilani,
>> >>> >> >> > >> I'm not sure I completely understand what you are trying to do. Could
>> >>> >> >> > >> you give us some examples with e.g. 4 columns and 2 rows of example
>> >>> >> >> > >> data showing the changes that happen compared to the changes you'd
>> >>> >> >> > >> like to see?
>> >>> >> >> > >> Thanks,
>> >>> >> >> > >> Liz
>> >>> >> >> > >>
>> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> > >>
>> >>> >> >> > >> >
>> >>> >> >> > >> > Please help in resolving the issue. I am going through the source
>> >>> >> >> > >> > code, and somehow the required behavior is missing, but I am not
>> >>> >> >> > >> > sure whether it was left out for some reason.
>> >>> >> >> > >> >
>> >>> >> >> > >> > Please give me some suggestions on how to handle this scenario.
>> >>> >> >> > >> >
>> >>> >> >> > >> > Thanks,
>> >>> >> >> > >> > Jilani
>> >>> >> >> > >> >
>> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> > >> >
>> >>> >> >> > >> >> Hi,
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> We have a scenario where we are importing data into HBase with
>> >>> >> >> > >> >> Sqoop incremental import.
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Let's say we imported a table, and later some columns in the source
>> >>> >> >> > >> >> table were updated to NULL for some rows. Then, when doing the
>> >>> >> >> > >> >> incremental import, per HBase semantics these columns should no
>> >>> >> >> > >> >> longer be present in the HBase table. But right now these columns
>> >>> >> >> > >> >> remain as they are, with their previous values.
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Is there any fix to overcome this issue?
>> >>> >> >> > >> >>
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Thanks,
>> >>> >> >> > >> >> Jilani
>> >>> >> >> > >> >>
>> >>> >> >> > >> >
>> >>> >> >> > >> >
>> >>> >> >> > >>
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >
>> >>> >> >
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>
>
>
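
The behaviour described in Jilani's rowkey "1" example can be reproduced without a cluster. The following is a minimal Java sketch, not Sqoop's actual import code: it assumes one HBase row can be modelled as a plain map from qualifier to value, and it contrasts a put-only merge (NULL source values are skipped, so the old cell survives) with a NULL-aware merge (a NULL source value removes the column). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of one HBase row in the "pay" column family
 * (qualifier -> value). Illustration only, not Sqoop's implementation.
 */
public class NullAwareMerge {

    // Put-only merge: NULL source values are skipped, so an existing cell survives.
    public static Map<String, String> putOnlyMerge(Map<String, String> existing,
                                                   Map<String, String> source) {
        Map<String, String> result = new TreeMap<>(existing);
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue() != null) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // NULL-aware merge: a NULL source value removes the column from the row.
    public static Map<String, String> nullAwareMerge(Map<String, String> existing,
                                                     Map<String, String> source) {
        Map<String, String> result = new TreeMap<>(existing);
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getValue() == null) {
                result.remove(e.getKey());
            } else {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rowkey "1" after the initial import (values from the thread).
        Map<String, String> row1 = new TreeMap<>();
        row1.put("amount", "4.99");
        row1.put("customer_id", "1");
        row1.put("last_update", "2017-01-23 05:29:09.0");
        row1.put("payment_date", "2005-05-25 11:30:37.0");
        row1.put("rental_id", "573");
        row1.put("staff_id", "1");

        // Source row after the update: rental_id is now NULL.
        Map<String, String> source = new LinkedHashMap<>(row1);
        source.put("last_update", "2017-02-05 05:29:09.0");
        source.put("rental_id", null);

        System.out.println(putOnlyMerge(row1, source).get("rental_id"));
        System.out.println(nullAwareMerge(row1, source).containsKey("rental_id"));
    }
}
```

Running `main` prints `573` and then `false`: the put-only merge keeps the stale `rental_id`, while the NULL-aware merge drops it, matching the expected rowkey "1" data above. In a real fix the NULL branch would presumably become an HBase `Delete` for the affected column; how that would be wired into Sqoop is not shown here.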

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

Do I need to share the JIRA and review request links here again, or, once I
create the JIRA ticket and review request and submit the details, will the
workflow (validation, comments/suggestions, etc.) follow from there?

Thanks,
Jilani

On Fri, Mar 10, 2017 at 7:57 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Yes, you are correct, I am running from Eclipse. I will run from the command line.
>
> Sent from my iPhone
>
> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>
> Hi Jilani,
>
> Please try to execute "ant compile" and then "ant test" from the command line;
> this will run the unit tests. If I understood you correctly, you tried to run
> the tests from Eclipse, which won't work.
>
> Thanks,
> Bogi
>
> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com> wrote:
>
>> Hi Bogi,
>>
>> Thanks for providing direction.
>>
>> As you suggested, I explored further, resolved the issue, and was able to
>> test the fix with the trunk-based code changes in my Hadoop cluster.
>>
>> Root cause of my issue:
>> The 1.4.6 code base uses the same Avro version that is in my Hadoop
>> cluster, so there is no issue with that jar, whereas the trunk code base
>> uses the avro-1.8.1 jar files, which are not available in my Hadoop
>> cluster.
>>
>> Can you suggest how to run unit tests etc. for this component?
>>
>> I tried the "test" target, and I am getting failures like the ones below.
>>
>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExport
>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>>
>> Do I need to make any changes? I am running the "test" target from Eclipse.
>>
>> Thanks,
>> Jilani
>>
>>
>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:
>>
>> > Hi Bogi,
>> >
>> > - Prepared the jar from trunk with the "jar-all" target
>> >
>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>> >
>> > - Moved the existing jar to another location
>> >
>> > - Then executed the command below to do the import:
>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
>> > --username test --password test123$ --table payment -m 2 --hbase-table
>> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
>> > payment_id --incremental lastmodified --merge-key payment_id --check-column
>> > last_update --last-value '2017-01-08 08:02:05.0'
>> >
>> >
>> > I followed the same steps for both the jar built from trunk and the jar
>> > built from the 1.4.6 branch.
>> >
>> > Where do you suspect the multiple Avro jars are: at the time of jar
>> > preparation, or when running the command using the jar?
>> >
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>> >
>> >> Hi Jilani,
>> >>
>> >> I suspect that you have an old version of Avro or even multiple Avro
>> >> versions on your classpath and thus Sqoop uses an older one.
>> >>
>> >> Could you please provide a list of the exact commands you have performed
>> >> so that I can reproduce the issue?
>> >>
>> >> Thanks,
>> >> Bogi
>> >>
>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:
>> >>
>> >>> Can someone give me pointers on what I am missing with the trunk vs 1.4.6
>> >>> builds? The trunk build gives the error mentioned in the mail chain below.
>> >>>
>> >>> I followed the same ant target to prepare the jar for both branches, but
>> >>> even so, the 1.4.6 jar differs from the 1.4.7 jar created from trunk.
>> >>>
>> >>> Thanks,
>> >>> Jilani
>> >>>
>> >>>
>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com> wrote:
>> >>>
>> >>> > Hi Bogi,
>> >>> >
>> >>> > I am getting the error below when I prepare a jar from trunk and try to
>> >>> > do a Sqoop import from a MySQL database table, whereas similar changes
>> >>> > work with branch 1.4.6.
>> >>> >
>> >>> >
>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >>> >         ... 11 more
>> >>> >
>> >>> > Please let me know what is missing and how to resolve this exception. Let
>> >>> > me know if you need further details.
>> >>> >
>> >>> > Thanks,
>> >>> > Jilani
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>
>
>
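
Bogi's hypothesis (an old or duplicate Avro jar on the classpath) and Jilani's eventual finding (the cluster lacked the avro-1.8.1 jars that trunk builds against) can both be checked by asking the JVM where a class comes from. Below is a small diagnostic sketch in plain Java; the `WhichJar` name is hypothetical, not a Sqoop or Hadoop tool. It reports the jar that supplies each class, or NOT FOUND, and it only says something useful when run with the same classpath Sqoop actually uses on the cluster.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: print which jar (if any) supplies each class name.
public class WhichJar {

    public static String locate(String className) {
        try {
            // initialize = false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (JDK classes) have no code source.
            return src == null ? "bootstrap/JDK" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from a missing dependency.
            return "NOT FOUND (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the NoClassDefFoundError above, plus a core Avro class.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.LogicalType", "org.apache.avro.Schema"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

For example, one might compile it and run `java -cp "$(hadoop classpath):." WhichJar`; whether `hadoop classpath` matches what the `sqoop` launcher assembles depends on the installation, so treat that as an assumption. If `org.apache.avro.LogicalType` is NOT FOUND while `org.apache.avro.Schema` resolves to some Avro jar, that is consistent with an Avro older than the one trunk expects being picked up.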

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Yes, you are correct, I am running from Eclipse. I will run from the command line.

Sent from my iPhone

> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
> 
> Hi Jilani,
> 
> Please try to execute "ant compile" and then "ant test" from the command line; this will run the unit tests. If I understood you correctly, you tried to run the tests from Eclipse, which won't work.
> 
> Thanks,
> Bogi
> 
>> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com> wrote:
>> Hi Bogi,
>> 
>> Thanks for providing direction.
>> 
>> As you suggested, I explored further, resolved the issue, and was able to test
>> the fix with the trunk-based code changes in my Hadoop cluster.
>> 
>> Root cause of my issue:
>> The 1.4.6 code base uses the same Avro version that is in my Hadoop
>> cluster, so there is no issue with that jar, whereas the trunk code
>> base uses the avro-1.8.1 jar files, which are not available in my Hadoop
>> cluster.
>> 
>> Can you suggest how to run unit tests etc. for this component?
>> 
>> I tried the "test" target, and I am getting failures like the ones below.
>> 
>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExport
>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>> 
>> Do I need to make any changes? I am running the "test" target from Eclipse.
>> 
>> Thanks,
>> Jilani
>> 
>> 
>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:
>> 
>> > Hi Bogi,
>> >
>> > - Prepared the jar from trunk with the "jar-all" target
>> >
>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>> >
>> > - Moved the existing jar to another location
>> >
>> > - Then executed the command below to do the import:
>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
>> > --username test --password test123$ --table payment -m 2 --hbase-table
>> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
>> > payment_id --incremental lastmodified --merge-key payment_id --check-column
>> > last_update --last-value '2017-01-08 08:02:05.0'
>> >
>> >
>> > I followed the same steps for both the jar built from trunk and the jar
>> > built from the 1.4.6 branch.
>> >
>> > Where do you suspect the multiple Avro jars are: at the time of jar
>> > preparation, or when running the command using the jar?
>> >
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>> >
>> >> Hi Jilani,
>> >>
>> >> I suspect that you have an old version of Avro or even multiple Avro
>> >> versions on your classpath and thus Sqoop uses an older one.
>> >>
>> >> Could you please provide a list of the exact commands you have performed
>> >> so that I can reproduce the issue?
>> >>
>> >> Thanks,
>> >> Bogi
>> >>
>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:
>> >>
>> >>> Can someone give me pointers on what I am missing with the trunk vs 1.4.6
>> >>> builds? The trunk build gives the error mentioned in the mail chain below.
>> >>>
>> >>> I followed the same ant target to prepare the jar for both branches, but
>> >>> even so, the 1.4.6 jar differs from the 1.4.7 jar created from trunk.
>> >>>
>> >>> Thanks,
>> >>> Jilani
>> >>>
>> >>>
>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com> wrote:
>> >>>
>> >>> > Hi Bogi,
>> >>> >
>> >>> > I am getting the error below when I build the jar from trunk and run a
>> >>> > Sqoop import from a MySQL database table, whereas similar changes work
>> >>> > with the 1.4.6 branch.
>> >>> >
>> >>> >
>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >>> >         ... 11 more
>> >>> >
>> >>> > Please let me know what is missing and how to resolve this exception.
>> >>> > Let me know if you need further details.
>> >>> >
>> >>> > Thanks,
>> >>> > Jilani
>> >>> >
>> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>> >>> >
>> >>> >> Hi Jilani,
>> >>> >>
>> >>> >> This is an example: SQOOP-3053
>> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
>> >>> >> trunk, as it will be used to cut the future release, so your patch
>> >>> >> definitely needs to apply to it.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Bogi
>> >>> >>
>> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com> wrote:
>> >>> >>
>> >>> >> > Hi Bogi,
>> >>> >> >
>> >>> >> > Can you provide me with sample JIRA tickets and review requests similar
>> >>> >> > to this, so I can proceed further?
>> >>> >> >
>> >>> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch from
>> >>> >> > the Sqoop git repository. If you suggest the right branch, I will take the
>> >>> >> > code from there and apply the changes before submitting a review request.
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> > Jilani
>> >>> >> >
>> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>> >>> >> >
>> >>> >> >> Hi Jilani,
>> >>> >> >>
>> >>> >> >> To get your change committed, please do the following:
>> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>> >>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>> >>> >> >> * Create a review request at Apache's review board
>> >>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>> >>> >> >> JIRA ticket
>> >>> >> >>
>> >>> >> >> Please consider the guidelines below:
>> >>> >> >>
>> >>> >> >> Review board
>> >>> >> >> * Summary: generate your summary using the issue's JIRA key + JIRA title
>> >>> >> >> * Groups: add the relevant group (Sqoop) so everyone on the project will
>> >>> >> >> know about your patch
>> >>> >> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
>> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>> >>> >> >> * As soon as the patch gets committed, it's very useful for the
>> >>> >> >> community if you close the review and mark it as "Submitted" on the
>> >>> >> >> Review board. The button to do this is at the top right of your own
>> >>> >> >> tickets, right next to the Download Diff button.
>> >>> >> >>
>> >>> >> >> Jira
>> >>> >> >> * Link: please add the link to the review as an external/web link so
>> >>> >> >> it's easy to navigate to the review side
>> >>> >> >> * Status: mark it as "patch available"
>> >>> >> >>
>> >>> >> >> The Sqoop community will receive emails about your new ticket and
>> >>> >> >> review request and will review your change.
>> >>> >> >>
>> >>> >> >> Thanks,
>> >>> >> >> Bogi
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >>
>> >>> >> >> > Is there any update?
>> >>> >> >> >
>> >>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>> >>> >> >> > tested them in the cluster; it works as expected. Is there a way I can
>> >>> >> >> > contribute this as a patch, so that the committers can validate it
>> >>> >> >> > further and suggest any required changes? Please suggest the
>> >>> >> >> > approach.
>> >>> >> >> >
>> >>> >> >> > Thanks,
>> >>> >> >> > Jilani
>> >>> >> >> >
>> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> >
>> >>> >> >> > > Hi Liz,
>> >>> >> >> > >
>> >>> >> >> > > Let's say we inserted data into a table with an initial import; it
>> >>> >> >> > > looks like this in the HBase shell:
>> >>> >> >> > >
>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
>> >>> >> >> > > and then we do an incremental import into HBase. With the current
>> >>> >> >> > > import, the final HBase data after the incremental import will look
>> >>> >> >> > > like this:
>> >>> >> >> > >
>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Since the source column "rental_id" becomes NULL for rowkey "1", the
>> >>> >> >> > > final HBase table should not contain "rental_id" for rowkey "1". I
>> >>> >> >> > > expect the data below for these rowkeys:
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>> >>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>> >>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>> >>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Please let me know if anything further is required.
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Thanks,
>> >>> >> >> > > Jilani
>> >>> >> >> > >
>> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <liz.szilagyi@cloudera.com> wrote:
>> >>> >> >> > >
>> >>> >> >> > >> Hi Jilani,
>> >>> >> >> > >> I'm not sure I completely understand what you are trying to do. Could you
>> >>> >> >> > >> give us some examples with e.g. 4 columns and 2 rows of example data
>> >>> >> >> > >> showing the changes that happen compared to the changes you'd like to see?
>> >>> >> >> > >> Thanks,
>> >>> >> >> > >> Liz
>> >>> >> >> > >>
>> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> > >>
>> >>> >> >> > >> >
>> >>> >> >> > >> > Please help in resolving the issue. I am going through the source code,
>> >>> >> >> > >> > and somehow the required behavior is missing, but I am not sure whether
>> >>> >> >> > >> > it was left out deliberately for some reason.
>> >>> >> >> > >> >
>> >>> >> >> > >> > Please give me some suggestions on how to handle this scenario.
>> >>> >> >> > >> >
>> >>> >> >> > >> > Thanks,
>> >>> >> >> > >> > Jilani
>> >>> >> >> > >> >
>> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> > >> >
>> >>> >> >> > >> >> Hi,
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> We have a scenario where we are importing data into HBase with Sqoop
>> >>> >> >> > >> >> incremental import.
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Let's say we imported a table, and later the source table was updated so
>> >>> >> >> > >> >> that some columns became NULL for some rows. Then, after the incremental
>> >>> >> >> > >> >> import, these columns should no longer be present in the HBase table.
>> >>> >> >> > >> >> But right now these columns remain in HBase with their previous values.
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Is there any fix to overcome this issue?
>> >>> >> >> > >> >>
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Thanks,
>> >>> >> >> > >> >> Jilani
>> >>> >> >> > >> >>
>> >>> >> >> > >> >
>> >>> >> >> > >> >
>> >>> >> >> > >>
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >
>> >>> >> >
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >
> 
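The behavior requested in this thread (a source column updated to NULL should disappear from the HBase row on the next incremental import, rather than keeping its old value) can be modeled without any HBase dependency. The sketch below is a hypothetical illustration, not Sqoop's code: it treats one row as a map from column qualifier to value, splits an incoming source record into cells to write (non-null values) and cells to delete (NULL values), and applies both. In a real HBase writer the two groups would presumably become a Put and a Delete on the same row key; the class NullAwareRowMerge and its methods are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of an incremental update to one HBase row.
 * Cells are keyed by column qualifier (e.g. "rental_id") within one family.
 */
final class NullAwareRowMerge {

    /** Cells to write and cells to delete for one incoming source record. */
    static final class Mutations {
        final Map<String, String> puts = new LinkedHashMap<>();
        final List<String> deletes = new ArrayList<>();
    }

    /**
     * Splits a source record into cells to write (non-null values) and
     * cells to delete (NULL values). The row key column is skipped because
     * it is the HBase row key, not a stored cell.
     */
    static Mutations toMutations(Map<String, String> record, String rowKeyColumn) {
        Mutations m = new Mutations();
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getKey().equals(rowKeyColumn)) {
                continue;
            }
            if (e.getValue() == null) {
                m.deletes.add(e.getKey());
            } else {
                m.puts.put(e.getKey(), e.getValue());
            }
        }
        return m;
    }

    /** Applies the mutations to a copy of the stored row and returns the copy. */
    static Map<String, String> apply(Map<String, String> storedRow, Mutations m) {
        Map<String, String> result = new LinkedHashMap<>(storedRow);
        result.putAll(m.puts);
        for (String column : m.deletes) {
            result.remove(column);
        }
        return result;
    }

    public static void main(String[] args) {
        // Row "1" after the initial import (a subset of the thread's columns).
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("amount", "4.99");
        stored.put("last_update", "2017-01-23 05:29:09.0");
        stored.put("rental_id", "573");

        // Updated source record: rental_id is now NULL.
        Map<String, String> record = new LinkedHashMap<>();
        record.put("payment_id", "1");
        record.put("amount", "4.99");
        record.put("last_update", "2017-02-05 05:29:09.0");
        record.put("rental_id", null);

        System.out.println(apply(stored, toMutations(record, "payment_id")));
    }
}
```

Running `main` prints `{amount=4.99, last_update=2017-02-05 05:29:09.0}`: rental_id is gone instead of keeping 573. Keeping deletes separate from writes reflects that HBase has no notion of a NULL cell; an absent value is represented by the cell not existing.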

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Yes, you are correct: I was running from Eclipse. I will run from the command line.

Sent from my iPhone
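The NoClassDefFoundError for org/apache/avro/LogicalType reported earlier in this thread usually means the Avro jar visible at runtime is older than the one the code was compiled against, which matches the root cause given in this thread (trunk built against avro-1.8.1, while the cluster only provides the Avro version 1.4.6 uses). As a hedged diagnostic sketch, the hypothetical class ClasspathProbe below uses only standard JDK reflection to report whether a class can be loaded and which jar it came from; it assumes you launch it with the same classpath Sqoop uses.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical helper: reports whether a class is loadable and where it came from. */
final class ClasspathProbe {

    /** Returns true if the loader can find the class; the class is not initialized. */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Returns the location (usually a jar URL) the class was loaded from,
     * "bootstrap" for JDK classes without a code source, or null if not found.
     */
    static String locationOf(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null) {
                return "bootstrap";
            }
            URL url = src.getLocation();
            return url == null ? "unknown" : url.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        String[] names = args.length > 0 ? args
                : new String[] {"org.apache.avro.Schema", "org.apache.avro.LogicalType"};
        for (String name : names) {
            String where = locationOf(name, loader);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

If org.apache.avro.Schema resolves to a jar but org.apache.avro.LogicalType prints NOT FOUND, an Avro jar without LogicalType is first on the classpath, which is the situation Bogi suspected.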

> On Mar 10, 2017, at 6:48 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
> 
> Hi Jilani,
> 
> Please try running "ant compile" and then "ant test" from the command line; that will run the unit tests. If I understood you correctly, you tried to run the tests from Eclipse, which won't work.
> 
> Thanks,
> Bogi
> 
>> On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com> wrote:
>> Hi Bogi,
>> 
>> Thanks for providing direction.
>> 
>> As you suggested, I explored further, resolved the issue, and was able to
>> test the fix, based on the trunk code changes, in my Hadoop cluster.
>> 
>> Root cause of my issue:
>> The 1.4.6 code base uses the same Avro version that is in my Hadoop
>> cluster, so there is no problem with that jar, whereas the trunk code
>> base uses the avro-1.8.1 jar files, which are not available in my Hadoop
>> cluster.
>> 
>> Can you suggest how to run the unit tests etc. for this component?
>> 
>> I tried the "test" target, and the tests are failing as shown below.
>> 
>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>>     [junit] Running com.cloudera.sqoop.TestDirectImport
>>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExport
>>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>> 
>> Do I need to make any changes? I am running the "test" target from Eclipse.
>> 
>> Thanks,
>> Jilani
>> 
>> 
>> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:
>> 
>> > Hi Bogi,
>> >
>> > - Prepared jar using trunk with "jar-all" target
>> >
>> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>> >
>> > - Moved out existing jar to some other location
>> >
>> > - then execute the below command to do import
>> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
>> > --username test --password test123$ --table payment -m 2 --hbase-table
>> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
>> > payment_id --incremental lastmodified --merge-key payment_id --check-column
>> > last_update --last-value '2017-01-08 08:02:05.0'
>> >
>> >
>> > The same steps I followed for both the jar from trunk code vs 1.4.6 branch
>> > code.
>> >
>> > Where are you suggesting the multiple avro jars, is it at the time of jar
>> > preparation or running the command using the jar.
>> >
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>> >
>> >> Hi Jilani,
>> >>
>> >> I suspect that you have an old version of Avro or even multiple Avro
>> >> versions on your classpath and thus Sqoop uses an older one.
>> >>
>> >> Could you please provide a list of the exact commands you have performed
>> >> so that I can reproduce the issue?
>> >>
>> >> Thanks,
>> >> Bogi
>> >>
>> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
>> >> wrote:
>> >>
>> >>> Can some one provide me the pointers what am I missing with trunk vs
>> >>> 1.4.6
>> >>> builds, which is giving some error as mentioned in below mail chain.
>> >>>
>> >>> I did followed the same ant target to prepare jar for both branches, but
>> >>> even though 1.4.6 jar is different to 1.4.7 which is created form trunk.
>> >>>
>> >>> Thanks,
>> >>> Jilani
>> >>>
>> >>>
>> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hi Bogi,
>> >>> >
>> >>> > I am getting below error, when I have prepared jar from trunk and try
>> >>> to
>> >>> > do sqoop import with mysql database table and got below exception,
>> >>> where as
>> >>> > similar changes are working with branch 1.4.6.
>> >>> >
>> >>> >
>> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version:
>> >>> 1.4.7-SNAPSHOT
>> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the
>> >>> > command-line is insecure. Consider using -P instead.
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
>> >>> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
>> >>> > com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
>> >>> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for
>> >>> > Oracle and Hadoop can be called by Sqoop!
>> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
>> >>> > com.cloudera.sqoop.manager.DefaultManagerFactory
>> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with
>> >>> scheme:
>> >>> > jdbc:mysql:
>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>> >>> > org/apache/avro/LogicalType
>> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(
>> >>> > DefaultManagerFactory.java:67)
>> >>> >         at org.apache.sqoop.ConnFactory.g
>> >>> etManager(ConnFactory.java:184)
>> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(
>> >>> > BaseSqoopTool.java:270)
>> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>> >>> > Caused by: java.lang.ClassNotFoundException:
>> >>> org.apache.avro.LogicalType
>> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >>> >         at sun.misc.Launcher$AppClassLoad
>> >>> er.loadClass(Launcher.java:331)
>> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >>> >         ... 11 more
>> >>> >
>> >>> > Please let me know what is missing and how to resolve this exception,
>> >>> Let
>> >>> > me know if you need further details.
>> >>> >
>> >>> > Thanks,
>> >>> > Jilani
>> >>> >
>> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
>> >>> wrote:
>> >>> >
>> >>> >> Hi Jilani,
>> >>> >>
>> >>> >> This is an example: SQOOP-3053
>> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your
>> >>> changes on
>> >>> >> trunk as it will be used to cut the future release so your patch
>> >>> >> definitely
>> >>> >> needs to be be able to apply on it.
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Bogi
>> >>> >>
>> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >> > Hi Bogi,
>> >>> >> >
>> >>> >> > Can you provide me sample Jira tickets and Review requests similar
>> >>> to
>> >>> >> > this, to proceed further.
>> >>> >> >
>> >>> >> > I applied the code changes from sqoop git from this branch
>> >>> >> > "sqoop-release-1.4.6-rc0", If you suggest right branch I will take
>> >>> the
>> >>> >> code
>> >>> >> > from there and apply the changes before submit review for request.
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> > Jilani
>> >>> >> >
>> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
>> >>> >> wrote:
>> >>> >> >
>> >>> >> >> Hi Jilani,
>> >>> >> >>
>> >>> >> >> To get your change committed please do the following:
>> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>> >>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>> >>> >> >> * Create a review request at Apache's review board
>> >>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to
>> >>> the
>> >>> >> JIRA
>> >>> >> >>
>> >>> >> >> ticket
>> >>> >> >>
>> >>> >> >> Please consider the guidelines below:
>> >>> >> >>
>> >>> >> >> Review board
>> >>> >> >> * Summary: generate your summary using the issue's jira key + jira
>> >>> >> title
>> >>> >> >> * Groups: add the relevant group so everyone on the project will
>> >>> know
>> >>> >> >> about
>> >>> >> >> your patch (Sqoop)
>> >>> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the
>> >>> jira
>> >>> >> side
>> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>> >>> >> >> * And as soon as the patch gets committed, it's very useful for the
>> >>> >> >> community if you close the review and mark it as "Submitted" at the
>> >>> >> Review
>> >>> >> >> board. The button to do this is top right at your own tickets,
>> >>> right
>> >>> >> next
>> >>> >> >> to  the Download Diff button.
>> >>> >> >>
>> >>> >> >> Jira
>> >>> >> >> * Link: please add the link of the review as an external/web link
>> >>> so
>> >>> >> it's
>> >>> >> >> easy to navigate to the reviews side
>> >>> >> >> * Status: mark it as "patch available"
>> >>> >> >>
>> >>> >> >> Sqoop community will receive emails about your new ticket and
>> >>> review
>> >>> >> >> request and will review your change.
>> >>> >> >>
>> >>> >> >> Thanks,
>> >>> >> >> Bogi
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <
>> >>> jilani2423@gmail.com>
>> >>> >> >> wrote:
>> >>> >> >>
>> >>> >> >> > Do we have any update?
>> >>> >> >> >
>> >>> >> >> > I did checkout of the 1.4.6 code and done code changes to achieve
>> >>> >> this
>> >>> >> >> and
>> >>> >> >> > tested in cluster and it is working as expected. Is there a way
>> >>> I can
>> >>> >> >> > contribute this as a patch and then the committers can validate
>> >>> >> further
>> >>> >> >> and
>> >>> >> >> > suggest if any changes required to move further. Please suggest
>> >>> the
>> >>> >> >> > approach.
>> >>> >> >> >
>> >>> >> >> > Thanks,
>> >>> >> >> > Jilani
>> >>> >> >> >
>> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <
>> >>> jilani2423@gmail.com>
>> >>> >> >> > wrote:
>> >>> >> >> >
>> >>> >> >> > > Hi Liz,
>> >>> >> >> > >
>> >>> >> >> > > lets say we inserted data in a table with initial import, that
>> >>> >> looks
>> >>> >> >> like
>> >>> >> >> > > this in hbase shell
>> >>> >> >> > >
>> >>> >> >> > >  1                                     column=pay:amount,
>> >>> >> >> > > timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1                                     column=pay:customer_id,
>> >>> >> >> > > timestamp=1485129654025, value=1
>> >>> >> >> > >  1                                     column=pay:last_update,
>> >>> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
>> >>> >> >> > >  1                                     column=pay:payment_date,
>> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1                                     column=pay:rental_id,
>> >>> >> >> > > timestamp=1485129654025, value=573
>> >>> >> >> > >  1                                     column=pay:staff_id,
>> >>> >> >> > > timestamp=1485129654025, value=1
>> >>> >> >> > >  10                                    column=pay:amount,
>> >>> >> >> > > timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10                                    column=pay:customer_id,
>> >>> >> >> > > timestamp=1485129504390, value=1
>> >>> >> >> > >  10                                    column=pay:last_update,
>> >>> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
>> >>> >> >> > >  10                                    column=pay:payment_date,
>> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10                                    column=pay:rental_id,
>> >>> >> >> > > timestamp=1485129504390, value=4526
>> >>> >> >> > >  10                                    column=pay:staff_id,
>> >>> >> >> > > timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > now assume that in source rental_id becomes NULL for rowkey
>> >>> "1",
>> >>> >> and
>> >>> >> >> then
>> >>> >> >> > > we are doing incremental import into HBase. With current
>> >>> import the
>> >>> >> >> final
>> >>> >> >> > > HBase data after incremental import will look like this.
>> >>> >> >> > >
>> >>> >> >> > >  1                                     column=pay:amount,
>> >>> >> >> > > timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1                                     column=pay:customer_id,
>> >>> >> >> > > timestamp=1485129654025, value=1
>> >>> >> >> > >  1                                     column=pay:last_update,
>> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >>> >> >> > >  1                                     column=pay:payment_date,
>> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1                                     column=pay:rental_id,
>> >>> >> >> > > timestamp=1485129654025, value=573
>> >>> >> >> > >  1                                     column=pay:staff_id,
>> >>> >> >> > > timestamp=1485129654025, value=1
>> >>> >> >> > >  10                                    column=pay:amount,
>> >>> >> >> > > timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10                                    column=pay:customer_id,
>> >>> >> >> > > timestamp=1485129504390, value=1
>> >>> >> >> > >  10                                    column=pay:last_update,
>> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >>> >> >> > >  10                                    column=pay:payment_date,
>> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10                                    column=pay:rental_id,
>> >>> >> >> > > timestamp=1485129504390, value=126
>> >>> >> >> > >  10                                    column=pay:staff_id,
>> >>> >> >> > > timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > As source column "rental_id" becomes NULL for rowkey "1", the
>> >>> final
>> >>> >> >> HBase
>> >>> >> >> > > should not have the "rental_id" for this rowkey "1". I am
>> >>> expecting
>> >>> >> >> below
>> >>> >> >> > > data for these rowkeys.
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > >  1                                     column=pay:amount,
>> >>> >> >> > > timestamp=1485129654025, value=4.99
>> >>> >> >> > >  1                                     column=pay:customer_id,
>> >>> >> >> > > timestamp=1485129654025, value=1
>> >>> >> >> > >  1                                     column=pay:last_update,
>> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >>> >> >> > >  1                                     column=pay:payment_date,
>> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >>> >> >> > >  1                                     column=pay:staff_id,
>> >>> >> >> > > timestamp=1485129654025, value=1
>> >>> >> >> > >  10                                    column=pay:amount,
>> >>> >> >> > > timestamp=1485129504390, value=5.99
>> >>> >> >> > >  10                                    column=pay:customer_id,
>> >>> >> >> > > timestamp=1485129504390, value=1
>> >>> >> >> > >  10                                    column=pay:last_update,
>> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >>> >> >> > >  10                                    column=pay:payment_date,
>> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >>> >> >> > >  10                                    column=pay:rental_id,
>> >>> >> >> > > timestamp=1485129504390, value=126
>> >>> >> >> > >  10                                    column=pay:staff_id,
>> >>> >> >> > > timestamp=1485129504390, value=2
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Please let me know if anything required further.
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> > > Thanks,
>> >>> >> >> > > Jilani
>> >>> >> >> > >
>> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>> >>> >> >> > > liz.szilagyi@cloudera.com> wrote:
>> >>> >> >> > >
>> >>> >> >> > >> Hi Jilani,
>> >>> >> >> > >> I'm not sure I completely understand what you are trying to do.
>> >>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
>> >>> >> >> > >> example data showing the changes that happen compared to the
>> >>> >> >> > >> changes you'd like to see?
>> >>> >> >> > >> Thanks,
>> >>> >> >> > >> Liz
>> >>> >> >> > >>
>> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> > >>
>> >>> >> >> > >> >
>> >>> >> >> > >> > Please help in resolving this issue. I am going through the
>> >>> >> >> > >> > source code, and the required behavior seems to be missing,
>> >>> >> >> > >> > but I am not sure whether it was left out deliberately for
>> >>> >> >> > >> > some reason.
>> >>> >> >> > >> >
>> >>> >> >> > >> > Please suggest how to handle this scenario.
>> >>> >> >> > >> >
>> >>> >> >> > >> > Thanks,
>> >>> >> >> > >> > Jilani
>> >>> >> >> > >> >
>> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>> >>> >> >> > >> >
>> >>> >> >> > >> >> Hi,
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> We have a scenario where we are importing data into HBase
>> >>> >> >> > >> >> with Sqoop incremental import.
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Let's say we imported a table, and later some columns of the
>> >>> >> >> > >> >> source table were updated to NULL for some rows. When we then
>> >>> >> >> > >> >> run the incremental import, per HBase semantics these columns
>> >>> >> >> > >> >> should no longer be present in the HBase table. But right now
>> >>> >> >> > >> >> they remain, with their previous values.
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Is there any fix to overcome this issue?
>> >>> >> >> > >> >>
>> >>> >> >> > >> >>
>> >>> >> >> > >> >> Thanks,
>> >>> >> >> > >> >> Jilani
>> >>> >> >> > >> >>
>> >>> >> >> > >> >
>> >>> >> >> > >> >
>> >>> >> >> > >>
>> >>> >> >> > >
>> >>> >> >> > >
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >
>> >>> >> >
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >
> 
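The fix requested in this thread amounts to turning a NULL source value into an HBase delete for that cell, rather than skipping it (which, per the thread, leaves the old value in place). Below is a minimal Java sketch of that decision, assuming a source row is available as a map from column name to a nullable string value. The class `NullAwareMutationPlan` and its `plan` method are hypothetical illustrations, not Sqoop's API; in HBase 1.x client code the two collections would roughly correspond to `Put.addColumn(family, qualifier, value)` and `Delete.addColumns(family, qualifier)` calls for the same row key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Sqoop code): split one source row into the cells
 * to write and the cells to delete, so that a column that became NULL in the
 * source is removed from HBase instead of keeping its previous value.
 */
public class NullAwareMutationPlan {

    /** Non-NULL columns to write, in source column order. */
    final Map<String, String> puts = new LinkedHashMap<>();

    /** NULL columns whose existing HBase cells should be deleted. */
    final List<String> deletes = new ArrayList<>();

    static NullAwareMutationPlan plan(Map<String, String> row, String rowKeyColumn) {
        NullAwareMutationPlan result = new NullAwareMutationPlan();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (e.getKey().equals(rowKeyColumn)) {
                // The row key column is not stored as a separate cell,
                // matching the hbase shell output in this thread.
                continue;
            }
            if (e.getValue() == null) {
                // Would become delete.addColumns(family, qualifier).
                result.deletes.add(e.getKey());
            } else {
                // Would become put.addColumn(family, qualifier, value).
                result.puts.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed-down version of rowkey "1" after rental_id became NULL.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("payment_id", "1");
        row.put("amount", "4.99");
        row.put("rental_id", null);
        row.put("staff_id", "1");

        NullAwareMutationPlan p = plan(row, "payment_id");
        System.out.println("put " + p.puts.keySet());
        System.out.println("delete " + p.deletes);
    }
}
```

Running `main` prints `put [amount, staff_id]` and `delete [rental_id]`. Because the put and delete sets are disjoint, the delete markers cannot mask any of the freshly written cells.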

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

Please try running "ant compile" and then "ant test" from the command line;
that will run the unit tests. If I understood you correctly, you tried to run
the tests from Eclipse, which won't work.

Thanks,
Bogi

On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> Thanks for providing direction.
>
> As you suggested, I explored further, resolved the issue, and was able to
> test the fix, based on the trunk code changes, in my Hadoop cluster.
>
> Root cause of my issue:
> The 1.4.6 code base uses the same Avro version that is present in my Hadoop
> cluster, so there is no problem with that jar, whereas the trunk code base
> uses the avro-1.8.1 jar files, which are not available in my Hadoop cluster.
>
> Can you suggest how to run the unit tests, etc. for this component?
>
> I tried the "test" target, but the tests fail, as shown below.
>
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>     [junit] Running com.cloudera.sqoop.TestDirectImport
>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>     [junit] Running com.cloudera.sqoop.TestExport
>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>
> Do I need to make any changes? I am running the "test" target from Eclipse.
>
> Thanks,
> Jilani
>
>
> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:
>
> > Hi Bogi,
> >
> > - Built the jar from trunk with the "jar-all" target
> >
> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
> >
> > - Moved the existing jar to another location
> >
> > - Then executed the command below to run the import:
> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose \
> >   --username test --password test123$ --table payment -m 2 \
> >   --hbase-table /database/demoapp/hbase/payment --column-family pay \
> >   --hbase-row-key payment_id --incremental lastmodified \
> >   --merge-key payment_id --check-column last_update \
> >   --last-value '2017-01-08 08:02:05.0'
> >
> >
> > I followed the same steps for both the jar built from trunk and the jar
> > built from the 1.4.6 branch.
> >
> > Where do you suspect the multiple Avro jars are: at the time the jar is
> > built, or when running the command with the jar?
> >
> >
> > Thanks,
> > Jilani
> >
> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
> >
> >> Hi Jilani,
> >>
> >> I suspect that you have an old version of Avro or even multiple Avro
> >> versions on your classpath and thus Sqoop uses an older one.
> >>
> >> Could you please provide a list of the exact commands you have performed
> >> so that I can reproduce the issue?
> >>
> >> Thanks,
> >> Bogi
> >>
> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:
> >>
> >>> Can someone give me pointers on what I am missing with the trunk vs. 1.4.6
> >>> builds? The trunk build gives the error described in the mail chain below.
> >>>
> >>> I used the same ant target to build the jar for both branches, but the
> >>> 1.4.6 jar still differs from the 1.4.7 jar built from trunk.
> >>>
> >>> Thanks,
> >>> Jilani
> >>>
> >>>
> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com> wrote:
> >>>
> >>> > Hi Bogi,
> >>> >
> >>> > I am getting the error below when I build the jar from trunk and try to
> >>> > run a Sqoop import of a MySQL database table, whereas similar changes
> >>> > work with branch 1.4.6.
> >>> >
> >>> >
> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
> >>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
> >>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> >>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>> >         ... 11 more
> >>> >
> >>> > Please let me know what is missing and how to resolve this exception.
> >>> > Let me know if you need further details.
> >>> >
> >>> > Thanks,
> >>> > Jilani
> >>> >
> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
> >>> >
> >>> >> Hi Jilani,
> >>> >>
> >>> >> This is an example: SQOOP-3053
> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes
> >>> >> on trunk, as it will be used to cut the future release, so your patch
> >>> >> definitely needs to apply to it.
> >>> >>
> >>> >> Thanks,
> >>> >> Bogi
> >>> >>
> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com> wrote:
> >>> >>
> >>> >> > Hi Bogi,
> >>> >> >
> >>> >> > Can you point me to sample JIRA tickets and review requests similar
> >>> >> > to this, so that I can proceed further?
> >>> >> >
> >>> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch
> >>> >> > from the Sqoop git repository. If you suggest the right branch, I will
> >>> >> > take the code from there and apply the changes before submitting the
> >>> >> > review request.
> >>> >> >
> >>> >> > Thanks,
> >>> >> > Jilani
> >>> >> >
> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com> wrote:
> >>> >> >
> >>> >> >> Hi Jilani,
> >>> >> >>
> >>> >> >> To get your change committed, please do the following:
> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
> >>> >> >>   <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
> >>> >> >> * Create a review request at Apache's review board
> >>> >> >>   <https://reviews.apache.org/r/> for project Sqoop and link it to the
> >>> >> >>   JIRA ticket
> >>> >> >>
> >>> >> >> Please consider the guidelines below:
> >>> >> >>
> >>> >> >> Review board
> >>> >> >> * Summary: generate your summary using the issue's JIRA key + JIRA
> >>> >> >>   title
> >>> >> >> * Groups: add the relevant group (Sqoop) so everyone on the project
> >>> >> >>   will know about your patch
> >>> >> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA
> >>> >> >>   side
> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
> >>> >> >> * As soon as the patch gets committed, it's very useful for the
> >>> >> >>   community if you close the review and mark it as "Submitted" on the
> >>> >> >>   review board. The button to do this is at the top right of your own
> >>> >> >>   tickets, right next to the Download Diff button.
> >>> >> >>
> >>> >> >> Jira
> >>> >> >> * Link: please add the link to the review as an external/web link so
> >>> >> >>   it's easy to navigate to the review side
> >>> >> >> * Status: mark it as "patch available"
> >>> >> >>
> >>> >> >> The Sqoop community will receive emails about your new ticket and
> >>> >> >> review request and will review your change.
> >>> >> >>
> >>> >> >> Thanks,
> >>> >> >> Bogi
> >>> >> >>
> >>> >> >>
> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
> >>> >> >>
> >>> >> >> > Do we have any update?
> >>> >> >> >
> >>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this,
> >>> >> >> > and tested them in the cluster; it works as expected. Is there a way
> >>> >> >> > I can contribute this as a patch, so that the committers can
> >>> >> >> > validate it further and suggest any changes required? Please suggest
> >>> >> >> > the approach.
> >>> >> >> >
> >>> >> >> > Thanks,
> >>> >> >> > Jilani
> >>> >> >> >
> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
> >>> >> >> >
> >>> >> >> > > Hi Liz,
> >>> >> >> > >
> >>> >> >> > > Let's say we inserted data into a table with an initial import; it
> >>> >> >> > > looks like this in the HBase shell:
> >>> >> >> > >
> >>> >> >> > >  1                                     column=pay:amount,
> >>> >> >> > > timestamp=1485129654025, value=4.99
> >>> >> >> > >  1                                     column=pay:customer_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  1                                     column=pay:last_update,
> >>> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >>> >> >> > >  1                                     column=pay:payment_date,
> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >>> >> >> > >  1                                     column=pay:rental_id,
> >>> >> >> > > timestamp=1485129654025, value=573
> >>> >> >> > >  1                                     column=pay:staff_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  10                                    column=pay:amount,
> >>> >> >> > > timestamp=1485129504390, value=5.99
> >>> >> >> > >  10                                    column=pay:customer_id,
> >>> >> >> > > timestamp=1485129504390, value=1
> >>> >> >> > >  10                                    column=pay:last_update,
> >>> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >>> >> >> > >  10                                    column=pay:payment_date,
> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >>> >> >> > >  10                                    column=pay:rental_id,
> >>> >> >> > > timestamp=1485129504390, value=4526
> >>> >> >> > >  10                                    column=pay:staff_id,
> >>> >> >> > > timestamp=1485129504390, value=2
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey
> >>> >> >> > > "1", and then we run an incremental import into HBase. With the
> >>> >> >> > > current import, the final HBase data after the incremental import
> >>> >> >> > > looks like this:
> >>> >> >> > >
> >>> >> >> > >  1                                     column=pay:amount,
> >>> >> >> > > timestamp=1485129654025, value=4.99
> >>> >> >> > >  1                                     column=pay:customer_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  1                                     column=pay:last_update,
> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >>> >> >> > >  1                                     column=pay:payment_date,
> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >>> >> >> > >  1                                     column=pay:rental_id,
> >>> >> >> > > timestamp=1485129654025, value=573
> >>> >> >> > >  1                                     column=pay:staff_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  10                                    column=pay:amount,
> >>> >> >> > > timestamp=1485129504390, value=5.99
> >>> >> >> > >  10                                    column=pay:customer_id,
> >>> >> >> > > timestamp=1485129504390, value=1
> >>> >> >> > >  10                                    column=pay:last_update,
> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >>> >> >> > >  10                                    column=pay:payment_date,
> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >>> >> >> > >  10                                    column=pay:rental_id,
> >>> >> >> > > timestamp=1485129504390, value=126
> >>> >> >> > >  10                                    column=pay:staff_id,
> >>> >> >> > > timestamp=1485129504390, value=2
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > As the source column "rental_id" becomes NULL for rowkey "1", the
> >>> >> >> > > final HBase table should not have the "rental_id" column for rowkey
> >>> >> >> > > "1". I expect the following data for these rowkeys:
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > >  1                                     column=pay:amount,
> >>> >> >> > > timestamp=1485129654025, value=4.99
> >>> >> >> > >  1                                     column=pay:customer_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  1                                     column=pay:last_update,
> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >>> >> >> > >  1                                     column=pay:payment_date,
> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >>> >> >> > >  1                                     column=pay:staff_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  10                                    column=pay:amount,
> >>> >> >> > > timestamp=1485129504390, value=5.99
> >>> >> >> > >  10                                    column=pay:customer_id,
> >>> >> >> > > timestamp=1485129504390, value=1
> >>> >> >> > >  10                                    column=pay:last_update,
> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >>> >> >> > >  10                                    column=pay:payment_date,
> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >>> >> >> > >  10                                    column=pay:rental_id,
> >>> >> >> > > timestamp=1485129504390, value=126
> >>> >> >> > >  10                                    column=pay:staff_id,
> >>> >> >> > > timestamp=1485129504390, value=2
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Please let me know if anything further is required.
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Thanks,
> >>> >> >> > > Jilani
> >>> >> >> > >
> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> >>> >> >> > > liz.szilagyi@cloudera.com> wrote:
> >>> >> >> > >
> >>> >> >> > >> Hi Jilani,
> >>> >> >> > >> I'm not sure I completely understand what you are trying to do.
> >>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
> >>> >> >> > >> example data showing the changes that happen compared to the
> >>> >> >> > >> changes you'd like to see?
> >>> >> >> > >> Thanks,
> >>> >> >> > >> Liz
> >>> >> >> > >>
> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
> >>> >> >> > >>
> >>> >> >> > >> >
> >>> >> >> > >> > Please help in resolving this issue. I am going through the
> >>> >> >> > >> > source code, and the required behavior seems to be missing,
> >>> >> >> > >> > but I am not sure whether it was left out deliberately for
> >>> >> >> > >> > some reason.
> >>> >> >> > >> >
> >>> >> >> > >> > Please suggest how to handle this scenario.
> >>> >> >> > >> >
> >>> >> >> > >> > Thanks,
> >>> >> >> > >> > Jilani
> >>> >> >> > >> >
> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
> >>> >> >> > >> >
> >>> >> >> > >> >> Hi,
> >>> >> >> > >> >>
> >>> >> >> > >> >> We have a scenario where we are importing data into HBase
> >>> >> >> > >> >> with Sqoop incremental import.
> >>> >> >> > >> >>
> >>> >> >> > >> >> Let's say we imported a table, and later some columns of the
> >>> >> >> > >> >> source table were updated to NULL for some rows. When we then
> >>> >> >> > >> >> run the incremental import, per HBase semantics these columns
> >>> >> >> > >> >> should no longer be present in the HBase table. But right now
> >>> >> >> > >> >> they remain, with their previous values.
> >>> >> >> > >> >>
> >>> >> >> > >> >> Is there any fix to overcome this issue?
> >>> >> >> > >> >>
> >>> >> >> > >> >>
> >>> >> >> > >> >> Thanks,
> >>> >> >> > >> >> Jilani
> >>> >> >> > >> >>
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >>
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> >
> >>> >> >>
> >>> >> >
> >>> >> >
> >>> >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>
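Bogi's suspicion of a missing or duplicated Avro jar on the classpath (and Jilani's finding that trunk needs avro-1.8.1, which the cluster lacked) can be checked directly. The class below is a small, hypothetical diagnostic, not part of Sqoop: it uses only standard JDK calls to report which classpath entry a class such as `org.apache.avro.LogicalType` is loaded from, and how many copies of its class file are visible. Run it with the same classpath the `sqoop` launcher uses; building that classpath, for example from the output of `hadoop classpath` plus Sqoop's own lib directory, is an assumption about your installation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

/**
 * Diagnostic sketch: report where a class is loaded from on the current
 * classpath, and how many copies of it are visible, e.g. to check for a
 * missing or duplicated Avro jar.
 */
public class WhichJar {

    /** Location the class loads from, "bootstrap/JDK", or a failure description. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap/JDK";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        } catch (LinkageError e) {
            // e.g. the class exists but one of its dependencies is missing.
            return "LINKAGE ERROR: " + e;
        }
    }

    /** Every classpath entry containing the class file; more than one means duplicates. */
    static List<URL> copiesOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(WhichJar.class.getClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.Schema", "org.apache.avro.LogicalType"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
            for (URL url : copiesOf(name)) {
                System.out.println("    copy: " + url);
            }
        }
    }
}
```

A `NOT FOUND` result for `org.apache.avro.LogicalType` would be consistent with the NoClassDefFoundError above, and more than one `copy:` line for `org.apache.avro.Schema` would point to multiple Avro versions competing on the classpath, in which case the first entry in classpath order wins.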

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

Please try to execute "ant compile" and then "ant test" from command line,
it will run unit tests. If I understood you well, you tried run tests from
Eclipse which won't work.

Thanks,
Bogi

On Fri, Mar 10, 2017 at 6:10 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> Thanks for the providing direction.
>
> As you suggested I explored further and resolved the issue and able to test
> the fix on trunk based code changes in my hadoop cluster.
>
> Root cause for my issue:
> 1.4.6 code base using the same avro version which is there in my hadoop
> cluster so there is no issue for that jar component, whereas trunk code
> base using the avro-1.8.1 jar files, which is not available in my hadoop
> cluster.
>
> Can you suggest how to do unit test etc for this component.
>
> I tried with "test" target, I am getting all as failed as below.
>
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
>     [junit] Running com.cloudera.sqoop.TestDirectImport
>     [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 13.705 sec
>     [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
>     [junit] Running com.cloudera.sqoop.TestExport
>     [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time
> elapsed: 22.564 sec
>     [junit] Test com.cloudera.sqoop.TestExport FAILED
>     [junit] Running com.cloudera.sqoop.TestExportUpdate
>
> Do I need to do any changes? I am running from eclipse with "test" target.
>
> Thanks,
> Jilani
>
>
> On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:
>
> > Hi Bogi,
> >
> > - Prepared jar using trunk with "jar-all" target
> >
> > - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
> >
> > - Moved out existing jar to some other location
> >
> > - then execute the below command to do import
> > sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
> > --username test --password test123$ --table payment -m 2 --hbase-table
> > /database/demoapp/hbase/payment --column-family pay --hbase-row-key
> > payment_id --incremental lastmodified --merge-key payment_id
> --check-column
> > last_update --last-value '2017-01-08 08:02:05.0'
> >
> >
> > The same steps I followed for both the jar from trunk code vs 1.4.6
> branch
> > code.
> >
> > Where are you suggesting the multiple avro jars, is it at the time of jar
> > preparation or running the command using the jar.
> >
> >
> > Thanks,
> > Jilani
> >
> > On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com>
> wrote:
> >
> >> Hi Jilani,
> >>
> >> I suspect that you have an old version of Avro or even multiple Avro
> >> versions on your classpath and thus Sqoop uses an older one.
> >>
> >> Could you please provide a list of the exact commands you have performed
> >> so that I can reproduce the issue?
> >>
> >> Thanks,
> >> Bogi
> >>
> >> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
> >> wrote:
> >>
> >>> Can some one provide me the pointers what am I missing with trunk vs
> >>> 1.4.6
> >>> builds, which is giving some error as mentioned in below mail chain.
> >>>
> >>> I did followed the same ant target to prepare jar for both branches,
> but
> >>> even though 1.4.6 jar is different to 1.4.7 which is created form
> trunk.
> >>>
> >>> Thanks,
> >>> Jilani
> >>>
> >>>
> >>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi Bogi,
> >>> >
> >>> > I am getting below error, when I have prepared jar from trunk and try
> >>> to
> >>> > do sqoop import with mysql database table and got below exception,
> >>> where as
> >>> > similar changes are working with branch 1.4.6.
> >>> >
> >>> >
> >>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version:
> >>> 1.4.7-SNAPSHOT
> >>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
> >>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on
> the
> >>> > command-line is insecure. Consider using -P instead.
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
> >>> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
> >>> > com.cloudera.sqoop.manager.DefaultManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
> >>> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector
> for
> >>> > Oracle and Hadoop can be called by Sqoop!
> >>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
> >>> > com.cloudera.sqoop.manager.DefaultManagerFactory
> >>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with
> >>> scheme:
> >>> > jdbc:mysql:
> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
> >>> > org/apache/avro/LogicalType
> >>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(
> >>> > DefaultManagerFactory.java:67)
> >>> >         at org.apache.sqoop.ConnFactory.g
> >>> etManager(ConnFactory.java:184)
> >>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(
> >>> > BaseSqoopTool.java:270)
> >>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
> >>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
> >>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
> >>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
> >>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
> >>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> >>> > Caused by: java.lang.ClassNotFoundException:
> >>> org.apache.avro.LogicalType
> >>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:
> 381)
> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>> >         at sun.misc.Launcher$AppClassLoad
> >>> er.loadClass(Launcher.java:331)
> >>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>> >         ... 11 more
> >>> >
> >>> > Please let me know what is missing and how to resolve this exception,
> >>> Let
> >>> > me know if you need further details.
> >>> >
> >>> > Thanks,
> >>> > Jilani
> >>> >
> >>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
> >>> wrote:
> >>> >
> >>> >> Hi Jilani,
> >>> >>
> >>> >> This is an example: SQOOP-3053
> >>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
> >>> >> <https://reviews.apache.org/r/54206/> linked. Please make your
> >>> changes on
> >>> >> trunk as it will be used to cut the future release so your patch
> >>> >> definitely
> >>> >> needs to be be able to apply on it.
> >>> >>
> >>> >> Thanks,
> >>> >> Bogi
> >>> >>
> >>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >> > Hi Bogi,
> >>> >> >
> >>> >> > Can you provide me sample Jira tickets and Review requests similar
> >>> to
> >>> >> > this, to proceed further.
> >>> >> >
> >>> >> > I applied the code changes from sqoop git from this branch
> >>> >> > "sqoop-release-1.4.6-rc0", If you suggest right branch I will take
> >>> the
> >>> >> code
> >>> >> > from there and apply the changes before submit review for request.
> >>> >> >
> >>> >> > Thanks,
> >>> >> > Jilani
> >>> >> >
> >>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <
> bogi@cloudera.com>
> >>> >> wrote:
> >>> >> >
> >>> >> >> Hi Jilani,
> >>> >> >>
> >>> >> >> To get your change committed please do the following:
> >>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
> >>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
> >>> >> >> * Create a review request at Apache's review board
> >>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to
> >>> the
> >>> >> JIRA
> >>> >> >>
> >>> >> >> ticket
> >>> >> >>
> >>> >> >> Please consider the guidelines below:
> >>> >> >>
> >>> >> >> Review board
> >>> >> >> * Summary: generate your summary using the issue's jira key +
> jira
> >>> >> title
> >>> >> >> * Groups: add the relevant group so everyone on the project will
> >>> know
> >>> >> >> about
> >>> >> >> your patch (Sqoop)
> >>> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the
> >>> jira
> >>> >> side
> >>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
> >>> >> >> * And as soon as the patch gets committed, it's very useful for
> the
> >>> >> >> community if you close the review and mark it as "Submitted" at
> the
> >>> >> Review
> >>> >> >> board. The button to do this is top right at your own tickets,
> >>> right
> >>> >> next
> >>> >> >> to  the Download Diff button.
> >>> >> >>
> >>> >> >> Jira
> >>> >> >> * Link: please add the link of the review as an external/web link
> >>> so
> >>> >> it's
> >>> >> >> easy to navigate to the reviews side
> >>> >> >> * Status: mark it as "patch available"
> >>> >> >>
> >>> >> >> Sqoop community will receive emails about your new ticket and
> >>> review
> >>> >> >> request and will review your change.
> >>> >> >>
> >>> >> >> Thanks,
> >>> >> >> Bogi
> >>> >> >>
> >>> >> >>
> >>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <
> >>> jilani2423@gmail.com>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >> > Do we have any update?
> >>> >> >> >
> >>> >> >> > I did checkout of the 1.4.6 code and done code changes to
> achieve
> >>> >> this
> >>> >> >> and
> >>> >> >> > tested in cluster and it is working as expected. Is there a way
> >>> I can
> >>> >> >> > contribute this as a patch and then the committers can validate
> >>> >> further
> >>> >> >> and
> >>> >> >> > suggest if any changes required to move further. Please suggest
> >>> the
> >>> >> >> > approach.
> >>> >> >> >
> >>> >> >> > Thanks,
> >>> >> >> > Jilani
> >>> >> >> >
> >>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <
> >>> jilani2423@gmail.com>
> >>> >> >> > wrote:
> >>> >> >> >
> >>> >> >> > > Hi Liz,
> >>> >> >> > >
> >>> >> >> > > Let's say we inserted data into a table with an initial import; in the
> >>> >> >> > > HBase shell it looks like this:
> >>> >> >> > >
> >>> >> >> > >  1                                     column=pay:amount,
> >>> >> >> > > timestamp=1485129654025, value=4.99
> >>> >> >> > >  1                                     column=pay:customer_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  1                                     column=pay:last_update,
> >>> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >>> >> >> > >  1                                     column=pay:payment_date,
> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >>> >> >> > >  1                                     column=pay:rental_id,
> >>> >> >> > > timestamp=1485129654025, value=573
> >>> >> >> > >  1                                     column=pay:staff_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  10                                    column=pay:amount,
> >>> >> >> > > timestamp=1485129504390, value=5.99
> >>> >> >> > >  10                                    column=pay:customer_id,
> >>> >> >> > > timestamp=1485129504390, value=1
> >>> >> >> > >  10                                    column=pay:last_update,
> >>> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >>> >> >> > >  10                                    column=pay:payment_date,
> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >>> >> >> > >  10                                    column=pay:rental_id,
> >>> >> >> > > timestamp=1485129504390, value=4526
> >>> >> >> > >  10                                    column=pay:staff_id,
> >>> >> >> > > timestamp=1485129504390, value=2
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Now assume that rental_id becomes NULL in the source for row key "1",
> >>> >> >> > > and then we run an incremental import into HBase. With the current
> >>> >> >> > > import, the final HBase data after the incremental import looks like this:
> >>> >> >> > >
> >>> >> >> > >  1                                     column=pay:amount,
> >>> >> >> > > timestamp=1485129654025, value=4.99
> >>> >> >> > >  1                                     column=pay:customer_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  1                                     column=pay:last_update,
> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >>> >> >> > >  1                                     column=pay:payment_date,
> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >>> >> >> > >  1                                     column=pay:rental_id,
> >>> >> >> > > timestamp=1485129654025, value=573
> >>> >> >> > >  1                                     column=pay:staff_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  10                                    column=pay:amount,
> >>> >> >> > > timestamp=1485129504390, value=5.99
> >>> >> >> > >  10                                    column=pay:customer_id,
> >>> >> >> > > timestamp=1485129504390, value=1
> >>> >> >> > >  10                                    column=pay:last_update,
> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >>> >> >> > >  10                                    column=pay:payment_date,
> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >>> >> >> > >  10                                    column=pay:rental_id,
> >>> >> >> > > timestamp=1485129504390, value=126
> >>> >> >> > >  10                                    column=pay:staff_id,
> >>> >> >> > > timestamp=1485129504390, value=2
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Since the source column "rental_id" became NULL for row key "1", the
> >>> >> >> > > final HBase table should no longer contain "rental_id" for row key "1".
> >>> >> >> > > I expect the data below for these row keys:
> >>> >> >> > >
> >>> >> >> > >  1                                     column=pay:amount,
> >>> >> >> > > timestamp=1485129654025, value=4.99
> >>> >> >> > >  1                                     column=pay:customer_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  1                                     column=pay:last_update,
> >>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >>> >> >> > >  1                                     column=pay:payment_date,
> >>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >>> >> >> > >  1                                     column=pay:staff_id,
> >>> >> >> > > timestamp=1485129654025, value=1
> >>> >> >> > >  10                                    column=pay:amount,
> >>> >> >> > > timestamp=1485129504390, value=5.99
> >>> >> >> > >  10                                    column=pay:customer_id,
> >>> >> >> > > timestamp=1485129504390, value=1
> >>> >> >> > >  10                                    column=pay:last_update,
> >>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >>> >> >> > >  10                                    column=pay:payment_date,
> >>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >>> >> >> > >  10                                    column=pay:rental_id,
> >>> >> >> > > timestamp=1485129504390, value=126
> >>> >> >> > >  10                                    column=pay:staff_id,
> >>> >> >> > > timestamp=1485129504390, value=2
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Please let me know if anything further is required.
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> > > Thanks,
> >>> >> >> > > Jilani
> >>> >> >> > >
> >>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <liz.szilagyi@cloudera.com> wrote:
> >>> >> >> > >
> >>> >> >> > >> Hi Jilani,
> >>> >> >> > >> I'm not sure I completely understand what you are trying to do. Could
> >>> >> >> > >> you give us some examples with e.g. 4 columns and 2 rows of example
> >>> >> >> > >> data showing the changes that happen compared to the changes you'd
> >>> >> >> > >> like to see?
> >>> >> >> > >> Thanks,
> >>> >> >> > >> Liz
> >>> >> >> > >>
> >>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
> >>> >> >> > >>
> >>> >> >> > >> >
> >>> >> >> > >> > Please help in resolving the issue. I have been going through the
> >>> >> >> > >> > source code, and somehow the required behavior is missing, but I am
> >>> >> >> > >> > not sure whether it was left out on purpose for some reason.
> >>> >> >> > >> >
> >>> >> >> > >> > Please suggest how to proceed with this scenario.
> >>> >> >> > >> >
> >>> >> >> > >> > Thanks,
> >>> >> >> > >> > Jilani
> >>> >> >> > >> >
> >>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
> >>> >> >> > >> >
> >>> >> >> > >> >> Hi,
> >>> >> >> > >> >>
> >>> >> >> > >> >> We have a scenario where we are importing data into HBase with Sqoop
> >>> >> >> > >> >> incremental import.
> >>> >> >> > >> >>
> >>> >> >> > >> >> Let's say we imported a table, and later the source table was updated
> >>> >> >> > >> >> so that some columns became NULL for some rows. When we then run the
> >>> >> >> > >> >> incremental import, per HBase semantics those columns should no longer
> >>> >> >> > >> >> exist in the HBase table, but right now they remain with their
> >>> >> >> > >> >> previous values.
> >>> >> >> > >> >>
> >>> >> >> > >> >> Is there any fix to overcome this issue?
> >>> >> >> > >> >>
> >>> >> >> > >> >>
> >>> >> >> > >> >> Thanks,
> >>> >> >> > >> >> Jilani
> >>> >> >> > >> >>
> >>> >> >> > >> >
> >>> >> >> > >> >
> >>> >> >> > >>
> >>> >> >> > >
> >>> >> >> > >
> >>> >> >> >
> >>> >> >>
> >>> >> >
> >>> >> >
> >>> >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>
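The behaviour requested in the thread above can be modelled without HBase at all. The sketch below is a hypothetical, in-memory illustration (NullAwareUpsert and its methods are invented names, not part of Sqoop or the HBase client API): it treats one HBase row as a map from column qualifier to value and applies the updated source record for row key "1" in two ways: skipping NULL columns, which is what the thread reports the current import does, versus removing the cell for every NULL column, which is what Jilani expects. With the real HBase client, the second variant would roughly mean sending a Delete for the NULL qualifiers alongside the Put (for example via Delete#addColumns in HBase 1.x); how Sqoop itself should do this is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of one HBase row (qualifier -> value).
 * It does not use the HBase client API; it only contrasts two ways of
 * applying an incremental source record to an existing row.
 */
public class NullAwareUpsert {

    /** Skips NULL source columns, so previously imported cells survive. */
    public static Map<String, String> applySkippingNulls(Map<String, String> row,
                                                         Map<String, String> record) {
        Map<String, String> result = new TreeMap<>(row);
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getValue() != null) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    /** Writes non-NULL columns and removes the cell for every NULL column. */
    public static Map<String, String> applyDeletingNulls(Map<String, String> row,
                                                         Map<String, String> record) {
        Map<String, String> result = new TreeMap<>(row);
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (e.getValue() == null) {
                result.remove(e.getKey());            // roughly a Delete of that column
            } else {
                result.put(e.getKey(), e.getValue()); // roughly a Put of that column
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Row key "1" as first imported (values taken from the thread).
        Map<String, String> row1 = new TreeMap<>();
        row1.put("amount", "4.99");
        row1.put("customer_id", "1");
        row1.put("last_update", "2017-01-23 05:29:09.0");
        row1.put("payment_date", "2005-05-25 11:30:37.0");
        row1.put("rental_id", "573");
        row1.put("staff_id", "1");

        // Updated source record: rental_id is now NULL, last_update has moved on.
        Map<String, String> update = new LinkedHashMap<>();
        update.put("amount", "4.99");
        update.put("customer_id", "1");
        update.put("last_update", "2017-02-05 05:29:09.0");
        update.put("payment_date", "2005-05-25 11:30:37.0");
        update.put("rental_id", null);
        update.put("staff_id", "1");

        System.out.println("skip NULLs:   " + applySkippingNulls(row1, update));
        System.out.println("delete NULLs: " + applyDeletingNulls(row1, update));
    }
}
```

In the "skip NULLs" line rental_id=573 survives; in the "delete NULLs" line it is gone, which matches the expected data for row key "1" shown above.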

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

Thanks for pointing me in the right direction.

As you suggested, I explored further and resolved the issue, and I was able to
test the fix (my code changes on top of trunk) on my Hadoop cluster.

Root cause of my issue:
The 1.4.6 code base uses the same Avro version that is on my Hadoop cluster, so
that jar causes no problem, whereas the trunk code base uses the avro-1.8.1 jar
files, which are not available on my Hadoop cluster.
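A quick way to confirm this kind of classpath mismatch is to ask the JVM whether, and from where, it can load a given class. The diagnostic sketch below is illustrative and not part of Sqoop (ClasspathProbe is a hypothetical name). By default it checks org.apache.avro.LogicalType, the class the trunk build failed to load in the stack trace quoted further down, and org.apache.avro.Schema for comparison. To my knowledge LogicalType first appeared in Avro 1.8, which would fit the avro-1.8.1 dependency mentioned above, but treat that as an assumption.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic: reports whether a class is visible on the
 * classpath and, if so, which jar or directory it was loaded from.
 */
public class ClasspathProbe {

    /** Returns the class's code location, "<bootstrap>" for JDK classes, or null if not found. */
    public static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null ? "<bootstrap>" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError caused by a missing dependency.
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.avro.Schema",      // core Avro class, for comparison
            "org.apache.avro.LogicalType"  // the class missing in the stack trace
        };
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

Run it with the same classpath Sqoop uses (for example the output of the hadoop classpath command plus the Sqoop jar on -cp) to see which Avro jar, if any, is actually picked up.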

Can you suggest how to run the unit tests and other checks for this component?

I tried the "test" target, and the tests are failing as shown below.

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
    [junit] Running com.cloudera.sqoop.TestDirectImport
    [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
    [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
    [junit] Running com.cloudera.sqoop.TestExport
    [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
    [junit] Test com.cloudera.sqoop.TestExport FAILED
    [junit] Running com.cloudera.sqoop.TestExportUpdate
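When a long ant test run scrolls by, it can help to tally which suites actually reported failures or errors instead of judging from the FAILED lines alone. The helper below is a hypothetical sketch, not part of Sqoop's build, and it assumes the per-suite summary format shown in the log above. Applied to this excerpt, it would flag TestDirectImport and TestExport, while the first suite in the excerpt reported no failures or errors.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that lists JUnit suites reporting failures or errors in an ant log. */
public class JUnitSummary {

    // Per-suite summary line, as in the log excerpt above.
    private static final Pattern SUMMARY =
        Pattern.compile("Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+), Skipped: (\\d+)");
    // "Running <suite>" line that precedes each suite's summary.
    private static final Pattern RUNNING = Pattern.compile("Running (\\S+)");

    /** Returns the names of suites whose summary shows at least one failure or error. */
    public static List<String> brokenSuites(List<String> logLines) {
        List<String> broken = new ArrayList<>();
        String current = null;
        for (String line : logLines) {
            Matcher run = RUNNING.matcher(line);
            if (run.find()) {
                current = run.group(1);
                continue;
            }
            Matcher m = SUMMARY.matcher(line);
            if (m.find()) {
                int failures = Integer.parseInt(m.group(2));
                int errors = Integer.parseInt(m.group(3));
                if (failures + errors > 0) {
                    broken.add(current == null ? "<unknown suite>" : current);
                }
            }
        }
        return broken;
    }

    public static void main(String[] args) throws IOException {
        // Usage sketch: ant test 2>&1 | java JUnitSummary
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        for (String line; (line = in.readLine()) != null; ) {
            lines.add(line);
        }
        for (String suite : brokenSuites(lines)) {
            System.out.println("broken: " + suite);
        }
    }
}
```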

Do I need to make any changes? I am running the "test" target from Eclipse.

Thanks,
Jilani


On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> - Built the jar from trunk with the "jar-all" target
>
> - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>
> - Moved the existing jar to some other location
>
> - Then executed the command below to do the import:
> sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
> --username test --password test123$ --table payment -m 2 --hbase-table
> /database/demoapp/hbase/payment --column-family pay --hbase-row-key
> payment_id --incremental lastmodified --merge-key payment_id --check-column
> last_update --last-value '2017-01-08 08:02:05.0'
>
>
> I followed the same steps for both the jar built from trunk and the jar built
> from the 1.4.6 branch.
>
> Where do you suspect the multiple Avro jars are: at jar build time, or when
> running the command with the jar?
>
>
> Thanks,
> Jilani
>
> On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>
>> Hi Jilani,
>>
>> I suspect that you have an old version of Avro or even multiple Avro
>> versions on your classpath and thus Sqoop uses an older one.
>>
>> Could you please provide a list of the exact commands you have performed
>> so that I can reproduce the issue?
>>
>> Thanks,
>> Bogi
>>
>> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>>> Can someone give me pointers on what I am missing with the trunk vs 1.4.6
>>> builds? The trunk build gives the error mentioned in the mail chain below.
>>>
>>> I used the same ant target to build the jar for both branches, but the 1.4.6
>>> jar still differs from the 1.4.7 jar that is created from trunk.
>>>
>>> Thanks,
>>> Jilani
>>>
>>>
>>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>>> wrote:
>>>
>>> > Hi Bogi,
>>> >
>>> > I am getting the error below when I build the jar from trunk and run a Sqoop
>>> > import of a MySQL database table, whereas similar changes work with
>>> > branch 1.4.6.
>>> >
>>> >
>>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> >         ... 11 more
>>> >
>>> > Please let me know what is missing and how to resolve this exception.
>>> > Let me know if you need further details.
>>> >
>>> > Thanks,
>>> > Jilani
>>> >
>>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>> >
>>> >> Hi Jilani,
>>> >>
>>> >> This is an example: SQOOP-3053
>>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
>>> >> trunk, as it will be used to cut the future release, so your patch
>>> >> definitely needs to apply on it.
>>> >>
>>> >> Thanks,
>>> >> Bogi
>>> >>
>>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
>>> >> wrote:
>>> >>
>>> >> > Hi Bogi,
>>> >> >
>>> >> > Can you point me to sample Jira tickets and review requests similar to
>>> >> > this, so I can proceed further?
>>> >> >
>>> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch of the
>>> >> > Sqoop git repository. If you suggest the right branch, I will take the code
>>> >> > from there and apply the changes before submitting the review request.
>>> >> >
>>> >> > Thanks,
>>> >> > Jilani
>>> >> >
>>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
>>> >> wrote:
>>> >> >
>>> >> >> Hi Jilani,
>>> >> >>
>>> >> >> To get your change committed, please do the following:
>>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>>> >> >> * Create a review request at Apache's review board
>>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
>>> >> >> ticket
>>> >> >>
>>> >> >> Please consider the guidelines below:
>>> >> >>
>>> >> >> Review board
>>> >> >> * Summary: generate your summary using the issue's jira key + jira title
>>> >> >> * Groups: add the relevant group (Sqoop) so everyone on the project will
>>> >> >> know about your patch
>>> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
>>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>>> >> >> * As soon as the patch gets committed, it's very useful for the
>>> >> >> community if you close the review and mark it as "Submitted" on the Review
>>> >> >> board. The button to do this is at the top right of your own tickets,
>>> >> >> right next to the Download Diff button.
>>> >> >>
>>> >> >> Jira
>>> >> >> * Link: please add the link of the review as an external/web link so it's
>>> >> >> easy to navigate to the reviews side
>>> >> >> * Status: mark it as "patch available"
>>> >> >>
>>> >> >> The Sqoop community will receive emails about your new ticket and review
>>> >> >> request and will review your change.
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Bogi
>>> >> >>
>>> >> >>
>>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >> >>
>>> >> >> > Do we have any update?
>>> >> >> >
>>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>>> >> >> > tested them on the cluster; they work as expected. Is there a way I can
>>> >> >> > contribute this as a patch so that the committers can validate it further
>>> >> >> > and suggest any changes required to move forward? Please suggest the
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Jilani
>>> >> >> >
>>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >> >> >
>>> >> >> > > Hi Liz,
>>> >> >> > >
>>> >> >> > > Let's say we inserted data into a table with an initial import; in the
>>> >> >> > > HBase shell it looks like this:
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount,
>>> >> >> > > timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update,
>>> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date,
>>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:rental_id,
>>> >> >> > > timestamp=1485129654025, value=573
>>> >> >> > >  1                                     column=pay:staff_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount,
>>> >> >> > > timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id,
>>> >> >> > > timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update,
>>> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date,
>>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id,
>>> >> >> > > timestamp=1485129504390, value=4526
>>> >> >> > >  10                                    column=pay:staff_id,
>>> >> >> > > timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Now assume that rental_id becomes NULL in the source for row key "1",
>>> >> >> > > and then we run an incremental import into HBase. With the current
>>> >> >> > > import, the final HBase data after the incremental import looks like this:
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount,
>>> >> >> > > timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update,
>>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date,
>>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:rental_id,
>>> >> >> > > timestamp=1485129654025, value=573
>>> >> >> > >  1                                     column=pay:staff_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount,
>>> >> >> > > timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id,
>>> >> >> > > timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update,
>>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date,
>>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id,
>>> >> >> > > timestamp=1485129504390, value=126
>>> >> >> > >  10                                    column=pay:staff_id,
>>> >> >> > > timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Since the source column "rental_id" became NULL for row key "1", the
>>> >> >> > > final HBase table should no longer contain "rental_id" for row key "1".
>>> >> >> > > I expect the data below for these row keys:
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount,
>>> >> >> > > timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update,
>>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date,
>>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:staff_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount,
>>> >> >> > > timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id,
>>> >> >> > > timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update,
>>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date,
>>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id,
>>> >> >> > > timestamp=1485129504390, value=126
>>> >> >> > >  10                                    column=pay:staff_id,
>>> >> >> > > timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Please let me know if anything further is required.
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Thanks,
>>> >> >> > > Jilani
>>> >> >> > >
>>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <liz.szilagyi@cloudera.com> wrote:
>>> >> >> > >
>>> >> >> > >> Hi Jilani,
>>> >> >> > >> I'm not sure I completely understand what you are trying to do. Could
>>> >> >> > >> you give us some examples with e.g. 4 columns and 2 rows of example
>>> >> >> > >> data showing the changes that happen compared to the changes you'd
>>> >> >> > >> like to see?
>>> >> >> > >> Thanks,
>>> >> >> > >> Liz
>>> >> >> > >>
>>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >> >> > >>
>>> >> >> > >> >
>>> >> >> > >> > Please help in resolving the issue. I have been going through the
>>> >> >> > >> > source code, and somehow the required behavior is missing, but I am
>>> >> >> > >> > not sure whether it was left out on purpose for some reason.
>>> >> >> > >> >
>>> >> >> > >> > Please suggest how to proceed with this scenario.
>>> >> >> > >> >
>>> >> >> > >> > Thanks,
>>> >> >> > >> > Jilani
>>> >> >> > >> >
>>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >> >> > >> >
>>> >> >> > >> >> Hi,
>>> >> >> > >> >>
>>> >> >> > >> >> We have a scenario where we are importing data into HBase with Sqoop
>>> >> >> > >> >> incremental import.
>>> >> >> > >> >>
>>> >> >> > >> >> Let's say we imported a table, and later the source table was updated
>>> >> >> > >> >> so that some columns became NULL for some rows. When we then run the
>>> >> >> > >> >> incremental import, per HBase semantics those columns should no longer
>>> >> >> > >> >> exist in the HBase table, but right now they remain with their
>>> >> >> > >> >> previous values.
>>> >> >> > >> >>
>>> >> >> > >> >> Is there any fix to overcome this issue?
>>> >> >> > >> >>
>>> >> >> > >> >>
>>> >> >> > >> >> Thanks,
>>> >> >> > >> >> Jilani
>>> >> >> > >> >>
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >>
>>> >> >> > >
>>> >> >> > >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>
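For reference, the import command quoted above uses --incremental lastmodified with --check-column last_update and --last-value '2017-01-08 08:02:05.0'. As far as I understand Sqoop 1, lastmodified mode selects rows whose check column lies between the supplied last value and the time the import starts, and that start time becomes the next --last-value; the half-open window [last-value, import start) modelled below is my assumption, not something stated in this thread. The hypothetical sketch (LastModifiedWindow and SourceRow are invented names) only covers row selection: a selected row is then rewritten under its row key (--hbase-row-key payment_id), and nothing in this step says which existing cells should disappear, which is where the NULL-column problem comes from.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the row-selection step of a lastmodified incremental import. */
public class LastModifiedWindow {

    /** A source row reduced to its key (payment_id) and check column (last_update). */
    public static final class SourceRow {
        public final int paymentId;
        public final LocalDateTime lastUpdate;

        public SourceRow(int paymentId, LocalDateTime lastUpdate) {
            this.paymentId = paymentId;
            this.lastUpdate = lastUpdate;
        }
    }

    /**
     * Returns the keys of rows with lastValue <= last_update < importStart
     * (assumed window); importStart would be recorded as the next --last-value.
     */
    public static List<Integer> selectKeys(List<SourceRow> source,
                                           LocalDateTime lastValue,
                                           LocalDateTime importStart) {
        List<Integer> keys = new ArrayList<>();
        for (SourceRow r : source) {
            boolean notBeforeLast = !r.lastUpdate.isBefore(lastValue);
            boolean beforeStart = r.lastUpdate.isBefore(importStart);
            if (notBeforeLast && beforeStart) {
                keys.add(r.paymentId);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        LocalDateTime lastValue = LocalDateTime.of(2017, 1, 8, 8, 2, 5);  // from --last-value
        LocalDateTime importStart = LocalDateTime.of(2017, 2, 6, 0, 0);   // hypothetical run time
        List<SourceRow> source = new ArrayList<>();
        source.add(new SourceRow(1, LocalDateTime.of(2017, 2, 5, 5, 29, 9)));
        source.add(new SourceRow(10, LocalDateTime.of(2017, 2, 5, 5, 12, 30)));
        source.add(new SourceRow(42, LocalDateTime.of(2016, 12, 31, 23, 0))); // hypothetical unchanged row
        System.out.println(selectKeys(source, lastValue, importStart));
    }
}
```

With these inputs the program prints [1, 10]: both rows updated in February are re-imported, and the hypothetical row 42 is skipped.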

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

Thanks for pointing me in the right direction.

As you suggested, I explored further and resolved the issue, and I was able to
test the fix (my code changes on top of trunk) on my Hadoop cluster.

Root cause of my issue:
The 1.4.6 code base uses the same Avro version that is on my Hadoop cluster, so
that jar causes no problem, whereas the trunk code base uses the avro-1.8.1 jar
files, which are not available on my Hadoop cluster.

Can you suggest how to run the unit tests and other checks for this component?

I tried the "test" target, and the tests are failing as shown below.

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
    [junit] Running com.cloudera.sqoop.TestDirectImport
    [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.705 sec
    [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
    [junit] Running com.cloudera.sqoop.TestExport
    [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time elapsed: 22.564 sec
    [junit] Test com.cloudera.sqoop.TestExport FAILED
    [junit] Running com.cloudera.sqoop.TestExportUpdate

Do I need to make any changes? I am running the "test" target from Eclipse.

Thanks,
Jilani


On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> - Built the jar from trunk with the "jar-all" target
>
> - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>
> - Moved the existing jar to some other location
>
> - Then executed the command below to do the import:
> sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
> --username test --password test123$ --table payment -m 2 --hbase-table
> /database/demoapp/hbase/payment --column-family pay --hbase-row-key
> payment_id --incremental lastmodified --merge-key payment_id --check-column
> last_update --last-value '2017-01-08 08:02:05.0'
>
>
> I followed the same steps for both the jar built from trunk and the jar built
> from the 1.4.6 branch.
>
> Where do you suspect the multiple Avro jars are: at jar build time, or when
> running the command with the jar?
>
>
> Thanks,
> Jilani
>
> On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>
>> Hi Jilani,
>>
>> I suspect that you have an old version of Avro or even multiple Avro
>> versions on your classpath and thus Sqoop uses an older one.
>>
>> Could you please provide a list of the exact commands you have performed
>> so that I can reproduce the issue?
>>
>> Thanks,
>> Bogi
>>
>> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>>> Can someone give me pointers on what I am missing with the trunk vs 1.4.6
>>> builds? The trunk build gives the error mentioned in the mail chain below.
>>>
>>> I used the same ant target to build the jar for both branches, but the 1.4.6
>>> jar still differs from the 1.4.7 jar that is created from trunk.
>>>
>>> Thanks,
>>> Jilani
>>>
>>>
>>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>>> wrote:
>>>
>>> > Hi Bogi,
>>> >
>>> > I am getting the error below when I build the jar from trunk and run a Sqoop
>>> > import of a MySQL database table, whereas similar changes work with
>>> > branch 1.4.6.
>>> >
>>> >
>>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> >         ... 11 more
>>> >
>>> > Please let me know what is missing and how to resolve this exception.
>>> > Let me know if you need further details.
>>> >
>>> > Thanks,
>>> > Jilani
>>> >
>>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>>> >
>>> >> Hi Jilani,
>>> >>
>>> >> This is an example: SQOOP-3053
>>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
>>> >> trunk, as it will be used to cut the future release, so your patch
>>> >> definitely needs to apply on it.
>>> >>
>>> >> Thanks,
>>> >> Bogi
>>> >>
>>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
>>> >> wrote:
>>> >>
>>> >> > Hi Bogi,
>>> >> >
>>> >> > Can you point me to sample Jira tickets and review requests similar to
>>> >> > this, so I can proceed further?
>>> >> >
>>> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch of the
>>> >> > Sqoop git repository. If you suggest the right branch, I will take the code
>>> >> > from there and apply the changes before submitting the review request.
>>> >> >
>>> >> > Thanks,
>>> >> > Jilani
>>> >> >
>>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
>>> >> wrote:
>>> >> >
>>> >> >> Hi Jilani,
>>> >> >>
>>> >> >> To get your change committed, please do the following:
>>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>>> >> >> * Create a review request at Apache's review board
>>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
>>> >> >> ticket
>>> >> >>
>>> >> >> Please consider the guidelines below:
>>> >> >>
>>> >> >> Review board
>>> >> >> * Summary: generate your summary using the issue's jira key + jira title
>>> >> >> * Groups: add the relevant group (Sqoop) so everyone on the project will
>>> >> >> know about your patch
>>> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
>>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>>> >> >> * As soon as the patch gets committed, it's very useful for the
>>> >> >> community if you close the review and mark it as "Submitted" on the Review
>>> >> >> board. The button to do this is at the top right of your own tickets,
>>> >> >> right next to the Download Diff button.
>>> >> >>
>>> >> >> Jira
>>> >> >> * Link: please add the link of the review as an external/web link so it's
>>> >> >> easy to navigate to the reviews side
>>> >> >> * Status: mark it as "patch available"
>>> >> >>
>>> >> >> The Sqoop community will receive emails about your new ticket and review
>>> >> >> request and will review your change.
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Bogi
>>> >> >>
>>> >> >>
>>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:
>>> >> >>
>>> >> >> > Do we have any update?
>>> >> >> >
>>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>>> >> >> > tested them on the cluster; they work as expected. Is there a way I can
>>> >> >> > contribute this as a patch so that the committers can validate it further
>>> >> >> > and suggest any changes required to move forward? Please suggest the
>>> >> >> > approach.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Jilani
>>> >> >> >
>>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
>>> >> >> > wrote:
>>> >> >> >
>>> >> >> > > Hi Liz,
>>> >> >> > >
>>> >> >> > > Let's say we inserted data into a table with an initial import; it
>>> >> >> > > looks like this in the HBase shell:
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
>>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
>>> >> >> > > and then we run an incremental import into HBase. With the current
>>> >> >> > > import, the final HBase data after the incremental import looks like
>>> >> >> > > this:
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Since the source column "rental_id" became NULL for rowkey "1", the
>>> >> >> > > final HBase table should not have "rental_id" for rowkey "1". I expect
>>> >> >> > > the data below for these rowkeys:
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Please let me know if anything further is required.
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Thanks,
>>> >> >> > > Jilani
>>> >> >> > >
>>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>>> >> >> > > liz.szilagyi@cloudera.com> wrote:
>>> >> >> > >
>>> >> >> > >> Hi Jilani,
>>> >> >> > >> I'm not sure I completely understand what you are trying to do.
>>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
>>> >> >> > >> example data showing the changes that happen compared to the
>>> >> >> > >> changes you'd like to see?
>>> >> >> > >> Thanks,
>>> >> >> > >> Liz
>>> >> >> > >>
>>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >> >> > >> wrote:
>>> >> >> > >>
>>> >> >> > >> >
>>> >> >> > >> > Please help in resolving the issue. Going through the source code,
>>> >> >> > >> > it seems the required behavior is missing, but I am not sure
>>> >> >> > >> > whether it was left out deliberately for some reason.
>>> >> >> > >> >
>>> >> >> > >> > Please give me some suggestions on how to handle this scenario.
>>> >> >> > >> >
>>> >> >> > >> > Thanks,
>>> >> >> > >> > Jilani
>>> >> >> > >> >
>>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>>> >> >> > >> > wrote:
>>> >> >> > >> >
>>> >> >> > >> >> Hi,
>>> >> >> > >> >>
>>> >> >> > >> >> We have a scenario where we import data into HBase with Sqoop
>>> >> >> > >> >> incremental import.
>>> >> >> > >> >>
>>> >> >> > >> >> Let's say we imported a table, and later some columns in the
>>> >> >> > >> >> source table were updated to NULL for some rows. After the
>>> >> >> > >> >> incremental import, as per HBase semantics these columns should
>>> >> >> > >> >> not be present in the HBase table. But right now these columns
>>> >> >> > >> >> remain as they were, with their previous values.
>>> >> >> > >> >>
>>> >> >> > >> >> Is there any fix to overcome this issue?
>>> >> >> > >> >>
>>> >> >> > >> >>
>>> >> >> > >> >> Thanks,
>>> >> >> > >> >> Jilani
>>> >> >> > >> >>
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >>
>>> >> >> > >
>>> >> >> > >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>
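The difference between what the import does today and what Jilani asks for can be shown with a small, self-contained Java model. This is a hypothetical sketch, not Sqoop's code. An HBase row is modeled as a sorted map from "family:qualifier" to value, and a SQL NULL in the source row is modeled as a Java null. The class and method names are illustrative assumptions. The data is rowkey "1" from the example in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model (not Sqoop code) of applying one incremental-import row
 * to an existing HBase row. HBase row: "family:qualifier" -> value.
 * Source row: column name -> value, where null stands for SQL NULL.
 */
public class NullColumnMerge {

    /** Behavior described in the thread: NULL source values are skipped, so old cells survive. */
    static Map<String, String> mergeIgnoringNulls(Map<String, String> hbaseRow,
                                                  Map<String, String> sourceRow,
                                                  String family) {
        Map<String, String> result = new TreeMap<>(hbaseRow); // copy; input is not modified
        for (Map.Entry<String, String> col : sourceRow.entrySet()) {
            if (col.getValue() != null) {
                result.put(family + ":" + col.getKey(), col.getValue());
            }
        }
        return result;
    }

    /** Behavior requested in the thread: a NULL source value removes the existing cell. */
    static Map<String, String> mergeDeletingNulls(Map<String, String> hbaseRow,
                                                  Map<String, String> sourceRow,
                                                  String family) {
        Map<String, String> result = new TreeMap<>(hbaseRow);
        for (Map.Entry<String, String> col : sourceRow.entrySet()) {
            String cell = family + ":" + col.getKey();
            if (col.getValue() == null) {
                result.remove(cell); // against a real table this would be a column Delete
            } else {
                result.put(cell, col.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rowkey "1" after the initial import.
        Map<String, String> row1 = new TreeMap<>();
        row1.put("pay:amount", "4.99");
        row1.put("pay:customer_id", "1");
        row1.put("pay:last_update", "2017-01-23 05:29:09.0");
        row1.put("pay:payment_date", "2005-05-25 11:30:37.0");
        row1.put("pay:rental_id", "573");
        row1.put("pay:staff_id", "1");

        // The same source row after rental_id was set to NULL.
        Map<String, String> update = new LinkedHashMap<>();
        update.put("amount", "4.99");
        update.put("customer_id", "1");
        update.put("last_update", "2017-02-05 05:29:09.0");
        update.put("payment_date", "2005-05-25 11:30:37.0");
        update.put("rental_id", null);
        update.put("staff_id", "1");

        System.out.println("ignore nulls: " + mergeIgnoringNulls(row1, update, "pay"));
        System.out.println("delete nulls: " + mergeDeletingNulls(row1, update, "pay"));
    }
}
```

Running `main` prints the row twice. The "ignore nulls" result still contains `pay:rental_id=573`. The "delete nulls" result has no `pay:rental_id` cell, which matches the expected data in the thread. Against a real table, the removal would presumably be sent as an HBase `Delete` on that column, for example `Delete.addColumns(family, qualifier)` in the HBase 1.x client, issued alongside the `Put`. That mapping is an assumption, not a description of how Sqoop implements it.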

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

- Built the jar from trunk with the "jar-all" target

- Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/

- Moved the existing jar to some other location

- Then executed the command below to do the import:
sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose \
  --username test --password test123$ --table payment -m 2 \
  --hbase-table /database/demoapp/hbase/payment --column-family pay \
  --hbase-row-key payment_id --incremental lastmodified \
  --merge-key payment_id --check-column last_update \
  --last-value '2017-01-08 08:02:05.0'


I followed the same steps for both jars: the one built from trunk and the
one built from the 1.4.6 branch.

Where do you suspect the multiple Avro jars are: at the time the jar is
built, or when running the command with the jar?


Thanks,
Jilani
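A NoClassDefFoundError is raised by the JVM at run time, so the classpath that matters here is the one Sqoop starts with. The standalone Java check below is a hypothetical helper, not part of Sqoop. It reports whether `org.apache.avro.LogicalType` can be loaded and lists every classpath location that provides it and `org.apache.avro.Schema`. If `Schema` shows up more than once, several Avro jars are visible. It assumes you run it with the same classpath Sqoop uses.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical diagnostic: for each class name, reports whether it can be loaded
 * and lists every classpath location that contains its .class file.
 */
public class AvroClasspathCheck {

    /** Returns every URL visible to this class loader that provides the given class file. */
    static List<URL> findCopies(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = AvroClasspathCheck.class.getClassLoader();
        return Collections.list(loader.getResources(resource));
    }

    /** Returns true if the class can be loaded (without running its static initializers). */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, AvroClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Schema is a core Avro class; LogicalType is the class reported missing
        // by the NoClassDefFoundError in this thread.
        String[] names = {"org.apache.avro.Schema", "org.apache.avro.LogicalType"};
        for (String name : names) {
            List<URL> copies = findCopies(name);
            System.out.println(name + ": loadable=" + isLoadable(name)
                    + ", copies on classpath=" + copies.size());
            for (URL url : copies) {
                System.out.println("    " + url);
            }
        }
    }
}
```

For example, compile it and run it with something like `java -cp "$(hadoop classpath):/opt/mapr/sqoop/sqoop-1.4.6/lib/*:." AvroClasspathCheck`. The Sqoop `lib` directory in that command is an assumption based on the install path above. If `Schema` is loadable but `LogicalType` is not, the Avro jar being picked up most likely predates `LogicalType`. With the default parent-first delegation, the first URL listed for `Schema` is typically the copy that actually gets loaded.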

On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:

> Hi Jilani,
>
> I suspect that you have an old version of Avro or even multiple Avro
> versions on your classpath and thus Sqoop uses an older one.
>
> Could you please provide a list of the exact commands you have performed
> so that I can reproduce the issue?
>
> Thanks,
> Bogi
>
> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:
>
>> Can someone give me pointers on what I am missing with the trunk vs. 1.4.6
>> builds? The trunk build gives the error mentioned in the mail chain below.
>>
>> I used the same ant target to build the jar for both branches, yet the
>> 1.4.6 jar differs from the 1.4.7 jar built from trunk.
>>
>> Thanks,
>> Jilani
>>
>>
>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>> > Hi Bogi,
>> >
>> > I am getting the error below when I build the jar from trunk and try to
>> > run a Sqoop import from a MySQL database table, whereas similar changes
>> > work with branch 1.4.6.
>> >
>> >
>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >         ... 11 more
>> >
>> > Please let me know what is missing and how to resolve this exception.
>> > Let me know if you need further details.
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
>> wrote:
>> >
>> >> Hi Jilani,
>> >>
>> >> This is an example: SQOOP-3053
>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes
>> >> on trunk, as it will be used to cut the future release, so your patch
>> >> definitely needs to be able to apply on it.
>> >>
>> >> Thanks,
>> >> Bogi
>> >>
>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi Bogi,
>> >> >
>> >> > Can you provide me with sample JIRA tickets and review requests similar
>> >> > to this, so that I can proceed further?
>> >> >
>> >> > I applied the code changes to the "sqoop-release-1.4.6-rc0" branch of
>> >> > the Sqoop git repository. If you suggest the right branch, I will take
>> >> > the code from there and apply the changes before submitting the review
>> >> > request.
>> >> >
>> >> > Thanks,
>> >> > Jilani
>> >> >
>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
>> >> > wrote:
>> >> >
>> >> >> Hi Jilani,
>> >> >>
>> >> >> To get your change committed, please do the following:
>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>> >> >> * Create a review request at Apache's review board
>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>> >> >> JIRA ticket
>> >> >>
>> >> >> Please consider the guidelines below:
>> >> >>
>> >> >> Review board
>> >> >> * Summary: generate your summary using the issue's JIRA key + JIRA title
>> >> >> * Groups: add the relevant group so everyone on the project will know
>> >> >> about your patch (Sqoop)
>> >> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>> >> >> * As soon as the patch gets committed, it's very useful for the
>> >> >> community if you close the review and mark it as "Submitted" on the
>> >> >> Review board. The button to do this is at the top right of your own
>> >> >> tickets, right next to the Download Diff button.
>> >> >>
>> >> >> Jira
>> >> >> * Link: please add the link of the review as an external/web link so
>> >> >> it's easy to navigate to the reviews side
>> >> >> * Status: mark it as "patch available"
>> >> >>
>> >> >> The Sqoop community will receive emails about your new ticket and review
>> >> >> request and will review your change.
>> >> >>
>> >> >> Thanks,
>> >> >> Bogi
>> >> >>
>> >> >>
>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >> > Do we have any update?
>> >> >> >
>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>> >> >> > tested them on the cluster; it is working as expected. Is there a way
>> >> >> > I can contribute this as a patch so that the committers can validate
>> >> >> > it further and suggest any changes required to move forward? Please
>> >> >> > suggest the approach.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Jilani
>> >> >> >
>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Hi Liz,
>> >> >> > >
>> >> >> > > Let's say we inserted data into a table with an initial import; it
>> >> >> > > looks like this in the HBase shell:
>> >> >> > >
>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>> >> >> > >
>> >> >> > >
>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
>> >> >> > > and then we run an incremental import into HBase. With the current
>> >> >> > > import, the final HBase data after the incremental import looks like
>> >> >> > > this:
>> >> >> > >
>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > Since the source column "rental_id" became NULL for rowkey "1", the
>> >> >> > > final HBase table should not have "rental_id" for rowkey "1". I expect
>> >> >> > > the data below for these rowkeys:
>> >> >> > >
>> >> >> > >
>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>> >> >> > >
>> >> >> > >
>> >> >> > > Please let me know if anything further is required.
>> >> >> > >
>> >> >> > >
>> >> >> > > Thanks,
>> >> >> > > Jilani
>> >> >> > >
>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>> >> >> > > liz.szilagyi@cloudera.com> wrote:
>> >> >> > >
>> >> >> > >> Hi Jilani,
>> >> >> > >> I'm not sure I completely understand what you are trying to do.
>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
>> >> >> > >> example data showing the changes that happen compared to the
>> >> >> > >> changes you'd like to see?
>> >> >> > >> Thanks,
>> >> >> > >> Liz
>> >> >> > >>
>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> > >> wrote:
>> >> >> > >>
>> >> >> > >> >
>> >> >> > >> > Please help in resolving the issue. Going through the source code,
>> >> >> > >> > it seems the required behavior is missing, but I am not sure
>> >> >> > >> > whether it was left out deliberately for some reason.
>> >> >> > >> >
>> >> >> > >> > Please give me some suggestions on how to handle this scenario.
>> >> >> > >> >
>> >> >> > >> > Thanks,
>> >> >> > >> > Jilani
>> >> >> > >> >
>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> > >> > wrote:
>> >> >> > >> >
>> >> >> > >> >> Hi,
>> >> >> > >> >>
>> >> >> > >> >> We have a scenario where we import data into HBase with Sqoop
>> >> >> > >> >> incremental import.
>> >> >> > >> >>
>> >> >> > >> >> Let's say we imported a table, and later some columns in the
>> >> >> > >> >> source table were updated to NULL for some rows. After the
>> >> >> > >> >> incremental import, as per HBase semantics these columns should
>> >> >> > >> >> not be present in the HBase table. But right now these columns
>> >> >> > >> >> remain as they were, with their previous values.
>> >> >> > >> >>
>> >> >> > >> >> Is there any fix to overcome this issue?
>> >> >> > >> >>
>> >> >> > >> >>
>> >> >> > >> >> Thanks,
>> >> >> > >> >> Jilani
>> >> >> > >> >>
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >>
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

- Built the jar from trunk with the "jar-all" target

- Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/

- Moved the existing jar to some other location

- Then executed the command below to do the import:
sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose \
  --username test --password test123$ --table payment -m 2 \
  --hbase-table /database/demoapp/hbase/payment --column-family pay \
  --hbase-row-key payment_id --incremental lastmodified \
  --merge-key payment_id --check-column last_update \
  --last-value '2017-01-08 08:02:05.0'


I followed the same steps for both jars: the one built from trunk and the
one built from the 1.4.6 branch.

Where do you suspect the multiple Avro jars are: at the time the jar is
built, or when running the command with the jar?


Thanks,
Jilani

On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bo...@cloudera.com> wrote:

> Hi Jilani,
>
> I suspect that you have an old version of Avro or even multiple Avro
> versions on your classpath and thus Sqoop uses an older one.
>
> Could you please provide a list of the exact commands you have performed
> so that I can reproduce the issue?
>
> Thanks,
> Bogi
>
> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:
>
>> Can someone give me pointers on what I am missing with the trunk vs. 1.4.6
>> builds? The trunk build gives the error mentioned in the mail chain below.
>>
>> I used the same ant target to build the jar for both branches, yet the
>> 1.4.6 jar differs from the 1.4.7 jar built from trunk.
>>
>> Thanks,
>> Jilani
>>
>>
>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>> > Hi Bogi,
>> >
>> > I am getting the error below when I build the jar from trunk and try to
>> > run a Sqoop import from a MySQL database table, whereas similar changes
>> > work with branch 1.4.6.
>> >
>> >
>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >         ... 11 more
>> >
>> > Please let me know what is missing and how to resolve this exception.
>> > Let me know if you need further details.
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
>> wrote:
>> >
>> >> Hi Jilani,
>> >>
>> >> This is an example: SQOOP-3053
>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes
>> >> on trunk, as it will be used to cut the future release, so your patch
>> >> definitely needs to be able to apply on it.
>> >>
>> >> Thanks,
>> >> Bogi
>> >>
>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi Bogi,
>> >> >
>> >> > Can you provide me with sample JIRA tickets and review requests similar
>> >> > to this, so that I can proceed further?
>> >> >
>> >> > I applied the code changes to the "sqoop-release-1.4.6-rc0" branch of
>> >> > the Sqoop git repository. If you suggest the right branch, I will take
>> >> > the code from there and apply the changes before submitting the review
>> >> > request.
>> >> >
>> >> > Thanks,
>> >> > Jilani
>> >> >
>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
>> >> > wrote:
>> >> >
>> >> >> Hi Jilani,
>> >> >>
>> >> >> To get your change committed, please do the following:
>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>> >> >> * Create a review request at Apache's review board
>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>> >> >> JIRA ticket
>> >> >>
>> >> >> Please consider the guidelines below:
>> >> >>
>> >> >> Review board
>> >> >> * Summary: generate your summary using the issue's JIRA key + JIRA title
>> >> >> * Groups: add the relevant group so everyone on the project will know
>> >> >> about your patch (Sqoop)
>> >> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>> >> >> * As soon as the patch gets committed, it's very useful for the
>> >> >> community if you close the review and mark it as "Submitted" on the
>> >> >> Review board. The button to do this is at the top right of your own
>> >> >> tickets, right next to the Download Diff button.
>> >> >>
>> >> >> Jira
>> >> >> * Link: please add the link of the review as an external/web link so
>> >> >> it's easy to navigate to the reviews side
>> >> >> * Status: mark it as "patch available"
>> >> >>
>> >> >> The Sqoop community will receive emails about your new ticket and review
>> >> >> request and will review your change.
>> >> >>
>> >> >> Thanks,
>> >> >> Bogi
>> >> >>
>> >> >>
>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >> > Do we have any update?
>> >> >> >
>> >> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>> >> >> > tested them on the cluster; it is working as expected. Is there a way
>> >> >> > I can contribute this as a patch so that the committers can validate
>> >> >> > it further and suggest any changes required to move forward? Please
>> >> >> > suggest the approach.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Jilani
>> >> >> >
>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Hi Liz,
>> >> >> > >
>> >> >> > > Let's say we inserted data into a table with an initial import; it
>> >> >> > > looks like this in the HBase shell:
>> >> >> > >
>> >> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>> >> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>> >> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>> >> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>> >> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>> >> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>> >> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>> >> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>> >> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
>> >> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>> >> >> > >
>> >> >> > >
>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
>> >> >> > > and then we do an incremental import into HBase. With the current
>> >> >> > > import, the final HBase data after the incremental import looks like
>> >> >> > > this:
>> >> >> > >
>> >> >> > >  1                                     column=pay:amount,
>> >> >> > > timestamp=1485129654025, value=4.99
>> >> >> > >  1                                     column=pay:customer_id,
>> >> >> > > timestamp=1485129654025, value=1
>> >> >> > >  1                                     column=pay:last_update,
>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >> >> > >  1                                     column=pay:payment_date,
>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> >> > >  1                                     column=pay:rental_id,
>> >> >> > > timestamp=1485129654025, value=573
>> >> >> > >  1                                     column=pay:staff_id,
>> >> >> > > timestamp=1485129654025, value=1
>> >> >> > >  10                                    column=pay:amount,
>> >> >> > > timestamp=1485129504390, value=5.99
>> >> >> > >  10                                    column=pay:customer_id,
>> >> >> > > timestamp=1485129504390, value=1
>> >> >> > >  10                                    column=pay:last_update,
>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >> >> > >  10                                    column=pay:payment_date,
>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> >> > >  10                                    column=pay:rental_id,
>> >> >> > > timestamp=1485129504390, value=126
>> >> >> > >  10                                    column=pay:staff_id,
>> >> >> > > timestamp=1485129504390, value=2
>> >> >> > >
>> >> >> > >
>> >> >> > >
>> >> >> > > As the source column "rental_id" becomes NULL for rowkey "1", the
>> >> >> > > final HBase data should not have "rental_id" for rowkey "1". I am
>> >> >> > > expecting the data below for these rowkeys:
>> >> >> > >
>> >> >> > >
>> >> >> > >  1                                     column=pay:amount,
>> >> >> > > timestamp=1485129654025, value=4.99
>> >> >> > >  1                                     column=pay:customer_id,
>> >> >> > > timestamp=1485129654025, value=1
>> >> >> > >  1                                     column=pay:last_update,
>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >> >> > >  1                                     column=pay:payment_date,
>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> >> > >  1                                     column=pay:staff_id,
>> >> >> > > timestamp=1485129654025, value=1
>> >> >> > >  10                                    column=pay:amount,
>> >> >> > > timestamp=1485129504390, value=5.99
>> >> >> > >  10                                    column=pay:customer_id,
>> >> >> > > timestamp=1485129504390, value=1
>> >> >> > >  10                                    column=pay:last_update,
>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >> >> > >  10                                    column=pay:payment_date,
>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> >> > >  10                                    column=pay:rental_id,
>> >> >> > > timestamp=1485129504390, value=126
>> >> >> > >  10                                    column=pay:staff_id,
>> >> >> > > timestamp=1485129504390, value=2
>> >> >> > >
>> >> >> > >
>> >> >> > > Please let me know if anything further is required.
>> >> >> > >
>> >> >> > >
>> >> >> > > Thanks,
>> >> >> > > Jilani
>> >> >> > >
>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>> >> >> > > liz.szilagyi@cloudera.com> wrote:
>> >> >> > >
>> >> >> > >> Hi Jilani,
>> >> >> > >> I'm not sure I completely understand what you are trying to do.
>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
>> >> >> > >> example data showing the changes that happen compared to the
>> >> >> > >> changes you'd like to see?
>> >> >> > >> Thanks,
>> >> >> > >> Liz
>> >> >> > >>
>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> > >> wrote:
>> >> >> > >>
>> >> >> > >> >
>> >> >> > >> > Please help in resolving the issue. Going through the source code,
>> >> >> > >> > it seems the required behavior is missing, but I am not sure
>> >> >> > >> > whether it was left out deliberately for some reason.
>> >> >> > >> >
>> >> >> > >> > Please suggest how to handle this scenario.
>> >> >> > >> >
>> >> >> > >> > Thanks,
>> >> >> > >> > Jilani
>> >> >> > >> >
>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>> >> >> > >> > wrote:
>> >> >> > >> >
>> >> >> > >> >> Hi,
>> >> >> > >> >>
>> >> >> > >> >> We have a scenario where we are importing data into HBase with Sqoop
>> >> >> > >> >> incremental import.
>> >> >> > >> >>
>> >> >> > >> >> Let's say we imported a table, and later some columns of the source
>> >> >> > >> >> table were updated to NULL for some rows. When we then run the
>> >> >> > >> >> incremental import, by HBase semantics these columns should no
>> >> >> > >> >> longer be present in the HBase table. But right now they remain,
>> >> >> > >> >> with their previous values.
>> >> >> > >> >>
>> >> >> > >> >> Is there any fix to overcome this issue?
>> >> >> > >> >>
>> >> >> > >> >>
>> >> >> > >> >> Thanks,
>> >> >> > >> >> Jilani
>> >> >> > >> >>
>> >> >> > >> >
>> >> >> > >> >
>> >> >> > >>
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>
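The behavior described in the quoted example can be modelled without a cluster. In HBase a SQL NULL has no cell of its own, so a column that becomes NULL can only disappear from a row if the import deletes the existing cell; if the NULL is simply skipped (the current behavior described in this thread), the previously imported value survives. The sketch below is a hypothetical, HBase-free model: the class `NullColumnMerge` and its `deleteNulls` switch are illustrative assumptions, not Sqoop code and not the patch discussed here. It represents a row as qualifier-to-value strings and ignores timestamps and cell versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model (not Sqoop code) of merging a re-imported source row
// into an existing HBase row. Cells are qualifier -> value strings; a null
// value in the incoming row stands for a SQL NULL in the source.
public class NullColumnMerge {

    // stored:      cells currently in HBase for the row key (not modified)
    // incoming:    the re-imported source row
    // deleteNulls: false = skip NULL columns (behavior described in the thread)
    //              true  = remove the stored cell (behavior being requested)
    static Map<String, String> merge(Map<String, String> stored,
                                     Map<String, String> incoming,
                                     boolean deleteNulls) {
        Map<String, String> result = new LinkedHashMap<>(stored);
        for (Map.Entry<String, String> col : incoming.entrySet()) {
            if (col.getValue() != null) {
                // Non-NULL value: corresponds to a Put of the new cell value.
                result.put(col.getKey(), col.getValue());
            } else if (deleteNulls) {
                // NULL value: corresponds to deleting the existing cell.
                result.remove(col.getKey());
            }
            // Otherwise the NULL is skipped and any old value survives.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("pay:amount", "4.99");
        stored.put("pay:rental_id", "573");

        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("pay:amount", "4.99");
        incoming.put("pay:rental_id", null); // rental_id became NULL in the source

        // prints: skip NULLs:   {pay:amount=4.99, pay:rental_id=573}
        System.out.println("skip NULLs:   " + merge(stored, incoming, false));
        // prints: delete NULLs: {pay:amount=4.99}
        System.out.println("delete NULLs: " + merge(stored, incoming, true));
    }
}
```

With the rowkey "1" data from the example, skip mode keeps `pay:rental_id=573`, while delete mode drops it, which is the result Jilani expects. In a real implementation the delete branch would presumably become an HBase `Delete` mutation for the affected column, issued alongside the `Put` for the non-NULL columns.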

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

I suspect that you have an old version of Avro or even multiple Avro
versions on your classpath and thus Sqoop uses an older one.
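One way to check this suspicion, sketched here as a hypothetical helper (`AvroClasspathCheck` is not part of Sqoop), is to ask the JVM where it loads `org.apache.avro.LogicalType` from and which Avro jars it sees, using approximately the same classpath Sqoop runs with. As far as I know, `LogicalType` was added in Avro 1.8, so a NOT FOUND result means no jar on that classpath provides it (for example because only an older avro-1.7.x jar is present), and a listing with several Avro jars points to the mixed-version situation described above.

```java
import java.io.File;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic helper, not part of Sqoop: reports whether
// org.apache.avro.LogicalType is loadable and which classpath entries
// look like Avro jars.
public class AvroClasspathCheck {

    // Returns the location the class is loaded from, "<bootstrap>" for JDK
    // classes that have no CodeSource, or null if the class is missing.
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> cls = Class.forName(className, false,
                    AvroClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "<bootstrap>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    // Lists classpath entries whose file name contains "avro"
    // (e.g. avro-1.7.7.jar, avro-mapred-1.8.1-hadoop2.jar).
    static List<String> avroEntries(String classPath) {
        List<String> hits = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            String name = new File(entry).getName().toLowerCase(Locale.ROOT);
            if (name.contains("avro")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String where = locate("org.apache.avro.LogicalType");
        System.out.println("org.apache.avro.LogicalType: "
                + (where == null ? "NOT FOUND on this classpath" : where));
        for (String jar : avroEntries(System.getProperty("java.class.path"))) {
            System.out.println("Avro entry on classpath: " + jar);
        }
    }
}
```

For example, compile it and run `java -cp "$(hadoop classpath):/path/to/sqoop/lib/*:." AvroClasspathCheck`, where the Sqoop lib path is a placeholder for your installation and the result only approximates the classpath the `sqoop` launcher script builds.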

Could you please provide a list of the exact commands you have performed so
that I can reproduce the issue?

Thanks,
Bogi

On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Can someone give me pointers on what I am missing with the trunk build
> versus the 1.4.6 build? The trunk build gives the error mentioned in the
> mail chain below.
>
> I followed the same ant target to prepare the jar for both branches, but
> the 1.4.6 jar is still different from the 1.4.7 jar created from trunk.
>
> Thanks,
> Jilani
>
>
> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com> wrote:
>
> > Hi Bogi,
> >
> > I am getting the error below when I prepare the jar from trunk and try to
> > run a sqoop import of a MySQL database table, whereas similar changes
> > work with branch 1.4.6.
> >
> >
> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >         ... 11 more
> >
> > Please let me know what is missing and how to resolve this exception. Let
> > me know if you need further details.
> >
> > Thanks,
> > Jilani
> >
> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
> > wrote:
> >
> >> Hi Jilani,
> >>
> >> This is an example: SQOOP-3053
> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
> >> trunk, as it will be used to cut the future release, so your patch
> >> definitely needs to be able to apply to it.
> >>
> >> Thanks,
> >> Bogi
> >>
> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
> >> wrote:
> >>
> >> > Hi Bogi,
> >> >
> >> > Can you point me to sample JIRA tickets and review requests similar to
> >> > this, so I can proceed further?
> >> >
> >> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch of
> >> > the Sqoop git repository. If you suggest the right branch, I will take
> >> > the code from there and apply the changes before submitting the review
> >> > request.
> >> >
> >> > Thanks,
> >> > Jilani
> >> >
> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
> >> > wrote:
> >> >
> >> >> Hi Jilani,
> >> >>
> >> >> To get your change committed please do the following:
> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
> >> >> * Create a review request at Apache's review board
> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
> >> >> JIRA ticket
> >> >>
> >> >> Please consider the guidelines below:
> >> >>
> >> >> Review board
> >> >> * Summary: generate your summary using the issue's jira key + jira
> >> >> title
> >> >> * Groups: add the relevant group so everyone on the project will know
> >> >> about your patch (Sqoop)
> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the jira
> >> >> side
> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
> >> >> * And as soon as the patch gets committed, it's very useful for the
> >> >> community if you close the review and mark it as "Submitted" at the
> >> >> Review board. The button to do this is top right at your own tickets,
> >> >> right next to the Download Diff button.
> >> >>
> >> >> Jira
> >> >> * Link: please add the link of the review as an external/web link so
> >> >> it's easy to navigate to the reviews side
> >> >> * Status: mark it as "patch available"
> >> >>
> >> >> Sqoop community will receive emails about your new ticket and review
> >> >> request and will review your change.
> >> >>
> >> >> Thanks,
> >> >> Bogi
> >> >>
> >> >>
> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Is there any update?
> >> >> >
> >> >> > I checked out the 1.4.6 code, made code changes to achieve this and
> >> >> > tested them in a cluster; it is working as expected. Is there a way I
> >> >> > can contribute this as a patch so that the committers can validate it
> >> >> > further and suggest any changes required to move forward? Please
> >> >> > suggest the approach.
> >> >> >
> >> >> > Thanks,
> >> >> > Jilani
> >> >> >
> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Liz,
> >> >> > >
> >> >> > > Let's say we inserted data into a table with an initial import; in
> >> >> > > the HBase shell it looks like this:
> >> >> > >
> >> >> > >  1                                     column=pay:amount,
> >> >> > > timestamp=1485129654025, value=4.99
> >> >> > >  1                                     column=pay:customer_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  1                                     column=pay:last_update,
> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >> >> > >  1                                     column=pay:payment_date,
> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> >> > >  1                                     column=pay:rental_id,
> >> >> > > timestamp=1485129654025, value=573
> >> >> > >  1                                     column=pay:staff_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  10                                    column=pay:amount,
> >> >> > > timestamp=1485129504390, value=5.99
> >> >> > >  10                                    column=pay:customer_id,
> >> >> > > timestamp=1485129504390, value=1
> >> >> > >  10                                    column=pay:last_update,
> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >> >> > >  10                                    column=pay:payment_date,
> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> >> > >  10                                    column=pay:rental_id,
> >> >> > > timestamp=1485129504390, value=4526
> >> >> > >  10                                    column=pay:staff_id,
> >> >> > > timestamp=1485129504390, value=2
> >> >> > >
> >> >> > >
> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
> >> >> > > and then we do an incremental import into HBase. With the current
> >> >> > > import, the final HBase data after the incremental import looks like
> >> >> > > this:
> >> >> > >
> >> >> > >  1                                     column=pay:amount,
> >> >> > > timestamp=1485129654025, value=4.99
> >> >> > >  1                                     column=pay:customer_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  1                                     column=pay:last_update,
> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >> >> > >  1                                     column=pay:payment_date,
> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> >> > >  1                                     column=pay:rental_id,
> >> >> > > timestamp=1485129654025, value=573
> >> >> > >  1                                     column=pay:staff_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  10                                    column=pay:amount,
> >> >> > > timestamp=1485129504390, value=5.99
> >> >> > >  10                                    column=pay:customer_id,
> >> >> > > timestamp=1485129504390, value=1
> >> >> > >  10                                    column=pay:last_update,
> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >> >> > >  10                                    column=pay:payment_date,
> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> >> > >  10                                    column=pay:rental_id,
> >> >> > > timestamp=1485129504390, value=126
> >> >> > >  10                                    column=pay:staff_id,
> >> >> > > timestamp=1485129504390, value=2
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > As the source column "rental_id" becomes NULL for rowkey "1", the
> >> >> > > final HBase data should not have "rental_id" for rowkey "1". I am
> >> >> > > expecting the data below for these rowkeys:
> >> >> > >
> >> >> > >
> >> >> > >  1                                     column=pay:amount,
> >> >> > > timestamp=1485129654025, value=4.99
> >> >> > >  1                                     column=pay:customer_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  1                                     column=pay:last_update,
> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >> >> > >  1                                     column=pay:payment_date,
> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> >> > >  1                                     column=pay:staff_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  10                                    column=pay:amount,
> >> >> > > timestamp=1485129504390, value=5.99
> >> >> > >  10                                    column=pay:customer_id,
> >> >> > > timestamp=1485129504390, value=1
> >> >> > >  10                                    column=pay:last_update,
> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >> >> > >  10                                    column=pay:payment_date,
> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> >> > >  10                                    column=pay:rental_id,
> >> >> > > timestamp=1485129504390, value=126
> >> >> > >  10                                    column=pay:staff_id,
> >> >> > > timestamp=1485129504390, value=2
> >> >> > >
> >> >> > >
> >> >> > > Please let me know if anything further is required.
> >> >> > >
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Jilani
> >> >> > >
> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> >> >> > > liz.szilagyi@cloudera.com> wrote:
> >> >> > >
> >> >> > >> Hi Jilani,
> >> >> > >> I'm not sure I completely understand what you are trying to do.
> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
> >> >> > >> example data showing the changes that happen compared to the
> >> >> > >> changes you'd like to see?
> >> >> > >> Thanks,
> >> >> > >> Liz
> >> >> > >>
> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
> >> >> > >> wrote:
> >> >> > >>
> >> >> > >> >
> >> >> > >> > Please help in resolving the issue. Going through the source code,
> >> >> > >> > it seems the required behavior is missing, but I am not sure
> >> >> > >> > whether it was left out deliberately for some reason.
> >> >> > >> >
> >> >> > >> > Please suggest how to handle this scenario.
> >> >> > >> >
> >> >> > >> > Thanks,
> >> >> > >> > Jilani
> >> >> > >> >
> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
> >> >> > >> > wrote:
> >> >> > >> >
> >> >> > >> >> Hi,
> >> >> > >> >>
> >> >> > >> >> We have a scenario where we are importing data into HBase with Sqoop
> >> >> > >> >> incremental import.
> >> >> > >> >>
> >> >> > >> >> Let's say we imported a table, and later some columns of the source
> >> >> > >> >> table were updated to NULL for some rows. When we then run the
> >> >> > >> >> incremental import, by HBase semantics these columns should no
> >> >> > >> >> longer be present in the HBase table. But right now they remain,
> >> >> > >> >> with their previous values.
> >> >> > >> >>
> >> >> > >> >> Is there any fix to overcome this issue?
> >> >> > >> >>
> >> >> > >> >>
> >> >> > >> >> Thanks,
> >> >> > >> >> Jilani
> >> >> > >> >>
> >> >> > >> >
> >> >> > >> >
> >> >> > >>
> >> >> > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

I suspect that you have an old version of Avro or even multiple Avro
versions on your classpath and thus Sqoop uses an older one.

Could you please provide a list of the exact commands you have performed so
that I can reproduce the issue?

Thanks,
Bogi

On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Can some one provide me the pointers what am I missing with trunk vs 1.4.6
> builds, which is giving some error as mentioned in below mail chain.
>
> I did followed the same ant target to prepare jar for both branches, but
> even though 1.4.6 jar is different to 1.4.7 which is created form trunk.
>
> Thanks,
> Jilani
>
>
> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com> wrote:
>
> > Hi Bogi,
> >
> > I am getting below error, when I have prepared jar from trunk and try to
> > do sqoop import with mysql database table and got below exception, where
> as
> > similar changes are working with branch 1.4.6.
> >
> >
> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the
> > command-line is insecure. Consider using -P instead.
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
> > com.cloudera.sqoop.manager.DefaultManagerFactory
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for
> > Oracle and Hadoop can be called by Sqoop!
> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
> > com.cloudera.sqoop.manager.DefaultManagerFactory
> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with
> scheme:
> > jdbc:mysql:
> > Exception in thread "main" java.lang.NoClassDefFoundError:
> > org/apache/avro/LogicalType
> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(
> > DefaultManagerFactory.java:67)
> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
> >         at org.apache.sqoop.tool.BaseSqoopTool.init(
> > BaseSqoopTool.java:270)
> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >         ... 11 more
> >
> > Please let me know what is missing and how to resolve this exception, Let
> > me know if you need further details.
> >
> > Thanks,
> > Jilani
> >
> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com>
> wrote:
> >
> >> Hi Jilani,
> >>
> >> This is an example: SQOOP-3053
> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes
> on
> >> trunk as it will be used to cut the future release so your patch
> >> definitely
> >> needs to be be able to apply on it.
> >>
> >> Thanks,
> >> Bogi
> >>
> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
> >> wrote:
> >>
> >> > Hi Bogi,
> >> >
> >> > Can you provide me sample Jira tickets and Review requests similar to
> >> > this, to proceed further.
> >> >
> >> > I applied the code changes from sqoop git from this branch
> >> > "sqoop-release-1.4.6-rc0", If you suggest right branch I will take the
> >> code
> >> > from there and apply the changes before submit review for request.
> >> >
> >> > Thanks,
> >> > Jilani
> >> >
> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
> >> wrote:
> >> >
> >> >> Hi Jilani,
> >> >>
> >> >> To get your change committed please do the following:
> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
> >> >> * Create a review request at Apache's review board
> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
> >> JIRA
> >> >>
> >> >> ticket
> >> >>
> >> >> Please consider the guidelines below:
> >> >>
> >> >> Review board
> >> >> * Summary: generate your summary using the issue's jira key + jira
> >> title
> >> >> * Groups: add the relevant group so everyone on the project will know
> >> >> about
> >> >> your patch (Sqoop)
> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the jira
> >> side
> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
> >> >> * And as soon as the patch gets committed, it's very useful for the
> >> >> community if you close the review and mark it as "Submitted" at the
> >> Review
> >> >> board. The button to do this is top right at your own tickets, right
> >> next
> >> >> to  the Download Diff button.
> >> >>
> >> >> Jira
> >> >> * Link: please add the link of the review as an external/web link so
> >> it's
> >> >> easy to navigate to the reviews side
> >> >> * Status: mark it as "patch available"
> >> >>
> >> >> Sqoop community will receive emails about your new ticket and review
> >> >> request and will review your change.
> >> >>
> >> >> Thanks,
> >> >> Bogi
> >> >>
> >> >>
> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Do we have any update?
> >> >> >
> >> >> > I did checkout of the 1.4.6 code and done code changes to achieve
> >> this
> >> >> and
> >> >> > tested in cluster and it is working as expected. Is there a way I
> can
> >> >> > contribute this as a patch and then the committers can validate
> >> further
> >> >> and
> >> >> > suggest if any changes required to move further. Please suggest the
> >> >> > approach.
> >> >> >
> >> >> > Thanks,
> >> >> > Jilani
> >> >> >
> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <
> jilani2423@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Liz,
> >> >> > >
> >> >> > > lets say we inserted data in a table with initial import, that
> >> looks
> >> >> like
> >> >> > > this in hbase shell
> >> >> > >
> >> >> > >  1                                     column=pay:amount,
> >> >> > > timestamp=1485129654025, value=4.99
> >> >> > >  1                                     column=pay:customer_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  1                                     column=pay:last_update,
> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >> >> > >  1                                     column=pay:payment_date,
> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> >> > >  1                                     column=pay:rental_id,
> >> >> > > timestamp=1485129654025, value=573
> >> >> > >  1                                     column=pay:staff_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  10                                    column=pay:amount,
> >> >> > > timestamp=1485129504390, value=5.99
> >> >> > >  10                                    column=pay:customer_id,
> >> >> > > timestamp=1485129504390, value=1
> >> >> > >  10                                    column=pay:last_update,
> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >> >> > >  10                                    column=pay:payment_date,
> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> >> > >  10                                    column=pay:rental_id,
> >> >> > > timestamp=1485129504390, value=4526
> >> >> > >  10                                    column=pay:staff_id,
> >> >> > > timestamp=1485129504390, value=2
> >> >> > >
> >> >> > >
> >> >> > > now assume that in source rental_id becomes NULL for rowkey "1",
> >> and
> >> >> then
> >> >> > > we are doing incremental import into HBase. With current import
> the
> >> >> final
> >> >> > > HBase data after incremental import will look like this.
> >> >> > >
> >> >> > >  1                                     column=pay:amount,
> >> >> > > timestamp=1485129654025, value=4.99
> >> >> > >  1                                     column=pay:customer_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  1                                     column=pay:last_update,
> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >> >> > >  1                                     column=pay:payment_date,
> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> >> > >  1                                     column=pay:rental_id,
> >> >> > > timestamp=1485129654025, value=573
> >> >> > >  1                                     column=pay:staff_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  10                                    column=pay:amount,
> >> >> > > timestamp=1485129504390, value=5.99
> >> >> > >  10                                    column=pay:customer_id,
> >> >> > > timestamp=1485129504390, value=1
> >> >> > >  10                                    column=pay:last_update,
> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >> >> > >  10                                    column=pay:payment_date,
> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> >> > >  10                                    column=pay:rental_id,
> >> >> > > timestamp=1485129504390, value=126
> >> >> > >  10                                    column=pay:staff_id,
> >> >> > > timestamp=1485129504390, value=2
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > As source column "rental_id" becomes NULL for rowkey "1", the
> final
> >> >> HBase
> >> >> > > should not have the "rental_id" for this rowkey "1". I am
> expecting
> >> >> below
> >> >> > > data for these rowkeys.
> >> >> > >
> >> >> > >
> >> >> > >  1                                     column=pay:amount,
> >> >> > > timestamp=1485129654025, value=4.99
> >> >> > >  1                                     column=pay:customer_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  1                                     column=pay:last_update,
> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >> >> > >  1                                     column=pay:payment_date,
> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> >> > >  1                                     column=pay:staff_id,
> >> >> > > timestamp=1485129654025, value=1
> >> >> > >  10                                    column=pay:amount,
> >> >> > > timestamp=1485129504390, value=5.99
> >> >> > >  10                                    column=pay:customer_id,
> >> >> > > timestamp=1485129504390, value=1
> >> >> > >  10                                    column=pay:last_update,
> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >> >> > >  10                                    column=pay:payment_date,
> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> >> > >  10                                    column=pay:rental_id,
> >> >> > > timestamp=1485129504390, value=126
> >> >> > >  10                                    column=pay:staff_id,
> >> >> > > timestamp=1485129504390, value=2
> >> >> > >
> >> >> > >
> >> >> > > Please let me know if anything further is required.
> >> >> > >
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Jilani
> >> >> > >
> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> >> >> > > liz.szilagyi@cloudera.com> wrote:
> >> >> > >
> >> >> > >> Hi Jilani,
> >> >> > >> I'm not sure I completely understand what you are trying to do.
> >> >> > >> Could you give us some examples with, e.g., 4 columns and 2 rows
> >> >> > >> of example data, showing the changes that happen compared to the
> >> >> > >> changes you'd like to see?
> >> >> > >> Thanks,
> >> >> > >> Liz
> >> >> > >>
> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <
> >> jilani2423@gmail.com>
> >> >> > >> wrote:
> >> >> > >>
> >> >> > >> >
> >> >> > >> > Please help me resolve this issue. Going through the source
> >> >> > >> > code, the required behavior seems to be missing, but I am not
> >> >> > >> > sure whether it was left out for some reason.
> >> >> > >> >
> >> >> > >> > Please suggest how to handle this scenario.
> >> >> > >> >
> >> >> > >> > Thanks,
> >> >> > >> > Jilani
> >> >> > >> >
> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <
> >> >> jilani2423@gmail.com>
> >> >> > >> > wrote:
> >> >> > >> >
> >> >> > >> >> Hi,
> >> >> > >> >>
> >> >> > >> >> We have a scenario where we import data into HBase with Sqoop
> >> >> > >> >> incremental import.
> >> >> > >> >>
> >> >> > >> >> Let's say we imported a table, and later some columns in the
> >> >> > >> >> source table were updated to NULL for some rows. After the next
> >> >> > >> >> incremental import, these columns should no longer exist in the
> >> >> > >> >> HBase table, since in HBase a NULL value is represented by the
> >> >> > >> >> absence of the cell. Right now, however, they remain with their
> >> >> > >> >> previous values.
> >> >> > >> >>
> >> >> > >> >> Is there any fix for this issue?
> >> >> > >> >>
> >> >> > >> >>
> >> >> > >> >> Thanks,
> >> >> > >> >> Jilani
> >> >> > >> >>
> >> >> > >> >
> >> >> > >> >
> >> >> > >>
> >> >> > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>
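
A minimal sketch, in Java since Sqoop is a Java project, of the behaviour requested in this thread. It does not use the HBase client or Sqoop's actual import code: a source row is modelled as a map from a `family:qualifier` string to a value, where `null` stands for a SQL NULL. With `deleteNulls` set to false, null columns are simply skipped, which mirrors the current result described above (the stale `rental_id` survives); with it set to true, null columns become deletes. `NullAwareRowPlanner`, `RowPlan`, and `deleteNulls` are illustrative names, not Sqoop APIs. In a real HBase client the two lists would presumably map onto a `Put` and a `Delete` mutation for the same row key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, HBase-free model of an incremental import into one row.
 * Not Sqoop's implementation; only illustrates the NULL-handling choice.
 */
public class NullAwareRowPlanner {

    /** Cells to write and qualifiers to delete for one source row. */
    public static final class RowPlan {
        public final Map<String, String> puts = new LinkedHashMap<>();
        public final List<String> deletes = new ArrayList<>();
    }

    /** Splits a source row into puts (non-null) and, optionally, deletes (null). */
    public static RowPlan plan(Map<String, String> sourceRow, boolean deleteNulls) {
        RowPlan plan = new RowPlan();
        for (Map.Entry<String, String> e : sourceRow.entrySet()) {
            if (e.getValue() != null) {
                plan.puts.put(e.getKey(), e.getValue());
            } else if (deleteNulls) {
                plan.deletes.add(e.getKey());
            }
            // else: the NULL column is skipped and any previously stored cell survives
        }
        return plan;
    }

    /** Applies a plan to an in-memory stand-in for one stored HBase row. */
    public static void apply(Map<String, String> storedRow, RowPlan plan) {
        storedRow.putAll(plan.puts);
        for (String qualifier : plan.deletes) {
            storedRow.remove(qualifier);
        }
    }

    /** A subset of rowkey "1" as stored after the initial import. */
    private static Map<String, String> initialRow() {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("pay:amount", "4.99");
        row.put("pay:rental_id", "573");
        row.put("pay:staff_id", "1");
        return row;
    }

    public static void main(String[] args) {
        // Incremental import: rental_id is now NULL at the source.
        Map<String, String> source = new LinkedHashMap<>();
        source.put("pay:amount", "4.99");
        source.put("pay:rental_id", null);
        source.put("pay:staff_id", "1");

        Map<String, String> skipNulls = initialRow();
        apply(skipNulls, plan(source, false));
        // prints: skip nulls:   {pay:amount=4.99, pay:rental_id=573, pay:staff_id=1}
        System.out.println("skip nulls:   " + skipNulls);

        Map<String, String> deleteNulls = initialRow();
        apply(deleteNulls, plan(source, true));
        // prints: delete nulls: {pay:amount=4.99, pay:staff_id=1}
        System.out.println("delete nulls: " + deleteNulls);
    }
}
```

Running `main` shows `pay:rental_id=573` retained in the first case and removed in the second, which matches the "current" and "expected" HBase shell listings in the thread.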

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Can someone give me pointers on what I am missing between the trunk and 1.4.6
builds? The trunk build gives the error shown in the mail chain below.

I followed the same ant target to build the jar on both branches, but the
resulting 1.4.6 jar still differs from the 1.4.7 jar built from trunk.

Thanks,
Jilani


On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> I get the error below when I build the jar from trunk and run a sqoop
> import from a MySQL database table, whereas the same changes work on
> branch 1.4.6.
>
>
> 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
> 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
> 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
> 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
> 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
> 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
> 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
> 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
> Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 11 more
>
> Please let me know what is missing and how to resolve this exception. Let
> me know if you need further details.
>
> Thanks,
> Jilani
>
> On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>
>> Hi Jilani,
>>
>> This is an example: SQOOP-3053
>> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
>> trunk, as it will be used to cut the future release, so your patch
>> definitely needs to apply to it.
>>
>> Thanks,
>> Bogi
>>
>> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>> > Hi Bogi,
>> >
>> > Can you point me to sample JIRA tickets and review requests like this,
>> > so I can proceed further?
>> >
>> > I applied my code changes to the "sqoop-release-1.4.6-rc0" branch of
>> > the Sqoop git repository. If you suggest the right branch, I will take
>> > the code from there and apply my changes before submitting a review
>> > request.
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com>
>> wrote:
>> >
>> >> Hi Jilani,
>> >>
>> >> To get your change committed, please do the following:
>> >> * Open a JIRA ticket for your change in Apache's JIRA system
>> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>> >> * Create a review request at Apache's review board
>> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>> >> JIRA ticket
>> >>
>> >> Please consider the guidelines below:
>> >>
>> >> Review board
>> >> * Summary: generate your summary using the issue's JIRA key + JIRA title
>> >> * Groups: add the relevant group (Sqoop) so everyone on the project will
>> >> know about your patch
>> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
>> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>> >> * As soon as the patch gets committed, it's very useful for the
>> >> community if you close the review and mark it as "Submitted" on the
>> >> Review Board. The button to do this is at the top right of your own
>> >> tickets, right next to the Download Diff button.
>> >>
>> >> Jira
>> >> * Link: please add the link of the review as an external/web link so it's
>> >> easy to navigate to the review side
>> >> * Status: mark it as "Patch Available"
>> >>
>> >> The Sqoop community will receive emails about your new ticket and review
>> >> request and will review your change.
>> >>
>> >> Thanks,
>> >> Bogi
>> >>
>> >>
>> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com>
>> >> wrote:
>> >>
>> >> > Do we have any update?
>> >> >
>> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
>> >> > tested them on a cluster, where they work as expected. Is there a way
>> >> > I can contribute this as a patch, so that the committers can validate
>> >> > it further and suggest any changes required to move forward? Please
>> >> > suggest the approach.
>> >> >
>> >> > Thanks,
>> >> > Jilani
>> >> >
>> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <ji...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Hi Liz,
>> >> > >
>> >> > > Let's say we inserted data into a table with an initial import; in
>> >> > > the HBase shell it looks like this:
>> >> > >
>> >> > >  1                                     column=pay:amount,
>> >> > > timestamp=1485129654025, value=4.99
>> >> > >  1                                     column=pay:customer_id,
>> >> > > timestamp=1485129654025, value=1
>> >> > >  1                                     column=pay:last_update,
>> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
>> >> > >  1                                     column=pay:payment_date,
>> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> > >  1                                     column=pay:rental_id,
>> >> > > timestamp=1485129654025, value=573
>> >> > >  1                                     column=pay:staff_id,
>> >> > > timestamp=1485129654025, value=1
>> >> > >  10                                    column=pay:amount,
>> >> > > timestamp=1485129504390, value=5.99
>> >> > >  10                                    column=pay:customer_id,
>> >> > > timestamp=1485129504390, value=1
>> >> > >  10                                    column=pay:last_update,
>> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
>> >> > >  10                                    column=pay:payment_date,
>> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> > >  10                                    column=pay:rental_id,
>> >> > > timestamp=1485129504390, value=4526
>> >> > >  10                                    column=pay:staff_id,
>> >> > > timestamp=1485129504390, value=2
>> >> > >
>> >> > >
>> >> > > Now assume that rental_id becomes NULL in the source for rowkey "1",
>> >> > > and we then run an incremental import into HBase. With the current
>> >> > > import, the final HBase data after the incremental import looks like
>> >> > > this:
>> >> > >
>> >> > >  1                                     column=pay:amount,
>> >> > > timestamp=1485129654025, value=4.99
>> >> > >  1                                     column=pay:customer_id,
>> >> > > timestamp=1485129654025, value=1
>> >> > >  1                                     column=pay:last_update,
>> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >> > >  1                                     column=pay:payment_date,
>> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> > >  1                                     column=pay:rental_id,
>> >> > > timestamp=1485129654025, value=573
>> >> > >  1                                     column=pay:staff_id,
>> >> > > timestamp=1485129654025, value=1
>> >> > >  10                                    column=pay:amount,
>> >> > > timestamp=1485129504390, value=5.99
>> >> > >  10                                    column=pay:customer_id,
>> >> > > timestamp=1485129504390, value=1
>> >> > >  10                                    column=pay:last_update,
>> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >> > >  10                                    column=pay:payment_date,
>> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> > >  10                                    column=pay:rental_id,
>> >> > > timestamp=1485129504390, value=126
>> >> > >  10                                    column=pay:staff_id,
>> >> > > timestamp=1485129504390, value=2
>> >> > >
>> >> > >
>> >> > >
>> >> > > Since the source column "rental_id" has become NULL for rowkey "1",
>> >> > > the final HBase table should not contain "rental_id" for rowkey
>> >> > > "1". I expect the following data for these rowkeys:
>> >> > >
>> >> > >
>> >> > >  1                                     column=pay:amount,
>> >> > > timestamp=1485129654025, value=4.99
>> >> > >  1                                     column=pay:customer_id,
>> >> > > timestamp=1485129654025, value=1
>> >> > >  1                                     column=pay:last_update,
>> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> >> > >  1                                     column=pay:payment_date,
>> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> >> > >  1                                     column=pay:staff_id,
>> >> > > timestamp=1485129654025, value=1
>> >> > >  10                                    column=pay:amount,
>> >> > > timestamp=1485129504390, value=5.99
>> >> > >  10                                    column=pay:customer_id,
>> >> > > timestamp=1485129504390, value=1
>> >> > >  10                                    column=pay:last_update,
>> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> >> > >  10                                    column=pay:payment_date,
>> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> >> > >  10                                    column=pay:rental_id,
>> >> > > timestamp=1485129504390, value=126
>> >> > >  10                                    column=pay:staff_id,
>> >> > > timestamp=1485129504390, value=2
>> >> > >
>> >> > >
>> >> > > Please let me know if anything further is required.
>> >> > >
>> >> > >
>> >> > > Thanks,
>> >> > > Jilani
>> >> > >
>> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>> >> > > liz.szilagyi@cloudera.com> wrote:
>> >> > >
>> >> > >> Hi Jilani,
>> >> > >> I'm not sure I completely understand what you are trying to do.
>> >> > >> Could you give us some examples with, e.g., 4 columns and 2 rows
>> >> > >> of example data, showing the changes that happen compared to the
>> >> > >> changes you'd like to see?
>> >> > >> Thanks,
>> >> > >> Liz
>> >> > >>
>> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <
>> jilani2423@gmail.com>
>> >> > >> wrote:
>> >> > >>
>> >> > >> >
>> >> > >> > Please help me resolve this issue. Going through the source
>> >> > >> > code, the required behavior seems to be missing, but I am not
>> >> > >> > sure whether it was left out for some reason.
>> >> > >> >
>> >> > >> > Please suggest how to handle this scenario.
>> >> > >> >
>> >> > >> > Thanks,
>> >> > >> > Jilani
>> >> > >> >
>> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <
>> >> jilani2423@gmail.com>
>> >> > >> > wrote:
>> >> > >> >
>> >> > >> >> Hi,
>> >> > >> >>
>> >> > >> >> We have a scenario where we import data into HBase with Sqoop
>> >> > >> >> incremental import.
>> >> > >> >>
>> >> > >> >> Let's say we imported a table, and later some columns in the
>> >> > >> >> source table were updated to NULL for some rows. After the next
>> >> > >> >> incremental import, these columns should no longer exist in the
>> >> > >> >> HBase table, since in HBase a NULL value is represented by the
>> >> > >> >> absence of the cell. Right now, however, they remain with their
>> >> > >> >> previous values.
>> >> > >> >>
>> >> > >> >> Is there any fix for this issue?
>> >> > >> >>
>> >> > >> >>
>> >> > >> >> Thanks,
>> >> > >> >> Jilani
>> >> > >> >>
>> >> > >> >
>> >> > >> >
>> >> > >>
>> >> > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>
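
The trace shows `DefaultManagerFactory.accept` failing because `org.apache.avro.LogicalType` cannot be loaded. As far as we know, `LogicalType` was introduced in Avro 1.8, so one plausible but unconfirmed explanation is that the trunk build was compiled against a newer Avro than the one that ends up on the runtime classpath, for example an older Avro jar shipped with the Hadoop installation. The small Java program below is a hedged diagnostic sketch, not part of Sqoop (`ClasspathCheck` and `locate` are hypothetical names): it reports whether a class can be loaded and from which jar, so the 1.4.6 and trunk setups can be compared.

```java
import java.security.CodeSource;

/**
 * Diagnostic sketch: reports whether a class is visible on the current
 * classpath and, if so, where it was loaded from.
 */
public class ClasspathCheck {

    /** Returns the code-source location, a placeholder if unknown, or null if the class is absent. */
    public static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "<JDK or unknown location>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError also covers e.g. NoClassDefFoundError for a missing superclass
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the Avro classes relevant to the stack trace above.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.avro.Schema", "org.apache.avro.LogicalType"};
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

For example, compile it and run it with the same jars the sqoop launcher uses, such as `java -cp "$(hadoop classpath):/path/to/sqoop/lib/*:." ClasspathCheck`, where `/path/to/sqoop` is a placeholder. If `org.apache.avro.Schema` resolves to an older Avro jar while `LogicalType` is reported as NOT FOUND, a mismatched Avro version is the likely culprit.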

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

I get the error below when I build the jar from trunk and run a sqoop import
from a MySQL database table, whereas the same changes work on branch 1.4.6.


17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: org.apache.sqoop.manager.oracle.OraOopManagerFactory
17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop can be called by Sqoop!
17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
        at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
        at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
        at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
        at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 11 more

Please let me know what is missing and how to resolve this exception. Let
me know if you need further details.

Thanks,
Jilani

On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bo...@cloudera.com> wrote:

> Hi Jilani,
>
> This is an example: SQOOP-3053
> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
> <https://reviews.apache.org/r/54206/> linked. Please make your changes on
> trunk, as it will be used to cut the future release, so your patch
> definitely needs to apply to it.
>
> Thanks,
> Bogi
>
> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com> wrote:
>
> > Hi Bogi,
> >
> > Can you provide some sample JIRA tickets and review requests similar to
> > this, so I can proceed further?
> >
> > I applied the code changes on the "sqoop-release-1.4.6-rc0" branch of the
> > Sqoop git repository. If you suggest the right branch, I will take the
> > code from there and apply the changes before submitting the review request.
> >
> > Thanks,
> > Jilani
> >
> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
> >
> >> Hi Jilani,
> >>
> >> To get your change committed, please do the following:
> >> * Open a JIRA ticket for your change in Apache's JIRA system
> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
> >> * Create a review request at Apache's review board
> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
> >> JIRA ticket
> >>
> >> Please consider the guidelines below:
> >>
> >> Review board
> >> * Summary: generate your summary using the issue's JIRA key + JIRA title
> >> * Groups: add the relevant group (Sqoop) so everyone on the project will
> >> know about your patch
> >> * Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
> >> * As soon as the patch gets committed, it's very useful for the
> >> community if you close the review and mark it as "Submitted" on the
> >> Review board. The button to do this is at the top right of your own
> >> tickets, right next to the Download Diff button.
> >>
> >> JIRA
> >> * Link: please add the link of the review as an external/web link so
> >> it's easy to navigate to the reviews side
> >> * Status: mark it as "Patch Available"
> >>
> >> The Sqoop community will receive emails about your new ticket and review
> >> request and will review your change.
> >>
> >> Thanks,
> >> Bogi
> >>
> >>
> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com>
> >> wrote:
> >>
> >> > Do we have any update?
> >> >
> >> > I checked out the 1.4.6 code, made code changes to achieve this, and
> >> > tested them in a cluster; it is working as expected. Is there a way I
> >> > can contribute this as a patch, so that the committers can validate it
> >> > further and suggest any changes required to move forward? Please
> >> > suggest the approach.
> >> >
> >> > Thanks,
> >> > Jilani
> >> >
> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <ji...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Liz,
> >> > >
> >> > > Let's say we inserted data into a table with an initial import; it
> >> > > looks like this in the HBase shell:
> >> > >
> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
> >> > >
> >> > >
> >> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
> >> > > and then we do an incremental import into HBase. With the current
> >> > > import, the final HBase data after the incremental import looks like
> >> > > this:
> >> > >
> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> > >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
> >> > >
> >> > >
> >> > >
> >> > > As the source column "rental_id" becomes NULL for rowkey "1", the
> >> > > final HBase table should not have "rental_id" for rowkey "1". I expect
> >> > > the following data for these rowkeys:
> >> > >
> >> > >
> >> > >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
> >> > >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
> >> > >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >> > >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >> > >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
> >> > >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
> >> > >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
> >> > >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >> > >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >> > >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
> >> > >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
> >> > >
> >> > >
> >> > > Please let me know if anything further is required.
> >> > >
> >> > >
> >> > > Thanks,
> >> > > Jilani
> >> > >
> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> >> > > liz.szilagyi@cloudera.com> wrote:
> >> > >
> >> > >> Hi Jilani,
> >> > >> I'm not sure I completely understand what you are trying to do.
> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
> >> > >> example data showing the changes that happen compared to the changes
> >> > >> you'd like to see?
> >> > >> Thanks,
> >> > >> Liz
> >> > >>
> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >> >
> >> > >> > Please help in resolving the issue. I am going through the source
> >> > >> > code, and somehow the required behavior is missing, but I am not
> >> > >> > sure whether it was avoided on purpose for some reason.
> >> > >> >
> >> > >> > Please give me some suggestions on how to handle this scenario.
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Jilani
> >> > >> >
> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
> >> > >> > wrote:
> >> > >> >
> >> > >> >> Hi,
> >> > >> >>
> >> > >> >> We have a scenario where we are importing data into HBase with
> >> > >> >> Sqoop incremental import.
> >> > >> >>
> >> > >> >> Let's say we imported a table, and later some columns in the
> >> > >> >> source table were updated to NULL for some rows. When we then run
> >> > >> >> the incremental import, these columns should no longer be present
> >> > >> >> in the HBase table, as per HBase semantics. But right now, these
> >> > >> >> columns remain in HBase with their previous values.
> >> > >> >>
> >> > >> >> Is there any fix to overcome this issue?
> >> > >> >>
> >> > >> >>
> >> > >> >> Thanks,
> >> > >> >> Jilani
> >> > >> >>
> >> > >> >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
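
The behaviour discussed in this thread can be modelled in a few lines of plain Java. The sketch below is hypothetical and is not Sqoop's import code: the class NullAwareRowSync and its two methods are made-up names, and an HBase row is simplified to a map from family:qualifier to its latest value (timestamps and versions are ignored). applySkippingNulls mirrors what the thread reports Sqoop 1.4.6 doing today (a NULL source value produces no mutation, so the old cell survives), and applyDeletingNulls mirrors the behaviour Jilani asks for (a NULL source value removes the column).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model (not Sqoop code): an HBase row is simplified to a map from
// "family:qualifier" to its latest value; timestamps and versions are ignored.
public class NullAwareRowSync {

    // Behaviour reported for Sqoop 1.4.6 in this thread: a NULL source value
    // produces no mutation, so the previously imported cell survives.
    public static Map<String, String> applySkippingNulls(Map<String, String> hbaseRow,
                                                         Map<String, String> sourceRow) {
        Map<String, String> result = new TreeMap<>(hbaseRow); // copy; the input is not modified
        for (Map.Entry<String, String> cell : sourceRow.entrySet()) {
            if (cell.getValue() != null) {
                result.put(cell.getKey(), cell.getValue()); // like a Put for non-NULL values
            }
        }
        return result;
    }

    // Behaviour requested in this thread: a NULL source value removes the column.
    public static Map<String, String> applyDeletingNulls(Map<String, String> hbaseRow,
                                                         Map<String, String> sourceRow) {
        Map<String, String> result = new TreeMap<>(hbaseRow);
        for (Map.Entry<String, String> cell : sourceRow.entrySet()) {
            if (cell.getValue() == null) {
                result.remove(cell.getKey()); // like a Delete for the NULL column
            } else {
                result.put(cell.getKey(), cell.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Rowkey "1" after the initial import (a subset of the columns shown above)
        Map<String, String> existing = new TreeMap<>();
        existing.put("pay:amount", "4.99");
        existing.put("pay:last_update", "2017-01-23 05:29:09.0");
        existing.put("pay:rental_id", "573");
        existing.put("pay:staff_id", "1");

        // The same row in the source after rental_id was set to NULL
        Map<String, String> source = new LinkedHashMap<>(); // permits null values
        source.put("pay:amount", "4.99");
        source.put("pay:last_update", "2017-02-05 05:29:09.0");
        source.put("pay:rental_id", null);
        source.put("pay:staff_id", "1");

        // prints {pay:amount=4.99, pay:last_update=2017-02-05 05:29:09.0, pay:rental_id=573, pay:staff_id=1}
        System.out.println(applySkippingNulls(existing, source));
        // prints {pay:amount=4.99, pay:last_update=2017-02-05 05:29:09.0, pay:staff_id=1}
        System.out.println(applyDeletingNulls(existing, source));
    }
}
```

With a real HBase client, the second behaviour roughly corresponds to sending a Delete for the NULL columns (for example Delete#addColumns(family, qualifier) in the HBase 1.x client API) alongside the Put for the non-NULL ones.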

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

This is an example: SQOOP-3053
<https://issues.apache.org/jira/browse/SQOOP-3053> with the review
<https://reviews.apache.org/r/54206/> linked. Please make your changes on
trunk, as it will be used to cut the future release, so your patch definitely
needs to be able to apply on it.

Thanks,
Bogi

On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Bogi,
>
> Can you provide some sample JIRA tickets and review requests similar to
> this, so I can proceed further?
>
> I applied the code changes on the "sqoop-release-1.4.6-rc0" branch of the
> Sqoop git repository. If you suggest the right branch, I will take the code
> from there and apply the changes before submitting the review request.
>
> Thanks,
> Jilani
>
> On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com> wrote:
>
>> Hi Jilani,
>>
>> To get your change committed, please do the following:
>> * Open a JIRA ticket for your change in Apache's JIRA system
>> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>> * Create a review request at Apache's review board
>> <https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
>> ticket
>>
>> Please consider the guidelines below:
>>
>> Review board
>> * Summary: generate your summary using the issue's jira key + jira title
>> * Groups: add the relevant group so everyone on the project will know
>> about your patch (Sqoop)
>> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
>> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>> * As soon as the patch gets committed, it's very useful for the
>> community if you close the review and mark it as "Submitted" on the Review
>> board. The button to do this is at the top right of your own tickets, right
>> next to the Download Diff button.
>>
>> Jira
>> * Link: please add the link of the review as an external/web link so it's
>> easy to navigate to the reviews side
>> * Status: mark it as "patch available"
>>
>> The Sqoop community will receive emails about your new ticket and review
>> request and will review your change.
>>
>> Thanks,
>> Bogi
>>
>>
>> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>> > Do we have any update?
>> >
>> > I checked out the 1.4.6 code, made code changes to achieve this, and
>> > tested them in a cluster; it is working as expected. Is there a way I can
>> > contribute this as a patch, so that the committers can validate it further
>> > and suggest any changes required to move forward? Please suggest the
>> > approach.
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <ji...@gmail.com>
>> > wrote:
>> >
>> > > Hi Liz,
>> > >
>> > > Let's say we inserted data into a table with the initial import; it
>> > > looks like this in the HBase shell:
>> > >
>> > >  1                                     column=pay:amount,
>> > > timestamp=1485129654025, value=4.99
>> > >  1                                     column=pay:customer_id,
>> > > timestamp=1485129654025, value=1
>> > >  1                                     column=pay:last_update,
>> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
>> > >  1                                     column=pay:payment_date,
>> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> > >  1                                     column=pay:rental_id,
>> > > timestamp=1485129654025, value=573
>> > >  1                                     column=pay:staff_id,
>> > > timestamp=1485129654025, value=1
>> > >  10                                    column=pay:amount,
>> > > timestamp=1485129504390, value=5.99
>> > >  10                                    column=pay:customer_id,
>> > > timestamp=1485129504390, value=1
>> > >  10                                    column=pay:last_update,
>> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
>> > >  10                                    column=pay:payment_date,
>> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> > >  10                                    column=pay:rental_id,
>> > > timestamp=1485129504390, value=4526
>> > >  10                                    column=pay:staff_id,
>> > > timestamp=1485129504390, value=2
>> > >
>> > >
>> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
>> > > and then we do an incremental import into HBase. With the current import,
>> > > the final HBase data after the incremental import will look like this:
>> > >
>> > >  1                                     column=pay:amount,
>> > > timestamp=1485129654025, value=4.99
>> > >  1                                     column=pay:customer_id,
>> > > timestamp=1485129654025, value=1
>> > >  1                                     column=pay:last_update,
>> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> > >  1                                     column=pay:payment_date,
>> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> > >  1                                     column=pay:rental_id,
>> > > timestamp=1485129654025, value=573
>> > >  1                                     column=pay:staff_id,
>> > > timestamp=1485129654025, value=1
>> > >  10                                    column=pay:amount,
>> > > timestamp=1485129504390, value=5.99
>> > >  10                                    column=pay:customer_id,
>> > > timestamp=1485129504390, value=1
>> > >  10                                    column=pay:last_update,
>> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> > >  10                                    column=pay:payment_date,
>> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> > >  10                                    column=pay:rental_id,
>> > > timestamp=1485129504390, value=126
>> > >  10                                    column=pay:staff_id,
>> > > timestamp=1485129504390, value=2
>> > >
>> > >
>> > >
>> > > As the source column "rental_id" becomes NULL for rowkey "1", the final
>> > > HBase table should not have "rental_id" for this rowkey. I am expecting
>> > > the data below for these rowkeys:
>> > >
>> > >
>> > >  1                                     column=pay:amount,
>> > > timestamp=1485129654025, value=4.99
>> > >  1                                     column=pay:customer_id,
>> > > timestamp=1485129654025, value=1
>> > >  1                                     column=pay:last_update,
>> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>> > >  1                                     column=pay:payment_date,
>> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>> > >  1                                     column=pay:staff_id,
>> > > timestamp=1485129654025, value=1
>> > >  10                                    column=pay:amount,
>> > > timestamp=1485129504390, value=5.99
>> > >  10                                    column=pay:customer_id,
>> > > timestamp=1485129504390, value=1
>> > >  10                                    column=pay:last_update,
>> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>> > >  10                                    column=pay:payment_date,
>> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>> > >  10                                    column=pay:rental_id,
>> > > timestamp=1485129504390, value=126
>> > >  10                                    column=pay:staff_id,
>> > > timestamp=1485129504390, value=2
>> > >
>> > >
>> > > Please let me know if anything further is required.
>> > >
>> > >
>> > > Thanks,
>> > > Jilani
>> > >
>> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
>> > > liz.szilagyi@cloudera.com> wrote:
>> > >
>> > >> Hi Jilani,
>> > >> I'm not sure I completely understand what you are trying to do. Could
>> > >> you give us some examples with e.g. 4 columns and 2 rows of example data
>> > >> showing the changes that happen compared to the changes you'd like to
>> > >> see?
>> > >> Thanks,
>> > >> Liz
>> > >>
>> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <ji...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> >
>> > >> > Please help in resolving the issue. I am going through the source code,
>> > >> > and somehow the required behavior is missing, but I am not sure whether
>> > >> > we avoided this behavior for some reason.
>> > >> >
>> > >> > Please provide some suggestions on how to handle this scenario.
>> > >> >
>> > >> > Thanks,
>> > >> > Jilani
>> > >> >
>> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> >> Hi,
>> > >> >>
>> > >> >> We have a scenario where we are importing data into HBase with Sqoop
>> > >> >> incremental import.
>> > >> >>
>> > >> >> Let's say we imported a table, and later some columns in the source
>> > >> >> table were updated to NULL for some rows. As per HBase semantics, these
>> > >> >> columns should then no longer be present in the HBase table after the
>> > >> >> incremental import. Right now, however, they remain with their previous
>> > >> >> values.
>> > >> >>
>> > >> >> Is there any fix to overcome this issue?
>> > >> >>
>> > >> >>
>> > >> >> Thanks,
>> > >> >> Jilani
>> > >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>
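
The difference between the current and the requested behavior in the rowkey "1" example above can be made concrete with a small sketch. The Java code below is a minimal sketch under stated assumptions, not Sqoop or HBase code. It models one HBase row as an in-memory map from `family:qualifier` to value. The class and method names (`NullColumnSketch`, `applySkippingNulls`, `applyDeletingNulls`) are hypothetical. "Put" and "remove" stand in for an HBase Put and a per-column Delete. It also assumes, as the thread reports, that the current import simply skips NULL source fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class NullColumnSketch {

    // Current behavior as reported in the thread (assumption): NULL source
    // values are skipped, so an older cell for that column survives.
    static Map<String, String> applySkippingNulls(Map<String, String> existing,
                                                  Map<String, String> incoming) {
        Map<String, String> result = new TreeMap<>(existing); // copy; input untouched
        for (Map.Entry<String, String> e : incoming.entrySet()) {
            if (e.getValue() != null) {
                result.put(e.getKey(), e.getValue()); // like a Put for this column
            }
            // NULL: nothing is written, the previous value stays
        }
        return result;
    }

    // Requested behavior: a NULL source value removes the existing cell,
    // i.e. the equivalent of a Delete for that column next to the Put.
    static Map<String, String> applyDeletingNulls(Map<String, String> existing,
                                                  Map<String, String> incoming) {
        Map<String, String> result = new TreeMap<>(existing);
        for (Map.Entry<String, String> e : incoming.entrySet()) {
            if (e.getValue() != null) {
                result.put(e.getKey(), e.getValue()); // like a Put for this column
            } else {
                result.remove(e.getKey());            // like a Delete for this column
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Row "1" after the initial import (values from the example above).
        Map<String, String> row1 = new TreeMap<>();
        row1.put("pay:amount", "4.99");
        row1.put("pay:customer_id", "1");
        row1.put("pay:last_update", "2017-01-23 05:29:09.0");
        row1.put("pay:payment_date", "2005-05-25 11:30:37.0");
        row1.put("pay:rental_id", "573");
        row1.put("pay:staff_id", "1");

        // Incremental import: rental_id is now NULL, last_update has changed.
        Map<String, String> update = new HashMap<>(); // HashMap allows null values
        update.put("pay:amount", "4.99");
        update.put("pay:customer_id", "1");
        update.put("pay:last_update", "2017-02-05 05:29:09.0");
        update.put("pay:payment_date", "2005-05-25 11:30:37.0");
        update.put("pay:rental_id", null);
        update.put("pay:staff_id", "1");

        System.out.println("skip nulls:   " + applySkippingNulls(row1, update));
        System.out.println("delete nulls: " + applyDeletingNulls(row1, update));
    }
}
```

In the first printed map, `pay:rental_id=573` is still present, which matches the "current" output in the thread. In the second map it is gone, which matches the expected output. A real fix would have to emit an actual HBase Delete for the NULL columns rather than editing a map.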

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Hi Bogi,

Can you provide me with sample JIRA tickets and review requests similar to
this, so I can proceed further?

I made my code changes on top of the "sqoop-release-1.4.6-rc0" branch from
the Sqoop git repository. If you suggest the right branch, I will take the
code from there and apply the changes before submitting the review request.

Thanks,
Jilani

On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bo...@cloudera.com> wrote:

> Hi Jilani,
>
> To get your change committed, please do the following:
> * Open a JIRA ticket for your change in Apache's JIRA system
> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
> * Create a review request at Apache's review board
> <https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
> ticket
>
> Please consider the guidelines below:
>
> Review board
> * Summary: generate your summary using the issue's jira key + jira title
> * Groups: add the relevant group so everyone on the project will know about
> your patch (Sqoop)
> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
> * As soon as the patch gets committed, it's very useful for the
> community if you close the review and mark it as "Submitted" on the Review
> board. The button to do this is at the top right of your own tickets, right
> next to the Download Diff button.
>
> Jira
> * Link: please add the link of the review as an external/web link so it's
> easy to navigate to the reviews side
> * Status: mark it as "patch available"
>
> The Sqoop community will receive emails about your new ticket and review
> request and will review your change.
>
> Thanks,
> Bogi
>
>
> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com>
> wrote:
>
> > Do we have any update?
> >
> > I checked out the 1.4.6 code, made code changes to achieve this, and
> > tested them in a cluster; it is working as expected. Is there a way I can
> > contribute this as a patch, so that the committers can validate it further
> > and suggest any changes required to move forward? Please suggest the
> > approach.
> >
> > Thanks,
> > Jilani
> >
> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <ji...@gmail.com>
> > wrote:
> >
> > > Hi Liz,
> > >
> > > Let's say we inserted data into a table with the initial import; it
> > > looks like this in the HBase shell:
> > >
> > >  1                                     column=pay:amount,
> > > timestamp=1485129654025, value=4.99
> > >  1                                     column=pay:customer_id,
> > > timestamp=1485129654025, value=1
> > >  1                                     column=pay:last_update,
> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
> > >  1                                     column=pay:payment_date,
> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> > >  1                                     column=pay:rental_id,
> > > timestamp=1485129654025, value=573
> > >  1                                     column=pay:staff_id,
> > > timestamp=1485129654025, value=1
> > >  10                                    column=pay:amount,
> > > timestamp=1485129504390, value=5.99
> > >  10                                    column=pay:customer_id,
> > > timestamp=1485129504390, value=1
> > >  10                                    column=pay:last_update,
> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
> > >  10                                    column=pay:payment_date,
> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> > >  10                                    column=pay:rental_id,
> > > timestamp=1485129504390, value=4526
> > >  10                                    column=pay:staff_id,
> > > timestamp=1485129504390, value=2
> > >
> > >
> > > Now assume that in the source, rental_id becomes NULL for rowkey "1",
> > > and then we do an incremental import into HBase. With the current import,
> > > the final HBase data after the incremental import will look like this:
> > >
> > >  1                                     column=pay:amount,
> > > timestamp=1485129654025, value=4.99
> > >  1                                     column=pay:customer_id,
> > > timestamp=1485129654025, value=1
> > >  1                                     column=pay:last_update,
> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> > >  1                                     column=pay:payment_date,
> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> > >  1                                     column=pay:rental_id,
> > > timestamp=1485129654025, value=573
> > >  1                                     column=pay:staff_id,
> > > timestamp=1485129654025, value=1
> > >  10                                    column=pay:amount,
> > > timestamp=1485129504390, value=5.99
> > >  10                                    column=pay:customer_id,
> > > timestamp=1485129504390, value=1
> > >  10                                    column=pay:last_update,
> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> > >  10                                    column=pay:payment_date,
> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> > >  10                                    column=pay:rental_id,
> > > timestamp=1485129504390, value=126
> > >  10                                    column=pay:staff_id,
> > > timestamp=1485129504390, value=2
> > >
> > >
> > >
> > > As the source column "rental_id" becomes NULL for rowkey "1", the final
> > > HBase table should not have "rental_id" for this rowkey. I am expecting
> > > the data below for these rowkeys:
> > >
> > >
> > >  1                                     column=pay:amount,
> > > timestamp=1485129654025, value=4.99
> > >  1                                     column=pay:customer_id,
> > > timestamp=1485129654025, value=1
> > >  1                                     column=pay:last_update,
> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> > >  1                                     column=pay:payment_date,
> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> > >  1                                     column=pay:staff_id,
> > > timestamp=1485129654025, value=1
> > >  10                                    column=pay:amount,
> > > timestamp=1485129504390, value=5.99
> > >  10                                    column=pay:customer_id,
> > > timestamp=1485129504390, value=1
> > >  10                                    column=pay:last_update,
> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> > >  10                                    column=pay:payment_date,
> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> > >  10                                    column=pay:rental_id,
> > > timestamp=1485129504390, value=126
> > >  10                                    column=pay:staff_id,
> > > timestamp=1485129504390, value=2
> > >
> > >
> > > Please let me know if anything further is required.
> > >
> > >
> > > Thanks,
> > > Jilani
> > >
> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> > > liz.szilagyi@cloudera.com> wrote:
> > >
> > >> Hi Jilani,
> > >> I'm not sure I completely understand what you are trying to do. Could
> > >> you give us some examples with e.g. 4 columns and 2 rows of example data
> > >> showing the changes that happen compared to the changes you'd like to
> > >> see?
> > >> Thanks,
> > >> Liz
> > >>
> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <ji...@gmail.com>
> > >> wrote:
> > >>
> > >> >
> > >> > Please help in resolving the issue. I am going through the source code,
> > >> > and somehow the required behavior is missing, but I am not sure whether
> > >> > we avoided this behavior for some reason.
> > >> >
> > >> > Please provide some suggestions on how to handle this scenario.
> > >> >
> > >> > Thanks,
> > >> > Jilani
> > >> >
> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> We have a scenario where we are importing data into HBase with Sqoop
> > >> >> incremental import.
> > >> >>
> > >> >> Let's say we imported a table, and later some columns in the source
> > >> >> table were updated to NULL for some rows. As per HBase semantics, these
> > >> >> columns should then no longer be present in the HBase table after the
> > >> >> incremental import. Right now, however, they remain with their previous
> > >> >> values.
> > >> >>
> > >> >> Is there any fix to overcome this issue?
> > >> >>
> > >> >>
> > >> >> Thanks,
> > >> >> Jilani
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

To get your change committed, please do the following:
* Open a JIRA ticket for your change in Apache's JIRA system
<https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
* Create a review request at Apache's review board
<https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
ticket

Please consider the guidelines below:

Review board
* Summary: generate your summary using the issue's jira key + jira title
* Groups: add the relevant group so everyone on the project will know about
your patch (Sqoop)
* Bugs: add the issue's jira key so it's easy to navigate to the jira side
* Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
* As soon as the patch gets committed, it's very useful for the
community if you close the review and mark it as "Submitted" on the Review
board. The button to do this is at the top right of your own tickets, right
next to the Download Diff button.

Jira
* Link: please add the link of the review as an external/web link so it's
easy to navigate to the reviews side
* Status: mark it as "patch available"

The Sqoop community will receive emails about your new ticket and review
request and will review your change.

Thanks,
Bogi


On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Do we have any update?
>
> I checked out the 1.4.6 code, made code changes to achieve this, and
> tested them in a cluster; it is working as expected. Is there a way I can
> contribute this as a patch, so that the committers can validate it further
> and suggest any changes required to move forward? Please suggest the
> approach.
>
> Thanks,
> Jilani
>
> On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <ji...@gmail.com>
> wrote:
>
> > Hi Liz,
> >
> > Let's say we inserted data into a table with the initial import; it looks
> > like this in the HBase shell:
> >
> >  1                                     column=pay:amount,
> > timestamp=1485129654025, value=4.99
> >  1                                     column=pay:customer_id,
> > timestamp=1485129654025, value=1
> >  1                                     column=pay:last_update,
> > timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >  1                                     column=pay:payment_date,
> > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1                                     column=pay:rental_id,
> > timestamp=1485129654025, value=573
> >  1                                     column=pay:staff_id,
> > timestamp=1485129654025, value=1
> >  10                                    column=pay:amount,
> > timestamp=1485129504390, value=5.99
> >  10                                    column=pay:customer_id,
> > timestamp=1485129504390, value=1
> >  10                                    column=pay:last_update,
> > timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >  10                                    column=pay:payment_date,
> > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10                                    column=pay:rental_id,
> > timestamp=1485129504390, value=4526
> >  10                                    column=pay:staff_id,
> > timestamp=1485129504390, value=2
> >
> >
> > Now assume that in the source, rental_id becomes NULL for rowkey "1", and
> > then we do an incremental import into HBase. With the current import, the
> > final HBase data after the incremental import will look like this:
> >
> >  1                                     column=pay:amount,
> > timestamp=1485129654025, value=4.99
> >  1                                     column=pay:customer_id,
> > timestamp=1485129654025, value=1
> >  1                                     column=pay:last_update,
> > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >  1                                     column=pay:payment_date,
> > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1                                     column=pay:rental_id,
> > timestamp=1485129654025, value=573
> >  1                                     column=pay:staff_id,
> > timestamp=1485129654025, value=1
> >  10                                    column=pay:amount,
> > timestamp=1485129504390, value=5.99
> >  10                                    column=pay:customer_id,
> > timestamp=1485129504390, value=1
> >  10                                    column=pay:last_update,
> > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >  10                                    column=pay:payment_date,
> > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10                                    column=pay:rental_id,
> > timestamp=1485129504390, value=126
> >  10                                    column=pay:staff_id,
> > timestamp=1485129504390, value=2
> >
> >
> >
> > As the source column "rental_id" becomes NULL for rowkey "1", the final
> > HBase table should not have "rental_id" for this rowkey. I am expecting
> > the data below for these rowkeys:
> >
> >
> >  1                                     column=pay:amount,
> > timestamp=1485129654025, value=4.99
> >  1                                     column=pay:customer_id,
> > timestamp=1485129654025, value=1
> >  1                                     column=pay:last_update,
> > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >  1                                     column=pay:payment_date,
> > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1                                     column=pay:staff_id,
> > timestamp=1485129654025, value=1
> >  10                                    column=pay:amount,
> > timestamp=1485129504390, value=5.99
> >  10                                    column=pay:customer_id,
> > timestamp=1485129504390, value=1
> >  10                                    column=pay:last_update,
> > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >  10                                    column=pay:payment_date,
> > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10                                    column=pay:rental_id,
> > timestamp=1485129504390, value=126
> >  10                                    column=pay:staff_id,
> > timestamp=1485129504390, value=2
> >
> >
> > Please let me know if anything further is required.
> >
> >
> > Thanks,
> > Jilani
> >
> > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> > liz.szilagyi@cloudera.com> wrote:
> >
> >> Hi Jilani,
> >> I'm not sure I completely understand what you are trying to do. Could
> >> you give us some examples with e.g. 4 columns and 2 rows of example data
> >> showing the changes that happen compared to the changes you'd like to
> >> see?
> >> Thanks,
> >> Liz
> >>
> >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <ji...@gmail.com>
> >> wrote:
> >>
> >> >
> >> > Please help in resolving this issue. I am going through the source code,
> >> > and the required behavior seems to be missing, but I am not sure whether
> >> > it was left out deliberately for some reason.
> >> >
> >> > Please suggest how to handle this scenario.
> >> >
> >> > Thanks,
> >> > Jilani
> >> >
> >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <ji...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> We have a scenario where we are importing data into HBase with Sqoop
> >> >> incremental import.
> >> >>
> >> >> Let's say we imported a table, and later some columns in the source table
> >> >> were updated to NULL for some rows. Then, as per HBase semantics, these
> >> >> columns should no longer be present in the HBase table after the
> >> >> incremental import, but right now they remain with their previous values.
> >> >>
> >> >> Is there any fix for this issue?
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Jilani
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Boglarka Egyed <bo...@cloudera.com>.
Hi Jilani,

To get your change committed please do the following:
* Open a JIRA ticket for your change in Apache's JIRA system
<https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
* Create a review request at Apache's review board
<https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
ticket

Please consider the guidelines below:

Review board
* Summary: generate your summary using the issue's JIRA key + JIRA title
* Groups: add the relevant group so everyone on the project will know about
your patch (Sqoop)
* Bugs: add the issue's JIRA key so it's easy to navigate to the JIRA side
* Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
* As soon as the patch gets committed, it's very useful for the community
if you close the review and mark it as "Submitted" on Review Board. The
button to do this is at the top right of your own review requests, right
next to the Download Diff button.

Jira
* Link: please add the link to the review as an external/web link so it's
easy to navigate to the review side
* Status: mark it as "patch available"

Sqoop community will receive emails about your new ticket and review
request and will review your change.

Thanks,
Bogi


On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <ji...@gmail.com> wrote:

> Is there any update?
>
> I checked out the 1.4.6 code, made code changes to achieve this, and tested
> them in a cluster, where they work as expected. Is there a way I can
> contribute this as a patch, so that the committers can validate it further
> and suggest any changes required to move it forward? Please suggest the
> approach.
>
> Thanks,
> Jilani
>
> On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <ji...@gmail.com>
> wrote:
>
> > Hi Liz,
> >
> > Let's say we inserted data into a table with an initial import; it looks
> > like this in the HBase shell:
> >
> >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
> >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
> >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
> >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
> >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
> >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
> >  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
> >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
> >
> >
> > Now assume that rental_id becomes NULL in the source for rowkey "1", and
> > then we run an incremental import into HBase. With the current import, the
> > final HBase data after the incremental import looks like this:
> >
> >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
> >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
> >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
> >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
> >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
> >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
> >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
> >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
> >
> >
> >
> > Since the source column "rental_id" is now NULL for rowkey "1", the final
> > HBase table should not contain "rental_id" for rowkey "1". I expect the
> > following data for these rowkeys:
> >
> >
> >  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
> >  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
> >  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
> >  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
> >  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
> >  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
> >  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
> >
> >
> > Please let me know if anything further is required.
> >
> >
> > Thanks,
> > Jilani
> >
> > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> > liz.szilagyi@cloudera.com> wrote:
> >
> >> Hi Jilani,
> >> I'm not sure I completely understand what you are trying to do. Could you
> >> give us some examples with e.g. 4 columns and 2 rows of example data
> >> showing the changes that happen compared to the changes you'd like to see?
> >> Thanks,
> >> Liz
> >>
> >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <ji...@gmail.com>
> >> wrote:
> >>
> >> >
> >> > Please help in resolving this issue. I am going through the source code,
> >> > and the required behavior seems to be missing, but I am not sure whether
> >> > it was left out deliberately for some reason.
> >> >
> >> > Please suggest how to handle this scenario.
> >> >
> >> > Thanks,
> >> > Jilani
> >> >
> >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <ji...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> We have a scenario where we are importing data into HBase with Sqoop
> >> >> incremental import.
> >> >>
> >> >> Let's say we imported a table, and later some columns in the source table
> >> >> were updated to NULL for some rows. Then, as per HBase semantics, these
> >> >> columns should no longer be present in the HBase table after the
> >> >> incremental import, but right now they remain with their previous values.
> >> >>
> >> >> Is there any fix for this issue?
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Jilani
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: sqoop hbase incremental import - Sqoop 1.4.6

Posted by Jilani Shaik <ji...@gmail.com>.
Is there any update?

I checked out the 1.4.6 code, made code changes to achieve this, and tested
them in a cluster, where they work as expected. Is there a way I can
contribute this as a patch, so that the committers can validate it further
and suggest any changes required to move it forward? Please suggest the
approach.

Thanks,
Jilani

On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <ji...@gmail.com> wrote:

> Hi Liz,
>
> Let's say we inserted data into a table with an initial import; it looks
> like this in the HBase shell:
>
>  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>  10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>  10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
>  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>
>
> Now assume that rental_id becomes NULL in the source for rowkey "1", and
> then we run an incremental import into HBase. With the current import, the
> final HBase data after the incremental import looks like this:
>
>  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>  1                                     column=pay:rental_id, timestamp=1485129654025, value=573
>  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>
>
>
> Since the source column "rental_id" is now NULL for rowkey "1", the final
> HBase table should not contain "rental_id" for rowkey "1". I expect the
> following data for these rowkeys:
>
>
>  1                                     column=pay:amount, timestamp=1485129654025, value=4.99
>  1                                     column=pay:customer_id, timestamp=1485129654025, value=1
>  1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>  1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>  1                                     column=pay:staff_id, timestamp=1485129654025, value=1
>  10                                    column=pay:amount, timestamp=1485129504390, value=5.99
>  10                                    column=pay:customer_id, timestamp=1485129504390, value=1
>  10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>  10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>  10                                    column=pay:rental_id, timestamp=1485129504390, value=126
>  10                                    column=pay:staff_id, timestamp=1485129504390, value=2
>
>
> Please let me know if anything further is required.
>
>
> Thanks,
> Jilani
>
> On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> liz.szilagyi@cloudera.com> wrote:
>
>> Hi Jilani,
>> I'm not sure I completely understand what you are trying to do. Could you
>> give us some examples with e.g. 4 columns and 2 rows of example data
>> showing the changes that happen compared to the changes you'd like to see?
>> Thanks,
>> Liz
>>
>> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <ji...@gmail.com>
>> wrote:
>>
>> >
>> > Please help in resolving this issue. I am going through the source code,
>> > and the required behavior seems to be missing, but I am not sure whether
>> > it was left out deliberately for some reason.
>> >
>> > Please suggest how to handle this scenario.
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <ji...@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> We have a scenario where we are importing data into HBase with Sqoop
>> >> incremental import.
>> >>
>> >> Let's say we imported a table, and later some columns in the source table
>> >> were updated to NULL for some rows. Then, as per HBase semantics, these
>> >> columns should no longer be present in the HBase table after the
>> >> incremental import, but right now they remain with their previous values.
>> >>
>> >> Is there any fix for this issue?
>> >>
>> >>
>> >> Thanks,
>> >> Jilani
>> >>
>> >
>> >
>>
>
>
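
The behavior requested in this thread can be modeled in plain Java. In HBase, a Put only writes the cells it carries and leaves the other columns of the row untouched, which is why a column that became NULL in the source keeps its old cell. The minimal sketch below assumes that a NULL source value should be mapped to deleting that column, and turns one source row into PUT and DELETE entries. `CellMutation`, `Kind` and `toMutations` are hypothetical names; this is not Sqoop's import code and not the HBase client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained model of a "NULL-aware" incremental import.
// It does not use the HBase client API; it only shows which cell-level
// changes one source row should turn into.
public class NullAwareMutations {

    /** Kind of change for one cell. */
    enum Kind { PUT, DELETE }

    /** One cell-level change: family:qualifier, plus the value for a PUT. */
    static final class CellMutation {
        final Kind kind;
        final String family;
        final String qualifier;
        final String value; // null for DELETE

        CellMutation(Kind kind, String family, String qualifier, String value) {
            this.kind = kind;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind == Kind.PUT
                    ? "PUT " + family + ":" + qualifier + "=" + value
                    : "DELETE " + family + ":" + qualifier;
        }
    }

    /**
     * Converts one source row into cell mutations. Non-null values become
     * PUTs; NULL values become DELETEs, so a column that turned NULL in the
     * source is removed instead of silently keeping its previous value.
     */
    static List<CellMutation> toMutations(String family, Map<String, String> row) {
        List<CellMutation> mutations = new ArrayList<>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getValue() == null) {
                mutations.add(new CellMutation(Kind.DELETE, family, column.getKey(), null));
            } else {
                mutations.add(new CellMutation(Kind.PUT, family, column.getKey(), column.getValue()));
            }
        }
        return mutations;
    }

    public static void main(String[] args) {
        // Row "1" from the example above, after rental_id became NULL in the source.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("amount", "4.99");
        row.put("customer_id", "1");
        row.put("last_update", "2017-02-05 05:29:09.0");
        row.put("payment_date", "2005-05-25 11:30:37.0");
        row.put("rental_id", null);
        row.put("staff_id", "1");

        for (CellMutation m : toMutations("pay", row)) {
            System.out.println(m);
        }
    }
}
```

Running `main` prints one line per column and emits `DELETE pay:rental_id` for the NULL column. With the real HBase client, the DELETE entries would presumably become a `Delete` on the same row key; whether to remove only the latest version or all versions of the column is a design choice this sketch leaves open.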
