Posted to user@cassandra.apache.org by SE...@homedepot.com on 2016/11/01 16:51:25 UTC

RE: An extremely fast cassandra table full scan utility

In general, full table scans are a bad use case for Cassandra. Another technology might be a better choice.


Sean Durity

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Monday, October 03, 2016 4:38 PM
To: user@cassandra.apache.org
Subject: Re: An extremely fast cassandra table full scan utility

I undertook a similar effort a while ago.

https://issues.apache.org/jira/browse/CASSANDRA-7014

Other than the fact that it was closed with no comments, I can tell you that other efforts I made to embed things in Cassandra did not go swimmingly either, although at the time even ideas like Groovy UDFs were being rejected.

On Mon, Oct 3, 2016 at 4:22 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
Hi Jonathan,

If full scans are a regular requirement, then setting up a Spark cluster colocated with the Cassandra nodes makes perfect sense. But supposing it is a one-off requirement, say a weekly or fortnightly task, a Spark cluster could be an added overhead in terms of capacity and resource planning as well as operations and maintenance.

So this could be thought of as a simple substitute for a single-threaded scan, without the additional effort of setting up and maintaining another technology.

Regards,
Bhuvan

On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma <si...@gmail.com> wrote:
Hi Jon,
It wasn't allowed.
Moreover, someone who isn't familiar with Spark, and might be new to map/filter/reduce-style operations, could also use the utility for simple operations over a sequential scan of the Cassandra table.

Regards
Siddharth Verma

On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:
Couldn't set it up as in couldn't get it working, or it's not allowed?

On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <ve...@snapdeal.com> wrote:
Hi Jon,
We couldn't set up a Spark cluster.

For some use cases a Spark cluster was required, but for various reasons we couldn't create one. Hence, one may use this utility instead to iterate through the entire table at very high speed. We had to find a workaround that would be faster than paging over the result set.
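For context, the core idea of such a workaround is to replace one unbounded, paged SELECT with many queries over token sub-ranges that can be fetched in parallel. A minimal sketch of the query generation (table and column names are hypothetical; a real implementation would take the token ranges from the driver's cluster metadata rather than splitting the space naively):

```python
# Sketch: generate one SELECT per token sub-range so the ranges can be
# fetched concurrently instead of paging a single full-table query.
# Assumes the default Murmur3Partitioner token space of [-2^63, 2^63 - 1].

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def range_queries(table, partition_key, splits):
    """Yield CQL statements that together cover the full token space."""
    span = (MAX_TOKEN - MIN_TOKEN) // splits
    start = MIN_TOKEN
    for i in range(splits):
        # The last range absorbs the integer-division remainder.
        end = MAX_TOKEN if i == splits - 1 else start + span
        yield (f"SELECT * FROM {table} "
               f"WHERE token({partition_key}) > {start} "
               f"AND token({partition_key}) <= {end}")
        start = end

for q in range_queries("ks.my_table", "id", 4):
    print(q)
```

Each generated statement targets a contiguous slice of the ring, so a pool of worker threads can execute them independently and merge the results.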
Regards




Siddharth Verma
Software Engineer I - CaMS



M: +91 9013689856, T: 011 22791596 EXT: 14697
CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
Udyog Vihar Phase - IV, Gurgaon-122016, INDIA









On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:
It almost sounds like you're duplicating all the work of both Spark and the connector. May I ask why you decided not to use the existing tools?

On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <si...@gmail.com> wrote:
Hi DuyHai,
Thanks for your reply.
A few more features are planned for the next version (if there is one), such as:
a custom policy that takes into account the replication of token ranges on specific nodes,
finer-grained token ranges (for more speedup),
and a few more.

Regarding fine-graining a token range: if one token range is split further into, say, 2-3 parts divided among threads, this would exploit the possible parallelism on a large, scaled-out cluster.
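The splitting being described is pure arithmetic and independent of any driver. A rough illustration of dividing a single (start, end] token range into contiguous sub-ranges, each of which could be handed to its own worker thread:

```python
def split_range(start, end, parts):
    """Split one (start, end] token range into `parts` contiguous sub-ranges.

    The last sub-range absorbs any integer-division remainder so the
    sub-ranges exactly cover the original range with no gaps or overlaps.
    """
    step = (end - start) // parts
    bounds = [start + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))

# Small, illustrative numbers rather than real Murmur3 tokens:
print(split_range(0, 3000, 3))
```

Because the sub-ranges are disjoint and together cover the parent range, the per-thread results can simply be concatenated without deduplication.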

And, as you mentioned in the JIRA, streaming of requests would be of huge help with further splitting of the range.

Thanks once again for your valuable comments. :-)

Regards,
Siddharth Verma





________________________________

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
