You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by "Durity, Sean R" <SE...@homedepot.com> on 2019/02/07 14:14:12 UTC

RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Kenneth is right. Trying to port/support a relational model to a CQL model the way you are doing it is not going to go well. You won’t be able to scale or get the search flexibility that you want. It will make Cassandra seem like a bad fit. You want to play to Cassandra’s strengths – availability, low latency, scalability, etc. so you need to store the data the way you want to retrieve it (query first modeling!). You could look at defining the “right” partition and clustering keys, so that the searches are within a single, reasonably sized partition. And you could have lookup tables for other common search patterns (item_by_model_name, etc.)

If that kind of modeling gets you to a situation where you have too many lookup tables to keep consistent, you could consider something like DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on searchable fields. A SOLR query will typically be an order of magnitude slower than a partition key lookup, though.

It really boils down to the purpose of the data store. If you are looking for primarily an “anything goes” search engine, Cassandra may not be a good choice. If you need Cassandra-level availability, extremely low latency queries (on known access patterns), high volume/low latency writes, easy scalability, etc. then you are going to have to rethink how you model the data.


Sean Durity

From: Kenneth Brotman <ke...@yahoo.com.INVALID>
Sent: Thursday, February 07, 2019 7:01 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Peter,

Sounds like you may need to use a different architecture.  Perhaps you need something like Presto or Kafka as a part of the solution.  If the data from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d have to transform the data you want to use with Cassandra so that a proper data model for Cassandra can be used.

From: Peter Heitman [mailto:peter@heitman.us]
Sent: Wednesday, February 06, 2019 10:05 PM
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I have read the material. The problem is that the application has a query facility available to the user where they can type in "(A = foo AND B = bar) OR C = chex" where A, B, and C are from a defined list of terms, many of which are columns in the mytable below while others are from other tables. This query facility was implemented and shipped years before we decided to move to Cassandra
On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <ke...@yahoo.com.invalid>> wrote:
The problem is you’re not using a query first design.  I would recommend first reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben Hewitt.  It’s available free online at this link<https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse&d=DwMFaQ&c=MtgQEAMQGqekjTjiAhkudQ&r=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ&m=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g&s=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8&e=>.

Kenneth Brotman

From: Peter Heitman [mailto:peter@heitman.us<ma...@heitman.us>]
Sent: Wednesday, February 06, 2019 6:33 PM

To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: SASI queries- cqlsh vs java driver

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient data model. I haven't figured out how to do it correctly yet
On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <ke...@yahoo.com.invalid>> wrote:
Exactly.  When you design your data model correctly you shouldn’t have to use ALLOW FILTERING in the queries.  That is not recommended.

Kenneth Brotman

From: Peter Heitman [mailto:peter@heitman.us<ma...@heitman.us>]
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: SASI queries- cqlsh vs java driver

You are completely right! My problem is that I am trying to port code for SQL to CQL for an application that provides the user with a relatively general search facility. The original implementation didn't worry about secondary indexes - it just took advantage of the ability to create arbitrarily complex queries with inner joins, left joins, etc. I am reimplimenting it to create a parse tree of CQL queries and doing the ANDs and ORs in the application. Of course once I get enough of this implemented I will have to load up the table with a large data set and see if it gives acceptable performance for our use case.
On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman <ke...@yahoo.cominvalid>> wrote:
Isn’t that a lot of SASI indexes for one table.  Could you denormalize more to reduce both columns per table and SASI indexes per table?  Eight SASI indexes on one table seems like a lot.

Kenneth Brotman

From: Peter Heitman [mailto:peter@heitman.us<ma...@heitman.us>]
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: SASI queries- cqlsh vs java driver

The table and secondary indexes look generally like this Note that I have changed the names of many of the columns to be generic since they aren't important to the question as far as I know. I left the actual names for those columns that I've created SASI indexes for. The query I use to try to create a PreparedStatement is:

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

the schema cql statements are:

CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
  id text,
  sql_id bigint,
  cpe_id text,
  sw_version text,
  hw_version text,
  manufacturer text,
  product_class text,
  manufacturer_oui text,
  description text,
  periodic_inform_interval text,
  restricted_mode_enabled text,
  restricted_mode_reason text,
  type text,
  model_name text,
  serial text,
  mac text,
   text,
  generic0 timestamp,
  household_id text,
  generic1 int,
  generic2 text,
  generic3 text,
  generic4 int,
  generic5 int,
  generic6 text,
  generic7 text,
  generic8 text,
  generic9 text,
  generic10 text,
  generic11 timestamp,
  generic12 text,
  generic13 text,
  generic14 timestamp,
  generic15 text,
  generic16 text,
  generic17 text,
  generic18 text,
  generic19 text,
  generic20 text,
  generic21 text,
  generic22 text,
  generic23 text,
  generic24 text,
  generic25 text,
  generic26 text,
  generic27 text,
  generic28 int,
  generic29 int,
  generic30 text,
  generic31 text,
  generic32 text,
  generic33 text,
  generic34 text,
  generic35 int,
  generic36 int,
  generic37 int,
  generic38 int,
  generic39 text,
  generic40 text,
  generic41 text,
  generic42 text,
  generic43 text,
  generic44 text,
  generic45 text,
  PRIMARY KEY (id)
);

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable (serial)
       USING 'org.apache.cassandra.index.sasi.SASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable (cpe_id)
       USING 'org.apache.cassandra.index.sasi.SASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
       USING 'org.apache.cassandra.index.sasi.SASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'orgapache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON mykeyspace.mytable (manufacturer)
       USING 'org.apache.cassandra.index.sasi.SASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON mykeyspace.mytable (manufacturer_oui)
       USING 'org.apache.cassandra.index.sasiSASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable (hw_version)
       USING 'org.apache.cassandra.index.sasi.SASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON mykeyspace.mytable (sw_version)
       USING 'org.apache.cassandra.index.sasi.SASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};

CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON mykeyspace.mytable (household_id)
       USING 'org.apache.cassandra.index.sasi.SASIIndex'
       WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.indexsasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};


On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov <ol...@gmail.com>> wrote:
Could you post full table schema (names obfuscated, if required) with index creation statements and queries?

On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <ja...@genesys.com>> wrote:
I’m not sure why it`s not allowed by the Datastax driver, but maybe you could try to use OR instead of IN?
SELECT blah FROM foo WHERE <indexed column> = :val1 OR <indexed column> = :val2 ALLOW FILTERING

It should be the same as IN query, but I don’t if it makes a difference for performance.

From: Peter Heitman <pe...@heitman.us>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Monday 4 February 2019 at 07:17
To: "user@cassandra.apache.org<ma...@cassandraapache.org>" <us...@cassandra.apache.org>>
Subject: SASI queries- cqlsh vs java driver

When I create a SASI index on a secondary column, from cqlsh I can execute a query

SELECT blah FROM foo WHERE <indexed column> IN ('mytext') ALLOW FILTERING;

but not from the java driver:

SELECT blah FROM foo WHERE <indexed column> IN :val ALLOW FILTERING

Here I get an exception

com.datastax.driver.coreexceptions.InvalidQueryException: IN predicates on non-primary-key columns (<indexed column>) is not yet supported
at com.datastax.driver.coreexceptions.InvalidQueryException.copy(InvalidQueryException.java:49) ~[cassandra-driver-core-3.6.0.jar:na]

Why are they different? Is there anything I can do with the java driver to get past this exception?

Peter




--
alex p

________________________________

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Posted by Peter Heitman <pe...@heitman.us>.

I appreciate the thoughtful replies. We will have to evaluate whether
cassandra is the right datastore for us. It was chosen because our primary
requirement is to store lots of data about lots of devices at a high rate.
The search requirements are very secondary but required for the management
of the devices. We are close to being able to do some scale testing of the
solution and will evaluate the cost of cassandra for this application at
that time.

On Wed, Feb 27, 2019 at 2:04 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
> use DSE Search or Ellesandra, they're two solutions designed to solve
> problems that are Cassandra first, search second.
>
> I'd go straight to elastic search for workloads that are primarily search
> driven, like you listed above.  The idea of having one DB doing both things
> sounds great until it's an operational nightmare.
>
> On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh <ra...@gmail.com>
> wrote:
>
>> +1 on Datastax and could consider looking at Elassandra.
>>
>> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R <
>> SEAN_R_DURITY@homedepot.com> wrote:
>>
>>> Kenneth is right. Trying to port/support a relational model to a CQL
>>> model the way you are doing it is not going to go well. You won’t be able
>>> to scale or get the search flexibility that you want. It will make
>>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>>> availability, low latency, scalability, etc. so you need to store the data
>>> the way you want to retrieve it (query first modeling!). You could look at
>>> defining the “right” partition and clustering keys, so that the searches
>>> are within a single, reasonably sized partition. And you could have lookup
>>> tables for other common search patterns (item_by_model_name, etc.)
>>>
>>>
>>>
>>> If that kind of modeling gets you to a situation where you have too many
>>> lookup tables to keep consistent, you could consider something like
>>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>>> searchable fields. A SOLR query will typically be an order of magnitude
>>> slower than a partition key lookup, though.
>>>
>>>
>>>
>>> It really boils down to the purpose of the data store. If you are
>>> looking for primarily an “anything goes” search engine, Cassandra may not
>>> be a good choice. If you need Cassandra-level availability, extremely low
>>> latency queries (on known access patterns), high volume/low latency writes,
>>> easy scalability, etc. then you are going to have to rethink how you model
>>> the data.
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity
>>>
>>>
>>>
>>> *From:* Kenneth Brotman <ke...@yahoo.com.INVALID>
>>> *Sent:* Thursday, February 07, 2019 7:01 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Peter,
>>>
>>>
>>>
>>> Sounds like you may need to use a different architecture.  Perhaps you
>>> need something like Presto or Kafka as a part of the solution.  If the data
>>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>>> have to transform the data you want to use with Cassandra so that a proper
>>> data model for Cassandra can be used.
>>>
>>>
>>>
>>> *From:* Peter Heitman [mailto:peter@heitman.us <pe...@heitman.us>]
>>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Yes, I have read the material. The problem is that the application has a
>>> query facility available to the user where they can type in "(A = foo AND B
>>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>>> many of which are columns in the mytable below while others are from other
>>> tables. This query facility was implemented and shipped years before we
>>> decided to move to Cassandra
>>>
>>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>>> kenbrotman@yahoo.com.invalid> wrote:
>>>
>>> The problem is you’re not using a query first design.  I would recommend
>>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>>> Carpenter and Eben Hewitt.  It’s available free online at this link
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse&d=DwMFaQ&c=MtgQEAMQGqekjTjiAhkudQ&r=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ&m=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g&s=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8&e=>
>>> .
>>>
>>>
>>>
>>> Kenneth Brotman
>>>
>>>
>>>
>>> *From:* Peter Heitman [mailto:peter@heitman.us]
>>> *Sent:* Wednesday, February 06, 2019 6:33 PM
>>>
>>>
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
>>> inefficient data model. I haven't figured out how to do it correctly yet
>>>
>>> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <
>>> kenbrotman@yahoo.com.invalid> wrote:
>>>
>>> Exactly.  When you design your data model correctly you shouldn’t have
>>> to use ALLOW FILTERING in the queries.  That is not recommended.
>>>
>>>
>>>
>>> Kenneth Brotman
>>>
>>>
>>>
>>> *From:* Peter Heitman [mailto:peter@heitman.us]
>>> *Sent:* Wednesday, February 06, 2019 6:09 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> You are completely right! My problem is that I am trying to port code
>>> for SQL to CQL for an application that provides the user with a relatively
>>> general search facility. The original implementation didn't worry about
>>> secondary indexes - it just took advantage of the ability to create
>>> arbitrarily complex queries with inner joins, left joins, etc. I am
>>> reimplimenting it to create a parse tree of CQL queries and doing the ANDs
>>> and ORs in the application. Of course once I get enough of this implemented
>>> I will have to load up the table with a large data set and see if it gives
>>> acceptable performance for our use case.
>>>
>>> On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman <
>>> kenbrotman@yahoo.cominvalid> wrote:
>>>
>>> Isn’t that a lot of SASI indexes for one table.  Could you denormalize
>>> more to reduce both columns per table and SASI indexes per table?  Eight
>>> SASI indexes on one table seems like a lot.
>>>
>>>
>>>
>>> Kenneth Brotman
>>>
>>>
>>>
>>> *From:* Peter Heitman [mailto:peter@heitman.us]
>>> *Sent:* Tuesday, February 05, 2019 6:59 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> The table and secondary indexes look generally like this Note that I
>>> have changed the names of many of the columns to be generic since they
>>> aren't important to the question as far as I know. I left the actual names
>>> for those columns that I've created SASI indexes for. The query I use to
>>> try to create a PreparedStatement is:
>>>
>>>
>>>
>>> SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
>>> sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING
>>>
>>>
>>>
>>> the schema cql statements are:
>>>
>>>
>>>
>>> CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
>>>
>>>   id text,
>>>
>>>   sql_id bigint,
>>>
>>>   cpe_id text,
>>>
>>>   sw_version text,
>>>
>>>   hw_version text,
>>>
>>>   manufacturer text,
>>>
>>>   product_class text,
>>>
>>>   manufacturer_oui text,
>>>
>>>   description text,
>>>
>>>   periodic_inform_interval text,
>>>
>>>   restricted_mode_enabled text,
>>>
>>>   restricted_mode_reason text,
>>>
>>>   type text,
>>>
>>>   model_name text,
>>>
>>>   serial text,
>>>
>>>   mac text,
>>>
>>>    text,
>>>
>>>   generic0 timestamp,
>>>
>>>   household_id text,
>>>
>>>   generic1 int,
>>>
>>>   generic2 text,
>>>
>>>   generic3 text,
>>>
>>>   generic4 int,
>>>
>>>   generic5 int,
>>>
>>>   generic6 text,
>>>
>>>   generic7 text,
>>>
>>>   generic8 text,
>>>
>>>   generic9 text,
>>>
>>>   generic10 text,
>>>
>>>   generic11 timestamp,
>>>
>>>   generic12 text,
>>>
>>>   generic13 text,
>>>
>>>   generic14 timestamp,
>>>
>>>   generic15 text,
>>>
>>>   generic16 text,
>>>
>>>   generic17 text,
>>>
>>>   generic18 text,
>>>
>>>   generic19 text,
>>>
>>>   generic20 text,
>>>
>>>   generic21 text,
>>>
>>>   generic22 text,
>>>
>>>   generic23 text,
>>>
>>>   generic24 text,
>>>
>>>   generic25 text,
>>>
>>>   generic26 text,
>>>
>>>   generic27 text,
>>>
>>>   generic28 int,
>>>
>>>   generic29 int,
>>>
>>>   generic30 text,
>>>
>>>   generic31 text,
>>>
>>>   generic32 text,
>>>
>>>   generic33 text,
>>>
>>>   generic34 text,
>>>
>>>   generic35 int,
>>>
>>>   generic36 int,
>>>
>>>   generic37 int,
>>>
>>>   generic38 int,
>>>
>>>   generic39 text,
>>>
>>>   generic40 text,
>>>
>>>   generic41 text,
>>>
>>>   generic42 text,
>>>
>>>   generic43 text,
>>>
>>>   generic44 text,
>>>
>>>   generic45 text,
>>>
>>>   PRIMARY KEY (id)
>>>
>>> );
>>>
>>>
>>>
>>> CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable
>>> (serial)
>>>
>>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable
>>> (cpe_id)
>>>
>>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
>>>
>>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'orgapache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON
>>> mykeyspace.mytable (manufacturer)
>>>
>>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON
>>> mykeyspace.mytable (manufacturer_oui)
>>>
>>>        USING 'org.apache.cassandra.index.sasiSASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON
>>> mykeyspace.mytable (hw_version)
>>>
>>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON
>>> mykeyspace.mytable (sw_version)
>>>
>>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>> CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON
>>> mykeyspace.mytable (household_id)
>>>
>>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>
>>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>>> 'org.apache.cassandra.indexsasi.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive': 'false'};
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov <
>>> oleksandr.petrov@gmail.com> wrote:
>>>
>>> Could you post full table schema (names obfuscated, if required) with
>>> index creation statements and queries?
>>>
>>>
>>>
>>> On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <
>>> jacques-henri.berthemet@genesys.com> wrote:
>>>
>>> I’m not sure why it`s not allowed by the Datastax driver, but maybe you
>>> could try to use OR instead of IN?
>>>
>>> SELECT blah FROM foo WHERE <indexed column> = :val1 OR <indexed column>
>>> = :val2 ALLOW FILTERING
>>>
>>>
>>>
>>> It should be the same as IN query, but I don’t if it makes a difference
>>> for performance.
>>>
>>>
>>>
>>> *From: *Peter Heitman <pe...@heitman.us>
>>> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>>> *Date: *Monday 4 February 2019 at 07:17
>>> *To: *"user@cassandra.apache.org <us...@cassandraapache.org>" <
>>> user@cassandra.apache.org>
>>> *Subject: *SASI queries- cqlsh vs java driver
>>>
>>>
>>>
>>> When I create a SASI index on a secondary column, from cqlsh I can
>>> execute a query
>>>
>>>
>>>
>>> SELECT blah FROM foo WHERE <indexed column> IN ('mytext') ALLOW
>>> FILTERING;
>>>
>>>
>>>
>>> but not from the java driver:
>>>
>>>
>>>
>>> SELECT blah FROM foo WHERE <indexed column> IN :val ALLOW FILTERING
>>>
>>>
>>>
>>> Here I get an exception
>>>
>>>
>>>
>>> com.datastax.driver.coreexceptions.InvalidQueryException: IN predicates
>>> on non-primary-key columns (<indexed column>) is not yet supported
>>>
>>> at
>>> com.datastax.driver.coreexceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
>>> ~[cassandra-driver-core-3.6.0.jar:na]
>>>
>>>
>>>
>>> Why are they different? Is there anything I can do with the java driver
>>> to get past this exception?
>>>
>>>
>>>
>>> Peter
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> alex p
>>>
>>>
>>> ------------------------------
>>>
>>> The information in this Internet Email is confidential and may be
>>> legally privileged. It is intended solely for the addressee. Access to this
>>> Email by anyone else is unauthorized. If you are not the intended
>>> recipient, any disclosure, copying, distribution or any action taken or
>>> omitted to be taken in reliance on it, is prohibited and may be unlawful.
>>> When addressed to our clients any opinions or advice contained in this
>>> Email are subject to the terms and conditions expressed in any applicable
>>> governing The Home Depot terms of business or client engagement letter. The
>>> Home Depot disclaims all responsibility and liability for the accuracy and
>>> content of this attachment and for any damages or losses arising from any
>>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>>> items of a destructive nature, which may be contained in this attachment
>>> and shall not be liable for direct, indirect, consequential or special
>>> damages in connection with this e-mail message or its attachment.
>>>
>>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
use DSE Search or Ellesandra, they're two solutions designed to solve
problems that are Cassandra first, search second.

I'd go straight to elastic search for workloads that are primarily search
driven, like you listed above.  The idea of having one DB doing both things
sounds great until it's an operational nightmare.

On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh <ra...@gmail.com>
wrote:

> +1 on Datastax and could consider looking at Elassandra.
>
> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R <SE...@homedepot.com>
> wrote:
>
>> Kenneth is right. Trying to port/support a relational model to a CQL
>> model the way you are doing it is not going to go well. You won’t be able
>> to scale or get the search flexibility that you want. It will make
>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>> availability, low latency, scalability, etc. so you need to store the data
>> the way you want to retrieve it (query first modeling!). You could look at
>> defining the “right” partition and clustering keys, so that the searches
>> are within a single, reasonably sized partition. And you could have lookup
>> tables for other common search patterns (item_by_model_name, etc.)
>>
>>
>>
>> If that kind of modeling gets you to a situation where you have too many
>> lookup tables to keep consistent, you could consider something like
>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>> searchable fields. A SOLR query will typically be an order of magnitude
>> slower than a partition key lookup, though.
>>
>>
>>
>> It really boils down to the purpose of the data store. If you are looking
>> for primarily an “anything goes” search engine, Cassandra may not be a good
>> choice. If you need Cassandra-level availability, extremely low latency
>> queries (on known access patterns), high volume/low latency writes, easy
>> scalability, etc. then you are going to have to rethink how you model the
>> data.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Kenneth Brotman <ke...@yahoo.com.INVALID>
>> *Sent:* Thursday, February 07, 2019 7:01 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Peter,
>>
>>
>>
>> Sounds like you may need to use a different architecture.  Perhaps you
>> need something like Presto or Kafka as a part of the solution.  If the data
>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>> have to transform the data you want to use with Cassandra so that a proper
>> data model for Cassandra can be used.
>>
>>
>>
>> *From:* Peter Heitman [mailto:peter@heitman.us <pe...@heitman.us>]
>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I have read the material. The problem is that the application has a
>> query facility available to the user where they can type in "(A = foo AND B
>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>> many of which are columns in the mytable below while others are from other
>> tables. This query facility was implemented and shipped years before we
>> decided to move to Cassandra
>>
>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>> kenbrotman@yahoo.com.invalid> wrote:
>>
>> The problem is you’re not using a query first design.  I would recommend
>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>> Carpenter and Eben Hewitt.  It’s available free online at this link
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse&d=DwMFaQ&c=MtgQEAMQGqekjTjiAhkudQ&r=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ&m=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g&s=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8&e=>
>> .
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:peter@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:33 PM
>>
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
>> inefficient data model. I haven't figured out how to do it correctly yet
>>
>> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <
>> kenbrotman@yahoo.com.invalid> wrote:
>>
>> Exactly.  When you design your data model correctly you shouldn’t have to
>> use ALLOW FILTERING in the queries.  That is not recommended.
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:peter@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:09 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> You are completely right! My problem is that I am trying to port code for
>> SQL to CQL for an application that provides the user with a relatively
>> general search facility. The original implementation didn't worry about
>> secondary indexes - it just took advantage of the ability to create
>> arbitrarily complex queries with inner joins, left joins, etc. I am
>> reimplimenting it to create a parse tree of CQL queries and doing the ANDs
>> and ORs in the application. Of course once I get enough of this implemented
>> I will have to load up the table with a large data set and see if it gives
>> acceptable performance for our use case.
>>
>> On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman <ke...@yahoo.cominvalid>
>> wrote:
>>
>> Isn’t that a lot of SASI indexes for one table.  Could you denormalize
>> more to reduce both columns per table and SASI indexes per table?  Eight
>> SASI indexes on one table seems like a lot.
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:peter@heitman.us]
>> *Sent:* Tuesday, February 05, 2019 6:59 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> The table and secondary indexes look generally like this Note that I have
>> changed the names of many of the columns to be generic since they aren't
>> important to the question as far as I know. I left the actual names for
>> those columns that I've created SASI indexes for. The query I use to try to
>> create a PreparedStatement is:
>>
>>
>>
>> SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
>> sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING
>>
>>
>>
>> the schema cql statements are:
>>
>>
>>
>> CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
>>
>>   id text,
>>
>>   sql_id bigint,
>>
>>   cpe_id text,
>>
>>   sw_version text,
>>
>>   hw_version text,
>>
>>   manufacturer text,
>>
>>   product_class text,
>>
>>   manufacturer_oui text,
>>
>>   description text,
>>
>>   periodic_inform_interval text,
>>
>>   restricted_mode_enabled text,
>>
>>   restricted_mode_reason text,
>>
>>   type text,
>>
>>   model_name text,
>>
>>   serial text,
>>
>>   mac text,
>>
>>    text,
>>
>>   generic0 timestamp,
>>
>>   household_id text,
>>
>>   generic1 int,
>>
>>   generic2 text,
>>
>>   generic3 text,
>>
>>   generic4 int,
>>
>>   generic5 int,
>>
>>   generic6 text,
>>
>>   generic7 text,
>>
>>   generic8 text,
>>
>>   generic9 text,
>>
>>   generic10 text,
>>
>>   generic11 timestamp,
>>
>>   generic12 text,
>>
>>   generic13 text,
>>
>>   generic14 timestamp,
>>
>>   generic15 text,
>>
>>   generic16 text,
>>
>>   generic17 text,
>>
>>   generic18 text,
>>
>>   generic19 text,
>>
>>   generic20 text,
>>
>>   generic21 text,
>>
>>   generic22 text,
>>
>>   generic23 text,
>>
>>   generic24 text,
>>
>>   generic25 text,
>>
>>   generic26 text,
>>
>>   generic27 text,
>>
>>   generic28 int,
>>
>>   generic29 int,
>>
>>   generic30 text,
>>
>>   generic31 text,
>>
>>   generic32 text,
>>
>>   generic33 text,
>>
>>   generic34 text,
>>
>>   generic35 int,
>>
>>   generic36 int,
>>
>>   generic37 int,
>>
>>   generic38 int,
>>
>>   generic39 text,
>>
>>   generic40 text,
>>
>>   generic41 text,
>>
>>   generic42 text,
>>
>>   generic43 text,
>>
>>   generic44 text,
>>
>>   generic45 text,
>>
>>   PRIMARY KEY (id)
>>
>> );
>>
>>
>>
>> CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable
>> (serial)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable
>> (cpe_id)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'orgapache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON
>> mykeyspace.mytable (manufacturer)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON
>> mykeyspace.mytable (manufacturer_oui)
>>
>>        USING 'org.apache.cassandra.index.sasiSASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable
>> (hw_version)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON mykeyspace.mytable
>> (sw_version)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON
>> mykeyspace.mytable (household_id)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.indexsasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>>
>>
>> On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov <
>> oleksandr.petrov@gmail.com> wrote:
>>
>> Could you post full table schema (names obfuscated, if required) with
>> index creation statements and queries?
>>
>>
>>
>> On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <
>> jacques-henri.berthemet@genesys.com> wrote:
>>
>> I’m not sure why it`s not allowed by the Datastax driver, but maybe you
>> could try to use OR instead of IN?
>>
>> SELECT blah FROM foo WHERE <indexed column> = :val1 OR <indexed column> =
>> :val2 ALLOW FILTERING
>>
>>
>>
>> It should be the same as IN query, but I don’t if it makes a difference
>> for performance.
>>
>>
>>
>> *From: *Peter Heitman <pe...@heitman.us>
>> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
>> *Date: *Monday 4 February 2019 at 07:17
>> *To: *"user@cassandra.apache.org <us...@cassandraapache.org>" <
>> user@cassandra.apache.org>
>> *Subject: *SASI queries- cqlsh vs java driver
>>
>>
>>
>> When I create a SASI index on a secondary column, from cqlsh I can
>> execute a query
>>
>>
>>
>> SELECT blah FROM foo WHERE <indexed column> IN ('mytext') ALLOW FILTERING;
>>
>>
>>
>> but not from the java driver:
>>
>>
>>
>> SELECT blah FROM foo WHERE <indexed column> IN :val ALLOW FILTERING
>>
>>
>>
>> Here I get an exception
>>
>>
>>
>> com.datastax.driver.coreexceptions.InvalidQueryException: IN predicates
>> on non-primary-key columns (<indexed column>) is not yet supported
>>
>> at
>> com.datastax.driver.coreexceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
>> ~[cassandra-driver-core-3.6.0.jar:na]
>>
>>
>>
>> Why are they different? Is there anything I can do with the java driver
>> to get past this exception?
>>
>>
>>
>> Peter
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> alex p
>>
>>
>> ------------------------------
>>
>> The information in this Internet Email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this Email
>> by anyone else is unauthorized. If you are not the intended recipient, any
>> disclosure, copying, distribution or any action taken or omitted to be
>> taken in reliance on it, is prohibited and may be unlawful. When addressed
>> to our clients any opinions or advice contained in this Email are subject
>> to the terms and conditions expressed in any applicable governing The Home
>> Depot terms of business or client engagement letter. The Home Depot
>> disclaims all responsibility and liability for the accuracy and content of
>> this attachment and for any damages or losses arising from any
>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>> items of a destructive nature, which may be contained in this attachment
>> and shall not be liable for direct, indirect, consequential or special
>> damages in connection with this e-mail message or its attachment.
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Posted by Rahul Singh <ra...@gmail.com>.

+1 on Datastax and could consider looking at Elassandra.

On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R <SE...@homedepot.com>
wrote:

> Kenneth is right. Trying to port/support a relational model to a CQL model
> the way you are doing it is not going to go well. You won’t be able to
> scale or get the search flexibility that you want. It will make Cassandra
> seem like a bad fit. You want to play to Cassandra’s strengths –
> availability, low latency, scalability, etc. so you need to store the data
> the way you want to retrieve it (query first modeling!). You could look at
> defining the “right” partition and clustering keys, so that the searches
> are within a single, reasonably sized partition. And you could have lookup
> tables for other common search patterns (item_by_model_name, etc.)
>
>
>
> If that kind of modeling gets you to a situation where you have too many
> lookup tables to keep consistent, you could consider something like
> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
> searchable fields. A SOLR query will typically be an order of magnitude
> slower than a partition key lookup, though.
>
>
>
> It really boils down to the purpose of the data store. If you are looking
> for primarily an “anything goes” search engine, Cassandra may not be a good
> choice. If you need Cassandra-level availability, extremely low latency
> queries (on known access patterns), high volume/low latency writes, easy
> scalability, etc. then you are going to have to rethink how you model the
> data.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Kenneth Brotman <ke...@yahoo.com.INVALID>
> *Sent:* Thursday, February 07, 2019 7:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>
>
>
> Peter,
>
>
>
> Sounds like you may need to use a different architecture.  Perhaps you
> need something like Presto or Kafka as a part of the solution.  If the data
> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
> have to transform the data you want to use with Cassandra so that a proper
> data model for Cassandra can be used.
>
>
>
> *From:* Peter Heitman [mailto:peter@heitman.us <pe...@heitman.us>]
> *Sent:* Wednesday, February 06, 2019 10:05 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> Yes, I have read the material. The problem is that the application has a
> query facility available to the user where they can type in "(A = foo AND B
> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
> many of which are columns in the mytable below while others are from other
> tables. This query facility was implemented and shipped years before we
> decided to move to Cassandra
>
> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <ke...@yahoo.com.invalid>
> wrote:
>
> The problem is you’re not using a query first design.  I would recommend
> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
> Carpenter and Eben Hewitt.  It’s available free online at this link
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse&d=DwMFaQ&c=MtgQEAMQGqekjTjiAhkudQ&r=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ&m=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g&s=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8&e=>
> .
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:peter@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:33 PM
>
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
> inefficient data model. I haven't figured out how to do it correctly yet
>
> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <ke...@yahoo.com.invalid>
> wrote:
>
> Exactly.  When you design your data model correctly you shouldn’t have to
> use ALLOW FILTERING in the queries.  That is not recommended.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:peter@heitman.us]
> *Sent:* Wednesday, February 06, 2019 6:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> You are completely right! My problem is that I am trying to port code for
> SQL to CQL for an application that provides the user with a relatively
> general search facility. The original implementation didn't worry about
> secondary indexes - it just took advantage of the ability to create
> arbitrarily complex queries with inner joins, left joins, etc. I am
> reimplimenting it to create a parse tree of CQL queries and doing the ANDs
> and ORs in the application. Of course once I get enough of this implemented
> I will have to load up the table with a large data set and see if it gives
> acceptable performance for our use case.
>
> On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman <ke...@yahoo.cominvalid>
> wrote:
>
> Isn’t that a lot of SASI indexes for one table.  Could you denormalize
> more to reduce both columns per table and SASI indexes per table?  Eight
> SASI indexes on one table seems like a lot.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Peter Heitman [mailto:peter@heitman.us]
> *Sent:* Tuesday, February 05, 2019 6:59 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: SASI queries- cqlsh vs java driver
>
>
>
> The table and secondary indexes look generally like this Note that I have
> changed the names of many of the columns to be generic since they aren't
> important to the question as far as I know. I left the actual names for
> those columns that I've created SASI indexes for. The query I use to try to
> create a PreparedStatement is:
>
>
>
> SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
> sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING
>
>
>
> the schema cql statements are:
>
>
>
> CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
>
>   id text,
>
>   sql_id bigint,
>
>   cpe_id text,
>
>   sw_version text,
>
>   hw_version text,
>
>   manufacturer text,
>
>   product_class text,
>
>   manufacturer_oui text,
>
>   description text,
>
>   periodic_inform_interval text,
>
>   restricted_mode_enabled text,
>
>   restricted_mode_reason text,
>
>   type text,
>
>   model_name text,
>
>   serial text,
>
>   mac text,
>
>    text,
>
>   generic0 timestamp,
>
>   household_id text,
>
>   generic1 int,
>
>   generic2 text,
>
>   generic3 text,
>
>   generic4 int,
>
>   generic5 int,
>
>   generic6 text,
>
>   generic7 text,
>
>   generic8 text,
>
>   generic9 text,
>
>   generic10 text,
>
>   generic11 timestamp,
>
>   generic12 text,
>
>   generic13 text,
>
>   generic14 timestamp,
>
>   generic15 text,
>
>   generic16 text,
>
>   generic17 text,
>
>   generic18 text,
>
>   generic19 text,
>
>   generic20 text,
>
>   generic21 text,
>
>   generic22 text,
>
>   generic23 text,
>
>   generic24 text,
>
>   generic25 text,
>
>   generic26 text,
>
>   generic27 text,
>
>   generic28 int,
>
>   generic29 int,
>
>   generic30 text,
>
>   generic31 text,
>
>   generic32 text,
>
>   generic33 text,
>
>   generic34 text,
>
>   generic35 int,
>
>   generic36 int,
>
>   generic37 int,
>
>   generic38 int,
>
>   generic39 text,
>
>   generic40 text,
>
>   generic41 text,
>
>   generic42 text,
>
>   generic43 text,
>
>   generic44 text,
>
>   generic45 text,
>
>   PRIMARY KEY (id)
>
> );
>
>
>
> CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable
> (serial)
>
>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable
> (cpe_id)
>
>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
>
>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'orgapache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON
> mykeyspace.mytable (manufacturer)
>
>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON
> mykeyspace.mytable (manufacturer_oui)
>
>        USING 'org.apache.cassandra.index.sasiSASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable
> (hw_version)
>
>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON mykeyspace.mytable
> (sw_version)
>
>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
> CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON
> mykeyspace.mytable (household_id)
>
>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>
>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.indexsasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
>
>
>
>
> On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov <
> oleksandr.petrov@gmail.com> wrote:
>
> Could you post full table schema (names obfuscated, if required) with
> index creation statements and queries?
>
>
>
> On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <
> jacques-henri.berthemet@genesys.com> wrote:
>
> I’m not sure why it`s not allowed by the Datastax driver, but maybe you
> could try to use OR instead of IN?
>
> SELECT blah FROM foo WHERE <indexed column> = :val1 OR <indexed column> =
> :val2 ALLOW FILTERING
>
>
>
> It should be the same as IN query, but I don’t if it makes a difference
> for performance.
>
>
>
> *From: *Peter Heitman <pe...@heitman.us>
> *Reply-To: *"user@cassandra.apache.org" <us...@cassandra.apache.org>
> *Date: *Monday 4 February 2019 at 07:17
> *To: *"user@cassandra.apache.org <us...@cassandraapache.org>" <
> user@cassandra.apache.org>
> *Subject: *SASI queries- cqlsh vs java driver
>
>
>
> When I create a SASI index on a secondary column, from cqlsh I can execute
> a query
>
>
>
> SELECT blah FROM foo WHERE <indexed column> IN ('mytext') ALLOW FILTERING;
>
>
>
> but not from the java driver:
>
>
>
> SELECT blah FROM foo WHERE <indexed column> IN :val ALLOW FILTERING
>
>
>
> Here I get an exception
>
>
>
> com.datastax.driver.coreexceptions.InvalidQueryException: IN predicates on
> non-primary-key columns (<indexed column>) is not yet supported
>
> at
> com.datastax.driver.coreexceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
> ~[cassandra-driver-core-3.6.0.jar:na]
>
>
>
> Why are they different? Is there anything I can do with the java driver to
> get past this exception?
>
>
>
> Peter
>
>
>
>
>
>
>
>
> --
>
> alex p
>
>
> ------------------------------
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>