Posted to user@cassandra.apache.org by "Richard L. Burton III" <mr...@gmail.com> on 2016/01/11 20:47:42 UTC

Recommendations for an embedded Cassandra and Unit Tests

I'm looking to see what's recommended for an embedded version of Cassandra,
just for unit testing.

I'm looking at https://github.com/jsevellec/cassandra-unit/wiki but I
wanted to see if there was a better recommendation.

-- 
-Richard L. Burton III
@rburton

Re: Recommendations for an embedded Cassandra and Unit Tests

Posted by "Richard L. Burton III" <mr...@gmail.com>.
I was hoping that the cqlsh project would expose a class that you could
feed a source file to from Java.

The parsers in these other projects don't parse CQL properly. For example,
when a semicolon appears inside a string literal, a parser should ignore it
and keep scanning for the end of the string rather than treat it as the end
of the statement.

I ended up having separate *.cql files that I execute during the setup of
my tests. Not ideal, but it'll work.
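
A minimal sketch of that workaround, assuming each *.cql file holds exactly
one statement (my assumption, the layout isn't spelled out above). The class
name CqlFileRunner and the Consumer<String> parameter are hypothetical; in a
real test the consumer would typically be the driver session's execute
method.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: runs one-statement-per-file CQL scripts in order.
final class CqlFileRunner {

    // Reads each file as UTF-8 and hands its trimmed content to the executor.
    // No splitting is attempted, so semicolons inside strings are harmless.
    static void runAll(List<Path> files, Consumer<String> executor) throws IOException {
        for (Path file : files) {
            String statement =
                    new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            if (!statement.isEmpty()) { // skip empty files
                executor.accept(statement);
            }
        }
    }
}
```

Because nothing is split, the parsing problem disappears, at the cost of one
file per statement.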




-- 
-Richard L. Burton III
@rburton

Re: Recommendations for an embedded Cassandra and Unit Tests

Posted by DuyHai Doan <do...@gmail.com>.
"What I'm noticing with these projects is that they don't handle CQL files
properly"

--> Your concern is legitimate, but handling CQL files properly is complex.
Let me explain why.

A naive solution if you want to handle CQL syntax is to re-use the ANTLR
grammar file here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/Cql.g

I've gone down this path in the past and it's nearly impossible, simply
because the Cql.g grammar file uses a lot of "internal" Cassandra classes.
Just look at the import block at the beginning of the file.

At a higher level, we should clearly define the "scope" of a CQL script
executor. Is it responsible for 1) parsing CQL statements or 2) validating
CQL statements?

As far as I'm concerned, point 2) should be done by Cassandra itself.
Limiting the scope of a script executor to point 1) is sufficient.

The remaining challenge is then: how do you split a block of input text
that contains multiple CQL statements into a list of CQL statements that
can be executed sequentially (or in parallel) by the Java driver?
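
As an illustration of that splitting step, here is a minimal hand-written
scanner that treats a semicolon as a separator only when it is outside
single-quoted strings and double-quoted identifiers. It is a sketch, not the
approach used by any of the projects mentioned in this thread; the class name
QuoteAwareSplitter is hypothetical, and comments and UDF bodies are
deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits a script on ';' outside quoted regions.
final class QuoteAwareSplitter {

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // 0 = outside quotes, otherwise the opening quote char

        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (quote != 0) {
                current.append(c);
                // A doubled quote ('') closes and immediately reopens the
                // region, so escaped quotes need no special case here.
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
                current.append(c);
            } else if (c == ';') {
                flush(statements, current);
            } else {
                current.append(c);
            }
        }
        flush(statements, current); // last statement may lack a trailing ';'
        return statements;
    }

    // Adds the trimmed buffer as a statement if it is not blank, then resets it.
    private static void flush(List<String> statements, StringBuilder current) {
        String statement = current.toString().trim();
        if (!statement.isEmpty()) {
            statements.add(statement);
        }
        current.setLength(0);
    }
}
```

The returned strings carry no trailing semicolon and can be passed to the
driver one at a time. Validation is left to Cassandra, in line with the scope
argued for above.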

The Zeppelin Cassandra interpreter uses Scala parser combinators to define
a minimal grammar that splits the different CQL statements apart:
https://github.com/doanduyhai/incubator-zeppelin/blob/CassandraInterpreter-V2/cassandra/src/main/scala/org/apache/zeppelin/cassandra/ParagraphParser.scala#L179-L198

Up to Cassandra 2.1 it's pretty easy: the semicolon (;) can be used as the
statement separator. Since Cassandra 2.2 and the introduction of UDFs
(user-defined functions), it's much more complex. Semicolons can appear in
the Java source code block of a function definition, so using them as the
separator no longer works.
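
To make the UDF problem concrete, the sketch below skips over $$ ... $$
blocks (the delimiter CQL accepts around a function body) as well as quoted
regions before looking for a separator. It is a hand-written scanner for
illustration only, not the regular-expression approach linked in this
message; the class name UdfAwareSplitter and the sample keyspace, table and
function names are hypothetical, and comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ';' is a separator only outside $$...$$ and quotes.
final class UdfAwareSplitter {

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < script.length()) {
            char c = script.charAt(i);
            if (script.startsWith("$$", i)) {
                // Copy the whole function body verbatim, semicolons included.
                int end = script.indexOf("$$", i + 2);
                int stop = (end < 0) ? script.length() : end + 2;
                current.append(script, i, stop);
                i = stop;
            } else if (c == '\'' || c == '"') {
                // Copy up to the matching quote; a doubled quote simply
                // starts a new quoted region on the next iteration.
                int end = script.indexOf(c, i + 1);
                int stop = (end < 0) ? script.length() : end + 1;
                current.append(script, i, stop);
                i = stop;
            } else if (c == ';') {
                flush(statements, current);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        flush(statements, current);
        return statements;
    }

    // Adds the trimmed buffer as a statement if it is not blank, then resets it.
    private static void flush(List<String> statements, StringBuilder current) {
        String statement = current.toString().trim();
        if (!statement.isEmpty()) {
            statements.add(statement);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String script =
                "CREATE FUNCTION ks.twice(x int) RETURNS NULL ON NULL INPUT "
              + "RETURNS int LANGUAGE java AS $$ int y = x * 2; return y; $$; "
              + "SELECT ks.twice(v) FROM ks.t;";
        // Prints two statements; the function body stays in one piece.
        split(script).forEach(System.out::println);
    }
}
```

An unterminated $$ or quote makes the rest of the input part of the current
statement, which leaves the error reporting to Cassandra.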

A complex regular expression like this:
https://github.com/doanduyhai/incubator-zeppelin/blob/CassandraInterpreter-V2/cassandra/src/main/scala/org/apache/zeppelin/cassandra/ParagraphParser.scala#L55-L69
is necessary to parse UDF creation statements correctly.

In a nutshell, parsing CQL (without even validating it) is harder than most
people think.




Re: Recommendations for an embedded Cassandra and Unit Tests

Posted by "Richard L. Burton III" <mr...@gmail.com>.
What I'm noticing with these projects is that they don't handle CQL files
properly. E.g., cassandra-unit dies when a string contains a ; inside it.
The parsing logic they use is very primitive in the sense that they simply
look for ; to denote the end of a statement.
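
A small sketch of the failure mode described above. It is not the code of
cassandra-unit or any other project; the class name NaiveSplitDemo and the
sample table are hypothetical. It only shows what splitting on every
semicolon does to a string literal that contains one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo of the "split on every ;" approach.
final class NaiveSplitDemo {

    // Splits on every semicolon, ignoring CQL quoting rules entirely.
    static List<String> naiveSplit(String script) {
        return Arrays.stream(script.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String script =
                "INSERT INTO notes (id, body) VALUES (1, 'a;b'); SELECT * FROM notes;";
        // Prints three fragments instead of two statements: the INSERT is
        // cut in half at the semicolon inside 'a;b'.
        naiveSplit(script).forEach(System.out::println);
    }
}
```

The first two fragments are not valid CQL on their own, so execution fails
before the SELECT is ever reached.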

Is there any class in Cassandra that, given a *.cql file, returns a list of
the statements inside it?

Looking at CQLParser, it's only good for parsing a single statement vs. a
file that contains multiple statements.




-- 
-Richard L. Burton III
@rburton

Re: Recommendations for an embedded Cassandra and Unit Tests

Posted by DuyHai Doan <do...@gmail.com>.
Achilles 4.x offers embedded Cassandra server support along with some
utility classes such as ScriptExecutor. It currently supports C* 2.2:

https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server