Posted to commits@cassandra.apache.org by "Brad Schoening (Jira)" <ji...@apache.org> on 2022/07/26 03:20:00 UTC
[jira] [Comment Edited] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571175#comment-17571175 ]
Brad Schoening edited comment on CASSANDRA-17667 at 7/26/22 3:19 AM:
---------------------------------------------------------------------
[~ahomoki], looking at this a little further, the input is tokenized by SaferScanner (saferscanner.py), which subclasses Python's re.Scanner. An example from [https://mail.python.org/pipermail/python-dev/2003-April/035075.html] shows how re.Scanner tokenizes input:
{code:python}
import re

# Each action receives the scanner and the matched text; its return value
# becomes the token. An action of None discards the match.
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
])

# sanity check: scan() returns (tokens, unconsumed remainder)
assert scanner.scan("sum = 3*foo + 312.50 + bar") == (
    ['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], '')
{code}
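re.Scanner joins its rules into a single alternation, so at each position the first rule that matches wins (not the longest match), and scanning stops at the first character no rule matches. That unconsumed text is the second element of the returned tuple, which is {{''}} in the sanity check. A minimal sketch with the same rules but the integer rule moved ahead of the float rule (the {{reordered}} name is just for illustration):

{code:python}
import re

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

# Same rules as the example above, but \d+ now precedes the float rule.
reordered = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+", s_int),
    (r"\d+\.\d*", s_float),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),  # None: match is consumed but no token is emitted
])

tokens, remainder = reordered.scan("sum = 3*foo + 312.50 + bar")
print(tokens)     # ['sum', 'op=', 3, 'op*', 'foo', 'op+', 312]
print(remainder)  # .50 + bar
{code}

Because {{\d+}} now wins at {{312}}, the following {{.}} matches no rule and scanning halts there. Since SaferScanner subclasses re.Scanner, rule order presumably matters in the same way for pylexotron's lexicon.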
In pylexotron, the rule-spec scanner is built on SaferScanner:
{code:python}
RuleSpecScanner = SaferScanner([
(r'::=', lambda s, t: t),
(r'\[[a-z0-9_]+\]=', lambda s, t: ('named_collector', t[1:-2])),
(r'[a-z0-9_]+=', lambda s, t: ('named_symbol', t[:-1])),
(r'/(\[\^?.[^]]*\]|[^/]|\\.)*/', lambda s, t: ('regex', t[1:-1].replace(r'\/', '/'))),
(r'"([^"]|\\.)*"', lambda s, t: ('litstring', t)),
(r'<[^>]*>', lambda s, t: ('reference', t[1:-1])),
(r'\bJUNK\b', lambda s, t: ('junk', t)),
(r'[@()|?*;]', lambda s, t: t),
(r'\s+', None),
(r'#[^\n]*', None),
], re.I | re.S | re.U)
{code}
r'\s+' skips whitespace: its action is None, so the match is consumed without emitting a token.
I'm uncertain what r'#[^\n]*' and r'\bJUNK\b' are for; comments explaining them would help.
There doesn't seem to be a unit test class for pylexotron or SaferScanner, however, and adding one would be worthwhile. It could include tests for each token type: named_collector, named_symbol, regex, litstring, reference and junk.
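Such a test class could look roughly like the sketch below. It is hypothetical: it rebuilds the lexicon on plain {{re.Scanner}} instead of importing SaferScanner (assuming both tokenize these inputs the same way), it rewrites the two capturing groups as non-capturing {{(?:...)}} because plain re.Scanner selects each rule's action from {{m.lastindex}}, and the class name and sample inputs are made up.

{code:python}
import re
import unittest

# Hypothetical stand-in for pylexotron's RuleSpecScanner, built on plain
# re.Scanner instead of SaferScanner. Capturing groups are rewritten as
# non-capturing (?:...) because re.Scanner maps a match to its action via
# m.lastindex, and extra groups inside a rule may interfere with that.
RULES = [
    (r'::=', lambda s, t: t),
    (r'\[[a-z0-9_]+\]=', lambda s, t: ('named_collector', t[1:-2])),
    (r'[a-z0-9_]+=', lambda s, t: ('named_symbol', t[:-1])),
    (r'/(?:\[\^?.[^]]*\]|[^/]|\\.)*/', lambda s, t: ('regex', t[1:-1].replace(r'\/', '/'))),
    (r'"(?:[^"]|\\.)*"', lambda s, t: ('litstring', t)),
    (r'<[^>]*>', lambda s, t: ('reference', t[1:-1])),
    (r'\bJUNK\b', lambda s, t: ('junk', t)),
    (r'[@()|?*;]', lambda s, t: t),
    (r'\s+', None),
    (r'#[^\n]*', None),
]
scanner = re.Scanner(RULES, re.I | re.S | re.U)


class RuleSpecScannerTest(unittest.TestCase):
    def scan_all(self, text):
        tokens, remainder = scanner.scan(text)
        self.assertEqual(remainder, '')  # the whole input must be consumed
        return tokens

    def test_named_collector(self):
        self.assertEqual(self.scan_all('[cols]='), [('named_collector', 'cols')])

    def test_named_symbol(self):
        self.assertEqual(self.scan_all('tbl='), [('named_symbol', 'tbl')])

    def test_regex(self):
        self.assertEqual(self.scan_all(r'/[ \t]+/'), [('regex', r'[ \t]+')])

    def test_litstring(self):
        self.assertEqual(self.scan_all('"SELECT"'), [('litstring', '"SELECT"')])

    def test_reference(self):
        self.assertEqual(self.scan_all('<colname>'), [('reference', 'colname')])

    def test_junk(self):
        self.assertEqual(self.scan_all('JUNK ::='), [('junk', 'JUNK'), '::='])

    def test_whitespace_and_hash_lines_emit_nothing(self):
        # Both r'\s+' and r'#[^\n]*' have a None action.
        self.assertEqual(self.scan_all('  # a comment\n;'), [';'])
{code}

Saved as a module, it runs under {{python -m unittest}}; a real version would exercise pylexotron's own RuleSpecScanner rather than a copy of its rules.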
> Text value containing "/*" interpreted as multiline comment in cqlsh
> --------------------------------------------------------------------
>
> Key: CASSANDRA-17667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17667
> Project: Cassandra
> Issue Type: Bug
> Components: CQL/Interpreter
> Reporter: ANOOP THOMAS
> Assignee: Attila Homoki
> Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
>
> I use the cqlsh command-line utility to load some DDL statements. The version of the utility I use is:
> {noformat}
> [cqlsh 6.0.0 | Cassandra 4.0.0.47 | CQL spec 3.4.5 | Native protocol v5]{noformat}
> The command that loads DDL.cql:
> {noformat}
> cqlsh -u username -p password cassandra.example.com 65503 --ssl -f DDL.cql
> {noformat}
> I have a line in a CQL script that breaks the syntax:
> {noformat}
> INSERT into tablename (key,columnname1,columnname2) VALUES ('keyName','value1','/value2/*/value3');{noformat}
> {{/*}} here is interpreted as the start of a multi-line comment. This used to work in older versions of cqlsh. The error I see looks like this:
> {noformat}
> SyntaxException: line 4:2 mismatched input 'Update' expecting ')' (...,'value1','/value2INSERT into tablename(INSERT into tablename (key,columnname1,columnname2)) VALUES ('[Update]-...) SyntaxException: line 1:0 no viable alternative at input '(' ([(]...)
> {noformat}
> The same behavior occurs in interactive mode. {{/*}} inside a text value in a CQL statement should not be interpreted as the start of a multi-line comment.
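The expectation in the report, that {{/*}} inside a quoted value is plain text, is what a tokenizer working left to right, one token at a time, gives naturally: the opening quote starts a string token that consumes the {{/*}} along with it, so the comment rule is only ever tried at token boundaries. A minimal sketch with {{re.Scanner}}; the rules and the {{cql_scanner}} name are hypothetical, far simpler than cqlsh's real CQL lexer, and not taken from cqlsh's code.

{code:python}
import re

# Hypothetical, much-simplified CQL tokenizer; NOT cqlsh's actual lexer.
# The scanner consumes one token at a time from the left, so the opening
# quote of a string literal starts a 'string' token that swallows any "/*"
# inside it; the block-comment rule is only tried at token boundaries.
cql_scanner = re.Scanner([
    (r"'(?:[^']|'')*'", lambda s, t: ('string', t)),  # '' is an escaped quote
    (r"/\*.*?\*/", None),                             # block comment: dropped
    (r"--[^\n]*", None),                              # line comment: dropped
    (r"[A-Za-z_][A-Za-z0-9_]*", lambda s, t: ('word', t)),
    (r"[(),;]", lambda s, t: ('punct', t)),
    (r"\s+", None),
], re.S)

stmt = ("INSERT into tablename (key,columnname1,columnname2) "
        "VALUES ('keyName','value1','/value2/*/value3');")
tokens, remainder = cql_scanner.scan(stmt)
strings = [t for kind, t in tokens if kind == 'string']
print(strings)          # ["'keyName'", "'value1'", "'/value2/*/value3'"]
print(repr(remainder))  # ''

# A raw substring test cannot tell the difference:
print('/*' in stmt)     # True
{code}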
--
This message was sent by Atlassian Jira
(v8.20.10#820010)