Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2017/02/09 13:10:41 UTC

[jira] [Resolved] (JENA-1285) Have one Tokenizer token for strings.

     [ https://issues.apache.org/jira/browse/JENA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne resolved JENA-1285.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Jena 3.3.0

> Have one Tokenizer token for strings.
> -------------------------------------
>
>                 Key: JENA-1285
>                 URL: https://issues.apache.org/jira/browse/JENA-1285
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: RIOT
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Minor
>             Fix For: Jena 3.3.0
>
>
> The Tokenizer ({{TokenizerText}}) faithfully records what sort of string it has processed using different token types - STRING1, STRING2, LONG_STRING1, LONG_STRING2.
> Sometimes it matters (N-Triples), sometimes it doesn't (Turtle).
> [Turtle rule for strings|https://www.w3.org/TR/turtle/#grammar-production-String]
> [N-Triples rule for strings|https://www.w3.org/TR/n-triples/#grammar-production-STRING_LITERAL_QUOTE]
> Instead of 4 token types (5 if you include the existing STRING token), it is proposed to use a single token type, STRING, and to record the actual string form seen separately.
> This makes it simpler to work with non-text formats, where strings have no concept of quotes, and with any format that accepts any string form.
> The specific cases (e.g. N-Triples) can still test for the details of the string syntax seen but the token type is the conceptual "superclass" STRING type.
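
The proposal above can be illustrated with a minimal sketch. It is not Jena's actual code: the names StringTokenSketch, Token, TokenType, QuoteStyle and the two accept methods are hypothetical, chosen only to show a single STRING token type with the string form recorded alongside it.

```java
// Hypothetical sketch; none of these names are taken from Jena's RIOT API.
class StringTokenSketch {

    // One conceptual token type for all strings (IRI is here only for contrast).
    enum TokenType { STRING, IRI }

    // The string form actually seen by the tokenizer, recorded separately.
    enum QuoteStyle { SINGLE_QUOTED, DOUBLE_QUOTED, LONG_SINGLE_QUOTED, LONG_DOUBLE_QUOTED }

    static final class Token {
        final TokenType type;
        final String image;
        // null when there is no quote syntax (non-string tokens, non-text formats).
        final QuoteStyle quoteStyle;

        private Token(TokenType type, String image, QuoteStyle quoteStyle) {
            this.type = type;
            this.image = image;
            this.quoteStyle = quoteStyle;
        }

        static Token string(String image, QuoteStyle quoteStyle) {
            return new Token(TokenType.STRING, image, quoteStyle);
        }

        static Token iri(String image) {
            return new Token(TokenType.IRI, image, null);
        }

        boolean isString() {
            return type == TokenType.STRING;
        }
    }

    // A Turtle-like consumer only needs the conceptual type.
    static boolean turtleAcceptsAsString(Token t) {
        return t.isString();
    }

    // An N-Triples-like consumer also checks the recorded string form:
    // the N-Triples grammar allows only the double-quoted short form.
    static boolean nTriplesAcceptsAsString(Token t) {
        return t.isString() && t.quoteStyle == QuoteStyle.DOUBLE_QUOTED;
    }

    public static void main(String[] args) {
        Token longString = Token.string("abc", QuoteStyle.LONG_DOUBLE_QUOTED);
        System.out.println(turtleAcceptsAsString(longString));
        System.out.println(nTriplesAcceptsAsString(longString));
    }
}
```

With this shape, a reader for a non-text format could create STRING tokens with no quote style at all, while a strict parser still has enough information to reject string forms its grammar does not allow.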



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)