Posted to dev@jena.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/01 11:01:51 UTC
[jira] [Commented] (JENA-1285) Have one Tokenizer token for strings.

    [ https://issues.apache.org/jira/browse/JENA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848232#comment-15848232 ] 

ASF GitHub Bot commented on JENA-1285:
--------------------------------------

GitHub user afs opened a pull request:

    https://github.com/apache/jena/pull/213

    JENA-1285: One token type for strings

    Having four separate string token types means that all four cases have to be considered, yet in the common case all that matters is that a string has been seen. Only sometimes does the exact form of the string matter.
    
    This PR changes Jena to have one token type for STRING and separately record the string type seen for when it is needed.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/afs/jena token

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #213
    
----
commit 781895ce64e062c7f2268a78189a777c39b92844
Author: Andy Seaborne <an...@apache.org>
Date:   2017-01-27T13:31:59Z

    Improve bulk operations depending on relative sizes of graphs

commit 01419943d908676152c43a66bda16cb90cba3a46
Author: Andy Seaborne <an...@apache.org>
Date:   2017-01-28T12:13:16Z

    Extra tests for bulk addInto/deleteFrom.

commit 0e2fad4f2e7e4237d27847839fa3a7b8eb1941d8
Author: Andy Seaborne <an...@apache.org>
Date:   2017-01-31T21:20:30Z

    One kind of tokenized string, with string type recorded separately.

----


> Have one Tokenizer token for strings.
> -------------------------------------
>
>                 Key: JENA-1285
>                 URL: https://issues.apache.org/jira/browse/JENA-1285
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: RIOT
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Minor
>
> The Tokenizer ({{TokenizerText}}) faithfully records what sort of string it has processed using different token types - STRING1, STRING2, LONG_STRING1, LONG_STRING2.
> Sometimes it matters (N-Triples), sometimes it doesn't (Turtle).
> [Turtle rule for strings|https://www.w3.org/TR/turtle/#grammar-production-String]
> [N-Triples rule for strings|https://www.w3.org/TR/n-triples/#grammar-production-STRING_LITERAL_QUOTE]
> Instead of 4 token types (5 if you include the existing STRING token), it is proposed to use one token type, STRING, and record the actual string type seen separately.
> This makes it simpler to work with non-text formats, where strings have no concept of quotes, and with any format that accepts any string form.
> The specific cases (e.g. N-Triples) can still test for the details of the string syntax seen but the token type is the conceptual "superclass" STRING type.
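
A minimal sketch of the idea described above, not Jena's actual Tokenizer API: every name here (StringTokenSketch, TokenType, QuoteStyle, Token, acceptTurtle, acceptNTriples) is hypothetical and chosen only for illustration. It assumes the token carries a single STRING type plus a separately recorded quote style, which may be absent for non-text formats.

```java
// Illustrative only: hypothetical names, not the classes in Jena's RIOT tokenizer.
final class StringTokenSketch {

    /** Conceptual token types: every string form shares the one STRING type. */
    enum TokenType { STRING, IRI, KEYWORD }

    /** The surface syntax seen by the tokenizer, recorded apart from the token type. */
    enum QuoteStyle { SINGLE, DOUBLE, LONG_SINGLE, LONG_DOUBLE }

    static final class Token {
        final TokenType type;
        final QuoteStyle quoteStyle; // null when no quoting applies (e.g. a non-text format)
        final String image;

        private Token(TokenType type, QuoteStyle quoteStyle, String image) {
            this.type = type;
            this.quoteStyle = quoteStyle;
            this.image = image;
        }

        /** Builds a STRING token; quoteStyle may be null if the source has no quotes. */
        static Token string(String image, QuoteStyle quoteStyle) {
            return new Token(TokenType.STRING, quoteStyle, image);
        }
    }

    /** Turtle-like consumer: any string form is acceptable, so only the type is checked. */
    static boolean acceptTurtle(Token t) {
        return t.type == TokenType.STRING;
    }

    /** N-Triples-like consumer: the grammar only allows the short double-quoted form. */
    static boolean acceptNTriples(Token t) {
        return t.type == TokenType.STRING && t.quoteStyle == QuoteStyle.DOUBLE;
    }
}
```

With this shape, most callers test only for STRING, and a strict parser adds one extra check on the recorded quote style instead of enumerating four token types.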



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)