Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2010/12/18 17:56:00 UTC
[jira] Assigned: (JENA-12) Turtle Files with a UTF-8 BOM fail to parse
[ https://issues.apache.org/jira/browse/JENA-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne reassigned JENA-12:
---------------------------------
Assignee: Andy Seaborne
> Turtle Files with a UTF-8 BOM fail to parse
> -------------------------------------------
>
> Key: JENA-12
> URL: https://issues.apache.org/jira/browse/JENA-12
> Project: Jena
> Issue Type: Bug
> Components: RIOT
> Environment: Windows 7, latest Sun Java Runtime, Jena 2.6.4
> Reporter: Rob Vesse
> Assignee: Andy Seaborne
> Attachments: ttl-with-bom.ttl
>
>
> If a Turtle file has a BOM at the start then Jena will refuse to parse it giving the following error:
> Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2. Encountered: "@" (64), after : "\ufeff"
> at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
> at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
> at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
> at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
> at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
> at TurtleWithBOM.main(TurtleWithBOM.java:31)
> The code I used to produce this error was as follows:
> import com.hp.hpl.jena.rdf.model.*;
> import com.hp.hpl.jena.util.FileManager;
> import java.io.*;
> public class TurtleWithBOM
> {
>     public static void main(String[] args)
>     {
>         // create an empty model
>         Model model = ModelFactory.createDefaultModel();
>         InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
>         if (in == null)
>         {
>             throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
>         }
>         // read the Turtle file
>         model.read(in, "", "TTL");
>         // write it to standard out
>         model.write(System.out);
>     }
> }
> A sample Turtle file used with the above code is attached to this issue.
> The data files come from my software, which is written entirely in .NET; when writing UTF-8, the default behaviour of .NET is to include the BOM at the start of the file. The BOM is not required for UTF-8, but it is not forbidden either, so I think this should be fixed (if possible) in a future release. I will be modifying my software so that my users can disable output of the BOM if desired.
> Judging by the stack trace, I expect the same problem also affects N3 files, since afaict they use the same reader.
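Until the parser itself tolerates a BOM, a caller can strip it before handing the stream to Jena. The following is a minimal sketch of that workaround using only java.io; the class name BomSkipper and the method skipUtf8Bom are hypothetical names chosen for illustration, and this is not the fix applied in Jena. It peeks at the first three bytes and pushes them back unless they are the UTF-8 BOM (EF BB BF), which decodes to the "\ufeff" seen in the error above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not part of Jena): skips a leading UTF-8 BOM if present.
final class BomSkipper {

    private BomSkipper() {}

    /** Returns a stream positioned after a leading UTF-8 BOM, or at the start if there is none. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No BOM: push back whatever was read so the parser sees the full content
        if (!bom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }
}
```

In the reproduction program above, this would be used as model.read(BomSkipper.skipUtf8Bom(in), "", "TTL"), with main declared to throw IOException.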
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.