Posted to dev@asterixdb.apache.org by Chen Li <ch...@gmail.com> on 2015/07/25 22:51:41 UTC

loading CSV records with comma in the value

Not sure if this topic was discussed before.  I was trying to load an
external CSV file using "," as the delimiter.  But the engine failed to
read a file with the following single record:

14, "John Smith, Mary Reeve"


use dataverse pubs;

create type PaperType as open {
   id: int32,
   authors: string
}

create external dataset Papers(PaperType)
   using localfs
   (("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
   ("format"="delimited-text"),
   ("delimiter"=","));

for $paper in dataset('Papers')
return $paper;

The following is the output, which shows that the comma in the authors
field was incorrectly used to break the field.  Any idea about how to fix
it?

Output
Results:

{ "id": 14, "authors": " \"John Smith" }

Duration of all jobs: 0.091 sec

Success: Query Complete
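
The output above is what one would expect from a parser that splits on
every delimiter without honoring quotes. The following is a minimal
sketch in Python, for illustration only; it is not AsterixDB code, and
the helper name naive_split is hypothetical. It shows that a plain split
on "," yields the same truncated authors value.

```python
# Illustration only: mimics a parser that splits on the delimiter
# without honoring quote characters. Not the AsterixDB implementation.
line = '14, "John Smith, Mary Reeve"'


def naive_split(record, delimiter=","):
    """Split on every delimiter, ignoring quote characters entirely."""
    return record.split(delimiter)


fields = naive_split(line)
print(fields)  # ['14', ' "John Smith', ' Mary Reeve"']
```

The second piece, a space followed by a quote and "John Smith", matches
the "authors" value shown in the result.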

Re: loading CSV records with comma in the value

Posted by Till Westmann <ti...@apache.org>.
I would guess (not having access to the code right now) that we also 
have a quote character in addition to the delimiter character. Maybe 
that needs to be specified?

Also, I think that we should have regression tests for this that could 
serve as an example.

Cheers,
Till


Re: loading CSV records with comma in the value

Posted by Chen Li <ch...@gmail.com>.
Taewoo helped me look into the issue.  To close out this discussion: the
problem was that I was using an old Asterix version.  The current master
branch can parse CSV files properly.

Chen


Re: loading CSV records with comma in the value

Posted by Taewoo Kim <wa...@gmail.com>.
@Chen: the format of your data file is not correct. According to the CSV
RFC, the opening quote must immediately follow the delimiter (,). In your
example, however, there is a white space between them. I saw the following
error message, which complains about the file format. After removing the
white space after the delimiter, it worked fine. So, if you correct the
file format, it should work.

At record: 1, field#: 2 - a quote enclosing a field needs to be placed in
the beginning of that field. [IOException]


[ { "id": 14i32, "authors": "John Smith, Mary Reeve" }
 ]



Best,
Taewoo
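
The effect of the white space can be reproduced outside AsterixDB. The
following is a minimal sketch using Python's standard csv module, chosen
here only for illustration; it is not the AsterixDB parser, and the
helper names parse and rewrite are hypothetical. Python's reader is more
lenient than the behavior described above: instead of raising an error,
it treats a quote that does not start the field as ordinary text. The
rewrite helper shows one possible way to re-emit a record with the quote
directly after the delimiter.

```python
import csv
import io


def parse(line, **options):
    """Parse one CSV line with Python's csv module and return its fields."""
    return next(csv.reader([line], **options))


with_space = '14, "John Smith, Mary Reeve"'     # as in the original file
without_space = '14,"John Smith, Mary Reeve"'   # white space removed

# The quote is not the first character of the field, so it is not
# treated as an enclosing quote and the embedded comma splits the field.
print(parse(with_space))     # ['14', ' "John Smith', ' Mary Reeve"']

# The quote immediately follows the delimiter, so the comma is preserved.
print(parse(without_space))  # ['14', 'John Smith, Mary Reeve']


def rewrite(line):
    """Re-emit a line with quotes placed directly after the delimiter."""
    out = io.StringIO()
    # skipinitialspace=True tolerates the white space on input only.
    row = parse(line, skipinitialspace=True)
    csv.writer(out, lineterminator="\n").writerow(row)
    return out.getvalue().rstrip("\n")


print(rewrite(with_space))   # 14,"John Smith, Mary Reeve"
```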


Re: loading CSV records with comma in the value

Posted by Chen Li <ch...@gmail.com>.
I added the following line

("quote"="\"")

to the load statement, but the problem remains: it mistakenly used the
"," in the "authors" field to break the record.

@Taewoo: can you try the simple AQL example I included in this thread
to see if it can parse the quoted field correctly?

Chen


Re: loading CSV records with comma in the value

Posted by Taewoo Kim <wa...@gmail.com>.
We have test cases for this case. They are located in
asterix-app/src/test/resources/runtimets/queries/load/.  The documentation
is in /asterix-doc/src/site/markdown/csv.md. The additional syntax for
CSV is fairly simple. You just have two additional parameters: "quote" and
"header". Refer to the file for more details.



Best,
Taewoo


Re: loading CSV records with comma in the value

Posted by Chen Li <ch...@gmail.com>.
@Taewoo: I tried it and it has the same problem.  Do you have a test
case for this feature?  Also do we have documentation for this syntax?

Chen


Re: loading CSV records with comma in the value

Posted by Taewoo Kim <wa...@gmail.com>.
The URL is https://asterixdb.ics.uci.edu/documentation/aql/primer.html.


It should look like this:

////
use dataverse pubs;

create type PaperType as open {
   id: int32,
   authors: string
}

create dataset Papers(PaperType) primary key id;

load dataset Papers using localfs
(("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
   ("format"="delimited-text"),
   ("delimiter"=","));

for $paper in dataset('Papers')
return $paper;



Best,
Taewoo


Re: loading CSV records with comma in the value

Posted by Chen Li <ch...@gmail.com>.
@Taewoo: can you send me the syntax or the documentation URL to show the syntax?

Chen


Re: loading CSV records with comma in the value

Posted by Taewoo Kim <wa...@gmail.com>.
Can you try to load it into an internal dataset? I think I implemented
handling for a comma inside a field (as opposed to the comma used as the
delimiter) when modifying the delimited data parser. Chris also modified
that part. If it doesn't work, I can look at the issue.

Best,
Taewoo
