Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2015/12/01 15:04:03 UTC

CSV Reader on 1.3

Hey all,

Per my comment on https://issues.apache.org/jira/browse/DRILL-4145, I am
curious why a CSV query (I am assuming with a default configuration, but
I have asked the question) in S3 would be interpreted differently than a
CSV query in MapRFS.

Per the other user, they are using Drill 1.3, and I am as well (per the
MapR folks, I am using a dev release version from MapR that has the
official 1.3 release code base).

Basically, the query from the JIRA author showed the CSV file being
interpreted, i.e. "FIELD_1", "FIELD_2", etc. were the headers and the
results were broken out into columns. When I ran this on the same data, I
got a single result column, "columns", holding an array of data.

I tried setting extractHeader: true (what is the default for this setting?)
and that had no effect. (After I update a storage plugin, what do I need to
do to ensure I see the effect in my sqlline session? Do I need to
reconnect? Basically, I saved the storage plugin, got the "success" message,
then changed to a different schema and back to my original schema, and saw
no effect... should I reconnect, or is that not needed?)

Just curious why we'd see different ways of reading CSV files; S3 vs.
MapRFS shouldn't be different... or am I missing something?

Thanks!

John
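
For reference, a minimal sketch of the two result shapes described above,
assuming a CSV file at a placeholder path whose first line is
FIELD_1,FIELD_2 (the path and the aliases are illustrative, not taken from
the JIRA). With extractHeader left at false, Drill's text reader returns
each row as a single array column named columns, so fields are addressed by
index and the header line comes back as an ordinary row. With extractHeader
set to true on the csv format, the header names can be used directly:

```sql
-- extractHeader = false: one array-valued column named `columns`.
SELECT columns[0] AS field_1,
       columns[1] AS field_2
FROM dfs.`path/to/data.csv`;

-- extractHeader = true on the csv format: the first line supplies column names.
SELECT FIELD_1, FIELD_2
FROM dfs.`path/to/data.csv`;
```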

Re: CSV Reader on 1.3

Posted by Bridget Bevens <bb...@maprtech.com>.

Thanks for pointing that out, Jason. Oversight on my part. I escaped the
backticks so they appear around the file path:
http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters

The code text in the MD file looks like this:
``select a, b from table(dfs.`path/to/data.csv`(type => 'text',
fieldDelimiter => ',', extractHeader => true))``


Thanks,
Bridget
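
Laid out as a standalone statement, the query from the doc page might look
like the sketch below. The path is a placeholder, a and b are assumed to be
names found in the file's header line, and per this thread the table
function syntax needs Drill 1.4 or later:

```sql
-- Per-query format options passed through a table function.
-- The file path must be wrapped in backticks, as in a regular FROM clause.
SELECT a, b
FROM table(dfs.`path/to/data.csv`(
  type => 'text',
  fieldDelimiter => ',',
  extractHeader => true));
```

The options apply only to this query, so the storage plugin configuration
does not have to change.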


On Fri, Dec 4, 2015 at 5:11 PM, Jason Altekruse <al...@gmail.com>
wrote:

> Thanks for the fix, Bridget. I just took a look at the posted version and
> noticed that the example query is missing some necessary backticks.
>
> Julien had tried to include an escaped backtick in his PR, but it doesn't
> look like this is the right way to include a backtick in this type of text.
>
> The path name to the file should be in backticks, like it is with a regular
> from clause.
>
> Current: table(dfs.path/to/data.csv(type => 'text',
> Correct: table(dfs.`path/to/data.csv`(type => 'text',
>
> I don't know if we frequently use these inline monospace blocks, but we
> might have similar issues elsewhere. As far as getting a backtick into one
> it seems like there are two possible solutions:
>
> http://meta.stackexchange.com/questions/55437/how-can-the-backtick-character-be-included-in-code
>
> Do you know which of these has been used previously? If we aren't using
> these inline blocks elsewhere, we could just replace this with the
> separated code/monospace blocks we normally use.
>
>
>
> On Fri, Dec 4, 2015 at 4:40 PM, Bridget Bevens <bb...@maprtech.com>
> wrote:
>
> > Added that it's available in Drill 1.4 and later:
> >
> >
> http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters
> >
> > Thanks,
> > Bridget
> >
> > On Thu, Dec 3, 2015 at 9:55 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
> >
> > > One note, this feature is in the upcoming Drill 1.4, not 1.3
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Thu, Dec 3, 2015 at 5:11 PM, Julien Le Dem <ju...@dremio.com>
> wrote:
> > >
> > > > Thank you!
> > > >
> > > > On Thu, Dec 3, 2015 at 5:08 PM, Jason Altekruse <
> > > altekrusejason@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks Bridget!
> > > > >
> > > > > > On Thu, Dec 3, 2015 at 4:14 PM, Bridget Bevens <bbevens@maprtech.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I updated the doc page on the Drill site:
> > > > > > >
> > > > > > > http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters
> > > > > >
> > > > > > Thanks,
> > > > > > Bridget
> > > > > >
> > > > > > On Thu, Dec 3, 2015 at 3:53 PM, Julien Le Dem <julien@dremio.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Here's a PR for the doc:
> > > > > > > https://github.com/apache/drill/pull/290
> > > > > > >
> > > > > > > On Thu, Dec 3, 2015 at 3:25 PM, Julien Le Dem <
> julien@dremio.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > I need to update the doc for this. I'll send a PR soon.
> > > > > > > > > In the meantime you can look at the tests:
> > > > > > > > >
> > > > > > > > > https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java
> > > > > > > > > Basically there is one type for each format plugin.
> > > > > > > > > It looks at the classes that implement FormatPluginConfig, just
> > > > > > > > > like for the JSON-based configuration:
> > > > > > > > >
> > > > > > > > > https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/logical/src/main/java/org/apache/drill/common/logical/FormatPluginConfig.java
> > > > > > > > >
> > > > > > > > > For example, for the "text" format:
> > > > > > > > >
> > > > > > > > > https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135
> > > > > > > > >
> > > > > > > > > The type is "text", as defined by the annotation @JsonTypeName("text").
> > > > > > > > > The available parameters are the same fields as in the JSON conf,
> > > > > > > > > with the same defaults:
> > > > > > > > >     public String lineDelimiter = "\n";
> > > > > > > > >     public char fieldDelimiter = '\n';
> > > > > > > > >     public char quote = '"';
> > > > > > > > >     public char escape = '"';
> > > > > > > > >     public char comment = '#';
> > > > > > > > >     public boolean skipFirstLine = false;
> > > > > > > > >     public boolean extractHeader = false;
> > > > > > > >
> > > > > > > > On Thu, Dec 3, 2015 at 1:12 PM, Jason Altekruse <
> > > > > > > altekrusejason@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > >> I don't think we have anything posted right now; it was just merged
> > > > > > > > >> last week.
> > > > > > > > >>
> > > > > > > > >> Julien,
> > > > > > > > >> Did you have something written for a short bit of documentation on
> > > > > > > > >> the functionality and any current limitations?
> > > > > > > >>
> > > > > > > >> - Jason
> > > > > > > >>
> > > > > > > >> On Thu, Dec 3, 2015 at 12:16 PM, Abdel Hakim Deneche <
> > > > > > > >> adeneche@maprtech.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > > >> > I didn't notice select with options is already available!!! Did we
> > > > > > > > >> > add it to the documentation?
> > > > > > > >> >
> > > > > > > >> > On Thu, Dec 3, 2015 at 12:05 PM, Jason Altekruse <
> > > > > > > >> altekrusejason@gmail.com
> > > > > > > >> > > wrote:
> > > > > > > >> >
> > > > > > > > >> >> Yes!
> > > > > > > > >> >>
> > > > > > > > >> >> Thanks to the new feature "select with options", it is possible to
> > > > > > > > >> >> configure the text reader to have query-specific options. You will
> > > > > > > > >> >> need to build the tip of master or use the soon-to-be-posted release
> > > > > > > > >> >> candidate for 1.4 to use the feature.
> > > > > > > > >> >>
> > > > > > > > >> >> select a, b from table(dfs.`path/to/data.csv`(type => 'text',
> > > > > > > > >> >> fieldDelimiter => ',', extractHeader => true))
> > > > > > > >> >>
> > > > > > > >> >> On Thu, Dec 3, 2015 at 11:51 AM, John Omernik <
> > > > john@omernik.com>
> > > > > > > >> wrote:
> > > > > > > >> >>
> > > > > > > > >> >> > I can't reproduce, so I must have done something wrong the first
> > > > > > > > >> >> > time. Thank you for replying.
> > > > > > > > >> >> >
> > > > > > > > >> >> > Is there a way to select from a CSV directory with extract header
> > > > > > > > >> >> > for only that query or table?  (Options?)
> > > > > > > >> >> >
> > > > > > > >> >> >
> > > > > > > >> >> >
> > > > > > > >> >> > On Wed, Dec 2, 2015 at 11:56 AM, Abdel Hakim Deneche <
> > > > > > > >> >> > adeneche@maprtech.com>
> > > > > > > >> >> > wrote:
> > > > > > > >> >> >
> > > > > > > > >> >> > > Hey John,
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > What do you get when you run "select * from sys.version"?
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > extractHeader is false by default, so you need to explicitly set
> > > > > > > > >> >> > > it to true.
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > Can you post your storage plugin configuration?
> > > > > > > >> >> > >
> > > > > > > >> >> > > Thanks
> > > > > > > >> >> > >
> > > > > > > >> >> > > --
> > > > > > > >> >> > >
> > > > > > > >> >> > > Abdelhakim Deneche
> > > > > > > >> >> > >
> > > > > > > >> >> > > Software Engineer
> > > > > > > >> >> > >
> > > > > > > >> >> > >   <http://www.mapr.com/>
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> > >
> > > > > > > >> >> >
> > > > > > > >> >>
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > --
> > > > > > > >> >
> > > > > > > >> > Abdelhakim Deneche
> > > > > > > >> >
> > > > > > > >> > Software Engineer
> > > > > > > >> >
> > > > > > > >> >   <http://www.mapr.com/>
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Julien
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Julien
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Julien
> > > >
> > >
> >
>

Re: CSV Reader on 1.3

Posted by Jason Altekruse <al...@gmail.com>.

Thanks for the fix, Bridget. I just took a look at the posted version and
noticed that the example query is missing some necessary backticks.

Julien had tried to include an escaped backtick in his PR, but it doesn't
look like this is the right way to include a backtick in this type of text.

The path name to the file should be in backticks, like it is with a regular
from clause.

Current: table(dfs.path/to/data.csv(type => 'text',
Correct: table(dfs.`path/to/data.csv`(type => 'text',

I don't know if we frequently use these inline monospace blocks, but we
might have similar issues elsewhere. As for getting a backtick into one,
it seems like there are two possible solutions:
http://meta.stackexchange.com/questions/55437/how-can-the-backtick-character-be-included-in-code

Do you know which of these has been used previously? If we aren't using
these inline blocks elsewhere, we could just replace this with the
separated code/monospace blocks we normally use.




Re: CSV Reader on 1.3

Posted by Bridget Bevens <bb...@maprtech.com>.

Added that it's available in Drill 1.4 and later:
http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters

Thanks,
Bridget

On Thu, Dec 3, 2015 at 9:55 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> One note, this feature is in the upcoming Drill 1.4, not 1.3
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Dec 3, 2015 at 5:11 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
> > Thank you!
> >
> > On Thu, Dec 3, 2015 at 5:08 PM, Jason Altekruse <
> altekrusejason@gmail.com>
> > wrote:
> >
> > > Thanks Bridget!
> > >
> > > On Thu, Dec 3, 2015 at 4:14 PM, Bridget Bevens <bb...@maprtech.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I updated the doc page on the Drill site:
> > > >
> > > >
> > >
> >
> http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters
> > > >
> > > > Thanks,
> > > > Bridget
> > > >
> > > > On Thu, Dec 3, 2015 at 3:53 PM, Julien Le Dem <ju...@dremio.com>
> > wrote:
> > > >
> > > > > Here's a PR for the doc:
> > > > > https://github.com/apache/drill/pull/290
> > > > >
> > > > > On Thu, Dec 3, 2015 at 3:25 PM, Julien Le Dem <ju...@dremio.com>
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > > > I need to update the doc for this. I'll send a PR soon.
> > > > > > In the meantime you can look at the tests:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java
> > > > > > > Basically there is one type for each format plugin.
> > > > > > > It looks at the classes that implement FormatPluginConfig, just like
> > > > > > > for the JSON-based configuration:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/logical/src/main/java/org/apache/drill/common/logical/FormatPluginConfig.java
> > > > > >
> > > > > > For example for the "text" format:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135
> > > > > >
> > > > > > the type is "text" as defined by the annotation:@JsonTypeName
> > > ("text")
> > > > > > the available parameters are the same fields as in the json conf
> > with
> > > > the
> > > > > > same defaults:
> > > > > >     public String lineDelimiter = "\n";
> > > > > >     public char fieldDelimiter = '\n';
> > > > > >     public char quote = '"';
> > > > > >     public char escape = '"';
> > > > > >     public char comment = '#';
> > > > > >     public boolean skipFirstLine = false;
> > > > > >     public boolean extractHeader = false;
> > > > > >
> > > > > > On Thu, Dec 3, 2015 at 1:12 PM, Jason Altekruse <
> > > > > altekrusejason@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> I don't think we have anything posted right now, it was just
> > merged
> > > > last
> > > > > >> week.
> > > > > >>
> > > > > >> Julien,
> > > > > >> Did you have something written for a short bit of documentation
> on
> > > the
> > > > > >> functionality and any current limitations?
> > > > > >>
> > > > > >> - Jason
> > > > > >>
> > > > > >> On Thu, Dec 3, 2015 at 12:16 PM, Abdel Hakim Deneche <
> > > > > >> adeneche@maprtech.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > I didn't notice select with options is already available !!!
> did
> > > we
> > > > > add
> > > > > >> it
> > > > > >> > to the documentation ?
> > > > > >> >
> > > > > >> > On Thu, Dec 3, 2015 at 12:05 PM, Jason Altekruse <
> > > > > >> altekrusejason@gmail.com
> > > > > >> > > wrote:
> > > > > >> >
> > > > > >> >> Yes!
> > > > > >> >>
> > > > > >> >> Thanks to the new feature "select with options" it is
> possible
> > to
> > > > > >> >> configure
> > > > > >> >> the text reader to have query specific options. You will need
> > to
> > > > > build
> > > > > >> the
> > > > > >> >> tip of master or use the soon to be posted release candidate
> > for
> > > > 1.4
> > > > > to
> > > > > >> >> use
> > > > > >> >> the feature.
> > > > > >> >>
> > > > > >> >> select a, b from table(dfs.`path/to/data.csv`(type => 'text',
> > > > > >> >> fieldDelimiter => ',', extractHeader => true))
> > > > > >> >>
> > > > > >> >> On Thu, Dec 3, 2015 at 11:51 AM, John Omernik <
> > john@omernik.com>
> > > > > >> wrote:
> > > > > >> >>
> > > > > >> >> > I can't reproduce, so I must have done something wrong the
> > > first
> > > > > >> time,
> > > > > >> >> > thank you for replying.
> > > > > >> >> >
> > > > > > >> >> > Is there a way to select from a CSV directory with extract header
> > > > > > >> >> > for only that query or table?  (Options?)
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Wed, Dec 2, 2015 at 11:56 AM, Abdel Hakim Deneche <
> > > > > >> >> > adeneche@maprtech.com>
> > > > > >> >> > wrote:
> > > > > >> >> >
> > > > > >> >> > > Hey John,
> > > > > >> >> > >
> > > > > >> >> > > What do you get when you run "select * from sys.version"
> ?
> > > > > >> >> > >
> > > > > >> >> > > extractHeader is false by default, so you need to
> > explicitly
> > > > set
> > > > > >> it to
> > > > > >> >> > > true.
> > > > > >> >> > >
> > > > > >> >> > > can you post your storage plugin configuration ?
> > > > > >> >> > >
> > > > > >> >> > > Thanks
> > > > > >> >> > >
> > > > > >> >> > > On Tue, Dec 1, 2015 at 6:04 AM, John Omernik <
> > > john@omernik.com
> > > > >
> > > > > >> >> wrote:
> > > > > >> >> > >
> > > > > >> >> > > > Hey all,
> > > > > >> >> > > >
> > > > > >> >> > > > Per my comment on
> > > > > >> https://issues.apache.org/jira/browse/DRILL-4145,
> > > > > >> >> I
> > > > > >> >> > > am
> > > > > >> >> > > > curious on why a CSV query (I am assuming with a
> default
> > > > > >> >> configuration,
> > > > > >> >> > > but
> > > > > >> >> > > > I have asked the question) in S3 would interpret
> > > differently
> > > > > >> than a
> > > > > >> >> CSV
> > > > > >> >> > > > query in MaprFS.
> > > > > >> >> > > >
> > > > > >> >> > > > Per the other user, they are using Drill 1.3, and I am
> as
> > > > well
> > > > > >> (per
> > > > > >> >> the
> > > > > >> >> > > > MapR folks, I am using a Dev release version from MapR
> > that
> > > > has
> > > > > >> the
> > > > > >> >> > > Office
> > > > > >> >> > > > 1.3 release code base)
> > > > > >> >> > > >
> > > > > >> >> > > > Basically, The query from the JIRA author showed the
> CSV
> > > file
> > > > > >> being
> > > > > >> >> > > > interpreted, i.e. the "FIELD_1", "FIELD_2" etc were the
> > > > headers
> > > > > >> and
> > > > > >> >> the
> > > > > >> >> > > > results broken out into columns. When I did this on the
> > > same
> > > > > >> data, I
> > > > > >> >> > got
> > > > > >> >> > > > one results, columns and an array of data.
> > > > > >> >> > > >
> > > > > >> >> > > > I tried setting extractHeader: true (what is the
> default
> > on
> > > > > this
> > > > > >> >> > setting)
> > > > > >> >> > > > and that had no effect. (After I update a storage
> plugin,
> > > > what
> > > > > >> do I
> > > > > >> >> > need
> > > > > >> >> > > to
> > > > > >> >> > > > do to ensure I see the effect in my SQL line session?
> DO
> > I
> > > > need
> > > > > >> to
> > > > > >> >> > > > reconnect?  Basically I set the storage plugin, got the
> > > > > "success"
> > > > > >> >> then
> > > > > >> >> > > > changed to a difference schema and then back to my
> > original
> > > > > >> schema
> > > > > >> >> and
> > > > > >> >> > > saw
> > > > > >> >> > > > no effect... should I reconnect or is that not needed?)
> > > > > >> >> > > >
> > > > > >> >> > > > Just curious on why we'd see different ways to read CSV
> > > > files,
> > > > > >> the
> > > > > >> >> S3
> > > > > >> >> > vs.
> > > > > >> >> > > > MapRFS shouldn't be different... or am I missing
> > something?
> > > > > >> >> > > >
> > > > > >> >> > > > Thanks!
> > > > > >> >> > > >
> > > > > >> >> > > > John
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > --
> > > > > >> >> > >
> > > > > >> >> > > Abdelhakim Deneche
> > > > > >> >> > >
> > > > > >> >> > > Software Engineer
> > > > > >> >> > >
> > > > > >> >> > >   <http://www.mapr.com/>
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > Now Available - Free Hadoop On-Demand Training
> > > > > >> >> > > <
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > --
> > > > > >> >
> > > > > >> > Abdelhakim Deneche
> > > > > >> >
> > > > > >> > Software Engineer
> > > > > >> >
> > > > > >> >   <http://www.mapr.com/>
> > > > > >> >
> > > > > >> >
> > > > > >> > Now Available - Free Hadoop On-Demand Training
> > > > > >> > <
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Julien
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Julien
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Julien
> >
>

Re: CSV Reader on 1.3

Posted by Jacques Nadeau <ja...@dremio.com>.

One note: this feature is in the upcoming Drill 1.4, not 1.3.

--
Jacques Nadeau
CTO and Co-Founder, Dremio
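
Since the per-query options arrive in 1.4 rather than 1.3, a client that talks to mixed clusters might want to guard on the version before sending such a query. The following is only a minimal sketch under assumptions: `DrillVersionCheck` and `supportsSelectWithOptions` are hypothetical names, not part of Drill, and the version string is assumed to have already been obtained by some other means (for example from the build in use). A pre-release 1.4 build cut before the feature was merged would still pass this check.

```java
/**
 * Hypothetical helper (not part of Drill): decides from a version string
 * whether the "select with options" table function should be available.
 */
class DrillVersionCheck {

    /** Returns true when the version is 1.4 or later, per the note in this thread. */
    static boolean supportsSelectWithOptions(String version) {
        // Drop any qualifier such as "-SNAPSHOT" before splitting on dots.
        String numeric = version.split("-", 2)[0];
        String[] parts = numeric.split("\\.");
        int major = Integer.parseInt(parts[0]);
        // A missing minor component is treated as 0.
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > 1 || (major == 1 && minor >= 4);
    }

    public static void main(String[] args) {
        System.out.println(supportsSelectWithOptions("1.3.0"));
        System.out.println(supportsSelectWithOptions("1.4.0-SNAPSHOT"));
    }
}
```

On a version that fails the check, the fallback discussed in this thread is to set extractHeader in the storage plugin's format configuration instead.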

On Thu, Dec 3, 2015 at 5:11 PM, Julien Le Dem <ju...@dremio.com> wrote:

> Thank you!

Re: CSV Reader on 1.3

Posted by Julien Le Dem <ju...@dremio.com>.

Thank you!

On Thu, Dec 3, 2015 at 5:08 PM, Jason Altekruse <al...@gmail.com>
wrote:

> Thanks Bridget!



-- 
Julien

Re: CSV Reader on 1.3

Posted by Jason Altekruse <al...@gmail.com>.

Thanks Bridget!

On Thu, Dec 3, 2015 at 4:14 PM, Bridget Bevens <bb...@maprtech.com> wrote:

> Hi,
>
> I updated the doc page on the Drill site:
>
> http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters
>
> Thanks,
> Bridget

Re: CSV Reader on 1.3

Posted by Bridget Bevens <bb...@maprtech.com>.

Hi,

I updated the doc page on the Drill site:
http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters

Thanks,
Bridget
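
For readers following along, the query form that doc section covers is the one posted earlier in this thread: format attributes such as fieldDelimiter and extractHeader are passed as named parameters to a table function. Below is a minimal Java sketch of how a client might assemble such a query string. `TableFunctionQuery`, `build`, and `literal` are hypothetical names introduced for illustration, not Drill APIs, and the quoting rule (booleans bare, everything else single-quoted with embedded quotes doubled) is an assumption based on standard SQL literals. Backticks inside the path are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of Drill) that assembles a
 * "select with options" query string for a format plugin.
 */
class TableFunctionQuery {

    /** Renders an option value as a SQL literal: booleans bare, everything else single-quoted. */
    static String literal(Object value) {
        if (value instanceof Boolean) {
            return value.toString();
        }
        // Double any embedded single quote so the literal stays well-formed (assumed standard SQL escaping).
        return "'" + value.toString().replace("'", "''") + "'";
    }

    /** Builds: select COLUMNS from table(SCHEMA.`PATH`(type => 'TYPE', name => value, ...)) */
    static String build(String columns, String schema, String path,
                        String type, Map<String, Object> options) {
        StringJoiner params = new StringJoiner(", ");
        params.add("type => " + literal(type));
        // Options are emitted in the map's iteration order.
        for (Map.Entry<String, Object> option : options.entrySet()) {
            params.add(option.getKey() + " => " + literal(option.getValue()));
        }
        return "select " + columns + " from table(" + schema + ".`" + path + "`(" + params + "))";
    }

    public static void main(String[] args) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("fieldDelimiter", ",");
        options.put("extractHeader", true);
        System.out.println(build("a, b", "dfs", "path/to/data.csv", "text", options));
    }
}
```

With the options shown in main, the helper reproduces the example query from this thread. A LinkedHashMap is used so the parameters come out in insertion order.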

On Thu, Dec 3, 2015 at 3:53 PM, Julien Le Dem <ju...@dremio.com> wrote:

> Here's a PR for the doc:
> https://github.com/apache/drill/pull/290
>
> On Thu, Dec 3, 2015 at 3:25 PM, Julien Le Dem <ju...@dremio.com> wrote:
>
> > Hi,
> > I need to update the doc for this. I'll send a PR soon.
> > In the meantime you can look at the tests:
> > https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java
> > Basically there is one type for each Format plugin.
> > It looks at the classes that implement FormatPluginConfig, just like for the
> > JSON-based configuration:
> >
> > https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/logical/src/main/java/org/apache/drill/common/logical/FormatPluginConfig.java
> >
> > For example, for the "text" format:
> >
> > https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135
> >
> > The type is "text", as defined by the annotation @JsonTypeName("text").
> > The available parameters are the same fields as in the JSON conf, with the
> > same defaults:
> >     public String lineDelimiter = "\n";
> >     public char fieldDelimiter = '\n';
> >     public char quote = '"';
> >     public char escape = '"';
> >     public char comment = '#';
> >     public boolean skipFirstLine = false;
> >     public boolean extractHeader = false;
> >
> > On Thu, Dec 3, 2015 at 1:12 PM, Jason Altekruse <altekrusejason@gmail.com>
> > wrote:
> >
> >> I don't think we have anything posted right now, it was just merged last
> >> week.
> >>
> >> Julien,
> >> Did you have something written for a short bit of documentation on the
> >> functionality and any current limitations?
> >>
> >> - Jason
> >>
> >> On Thu, Dec 3, 2015 at 12:16 PM, Abdel Hakim Deneche <
> >> adeneche@maprtech.com>
> >> wrote:
> >>
> >> > I didn't notice select with options is already available !!! did we add it
> >> > to the documentation ?
> >> >
> >> > On Thu, Dec 3, 2015 at 12:05 PM, Jason Altekruse <altekrusejason@gmail.com>
> >> > wrote:
> >> >
> >> >> Yes!
> >> >>
> >> >> Thanks to the new feature "select with options" it is possible to
> >> >> configure the text reader to have query specific options. You will need to
> >> >> build the tip of master or use the soon to be posted release candidate for
> >> >> 1.4 to use the feature.
> >> >>
> >> >> select a, b from table(dfs.`path/to/data.csv`(type => 'text',
> >> >> fieldDelimiter => ',', extractHeader => true))
> >> >>
> >> >> On Thu, Dec 3, 2015 at 11:51 AM, John Omernik <jo...@omernik.com>
> >> >> wrote:
> >> >>
> >> >> > I can't reproduce, so I must have done something wrong the first
> >> >> > time, thank you for replying.
> >> >> >
> >> >> > Is there a way to select from a csv directory with extract header for
> >> >> > only that query or table?  (Options?)
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Dec 2, 2015 at 11:56 AM, Abdel Hakim Deneche <
> >> >> > adeneche@maprtech.com>
> >> >> > wrote:
> >> >> >
> >> >> > > Hey John,
> >> >> > >
> >> >> > > What do you get when you run "select * from sys.version" ?
> >> >> > >
> >> >> > > extractHeader is false by default, so you need to explicitly set it to
> >> >> > > true.
> >> >> > >
> >> >> > > can you post your storage plugin configuration ?
> >> >> > >
> >> >> > > Thanks

Re: CSV Reader on 1.3

Posted by Julien Le Dem <ju...@dremio.com>.

Here's a PR for the doc:
https://github.com/apache/drill/pull/290

-- 
Julien

Re: CSV Reader on 1.3

Posted by Julien Le Dem <ju...@dremio.com>.

Hi,
I need to update the doc for this. I'll send a PR soon.
In the meantime you can look at the tests:
https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java
Basically there is one type for each format plugin.
It looks at the classes that implement FormatPluginConfig, just like the
JSON-based configuration does:
https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/logical/src/main/java/org/apache/drill/common/logical/FormatPluginConfig.java

For example for the "text" format:
https://github.com/apache/drill/blob/d855906b95d4182a93af936c4e16888a770039b5/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L135

The type is "text", as defined by the annotation @JsonTypeName("text").
The available parameters are the same fields as in the JSON conf, with the
same defaults:
    public String lineDelimiter = "\n";
    public char fieldDelimiter = '\n';
    public char quote = '"';
    public char escape = '"';
    public char comment = '#';
    public boolean skipFirstLine = false;
    public boolean extractHeader = false;
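
A sketch of how those fields map onto a query, assuming a pipe-delimited
file at a made-up path (data.psv is hypothetical and not taken from the
tests). Only names from the list above are used as parameters. Since the
fieldDelimiter default listed above is '\n', a delimited file needs it set
explicitly:

```sql
-- Hypothetical pipe-delimited file with one title line to skip.
-- type selects the format plugin; the other parameters are fields
-- from the list above. extractHeader is left at its default (false),
-- so the data is read through the positional columns array.
select columns[0] as first_field, columns[1] as second_field
from table(dfs.`path/to/data.psv`(type => 'text',
    fieldDelimiter => '|', skipFirstLine => true));
```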


-- 
Julien

Re: CSV Reader on 1.3

Posted by Jason Altekruse <al...@gmail.com>.

I don't think we have anything posted right now; it was just merged last
week.

Julien,
Did you have something written for a short bit of documentation on the
functionality and any current limitations?

- Jason


Re: CSV Reader on 1.3

Posted by Jason Altekruse <al...@gmail.com>.

Yes!

Thanks to the new "select with options" feature, it is possible to configure
the text reader with query-specific options. You will need to build the
tip of master or use the soon-to-be-posted release candidate for 1.4 to use
the feature.

select a, b from table(dfs.`path/to/data.csv`(type => 'text',
fieldDelimiter => ',', extractHeader => true))
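
As a rough illustration of the difference (a sketch, not output from a real
cluster: the path is the placeholder used above, the header names a and b
are assumed, and the first query assumes the storage plugin's csv format
leaves extractHeader at its default), the same file can be read both ways:

```sql
-- Default text reader settings: every row is returned in one array
-- column named "columns", addressed by position.
select columns[0], columns[1]
from dfs.`path/to/data.csv`;

-- Per-query override: extractHeader => true treats the first line as
-- column names (a and b are assumed header values for this sketch).
select a, b
from table(dfs.`path/to/data.csv`(type => 'text',
    fieldDelimiter => ',', extractHeader => true));
```

The override applies only to the query that carries it, so the storage
plugin configuration does not have to change.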


Re: CSV Reader on 1.3

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.

I don't think you can for now. There is an ongoing effort to add select
with options, but I don't know the current progress on that.

-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Re: CSV Reader on 1.3

Posted by John Omernik <jo...@omernik.com>.

I can't reproduce it, so I must have done something wrong the first time.
Thank you for replying.

Is there a way to select from a CSV directory with extractHeader enabled for
only that query or table?  (Options?)




Re: CSV Reader on 1.3

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.

Hey John,

What do you get when you run "select * from sys.version"?

extractHeader is false by default, so you need to explicitly set it to true.

Can you post your storage plugin configuration?

Thanks



-- 

Abdelhakim Deneche

Software Engineer
