Posted to user@drill.apache.org by Scott Kinney <sc...@stem.com> on 2016/07/01 16:22:55 UTC

Re: gzipped json files not named .json.gz

Hi Jason, 
Thanks for getting back to me. We were able to get the Spark job to append .json.gz to the file names, so we are OK for now.
I tried working with local JSON files. Drill will not query a file if it's not named .json. I didn't try gzipped files. But since we got them renamed in S3, I'm out of the woods.

thanks!


________________________________
Scott Kinney | DevOps
stem   |   m  510.282.1299
100 Rollins Road, Millbrae, California 94030

This e-mail and/or any attachments contain Stem, Inc. confidential and proprietary information and material for the sole use of the intended recipient(s). Any review, use or distribution that has not been expressly authorized by Stem, Inc. is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. Thank you.

________________________________________
From: Jason Altekruse <ja...@dremio.com>
Sent: Tuesday, June 28, 2016 3:05 PM
To: user
Subject: Re: gzipped json files not named .json.gz

Hi Scott,

From some quick testing, setting the defaultInputFormat to "json" appears
to be working as it was designed. It is true that we have the limitation of
relying entirely on extensions for detecting compression of text and json
files.
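
To make the extension-only detection concrete, here is a rough Python sketch of
the idea. It is not Drill's or Hadoop's actual code: the function name
infer_codec and the suffix table are assumptions made up for illustration, and
the real set of suffixes depends on which codecs are configured.

```python
import os

# Hypothetical suffix table for illustration only; the real list depends on
# the codecs configured in Hadoop.
SUFFIX_TO_CODEC = {".gz": "gzip", ".bz2": "bzip2"}


def infer_codec(path):
    """Return a codec name based only on the file name suffix, or None.

    The file contents are never inspected, which is why a gzipped file
    without a recognized suffix is treated as uncompressed.
    """
    name = os.path.basename(path)
    for suffix, codec in SUFFIX_TO_CODEC.items():
        if name.endswith(suffix):
            return codec
    return None
```

With a lookup like this, a gzipped file named "part-0000" gets no codec and is
read as raw bytes, regardless of the default input format.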

I am able to read all of these files in a workspace with JSON set as the
default format. Were you not seeing this behavior?

a        a.gz        a.json        a.json.gz

We could consider adding default compression as an option in a workspace,
but are you really unable to move the files? It seems like the best option
might be to just rename, as I would think other tools would have trouble
reading these as well.
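
If renaming is an option, a minimal sketch along these lines could do it for
files on a local file system. The helper names (is_gzipped, add_gz_suffix) are
hypothetical, and the check assumes the files start with the standard gzip
magic bytes (0x1f 0x8b). Files in S3 would need a copy through an S3 client
instead, which is not shown here.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Check the file contents (not the name) for the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def add_gz_suffix(directory, suffix=".json.gz"):
    """Append `suffix` to gzipped files in `directory` that lack a .gz name.

    Returns the list of new paths. Subdirectories are not visited.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        src = os.path.join(directory, name)
        if not os.path.isfile(src) or name.endswith(".gz"):
            continue
        if is_gzipped(src):
            dst = src + suffix
            os.rename(src, dst)
            renamed.append(dst)
    return renamed
```

Checking the contents rather than the name avoids tagging plain JSON files as
compressed by mistake.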

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Tue, Jun 28, 2016 at 2:48 PM, Parth Chandra <pc...@maprtech.com>
wrote:

> Yes, I believe that would work if the file is not compressed.
>
> On Tue, Jun 28, 2016 at 12:01 PM, Scott Kinney <sc...@stem.com>
> wrote:
>
> > Well, that's a bummer, but I believe it. Setting "defaultInputFormat": "json"
> > doesn't seem to have any effect.
> >
> > ________________________________________
> > From: Parth Chandra <pc...@maprtech.com>
> > Sent: Tuesday, June 28, 2016 11:36 AM
> > To: user@drill.apache.org
> > Subject: Re: gzipped json files not named .json.gz
> >
> > Hi Scott,
> >
> >   Unlikely that this will work without the extension. Drill uses Hadoop's
> > CompressionCodecFactory class [1] that infers the compression type from the
> > extension.
> >
> > Parth
> >
> > [1]
> > https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/io/compress/CompressionCodecFactory.html#getCodec(org.apache.hadoop.fs.Path)
> >
> > On Tue, Jun 28, 2016 at 8:47 AM, Scott Kinney <sc...@stem.com>
> > wrote:
> >
> > > Can I have Drill open gzipped JSON files whose names do not end in
> > > .json.gz?
> > >
> > > We have a Spark job generating these files and it just doesn't want to
> > > change the name or append the .json.gz.
> > >
> > > ?
> > >
> >
>