Posted to user@spark.apache.org by Liwen Sun <li...@databricks.com> on 2019/06/19 19:03:57 UTC

Announcing Delta Lake 0.2.0

We are delighted to announce the availability of Delta Lake 0.2.0!

To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
https://docs.delta.io/0.2.0/quick-start.html

To view the release notes:
https://github.com/delta-io/delta/releases/tag/v0.2.0

This release introduces two main features:

*Cloud storage support*
In addition to HDFS, you can now configure Delta Lake to read and write
data on cloud storage services such as Amazon S3 and Azure Blob Storage.
For configuration instructions, please see:
https://docs.delta.io/0.2.0/delta-storage.html

*Improved concurrency*
Delta Lake now allows concurrent append-only writes while still ensuring
serializability. For concurrency control in Delta Lake, please see:
https://docs.delta.io/0.2.0/delta-concurrency.html

We have also greatly expanded the test coverage as part of this release.

We would like to acknowledge all community members for contributing to this
release.

Best regards,
Liwen Sun

Re: Announcing Delta Lake 0.2.0

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Liwen,

Thanks a ton. I think there is a difference between a storage class and a
metastore, just as there is a difference between a database and a file
system, or between coffee and a cup.

It will be wonderful to keep the focus on the fantastic opportunity that
Delta creates for us :)

Regards,
Gourav Sengupta

On Fri, Jun 21, 2019 at 2:05 AM Liwen Sun <li...@databricks.com> wrote:

> Hi James,
>
> Right now we don't have plans for having a catalog component as part of
> Delta Lake, but we are looking to support Hive metastore and also DDL
> commands in the near future.
>
> Thanks,
> Liwen
>
> --
> You received this message because you are subscribed to the Google Groups
> "Delta Lake Users and Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to delta-users+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/delta-users/CAE4dWq-4rC8n5OXuB7NRfDhY4ZLwC8w20cLf7wbktvLKWotHow%40mail.gmail.com
> <https://groups.google.com/d/msgid/delta-users/CAE4dWq-4rC8n5OXuB7NRfDhY4ZLwC8w20cLf7wbktvLKWotHow%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>

Re: Announcing Delta Lake 0.2.0

Posted by Michael Armbrust <mi...@databricks.com>.
>
> Thanks for confirmation. We are using the workaround to create a separate
> Hive external table STORED AS PARQUET with the exact location of Delta
> table. Our use case is batch-driven and we are running VACUUM with 0
> retention after every batch is completed. Do you see any potential problem
> with this workaround, other than during the time when the batch is running
> the table can provide some wrong information?
>

This is a reasonable workaround to allow other systems to read Delta
tables. Another consideration is that if you are running on S3, eventual
consistency may increase the amount of time before external readers see a
consistent view. Also note that this prevents you from using time travel.
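
For readers who want to try the workaround discussed above, here is a minimal
Python sketch that only builds the Hive DDL string for an external Parquet
table over a Delta table's directory. The table name, columns, and location
are hypothetical placeholders, partitioned tables are not handled, and the
statement would still have to be run through Hive itself. None of this is
part of Delta Lake.

```python
def hive_external_table_ddl(table, columns, location):
    """Build Hive DDL for an external Parquet table over an existing directory.

    table    -- hypothetical table name, e.g. "analytics.events_parquet"
    columns  -- list of (name, hive_type) pairs matching the Delta table schema
    location -- directory that holds the Delta table's Parquet files
    """
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical example values, for illustration only.
ddl = hive_external_table_ddl(
    "analytics.events_parquet",
    [("event_id", "bigint"), ("payload", "string")],
    "wasb://container@account.blob.core.windows.net/delta/events",
)
print(ddl)
```

Hive will read every Parquet file it finds under that location, including
files that the Delta log no longer considers part of the table, which is why
the workaround in this thread is paired with VACUUM.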

In the near future, I think we should also support generating manifest
files that list the data files in the most recent version of the Delta
table (see #76 <https://github.com/delta-io/delta/issues/76> for details).
This will give support for Presto, though Hive would require some
additional modifications on the Hive side (if there are any Hive
contributors / committers on this list let me know!).
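
To make the manifest idea a bit more concrete: each commit in a Delta table's
`_delta_log` directory is a JSON file with one action per line, and `add` /
`remove` actions carry the `path` of a data file. The Python sketch below
replays such actions in commit order to list the files in the latest version.
It is only an illustration under simplifying assumptions (commits are passed
in as in-memory lines, checkpoint files are ignored, and the file names are
made up); it is not how Delta Lake or issue #76 implements manifests.

```python
import json


def live_files(log_lines):
    """Replay Delta log actions in commit order and return the data files
    that belong to the most recent version of the table."""
    files = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
        # Other actions (commitInfo, metaData, protocol) do not change the file list.
    return sorted(files)


# Two hypothetical commits: the second one rewrites part-0000 into part-0002.
commit_0 = [
    '{"add": {"path": "part-0000.parquet"}}',
    '{"add": {"path": "part-0001.parquet"}}',
]
commit_1 = [
    '{"commitInfo": {"operation": "WRITE"}}',
    '{"remove": {"path": "part-0000.parquet"}}',
    '{"add": {"path": "part-0002.parquet"}}',
]

manifest = live_files(commit_0 + commit_1)
print(manifest)
```

A reader that simply scans the directory would also pick up
`part-0000.parquet` until it is physically deleted; a manifest built this way
lists only the files that are still part of the table.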

In the longer term, we are talking with authors of other engines to build
native support for reading the Delta transaction log (e.g. this
announcement from Starburst
<https://www.starburstdata.com/technical-blog/starburst-presto-databricks-delta-lake-support/>).
Please contact me if you are interested in contributing here!

Re: Announcing Delta Lake 0.2.0

Posted by ayan guha <gu...@gmail.com>.
Hi

Thanks for the confirmation. As a workaround, we create a separate Hive
external table STORED AS PARQUET at the exact location of the Delta table.
Our use case is batch-driven, and we run VACUUM with 0 retention after every
batch completes. Do you see any potential problem with this workaround,
other than that the table can return wrong results while a batch is
running?

Best
Ayan

On Fri, Jun 21, 2019 at 8:03 PM Tathagata Das <ta...@gmail.com>
wrote:

> @ayan guha <gu...@gmail.com> @Gourav Sengupta
> <go...@gmail.com>
> Delta Lake OSS currently does not support defining tables in Hive
> metastore using DDL commands. We are hoping to add the necessary
> compatibility fixes in Apache Spark to make Delta Lake work with tables and
> DDL commands. So we will support them in a future release. In the meantime,
> please read/write Delta tables using paths.
>
> TD
>

-- 
Best Regards,
Ayan Guha

Re: Announcing Delta Lake 0.2.0

Posted by Tathagata Das <ta...@gmail.com>.
@ayan guha <gu...@gmail.com> @Gourav Sengupta
<go...@gmail.com>
Delta Lake OSS currently does not support defining tables in Hive
metastore using DDL commands. We are hoping to add the necessary
compatibility fixes in Apache Spark to make Delta Lake work with tables and
DDL commands. So we will support them in a future release. In the meantime,
please read/write Delta tables using paths.
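
As a rough illustration of path-based access, the Python sketch below wraps
the standard Spark DataFrame reader and writer with the "delta" format. The
helper names and the `/tmp/delta/events` path are hypothetical, and the usage
lines assume a PySpark session that has the Delta Lake package available.

```python
def write_delta(df, path, mode="append"):
    """Write a Spark DataFrame to the Delta table identified by `path`."""
    df.write.format("delta").mode(mode).save(path)


def read_delta(spark, path):
    """Load the Delta table stored at `path` as a Spark DataFrame."""
    return spark.read.format("delta").load(path)


# Usage sketch (requires PySpark with the Delta Lake package available):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   write_delta(spark.range(5), "/tmp/delta/events", mode="overwrite")
#   read_delta(spark, "/tmp/delta/events").show()
```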

TD

On Fri, Jun 21, 2019 at 12:49 AM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi Ayan,
>
> I may be wrong about this, but I think that Delta files are in Parquet
> format. But I am sure that you have already checked this. Am I missing
> something?
>
> Regards,
> Gourav Sengupta
>

Re: Announcing Delta Lake 0.2.0

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Ayan,

I may be wrong about this, but I think that Delta files are in Parquet
format. But I am sure that you have already checked this. Am I missing
something?

Regards,
Gourav Sengupta

On Fri, Jun 21, 2019 at 6:39 AM ayan guha <gu...@gmail.com> wrote:

> Hi
> We used spark.sql to create a table using DELTA. We also have a Hive
> metastore attached to the Spark session, so a table gets created in the
> Hive metastore. We then tried to query the table from Hive and faced the
> following issues:
>
>    1. SERDE is SequenceFile, should have been Parquet
>    2. Schema fields are not passed.
>
> Essentially the hive DDL looks like:
>
> CREATE TABLE `TABLE NAME`(
>   `col` array<string> COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES (
>   'path'='WASB PATH')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
> LOCATION
>   'WASB PATH'
> TBLPROPERTIES (
>   'spark.sql.create.version'='2.4.0',
>   'spark.sql.sources.provider'='DELTA',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
>   'transient_lastDdlTime'='1556544657')
>
> Is this expected? And will the use case be supported in future releases?
>
>
> We are now experimenting
>
> Best
>
> Ayan
>
> --
> Best Regards,
> Ayan Guha
>

Re: Announcing Delta Lake 0.2.0

Posted by ayan guha <gu...@gmail.com>.
Hi
We used spark.sql to create a table using DELTA. We also have a Hive
metastore attached to the Spark session, so a table gets created in the
Hive metastore. We then tried to query the table from Hive and faced the
following issues:

   1. SERDE is SequenceFile, should have been Parquet
   2. Schema fields are not passed.

Essentially the hive DDL looks like:

CREATE TABLE `TABLE NAME`(
  `col` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'path'='WASB PATH')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'WASB PATH'
TBLPROPERTIES (
  'spark.sql.create.version'='2.4.0',
  'spark.sql.sources.provider'='DELTA',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[]}',
  'transient_lastDdlTime'='1556544657')

Is this expected? And will the use case be supported in future releases?


We are now experimenting

Best

Ayan

On Fri, Jun 21, 2019 at 11:06 AM Liwen Sun <li...@databricks.com> wrote:

> Hi James,
>
> Right now we don't have plans for having a catalog component as part of
> Delta Lake, but we are looking to support Hive metastore and also DDL
> commands in the near future.
>
> Thanks,
> Liwen
>

-- 
Best Regards,
Ayan Guha

Re: Announcing Delta Lake 0.2.0

Posted by Liwen Sun <li...@databricks.com>.
Hi James,

Right now we don't have plans for having a catalog component as part of
Delta Lake, but we are looking to support Hive metastore and also DDL
commands in the near future.

Thanks,
Liwen

On Thu, Jun 20, 2019 at 4:46 AM James Cotrotsios <ja...@gmail.com>
wrote:

> Is there a plan to have a business catalog component for the Data Lake? If
> not how would someone make a proposal to create an open source project
> related to that. I would be interested in building out an open source data
> catalog that would use the Hive metadata store as a baseline for technical
> metadata.

Re: Announcing Delta Lake 0.2.0

Posted by Li Gao <li...@gmail.com>.
Lyft recently open sourced a data discovery tool called Amundsen that can
serve many of the data catalog needs.

https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9
https://github.com/lyft/amundsenmetadatalibrary

You still need HMS to store the data schema though.



On Thu, Jun 20, 2019 at 4:47 AM James Cotrotsios <ja...@gmail.com>
wrote:

> Is there a plan to have a business catalog component for the Data Lake? If
> not how would someone make a proposal to create an open source project
> related to that. I would be interested in building out an open source data
> catalog that would use the Hive metadata store as a baseline for technical
> metadata.

Re: Announcing Delta Lake 0.2.0

Posted by James Cotrotsios <ja...@gmail.com>.
Is there a plan to have a business catalog component for the Data Lake? If
not how would someone make a proposal to create an open source project
related to that. I would be interested in building out an open source data
catalog that would use the Hive metadata store as a baseline for technical
metadata.


Re: Announcing Delta Lake 0.2.0

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

this is fantastic :)

Regards,
Gourav Sengupta

Re: Announcing Delta Lake 0.2.0

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Ayan,

Delta is obviously well thought through; it has been available in Databricks
for the last year and a half now, I think, and besides that it is from some
of the best minds at work :)

But what may not be well tested in Delta is its availability as a storage
class for HIVE.

How about your testing? Are you doing it in S3? What is the kind of volume
you are testing it with if I may ask.


Regards,
Gourav Sengupta

On Thu, Jun 20, 2019 at 12:58 AM ayan guha <gu...@gmail.com> wrote:

> Hi
>
> We are using Delta features. The only problem we faced till now is Hive
> can not read DELTA outputs by itself (even if the Hive metastore is
> shared). However, if we create hive external table pointing to the folder
> (and with Vacuum), it can read the data.
>
> Other than that, the feature looks good and well thought out. We are doing
> a volume testing now....
>
> Best
> Ayan
>
> --
> Best Regards,
> Ayan Guha
>

Re: Announcing Delta Lake 0.2.0

Posted by ayan guha <gu...@gmail.com>.
Hi

We are using Delta features. The only problem we have faced so far is that
Hive cannot read Delta output by itself (even if the Hive metastore is
shared). However, if we create a Hive external table pointing to the folder
(and run VACUUM), it can read the data.

Other than that, the feature looks good and well thought out. We are doing
volume testing now.

Best
Ayan

On Thu, Jun 20, 2019 at 9:52 AM Liwen Sun <li...@databricks.com> wrote:

> Hi Gourav,
>
> Thanks for the suggestion. Please open a GitHub issue at
> https://github.com/delta-io/delta/issues to describe your use case and
> requirements for "external tables" so we can better track this feature and
> also get feedback from the community.
>
> Regards,
> Liwen
>
> On Wed, Jun 19, 2019 at 12:11 PM Gourav Sengupta <
> gourav.sengupta@gmail.com> wrote:
>
>> Hi,
>>
>> does Delta support external tables? I think that most users will be
>> needing this.
>>
>>
>> Regards,
>> Gourav
>>
>> On Wed, Jun 19, 2019 at 8:04 PM Liwen Sun <li...@databricks.com>
>> wrote:
>>
>>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>>
>>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>>> https://docs.delta.io/0.2.0/quick-start.html
>>>
>>> To view the release notes:
>>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>>
>>> This release introduces two main features:
>>>
>>> *Cloud storage support*
>>> In addition to HDFS, you can now configure Delta Lake to read and write
>>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>>> For configuration instructions, please see:
>>> https://docs.delta.io/0.2.0/delta-storage.html
>>>
>>> *Improved concurrency*
>>> Delta Lake now allows concurrent append-only writes while still ensuring
>>> serializability. For concurrency control in Delta Lake, please see:
>>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>>
>>> We have also greatly expanded the test coverage as part of this release.
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this release.
>>>
>>> Best regards,
>>> Liwen Sun
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Delta Lake Users and Developers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to delta-users+unsubscribe@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/delta-users/CAE4dWq9g90NkUr_SLs2J6kFPbOpxx4wy6MEgb%3DQ5pBxkUcK%2B-A%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/delta-users/CAE4dWq9g90NkUr_SLs2J6kFPbOpxx4wy6MEgb%3DQ5pBxkUcK%2B-A%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

-- 
Best Regards,
Ayan Guha

Re: Announcing Delta Lake 0.2.0

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Liwen,

It's done: https://github.com/delta-io/delta/issues/73

Please let me know whether the description looks fine. I can also
contribute to the test cases if required.


Regards,
Gourav

Re: Announcing Delta Lake 0.2.0

Posted by Liwen Sun <li...@databricks.com>.
Hi Gourav,

Thanks for the suggestion. Please open a GitHub issue at
https://github.com/delta-io/delta/issues to describe your use case and
requirements for "external tables" so we can better track this feature and
also get feedback from the community.

Regards,
Liwen

Re: Announcing Delta Lake 0.2.0

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

Does Delta support external tables? I think most users will need this.


Regards,
Gourav
