Posted to issues@carbondata.apache.org by "Bigicecream (Jira)" <ji...@apache.org> on 2021/09/22 11:48:00 UTC
[jira] [Updated] (CARBONDATA-4279) Insert data to table with a
partitions resulting in 'Marked for Delete' segment in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bigicecream updated CARBONDATA-4279:
------------------------------------
Description:
As described [here|https://github.com/apache/carbondata/issues/4212]:
After the commit [https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7]
I can successfully create a table with partitions, but when I try to insert data, the job ends successfully
while the segment is marked as "Marked for Delete".
I am running:
{code:sql}
CREATE TABLE lior_carbon_tests.mark_for_del_bug(
timestamp string,
name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string)
{code}
{code:sql}
INSERT INTO lior_carbon_tests.mark_for_del_bug select '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'
{code}
{code:sql}
select * from lior_carbon_tests.mark_for_del_bug
{code}
gives:
{code:java}
+---------+----+---+---+
|timestamp|name| dt| hr|
+---------+----+---+---+
+---------+----+---+---+
{code}
And
{code:sql}
show segments for TABLE lior_carbon_tests.mark_for_del_bug
{code}
gives
{code:java}
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
{code}
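To check for this symptom automatically after an insert, the {{show segments}} output can be scanned for segments that did not finish in a successful state. The following is only a minimal sketch, not part of CarbonData: the helper names {{parse_segments}} and {{unsuccessful_segments}} are hypothetical, it works on the printed table text rather than on any CarbonData API, and it assumes a healthy segment is reported with the status "Success".

```python
# Sample text copied from the `show segments` output in this report.
SAMPLE = """
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
"""


def parse_segments(table_text):
    """Turn Spark's ASCII table output into a list of dicts, one per data row."""
    rows = []
    for line in table_text.strip().splitlines():
        line = line.strip()
        # Keep only the |...| lines; the +---+ lines are borders.
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def unsuccessful_segments(segments, ok_statuses=("Success",)):
    """Return the segments whose Status is not in ok_statuses.

    Assumption: "Success" is the status of a healthy segment; adjust
    ok_statuses if your deployment reports other acceptable states.
    """
    return [s for s in segments if s.get("Status") not in ok_statuses]


if __name__ == "__main__":
    for seg in unsuccessful_segments(parse_segments(SAMPLE)):
        print("segment", seg["ID"], "has status:", seg["Status"])
```

Run against the output above, this reports segment 0, matching the behaviour described in this issue.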
I took a look at the folder structure in S3 and it seems fine.
> Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
> -----------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-4279
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4279
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.3.0
> Environment: Release label: emr-5.24.1
> Hadoop distribution: Amazon 2.8.5
> Applications:
> Hue 4.4.0, Spark 2.4.5, JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata: 2.3.0-SNAPSHOT
> spark: 2.4.5
> hadoop: 2.8.3
> Reporter: Bigicecream
> Priority: Blocker