You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@carbondata.apache.org by VenuReddy <k....@gmail.com> on 2020/07/09 09:16:09 UTC

[Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Dear Community.!

Have recently encountered a problem of Segment directory and segment file in
metadata directory are not created for partitioned table when
'carbon.merge.index.in.segment' property is set to 'false'. And actual index
files which were present in respective partition's '.tmp' directory are also
deleted without moving them out to respective partition directory where its
'.carbondata' file exist. Thus queries throw exception while reading index
files. Please refer jira issue - 
https://issues.apache.org/jira/browse/CARBONDATA-3834
<https://issues.apache.org/jira/browse/CARBONDATA-3834>  

To address this issue, we have 2 options to go with - 
1. Either fix it to work for 'carbon.merge.index.in.segment' set to false
case. There is an open PR  https://github.com/apache/carbondata/pull/3776
<https://github.com/apache/carbondata/pull/3776>   for it.

2. Or Deprecate the 'carbon.merge.index.in.segment' property itself. As the
query performance is better when merge index files are in use when compared
to normal index files, and default behavior is to generate merge index
files, probably it is not necessary to support
'carbon.merge.index.in.segment' anymore.

What do you think about it ? Please let me know your opinion.

Thanks,
Venu



--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by David CaiQiang <da...@gmail.com>.
update reply: 

The merging index should be a part of loading. It is not good to extract the
merging index to an independent process, it brought the query issue (the
system can't find the index files when/after merging).

In my opinion, during loading, new  .carbonindex files should be temporary,
we should merge them to a .carbonindexmerge file in a segment before
updating the segment status to success in tablestatus file.
When the merging index failed, loading should be failed.

for query:
1.  support reading .carbonindex files and .carbonindexmerge files

for loading: (also include the loading part of compaction/create
index/create mv/merge operations)
better to do like this.
step 1. update tablestatus file to add an in-progress segment
step 2. generate carbondata files and temporary .carbonindex files.
step 3. merge .carbonindex files to a .carbonindexmerge file.
step 4. write a segment file. 
step 5. update tablestatus file with final status, segment file name and
some statistics.

So in total,
 update tablestatus file twice,
 write segment file once,
 write .carbonindexmerge files once,
 write and delete .carbonindex files once.

for updating:
1. Now only updating operation can keep .carbonindex file
in the future, maybe we can change updating operations to the same with
merge operations to generate new files into a new segment.



-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by David CaiQiang <da...@gmail.com>.
The merging index should be a part of loading. It is not good to extract the
merging index to an independent process, it brought the query issue (the
system can't find the index files when/after merging). 

In my opinion, during loading, new  .carbonindex files should be temporary, 
we should merge them to a .carbonindexmerge file in a segment before
updating the segment status to success in tablestatus file.
When the merging index failed, loading should be failed.

for query:
1.  support reading .carbonindex files and .carbonindexmerge files

for loading: (also include the loading part of compaction/create
index/create mv/merge operations)
better to do like this.
step 1. update tablestatus file to add an in-progress segment
step 2. generate carbondata file and temporary .carbonindex files, for a
partitioned table, it also generates a temporary segment file for each
related partition.
step 3. merge .carbonindex files to a .carbonindexmerge file.
step 4. write a segment file. for a partitioned table, merge all temporary
segment files to one segment file.
step 5. update tablestatus file with final status, segment file name and
some statistics. 

So in total,
 update tablestatus file twice,
 write segment file once,
 write .carbonindex files once,
 write and delete .carbonindex files once. 

for updating:
1. Now only updating operation can keep .carbonindex file 
in the future, maybe we can change updating operations to the same with
merge operation to generate new files into a new segment.



-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by Kumar Vishal <ku...@gmail.com>.
+1 for deprecating this property. We can have it as an internal property
for dev, not expose to user.

+1 for fixing the issue. If merge index creation is failing, then also
query should work fine with index file.

-Regards
Kumar Vishal

On Thu, 9 Jul 2020 at 10:58 PM, Kunal Kapoor <ku...@gmail.com>
wrote:

> Agree with david on always using merge index.
>
> +1 for deprecating the property.
>
> On Thu, Jul 9, 2020 at 3:32 PM Ajantha Bhat <aj...@gmail.com> wrote:
>
> > Hi, What if too many index files in a segment and user want to finish
> load
> > fast and don't want to wait for merge index?
> >
> > That time setting merge index = false can help to save load time and in
> off
> > peak time user can create merge index.
> >
> > So I still feel we need to fix issue exist when merge index = false.
> >
> > Thanks,
> > Ajantha
> >
> > On Thu, 9 Jul, 2020, 3:05 pm David CaiQiang, <da...@gmail.com>
> wrote:
> >
> > > Better to always merge index.
> > >
> > > -1 for 1,
> > >
> > > +1 for 2,
> > >
> > >
> > >
> > > -----
> > > Best Regards
> > > David Cai
> > > --
> > > Sent from:
> > >
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > >
> >
>

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by Kunal Kapoor <ku...@gmail.com>.
Agree with david on always using merge index.

+1 for deprecating the property.

On Thu, Jul 9, 2020 at 3:32 PM Ajantha Bhat <aj...@gmail.com> wrote:

> Hi, What if too many index files in a segment and user want to finish load
> fast and don't want to wait for merge index?
>
> That time setting merge index = false can help to save load time and in off
> peak time user can create merge index.
>
> So I still feel we need to fix issue exist when merge index = false.
>
> Thanks,
> Ajantha
>
> On Thu, 9 Jul, 2020, 3:05 pm David CaiQiang, <da...@gmail.com> wrote:
>
> > Better to always merge index.
> >
> > -1 for 1,
> >
> > +1 for 2,
> >
> >
> >
> > -----
> > Best Regards
> > David Cai
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by Ajantha Bhat <aj...@gmail.com>.
Hi, What if too many index files in a segment and user want to finish load
fast and don't want to wait for merge index?

That time setting merge index = false can help to save load time and in off
peak time user can create merge index.

So I still feel we need to fix issue exist when merge index = false.

Thanks,
Ajantha

On Thu, 9 Jul, 2020, 3:05 pm David CaiQiang, <da...@gmail.com> wrote:

> Better to always merge index.
>
> -1 for 1,
>
> +1 for 2,
>
>
>
> -----
> Best Regards
> David Cai
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by David CaiQiang <da...@gmail.com>.
Better to always merge index.

-1 for 1,  

+1 for 2, 



-----
Best Regards
David Cai
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi,

+1
I agree with Vishal, let's deprecate the configuration and keep it as
internal.

Regards,
Ravindra.

On Fri, 10 Jul 2020 at 01:54, Ajantha Bhat <aj...@gmail.com> wrote:

> Hi,
> I didn't reply to deprecation. *+1 for deprecating it*.
>
> *And +1 for issue fix also.*
> Issue fix, I didn't mean when *carbon.merge.index.in
> <http://carbon.merge.index.in/>.segment = false.*
>
> but when when *carbon.merge.index.in
> <http://carbon.merge.index.in/>.segment = true and merge index creation
> failed for some reason.*
> code needs to take care of
> a. Moving index files from temp folder to final folder in case of partition
> table load.
> b. Not failing the current partition load.  (same as normal load behavior)
> I think these two are not handled after partition optimization, you can
> check and handle it.
>
>
> Thanks,
> Ajantha
>
> On Thu, 9 Jul, 2020, 9:29 pm Akash r, <ak...@gmail.com> wrote:
>
> > Hi,
> >
> > +1, we can deprecate it and as Vishal suggested we can keep as internal
> > property for developer purpose.
> >
> > Regards,
> > Akash R Nilugal
> >
> > On Thu, Jul 9, 2020, 2:46 PM VenuReddy <k....@gmail.com>
> wrote:
> >
> > > Dear Community.!
> > >
> > > Have recently encountered a problem of Segment directory and segment
> file
> > > in
> > > metadata directory are not created for partitioned table when
> > > 'carbon.merge.index.in.segment' property is set to 'false'. And actual
> > > index
> > > files which were present in respective partition's '.tmp' directory are
> > > also
> > > deleted without moving them out to respective partition directory where
> > its
> > > '.carbondata' file exist. Thus queries throw exception while reading
> > index
> > > files. Please refer jira issue -
> > > https://issues.apache.org/jira/browse/CARBONDATA-3834
> > > <https://issues.apache.org/jira/browse/CARBONDATA-3834>
> > >
> > > To address this issue, we have 2 options to go with -
> > > 1. Either fix it to work for 'carbon.merge.index.in.segment' set to
> > false
> > > case. There is an open PR
> > https://github.com/apache/carbondata/pull/3776
> > > <https://github.com/apache/carbondata/pull/3776>   for it.
> > >
> > > 2. Or Deprecate the 'carbon.merge.index.in.segment' property itself.
> As
> > > the
> > > query performance is better when merge index files are in use when
> > compared
> > > to normal index files, and default behavior is to generate merge index
> > > files, probably it is not necessary to support
> > > 'carbon.merge.index.in.segment' anymore.
> > >
> > > What do you think about it ? Please let me know your opinion.
> > >
> > > Thanks,
> > > Venu
> > >
> > >
> > >
> > > --
> > > Sent from:
> > >
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > >
> >
> >
>


-- 
Thanks & Regards,
Ravi

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by Ajantha Bhat <aj...@gmail.com>.
Hi,
I didn't reply to deprecation. *+1 for deprecating it*.

*And +1 for issue fix also.*
Issue fix, I didn't mean when *carbon.merge.index.in
<http://carbon.merge.index.in/>.segment = false.*

but when when *carbon.merge.index.in
<http://carbon.merge.index.in/>.segment = true and merge index creation
failed for some reason.*
code needs to take care of
a. Moving index files from temp folder to final folder in case of partition
table load.
b. Not failing the current partition load.  (same as normal load behavior)
I think these two are not handled after partition optimization, you can
check and handle it.


Thanks,
Ajantha

On Thu, 9 Jul, 2020, 9:29 pm Akash r, <ak...@gmail.com> wrote:

> Hi,
>
> +1, we can deprecate it and as Vishal suggested we can keep as internal
> property for developer purpose.
>
> Regards,
> Akash R Nilugal
>
> On Thu, Jul 9, 2020, 2:46 PM VenuReddy <k....@gmail.com> wrote:
>
> > Dear Community.!
> >
> > Have recently encountered a problem of Segment directory and segment file
> > in
> > metadata directory are not created for partitioned table when
> > 'carbon.merge.index.in.segment' property is set to 'false'. And actual
> > index
> > files which were present in respective partition's '.tmp' directory are
> > also
> > deleted without moving them out to respective partition directory where
> its
> > '.carbondata' file exist. Thus queries throw exception while reading
> index
> > files. Please refer jira issue -
> > https://issues.apache.org/jira/browse/CARBONDATA-3834
> > <https://issues.apache.org/jira/browse/CARBONDATA-3834>
> >
> > To address this issue, we have 2 options to go with -
> > 1. Either fix it to work for 'carbon.merge.index.in.segment' set to
> false
> > case. There is an open PR
> https://github.com/apache/carbondata/pull/3776
> > <https://github.com/apache/carbondata/pull/3776>   for it.
> >
> > 2. Or Deprecate the 'carbon.merge.index.in.segment' property itself. As
> > the
> > query performance is better when merge index files are in use when
> compared
> > to normal index files, and default behavior is to generate merge index
> > files, probably it is not necessary to support
> > 'carbon.merge.index.in.segment' anymore.
> >
> > What do you think about it ? Please let me know your opinion.
> >
> > Thanks,
> > Venu
> >
> >
> >
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>
>

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

Posted by Akash r <ak...@gmail.com>.
Hi,

+1, we can deprecate it and as Vishal suggested we can keep as internal
property for developer purpose.

Regards,
Akash R Nilugal

On Thu, Jul 9, 2020, 2:46 PM VenuReddy <k....@gmail.com> wrote:

> Dear Community.!
>
> Have recently encountered a problem of Segment directory and segment file
> in
> metadata directory are not created for partitioned table when
> 'carbon.merge.index.in.segment' property is set to 'false'. And actual
> index
> files which were present in respective partition's '.tmp' directory are
> also
> deleted without moving them out to respective partition directory where its
> '.carbondata' file exist. Thus queries throw exception while reading index
> files. Please refer jira issue -
> https://issues.apache.org/jira/browse/CARBONDATA-3834
> <https://issues.apache.org/jira/browse/CARBONDATA-3834>
>
> To address this issue, we have 2 options to go with -
> 1. Either fix it to work for 'carbon.merge.index.in.segment' set to false
> case. There is an open PR  https://github.com/apache/carbondata/pull/3776
> <https://github.com/apache/carbondata/pull/3776>   for it.
>
> 2. Or Deprecate the 'carbon.merge.index.in.segment' property itself. As
> the
> query performance is better when merge index files are in use when compared
> to normal index files, and default behavior is to generate merge index
> files, probably it is not necessary to support
> 'carbon.merge.index.in.segment' anymore.
>
> What do you think about it ? Please let me know your opinion.
>
> Thanks,
> Venu
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>