Posted to dev@hudi.apache.org by Shiyan Xu <xu...@gmail.com> on 2022/07/31 04:17:56 UTC
Re: [DISCUSS] Diagnostic reporter
To bubble this up
On Wed, Jun 15, 2022 at 11:47 PM Vinoth Chandar <vi...@apache.org> wrote:
> +1 from me.
>
> It will be very useful if we can have something that can gather
> troubleshooting info easily.
> This part takes a while currently.
>
> On Mon, May 30, 2022 at 9:52 AM Shiyan Xu <xu...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > When troubleshooting Hudi jobs in users' environments, we always ask users
> > to share configs and environment info, check the Spark UI, etc. Here is an
> > RFC idea: can we extend the Hudi metrics system and make a diagnostic
> > reporter? It can be turned on like a normal metrics reporter. It should
> > collect common troubleshooting info and save it to JSON or another
> > human-readable text format. Users should be able to run with it and share
> > the diagnosis file. The RFC should discuss what info should / can be
> > collected.
> >
> > Does this make sense? Anyone interested in driving the RFC design and
> > implementation work?
> >
> > --
> > Best,
> > Shiyan
> >
>
--
Best,
Shiyan
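For illustration only, the proposal quoted above (collect common troubleshooting info and save it as JSON) could look roughly like the sketch below. This is not Hudi code: the function names, the report layout, and the example config values are all assumptions made up for this sketch, and the RFC is where the real set of collected fields would be decided.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def collect_diagnostics(write_config: dict) -> dict:
    """Gather a snapshot of troubleshooting info (hypothetical field names)."""
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "environment": {
            "os": platform.platform(),
            "python_version": sys.version.split()[0],
        },
        # Copy so later changes to the caller's dict do not alter the report.
        "write_config": dict(write_config),
    }


def render_report(report: dict) -> str:
    """Serialize the report as indented JSON so it stays human-readable."""
    return json.dumps(report, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example values only; a real reporter would read the job's actual config.
    example_config = {"hoodie.table.name": "trips"}
    print(render_report(collect_diagnostics(example_config)))
```

A user could then attach the printed JSON to a support thread instead of answering each question about configs and environment separately.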
Re: [DISCUSS] Diagnostic reporter
Posted by Shiyan Xu <xu...@gmail.com>.
Sure, Zhang Yue, feel free to initiate the RFC!
On Fri, Aug 5, 2022 at 4:57 AM 田昕峣 (Xinyao Tian) <xi...@yeah.net>
wrote:
> Hi Shiyan and everyone,
>
> This feature is definitely very important. We really need to gather error
> information to fix bugs more efficiently.
>
> If there is anything I can help with, please feel free to let me know :)
>
> Regards,
> Xinyao
--
Best,
Shiyan
Re: [DISCUSS] Diagnostic reporter
Posted by "田昕峣 (Xinyao Tian)" <xi...@yeah.net>.
Hi Shiyan and everyone,
This feature is definitely very important. We really need to gather error information to fix bugs more efficiently.
If there is anything I can help with, please feel free to let me know :)
Regards,
Xinyao
Re: [DISCUSS] Diagnostic reporter
Posted by Forward Xu <fo...@gmail.com>.
+1. Thanks, Shiyan Xu and Zhang Yue. This is a very useful feature.
Best,
Forward
On Mon, Sep 12, 2022 at 18:39, sagar sumit <co...@apache.org> wrote:
> Thanks Zhang Yue for drafting the RFC.
> It's an interesting read! I have left some comments.
>
> While exposing certain info such as "sample_hoodie_key",
> we have to consider masking/obfuscation.
>
> Looking forward to the implementation.
>
> Regards,
> Sagar
Re: [DISCUSS] Diagnostic reporter
Posted by sagar sumit <co...@apache.org>.
Thanks Zhang Yue for drafting the RFC.
It's an interesting read! I have left some comments.
While exposing certain info such as "sample_hoodie_key",
we have to consider masking/obfuscation.
Looking forward to the implementation.
Regards,
Sagar
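To make the masking point above concrete, here is a minimal sketch of one possible approach: replacing sensitive values with a salted hash before the report is shared. Only the field name "sample_hoodie_key" comes from this thread; the set of sensitive fields, the function names, and the "masked:" prefix are assumptions for this sketch, not what the RFC specifies.

```python
import hashlib

# Assumed list of fields to obfuscate; only "sample_hoodie_key" is from the thread.
SENSITIVE_FIELDS = {"sample_hoodie_key"}


def mask_value(value: str, salt: str) -> str:
    """Replace a value with a short salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked:" + digest[:12]


def mask_report(report: dict, salt: str) -> dict:
    """Return a copy of the report with sensitive top-level fields masked."""
    return {
        key: mask_value(str(value), salt) if key in SENSITIVE_FIELDS else value
        for key, value in report.items()
    }


if __name__ == "__main__":
    raw = {"sample_hoodie_key": "uuid-123", "table_type": "COPY_ON_WRITE"}
    print(mask_report(raw, salt="per-report-secret"))
```

Because the same salt and input always give the same digest, two masked reports can still be compared for equal keys without exposing the key itself. Truncating the digest keeps the report readable at the cost of a higher collision chance.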
On Wed, Sep 7, 2022 at 1:49 PM Yue Zhang <zh...@163.com> wrote:
> Hi Hudi,
> I have just raised an RFC for this diagnostic reporter:
> https://github.com/apache/hudi/pull/6600. Please feel free to leave any
> comments or concerns if you are interested!
Re: [DISCUSS] Diagnostic reporter
Posted by Yue Zhang <zh...@163.com>.
Hi Hudi,
I have just raised an RFC for this diagnostic reporter: https://github.com/apache/hudi/pull/6600. Please feel free to leave any comments or concerns if you are interested!
Yue Zhang
zhangyue921010@163.com
On 08/4/2022 19:38, Yue Zhang <zh...@163.com> wrote:
Hi Shiyan and everyone,
This is a great idea! As a Hudi user, I also struggle with troubleshooting Hudi sometimes. This feature would definitely reduce that burden.
I volunteer to draft a discussion and perhaps raise an RFC about it, if you don't mind. Thanks :)
Re: [DISCUSS] Diagnostic reporter
Posted by Yue Zhang <zh...@163.com>.
Hi Shiyan and everyone,
This is a great idea! As a Hudi user, I also struggle with troubleshooting Hudi sometimes. This feature would definitely reduce that burden.
I volunteer to draft a discussion and perhaps raise an RFC about it, if you don't mind. Thanks :)
Yue Zhang
zhangyue921010@163.com
On 08/3/2022 00:44, 冯健 <fe...@gmail.com> wrote:
Maybe we can start with an audit feature? Since we need some sort of
"images" to represent "facts", we can create an identity for each writer to
link them. In this audit file, we can label each operation with the IP,
environment, platform, version, write config, etc.
Re: [DISCUSS] Diagnostic reporter
Posted by 冯健 <fe...@gmail.com>.
Maybe we can start with an audit feature? Since we need some sort of
"images" to represent "facts", we can create an identity for each writer to
link them. In this audit file, we can label each operation with the IP,
environment, platform, version, write config, etc.
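As a rough sketch of what one entry in such an audit file might look like, assuming one JSON line per operation: the labels (IP, environment, platform, version, write config) come from the message above, while the class name, the helper names, the "operation" field, and the example values are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AuditRecord:
    """One audited operation, labeled with the writer's identity and context."""
    writer_id: str      # identity that links all operations of one writer
    operation: str      # hypothetical label, e.g. "upsert"
    ip: str
    environment: str
    platform: str
    version: str
    write_config: dict = field(default_factory=dict)


def new_writer_id() -> str:
    """Create a random identity for a writer (32 hex characters)."""
    return uuid.uuid4().hex


def to_audit_line(record: AuditRecord) -> str:
    """Serialize a record as a single JSON line for an append-only audit file."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    writer = new_writer_id()
    # All values below are placeholders for illustration.
    record = AuditRecord(writer, "upsert", "10.0.0.1", "yarn", "spark",
                         "0.12.0", {"hoodie.table.name": "trips"})
    print(to_audit_line(record))
```

Filtering the file by writer_id would then give the history of a single writer across operations.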
On Sun, 31 Jul 2022 at 12:18, Shiyan Xu <xu...@gmail.com> wrote:
> To bubble this up