Posted to dev@hive.apache.org by Carl Steinbach <ca...@cloudera.com> on 2010/07/15 11:15:21 UTC

Notes from the last Hive Contributors Meeting

Hi,

Notes from the last Hive Contributors Meeting are now available on the
wiki: http://wiki.apache.org/hadoop/HiveContributorsMinutes100706

Thanks.

Carl

Re: Notes from the last Hive Contributors Meeting

Posted by John Sichi <js...@facebook.com>.
On Jul 15, 2010, at 1:32 PM, Edward Capriolo wrote:
> I can imagine that in two years we will have possibly four releases and
> several version-dependent features. The language manual is going to
> have multiple caveats, which will make it less clear.

We can do something like purge the ones more than three releases old.  Sun seems to keep their @since tags forever.

> Also this is a fixable and anecdotal problem, but the wiki is slow. It
> feels like it has been that way for months now.


Writes have definitely remained super-slow.  Reopen this one?

https://issues.apache.org/jira/browse/INFRA-2549

All of the attachments in my [[Hive/ViewDev]] doc seem to have disappeared, with attachment support now disabled :(

Maybe related to the fact that spammers were using the wiki to upload porn videos?  Sigh.

JVS


Re: Notes from the last Hive Contributors Meeting

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Jul 15, 2010 at 2:02 PM, John Sichi <js...@facebook.com> wrote:
> Create/Drop View
>
> Note: View support is only available starting in Hive 0.6.
> I added this caveat when I added the CREATE/DROP view section to the DDL
> page.  For the most part, we've been following this convention, and I think
> we should keep it up whatever the final decision on docs is (copy from wiki
> to xdocs during release, or maintain only in xdocs).
> Personally, I find wiki much friendlier in general, but the wiki software
> (MoinMoin) we are using leaves a lot to be desired compared to MediaWiki or
> Confluence.
> JVS
> On Jul 15, 2010, at 7:30 AM, Edward Capriolo wrote:
>
> So a user is running Hive, reads the wiki, and says, "Wow, we have
> view support, let me try this." This fails because views are only in
> trunk. This gives people a generally bad impression of Hive: they
> expect trunk features because they have no authoritative
> documentation for THEIR VERSION. Users can be fickle, and if they hit
> incorrect documentation they start to get the impression the software
> is "buggy". Suddenly they start questioning everything and bringing
> every problem to the Hive administrator, because even when they wrote
> a query wrong their first instinct is to "blame Hive".
>
> I find editing xdocs EASIER than working with a wiki. Wikis are great and
> all, but in my travels I have to work on 5 different wikis, and they all
> differ slightly in what they support and in their markup. We
> should be able to commit xdoc patches without full unit tests. Keeping
> the xdocs up to date should not be an issue, because we should simply
> not accept a patch that changes or adds functionality without some xdoc.
>
> Another issue right now is that there are features that are NOT documented
> anywhere. When a user asks about those features I have to send them to
> Jira tickets. Oftentimes the ticket will have a long back-and-forth
> where the feature is debated, or sometimes just a patch. You never see
> the full syntax, and it can be very confusing. I often end up telling them
> to dig through a .q file inside a patch to figure out what the
> feature is and how to use it. While most people are good about
> updating the wiki, we know that things tend to fall through the cracks.
>
> I think there is still a place for the wiki (free-form, multi-person
> planning, etc.), but I do not think a mature software product can ever
> have authoritative documentation in a wiki.
>
>



John,

I was not trying to pick on you because of the view example; I only
mentioned views because it was a 0.6 feature off the top of my head. I
was just making a general point. You bring up another point.

Take something like this:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [
   [ROW FORMAT row_format] [STORED AS file_format]
   | STORED BY 'storage.handler.class.name' [ WITH SERDEPROPERTIES (...) ]  (Note:  only available starting with 0.6.0)
  ]
  [STORED AS file_format]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]  (Note:  only available starting with 0.6.0)
  [AS select_statement]  (Note: this feature is only available starting with 0.5.0.)

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  LIKE existing_table_name
  [LOCATION hdfs_path]

I can imagine that in two years we will have possibly four releases and
several version-dependent features. The language manual is going to
have multiple caveats, which will make it less clear.

Also this is a fixable and anecdotal problem, but the wiki is slow. It
feels like it has been that way for months now.

[edward@ec ~]$ wget http://wiki.apache.org/hadoop/Hive/LanguageManual/Explain
--2010-07-15 16:27:36--  http://wiki.apache.org/hadoop/Hive/LanguageManual/Explain
    [    <=>                                 ] 21,497      11.1K/s   in 1.9s

2010-07-15 16:27:41 (11.1 KB/s) - “Explain” saved [21497]


Edward

Re: Notes from the last Hive Contributors Meeting

Posted by John Sichi <js...@facebook.com>.
Create/Drop View
Note: View support is only available starting in Hive 0.6.

I added this caveat when I added the CREATE/DROP view section to the DDL page.  For the most part, we've been following this convention, and I think we should keep it up whatever the final decision on docs is (copy from wiki to xdocs during release, or maintain only in xdocs).

Personally, I find wiki much friendlier in general, but the wiki software (MoinMoin) we are using leaves a lot to be desired compared to MediaWiki or Confluence.

JVS

On Jul 15, 2010, at 7:30 AM, Edward Capriolo wrote:
> So a user is running Hive, reads the wiki, and says, "Wow, we have
> view support, let me try this." This fails because views are only in
> trunk. This gives people a generally bad impression of Hive: they
> expect trunk features because they have no authoritative
> documentation for THEIR VERSION. Users can be fickle, and if they hit
> incorrect documentation they start to get the impression the software
> is "buggy". Suddenly they start questioning everything and bringing
> every problem to the Hive administrator, because even when they wrote
> a query wrong their first instinct is to "blame Hive".
>
> I find editing xdocs EASIER than working with a wiki. Wikis are great and
> all, but in my travels I have to work on 5 different wikis, and they all
> differ slightly in what they support and in their markup. We
> should be able to commit xdoc patches without full unit tests. Keeping
> the xdocs up to date should not be an issue, because we should simply
> not accept a patch that changes or adds functionality without some xdoc.
>
> Another issue right now is that there are features that are NOT documented
> anywhere. When a user asks about those features I have to send them to
> Jira tickets. Oftentimes the ticket will have a long back-and-forth
> where the feature is debated, or sometimes just a patch. You never see
> the full syntax, and it can be very confusing. I often end up telling them
> to dig through a .q file inside a patch to figure out what the
> feature is and how to use it. While most people are good about
> updating the wiki, we know that things tend to fall through the cracks.
>
> I think there is still a place for the wiki (free-form, multi-person
> planning, etc.), but I do not think a mature software product can ever
> have authoritative documentation in a wiki.


Re: Notes from the last Hive Contributors Meeting

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Jul 15, 2010 at 5:15 AM, Carl Steinbach <ca...@cloudera.com> wrote:
> Hi,
>
> Notes from the last Hive Contributors Meeting are now available on the
> wiki: http://wiki.apache.org/hadoop/HiveContributorsMinutes100706
>
> Thanks.
>
> Carl
>

Sorry I did not get to listen to the event. So one topic of interest for me is:

"Several people voiced concerns that developers/users are less likely
to update the documentation if doing so requires them to submit a
patch."

I think this is a valid concern; however, I want to point out a few
bigger-picture things.  First, I want to point out what I think is a
great shining example of documentation.

http://hornetq.sourceforge.net/docs/hornetq-2.1.1.Final/user-manual/en/html/index.html

HBase does a nice job as well.
http://hbase.apache.org/docs/r0.20.5/metrics.html

While I think the Hive documentation on the wiki is better than most
wikis, it has some issues. Here is an example. I am running Hive 0.5.

So a user is running Hive, reads the wiki, and says, "Wow, we have
view support, let me try this." This fails because views are only in
trunk. This gives people a generally bad impression of Hive: they
expect trunk features because they have no authoritative
documentation for THEIR VERSION. Users can be fickle, and if they hit
incorrect documentation they start to get the impression the software
is "buggy". Suddenly they start questioning everything and bringing
every problem to the Hive administrator, because even when they wrote
a query wrong their first instinct is to "blame Hive".

I find editing xdocs EASIER than working with a wiki. Wikis are great and
all, but in my travels I have to work on 5 different wikis, and they all
differ slightly in what they support and in their markup. We
should be able to commit xdoc patches without full unit tests. Keeping
the xdocs up to date should not be an issue, because we should simply
not accept a patch that changes or adds functionality without some xdoc.

Another issue right now is that there are features that are NOT documented
anywhere. When a user asks about those features I have to send them to
Jira tickets. Oftentimes the ticket will have a long back-and-forth
where the feature is debated, or sometimes just a patch. You never see
the full syntax, and it can be very confusing. I often end up telling them
to dig through a .q file inside a patch to figure out what the
feature is and how to use it. While most people are good about
updating the wiki, we know that things tend to fall through the cracks.

I think there is still a place for the wiki (free-form, multi-person
planning, etc.), but I do not think a mature software product can ever
have authoritative documentation in a wiki.
