Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/31 03:30:52 UTC

[GitHub] [spark] beobest2 commented on pull request #36729: [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API suppo…

beobest2 commented on PR #36729:
URL: https://github.com/apache/spark/pull/36729#issuecomment-1141632078

   @HyukjinKwon The current 'supported API generation' function dynamically compares the modules of `pyspark.pandas` and `pandas` to find the differences. In doing so, methods inherited from parent classes are also collected, and links for them are not generated correctly (e.g. `CategoricalIndex.all()`) because they do not match the URL pattern of the individual API documentation pages.
   
   For example:
   [Screenshot: "Screen Shot 2022-05-30 at 11 27 55 PM", https://user-images.githubusercontent.com/7010554/171086960-0a7c9465-7366-4d0f-a823-a0826e2512ab.png]
   
   ```
   .../reference/pyspark.pandas/api/pyspark.pandas.CategoricalIndex.add_categories.html
       >> exists
   .../reference/pyspark.pandas/api/pyspark.pandas.CategoricalIndex.all.html
       >> does not exist
   ```
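The failure mode described above can be sketched with a minimal, self-contained example. It assumes hypothetical stand-in classes `Index` and `CategoricalIndex` (not the real pandas-on-Spark classes) and a hypothetical `DOC_URL` pattern modeled on the paths quoted above. Because `inspect.getmembers` also returns inherited members, a link is built for `CategoricalIndex.all` even though only `add_categories` is defined on the subclass itself.

```python
import inspect

# Hypothetical stand-ins for the real Index / CategoricalIndex hierarchy;
# the actual tool compares pyspark.pandas against pandas.
class Index:
    def all(self): ...
    def any(self): ...

class CategoricalIndex(Index):
    def add_categories(self): ...

# Assumed URL pattern, modeled on the paths quoted in the comment.
DOC_URL = "reference/pyspark.pandas/api/pyspark.pandas.{cls}.{name}.html"

def public_methods(cls):
    """Public callables visible on cls, INCLUDING inherited ones."""
    return sorted(
        name for name, _ in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    )

def doc_links(cls):
    """Build one doc link per method under the class's own path."""
    return {name: DOC_URL.format(cls=cls.__name__, name=name)
            for name in public_methods(cls)}

links = doc_links(CategoricalIndex)
for name, url in links.items():
    # 'all' and 'any' get CategoricalIndex URLs, although no such pages exist.
    print(name, "->", url)
```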
   
   So, I considered the following options:
   
   1. Generate the list while excluding methods that also exist in the parent class.
    
      - For example, the list for the `CategoricalIndex` class would omit the methods it inherits from `Index` (i.e. methods of the base class).
   
   2. Include all methods, and for inherited ones, link to the parent class's documentation after checking whether a document exists at the parent class's path.
   
       - In my opinion, reliably determining whether the corresponding document exists would be difficult. Option 1 also seems more appropriate because the existing pandas documentation does not document every method of the parent class on the subclass pages either (e.g. https://pandas.pydata.org/docs/reference/api/pandas.CategoricalIndex.categories.html?highlight=category).
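Option 1 could be sketched roughly as follows, again assuming hypothetical stand-in classes rather than the real pandas-on-Spark ones. The helper `own_methods` (a name introduced here for illustration) subtracts every public method visible on any base class in the MRO, which is the literal reading of "excluding methods that exist in the parent class".

```python
import inspect

# Hypothetical stand-ins for the real Index / CategoricalIndex hierarchy.
class Index:
    def all(self): ...
    def any(self): ...

class CategoricalIndex(Index):
    def add_categories(self): ...
    def remove_categories(self): ...

def public_methods(cls):
    """Public callables visible on cls, including inherited ones."""
    return {name for name, _ in inspect.getmembers(cls, callable)
            if not name.startswith("_")}

def own_methods(cls):
    """Option 1: drop every public method that also exists on a base class."""
    inherited = set()
    for base in cls.__mro__[1:]:  # skip cls itself; includes `object`
        inherited |= public_methods(base)
    return sorted(public_methods(cls) - inherited)

print(own_methods(CategoricalIndex))  # ['add_categories', 'remove_categories']
```

One design choice to note: with this set difference, a method that a subclass overrides is also dropped, because its name exists on the base class. If overrides should stay listed, filtering on `name in vars(cls)` (the class's own namespace) would keep them instead.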
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org