Posted to notifications@superset.apache.org by "Rohit-pawar902 (via GitHub)" <gi...@apache.org> on 2023/02/03 13:55:30 UTC

[GitHub] [superset] Rohit-pawar902 opened a new issue, #22981: Redundant column problem comes in csv while scheduling in email.

Rohit-pawar902 opened a new issue, #22981:
URL: https://github.com/apache/superset/issues/22981

   Hello,
   
   I am scheduling a chart (in table form) to be sent by email as a CSV, and I am getting an extra column that numbers the rows starting from 0.
   **Like:**
   
   ![Screenshot from 2023-02-03 19-10-58](https://user-images.githubusercontent.com/72196393/216618803-c82b78a7-995e-4934-a514-6bb0933431f2.png)
   
   **Findings:**
   
   - When I download the CSV directly from the chart, I don't get that extra column.
   - One scenario: in the email case, the extra column is left out of the CSV only if the chart is not in table form.
      To be exact, when the chart is a bar chart or any other visualization, no extra column is added to the CSV sent by email.
      
   **Using:** superset-2.0.0-dev
   
   My questions:
   
   - Is this done intentionally?
   - Or am I forgetting to set something that would avoid this extra column (with numbering)?
   - Or has this been resolved in the latest releases?
       
   
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] Antonio-RiveroMartnez commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Antonio-RiveroMartnez (via GitHub)" <gi...@apache.org>.
Antonio-RiveroMartnez commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1440410419

   Hey @Rohit-pawar902! Thanks for addressing the issue 👏. I do have a couple of thoughts regarding this issue and how to address it, though:
   
   1. There's already a bug reported for the extra column in the CSV: https://github.com/apache/superset/issues/21568, so I'm not sure we want to treat this as an `opt in/out` config. I mean, if it's treated as a bug, you shouldn't be able to `opt out` of the fix.
   2. If we decide to go with the `opt in/out`, should we use a Feature Flag instead, to be consistent with other `True/False` values in the config?




[GitHub] [superset] eschutho commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "eschutho (via GitHub)" <gi...@apache.org>.
eschutho commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1423455729

   Thank you @Rohit-pawar902 for the issue and also the fix! We can implement what you suggested here, but you're also welcome to put up your own PR if you like. 




[GitHub] [superset] Rohit-pawar902 commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Rohit-pawar902 (via GitHub)" <gi...@apache.org>.
Rohit-pawar902 commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1418900752

   ```
   def get_data(self, df: pd.DataFrame) -> Union[str, List[Dict[str, Any]]]:
       if self._query_context.result_format == ChartDataResultFormat.CSV:
           # Write the index only when it is not the default RangeIndex.
           include_index = not isinstance(df.index, pd.RangeIndex)
           columns = list(df.columns)
           # Swap raw column names for their verbose labels where available.
           verbose_map = self._qc_datasource.data.get("verbose_map", {})
           if verbose_map:
               df.columns = [verbose_map.get(column, column) for column in columns]
           result = csv.df_to_escaped_csv(
               df, index=include_index, **config["CSV_EXPORT"]
           )
           return result or ""

       return df.to_dict(orient="records")
   ```
   
   I have tracked the issue down, but I am stuck here: in both cases,
   
   - CSV direct download
   - CSV by email
   
   this function is executed and generates a CSV with no index column, because **include_index** is always false here.
   But somehow, in the CSV-by-email case, the index column is created again by some other method.
   
   I think the index column gets added while the file is attached to the email or written to disk.
   
   But I have found nothing so far. **Any suggestions related to this bug?**
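
To make the `include_index` check above concrete, here is a minimal sketch using plain pandas. The sample frame and its column names are made up for illustration, and this is not Superset code; it only shows how the `RangeIndex` test decides whether `to_csv` writes an index column.

```python
import pandas as pd

# Hypothetical stand-in for a table chart's query result.
df = pd.DataFrame({"name": ["a", "b", "c"], "total": [10, 20, 30]})

# A freshly built frame carries a default RangeIndex, so the check is False
# and no index column is written.
include_index = not isinstance(df.index, pd.RangeIndex)
csv_without_index = df.to_csv(index=include_index)

# A meaningful index (here the "name" column) flips the check to True,
# and the index is written as the first CSV column.
indexed = df.set_index("name")
include_index_named = not isinstance(indexed.index, pd.RangeIndex)
csv_with_index = indexed.to_csv(index=include_index_named)
```

So for a plain query result the check keeps the row numbers out of the CSV, which matches the observation that the direct download has no extra column.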




Re: [I] Redundant column problem comes in csv while scheduling in email. [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1958317745

   @rohitpawar2811 / @Antonio-RiveroMartnez This one seems to have gotten away from us. Is there any work left to be done on the linked PR that would close this issue?




[GitHub] [superset] Rohit-pawar902 commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Rohit-pawar902 (via GitHub)" <gi...@apache.org>.
Rohit-pawar902 commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1420436972

   @eschutho 
   This bug is caused by the post-processing logic that runs when the CSV is generated through email scheduling.
   
   Can we turn off post-processing through a setting, or some other way, without modifying the code?
   @rusackas 
   I have to ask: is this really a bug? Can I make a pull request to remove the index column from the emailed CSV as well?
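
A minimal sketch of how a CSV round trip can reintroduce the row numbers, assuming (as the thread suggests) that post-processing parses the CSV text into a DataFrame and serializes it again with pandas' default `to_csv` options. The sample data is invented and the steps are simplified; this is not the actual Superset code path.

```python
from io import StringIO

import pandas as pd

# CSV text as the first export step would produce it: no index column.
original_csv = "name,total\na,10\nb,20\n"

# Post-processing parses the CSV back into a DataFrame ...
df = pd.read_csv(StringIO(original_csv))

# ... and serializes it again. to_csv writes the index unless index=False,
# so the row numbers show up as an unnamed leading column.
buf = StringIO()
df.to_csv(buf)
round_tripped_csv = buf.getvalue()

# Passing index=False keeps the output identical in shape to the input.
buf = StringIO()
df.to_csv(buf, index=False)
fixed_csv = buf.getvalue()
```

The header of `round_tripped_csv` starts with an empty field followed by the real column names, which looks like the extra column in the screenshot from the original report.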
   
   




[GitHub] [superset] Rohit-pawar902 commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Rohit-pawar902 (via GitHub)" <gi...@apache.org>.
Rohit-pawar902 commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1443370380

   @Antonio-RiveroMartnez 
   Can you tell me whether I have to create a new PR for these changes (with no opt in/out option; we would set index=False directly), or can I change the current PR?




[GitHub] [superset] Rohit-pawar902 commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Rohit-pawar902 (via GitHub)" <gi...@apache.org>.
Rohit-pawar902 commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1443358325

   Yep, I am convinced by your first point: it's a bug, and we don't have to provide an opt-out.
   
   Previously I was not sure whether it was really a bug or written that way intentionally; with that in mind, I offered the opt in/out.




[GitHub] [superset] Rohit-pawar902 commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Rohit-pawar902 (via GitHub)" <gi...@apache.org>.
Rohit-pawar902 commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1439939218

   PR : https://github.com/apache/superset/pull/23155




[GitHub] [superset] Antonio-RiveroMartnez commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Antonio-RiveroMartnez (via GitHub)" <gi...@apache.org>.
Antonio-RiveroMartnez commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1443688718

   Hey @Rohit-pawar902! Yes, you can make your changes in this PR; there's no need for a new one. Just make sure you update whatever feels necessary in the PR template to fit your new scope, i.e., title, description, etc.




[GitHub] [superset] Rohit-pawar902 commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Rohit-pawar902 (via GitHub)" <gi...@apache.org>.
Rohit-pawar902 commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1420219045

   @eschutho 
   Hi, I am working on this Superset bug and need some help.
   Can you tell me where this API endpoint is located in the code?
   http://superset:8088/api/v1/chart/2470/data/?format=csv&type=post_processed&force=true
   Can anyone point me to the file or folder where this API is defined?




[GitHub] [superset] Rohit-pawar902 commented on issue #22981: Redundant column problem comes in csv while scheduling in email.

Posted by "Rohit-pawar902 (via GitHub)" <gi...@apache.org>.
Rohit-pawar902 commented on issue #22981:
URL: https://github.com/apache/superset/issues/22981#issuecomment-1420558994

   Proposed changes: let the user choose whether the CSV sent by email includes the index column, through a new setting in superset_config.py: `CSV_INDEX = {"index": False}`
   
   **Changes made in superset/charts/post_processing.py:**
   ```
   from superset import (
       app
   )
   
   config = app.config
   ```
   ```
   def apply_post_process(
       result: Dict[Any, Any],
       form_data: Optional[Dict[str, Any]] = None,
       datasource: Optional["BaseDatasource"] = None,
   ) -> Dict[Any, Any]:
       form_data = form_data or {}
   
       viz_type = form_data.get("viz_type")
       if viz_type not in post_processors:
           return result
   
       post_processor = post_processors[viz_type]
   
       for query in result["queries"]:
           if query["result_format"] == ChartDataResultFormat.JSON:
               df = pd.DataFrame.from_dict(query["data"])
           elif query["result_format"] == ChartDataResultFormat.CSV:
               df = pd.read_csv(StringIO(query["data"]))
           else:
               raise Exception(f"Result format {query['result_format']} not supported")
   
           processed_df = post_processor(df, form_data, datasource)
   
           query["colnames"] = list(processed_df.columns)
           query["indexnames"] = list(processed_df.index)
           query["coltypes"] = extract_dataframe_dtypes(processed_df, datasource)
           query["rowcount"] = len(processed_df.index)
   
           # Flatten hierarchical columns/index since they are represented as
           # `Tuple[str]`. Otherwise encoding to JSON later will fail because
           # maps cannot have tuples as their keys in JSON.
           processed_df.columns = [
               " ".join(str(name) for name in column).strip()
               if isinstance(column, tuple)
               else column
               for column in processed_df.columns
           ]
           processed_df.index = [
               " ".join(str(name) for name in index).strip()
               if isinstance(index, tuple)
               else index
               for index in processed_df.index
           ]
   
           if query["result_format"] == ChartDataResultFormat.JSON:
               query["data"] = processed_df.to_dict()
           elif query["result_format"] == ChartDataResultFormat.CSV:
               buf = StringIO()
               # Changed part: to_csv options now come from config["CSV_INDEX"]
               processed_df.to_csv(buf, **config["CSV_INDEX"])
               buf.seek(0)
               query["data"] = buf.getvalue()
   
       return result 
   ```
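
The effect of unpacking the proposed setting into `to_csv` can be sketched in isolation. Here a plain dict stands in for `app.config`, `frame_to_csv` is a hypothetical helper, and `CSV_INDEX` is the setting proposed in this comment rather than an existing Superset option.

```python
from io import StringIO
from typing import Any, Dict

import pandas as pd

# Hypothetical stand-in for app.config with the proposed setting.
config: Dict[str, Any] = {"CSV_INDEX": {"index": False}}


def frame_to_csv(df: pd.DataFrame, settings: Dict[str, Any]) -> str:
    """Serialize df, passing the CSV_INDEX options through to DataFrame.to_csv."""
    buf = StringIO()
    # Fall back to pandas' defaults when the key is absent.
    df.to_csv(buf, **settings.get("CSV_INDEX", {}))
    return buf.getvalue()


df = pd.DataFrame({"name": ["a", "b"], "total": [10, 20]})
without_index = frame_to_csv(df, config)  # setting present: no index column
with_index = frame_to_csv(df, {})         # setting absent: pandas default
```

One difference from the snippet above: `config["CSV_INDEX"]` raises a `KeyError` on a dict-like config when the key is not defined, so the sketch uses `.get` with an empty default. Whether a default belongs in the config or in the code is a design choice left open here.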
   

