You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Subhajit Purkayastha <sp...@p3si.net> on 2016/08/23 22:46:29 UTC
DataFrame Data Manipulation - Based on a timestamp column Not Working
Using spark 2.0 & scala 2.11.8, I have a DataFrame with a timestamp column
root
|-- ORG_ID: integer (nullable = true)
|-- HEADER_ID: integer (nullable = true)
|-- ORDER_NUMBER: integer (nullable = true)
|-- LINE_ID: integer (nullable = true)
|-- LINE_NUMBER: integer (nullable = true)
|-- ITEM_TYPE_CODE: string (nullable = true)
|-- ORGANIZATION_ID: integer (nullable = true)
|-- INVENTORY_ITEM_ID: integer (nullable = true)
|-- SCHEDULE_SHIP_DATE: timestamp (nullable = true)
|-- ORDER_QUANTITY_UOM: string (nullable = true)
|-- UNIT_SELLING_PRICE: double (nullable = true)
|-- OPEN_QUANTITY: double (nullable = true)
[204,94468,56721,197328,1,STANDARD,207,149,2004-01-08
23:59:59.0,Ea,1599.0,28.0]
[204,94468,56721,197331,2,STANDARD,207,151,2004-01-08
23:59:59.0,Ea,1899.05,40.0]
[204,94468,56721,197332,3,STANDARD,207,436,2004-01-08
23:59:59.0,Ea,300.0,24.0]
[204,94468,56721,197335,4,STANDARD,207,3751,2004-01-08
23:59:59.0,Ea,380.0,24.0]
I want to manipulate the dataframe data based on a parameter =
demand_time_fence_date
var demand_timefence_end_date_instance = new
MutableDateTime(planning_start_date)
var demand_timefence_days =
demand_timefence_end_date_instance.addDays(demand_time_fence)
val demand_timefence_end_date =
ISODateTimeFormat.yearMonthDay().print(demand_timefence_end_date_instance)
var filter_stmt = "from_unixtime(SCHEDULE_SHIP_DATE,'yyyy-MM-dd') >= "+
demand_timefence_end_date
val sales_order_dataFrame =
sales_order_base_dataFrame.filter(filter_stmt).limit(10)
What is the correct syntax to pass the parameter value?
The above filter statement is not working to restrict the dataset
Thanks,
Subhajit
Re: DataFrame Data Manipulation - Based on a timestamp column Not
Working
Posted by Bedrytski Aliaksandr <sp...@bedryt.ski>.
Hi Subhajit,
you may try to use sql queries instead of helper methods:
> sales_order_base_dataFrame.registerTempTable("sales_orders")
>
> val result = sqlContext.sql("""
> SELECT *
> FROM sales_orders
> WHERE unix_timestamp(SCHEDULE_SHIP_DATE,'_yyyy_-MM-_dd_') >=
> unix_timestamp(demand_timefence_end_date ,'_yyyy_-MM-_dd_')
> """)
This is if demand_timefence_end_date has '_yyyy_-MM-_dd_' date format
Regards,
--
Bedrytski Aliaksandr
spark@bedryt.ski
On Wed, Aug 24, 2016, at 00:46, Subhajit Purkayastha wrote:
> Using spark 2.0 & scala 2.11.8, I have a DataFrame with a
> timestamp column
>
> root
> |-- ORG_ID: integer (nullable = true)
> |-- HEADER_ID: integer (nullable = true)
> |-- ORDER_NUMBER: integer (nullable = true)
> |-- LINE_ID: integer (nullable = true)
> |-- LINE_NUMBER: integer (nullable = true)
> |-- ITEM_TYPE_CODE: string (nullable = true)
> |-- ORGANIZATION_ID: integer (nullable = true)
> |-- INVENTORY_ITEM_ID: integer (nullable = true)
> |-- SCHEDULE_SHIP_DATE: timestamp (nullable = true)
> |-- ORDER_QUANTITY_UOM: string (nullable = true)
> |-- UNIT_SELLING_PRICE: double (nullable = true)
> |-- OPEN_QUANTITY: double (nullable = true)
>
> [204,94468,56721,197328,1,STANDARD,207,149,2004-01-08
> 23:59:59.0,Ea,1599.0,28.0]
> [204,94468,56721,197331,2,STANDARD,207,151,2004-01-08
> 23:59:59.0,Ea,1899.05,40.0]
> [204,94468,56721,197332,3,STANDARD,207,436,2004-01-08
> 23:59:59.0,Ea,300.0,24.0]
> [204,94468,56721,197335,4,STANDARD,207,3751,2004-01-08
> 23:59:59.0,Ea,380.0,24.0]
>
> I want to manipulate the dataframe data based on a parameter =
> demand_time_fence_date
>
> *var* demand_timefence_end_date_instance = *new*
> MutableDateTime(planning_start_date)
> *var* demand_timefence_days =
> demand_timefence_end_date_instance.addDays(demand_time_fence)
> *val* demand_timefence_end_date = ISODateTimeFormat.yearMonthDay().pr-
> int(demand_timefence_end_date_instance)
>
> _var_ filter_stmt = "from_unixtime(SCHEDULE_SHIP_DATE,'_yyyy_-MM-
> _dd_') >= "+ demand_timefence_end_date
>
> *val* sales_order_dataFrame =
> sales_order_base_dataFrame.filter(filter_stmt).limit(10)
>
> What is the correct syntax to pass the parameter value?
>
> The above filter statement is not working to restrict the dataset
>
> Thanks,
>
> Subhajit
>
>