Wednesday, 15 June 2022

Compare array list with Dataframe column

If a DataFrame has an array column and you need to check whether the value of another column of the same DataFrame appears in that array, you can do it with the "expr" function together with array_contains. The code below shows how.


===================================

# The snippets below use the "sf" alias, so import the functions module under that name
from pyspark.sql import functions as sf


df_desc_split=df_trx.withColumn('split_desc',sf.split(sf.col('description'),' '))


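# FIRSTNAME and MIDDLE are columns of the same DataFrame; inside expr() they are resolved per row.
# The parentheses let the chained .withColumn calls span several lines.
df_name_flg = (
    df_desc_split
    .withColumn("first_name_flag", sf.expr("array_contains(split_desc, FIRSTNAME)"))
    .withColumn("middle_name_flag", sf.expr("array_contains(split_desc, MIDDLE)"))
)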


Explanation :

The description field contains a string such as "My name is dheerendra". I split it on spaces and keep the result in the array field "split_desc", for example ['My','name','is','dheerendra'].

Now suppose the same DataFrame has another column, e.g. "name" (FIRSTNAME in the code above), which contains "dheerendra".

|name      |split_desc                     |
|dheerendra|['My','name','is','dheerendra']|

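To check whether 'dheerendra' exists in the split_desc field, use the "expr" function together with array_contains. Inside the SQL expression the second argument is resolved as a column of the same row, so each row's array is compared with that row's own name value: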

df_desc_split.withColumn("first_name_flag", sf.expr("array_contains(split_desc, FIRSTNAME)"))
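To make the per-row behaviour concrete, here is a plain-Python model of the check (not Spark code). The helper name name_in_description and the sample rows are made up for illustration. It assumes the description is split on a single space, as in the code above, and that a null description or name gives a null flag, which is how I understand Spark to treat null inputs here.

```python
from typing import Optional


def name_in_description(description: Optional[str], name: Optional[str]) -> Optional[bool]:
    """Model of array_contains(split(description, ' '), name) for a single row."""
    # Assumption: a null input yields a null flag rather than False.
    if description is None or name is None:
        return None
    tokens = description.split(" ")  # same idea as sf.split(sf.col('description'), ' ')
    return name in tokens            # exact, case-sensitive match on whole tokens


# Hypothetical sample rows: (description, name)
rows = [
    ("My name is dheerendra", "dheerendra"),   # exact token match
    ("My name is Dheerendra", "dheerendra"),   # differs only in case
    ("My name is dheerendra.", "dheerendra"),  # trailing punctuation stays on the token
    (None, "dheerendra"),                      # missing description
]

flags = [name_in_description(description, name) for description, name in rows]
print(flags)  # [True, False, False, None]
```

The point of the sketch is that the match is on whole tokens and is case-sensitive, so it may help to lower-case both columns or strip punctuation before splitting if the flag comes out false more often than expected.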


