Thursday, 31 August 2017

Hive commands

If you want to list down all tables contains same string in same database then use below command.

hive > show tables like '*tablename*';

Similarly for Database

hive > show databases like '*dbname*';

Remove duplicate (redundancy)

Case: We have multiple columns in table but none of the column or combination of columns  in table provide unique row.

Solution: In that case you can use function called  ROW_NUMBER  and also used ROW_NUMBER with OVER (PARTITION BY )


Step 1:  First you have to generate unique number in subquery with ROW_NUMBER() OVER() as unique_number.

Step 2 : Now you have to generate ROW_NUMBER OVER() ( PARTITION BY unique_number , table_column1,table_column2)

Now from step 2 you will start getting unique row.

Example:

select pos4.*,
              if(pos4.Partner_Tier_Indicator_Code='T1','sell-thru','sell-to') as partner_selling_motion_measure_name,
              if(pos4.Partner_Tier_Indicator_Code='T1','sold to','sold to R2R') as address_type_name
        from
                  (SELECT pos3.*,
                             ROW_NUMBER() OVER (PARTITION BY pos3.rowid,pos3.partner_sales_transaction_date order by pos3.partner_sales_transaction_date  desc ) as rownum
                           FROM
                                 (SELECT
                                       pos2.*,
                                       if((ph.reporting_source_partner_level_3_identifier <> NULL or ph.reporting_source_partner_level_3_identifier <> ''),
                                       ph.reporting_source_partner_level_3_identifier,pos2.partner_sold_to_matched_siebel_row_identifier) as HQ_partner2
                                  FROM
                                      (SELECT  pos1.*,ROW_NUMBER()  over() as rowid
                                            FROM
                                                (SELECT
                                                  pos.*,
                                                  if((ph.reporting_source_partner_level_3_identifier <> NULL or ph.reporting_source_partner_level_3_identifier <> ''),
                                                  ph.reporting_source_partner_level_3_identifier,pos.reporting_partner_siebel_row_identifier) as HQ_partner
                                                FROM xx.pos_weekly_temp pos --xx.Fact_Channel_Point_Of_Sale_Weekly
                                                      LEFT OUTER JOIN  xx.dim_partner_hierarchy  ph
                                                      ON pos.reporting_partner_siebel_row_identifier=ph.reporting_partner_identifier
                                                      WHERE pos.Partner_Tier_Indicator_Code='T1'
                                                      AND   pos.region_code='AMER'
                                                )  pos1
                                  ) pos2
                                        LEFT OUTER JOIN  xx.dim_partner_hierarchy  ph
                                             ON pos2.partner_sold_to_matched_siebel_row_identifier=ph.reporting_partner_identifier
                                             AND pos2.region_code = ph.region_code
                     )  pos3
                            LEFT OUTER JOIN  xx.PAS_REPORTING_PARTNER_REFERENCE_temp rpr
                            ON pos3.HQ_partner2 = rpr.reporting_partner_siebel_row_identifier
                            AND pos3.channel_sub_segment_code = rpr.channel_sub_segment_code
                          ) pos4
                             where pos4.rownum = 1;