Monday, 16 April 2018

Extracting and comparing data

To Extract the data in minimum part file use order by clause in Query

Exmaple:

INSERT OVERWRITE LOCAL DIRECTORY '/dataingestion/folder_name/data/ship' row format delimited FIELDS TERMINATED BY '~'
 SELECT
    wkly_agg.col1,
    wkly_agg.col2,
    SUM(wkly_agg.col3) as net_amount
 FROM db_name.POS_AGGR_FLG_TEMP_T1_2 wkly_agg
   WHERE  wkly_agg.aggregate_sales_flag ='Y'
   AND (wkly_agg.flg1 <>'Y' OR wkly_agg.flg2 <> 'N' OR wkly_agg.seller_flag <> 'Y')
  AND wkly_agg.trx_date between '2017-11-01' and '2017-11-30' 
 GROUP BY
  wkly_agg.col1,
  wkly_agg.col2
Order by 1

Output --
On edge node
/dataingestion/folder_name/data/ship/000000_0 

Move this file to windows by winscp or ftp
Now convert the file in .csv format.

Now this file have list of values in  column and paste other column which values you need to compare with this.

Now you need to use Excel function to compare the value.
(1) =MATCH(B3,A:A,0) It return row number of matched value.
(2) =INDEX(A:A,MATCH(B4,A:A,0)) Return the matched values.

A:A column  - Contains the those values which was extracted by Hive query.
B3 column - Contain the value you want to match with list.

Once it done and return correct match drag the formula on the column value which you want to match.

Friday, 13 April 2018

Hive cmds

(1) Getting specific value from String if delimited by some special char like "_ or #"

example: - p_2011

select split(partition_col,'_')[1] from tmp;

Now the value is separated in two array. Now you have to take the position of array and get that value.

In above case I need 2011 that's why I have given [1] becoz array index start from [0].