To Extract the data in minimum part file use order by clause in Query
Exmaple:
INSERT OVERWRITE LOCAL DIRECTORY '/dataingestion/folder_name/data/ship' row format delimited FIELDS TERMINATED BY '~'
SELECT
wkly_agg.col1,
wkly_agg.col2,
SUM(wkly_agg.col3) as net_amount
FROM db_name.POS_AGGR_FLG_TEMP_T1_2 wkly_agg
WHERE wkly_agg.aggregate_sales_flag ='Y'
AND (wkly_agg.flg1 <>'Y' OR wkly_agg.flg2 <> 'N' OR wkly_agg.seller_flag <> 'Y')
AND wkly_agg.trx_date between '2017-11-01' and '2017-11-30'
GROUP BY
wkly_agg.col1,
wkly_agg.col2
Order by 1
Output --
On edge node
/dataingestion/folder_name/data/ship/000000_0
Move this file to windows by winscp or ftp
Now convert the file in .csv format.
Now this file have list of values in column and paste other column which values you need to compare with this.
Now you need to use Excel function to compare the value.
(1) =MATCH(B3,A:A,0) It return row number of matched value.
(2) =INDEX(A:A,MATCH(B4,A:A,0)) Return the matched values.
A:A column - Contains the those values which was extracted by Hive query.
B3 column - Contain the value you want to match with list.
Once it done and return correct match drag the formula on the column value which you want to match.
Exmaple:
INSERT OVERWRITE LOCAL DIRECTORY '/dataingestion/folder_name/data/ship' row format delimited FIELDS TERMINATED BY '~'
SELECT
wkly_agg.col1,
wkly_agg.col2,
SUM(wkly_agg.col3) as net_amount
FROM db_name.POS_AGGR_FLG_TEMP_T1_2 wkly_agg
WHERE wkly_agg.aggregate_sales_flag ='Y'
AND (wkly_agg.flg1 <>'Y' OR wkly_agg.flg2 <> 'N' OR wkly_agg.seller_flag <> 'Y')
AND wkly_agg.trx_date between '2017-11-01' and '2017-11-30'
GROUP BY
wkly_agg.col1,
wkly_agg.col2
Order by 1
Output --
On edge node
/dataingestion/folder_name/data/ship/000000_0
Move this file to windows by winscp or ftp
Now convert the file in .csv format.
Now this file have list of values in column and paste other column which values you need to compare with this.
Now you need to use Excel function to compare the value.
(1) =MATCH(B3,A:A,0) It return row number of matched value.
(2) =INDEX(A:A,MATCH(B4,A:A,0)) Return the matched values.
A:A column - Contains the those values which was extracted by Hive query.
B3 column - Contain the value you want to match with list.
Once it done and return correct match drag the formula on the column value which you want to match.