Monday, 28 May 2018

Update table (Acid)

 Update table in Hive by using ACID properties

Prerequisites :-

(1) Table should be Bucketized.
(2) ORC format
(3) Need to set 2 properties:
     (a) SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
     (b) SET hive.support.concurrency=true;

Step: 1 => Create table using TBLPROPERTIES('transactional'=true');


create table testacid(id int, name string, rollno int) clustered by (id) into 4 buckets
stored as ORC tblproperties('transactional'='true') ;


Step : 2 =>  Insert records into table.

insert into testacid values(1,'dh1',121),(2,'dh2',122);

Step : 3 =>  update testacid set name='dheeren' where name='dh2';








Saturday, 12 May 2018

Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Error :   Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Solution: This issue is related with Metastore which will done by help of database e,g Derby, MySql etc...

First steps 
Verify hive-site.xml with below properties.

 <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
   
<name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    
  <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://localhost:9000/user/hive/warehouse</value>

 <property>
    <name>hive.metastore.uris</name>
    Keep it BLANK

Now try again to start HIVE.If you are still getting the same problem.

Please follow below steps:
(1) Remove metastore_db directory if exists
     rm -r metastore_db

(2) Now initiate Schema by command schematool from hive bin directory 
       schematool -initSchema -dbType derby

(3)     IF Schematool complete successfully the please start hive.
         Now you can able to use database in hive.

Monday, 16 April 2018

Extracting and comparing data

To Extract the data in minimum part file use order by clause in Query

Exmaple:

INSERT OVERWRITE LOCAL DIRECTORY '/dataingestion/folder_name/data/ship' row format delimited FIELDS TERMINATED BY '~'
 SELECT
    wkly_agg.col1,
    wkly_agg.col2,
    SUM(wkly_agg.col3) as net_amount
 FROM db_name.POS_AGGR_FLG_TEMP_T1_2 wkly_agg
   WHERE  wkly_agg.aggregate_sales_flag ='Y'
   AND (wkly_agg.flg1 <>'Y' OR wkly_agg.flg2 <> 'N' OR wkly_agg.seller_flag <> 'Y')
  AND wkly_agg.trx_date between '2017-11-01' and '2017-11-30' 
 GROUP BY
  wkly_agg.col1,
  wkly_agg.col2
Order by 1

Output --
On edge node
/dataingestion/folder_name/data/ship/000000_0 

Move this file to windows by winscp or ftp
Now convert the file in .csv format.

Now this file have list of values in  column and paste other column which values you need to compare with this.

Now you need to use Excel function to compare the value.
(1) =MATCH(B3,A:A,0) It return row number of matched value.
(2) =INDEX(A:A,MATCH(B4,A:A,0)) Return the matched values.

A:A column  - Contains the those values which was extracted by Hive query.
B3 column - Contain the value you want to match with list.

Once it done and return correct match drag the formula on the column value which you want to match.

Friday, 13 April 2018

Hive cmds

(1) Getting specific value from String if delimited by some special char like "_ or #"

example: - p_2011

select split(partition_col,'_')[1] from tmp;

Now the value is separated in two array. Now you have to take the position of array and get that value.

In above case I need 2011 that's why I have given [1] becoz array index start from [0].

Tuesday, 6 March 2018

Alter field delimiter in hive

ALTER TABLE temp SET SERDEPROPERTIES ('field.delim' = '|');

example:

create table  temp(a string,b string,c string) row format delimited fields terminated by ',' stored as textfile;

CREATE TABLE `temp`(
  `a` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdpdevnn/apps/hive/warehouse/gcw.db/temp'


ALTER TABLE temp SET SERDEPROPERTIES ('field.delim' = '|');


CREATE TABLE `temp`(
  `a` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdpdevnn/apps/hive/warehouse/gcw.db/temp'