Thursday, 27 April 2017

ORC format

hive> create external  table test_pos_ext2(
    > trxno string,
    > trx_dt string,
    > reporter_id  string,
    > buyer_id string
    > ) row format delimited fields terminated by ','
    > LOCATION '/gcw/testing/pos_test_ext/test';
OK
Time taken: 0.153 seconds
hive> select * from test_pos_ext2;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://hdpdevnn/gcw/testing/pos_test_ext/test/pos3.txt. Invalid postscript.


Solutiion:--


Add - STORED AS TEXTFILE in table while creating.

hive> create external  table test_pos_ext2(
    > trxno string,
    > trx_dt string,
    > reporter_id  string,
    > buyer_id string
    > ) row format delimited fields terminated by ','
    > STORED AS TEXTFILE
    > LOCATION '/gcw/testing/pos_test_ext/test';

hive> select * from test_pos_ext2;
OK
9106956188      3/7/2017        3-HWJW-516      3-2SS-2763
9106956189      3/7/2017        3-HWJW-516      3-2SS-2763
Time taken: 0.12 seconds, Fetched: 2 row(s)

Sunday, 2 April 2017

Copy Partition table

If you want to copy existing Partition table in Hive from one cluster to another cluster or copy from one database to another database on same cluster.

Suppose you have Table t1 in database testdb and you have load data in partition table from local directory.

create table testdb.t1(a string, b string) row format delimited fields terminated by ',';
load data local inpath '/home/dheerendra/working/Data/data1' overwrite into table testdb.t1;

Insert data in Partition table.

set hive.exec.dynamic.partition.mode=nonstrict;
insert into table testdb.tp2 partition(b) select a,b from testdb.t2;

Copy data from one cluster to another cluster using  Distcp command. I my case cluster is same.

hadoop distcp   /user/hive/warehouse/testdb.db/tp /user/hive/warehouse/db1.db/

When Cluster is different then.

hadoop distcp   hdfs://cluster1/user/hive/warehouse/testdb.db/tp  hdfs://cluster2/user/hive/warehouse/db1.db/


Create same table structure on destination database.

create table db1.tp(a string) partitioned by(b string);


Use MSCK command to add partition with metastore


msck repair table tp;

If you get error like:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
 Take below steps:

set hive.msck.path.validation=ignore;
MSCK REPAIR TABLE table_name;