Tuesday, December 2, 2014

How to load text file into ORC hive table

We can not simply load the text file into an ORC hive table because "load data into" simply copies the files to the hive data file. The file should be ORC file if you want to load it into a ORC hive table.

However currently Hive does not validate the storage format when you run "load data into", which means if you accidentally load a plain text file into a ORC hive table, below error messages will show up:
CREATE TABLE IF NOT EXISTS orctest (
id string,
id2 string,
id3 string,
id4 string
)
STORED AS ORC;

load data local inpath "/opt/tmp/testload2.txt" into table orctest;

hive> select * from orctest limit 1;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem
Time taken: 0.279 seconds

The correct way is to firstly load into a intermediate normal hive table with text format and then insert overwrite into the hive ORC table.
For example:
CREATE TABLE IF NOT EXISTS orctest_text (
id string,
id2 string,
id3 string,
id4 string
)
STORED AS TEXTFILE;

load data local inpath "/opt/tmp/testload2.txt" into table orctest_text;

INSERT OVERWRITE TABLE orctest SELECT * FROM orctest_text;

No comments:

Post a Comment

Popular Posts