Tuesday, December 2, 2014

How to load text file into ORC hive table

We can not simply load the text file into an ORC hive table because "load data into" simply copies the files to the hive data file. The file should be ORC file if you want to load it into a ORC hive table.

However currently Hive does not validate the storage format when you run "load data into", which means if you accidentally load a plain text file into a ORC hive table, below error messages will show up:
CREATE TABLE IF NOT EXISTS orctest (
id string,
id2 string,
id3 string,
id4 string
)
STORED AS ORC;

load data local inpath "/opt/tmp/testload2.txt" into table orctest;

hive> select * from orctest limit 1;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem
Time taken: 0.279 seconds

The correct way is to firstly load into a intermediate normal hive table with text format and then insert overwrite into the hive ORC table.
For example:
CREATE TABLE IF NOT EXISTS orctest_text (
id string,
id2 string,
id3 string,
id4 string
)
STORED AS TEXTFILE;

load data local inpath "/opt/tmp/testload2.txt" into table orctest_text;

INSERT OVERWRITE TABLE orctest SELECT * FROM orctest_text;

2 comments:

  1. Really wonderful blog completely enjoyed reading and learning to gain the vast knowledge. Eventually, this blog helps in developing certain skills which in turn helpful in implementing those skills. Thanking the blogger for delivering such a beautiful content and keep posting the contents in upcoming days.

    data science training institute in bangalore

    ReplyDelete
  2. When your website or blog goes live for the first time, it is exciting. That is until you realize no one but you and your. file storage

    ReplyDelete

Popular Posts