Thursday, March 26, 2015

COPY into PostgreSQL fails with error "invalid byte sequence for encoding "UTF8""

Env:

PostgreSQL or Greenplum

Symptom:

COPY from a file into a table fails with error:
ERROR:  invalid byte sequence for encoding "UTF8": 0x81

Root Cause:

There could be several causes:
1. client_encoding and server_encoding are not set to UTF8.
2. File is using other character set.
3. File contains back-slash "\" , but it not used as escape string.

Solution:

1. Make sure both client_encoding and server_encoding are set to UTF8.
template1=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

template1=# show server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)
2. Convert file to UTF8 character set.
iconv -f original_charset -t utf-8 originalfile > newfile
3. Add "ESCAPE 'OFF'" into COPY SQL if the back-slash is not used as escape string.
copy mytable from stdin ESCAPE 'OFF';

8 comments:

Popular Posts