Load CSV into Hive

Introduction

This post loads the data in a local file, TEMP.csv, into Hive.

Each line of TEMP.csv contains id, name, location, create_date, create_timestamp, longitude, and latitude.

Using OpenCSVSerde

-- Create the table
CREATE EXTERNAL TABLE test_csv_opencsvserde (
    id INT,
    name STRING,
    location STRING,
    create_date STRING,
    create_timestamp STRING,
    longitude DOUBLE,
    latitude DOUBLE
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar"=",",
    "quoteChar"="\"",
    "escapeChar"="\\"
)
STORED AS TEXTFILE
LOCATION 'oss://test-bucket-julian-1/test_csv_serde_1';

-- Load the data
LOAD DATA LOCAL INPATH '/home/TEMP.csv' OVERWRITE INTO TABLE test_csv_opencsvserde;

The intent was for the id column to be INT, the longitude and latitude columns to be DOUBLE, and the remaining columns to be STRING.

However, DESC shows that every column in the table is STRING, which is not what was expected.

>>> DESC test_csv_opencsvserde ;
col_name        data_type       comment
id     string                  from deserializer
name   string                  from deserializer
location        string                  from deserializer
...

The cause is the use of OpenCSVSerde.

This SerDe treats all columns to be of type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output would show string column type. The type information is retrieved from the SerDe. To convert columns to the desired type in a table, you can create a view over the table that does the CAST to the desired type.
Source: CSV Serde

In short, with OpenCSVSerde every column ends up as STRING, even if you declared other data types when creating the table.
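The workaround mentioned in the quoted documentation, a view that casts the columns, might look roughly like the sketch below. The view name test_csv_opencsvserde_typed is a hypothetical name chosen for illustration; the casts follow the column types the table was originally meant to have.

```sql
-- Hypothetical view name; not part of the original setup.
-- The underlying table exposes every column as STRING, so the view
-- casts id, longitude and latitude back to the intended types.
CREATE VIEW IF NOT EXISTS test_csv_opencsvserde_typed AS
SELECT
    CAST(id AS INT)           AS id,
    name,
    location,
    create_date,
    create_timestamp,
    CAST(longitude AS DOUBLE) AS longitude,
    CAST(latitude AS DOUBLE)  AS latitude
FROM test_csv_opencsvserde;

-- Inspect the column types reported for the view.
DESCRIBE test_csv_opencsvserde_typed;
```

Queries would then read from the view instead of the table. Note that in Hive a CAST of a string that cannot be parsed as the target type yields NULL rather than an error, so malformed values surface as NULLs in the view.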

Specifying a delimiter

-- Create the table
CREATE EXTERNAL TABLE test_csv_opencsvserde (
    id INT,
    name STRING,
    location STRING,
    create_date STRING,
    create_timestamp STRING,
    longitude DOUBLE,
    latitude DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load the data
LOAD DATA LOCAL INPATH '/home/TEMP.csv' OVERWRITE INTO TABLE test_csv_opencsvserde;

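With a delimiter specified, Hive loads the CSV file as ordinary text, one record per line with fields split on the delimiter, and the columns keep their declared types, which is the desired result.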
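One way to check the result is sketched below, assuming the table was created with the delimited definition above. It relies on Hive reading a field as NULL when the text cannot be parsed into the declared column type, so the second query is only a rough sanity check for rows that did not parse cleanly (a header line, for example, if the file has one).

```sql
-- Confirm that the declared types were kept (id INT, longitude/latitude DOUBLE).
DESCRIBE test_csv_opencsvserde;

-- Look for rows whose typed fields could not be parsed and came back as NULL.
SELECT id, name, longitude, latitude
FROM test_csv_opencsvserde
WHERE id IS NULL
   OR longitude IS NULL
   OR latitude IS NULL
LIMIT 10;
```

Keep in mind that the delimited format splits on every comma and does not interpret quotes, so this approach fits files whose fields never contain the delimiter. For files with quoted fields, OpenCSVSerde remains the option that parses them.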

