Accessing Hive through HiveServer2

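Hive article series

Basic operations on Hive tables
Collection data types in Hive
Hive dynamic partitioning explained
Importing data into ORC-format tables in Hive
Connecting to Hive from Java via JDBC
Accessing Hive through HiveServer2
Connecting Spring Boot to Hive for self-service data extraction
Mapping Hive to HBase tables
How to use Hive UDFs
Text segmentation in Hive with a UDF
Using the Hive window function row_number
Zipper tables in the data warehouse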

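First, a few terms:

metadata: Hive's metadata, that is, the table names, column names, types, partitions, and users defined in Hive. It is usually stored in a relational database such as MySQL; during testing, Hive's built-in Derby database can be used instead.
metastore: the Hive metastore service. It gives HiveServer2 and other clients access to the metadata. Translating DDL and DML statements into MapReduce jobs is the job of the Hive driver inside HiveServer2 or the CLI, not of the metastore.
hiveserver2: the Hive server. It provides the Hive service; clients can connect to Hive through beeline, JDBC (that is, from Java code), and other means.
beeline: a client tool for connecting to Hive. Think of it as the counterpart of a MySQL client such as Navicat.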

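Other languages access Hive mainly through the hiveserver2 service. HiveServer2 (HS2) is a service that lets clients execute Hive queries. It supports both embedded and remote access, concurrent clients, and authentication, and it is designed to give better support to open API clients such as JDBC and ODBC.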

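Starting hiveserver2 brings up a Hive server that listens on port 10000 by default; clients can connect to it through beeline, JDBC, or ODBC. On startup, hiveserver2 first checks whether hive.metastore.uris is configured. If it is not, hiveserver2 first starts a metastore of its own (embedded in the same process) and then starts itself. If it is, hiveserver2 connects to the remote metastore service. The remote setup is the most common one, and the original post illustrates it with a deployment diagram.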
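The startup check described above can be sketched in a few lines of Python. This is only an illustration of the decision, not how HiveServer2 implements it: the function name `metastore_mode` and the host `metastore-host` are made up for the example, and 9083 is simply the usual metastore port.

```python
import xml.etree.ElementTree as ET


def metastore_mode(hive_site_xml: str) -> str:
    """Return 'remote' if hive.metastore.uris has a non-empty value, else 'embedded'."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name == "hive.metastore.uris" and value:
            return "remote"
    return "embedded"


# Hypothetical hive-site.xml content; the host name is an assumption.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>"""

print(metastore_mode(SAMPLE))  # prints "remote"
```

An empty `<configuration/>` element, or a property with an empty value, yields "embedded" in this sketch.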

Connecting to Hive from Python

Accessing Hive from Python 3 requires the following dependencies:

pip3 install thrift
pip3 install PyHive
pip3 install sasl
pip3 install thrift_sasl

Here is a utility class for accessing Hive from Python:

# -*- coding:utf-8 -*-

from pyhive import hive


class HiveClient(object):
    """Thin wrapper around a PyHive connection to HiveServer2."""

    def __init__(self, host='hadoop-master', port=10000, username='hadoop',
                 password='hadoop', database='hadoop', auth='LDAP'):
        """Create a connection to HiveServer2."""
        self.conn = hive.Connection(host=host,
                                    port=port,
                                    username=username,
                                    password=password,
                                    database=database,
                                    auth=auth)

    def query(self, sql):
        """Execute a query and return all result rows."""
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()

    def insert(self, sql):
        """Execute an insert statement."""
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            # self.conn.commit()
            # self.conn.rollback()

    def close(self):
        """Close the connection."""
        self.conn.close()

To use it, import the class, create an instance, and pass SQL to the query method to run a query.

# Get a connection (assumes HiveClient has been imported from the module above)
hclient = HiveClient()

# Run queries
...

# Close the connection
hclient.close()
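The query method returns plain tuples. If rows keyed by column name are more convenient, a helper along the following lines may be useful. It is a sketch that relies only on the DB-API 2.0 `description` attribute, which PyHive cursors also expose. The helper name `fetch_dicts` is made up for this example, and sqlite3 stands in for Hive so the code runs without a cluster.

```python
import sqlite3
from contextlib import closing


def fetch_dicts(conn, sql):
    """Run sql on a DB-API 2.0 connection and return rows as dicts keyed by column name."""
    with closing(conn.cursor()) as cursor:  # the cursor is closed even if execute fails
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


# sqlite3 is only a stand-in here; with Hive, pass the PyHive connection instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t (id INTEGER, name TEXT)")
demo.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")
rows = fetch_dicts(demo, "SELECT id, name FROM t ORDER BY id")
print(rows)  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
demo.close()
```

Depending on the server configuration, Hive may return column names prefixed with the table name, so the dictionary keys can differ from the bare names shown here.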

Note: in the insert method, self.conn.commit() and self.conn.rollback() are commented out. These are transaction operations of traditional relational databases, and Hive does not support them.

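Connecting to Hive from Java

Java is the foundational language of the big data ecosystem, so its support for connecting to Hive is naturally good. This section covers two approaches: JDBC and MyBatis.

1. JDBC connection

Java connects to hiveserver2 through JDBC in the same way as a traditional JDBC connection to MySQL.

It requires the hive-jdbc dependency: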

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
</dependency>

The code follows the same pattern as connecting to MySQL, using DriverManager.getConnection(url, username, password):

@NoArgsConstructor
@AllArgsConstructor
@Data
@ToString
public class HiveConfigModel {

    private String url = "jdbc:hive2://localhost:10000";
    private String username = "hadoop";
    private String password = "hadoop";

}

@Test
public void test(){
    // Load the configuration
    HiveConfigModel hiveConfigModel = ConfigureContext.getInstance("hive-config.properties")
            .addClass(HiveConfigModel.class)
            .getModelProperties(HiveConfigModel.class);

    String sql = "show tables";
    // try-with-resources closes the result set, statement, and connection
    try (Connection conn = DriverManager.getConnection(hiveConfigModel.getUrl(),
                 hiveConfigModel.getUsername(), hiveConfigModel.getPassword());
         PreparedStatement preparedStatement = conn.prepareStatement(sql);
         ResultSet rs = preparedStatement.executeQuery()) {
        List<String> tables = new ArrayList<>();
        while (rs.next()){
            tables.add(rs.getString(1));
        }

        System.out.println(tables);
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

Under META-INF in hive-jdbc-1.2.1.jar there is a services directory containing a java.sql.Driver file, whose content is:

org.apache.hive.jdbc.HiveDriver

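java.sql.DriverManager uses SPI (Service Provider Interface) to separate the service interface from its implementations and so decouple them. Here the JDBC implementation org.apache.hive.jdbc.HiveDriver implements its logic against the uniform contract defined by java.sql.Driver. Client code that uses JDBC does not need to change; it only needs a different SPI provider on the classpath.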

DriverManager.getConnection(url, username, password)

This is enough to obtain a connection, provided that the concrete implementation follows the SPI conventions.
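To see which driver classes a jar registers through this mechanism, the provider file can be read directly, since a jar is just a zip archive. The following is a small sketch in Python, in keeping with the earlier Python section. The function name `registered_drivers` is made up, and the jar is built in memory for the demonstration rather than being the real hive-jdbc jar.

```python
import io
import zipfile

SERVICE_ENTRY = "META-INF/services/java.sql.Driver"


def registered_drivers(jar_file):
    """Return the driver class names a jar declares for java.sql.Driver."""
    with zipfile.ZipFile(jar_file) as jar:
        if SERVICE_ENTRY not in jar.namelist():
            return []
        text = jar.read(SERVICE_ENTRY).decode("utf-8")
    drivers = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # '#' starts a comment in provider files
        if line:
            drivers.append(line)
    return drivers


# Build a tiny in-memory jar that mimics the layout described above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr(SERVICE_ENTRY, "# registered driver\norg.apache.hive.jdbc.HiveDriver\n")
buffer.seek(0)

found = registered_drivers(buffer)
print(found)  # ['org.apache.hive.jdbc.HiveDriver']
```

Passing the path of an actual driver jar instead of the in-memory buffer should list whatever that jar declares.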

2. Integrating MyBatis

MyBatis is commonly used in the DAO layer to access databases, and accessing Hive works the same way.

Configuration file sqlConfig.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
    <environments default="production">
        <environment id="production">
            <transactionManager type="JDBC"/>
            <dataSource type="POOLED">
                <property name="driver" value="org.apache.hive.jdbc.HiveDriver"/>
                <property name="url" value="jdbc:hive2://master:10000/default"/>
                <property name="username" value="hadoop"/>
                <property name="password" value="hadoop"/>
            </dataSource>
        </environment>
    </environments>
    <mappers>
        <mapper resource="mapper/hive/test/test.xml"/>
    </mappers>
</configuration>

The mapper code is omitted; the implementation is:

public class TestMapperImpl implements TestMapper {

    private static SqlSessionFactory sqlSessionFactory = HiveSqlSessionFactory.getInstance().getSqlSessionFactory();

    @Override
    public int getTestCount(String dateTime) {
        // try-with-resources closes the session even if the query throws
        try (SqlSession sqlSession = sqlSessionFactory.openSession()) {
            TestMapper testMapper = sqlSession.getMapper(TestMapper.class);
            return testMapper.getTestCount(dateTime);
        }
    }
}

Author: 柯广的网络日志

WeChat official account: Java大数据与数据仓库
