thriftpy.protocol.exc.TProtocolException: TProtocolException(type=4)

Problem description

When connecting to HiveServer2 through impyla, the following exception is raised:

Traceback (most recent call last):
File "/Users/admin/projects/impyla_test/connect.py", line 3, in <module>
cursor = conn.cursor()
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/impala/hiveserver2.py", line 88, in cursor
session = self.service.open_session(user, configuration)
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/impala/hiveserver2.py", line 798, in open_session
resp = self._rpc('OpenSession', req)
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/impala/hiveserver2.py", line 724, in _rpc
response = self._execute(func_name, request)
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/impala/hiveserver2.py", line 741, in _execute
return func(request)
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/thriftpy/thrift.py", line 159, in _req
return self._recv(_api)
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/thriftpy/thrift.py", line 171, in _recv
fname, mtype, rseqid = self._iprot.read_message_begin()
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/thriftpy/protocol/binary.py", line 364, in read_message_begin
self.trans, strict=self.strict_read)
File "/Users/admin/analytics/venv/lib/python3.4/site-packages/thriftpy/protocol/binary.py", line 178, in read_message_begin
message='No protocol version header')
thriftpy.protocol.exc.TProtocolException: TProtocolException(type=4)

Analysis

HiveServer2 chooses its authentication mechanism from the hive.server2.authentication property.

According to the HiveServer2 Security Configuration documentation:

HiveServer2 supports authentication of the Thrift client using the following methods:
1. Kerberos authentication
2. LDAP authentication
Starting with CDH 5.7, clusters running LDAP-enabled HiveServer2 deployments also accept Kerberos authentication. This ensures that users are not forced to enter usernames/passwords manually, and are able to take advantage of the multiple authentication schemes SASL offers. In CDH 5.6 and lower, HiveServer2 stops accepting delegation tokens when any alternate authentication is enabled.

Kerberos authentication is supported between the Thrift client and HiveServer2, and between HiveServer2 and secure HDFS. LDAP authentication is supported only between the Thrift client and HiveServer2.

To configure HiveServer2 to use one of these authentication modes, configure the hive.server2.authentication configuration property.

When impyla's connect establishes a connection, its auth_mechanism parameter specifies which authentication method the client uses.

def connect(host='localhost', port=21050, database=None, timeout=None,
            use_ssl=False, ca_cert=None, auth_mechanism='NOSASL', user=None,
            password=None, kerberos_service_name='impala', use_ldap=None,
            ldap_user=None, ldap_password=None, use_kerberos=None,
            protocol=None, krb_host=None):
        """
            ...
                auth_mechanism : {'NOSASL', 'PLAIN', 'GSSAPI', 'LDAP'}
                        Specify the authentication mechanism. `'NOSASL'` for unsecured Impala.
                        `'PLAIN'` for unsecured Hive (because Hive requires the SASL
                        transport). `'GSSAPI'` for Kerberos and `'LDAP'` for Kerberos with
                        LDAP.
            ...
        """
...

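Note that auth_mechanism defaults to 'NOSASL', while the docstring states that Hive requires the SASL transport. A client and server that disagree on this most likely explain the 'No protocol version header' error in the traceback.

In general, the connection problem can be solved by setting hive.server2.authentication to NOSASL on the server and configuring the client for NOSASL as well. In production, however, the HiveServer2 configuration usually cannot be changed, so the problem still has to be solved on the client side. (For personal experiments, the NOSASL approach is a reasonable way to sidestep unnecessary trouble.)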

Solution

  1. Downgrade thrift-sasl.
    The error occurred with thrift-sasl 0.3.0; the package was then downgraded to 0.2.1.

  2. When creating conn, add the parameter auth_mechanism='PLAIN'.
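
The second fix can be sketched as follows. This is a minimal example that assumes HiveServer2 is listening on its default port 10000. The helper names hive_connect_kwargs and open_hive_cursor, the host hive.example.com, and the user name are hypothetical and not part of impyla.

```python
def hive_connect_kwargs(host, port=10000, user=None, password=None):
    """Build keyword arguments for impala.dbapi.connect (hypothetical helper)."""
    kwargs = {
        "host": host,
        # 10000 is HiveServer2's default port; impyla's own default, 21050, is Impala's.
        "port": port,
        # 'PLAIN' selects the SASL transport that unsecured Hive expects.
        "auth_mechanism": "PLAIN",
    }
    if user is not None:
        kwargs["user"] = user
    if password is not None:
        kwargs["password"] = password
    return kwargs


def open_hive_cursor(host, **options):
    """Connect to HiveServer2 and return a cursor (hypothetical helper)."""
    # Imported lazily so hive_connect_kwargs works without impyla installed.
    from impala.dbapi import connect

    conn = connect(**hive_connect_kwargs(host, **options))
    return conn.cursor()


# Example usage (placeholder host and user, replace with real values):
#   cursor = open_hive_cursor("hive.example.com", user="admin")
#   cursor.execute("SHOW DATABASES")
#   print(cursor.fetchall())
```

Keeping the argument building separate from the network call makes it easy to check which auth_mechanism is actually being passed before any connection is attempted.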

Additional notes

  1. If installing sasl on its own with pip install sasl fails, sudo apt-get install libsasl2-dev can resolve it.

  2. SASL stands for Simple Authentication and Security Layer.

  3. impyla (0.14.0) ERROR - 'TSocket' object has no attribute 'isOpen'
    This is caused by a thrift-sasl version that is too new. Switching to version 0.2.1 fixes it: pip install thrift-sasl==0.2.1.

  4. TypeError: can't concat str to bytes (unverified)
    Edit thrift-sasl's __init__.py and add the following statements before line 94:

# On Python 3, encode a str body to bytes so it can be concatenated with bytes.
if isinstance(body, str):
    body = body.encode()
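  5. thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2' (unverified)
    This error is raised when connecting through pyhive on Windows. As discussed above, the corresponding configuration may need to be changed, or sasl may simply not support Windows. Connecting through impyla instead is recommended.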

  6. thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c' (unverified)
    In the thriftpy package, edit line 488 of \parser\parser.py and change if url_scheme == '': to if len(url_scheme) <= 1:.
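
Why the last patch helps can be illustrated with the standard library alone. The sketch below assumes, as the url_scheme check suggests, that thriftpy treats a path as local only when its parsed URL scheme is empty. The function is_local_path is a hypothetical stand-in for that check, not thriftpy's actual code.

```python
from urllib.parse import urlparse


def is_local_path(path):
    """Patched condition: treat a one-letter scheme as a Windows drive letter."""
    return len(urlparse(path).scheme) <= 1


# A Windows path such as C:\... is parsed as if "c" were a URL scheme,
# which matches the "protocol 'c'" wording in the error message.
windows_path = r"C:\thrift\hive.thrift"
print(urlparse(windows_path).scheme)
print(is_local_path(windows_path))
print(is_local_path("/usr/share/hive.thrift"))
print(is_local_path("http://example.com/hive.thrift"))
```

Real URL schemes such as http or file are longer than one character, so the relaxed comparison still rejects them while letting drive letters through.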

Dependencies

thrift-sasl – 0.2.1 – Thrift SASL Python module that implements SASL transports for Thrift (TSaslClientTransport).
thrift – 0.11.0 – Python bindings for the Apache Thrift RPC system.
sasl – 0.2.1 – Cyrus-SASL bindings for Python
thriftpy – 0.3.9 – Pure python implementation of Apache Thrift.
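
To compare an environment against the versions listed above, a small check like the following may help. It is a sketch that assumes Python 3.8 or later for importlib.metadata; the traceback above comes from Python 3.4, where a different lookup such as pkg_resources would be needed. The names PINNED, installed_version, and find_mismatches are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the dependency list above.
PINNED = {
    "thrift-sasl": "0.2.1",
    "thrift": "0.11.0",
    "sasl": "0.2.1",
    "thriftpy": "0.3.9",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in sorted(find_mismatches().items()):
    print("{}: wanted {}, found {}".format(name, wanted, found))
```

A package reported with found None is not installed at all; any other mismatch points at a version that differs from the combination described in this post.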

References

TProtocolException when trying to connect to Hive with Python 3 & impyla
Windows下使用Python3连接Hive (Connecting to Hive with Python 3 on Windows)
