OverflowError: cannot serialize a bytes object larger than 4 GiB

Pickle raises an exception when it serializes data larger than 4 GiB: OverflowError: cannot serialize a bytes object larger than 4 GiB


The answer on Stack Overflow: since Python 3.4 this limit no longer applies, but you have to state the protocol version explicitly.

Not anymore in Python 3.4 which has PEP 3154 and Pickle 4.0
https://www.python.org/dev/peps/pep-3154/

But you need to say you want to use version 4 of the protocol:
https://docs.python.org/3/library/pickle.html

pickle.dump(d, open("file", 'wb'), protocol=4)

Quoted from Eric Levieil

One addition: you can select the highest available protocol directly with pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL).
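As a minimal sketch of the two ways to pick the protocol, the snippet below serializes a small dictionary with protocol 4 and with pickle.HIGHEST_PROTOCOL, then round-trips it through an in-memory buffer. The sample data and variable names are made up for illustration; a real object over 4 GiB would be handled the same way, given enough memory.

```python
import io
import pickle

# Hypothetical sample data; any picklable object works the same way.
data = {"name": "example", "values": list(range(10))}

# Explicitly request protocol 4 (available since Python 3.4).
blob_v4 = pickle.dumps(data, protocol=4)

# Or let pickle choose the newest protocol this interpreter supports.
blob_highest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# pickle.dump needs a binary file-like object; BytesIO stands in for a file here.
buffer = io.BytesIO()
pickle.dump(data, buffer, protocol=4)
buffer.seek(0)
restored = pickle.load(buffer)
```

Note that a file written with a newer protocol cannot be read by a Python version that predates that protocol, so pinning protocol=4 may be the safer choice when files are shared across interpreters.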

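There is also a less elegant workaround: serialize the object to a bytes string first, then write it to (and read it back from) the file in chunks that each stay well under 4 GiB.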

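import os
import pickle


def dump_over_4g(data, file_path, max_bytes=2 ** 31 - 1):
    """Write an object larger than 4 GiB to disk in chunks.

    On macOS, pickle raises an exception for files over 4 GiB.
    The default max_bytes, 2 ** 31 - 1, is just under 2 GiB.
    """
    # Protocol 4 or higher is required to serialize data larger than 4 GiB.
    bytes_out = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
    with open(file_path, 'wb') as w_file:
        # Write the serialized bytes in slices of at most max_bytes.
        for idx in range(0, len(bytes_out), max_bytes):
            w_file.write(bytes_out[idx:idx + max_bytes])


def read_over_4g(file_path, max_bytes=2 ** 31 - 1, mode='rb'):
    """Read an object larger than 4 GiB from disk in chunks of at most max_bytes."""
    bytes_in = bytearray(0)
    input_size = os.path.getsize(file_path)
    with open(file_path, mode) as r_file:
        # Read at most max_bytes per call until the whole file is consumed.
        for _ in range(0, input_size, max_bytes):
            bytes_in += r_file.read(max_bytes)
    data = pickle.loads(bytes_in)
    return data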
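
To check the chunking idea without creating a multi-gigabyte file, the sketch below applies the same slice-by-slice write and read to a small payload with a deliberately tiny chunk size. The helper names write_in_chunks and read_in_chunks, the sample data, and the 1024-byte chunk size are all illustrative assumptions, not part of the functions above.

```python
import os
import pickle
import tempfile


def write_in_chunks(payload, file_path, max_bytes):
    """Write payload to file_path in slices of at most max_bytes bytes."""
    view = memoryview(payload)  # slicing a memoryview avoids copying each chunk
    with open(file_path, "wb") as w_file:
        for idx in range(0, len(view), max_bytes):
            w_file.write(view[idx:idx + max_bytes])


def read_in_chunks(file_path, max_bytes):
    """Read file_path back in pieces of at most max_bytes bytes."""
    buf = bytearray()
    with open(file_path, "rb") as r_file:
        while True:
            chunk = r_file.read(max_bytes)
            if not chunk:  # an empty read means end of file
                break
            buf += chunk
    return bytes(buf)


# Hypothetical sample data, small enough to run anywhere.
data = {"numbers": list(range(1000)), "text": "x" * 5000}
payload = pickle.dumps(data, protocol=4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.pkl")
    write_in_chunks(payload, path, max_bytes=1024)
    restored_bytes = read_in_chunks(path, max_bytes=1024)

restored = pickle.loads(restored_bytes)
```

The file on disk is byte-for-byte the same as what a single write would produce, so it can still be loaded with a plain pickle.load on systems that do not need the workaround.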
