Python Bytecode Reverse Engineering and .pyc File Recovery

1. .pyc File Structure

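A Python 2.7 .pyc file starts with a 4-byte magic number and a 4-byte source modification timestamp, followed by the module's code object serialized with marshal (Python 3 uses a longer header). That code object is CPython's PyCodeObject, whose core fields are declared in CPython 2.x's Include/code.h as follows: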

typedef struct {
    PyObject_HEAD
    int co_argcount;        /* #arguments, except *args */
    int co_nlocals;         /* #local variables */
    int co_stacksize;       /* #entries needed for evaluation stack */
    int co_flags;           /* CO_..., see below */
    PyObject *co_code;      /* instruction opcodes */
    PyObject *co_consts;    /* list (constants used) */
    PyObject *co_names;     /* list of strings (names used) */
    PyObject *co_varnames;  /* tuple of strings (local variable names) */
    PyObject *co_freevars;  /* tuple of strings (free variable names) */
    PyObject *co_cellvars;  /* tuple of strings (cell variable names) */
    PyObject *co_filename;  /* string (where it was loaded from) */
    PyObject *co_name;      /* string (name, for reference) */
    int co_firstlineno;     /* first source line number */
    PyObject *co_lnotab;    /* string (encoding addr<->lineno mapping) */
} PyCodeObject;
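To connect the struct to the bytes on disk, here is a minimal Python 3 sketch that parses the 8-byte Python 2.7 header and the fixed-size start of the marshalled code object. It assumes the Python 2.7 marshal layout, in which the code object begins with the type byte `c` and four little-endian 32-bit integers, and co_code follows as a plain string tagged `s`; the function name `parse_py27_prefix` is illustrative. Under this layout the co_code length sits at offset 0x1a and the bytecode starts at 0x1e, the offsets used in section 5.2.

```python
import struct

# Python 2.7 magic: 62211 as a little-endian 16-bit value, then b"\r\n".
PY27_MAGIC = b"\x03\xf3\r\n"


def parse_py27_prefix(data: bytes) -> dict:
    """Parse the .pyc header and the fixed-size start of a Python 2.7 code object."""
    if len(data) < 0x1E:
        raise ValueError("file too short for a Python 2.7 .pyc prefix")
    magic = data[0:4]
    (mtime,) = struct.unpack_from("<I", data, 4)
    # 0x08: marshal type byte of a code object
    if data[8:9] != b"c":
        raise ValueError("offset 0x08 is not a marshal code object ('c')")
    # 0x09..0x18: co_argcount, co_nlocals, co_stacksize, co_flags (int32 each)
    argcount, nlocals, stacksize, flags = struct.unpack_from("<4i", data, 9)
    # 0x19: type byte of co_code, expected to be a plain string ('s')
    if data[0x19:0x1A] != b"s":
        raise ValueError("offset 0x19 is not a marshal string ('s')")
    # 0x1a..0x1d: little-endian length of co_code; the bytecode starts at 0x1e
    (code_len,) = struct.unpack_from("<i", data, 0x1A)
    return {
        "magic_ok": magic == PY27_MAGIC,
        "mtime": mtime,
        "argcount": argcount,
        "nlocals": nlocals,
        "stacksize": stacksize,
        "flags": flags,
        "co_code_offset": 0x1E,
        "co_code_len": code_len,
    }
```

Feeding it the bytes of a real file, read in binary mode, shows at a glance whether the length field at 0x1a is plausible.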

2. Key Fields

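co_code: the bytecode instruction sequence itself
co_consts: tuple of constants used by the code
co_names: tuple of names the code references (globals and attributes; at module level, also variable and function names)
co_varnames: tuple of local variable names, starting with the arguments
co_filename: name of the source file
co_name: name of the code object (the function name, or `<module>` for module-level code)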
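These fields can be explored on any live code object. A small Python 3 sketch (the function `greet` and the global `GREETING` are made-up examples); note that at module level, assigned names end up in co_names rather than co_varnames, which is why module-level code uses LOAD_NAME/STORE_NAME from the table in section 4.

```python
GREETING = "hello, "


def greet(name):
    message = GREETING + name
    return message.upper()


code = greet.__code__
print(code.co_name)        # greet
print(code.co_argcount)    # 1
print(code.co_varnames)    # ('name', 'message')
print(code.co_names)       # ('GREETING', 'upper')
print(code.co_consts)      # contents vary between Python versions
print(len(code.co_code))   # size of the raw bytecode in bytes

# At module level, assigned names go to co_names, not co_varnames.
module = compile("x = 1\n", "<demo>", "exec")
print(module.co_name)      # <module>
print(module.co_names)     # ('x',)
print(module.co_varnames)  # ()
```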

3. Bytecode Reverse Engineering in Practice

3.1 Loading a .pyc File

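import dis, marshal

# Python 2.7 .pyc: 4-byte magic + 4-byte mtime, then a marshalled code object.
# Run under the same Python version that produced the file.
with open('third.pyc', 'rb') as f:   # binary mode
    magic = f.read(4)                 # magic number
    mtime = f.read(4)                 # modification timestamp
    code = marshal.load(f)            # load the code object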
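The snippet above targets a Python 2.7 file and must run under Python 2.7, because the marshal format is version-specific. For files written by Python 3.7 or later, the header grows to 16 bytes (magic, a flags word, then either mtime and source size or a source hash, per PEP 552). A minimal Python 3 sketch under that assumption; `load_pyc_code` is an illustrative name, and the demo compiles a throwaway source file with py_compile so it has something to load.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile


def load_pyc_code(path):
    """Return the code object stored in a .pyc written by *this* interpreter."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != importlib.util.MAGIC_NUMBER:
            raise ValueError("magic %r does not match this interpreter" % magic)
        f.read(12)  # flags, mtime/hash, source size (16-byte header, Python 3.7+)
        return marshal.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")
        with open(src, "w") as f:
            f.write("answer = 6 * 7\n")
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"))
        code = load_pyc_code(pyc)
        print(code.co_names)  # ('answer',)
        namespace = {}
        exec(code, namespace)
        print(namespace["answer"])  # 42
```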

3.2 Inspecting the Key Fields

code.co_consts    # constants
code.co_varnames  # local variable names
code.co_names     # names
code.co_code      # raw bytecode
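Functions are not stored separately in the .pyc: each one is a nested code object inside its parent's co_consts, so a function such as encode in section 6 is reached by walking co_consts recursively. A Python 3 sketch (the helper name `iter_code_objects` and the sample source are illustrative):

```python
import types


def iter_code_objects(code):
    """Yield `code` and every code object nested in its co_consts, depth-first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


SOURCE = """
def encode(s):
    def helper(c):
        return c
    return helper(s)
"""

module = compile(SOURCE, "<demo>", "exec")
for co in iter_code_objects(module):
    print(co.co_name, co.co_varnames)
# <module> ()
# encode ('s', 'helper')
# helper ('c',)
```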

3.3 Disassembling with dis

dis.dis(code.co_code)  # disassemble the raw bytecode string
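Passing the raw co_code string shows operands mostly as numeric indices, while passing the code object itself lets dis resolve them against co_consts and co_names. Either way, dis only understands the bytecode format of the interpreter running it, so a Python 2.7 .pyc has to be disassembled under Python 2.7. A Python 3 sketch of the resolved, scriptable view via dis.get_instructions (the compiled source is a made-up example):

```python
import dis

code = compile("greeting = 'hello'\nprint(greeting)\n", "<demo>", "exec")

# Human-readable listing with operands resolved to constants and names.
dis.dis(code)

# The same instructions as Instruction tuples, convenient for scripting.
for ins in dis.get_instructions(code):
    print(ins.offset, ins.opname, repr(ins.argval))
```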

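4. Common Bytecode Instructions

In Python 2.x bytecode, an opcode of 90 (HAVE_ARGUMENT) or above is followed by a 2-byte little-endian argument; lower opcodes occupy a single byte.

| Instruction | Opcode (hex) | Description | Example |
|---|---|---|---|
| LOAD_CONST | 0x64 | Push a constant | `640100` (push co_consts[1]) |
| JUMP_ABSOLUTE | 0x71 | Jump to an absolute offset | `710e00` (jump to offset 14) |
| LOAD_NAME | 0x65 | Push the value bound to a name | `650100` (load co_names[1]) |
| STORE_NAME | 0x5a | Pop the top of stack and bind it to a name | `5a0100` (store into co_names[1]) |
| BINARY_ADD | 0x17 | Pop two values, push their sum | - |
| CALL_FUNCTION | 0x83 | Call a function | `830100` (1 positional, 0 keyword arguments) |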
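Because every Python 2 instruction is either one byte or three bytes, raw co_code can be walked by hand, which helps when checking hex dumps like the examples in the table or processing Python 2 bytecode from a Python 3 script. A minimal Python 3 sketch of such a decoder; it knows only the opcodes from the table, reports anything else as `UNKNOWN_<n>`, and ignores EXTENDED_ARG. A real tool would take its opcode map from the `opcode` module of a Python 2.7 interpreter.

```python
HAVE_ARGUMENT = 90  # Python 2.x: opcodes >= 90 carry a 2-byte little-endian argument

# Only the opcodes listed in the table above.
OPNAMES = {
    0x17: "BINARY_ADD",
    0x5A: "STORE_NAME",
    0x64: "LOAD_CONST",
    0x65: "LOAD_NAME",
    0x71: "JUMP_ABSOLUTE",
    0x83: "CALL_FUNCTION",
}


def decode_py2(co_code: bytes):
    """Return (offset, opname, arg) tuples for Python 2.x bytecode."""
    out = []
    i = 0
    while i < len(co_code):
        op = co_code[i]
        name = OPNAMES.get(op, "UNKNOWN_%d" % op)
        if op >= HAVE_ARGUMENT:
            if i + 3 > len(co_code):
                raise ValueError("truncated argument at offset %d" % i)
            arg = co_code[i + 1] | (co_code[i + 2] << 8)  # little-endian
            out.append((i, name, arg))
            i += 3
        else:
            out.append((i, name, None))
            i += 1
    return out


for offset, name, arg in decode_py2(bytes.fromhex("640100710e00")):
    print(offset, name, arg)
# 0 LOAD_CONST 1
# 3 JUMP_ABSOLUTE 14
```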

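5. Repairing .pyc Files

5.1 Common Obfuscation Techniques

Inserting junk jump instructions
Tampering with the bytecode length field
Inserting garbage bytes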
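One form of the first technique places a JUMP_ABSOLUTE at the very start of co_code that jumps over junk bytes: the interpreter never executes the junk, but it can confuse decompilers that decode every byte in sequence. The Python 3 sketch below is a heuristic for spotting that pattern; the name `leading_jump_junk` is illustrative. For reference, a `710e00` (JUMP_ABSOLUTE 14) at the start of co_code would land at co_code offset 14, i.e. file offset 0x1e + 14 = 0x2c, which is consistent with the range removed in section 5.2.

```python
JUMP_ABSOLUTE = 0x71  # Python 2.x opcode number


def leading_jump_junk(co_code: bytes):
    """Heuristic: if co_code begins with JUMP_ABSOLUTE <target>, return target.

    Bytes in [3, target) are skipped by normal execution (unless some other
    jump targets them), so they are candidates for injected junk.
    Returns None when the pattern is absent.
    """
    if len(co_code) >= 3 and co_code[0] == JUMP_ABSOLUTE:
        target = co_code[1] | (co_code[2] << 8)
        if 3 < target <= len(co_code):
            return target
    return None


sample = bytes.fromhex("710e00") + b"\xff" * 11 + bytes.fromhex("640000")
print(leading_jump_junk(sample))  # 14
```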

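5.2 Repair Steps

Inspect the file layout with hexdump.
Locate and remove the junk bytes (in this sample, file offsets 0x1e up to 0x2c, the first 14 bytes of co_code).
Update the co_code length field at offset 0x1a (little-endian) to the new length.
Save the patched file.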

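# Python 2: read the raw bytes (binary mode)
with open('third.pyc', 'rb') as f:
    dt = f.read()

# Patch the file:
#   0x1a        low byte of the little-endian co_code length -> 0x24
#   [0x1e,0x2c) 14 junk bytes at the start of co_code, dropped
dt = dt[:0x1a] + '\x24' + dt[0x1b:0x1e] + dt[0x2c:]

with open('third_test2.pyc', 'wb') as f:
    f.write(dt)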
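The one-liner above overwrites only the low byte of the length field, which is enough when the new length is below 256 and the upper three bytes are already correct. A slightly more general Python 3 sketch recomputes the length and writes all four little-endian bytes with struct; the function name is illustrative, and the hard-coded offsets assume the Python 2.7 layout, with the co_code length at 0x1a and the bytecode starting at 0x1e, as in the patch above.

```python
import struct

# Python 2.7 layout, matching the offsets used in the patch above.
CO_CODE_LEN_OFFSET = 0x1A  # little-endian int32: length of co_code
CO_CODE_START = 0x1E       # first byte of co_code


def strip_co_code_bytes(data: bytes, start: int, end: int) -> bytes:
    """Delete data[start:end] from inside co_code and rewrite the length field."""
    (old_len,) = struct.unpack_from("<i", data, CO_CODE_LEN_OFFSET)
    if not (CO_CODE_START <= start <= end <= CO_CODE_START + old_len):
        raise ValueError("range [%#x, %#x) is not inside co_code" % (start, end))
    patched = bytearray(data[:start] + data[end:])
    struct.pack_into("<i", patched, CO_CODE_LEN_OFFSET, old_len - (end - start))
    return bytes(patched)
```

Applied to the sample's bytes, `strip_co_code_bytes(data, 0x1E, 0x2C)` should give the same result as the one-liner provided the original length was 0x32 (0x24 plus the 14 removed bytes). Neither version adjusts jump targets elsewhere in co_code.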

5.3 Decompiling with uncompyle2

uncompyle2 third_test2.pyc > third_source.py

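6. Case Study

6.1 Recovered Source Code

The decompiled program (Python 2) prints a banner, reads a flag, reverses it, shifts each character (net -1 at even indices, +1 at odd ones), encodes the result with a custom Base64 variant, and compares it with dec: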

import string

letters = list(string.letters) + list(string.digits)
dec = 'FcjTCgD1EffEm2rPC3bTyL5Wu2bKBI9KAZrwFgrUygHN'

def encode(input_str):
    # Custom Base64 encoding (alphabet: letters + digits)
    str_ascii_list = ['{:0>8}'.format(str(bin(ord(i))).replace('0b', '')) for i in input_str]
    output_str = ''
    equal_num = 0
    while str_ascii_list:
        temp_list = str_ascii_list[:3]
        if len(temp_list) != 3:
            while len(temp_list) < 3:
                equal_num += 1
                temp_list += ['00000000']
        temp_str = ''.join(temp_list)
        temp_str_list = [temp_str[x:x + 6] for x in [0, 6, 12, 18]]
        temp_str_list = [int(x, 2) for x in temp_str_list]
        if equal_num:
            temp_str_list = temp_str_list[0:4 - equal_num]
        output_str += ''.join([letters[x] for x in temp_str_list])
        str_ascii_list = str_ascii_list[3:]
    output_str = output_str + '=' * equal_num
    return output_str

# Main program
print "Welcome to Processor's Python Classroom Part 3&4!\n"
print 'qi shi wo jiu shi lan cai ba liang dao ti fang zai yi qi.'
print "Now let's start the origin of Python!\n"
print 'Plz Input Your Flag:\n'

enc = raw_input()
lst = list(enc)
lst.reverse()
llen = len(lst)

for i in range(llen):
    if i % 2 == 0:
        lst[i] = chr(ord(lst[i]) - 2)
    lst[i] = chr(ord(lst[i]) + 1)

enc2 = ''
enc2 = enc2.join(lst)
enc3 = encode(enc2)

if enc3 == dec:
    print "You're right! "
else:
    print "You're Wrong!"
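Reading encode closely: it performs standard Base64 bit-slicing (three bytes into four 6-bit indices, with '=' padding), but its alphabet is letters + digits, i.e. a-z, A-Z, 0-9 in Python 2's default locale, instead of A-Z, a-z, 0-9, +, /. For indices 0 to 61 this is standard Base64 with letter case swapped, which the decryption script relies on; indices 62 and 63 have no symbol, so the original raises IndexError for such inputs. A Python 3 port (bytes in, str out) that checks the equivalence:

```python
import base64
import string

# Python 2's string.letters in the default C locale is lowercase + uppercase.
LETTERS = string.ascii_lowercase + string.ascii_uppercase + string.digits


def encode(data: bytes) -> str:
    """Python 3 port of the recovered encode(): Base64 with a 62-symbol alphabet."""
    out = ""
    pad = 0
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        pad = 3 - len(chunk)  # only the last chunk can be short
        bits = "".join(format(b, "08b") for b in chunk) + "00000000" * pad
        indices = [int(bits[x:x + 6], 2) for x in (0, 6, 12, 18)]
        if pad:
            indices = indices[:4 - pad]
        out += "".join(LETTERS[i] for i in indices)  # IndexError for 62 or 63
    return out + "=" * pad


def encode_via_stdlib(data: bytes) -> str:
    """Same result for inputs whose 6-bit groups all stay below 62."""
    return base64.b64encode(data).decode("ascii").swapcase()


for sample in (b"M", b"Ma", b"Man", b"hello"):
    print(sample, encode(sample), encode(sample) == encode_via_stdlib(sample))
```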

6.2 Decryption Script

The script undoes each step of the checker in reverse order:

#!/usr/bin/python
# -*- coding: utf-8 -*-

def decode(input_str):
    # Map the custom alphabet back to standard Base64 (swap letter case)
    output_str = ''
    for i in input_str:
        if ord(i)>57 and ord(i)<91:
            output_str += i.lower()
        elif ord(i)>91:
            output_str += i.upper()
        else:
            output_str += i
    
    # Standard Base64 decoding (Python 2 only)
    lst = list(output_str.decode('base64'))
    llen = len(lst)
    
    # Undo the per-character shifts
    for i in range(llen):
        lst[i] = chr(ord(lst[i]) - 1)
        if i % 2 == 0:
            lst[i] = chr(ord(lst[i]) + 2)
    
    # Undo the reversal
    lst.reverse()
    return ''.join(lst)

if __name__ == '__main__':
    dec = 'FcjTCgD1EffEm2rPC3bTyL5Wu2bKBI9KAZrwFgrUygHN'
    print decode(dec)
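The same solver as a Python 3 sketch: str.swapcase replaces the character-range checks (they perform the same mapping on Base64 text), base64.b64decode replaces the Python 2-only str.decode('base64'), and the shifts are applied to a bytearray. The helper `forward` is an illustrative re-implementation of the checker's transform, added only for testing; it is not part of the original program.

```python
import base64

DEC = "FcjTCgD1EffEm2rPC3bTyL5Wu2bKBI9KAZrwFgrUygHN"


def forward(flag: bytes) -> str:
    """Re-implementation of the checker: reverse, shift (-1 even / +1 odd), encode."""
    data = bytearray(reversed(flag))
    for i in range(len(data)):
        data[i] = (data[i] + (-1 if i % 2 == 0 else 1)) % 256
    return base64.b64encode(bytes(data)).decode("ascii").swapcase()


def solve(ciphertext: str) -> bytes:
    """Invert forward(): swap case, Base64-decode, undo the shifts, reverse."""
    data = bytearray(base64.b64decode(ciphertext.swapcase()))
    for i in range(len(data)):
        data[i] = (data[i] + (1 if i % 2 == 0 else -1)) % 256
    data.reverse()
    return bytes(data)


if __name__ == "__main__":
    print(solve(DEC))
```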

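7. Key Takeaways

.pyc structure: understanding PyCodeObject is the foundation of bytecode reversing
Bytecode instructions: know the opcodes and encodings of common instructions
Deobfuscation: recognize and repair tampered bytecode
Standard library: use the dis and marshal modules fluently
Custom encodings: reverse custom encryption and encoding algorithms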

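8. Tools

dis: the standard-library disassembler
marshal: the standard-library serialization module used for code objects
uncompyle2: a Python 2 bytecode decompiler
hexdump: a hex viewer

With these techniques you can analyze compiled Python bytecode and either recover the original source or understand the program's logic.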
