Analyzing and Modifying ELF Object Files (1)

Let's examine a relocatable .o file assembled in the elf64 format.

Generating main.o with nasm

First, write an assembly program in NASM syntax that does not depend on glibc: main.asm

section .text
global _start

add:
    push rbx
    mov rbp, rsp

    add rdi, rsi    ;first argument + second argument
    mov rax, rdi    ;store the sum in rax

    pop rbx
    ret

_start:
    push rbx
    mov rbp, rsp

    mov rdi, 10
    mov rsi, 12
    
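    ;note: after the single push above, rsp is 8 bytes off 16-byte alignment; harmless here because add keeps no aligned data on the stack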
    call add

    ;write(1, fmt, 9)
    mov rax, 1      ;syscall number (write)
    mov rdi, 1      ;stdout
    mov rsi, fmt    ;address of the string
    mov rdx, 9      ;length
    syscall

    pop rbx
    
    ;exit(0)
    mov rax, 60      ;syscall number (exit)
    mov rdi, 0      ;exit status
    syscall

section .data
    fmt db  "10+12=22",10
    fmt1 db  "hello world",10
    age dd 26

section .bss
    num resb 4      ;like `int num;` in C: an uninitialized global variable

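If you are not familiar with assembly, you can think of it as roughly the following .c file. The C version is not used as the example because it has to be linked against glibc to run, which would complicate the analysis.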

#include <stdio.h>
char *fmt = "10+12=22\n";
char *fmt1 = "hello world\n";
int age = 26;
int num;
int add(int a, int b) {
    return a+b;
}
int main(){
    add(10,12);
    printf("%s", fmt);
    return 0;
}

Next, assemble main.asm into main.o

nasm -felf64 main.asm

Now we can use readelf and od to see how the contents of main.o are laid out

[root@izbp1irxwqt7ei21awv6wvz reloc]# readelf -h main.o
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          64 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         8
  Section header string table index: 4
[root@izbp1irxwqt7ei21awv6wvz reloc]# 
[root@izbp1irxwqt7ei21awv6wvz reloc]# readelf -S main.o
There are 8 section headers, starting at offset 0x40:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000240
       0000000000000047  0000000000000000  AX       0     0     16
  [ 2] .data             PROGBITS         0000000000000000  00000290
       0000000000000019  0000000000000000  WA       0     0     4
  [ 3] .bss              NOBITS           0000000000000000  000002b0
       0000000000000004  0000000000000000  WA       0     0     4
  [ 4] .shstrtab         STRTAB           0000000000000000  000002b0
       0000000000000037  0000000000000000           0     0     1
  [ 5] .symtab           SYMTAB           0000000000000000  000002f0
       0000000000000108  0000000000000018           6    10     4
  [ 6] .strtab           STRTAB           0000000000000000  00000400
       0000000000000026  0000000000000000           0     0     1
  [ 7] .rela.text        RELA             0000000000000000  00000430
       0000000000000018  0000000000000018           5     1     4
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
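
The fields printed by readelf -h are simply a decoding of the first 64 bytes of the file. As a rough illustration, the Python sketch below unpacks those bytes with the standard struct module, following the Elf64_Ehdr layout. The bytes are hard-coded from the start of main.o; the helper name parse_ehdr and the field tuple are choices made for this example, not part of any tool.

```python
import struct

# First 64 bytes of main.o (the ELF header), hard-coded for illustration.
EHDR_HEX = """
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
00 00 00 00 40 00 00 00 00 00 40 00 08 00 04 00
"""

# Elf64_Ehdr, little endian: e_ident[16] followed by the fixed-width fields.
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
EHDR_FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_ehdr(raw):
    """Unpack a 64-byte ELF64 header into a dict (hypothetical helper)."""
    values = struct.unpack(EHDR_FORMAT, raw[:64])
    return dict(zip(EHDR_FIELDS, values))


ehdr = parse_ehdr(bytes.fromhex("".join(EHDR_HEX.split())))

print("magic ok:", ehdr["e_ident"][:4] == b"\x7fELF")
print("e_type:", ehdr["e_type"])              # 1 means ET_REL (relocatable)
print("e_shoff:", hex(ehdr["e_shoff"]))       # start of section headers
print("e_shentsize:", ehdr["e_shentsize"])    # size of one section header
print("e_shnum:", ehdr["e_shnum"])            # number of section headers
print("e_shstrndx:", ehdr["e_shstrndx"])      # index of .shstrtab
```

The values line up with the readelf output: section headers start at byte 64, there are 8 of them, each 64 bytes long, and the section name string table is section 4.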

To view the complete binary contents of main.o, use od

[root@izbp1irxwqt7ei21awv6wvz reloc]# od -Ax -tx1z main.o
000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  >.ELF............<
000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00  >..>.............<
000020 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  >........@.......<
000030 00 00 00 00 40 00 00 00 00 00 40 00 08 00 04 00  >....@.....@.....<
000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
*
000080 01 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00  >................<
000090 00 00 00 00 00 00 00 00 40 02 00 00 00 00 00 00  >........@.......<
0000a0 47 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >G...............<
0000b0 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000c0 07 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00  >................<
0000d0 00 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00  >................<
0000e0 19 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000f0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000100 0d 00 00 00 08 00 00 00 03 00 00 00 00 00 00 00  >................<
000110 00 00 00 00 00 00 00 00 b0 02 00 00 00 00 00 00  >................<
000120 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
*
000140 12 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  >................<
000150 00 00 00 00 00 00 00 00 b0 02 00 00 00 00 00 00  >................<
000160 37 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >7...............<
000170 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000180 1c 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  >................<
000190 00 00 00 00 00 00 00 00 f0 02 00 00 00 00 00 00  >................<
0001a0 08 01 00 00 00 00 00 00 06 00 00 00 0a 00 00 00  >................<
0001b0 04 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  >................<
0001c0 24 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00  >$...............<
0001d0 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00  >................<
0001e0 26 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >&...............<
0001f0 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000200 2c 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  >,...............<
000210 00 00 00 00 00 00 00 00 30 04 00 00 00 00 00 00  >........0.......<
000220 18 00 00 00 00 00 00 00 05 00 00 00 01 00 00 00  >................<
000230 04 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  >................<
000240 53 48 89 e5 48 01 f7 48 89 f8 5b c3 53 48 89 e5  >SH..H..H..[.SH..<
000250 bf 0a 00 00 00 be 0c 00 00 00 e8 e1 ff ff ff b8  >................<
000260 01 00 00 00 bf 01 00 00 00 48 be 00 00 00 00 00  >.........H......<
000270 00 00 00 ba 09 00 00 00 0f 05 5b b8 3c 00 00 00  >..........[.<...<
000280 bf 00 00 00 00 0f 05 00 00 00 00 00 00 00 00 00  >................<
000290 31 30 2b 31 32 3d 32 32 0a 68 65 6c 6c 6f 20 77  >10+12=22.hello w<
0002a0 6f 72 6c 64 0a 1a 00 00 00 00 00 00 00 00 00 00  >orld............<
0002b0 00 2e 74 65 78 74 00 2e 64 61 74 61 00 2e 62 73  >..text..data..bs<
0002c0 73 00 2e 73 68 73 74 72 74 61 62 00 2e 73 79 6d  >s..shstrtab..sym<
0002d0 74 61 62 00 2e 73 74 72 74 61 62 00 2e 72 65 6c  >tab..strtab..rel<
0002e0 61 2e 74 65 78 74 00 00 00 00 00 00 00 00 00 00  >a.text..........<
0002f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000300 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff  >................<
000310 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000320 00 00 00 00 03 00 01 00 00 00 00 00 00 00 00 00  >................<
000330 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00  >................<
000340 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000350 00 00 00 00 03 00 03 00 00 00 00 00 00 00 00 00  >................<
000360 00 00 00 00 00 00 00 00 0a 00 00 00 00 00 01 00  >................<
000370 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000380 15 00 00 00 00 00 02 00 00 00 00 00 00 00 00 00  >................<
000390 00 00 00 00 00 00 00 00 19 00 00 00 00 00 02 00  >................<
0003a0 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0003b0 1e 00 00 00 00 00 02 00 15 00 00 00 00 00 00 00  >................<
0003c0 00 00 00 00 00 00 00 00 22 00 00 00 00 00 03 00  >........".......<
0003d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0003e0 0e 00 00 00 10 00 01 00 0c 00 00 00 00 00 00 00  >................<
0003f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000400 00 6d 61 69 6e 2e 61 73 6d 00 61 64 64 00 5f 73  >.main.asm.add._s<
000410 74 61 72 74 00 66 6d 74 00 66 6d 74 31 00 61 67  >tart.fmt.fmt1.ag<
000420 65 00 6e 75 6d 00 00 00 00 00 00 00 00 00 00 00  >e.num...........<
000430 2b 00 00 00 00 00 00 00 01 00 00 00 03 00 00 00  >+...............<
000440 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
000450
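
Each row that readelf -S prints corresponds to one 64-byte Elf64_Shdr record in this dump. As a sketch, the Python below decodes the record for section 1, which sits at 0x40 + 1 * 64 = 0x80, and resolves its name through the .shstrtab bytes found at 0x2b0. Both byte strings are copied by hand from the dump above; parse_shdr and c_string are hypothetical helper names used only here.

```python
import struct

# Section header #1: the 64 bytes at file offset 0x80 in the dump.
SHDR1_HEX = """
01 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 40 02 00 00 00 00 00 00
47 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

# Contents of .shstrtab (0x37 bytes at file offset 0x2b0 in the dump).
SHSTRTAB = b"\0.text\0.data\0.bss\0.shstrtab\0.symtab\0.strtab\0.rela.text\0"

# Elf64_Shdr, little endian.
SHDR_FORMAT = "<IIQQQQIIQQ"
SHDR_FIELDS = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)


def parse_shdr(raw):
    """Unpack one 64-byte section header into a dict."""
    return dict(zip(SHDR_FIELDS, struct.unpack(SHDR_FORMAT, raw[:64])))


def c_string(table, offset):
    """Read the NUL-terminated string that starts at `offset` in `table`."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode("ascii")


shdr = parse_shdr(bytes.fromhex("".join(SHDR1_HEX.split())))
name = c_string(SHSTRTAB, shdr["sh_name"])

print(name, hex(shdr["sh_offset"]), hex(shdr["sh_size"]), shdr["sh_addralign"])
```

sh_name is not the name itself but an offset into .shstrtab, which is why the ELF header has to record which section holds that string table.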

Below is my summary after going through the contents of main.o

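0x00 - 0x40
The ELF header, 64 bytes long

0x40 - 0x240
The 8 section headers, 64 bytes each (a .o file produced by gcc usually has the section contents first and the section headers at the end)

0x240 - 0x450
The actual section contents start here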

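.text       0x240 - 0x287 contents of the .text section, i.e. the code
            0x287 - 0x290 zero padding up to a 16-byte boundary

.data       0x290 - 0x2a9 contents of the .data section, i.e. the initialized global variables; the length is 25, exactly the total size of the 3 variables (9 + 12 + 4)
            0x2a9 - 0x2b0 zero padding up to a 16-byte boundary

.bss        0x2b0 - 0x2b0 .bss has a section header but no contents in the file, because it only needs to be zero-initialized at run time

.shstrtab   0x2b0 - 0x2e7 section name strings
            0x2e7 - 0x2f0 zero padding

.symtab     0x2f0 - 0x3f8 symbol table
            0x3f8 - 0x400 zero padding

.strtab     0x400 - 0x426 symbol name string table
            0x426 - 0x430 zero padding

.rela.text  0x430 - 0x448 relocation entries for .text
            0x448 - 0x450 zero padding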
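
The ranges in the summary above follow mechanically from the offsets and sizes reported by readelf -S. A small Python sketch of that arithmetic is shown below; the section list is typed in by hand from the readelf output, and layout is a hypothetical helper written for this example.

```python
# (name, file offset, bytes occupied in the file), from readelf -S main.o.
SECTIONS = [
    (".text", 0x240, 0x47),
    (".data", 0x290, 0x19),
    (".bss", 0x2B0, 0x0),  # NOBITS: 4 bytes in memory, 0 bytes in the file
    (".shstrtab", 0x2B0, 0x37),
    (".symtab", 0x2F0, 0x108),
    (".strtab", 0x400, 0x26),
    (".rela.text", 0x430, 0x18),
]
FILE_SIZE = 0x450  # last address printed by od


def layout(sections, file_size):
    """Return (name, start, end, padding after the section) for each section."""
    rows = []
    for i, (name, offset, size) in enumerate(sections):
        end = offset + size
        if i + 1 < len(sections):
            next_start = sections[i + 1][1]
        else:
            next_start = file_size
        rows.append((name, offset, end, next_start - end))
    return rows


rows = layout(SECTIONS, FILE_SIZE)
for name, start, end, padding in rows:
    print(f"{name:<11} {start:#x} - {end:#x}  padding: {padding} bytes")
```

The padding column is the number of zero bytes between the end of one section and the start of the next, which is what the "zero padding" rows in the summary describe.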

0x2f0 - 0x3f8 holds the symbol table, which we can print out

[root@izbp1irxwqt7ei21awv6wvz reloc]# readelf -s main.o

Symbol table '.symtab' contains 11 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.asm
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     5: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 add
     6: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    2 fmt
     7: 0000000000000009     0 NOTYPE  LOCAL  DEFAULT    2 fmt1
     8: 0000000000000015     0 NOTYPE  LOCAL  DEFAULT    2 age
     9: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    3 num
    10: 000000000000000c     0 NOTYPE  GLOBAL DEFAULT    1 _start
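
Every line of this listing is one 24-byte Elf64_Sym record, and the Name column is looked up in .strtab. As an illustration, the Python sketch below decodes two of those records by hand: entry 10 (at 0x2f0 + 10 * 24 = 0x3e0) and entry 7 (at 0x2f0 + 7 * 24 = 0x398). The bytes are copied from the od dump of main.o; decode_sym is a hypothetical helper name.

```python
import struct

# Contents of .strtab (0x26 bytes at file offset 0x400).
STRTAB = b"\0main.asm\0add\0_start\0fmt\0fmt1\0age\0num\0"

# Two 24-byte symbol records copied from the dump.
SYM10_HEX = "0e 00 00 00 10 00 01 00 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
SYM7_HEX = "19 00 00 00 00 00 02 00 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"

# Elf64_Sym, little endian: st_name, st_info, st_other, st_shndx, st_value, st_size.
SYM_FORMAT = "<IBBHQQ"


def decode_sym(hex_text, strtab):
    """Decode one symbol record and resolve its name through the string table."""
    raw = bytes.fromhex("".join(hex_text.split()))
    st_name, st_info, _st_other, st_shndx, st_value, st_size = struct.unpack(
        SYM_FORMAT, raw
    )
    end = strtab.index(b"\0", st_name)
    return {
        "name": strtab[st_name:end].decode("ascii"),
        "bind": st_info >> 4,    # 0 = LOCAL, 1 = GLOBAL
        "type": st_info & 0xF,   # 0 = NOTYPE
        "shndx": st_shndx,       # index of the section the symbol lives in
        "value": st_value,       # offset inside that section (in a .o file)
        "size": st_size,
    }


start_sym = decode_sym(SYM10_HEX, STRTAB)
fmt1_sym = decode_sym(SYM7_HEX, STRTAB)
print(start_sym)
print(fmt1_sym)
```

In a relocatable file the Value column is an offset within the section named by Ndx, so _start is at offset 0xc of section 1 (.text) and fmt1 at offset 9 of section 2 (.data).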


Generating obj.o with gcc

Next, write an obj.c file and compile it with gcc -c obj.c

int g_a_2;
int g_b_2 = 2;
void foo(){}

Compilation produces obj.o. Analyzing it with the same tools gives

ELF header      0x00 - 0x40

.text           0x40 - 0x50
.data           0x50 - 0x54
.bss            0x54 - 0x54
.comment        0x54 - 0x82
.note.GNU-stack 0x82 - 0x82

8-byte alignment applies here, so the next section starts at 0x88

.eh_frame       0x88 - 0xc0
.shstrtab       0xc0 - 0x114

8-byte alignment applies here, so the next section starts at 0x118

.symtab         0x118 - 0x220
.strtab         0x220 - 0x237

8-byte alignment applies here, so the next section starts at 0x238

.rela.eh_frame  0x238 - 0x250

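According to the ELF header, the section header table is at offset 592, which is exactly 0x250, so the section headers start here.
There are 11 section headers, 64 bytes each

11 section headers  0x250 - 0x510

This shows that in a .o file produced by gcc, the section headers are placed at the very end.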
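
The jumps to 0x88, 0x118 and 0x238 in the obj.o layout, and the end of the section header table, are plain alignment arithmetic. A minimal Python sketch, assuming power-of-two alignments (align_up is a hypothetical helper, not something gcc or readelf exposes):

```python
def align_up(offset, alignment):
    """Round `offset` up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


# End offsets of the sections that are followed by padding in obj.o.
for end in (0x82, 0x114, 0x237):
    print(f"{end:#x} -> {align_up(end, 8):#x}")

# The section header table: 11 headers of 64 bytes starting at e_shoff.
SHOFF = 0x250
SHNUM = 11
SHENTSIZE = 64
table_end = SHOFF + SHNUM * SHENTSIZE
print(f"section headers: {SHOFF:#x} - {table_end:#x}")
```

An offset that is already aligned is returned unchanged, which is why .rela.eh_frame ending at 0x250 is followed directly by the section headers.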


Linking into an executable with ld

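Link the two .o files into the executable main (in the command below the gcc-built object appears as obj2.o, presumably the same object as the obj.o analyzed above)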

[root@izbp1irxwqt7ei21awv6wvz reloc]# ld main.o obj2.o -o main
[root@izbp1irxwqt7ei21awv6wvz reloc]# ./main 
10+12=22

It runs successfully.


Comparing the executable main with the .o files

Inspecting main with the same tools gives the following layout

ELF header      0x00 - 0x40
3 program headers
    PT_LOAD         0x40 - 0x78
    PT_LOAD         0x78 - 0xb0
    GNU_STACK       0xb0 - 0xe8

16-byte alignment: 8 zero bytes of padding up to 0xf0

.text           0xf0 - 0x13d, padded to 0x140
.eh_frame       0x140 - 0x178, padded to 0x180

The file is then padded to 0x1000 (4096), exactly one page

.data           0x1000 - 0x1020
.bss            0x1020 - 0x1020
.comment        0x1020 - 0x104d
.shstrtab       0x104d - 0x108c, padded to 0x1090

.symtab         0x1090 - 0x1270
.strtab         0x1270 - 0x12c5, padded to 0x12c8

9 ELF section headers, 64 bytes each
0x12c8 - 0x1508

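As covered in the earlier article on program headers, a program header describes one segment of the binary. Let's print the segments.

readelf -l main gives the following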

[root@izbp1irxwqt7ei21awv6wvz reloc]# readelf -l main

Elf file type is EXEC (Executable file)
Entry point 0x4000fc
There are 3 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000178 0x0000000000000178  R E    200000
  LOAD           0x0000000000001000 0x0000000000601000 0x0000000000601000
                 0x0000000000000020 0x0000000000000028  RW     200000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    10

 Section to Segment mapping:
  Segment Sections...
   00     .text .eh_frame 
   01     .data .bss 
   02

To summarize

3 program headers, starting at offset 64

type: LOAD       offset: 0x0     filesz: 0x178  vaddr: 0x400000  memsz: 0x178  align: 0x200000  flags: R E
type: LOAD       offset: 0x1000  filesz: 0x20   vaddr: 0x601000  memsz: 0x28   align: 0x200000  flags: RW
type: GNU_STACK  offset: 0x0     filesz: 0x0    vaddr: 0x0       memsz: 0x0    align: 0x10      flags: RWE

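This describes how the program main is loaded into memory at run time. LOAD means the segment is mapped into memory when the program runs.

There are three segments in total. The .text and .eh_frame sections make up the code segment (text segment), and the .data and .bss sections make up the data segment. The third, GNU_STACK, contains no sections; it only records the permissions of the stack.

At run time:

The first LOAD (the code segment) means: load file bytes 0x0 - 0x178 (per the layout above, that is the ELF header, the 3 program headers, .text and .eh_frame) at virtual addresses 0x400000 - 0x400178.

The second LOAD (the data segment) means: load file bytes 0x1000 - 0x1020 (the .data section; .bss occupies 0 bytes in the file) at virtual addresses 0x601000 - 0x601020. The segment takes 0x28 bytes in memory, and the extra 8 bytes, which are .bss, are simply zero-filled.

The data segment starts at 0x601000 because of the 0x200000 alignment: the address moves from the 0x4xxxxx range up to 0x600000, and the file offset 0x1000 is then added so that the virtual address and the file offset are congruent modulo the alignment, which lets the file be mapped directly.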
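
The mapping described by the two LOAD entries can be written down as a few lines of arithmetic. The Python sketch below uses the values from the readelf -l output; the Segment tuple and the three helper functions are hypothetical names chosen for this example, and the real work is of course done by the kernel's loader.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "offset vaddr filesz memsz align")

# The two LOAD program headers of main, from readelf -l.
LOAD_SEGMENTS = [
    Segment(offset=0x0, vaddr=0x400000, filesz=0x178, memsz=0x178, align=0x200000),
    Segment(offset=0x1000, vaddr=0x601000, filesz=0x20, memsz=0x28, align=0x200000),
]


def file_offset_to_vaddr(file_offset):
    """Virtual address a file byte is loaded at, or None if it is not loaded."""
    for seg in LOAD_SEGMENTS:
        if seg.offset <= file_offset < seg.offset + seg.filesz:
            return seg.vaddr + (file_offset - seg.offset)
    return None


def zero_filled_bytes(seg):
    """Bytes that exist only in memory (here: .bss) and are set to zero."""
    return seg.memsz - seg.filesz


def is_congruent(seg):
    """Check that vaddr and file offset agree modulo the alignment."""
    return seg.vaddr % seg.align == seg.offset % seg.align


print(hex(file_offset_to_vaddr(0xFC)))     # file offset of _start in .text
print(hex(file_offset_to_vaddr(0x1000)))   # first byte of .data
print(zero_filled_bytes(LOAD_SEGMENTS[1]))
print(all(is_congruent(seg) for seg in LOAD_SEGMENTS))
```

File offset 0xfc is where _start lands in the linked file (0xf0 plus its offset 0xc in main.o's .text), and it maps to the entry point 0x4000fc that readelf reports.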


Relocation

First, let's disassemble main.o

[root@izbp1irxwqt7ei21awv6wvz 01]# objdump -d main.o
main.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <add>:
   0: 53                    push   %rbx
   1: 48 89 e5              mov    %rsp,%rbp
   4: 48 01 f7              add    %rsi,%rdi
   7: 48 89 f8              mov    %rdi,%rax
   a: 5b                    pop    %rbx
   b: c3                    retq   

000000000000000c <_start>:
   c: 53                    push   %rbx
   d: 48 89 e5              mov    %rsp,%rbp
  10: bf 0a 00 00 00        mov    $0xa,%edi
  15: be 0c 00 00 00        mov    $0xc,%esi
  1a: e8 e1 ff ff ff        callq  0 <add>
  1f: b8 01 00 00 00        mov    $0x1,%eax
  24: bf 01 00 00 00        mov    $0x1,%edi
  29: 48 be 00 00 00 00 00  movabs $0x0,%rsi
  30: 00 00 00 
  33: ba 09 00 00 00        mov    $0x9,%edx
  38: 0f 05                 syscall 
  3a: 5b                    pop    %rbx
  3b: b8 3c 00 00 00        mov    $0x3c,%eax
  40: bf 00 00 00 00        mov    $0x0,%edi
  45: 0f 05                 syscall

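The instruction at 0x29 is movabs $0x0,%rsi. From the source we know this instruction loads the address of fmt into %rsi. For now the operand is filled with the placeholder 00 00 00 00 00 00 00 00, to be patched in later at link time.

So how does the linker know the address of fmt? Remember that we have a relocation section.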

[root@izbp1irxwqt7ei21awv6wvz 01]# readelf -S main.o
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
...
  [ 7] .rela.text        RELA             0000000000000000  00000430
       0000000000000018  0000000000000018           5     1     4

Now let's print the contents of .rela.text

[root@izbp1irxwqt7ei21awv6wvz 01]# readelf -r main.o

Relocation section '.rela.text' at offset 0x430 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000002b  000300000001 R_X86_64_64       0000000000000000 .data + 0

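This says that offset 0x2b in the .text section needs to be relocated: that is the instruction at 0x29 above plus 2 bytes, which skips the 48 be opcode bytes and lands on the 8-byte immediate.

Now that we know where to patch, what value goes there? .data + 0 means the load address of main.o's .data section plus 0 (fmt is the first item in .data). After the link above, the data segment is loaded at 0x601000, so the linker only has to fill in 0x601000.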

[root@izbp1irxwqt7ei21awv6wvz 01]# objdump -d main
Disassembly of section .text:

00000000004000f0 <add>:
...
00000000004000fc <_start>:
  4000fc: 53                    push   %rbx
  4000fd: 48 89 e5              mov    %rsp,%rbp
  400100: bf 0a 00 00 00        mov    $0xa,%edi
  400105: be 0c 00 00 00        mov    $0xc,%esi
  40010a: e8 e1 ff ff ff        callq  4000f0 <add>
  40010f: b8 01 00 00 00        mov    $0x1,%eax
  400114: bf 01 00 00 00        mov    $0x1,%edi
  400119: 48 be 00 10 60 00 00  movabs $0x601000,%rsi
  400120: 00 00 00 
...

0000000000400137 <foo>:
  400137: 55                    push   %rbp
  400138: 48 89 e5              mov    %rsp,%rbp
  40013b: 5d                    pop    %rbp
  40013c: c3                    retq
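
The R_X86_64_64 fix-up described above can be replayed by hand. The Python sketch below takes the .text bytes and the single .rela.text record of main.o (both copied from the od dump), decodes the record, and writes S + A into the code. It is only an illustration of the arithmetic, not how ld is implemented; apply_abs64 is a hypothetical helper, and 0x601000 is the .data load address observed in this particular link.

```python
import struct

# .text of main.o: 0x47 bytes at file offset 0x240.
TEXT_HEX = """
53 48 89 e5 48 01 f7 48 89 f8 5b c3 53 48 89 e5
bf 0a 00 00 00 be 0c 00 00 00 e8 e1 ff ff ff b8
01 00 00 00 bf 01 00 00 00 48 be 00 00 00 00 00
00 00 00 ba 09 00 00 00 0f 05 5b b8 3c 00 00 00
bf 00 00 00 00 0f 05
"""

# The only Elf64_Rela record in .rela.text: 24 bytes at file offset 0x430.
RELA_HEX = "2b 00 00 00 00 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00"

R_X86_64_64 = 1
DATA_LOAD_ADDR = 0x601000  # assumption: address of main.o's .data in this link


def apply_abs64(text, rela_raw, symbol_addr):
    """Patch `text` in place for one R_X86_64_64 record: value = S + A."""
    r_offset, r_info, r_addend = struct.unpack("<QQq", rela_raw)
    sym_index = r_info >> 32          # high 32 bits: symbol table index
    rel_type = r_info & 0xFFFFFFFF    # low 32 bits: relocation type
    if rel_type != R_X86_64_64:
        raise ValueError(f"unexpected relocation type {rel_type}")
    struct.pack_into("<Q", text, r_offset, symbol_addr + r_addend)
    return sym_index, r_offset


text = bytearray.fromhex("".join(TEXT_HEX.split()))
rela = bytes.fromhex("".join(RELA_HEX.split()))
sym_index, patched_at = apply_abs64(text, rela, DATA_LOAD_ADDR)

print("symbol index:", sym_index, "patched at:", hex(patched_at))
print("movabs bytes:", bytes(text[0x29:0x33]).hex(" "))
```

Symbol index 3 is the SECTION symbol for .data in the symbol table, and the patched movabs bytes are the same ones objdump shows for the linked main.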

Now suppose we add an instruction to main.asm that calls the external function foo

[root@izbp1irxwqt7ei21awv6wvz tmp]# objdump -d main.o

main.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <add>:
   0: 53                    push   %rbx
...

000000000000000c <_start>:
   c: 53                    push   %rbx
   d: 48 89 e5              mov    %rsp,%rbp
  10: bf 0a 00 00 00        mov    $0xa,%edi
  15: be 0c 00 00 00        mov    $0xc,%esi
  1a: e8 e1 ff ff ff        callq  0 <add>
  1f: e8 00 00 00 00        callq  24 <_start+0x18>
...
  4a: 0f 05                 syscall

The operand is again the placeholder 00 00 00 00. Let's look at the relocation entries.

[root@izbp1irxwqt7ei21awv6wvz tmp]# readelf -r main.o

Relocation section '.rela.text' at offset 0x440 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000a00000002 R_X86_64_PC32     0000000000000000 foo - 4
000000000030  000300000001 R_X86_64_64       0000000000000000 .data + 0

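There is now one more entry. Its offset says that the location to relocate for foo is offset 0x20 in the code, which is the 00 00 00 00 shown above.

addend = -4 means that, after linking, the value to write into the 00 00 00 00 slot is the distance from that slot to the actual address of foo, minus 4.

Let's look at the result after linking.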

[root@izbp1irxwqt7ei21awv6wvz tmp]# objdump -d main
main:     file format elf64-x86-64

Disassembly of section .text:

00000000004000f0 <add>:
  ...

00000000004000fc <_start>:
  ...
  40010f: e8 28 00 00 00        callq  40013c <foo>
  400114: b8 01 00 00 00        mov    $0x1,%eax
  400119: bf 01 00 00 00        mov    $0x1,%edi
  40011e: 48 be 00 10 60 00 00  movabs $0x601000,%rsi
  400125: 00 00 00 
  400128: ba 09 00 00 00        mov    $0x9,%edx
  40012d: 0f 05                 syscall 
  40012f: 5b                    pop    %rbx
  400130: b8 3c 00 00 00        mov    $0x3c,%eax
  400135: bf 00 00 00 00        mov    $0x0,%edi
  40013a: 0f 05                 syscall 

000000000040013c <foo>:
  40013c: 55                    push   %rbp
  40013d: 48 89 e5              mov    %rsp,%rbp
  400140: 5d                    pop    %rbp
  400141: c3                    retq

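Here the actual address of foo is 0x40013c and the location to be patched is 0x400110. Using the formula above, the value to fill in = 0x40013c - 0x400110 + Addend = 0x2c - 0x4 = 0x28

In other words, foo is 0x28 bytes past the next instruction: the next instruction is at 0x400114, and 0x400114 + 0x28 = 0x40013c. The -4 addend is there because the CPU measures the displacement from the end of the 4-byte operand, not from its start.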
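
The same calculation can be checked in a few lines of Python. This is a sketch of the R_X86_64_PC32 rule (value = S + A - P) using the addresses from the disassembly above; pc32_value and call_target are hypothetical helper names.

```python
import struct


def pc32_value(symbol_addr, addend, place):
    """R_X86_64_PC32: S + A - P, which must fit in a signed 32-bit field."""
    value = symbol_addr + addend - place
    if not -2**31 <= value < 2**31:
        raise OverflowError("displacement does not fit in 32 bits")
    return value


def call_target(place, rel32_bytes):
    """Address a call jumps to, given where its 4-byte operand is stored."""
    (disp,) = struct.unpack("<i", rel32_bytes)
    next_instruction = place + 4  # the operand is the last part of the call
    return next_instruction + disp


FOO_ADDR = 0x40013C   # S: final address of foo
PLACE = 0x400110      # P: address of the 4-byte operand being patched
ADDEND = -4           # A: from the relocation entry "foo - 4"

value = pc32_value(FOO_ADDR, ADDEND, PLACE)
encoded = struct.pack("<i", value)

print(hex(value), encoded.hex(" "))
print(hex(call_target(PLACE, encoded)))

# The call to add needs no relocation: its operand e1 ff ff ff is already
# a finished displacement, stored at 0x40010b in the linked file.
print(hex(call_target(0x40010B, bytes.fromhex("e1ffffff"))))
```

The encoded bytes are the 28 00 00 00 that follow the e8 opcode at 0x40010f, and decoding them again leads back to foo.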

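This is PC-relative addressing, the basis of position-independent code: the jump target is computed as an offset from the current instruction, so the call works wherever the code is loaded. Note that the movabs above, patched through R_X86_64_64, embeds the absolute address 0x601000, so this program as a whole is not position-independent.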



Author email: 203328517@qq.com