Reversing 7.62 High Calibre AZP and localization files

Introduction
Going through my Steam backlog, I saw 7.62 High Calibre and decided to give it another try. Looking around, I noticed that the game has a very active modding community, but that the available modding tools are closed source. I wanted to create an open source tool for unpacking and packing the AZP files used by 7.62 High Calibre, and I ended up accomplishing that and more. The results of this work can be found here: https://github.com/sbobovyc/GameTools/tree/master/762_HighCalibre

Reconnaissance
As a first step, I searched the game executable for known signatures with signsrch. The results showed that zlib is likely statically linked:

$ /c/Users/sbobovyc/Tools/signsrch/signsrch.exe /c/Steam/steamapps/common/7.62/E6.exe

Signsrch 0.2.4
by Luigi Auriemma
e-mail: aluigi@autistici.org
web: aluigi.org
optimized search function by Andrew http://www.team5150.com/~andrew/
disassembler engine by Oleh Yuschuk

- done
- open file "C:/Steam/steamapps/common/7.62/E6.exe"
- 3547136 bytes allocated
- load signatures
- open file C:\Users\sbobovyc\Tools\signsrch\signsrch.sig
- 3075 signatures in the database
- start 4 threads
- start signatures scanning:

offset num description [bits.endian.size]
--------------------------------------------
00017aee 3048 DMC compression [32.le.16&]
002be682 1299 classical random incrementer 0x343FD 0x269EC3 [32.le.8&]
00314e90 2289 zinflate_lengthStarts [16.le.58]
00314f10 2296 zinflate_distanceStarts [16.le.60]
00314f90 641 CRC-32-IEEE 802.3 [crc32.0x04c11db7 le rev int_min.1024]
00314f90 648 CRC-32-IEEE 802.3 [crc32.0xedb88320 lenorev 1.1024]
00315390 129 Adler CRC32 (0x191b3141) [32.le.1024]
00315790 131 Adler CRC32 (0x01c26a37) [32.le.1024]
00315b90 133 Adler CRC32 (0xb8bc6765) [32.le.1024]
00315f90 645 CRC-32-IEEE 802.3 [crc32.0x04c11db7 be rev int_min.1024]
00315f90 652 CRC-32-IEEE 802.3 [crc32.0xedb88320 benorev 1.1024]
00316390 130 Adler CRC32 (0x191b3141) [32.be.1024]
00316790 132 Adler CRC32 (0x01c26a37) [32.be.1024]
00316b90 134 Adler CRC32 (0xb8bc6765) [32.be.1024]
00316f90 2294 zinflate_lengthExtraBits [32.le.116]
00317005 2304 zinflate_distanceExtraBits [32.be.120]
00317008 2303 zinflate_distanceExtraBits [32.le.120]
003175d8 1086 Zlib dist_code [..512]
003177d8 1087 Zlib length_code [..256]
003178d8 1089 Zlib base_length [32.le.116]
00317950 1091 Zlib base_dist [32.le.120]
0031f460 639 CRC-32-IEEE 802.3 [crc32.0x04c11db7 lenorev int_min.1024]
0031f460 650 CRC-32-IEEE 802.3 [crc32.0xedb88320 le rev 1.1024]
0031f868 3038 unlzx table_three [32.le.64]
0031f868 1605 Generic bitmask table [32.le.128]
0031f86c 2588 bitmask [32.le.128]
0031f8dc 3051 compression algorithm seen in the game DreamKiller [32.be.12&]
0031f8df 3050 compression algorithm seen in the game DreamKiller [32.le.12&]
0031fb28 2876 libavcodec ff_mjpeg_val_ac_chrominance [..162]
0031fbf8 2875 libavcodec ff_mjpeg_val_ac_luminance [..162]
00321298 1115 Jpeg dct 14 bit aanscales [16.le.128]
00321348 1119 Jpeg dct AA&N scale factor [double.le.64]
00326753 2417 MBC2 [32.le.248&]
00336030 1933 Vorbis FLOOR1_fromdB_LOOKUP [float.le.1024]
$ strings /c/Steam/steamapps/common/7.62/E6.exe | grep flate
deflate 1.2.3 Copyright 1995-2005 Jean-loup Gailly
inflate 1.2.3 Copyright 1995-2005 Mark Adler

It’s very common for games to store data deflated with zlib and decompress it with inflate() to save disk space. Deflated (compressed) data looks like a random array of bytes, and that is exactly what the AZP archives look like.
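One quick way to check whether a blob "looks random" is to measure its byte entropy. A minimal sketch, assuming Shannon entropy in bits per byte is a good enough test: plain text usually lands well below 8, while deflated or enciphered data sits close to 8. The name `shannon_entropy` and the 4096-byte sample size in the usage note are my own choices, not anything taken from the game or its tools.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte (0.0 for empty or constant data, at most 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum -p*log2(p) over the observed byte frequencies.
    return -sum((count / n) * math.log2(count / n) for count in Counter(data).values())
```

For example, `shannon_entropy(open("HWM.azp", "rb").read(4096))` should come out close to 8 if the body of the archive really is compressed or enciphered; a text file will typically score much lower.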

Here are the first few bytes of HWM.azp:
[Image: hex dump of the first bytes of HWM.azp]

The first three bytes of the file are the ASCII string “AZP”. Interestingly, file and directory names do not appear in plain text anywhere in the AZP files. Either the table of contents is stored somewhere else, or it is obfuscated.

To start following the data access pattern, I put a breakpoint on CreateFileA. The first hit that opened an AZP file occurred here:

006CB5AB | push eax                
006CB5AC | push dword ptr ss:[ebp-8]
006CB5AF | push dword ptr ss:[ebp-10]
006CB5B2 | push dword ptr ss:[ebp+10]         // [ebp+10]:"HWM.azp"
006CB5B5 | call dword ptr ds:[<&CreateFileA>]

I chose to use HWM.azp as the candidate for reverse engineering. This file is read by ReadFile at this instruction:

006C6D2B | push edx
006C6D2C | push dword ptr ds:[eax+esi]
006C6D2F | call dword ptr ds:[<&ReadFile>]

The stack just before the call to ReadFile (its five arguments):
1: [esp] 00000104       // file handle
2: [esp+4] 032D1530  // pointer to destination buffer
3: [esp+8] 00001000  // number of bytes to read
4: [esp+C] 0018F060  // pointer that receives the number of bytes read
5: [esp+10] 00000000 // lpOverlapped (NULL)

After ReadFile returned, I put a hardware access breakpoint on address 0x32D1530, which holds the ‘A’ in the buffer. The breakpoint was triggered by this instruction:

006C6C97 | movzx eax,byte ptr ds:[ecx]  // ecx is 032D1530
006C6C9A | inc ecx
006C6C9B | mov dword ptr ds:[esi],ecx   // esi is a pointer to some data structure
006C6C9D | pop esi
006C6C9E | ret

After tracing through the code, I found that this routine copies the first four bytes, “AZP” plus 0x01, into another buffer:

006C4D84 | mov al,byte ptr ds:[esi]
006C4D86 | mov byte ptr ds:[edi],al
006C4D88 | mov al,byte ptr ds:[esi+1]
006C4D8B | mov byte ptr ds:[edi+1],al
006C4D8E | mov al,byte ptr ds:[esi+2]
006C4D91 | mov byte ptr ds:[edi+2],al
006C4D94 | mov eax,dword ptr ss:[ebp+8]
006C4D97 | pop esi
006C4D98 | pop edi
006C4D99 | leave
006C4D9A | ret

I then put a hardware access breakpoint on the next two bytes in the buffer, 0x42 0xED. This breakpoint was triggered here:

006C4D0C | mov dword ptr ds:[edi+ecx*4-18],eax
006C4D10 | mov eax,dword ptr ds:[esi+ecx*4-14]
006C4D14 | mov dword ptr ds:[edi+ecx*4-14],eax
006C4D18 | mov eax,dword ptr ds:[esi+ecx*4-10]
006C4D1C | mov dword ptr ds:[edi+ecx*4-10],eax | edi+ecx*4-10:",Aq"
006C4D20 | mov eax,dword ptr ds:[esi+ecx*4-C] 
006C4D24 | mov dword ptr ds:[edi+ecx*4-C],eax  | [edi+ecx*4-C]:"HWM.azp"
006C4D28 | mov eax,dword ptr ds:[esi+ecx*4-8]
006C4D2C | mov dword ptr ds:[edi+ecx*4-8],eax
006C4D30 | mov eax,dword ptr ds:[esi+ecx*4-4]
006C4D34 | mov dword ptr ds:[edi+ecx*4-4],eax  // copy 0x42 0xED here
006C4D38 | lea eax,dword ptr ds:[ecx*4]

The same code is hit when the next four bytes, 0x06 0x00 0x00 0x00, are accessed. After I put a hardware breakpoint on the bytes 0x3C 0x05 0x00 0x00 (0x53C, or 1340, as a little-endian uint32), this instruction was hit:

006C4C33 | rep movsd dword ptr es:[edi],dword ptr ds:[esi]
006C4C35 | jmp dword ptr ds:[edx*4+6C4D4C]

Next, another ReadFile call reads 0xD000 bytes:

006C6D2C | push dword ptr ds:[eax+esi]
006C6D2F | call dword ptr ds:[<&ReadFile>]

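The stack just before the call (ReadFile arguments):
1: [esp] 00000104        // file handle
2: [esp+4] 032D3528   // pointer to destination buffer
3: [esp+8] 0000D000  // number of bytes to read
4: [esp+C] 0018F074  // pointer that receives the number of bytes read
5: [esp+10] 00000000 // lpOverlapped (NULL)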

At this point I decided to switch gears and look at the community-provided tool used to unpack and pack AZP files.

Reversing azp.exe
When reverse engineering a cipher, it helps to have access to the plaintext. The community tool can list the files in an AZP archive:

$ ./azp.exe l HWM.azp | head

7.62 resource archiver  (c) 2007 by Novik  v 1.3

     41564/41580      100.04% LOCAL.TXT
      3910/1804        46.14% TIPS.TXT
     53682/8514        15.86% ACTORS\ITEMS\ACOG-11.ACT
       163/83          50.92% ACTORS\ITEMS\ACOG-11.ACT.INF
     62839/11530       18.35% ACTORS\ITEMS\ACOG-11WR.ACT
       163/82          50.31% ACTORS\ITEMS\ACOG-11WR.ACT.INF
     46710/7363        15.76% ACTORS\ITEMS\AEK919K_SCOPE.ACT

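Listing the files in HWM.azp gives a count of 1340 files. This number is stored as a little-endian uint32 at file offset 0xC. In the listing, the first number on each line is the uncompressed size, the second is the compressed size, and the percentage is their ratio (for example, 1804/3910 ≈ 46.14% for TIPS.TXT).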
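A minimal sketch of reading the fixed-size start of the archive, assuming it consists of four 4-byte little-endian fields: the magic (“AZP” followed by 0x01), two fields whose meaning is still unknown, and the file count at offset 0xC. The names `parse_azp_header`, `unknown1` and `unknown2` are placeholders of my own, not names used by the game or by azp.exe.

```python
import struct

AZP_MAGIC = b"AZP\x01"
# Assumed layout: magic, two unknown uint32 fields, file count at offset 0xC.
AZP_HEADER = struct.Struct("<4sIII")


def parse_azp_header(data: bytes) -> dict:
    """Parse the first 16 bytes of an AZP archive (hypothetical helper)."""
    if len(data) < AZP_HEADER.size:
        raise ValueError("file is too short to be an AZP archive")
    magic, unknown1, unknown2, file_count = AZP_HEADER.unpack_from(data, 0)
    if magic != AZP_MAGIC:
        raise ValueError(f"bad magic {magic!r}, expected {AZP_MAGIC!r}")
    return {"unknown1": unknown1, "unknown2": unknown2, "file_count": file_count}
```

Used as `parse_azp_header(open("HWM.azp", "rb").read(16))["file_count"]`, this should report 1340 for HWM.azp if the layout above is right.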

I put breakpoints on open() and read(). The AZP file is opened here:

00404D7F | call dword ptr ds:[<&open>]
00404D85 | add esp,C

The first read from the file occurs here:

00404C81 | call dword ptr ds:[<&read>]
00404C87 | add esp,C

The call stack:
1: [esp] 00000003       // file handle
2: [esp+4] 01B80048  // pointer to destination buffer
3: [esp+8] 00010000  // number of bytes to read
4: [esp+C] 7C36C01B
5: [esp+10] 0018FD74

After the first 0x10000 bytes are read from the file, I put a hardware breakpoint on the ‘A’ in the buffer. The following instruction was hit:

7C342FF4 | mov dword ptr ds:[edi+ecx*4-4],eax

Looking at the call stack, this instruction is part of a call to memcpy():
1: [esp] 0018FC58 // pointer to destination buffer
2: [esp+4] 01B80048 // pointer to source buffer
3: [esp+8] 00000004 // number of bytes to copy
4: [esp+C] 7C36C01B
5: [esp+10] 0018FD74

Setting a breakpoint on this call to memcpy, I was able to see the read pattern. First, the magic identifier (“AZP” followed by 0x01) is read, then an unknown 4-byte field, then another unknown 4-byte field, then the total number of files, then a uint32 holding the length of the first file name, and finally nine bytes, which are the ciphered version of “LOCAL.TXT”. Both the length of the file name and the file name itself are ciphered, so the next step was to figure out the cipher. I kept track of where the ciphered length bytes 0xFC 0x38 0xB8 0xE8 were copied and put a hardware breakpoint on 0xFC. The following instruction was hit:

00402DF0 | push ebp
....
00402E67 | mov al,byte ptr ds:[edi] // edi points to a copy of 0xFC 0x38 0xB8 0xE8
00402E69 | xor edx,eax
00402E6B | mov byte ptr ds:[edi],dl // dl holds the deciphered byte (0x09 for the first one)
00402E6D | inc edi

Watching this code execute, it became clear that this is where the bytes were deciphered.

00402E3C | mov eax,dword ptr ds:[esi+30]                       // [esi+30] holds an unknown value, possibly the cipher key
00402E3F | mov dword ptr ss:[ebp+C],eax                        |
00402E42 | push edi                                            |
00402E43 | push ecx                                            |
00402E44 | push edx                                            |
00402E45 | push eax                                            |
00402E46 | mov edi,dword ptr ss:[ebp+8]                        // edi points to the ciphered data
00402E49 | mov ecx,dword ptr ss:[ebp-4]                        // ecx is the loop counter (number of bytes)
00402E4C | mov eax,dword ptr ss:[ebp+C]                        // eax probably contains the cipher key
00402E4F | cld                                                 |
00402E50 | push eax                                            |
00402E51 | xor eax,eax                                         |
00402E53 | xor edx,edx                                         |
00402E55 | pop dx                                              // dx = low word of the key
00402E57 | pop ax                                              // ax = high word of the key
00402E59 | mul dx                                              // dx:ax = ax * dx (16-bit multiply)
00402E5C | dec ax                                              // ax = low word of the product - 1
00402E5E | xor eax,edx                                         // ax ^= dx (high word of the product)
00402E60 | mov edx,FF                                          |
00402E65 | and edx,eax                                         // dl = keystream byte
00402E67 | mov al,byte ptr ds:[edi]                            // al = ciphered byte
00402E69 | xor edx,eax                                         // dl = ciphered byte ^ keystream byte
00402E6B | mov byte ptr ds:[edi],dl                            // dl has clear text byte
00402E6D | inc edi                                             |
00402E6E | loop azp.402E59                                     |

The value loaded from [esi+30] on the first pass was 0xF69DA025. I searched for this constant in the azp.exe address space and found two locations:
0040304D | mov dword ptr ss:[esp+40],F69DA025
004033AE | push F69DA025

The second location is executed when the AZP file is deciphered:

004033AE | push F69DA025                // hard coded key!
004033B3 | push esi   
004033B4 | lea ecx,dword ptr ss:[esp+24]
004033B8 | call azp.402D80
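Outside the debugger, a similar search can be run on the file on disk. The sketch below looks for the little-endian encoding of a 32-bit constant, which is how both `push imm32` and `mov dword ptr [esp+40], imm32` embed it in the instruction bytes. It returns file offsets, which still have to be mapped to virtual addresses through the PE section table before they can be compared with addresses like 004033AE. `find_imm32` is a hypothetical helper name.

```python
import struct
from typing import List


def find_imm32(blob: bytes, value: int) -> List[int]:
    """Return every offset in `blob` where `value` is stored as a little-endian uint32."""
    needle = struct.pack("<I", value)
    hits = []
    pos = blob.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = blob.find(needle, pos + 1)  # allow overlapping matches
    return hits
```

Running `find_imm32(open("azp.exe", "rb").read(), 0xF69DA025)` should report at least the two instruction sites; any extra hits would be data that happens to contain the same four bytes.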

The cipher algorithm lies between instruction addresses 00402E51 and 00402E6B. After porting it to Python, I was able to decipher the file name length of the first file. Watching the cipher run on subsequent data, it became clear that, in addition to deciphering the data, this function also stores a key that should be used to decipher the next piece of data:

00402E70 | shl eax,10
00402E73 | or ax,dx
00402E76 | mov dword ptr ss:[ebp+C],eax     // after next key is generated, store it
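Putting the loop and the key store together, a minimal Python port could look like the sketch below. It emulates only the 16-bit register arithmetic shown in the listing; `azp_crypt` is my own name for it, and it has not been checked against every code path in azp.exe. The plaintext and ciphertext bytes only enter the next state through the product `ax * dx` (one sits in the low byte of ax, the other in the low byte of dx), and multiplication is commutative, so the same function both deciphers and enciphers.

```python
from typing import Tuple


def azp_crypt(data: bytes, key: int) -> Tuple[bytes, int]:
    """Sketch of the azp.exe routine at 00402E51..00402E76.

    Returns (output bytes, next key). The same call deciphers ciphertext
    and enciphers plaintext.
    """
    d = key & 0xFFFF            # pop dx: low word of the key
    a = (key >> 16) & 0xFFFF    # pop ax: high word of the key
    out = bytearray()
    for c in data:
        product = a * d                     # mul dx: dx:ax = ax * dx
        a, d = product & 0xFFFF, product >> 16
        a = ((a - 1) & 0xFFFF) ^ d          # dec ax; xor eax,edx
        k = a & 0xFF                        # mov edx,FF; and edx,eax (keystream byte)
        a = (a & 0xFF00) | c                # mov al,byte ptr [edi]
        d = a ^ k                           # xor edx,eax: dl = output byte, dh = ah
        out.append(d & 0xFF)                # mov byte ptr [edi],dl
    next_key = (a << 16) | d                # shl eax,10; or ax,dx
    return bytes(out), next_key
```

Deciphering the bytes 0xFC 0x38 0xB8 0xE8 with the hard-coded key 0xF69DA025 gives 09 00 00 00, a name length of 9, which matches “LOCAL.TXT”. Because the returned key holds the complete state (ax in the high word, dx in the low word), deciphering field by field with the chained key gives the same bytes as deciphering the whole run in one call.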

With this information, I was able to decipher and parse the header. Each entry in the table of contents is structured like this:

struct {
    uint32 name_length;
    char file_name[name_length];
    uint32 offset;
    uint32 compressed_size;
    uint32 uncompressed_size;
};
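A sketch of walking these entries, under several assumptions: the table of contents starts at offset 0x10, right after the file count; every field is a little-endian uint32; and the table has already been deciphered as one continuous stream. Since each output byte of the cipher depends only on the bytes before it, one way to get there is to decipher a generous chunk starting at 0x10 and let the parser stop after the last entry, ignoring the garbage that follows. `AzpEntry` and `parse_toc` are hypothetical names, and decoding names as ASCII is an assumption based on the listing above.

```python
import struct
from typing import List, NamedTuple


class AzpEntry(NamedTuple):
    name: str
    offset: int
    compressed_size: int
    uncompressed_size: int


def parse_toc(plain: bytes, file_count: int) -> List[AzpEntry]:
    """Walk `file_count` entries in an already deciphered table of contents."""
    entries = []
    pos = 0
    for _ in range(file_count):
        (name_length,) = struct.unpack_from("<I", plain, pos)
        pos += 4
        raw_name = plain[pos:pos + name_length]
        if len(raw_name) != name_length:
            raise ValueError(f"truncated file name at offset {pos}")
        pos += name_length
        offset, compressed_size, uncompressed_size = struct.unpack_from("<III", plain, pos)
        pos += 12
        # Names in the listing are plain ASCII; other code pages are not handled here.
        entries.append(AzpEntry(raw_name.decode("ascii"), offset, compressed_size, uncompressed_size))
    return entries
```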

With the header parsed, it was time to decompress the data. From strings found in the game and in azp.exe, I knew that zlib was compiled into the executables. I was also fairly sure that sub_40D750 is zlib's inflate(), based on the error strings referenced in the body of the function:

0040D7D2 | mov dword ptr ds:[eax],D                            | D:'\r'
0040D7D8 | mov dword ptr ds:[esi+18],azp.414BB4                | 414BB4:"unknown compression method"
0040D7DF | jmp azp.40D9BB                                      |
0040D7E4 | mov ecx,dword ptr ds:[eax+4]                        |
0040D7E7 | mov edx,dword ptr ds:[eax+10]                       |
0040D7EA | shr ecx,4                                           |
0040D7ED | add ecx,8                                           |
0040D7F0 | cmp ecx,edx                                         |
0040D7F2 | jbe azp.40D806                                      |
0040D7F4 | mov dword ptr ds:[eax],D                            | D:'\r'
0040D7FA | mov dword ptr ds:[esi+18],azp.414BA0                | 414BA0:"invalid window size"

This function was hit when azp.exe extracted files, which confirms that zlib inflate is used. I dumped the first file from the AZP archive, and its first two bytes were 0x78 0x9C, a standard zlib header (deflate with a 32 KB window at the default compression level). After that, it was a matter of reading the compressed data from the AZP archive, inflating it with zlib, and saving the decompressed data to an appropriately named file. Since the cipher is symmetric, the same routine and the same hard-coded key can also be used to cipher a new table of contents, so I was able to add packing functionality to the tool I developed.
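A sketch of the extraction step for a single entry, given the offset and sizes from the table of contents. It assumes the offset is an absolute file offset and that the entry is stored as a plain, unciphered zlib stream (as the 0x78 0x9C dump suggests), and it uses the RFC 1950 header rules (compression method 8 in the low nibble of CMF, and CMF·256 + FLG divisible by 31) as a sanity check. `extract_entry` is a hypothetical name.

```python
import zlib


def extract_entry(archive: bytes, offset: int, compressed_size: int,
                  uncompressed_size: int) -> bytes:
    """Inflate one entry of an AZP archive held in memory (hypothetical helper)."""
    blob = archive[offset:offset + compressed_size]
    if len(blob) != compressed_size:
        raise ValueError("entry runs past the end of the archive")
    # RFC 1950: CM (low nibble of CMF) is 8 for deflate, and CMF*256 + FLG is a multiple of 31.
    if len(blob) < 2 or (blob[0] & 0x0F) != 8 or ((blob[0] << 8) | blob[1]) % 31 != 0:
        raise ValueError("entry does not start with a zlib header")
    data = zlib.decompress(blob)
    if len(data) != uncompressed_size:
        raise ValueError(f"expected {uncompressed_size} bytes, got {len(data)}")
    return data
```

For packing, the reverse step is `zlib.compress(data)`, which at the default level produces a stream that also starts with 0x78 0x9C.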

E5DEC
HWM.azp contains two interesting files: LOCAL.txt and TIPS.txt. Using e5dec.exe, I was able to decipher LOCAL.txt into a human-readable file. Following the same process I used on azp.exe, I found that these text files are ciphered with exactly the same algorithm as the AZP table of contents. It was then a simple matter of reading the whole file and deciphering it with the function I had written for AZP files.

00401260 | push ebp                                            |
00401261 | mov ebp,esp                                         |
00401263 | push edi                                            |
00401264 | push edi                                            |
00401265 | push ecx                                            |
00401266 | push edx                                            |
00401267 | push eax                                            |
00401268 | mov edi,dword ptr ss:[ebp+8]                        // edi = buffer to decipher
0040126B | mov ecx,dword ptr ss:[ebp+C]                        // ecx = number of bytes
0040126E | mov eax,dword ptr ss:[ebp+10]                       // eax = key
00401271 | cld                                                 |
00401272 | push eax                                            |
00401273 | xor eax,eax                                         |
00401275 | xor edx,edx                                         |
00401277 | pop dx                                              |
00401279 | pop ax                                              |
0040127B | mul dx                                              |
0040127E | dec ax                                              |
00401280 | xor eax,edx                                         |
00401282 | mov edx,FF                                          |
00401287 | and edx,eax                                         |
00401289 | mov al,byte ptr ds:[edi]                            |
0040128B | xor edx,eax                                         |
0040128D | mov byte ptr ds:[edi],dl                            // dl contains the deciphered byte
0040128F | inc edi                                             |
00401290 | loop e5dec.40127B                                   |
00401292 | shl eax,10                                          |
00401295 | or ax,dx                                            |
00401298 | mov dword ptr ss:[ebp+10],eax                       // store the next key
0040129B | pop eax                                             |
0040129C | pop edx                                             |
0040129D | pop ecx                                             |
0040129E | pop edi                                             |
0040129F | mov eax,dword ptr ss:[ebp+10]                       // return the next key in eax
004012A2 | pop edi                                             |
004012A3 | pop ebp                                             |
004012A4 | ret                                                 |
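The loop body above matches the one in azp.exe instruction for instruction; only the calling convention differs, with the buffer at [ebp+8], the byte count at [ebp+C], the key at [ebp+10], and the next key returned in eax. A minimal sketch of an e5dec-style file tool under that reading follows. The key used for LOCAL.txt and TIPS.txt is not stated in this write-up, so it is left as a parameter; `e5dec_crypt` and `decipher_file` are hypothetical names, and the loop is restated so the snippet runs on its own.

```python
from typing import Tuple


def e5dec_crypt(buf: bytes, key: int) -> Tuple[bytes, int]:
    """Python mirror of the e5dec loop at 00401272..00401298; returns (output, next key)."""
    d, a = key & 0xFFFF, (key >> 16) & 0xFFFF                    # pop dx / pop ax
    out = bytearray()
    for c in buf:
        product = a * d                                            # mul dx -> dx:ax
        a = (((product & 0xFFFF) - 1) & 0xFFFF) ^ (product >> 16)  # dec ax; xor eax,edx
        k = a & 0xFF                                               # keystream byte
        a = (a & 0xFF00) | c                                       # mov al,[edi]
        d = a ^ k                                                  # xor edx,eax
        out.append(d & 0xFF)                                       # mov [edi],dl
    return bytes(out), (a << 16) | d                               # shl eax,10; or ax,dx


def decipher_file(src: str, dst: str, key: int) -> None:
    """Read a whole ciphered text file and write the deciphered bytes (hypothetical helper)."""
    with open(src, "rb") as f:
        data = f.read()
    plain, _next_key = e5dec_crypt(data, key)
    with open(dst, "wb") as f:
        f.write(plain)
```

Because the cipher is symmetric, running `decipher_file` twice with the same key returns the original bytes, which is a cheap way to test a packing path.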

Notes

http://unix.superglobalmegacorp.com/xnu/newsrc/bsd/net/zlib.h.html
https://stackoverflow.com/questions/9050260/what-does-a-zlib-header-look-like#17176881
https://tools.ietf.org/html/rfc1950
http://compgroups.net/comp.compression/what-is-this-compression-format/185138