This article is part of a series about reverse-engineering the LKV373A HDMI extender. Other parts are available at:
- Part 1: Firmware image format
- Part 2: Identifying processor architecture
- Part 3: Reverse engineering instruction set architecture
- Part 4: Crafting ELF
- Part 5: Porting objdump
- Part 6: State of the reverse engineering
As we should now be able to follow any jump present in the code, it is time to make the analysis more automatic. My target tool for that purpose is objdump. However, the firmware image is still a raw dump of memory. To use objdump easily, we need to pack the firmware into a container that objdump understands. The most obvious choice is ELF (Executable and Linkable Format), and this is what I am going to use.
To make packing data into ELF easier, I’ve made a Python library. For now, it can split the firmware image into sections, like .text or .data, so objdump will disassemble only the parts of the firmware that actually are code. Moreover, it can define symbols inside the binary, so it is possible to store information about where certain functions start and end, and the same for variables, like strings. As of now, the library has no command-line interface. If it turns out that one is necessary (e.g. for adding many symbols at once), it will be added.
The library code can be downloaded from GitHub. Currently, any LKV373A-specific modifications to the library are stored on the lkv373a branch, so as not to clutter the main (master) branch. Throughout this tutorial, I assume we are using the code on this branch, so there might be some LKV373A specifics, especially regarding enum types (e.g. the processor architecture enum).
At this point, I need to warn that I am not going to describe the internal structure of an ELF file, nor features visible from the outside, like the concept of sections. If you are not familiar with them, now is a good time to learn about them, as otherwise it might be very difficult to follow what I am writing about. There are many good resources explaining them; the ones I used are this blog post and this documentation.
Creating a new ELF
Example code that creates a brand new ELF file is as easy as:
```python
#!/usr/bin/python3
# demo script for creating ELF
import os
from elf import *

elf = ELF(e_machine=EM.EM_LKV373A)

# O_TRUNC drops any old contents if the output file already exists
fp = os.open('lkv373a.fw.elf', os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
os.write(fp, bytes(elf))
os.close(fp)
```

This first does all the necessary imports, then creates a new ELF object (the line elf = ELF(e_machine=EM.EM_LKV373A)) and, finally, converts it to a bytes object and immediately writes it to the file descriptor. That’s it!
After this, you should get a valid, empty ELF file for an architecture called lkv373a, which obviously does not exist and which no other program knows how to handle, but we are going to change that in the future.
When creating the ELF object, a few more things can be defined in addition to the architecture ID. They are all described in the documentation I mention near the end of this tutorial. You are also free to dig into the structure of the ELF object: there is no encapsulation in it and structure validation is very permissive, so even completely broken ELFs can be produced if needed.
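Because validation is so permissive, it can be worth checking what actually ended up on disk. The following is a minimal, stdlib-only sketch (it is not part of makeelf) that decodes the 52-byte ELF32 file header with struct. It assumes a 32-bit file, in line with the Elf32 class the library uses, and the demo builds a synthetic header whose e_machine value 0x1234 is a placeholder, not the number the library actually assigns to EM.EM_LKV373A.

```python
import struct

# Elf32_Ehdr fields after the 16-byte e_ident: e_type, e_machine (Half),
# e_version, e_entry, e_phoff, e_shoff, e_flags (Word/Addr/Off),
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx (Half)
EHDR_TAIL = "HHIIIIIHHHHHH"
EHDR_SIZE = 16 + struct.calcsize("<" + EHDR_TAIL)  # 52 bytes for ELF32
FIELD_NAMES = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff",
               "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
               "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf32_header(data: bytes) -> dict:
    """Decode the ELF32 file header from raw bytes (sections are not parsed)."""
    if len(data) < EHDR_SIZE or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:  # EI_CLASS: 1 = ELFCLASS32
        raise ValueError("not a 32-bit ELF file")
    endian = {1: "<", 2: ">"}.get(data[5])  # EI_DATA: 1 = LSB, 2 = MSB
    if endian is None:
        raise ValueError("unknown byte order")
    fields = struct.unpack_from(endian + EHDR_TAIL, data, 16)
    header = dict(zip(FIELD_NAMES, fields))
    header["byte_order"] = "little" if endian == "<" else "big"
    return header


# Synthetic header: magic, ELFCLASS32, little-endian, version 1, 9 padding bytes,
# then e_type=2 (ET_EXEC) and a placeholder e_machine of 0x1234.
demo = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack(
    "<" + EHDR_TAIL, 2, 0x1234, 1, 0, 0, 0, 0, 52, 0, 0, 0, 0, 0)
print(parse_elf32_header(demo))
```

Passing it the contents of lkv373a.fw.elf instead of the synthetic bytes should show which e_machine value the library assigned and how many sections (e_shnum) the file declares.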
Next step is to add some sections to our ELF file.
```python
# continues the script above: elf is the ELF object created there
with open('LKV373A_TX_V3.0c_d_20161116_bin.bin', 'rb') as fw:
    fw_blob = fw.read()  # read the whole firmware image

irq_blob = fw_blob[:0x1000]
text_blob = fw_blob[0x7d100:0x0b53c0]  # .text ends where .data begins
data_blob = fw_blob[0x0b53c0:0x0b53c0+0x102060]
smedia_blob = fw_blob[0x200000:0x200000+0x283105]

# each call returns the ID of the new section, used later for flags and symbols
irq_id = elf.append_section('.irq', irq_blob, 0)
txt_id = elf.append_section('.text', text_blob, 0x7d100)
data_id = elf.append_section('.data', data_blob, 0x0b53c0)
smedia_id = elf.append_section('.smedia', smedia_blob, 0x200000)
```
First, I extract the sections from the firmware image and then insert them into the ELF object.
append_section is a handy wrapper for the low-level modifications that must be done on the ELF structure hidden behind the ELF instance (these low-level structures are, however, still available to the user as elf.Elf).
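Since every slice is cut by hand from hard-coded addresses, it is easy to make two sections claim the same bytes. Below is a small, library-independent sketch for catching that before the sections are appended. The layout reuses the start addresses and sizes from the snippet above, with .text assumed to end where .data begins, so treat it as illustrative rather than a verified memory map.

```python
def find_overlaps(sections):
    """Return pairs of section names whose [start, start + size) ranges overlap.

    sections: iterable of (name, start, size) tuples, addresses in bytes.
    """
    spans = sorted(sections, key=lambda s: s[1])  # order by start address
    overlaps = []
    for i, (name_a, start_a, size_a) in enumerate(spans):
        end_a = start_a + size_a
        for name_b, start_b, _size_b in spans[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later section can overlap A
            overlaps.append((name_a, name_b))
    return overlaps


layout = [
    (".irq", 0x0, 0x1000),
    (".text", 0x7d100, 0x0b53c0 - 0x7d100),  # assumed to end at .data's start
    (".data", 0x0b53c0, 0x102060),
    (".smedia", 0x200000, 0x283105),
]
print(find_overlaps(layout))  # → []
```

With a .text size of 0x0b53c0 (measured from 0x7d100), the helper reports [('.text', '.data')], because such a .text would run past 0x0b53c0.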
Modifying section attributes
OK, so now we have sections in our ELF file, ready to be saved to disk. Before that, one more thing can be done: setting proper attributes. Among other things I am going to ignore here, they tell readers whether the program can write to or execute a given area of memory. This is useful, as otherwise some readers might be confused about what is code (text) and what is data. In our case, we have two text sections (.irq and .text), so we are going to set the executable flag (SHF_EXECINSTR) on them. Furthermore, we will set the SHF_ALLOC flag on every section that is going to be loaded into memory (so all of them).
This can be done with:
```python
# sections holding code: loaded into memory and executable
elf.Elf.Shdr_table[irq_id].sh_flags = SHF.SHF_ALLOC | SHF.SHF_EXECINSTR
elf.Elf.Shdr_table[txt_id].sh_flags = SHF.SHF_ALLOC | SHF.SHF_EXECINSTR
# data sections: only loaded into memory
elf.Elf.Shdr_table[data_id].sh_flags = SHF.SHF_ALLOC
elf.Elf.Shdr_table[smedia_id].sh_flags = SHF.SHF_ALLOC
```
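For reference, SHF_ALLOC and SHF_EXECINSTR are plain bit flags from the ELF specification (SHF_WRITE = 0x1, SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4), so combining them with | simply ORs the bits. The sketch below uses a local IntFlag stand-in named ShFlags (a hypothetical name, not makeelf's SHF enum) to show the resulting sh_flags values and render them as letters similar to readelf's W/A/X key.

```python
from enum import IntFlag


class ShFlags(IntFlag):
    """Generic sh_flags bits from the ELF specification (local stand-in)."""
    WRITE = 0x1      # SHF_WRITE: writable during execution
    ALLOC = 0x2      # SHF_ALLOC: occupies memory during execution
    EXECINSTR = 0x4  # SHF_EXECINSTR: contains executable instructions


def describe(flags):
    """Render the three generic bits as letters, in W, A, X order."""
    letters = (("W", ShFlags.WRITE), ("A", ShFlags.ALLOC), ("X", ShFlags.EXECINSTR))
    return "".join(letter for letter, bit in letters if flags & bit)


code_flags = ShFlags.ALLOC | ShFlags.EXECINSTR  # what .irq and .text get
data_flags = ShFlags.ALLOC                      # what .data and .smedia get
print(int(code_flags), int(data_flags))  # → 6 2
print(describe(code_flags), describe(data_flags))  # → AX A
```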
Segments are another concept, existing alongside sections. They are described in the program header table of an ELF file and refer to ranges of the same data that sections describe. They allow defining another set of attributes for areas of memory. I don’t think defining them is required to perform the analysis in objdump, but since an ELF file of type executable must contain at least one program header defining a segment, there is an interface for them similar to the one for sections.
To define a new segment based on the .text section, you can issue:
This also marks the segment as readable and executable, but not writable.
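In the program header, read and execute without write corresponds to p_flags = PF_R | PF_X, where the ELF specification defines PF_X = 0x1, PF_W = 0x2 and PF_R = 0x4. The following stdlib-only sketch packs a single 32-byte Elf32_Phdr for a loadable (PT_LOAD) segment to show where those bits live. It is not how makeelf builds program headers internally; the helper name pack_phdr32, the file offset 0x1000 and the .text bounds are hypothetical placeholders.

```python
import struct
from enum import IntFlag


class PFlags(IntFlag):
    """Segment permission bits (p_flags) from the ELF specification."""
    X = 0x1  # PF_X: executable
    W = 0x2  # PF_W: writable
    R = 0x4  # PF_R: readable


PT_LOAD = 1  # p_type of a loadable segment


def pack_phdr32(offset, vaddr, size, flags, align=1, little_endian=True):
    """Pack one Elf32_Phdr (hypothetical helper).

    Field order for ELF32: p_type, p_offset, p_vaddr, p_paddr,
    p_filesz, p_memsz, p_flags, p_align. Here p_paddr equals p_vaddr
    and p_memsz equals p_filesz.
    """
    fmt = ("<" if little_endian else ">") + "8I"
    return struct.pack(fmt, PT_LOAD, offset, vaddr, vaddr,
                       size, size, int(flags), align)


rx = PFlags.R | PFlags.X                     # readable + executable, not writable
text_start, text_end = 0x7d100, 0x0b53c0     # assumed .text bounds
phdr = pack_phdr32(offset=0x1000, vaddr=text_start,
                   size=text_end - text_start, flags=rx)
print(len(phdr), int(rx))  # → 32 5
```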
Loading an existing ELF
Loading an existing ELF from a file can easily be done with:

```python
newelf, b = ELF.from_file('lkv373a.fw.elf')
```
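Alternatively, it can also be loaded from a bytes object, using the lower-level Elf32 class:

```python
fd = os.open('some.elf', os.O_RDONLY)
b = os.read(fd, os.fstat(fd).st_size)  # read the whole file, whatever its size
os.close(fd)
manualelf, b = Elf32.from_bytes(b)
```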
In the latter case, I assume that the os module has already been imported.
Adding a symbol
Symbols are very useful for code analysis. A new symbol can be added using calls like:
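```python
# function irq0: 0x44 bytes at the very start of .irq
elf.append_symbol('irq0', irq_id, 0, 0x44, STB.STB_GLOBAL, STT.STT_FUNC)
# function sprintf: absolute address 0x9b9f8, 0x78 bytes long
elf.append_symbol('sprintf', txt_id, 0x9b9f8-0x7d100, 0x78, STB.STB_GLOBAL, STT.STT_FUNC)
# string thread_c_path: spans 0xba78a up to 0xba7bb in .data
elf.append_symbol('thread_c_path', data_id, 0xba78a-0x0b53c0, 0xba7bb-0xba78a, STB.STB_LOCAL, STT.STT_OBJECT)
```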
The first call defines a function, irq0, of length 0x44 at the beginning of the .irq section. To do this, the ID of the .irq section must be known. Luckily, we want to add the symbol at the very beginning of the section, so 0 is passed as the offset.
In the second case, we also want to define a function, but this time we only know its absolute address (0x9b9f8), while what we need to pass is the offset within the .text section. To get it, we subtract the start address of the .text section (0x7d100).
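Doing this subtraction by hand for every new symbol gets error-prone once there are many of them. A hypothetical helper (not part of makeelf) can find the section that contains an absolute address and return the offset to pass to append_symbol. The section bounds below reuse the addresses from this article, with .text assumed to end where .data begins.

```python
def to_section_offset(addr, sections):
    """Map an absolute firmware address to (section name, offset in section).

    sections: dict of name -> (start address, size).
    Raises ValueError if addr is outside every section.
    """
    for name, (start, size) in sections.items():
        if start <= addr < start + size:
            return name, addr - start
    raise ValueError(f"address {addr:#x} is not inside any section")


SECTIONS = {
    ".irq": (0x0, 0x1000),
    ".text": (0x7d100, 0x0b53c0 - 0x7d100),  # assumed to end at .data's start
    ".data": (0x0b53c0, 0x102060),
}

name, off = to_section_offset(0x9b9f8, SECTIONS)  # sprintf
print(name, hex(off))  # → .text 0x1e8f8
```

The section name still has to be mapped to the ID returned by append_section before calling append_symbol.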
In the last example, we define a string as an object with a certain address and length. Both the offset and the length are computed by subtracting absolute addresses. This symbol is marked as local, which is the default behavior for
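Whatever the library does internally, each symbol ends up as an Elf32_Sym entry in the file, where binding and type share a single byte, st_info: binding in the upper four bits, type in the lower four. The sketch below packs such an entry with struct using the generic constant values from the ELF specification; the constants are local stand-ins for makeelf's STB/STT enums, and the string-table offset, section index and st_value are hypothetical placeholders.

```python
import struct

# Generic values from the ELF specification (local stand-ins)
STB_LOCAL, STB_GLOBAL = 0, 1
STT_NOTYPE, STT_OBJECT, STT_FUNC = 0, 1, 2


def st_info(bind, typ):
    """Combine binding and type the way the ELF32_ST_INFO macro does."""
    return (bind << 4) | (typ & 0xF)


def pack_sym32(name_off, value, size, bind, typ, shndx, little_endian=True):
    """Pack one 16-byte Elf32_Sym entry (st_other left at 0).

    Field order: st_name, st_value, st_size, st_info, st_other, st_shndx.
    """
    fmt = ("<" if little_endian else ">") + "IIIBBH"
    return struct.pack(fmt, name_off, value, size, st_info(bind, typ), 0, shndx)


# thread_c_path: a local object of 0xba7bb - 0xba78a = 0x31 bytes.
# name_off=1, value=0xba78a and shndx=3 are placeholders.
entry = pack_sym32(1, 0xba78a, 0xba7bb - 0xba78a, STB_LOCAL, STT_OBJECT, 3)
print(len(entry), hex(st_info(STB_GLOBAL, STT_FUNC)))  # → 16 0x12
```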
There are many more things you can do with the makeelf library. What I showed here is mostly what is possible using the high-level wrappers, which do many things under the hood. But as there is also a low-level interface, virtually anything is possible.
To make exploring the interfaces easier, I’ve generated Doxygen documentation for most of the library. It can be found on my server, here. Feel free to use the library for anything you want.
The library presented here should take us one step further toward an easy-to-use reverse engineering environment. It will also allow storing new findings in easily modifiable Python scripts.
The examples of the library interfaces shown here split the LKV373A firmware image into 4 sections. At this moment, I already know that there are at least 6 sections, with code and data each split into two parts (forming an ICDCDS layout, where I stands for irq, C for code, and so on). There should also be more symbols that can be placed at this point.
If I succeed in porting objdump, or any other tool able to disassemble ELF files, the next step will be to publish a Python script, using the library presented here, that annotates the LKV373A firmware. So stay tuned; I hope there will be many more interesting findings throughout this reversing process!