Skip to content

re-ws.pl

ReverseEngineering WorkStation

  • Home
  • Tutorials
  • Random
  • About

Tag: Reverse Engineering

LKV373A: reverse engineering instruction set architecture

Posted on September 21, 2017 - October 17, 2019 by Kamil (aka. v3l0c1r4pt0r)

This article is part of series about reverse-engineering LKV373A HDMI extender. Other parts are available at:

  • Part 1: Firmware image format
  • Part 2: Identifying processor architecture
  • Part 3: Reverse engineering instruction set architecture
  • Part 4: Crafting ELF
  • Part 5: Porting objdump
  • Part 6: State of the reverse engineering
  • Part 7: radare2 plugin for easier reverse engineering of OpenRISC 1000 (or1k)

As I wrote in previous part, my choice is in fact reverse engineering instruction set. The goal of this post is not to reverse-engineer whole instruction set, because even in RISC architectures some of the instructions might be quite rarely used.

Before starting the actual reverse engineering, place where such analysis would be easiest should be identified. Then it might be possible to use often repeating patterns to guess instructions that the pattern consists of. The result of this tutorial will be description of opcodes related to jumping and few other often used opcodes, mostly related to memory operations.

Identifying target

LKV373A Encoder firmware - data region
Data region

As written above, first step is to find good target. As was shown in previous article, it is possible to find references to constants mixed into code.

In the picture on the right one especially interesting information can be seen – it seems that as an operating system FreeRTOS was used. FreeRTOS is open source project, so its source code can be downloaded.

This information could be later possibly used to link compiled code with FreeRTOS source code. Let’s look into source code to find some useful location. As we already know the encoding of opcode for an operation on strings, finding some string in code might work. I was able to identify one such place in tasks.c file. It is shown on snippet below:

4110           if( ulStatsAsPercentage > 0UL )
4111           {
4112             #ifdef portLU_PRINTF_SPECIFIER_REQUIRED
4113             {
4114               sprintf( pcWriteBuffer, "\t%lu\t\t%lu%%\r\n", pxTaskStatusArray[ x ].ulRunTimeCounter, ulStatsAsPercentage );
4115             }
4116             #else
4117             {
4118               /* sizeof( int ) == sizeof( long ) so a smaller
4119               printf() library can be used. */
4120               sprintf( pcWriteBuffer, "\t%u\t\t%u%%\r\n", ( unsigned int ) pxTaskStatusArray[ x ].ulRunTimeCounter, ( unsigned int ) ulStatsAsPercentage );
4121             }
4122             #endif
4123           }
4124           else
4125           {
4126             /* If the percentage is zero here then the task has
4127             consumed less than 1% of the total run time. */
4128             #ifdef portLU_PRINTF_SPECIFIER_REQUIRED
4129             {
4130               sprintf( pcWriteBuffer, "\t%lu\t\t<1%%\r\n", pxTaskStatusArray[ x ].ulRunTimeCounter );
4131             }
4132             #else
4133             {
4134               /* sizeof( int ) == sizeof( long ) so a smaller
4135               printf() library can be used. */
4136               sprintf( pcWriteBuffer, "\t%u\t\t<1%%\r\n", ( unsigned int ) pxTaskStatusArray[ x ].ulRunTimeCounter );
4137             }
4138             #endif
4139           }
4140 
4141           pcWriteBuffer += strlen( pcWriteBuffer );
vTaskGetRunTimeStats printf format
vTaskGetRunTimeStats formatting strings

If we look at code before the snippet, we can see that the function is surrounded with ifdefs and is meant to be turned on only for demo purposes. Also searching for complete formatting string from sprintf function above fails on LKV373A firmware. Fortunately it is present in code and happens to have some modifications. One of them is the formatting string we were searching for. I was able to identify them starting at offset 0xbab2f. You can see it on hexdump. What is a bit surprising is that there are four such strings, while we expected only two in whole code. But "IDLE" string after them is confirming that it must be tasks.c module.

Now we can use method shown on previous tutorial about processor identification to find references to these strings. Finally I found usages of offsets 0xbab5c and 0xbab73 (marked in green and blue) near offset 0x91fd4.

At this moment, we have machine code and source code that is very likely to be compiled one to one into this machine code. We can also see here very useful side-effect of open source popularity: we have a system that has quite unusual function and is using open source software. So we can conclude that we can be almost sure that any random part of code also has open source software in itself.

Code patterns

As we already have quite reliable anchor for our analysis, we could try identifying more opcodes. But, to make things easier, I want to go the pattern matching way. Whichever architecture you analyze, you see some patterns that are same or almost the same on any architecture. This is especially true on RISC architectures, as they have very limited set of functions, so compiler have to join two or more instructions to get desired high level functionality. It this section, I will describe some of such patterns, I was able to identify and decode in LKV373A firmware. They are:

  • Function call
  • Function prologue and epilogue
  • Read-only memory access
  • Compare and jump
  • Variadic functions

Following is short description of the above patterns.

Function call

This is main element of ABI (Application Binary Interface) from the point of view of a programmer. Therefore it should also be well-known, even to people not involved in assembly programming or reverse-engineering. It is all about the method of passing arguments, before a call to a function.

Let’s see how such a call looks on MIPS architecture:

lw      gp,24(sp)
nop
lw      s0,-32736(gp)
nop
addiu   s0,s0,6872
lhu     s0,0(s0)
addiu   a0,sp,34        # a0 = sp + 34
move    a1,zero         # a1 = 0
li      a2,16           # a2 = 16
move    s7,v0
sh      s0,32(sp)
lw      t9,(get_sr_name_ptr - 0x1000BE50)(gp)
nop
jalr    t9

As we can see in case of MIPS, arguments are passed in registers named a0, a1, a2 and so on. Then address of function to call is loaded to t9 and jalr (jump and link register) is performed. Usually, in case of ABI, where arguments are passed in registers, when number of arguments is greater than number of such registers, they are passed through memory (i.e. stack).

Function prologue and epilogue

When performing a call to different function, state of the processor have to be preserved, so after the call it is again the same as before (with exception of few registers, used e.g. for return value passing). This operation may be done by caller or callee. Let’s look again at MIPS code to see how it works there:

sw      ra,5328(sp) 
sw      s8,5324(sp) 
sw      s7,5316(sp) 
sw      s6,5312(sp) 
sw      s1,5292(sp) 
sw      s0,5288(sp) 
sw      gp,5320(sp) 
sw      s4,5304(sp) 
sw      s3,5300(sp) 
sw      s2,5296(sp) 

I think there is nothing special to comment here. The opposite happens on function epilogue and additionally, immediately after that return instruction should appear.

Read-only memory access

This one is already partially analyzed on previous part of this series. It is usually appearing where some strings need to be used in code. Strings written in code as literals are stored in part of the memory where they shouldn’t be modified. Then theirs addresses are computed using some base register, or directly if code is not relocatable. This is how it works on MIPS:

lw      t9,-30404(gp)

But, as we can see on previous post, we have something missing on our mysterious architecture. There, address was computed as $3=$3+0xbadadd. So, we should expect that register used should be set to something before that.

Compare and jump

This is after calling conventions, another extremely popular scheme. It usually consists of two opcodes. At first two numbers are compared, and some flags in processor are set. Then based on the state of one of the flags, jump is performed, or processing is continued if flag has value different than we expect. This time, let’s see how it works on x86 platform, as MIPS uses a bit different philosophy:

cmp     [ebp+arg_4], eax
jnz     loc_404CEB

On x86 cmp instruction causes the subtraction of two parameters, without actually storing the result, but only updating the FLAGS register, so it is known if the value is zero, or the overflow happened, and so on. Then based on flag value (in this case if zero flag is not set), jump occurs, or not.

Variadic function

Variadic function is function that can get variable number of parameters. The most popular example of such function is printf. It accepts format string and parameters, which number depend on format string. On system, where parameters are passed through registers, I expect it to get format through register and rest of parameters through stack or dedicated structure, so generally memory. Once we know how constants are accessed, it should be quite easy to identify, as it most likely will get format string as one of the first parameters, and somewhere close to that parameters should be stored, one after another.

Identifying patterns

Now, as we know what pattern we will look for, it is time to find them in code and guess functions of particular opcodes.

Read-only memory access

Let’s look at area near reference to our format string:

00091FB8                 .word 0x1000000A         # 04: ? 0x0A
00091FBC                 .word 0xD401D00C         # 35: ? $0, $1, $26, 0x0c
00091FC0                 .word 0x18600010         # 06: ? $3, $0+0x10
00091FC4                 .word 0x1880000B         # 06: ? $4, $0+0x0b
00091FC8                 .word 0xD4012810         # 35: ? $0, $1, $5, 0x10
00091FCC                 .word 0xA8638608         # 2A: la $3, $3+0x8608
00091FD0                 .word 0x400268A          # 01: ? 0x268a
00091FD4                 .word 0xA884AB5C         # 2A: la $4, $4+0xab5c
00091FD8                 .word 9                  # 00: ? 0x09
00091FDC                 .word 0xBC160000         # 2F: ? $0, $22
00091FE0                 .word 0x18600010         # 06: ? $3, $0+0x10
00091FE4                 .word 0x1880000B         # 06: ? $4, $0+0x0b
00091FE8                 .word 0xA8638608         # 2A: la $3, $3+0x8608
00091FEC                 .word 0xA884AB73         # 2A: la $4, $4+0xab73
00091FF0                 .word 0x4002682          # 01: ? 0x2682
00091FF4                 .word 0x15000000         # 05: ?

After splitting instruction words into parts and decoding opcode, register and immediate values, we can see that string’s address is based on register $4 and stored also in register $4. If we look few lines upwards, we can see that register $4 is computed based on register $0 and offset 0x0b (marked in orange). Register $0 is often used to always store value 0. Now, if we look at original offset of string in firmware, we can see that it is 0xbab5c! So that instruction must store immediate value in register’s higher half. Therefore we just guessed function of opcode 06. Later this opcode will be described as lh (load high).

By the way we also discovered that almost surely firmware image is mapped to address 0 after loading to operational memory or more likely mapping EEPROM to address space.

Compare and jump

In the snippet above, another one thing is quite interesting. And weird at the same time. There are few instructions that seem to not contain any register encoded and have weird uneven offsets, often with quite low values. At this moment my theory is that shorter ones are some kind of jumps (like 0x91fd8) and longer ones are function calls (like 0x91fd0). Then it is time to try to find compare and jump pattern.

If we are right, then opcodes like 00, 04 are jumps and 01 means a call. After this section we should also tell conditional and unconditional jumps apart.

Ok, so we now need to go back to source code and find some good candidate for conditional jump. It should be as close to format string as possible. If we look at the snippet from Identifying target section, we can see one such check on line 4110. It checks for value being greater than zero. Going upwards a little bit, we encounter one 04 opcode, immediately preceded by 2F instruction:

00091FB0                 .word 0xD4011808         # 35: ? $0, $1, $3, 0x08
00091FB4                 .word 0xBC050000         # 2F: ? $0, $5
00091FB8                 .word 0x1000000A         # 04: ? 0x0A
00091FBC                 .word 0xD401D00C         # 35: ? $0, $1, $26, 0x0c
00091FC0                 .word 0x18600010         # 06: ? $3, $0+0x10
00091FC4                 .word 0x1880000B         # 06: ? $4, $0+0x0b

Now, if we look at occurrences of 2F opcode, we can spot that it is appearing usually near 04 opcode. However we cannot tell that they appear in exactly this order which is quite weird. On the other hand if we look at register this particular occurrence uses, it is quite likely it is compare opcode.

If we assume that 2F is cmp (compare) and 04 is jg (jump greater), we see that this more or less matches behavior we expect from the code immediately preceding sprintf from FreeRTOS source code.

However we still miss one information: what does the offset mean. If it is jump instruction, then we cannot jump 10 bytes ahead, because we would land in the middle of instruction. If we look again at source code, we can see that our jump should not go very far forward, so value should also not be too high. We can also exclude usage of register as address, because it would be register $10, which is not set anywhere near jump.

Having no other idea, I did an experiment. I multiplied jump value by 4, because length of instruction is always 4 and added next instruction address to result. Then I checked what is there and… bingo! It jumped above the sprintf call and ended up immediately after it. Some time later, I discovered that it is not completely truth. It happens that real formula is:

addr = imm * 4 + PC

Where imm is instruction argument and PC is program counter before executing the instruction (so address of jump opcode).

The question still is how more sophisticated compares are performed, because every one I’ve seen is just telling which value is greater. As there does not seem to be any flag in instruction, maybe there is no other option and to do that some arithmetic operation must be done to bring them to greater than operation?

Function call

Another thing, we can learn near the sprintf function is how parameters are passed to function. Signature for sprintf is:

int sprintf(char *str, const char *format, ...);

So after analyzing its call, we should know how at least two first parameters are passed. Let’s see how it looks like in machine code:

00091FC0  .word 0x18600010  # 06: lh $3, $0+0x10
00091FC4  .word 0x1880000B  # 06: lh $4, $0+0x0b
00091FC8  .word 0xD4012810  # 35: ? $0, $1, $5, 0x10
00091FCC  .word 0xA8638608  # 2A: la $3, $3+0x8608   # 0x108608 = pcWriteBuffer?
00091FD0  .word 0x400268A   # 01: call +0x268a       # 0x9b9fc = sprintf
00091FD4  .word 0xA884AB5C  # 2A: la $4, $4+0xab5c   # 0xbab5c = "%-16s %c %7u%s\t%2u%%\r\n"

Here, another interesting detail appears: last instruction setting the registers appears after the actual call. This is perfectly normal and can also be found on MIPS architecture. Its purpose is to allow concurrent execution of the two instructions.

Variadic functions

Now, if we scroll a bit upwards, we can see some interesting bunch of 35 opcodes. We know, that our call should have more than 2 parameters and thanks to format string we can tell that there should be exactly 5 extra parameters. Now if we count number of 35 opcodes, we see that these numbers match.

00091FA8  .word 0xD4012000  # 35: ? $0, $1, $4, 0x00
00091FAC  .word 0xD401A004  # 35: ? $0, $1, $20, 0x04
00091FB0  .word 0xD4011808  # 35: ? $0, $1, $3, 0x08
00091FB4  .word 0xBC050000  # 2F: cmp $0, $5
00091FB8  .word 0x1000000A  # 04: jg 0x91fe0
00091FBC  .word 0xD401D00C  # 35: ? $0, $1, $26, 0x0c
00091FC0  .word 0x18600010  # 06: lh $3, $0+0x10
00091FC4  .word 0x1880000B  # 06: lh $4, $0+0x0b
00091FC8  .word 0xD4012810  # 35: ? $0, $1, $5, 0x10
00091FCC  .word 0xA8638608  # 2A: la $3, $3+0x8608
00091FD0  .word 0x400268A   # 01: call 0x9b9fc
00091FD0                    #   sprintf(pcWriteBuffer, "%-16s %c %7u%s\t%2u%%\r\n",
00091FD0                    #     $4, $20, $3, $26, $5)
00091FD4  .word 0xA884AB5C  # 2A: la $4, $4+0xab5c

So, we can tell almost for sure that opcode 35 gets value of third register parameter and stores it at address computed as follows:

addr = reg1 + reg2 + imm

So e.g.

*($0 + $1 + 0x0c) = $26

Unfortunately with only that information, we can only guess the order of parameters, i.e. if $4 is first parameter or last one.

For future, we will denote opcode 35 as sw (store word).

Function prologue and epilogue

As we know exactly at which address the call will land, we can try to decode function prologue and epilogue, so what happens just after the call and just before returning back. Let’s see how such part of code looks, for example of sprintf function:

0009B9F0           .word 0x44004800  # 11: ret
0009B9F4           .word 0x8441FFF8  # 21: ? $2, $1, -0x08
0009B9F8 sprintf:  .word 0xA9030000  # 2A: la $8, $3
0009B9FC           .word 0x18600010  # 06: lh $3, $0+0x10
0009BA00           .word 0xD7E14FFC  # 35: sw $31, $1, $9, -0x04
0009BA04           .word 0xD7E117F8  # 35: sw $31, $1, $2, -0x08
0009BA08           .word 0xD7E10FF4  # 35: sw $31, $1, $1, -0x0c

And to confirm it is the reverse at the end, let’s see epilogue:

0009BA60           .word 0x8521FFFC  # 21: ? $9, $1, $31, -0x04
0009BA64           .word 0x8421FFF4  # 21: ? $1, $1, $31, -0x0c
0009BA68           .word 0x44004800  # 11: ret
0009BA6C           .word 0x8441FFF8  # 21: ? $2, $1, -0x08

By the way immediately after sprintf there is strlen function, that is also called by function we are analyzing. So we see that registers stored with sw instructions are then recovered by opcode 21. Then we can safely assume it is reverse and denote it as lw (load word).

And we see that last jump in function is done by opcode 11, so we can denote it as ret (return). I still don’t know what is the meaning of its parameters. If we use standard decoding, it would be:

ret $0, $0, $9, 0x00

But I have no proof that it is the real meaning. I only see that sometimes this third hypothetical register have different value, but usually it is $9.

From my experience register saving, we see here is done to stack. If in this case it is also true, then we have two options for stack pointer: $1 and $31. Some more investigation must be done to tell which one is SP.

Other methods

We can also try to find constants other than strings. Then we have a chance, that there will be some arithmetic operation going on with them. Personally, I haven’t tried that approach, so I can’t show any example.

Another method might be finding references to some known structures. We can see one such structure in the function, we analyzed (TaskStatus_t). This is also left as an exercise to the reader.

Conclusion

Main focus of the analysis was on branching. As shown, we know quite a lot about not only branching, but also whole ABI. Now it should be possible, as soon as main entry of the system is found, to discover complete flow of the program.

We now know that first parameters are passed in registers $3, $4 and possibly so on. After analysis of function prologue and epilogue, we also know that here callee is responsible for preserving register values.

To sum up, we already know following instructions:

  • 00: jmp off
  • 01: call off
  • 03: j? off
  • 04: jg off
  • 06: lh $r1, $r2, imm
  • 11: ret
  • 21: lw $r1, $r2, $r3, imm
  • 2A: la $r1, $r2, imm
  • 2F: cmp $r1, $r2
  • 35: sw $r1, $r2, $r3, imm

We also know that in this architecture, there is mechanism of slots, identical with that in MIPS. Together with fixed-sized instructions and opcode and register field lengths, it is really similar to MIPS. Unfortunately it is not exactly the same, so reverse engineering of ISA have to be continued.

Unfortunately, after doing the research described here, I see that tools I used are not enough to do more reversing efficiently. So, before doing one step forward, I have to find a way to introduce more automation to the process. As soon as I succeed with this, I will write next part, so stay tuned!

Posted in Reversing LKV373A, TutorialsTagged assembly, English, FreeRTOS, hardware, Reverse EngineeringLeave a comment

LKV373A: identifying processor architecture

Posted on September 10, 2017 - October 17, 2019 by Kamil (aka. v3l0c1r4pt0r)

This article is part of series about reverse-engineering LKV373A HDMI extender. Other parts are available at:

  • Part 1: Firmware image format
  • Part 2: Identifying processor architecture
  • Part 3: Reverse engineering instruction set architecture
  • Part 4: Crafting ELF
  • Part 5: Porting objdump
  • Part 6: State of the reverse engineering
  • Part 7: radare2 plugin for easier reverse engineering of OpenRISC 1000 (or1k)

In the first part, I identified two main problems for further development. First one is unidentified checksum, appended to the end of firmware image. Second one is unknown LZSS-like compression algorithm, used to compress machine code of application processor’s firmware.

Encoder firmware

The thing that till now was more or less unexplored is encoder firmware. LKV373A consists of two processors – application processor using the firmware analyzed in first part and encoder, which reversing I am going to push forward with this article.

Target frmware that I am working on is called LKV373A_TX_V3.0c_d_20161116_bin.bin and is obtainable from danman’s firmware collection.

At first, encoder firmware looks completely different than ITEPKG. The latter was completely structured. This one starts with some data fields, then there is big block of randomly looking data, interlaced with some strings. At the end is familiar SMEDIA02 structure. And this block of “random” data is our target.

First idea I had was running binwalk with -A switch to look for some known opcodes. With no luck. But if we look closer, it seems that this in fact is some kind of machine code.

Finding target

Data region

Ok, what we know now is that in firmware image there is a region with many strings written one after another, like on the first hexdump. In another place there is quite a lot data where some of the words are similar to the other. One such fragment can be seen on another hexdump.

Presumable code region

So, we can guess, there is code region around 0x81680 and code region near 0xbbc70. Now the question is: how to prove it.

Before proving our hypothesis, one thing is worth noting. If we are right and bytes here are really machine code, then we can be almost sure that instructions are always (or almost always) encoded into 32-bit numbers. That fact is very useful, when we will try to find candidate architectures to test against our characteristics.

How to prove it?

Fortunately, we can see one interesting candidate string. There is some debug message marked in red on data region. If our guess is valid, we should find some reference to it and moreover we can expect there would be only one reference to that particular string.

But now, there is another problem. How to find reference to string, saved somewhere in memory, at offset we have no chance to guess? Now the experience with any machine code might be useful. Usually assembly mnemonics are translated more or less in a form they are written by programmer. So, if we have hypothetical instruction:

add $1,$2+0x1234

Chances are it will be translated to something like:

0x21 0x01 0x02 0x12 0x34

Where lengths of any of these fields and maximum offset possible to write depends on particular architecture.

We can make another assumption. Usually if persistent memory (like EPROM) finally is mapped to operational memory, usually nobody designs device the way, where start of some section is not at address, padded to i.e. page size. So, if we are lucky final address of string in memory should have same least significant bits as our firmware image.

Then, connecting the dots, we can try searching firmware for let’s say two least significant bytes of string offset (0xbcc8). One more remark here: we still don’t know endianness of the processor. And what is worse, after reading firmware format description we don’t know it even more. So we have to check both variants.

Of course, it might happen that in case of different firmware for completely different architecture, all that might fail. There is only one advice to succeed with such analysis: be creative!

Hit found!

The first impression is that I was wrong. There are in fact 5 hits. But… If we look at general firmware structure, we can see that most of the hits are inside SMEDIA container, which as I described in previous post about reverse engineering LKV373A consists of mostly compressed data, and unlikely has any uncompressed code. So, success! We have only one hit inside area we suspected to be code.

By the way we know more than the area is in fact code section. We see that numbers in machine code are stored as big endian. And as we can, we can use 16 bits as offset here. That further narrows number of possible architectures. For instance, popular ARM architecture is little endian, so it is not likely to be used in this case.

I’ll leave as an exercise verification that we are right on another strings present in this firmware.

Further guesswork

Next thing that might ease things a bit is finding out what is a whole format of instruction, that is:

  • how many bits are used for opcode?
  • how many operands we can use?
  • how many bits encode an operand?

When I was doing the work I was not experienced enough to see it from plain word we already have. Then my approach was to guess meaning of 0xA8 opcode we see. As we know that it points to string in possibly write-protected memory, we should not expect it to perform any arithmetic or logic operations. It is rather likely to get pointer of string and store somewhere. Since architecture gives us only 32 bits to use per instruction it is not likely to copy memory somewhere. For me most probable operation is computation of string address, based on some base register and storage in another register. Therefore, I will call the operation: “Load Address” (abbreviated “la“).

Because that information alone does not give us any more hints, but can be used to match our mysterious architecture to one of the known ones, it is high time to do even more guesswork.

Specification matching

Wikipedia has quite useful page, that might help us. But, at first it is good to check some popular ones to see if they match.

As an example, I will use MIPS architecture. It is often used in embedded systems and was quite popular in routers, especially in past, when they were not running Linux. Furthermore it is the most popular architecture that might be big endian, so seems like perfect first check.

Now, we can find some hints on Wikibooks. On MIPS there is instruction format that might match our characteristics.

I Format
opcode rs rt imm
6 bits 5 bits 5 bits 16 bits

After search in MIPS manual (its name is “MIPS32™ Architecture For Programmers Volume II: The MIPS32™ Instruction Set” and is probably no more available on original source, but rather some random sites, so no link here), we can see that instruction that has first 6 bits of a word matching our target (0b101010) is “Store Word Left” (swl), so it is not really what we expected.

Now, we have to move on and check another architecture, and another, and another, until we decide we can’t find anything matching. Or finally we will find something. In my case the result is fail. I checked:

  • MIPS
  • ARM
  • ARC
  • RISC-V
  • PowerPC
  • SPARC
  • MicroBlaze
  • Xtensa
  • IBM S/390
  • Motorola 68k
  • DLX
  • Mico32
  • LEON
  • OpenRISC
  • NIOS II
  • m32r

And nothing matched. So architecture here is something really uncommon. Because the device is probably performing HDMI signal processing beside being normal processor, it is likely it is in fact soft processor, programmed into some FPGA. Therefore it is likely, finding the architecture documentation will be impossible.

Decoding instruction format

Then we can try another approach. Let’s play a little bit and try to decode rest of instruction using above MIPS I format. First 6 bits means instruction opcode (0x2A or 0b101010), next are two 5-bit fields meaning destination and source register, so we have register 3 for destination and register 3 for source. That definitely makes sense!

So, our hypothesis is that we have 6-bit opcode, 5-bit operands and 16-bit offset encoding. Here, again proving that fact is left as an exercise for the reader.

Going back to our target instruction, we now know that it is executing code like that:

$3 = $3 + 0xbcc8

We will denote this instruction as:

la $3,$3+0xbcc8

What we know?

Now, to sum things up, we learned following things about device’s architecture:

  • 32-bit (4-byte) static instruction length
  • big endian
  • 6-bit opcodes (maximum number of opcodes is 0x40)
  • 5-bit operands (32 general purpose registers available)
  • indirect addressing of up to 65535 bytes (0xffff)

What next?

As we know quite a lot information about the architecture, we can choose one of the two ways: try to find out what is the name of the architecture and find its documentation (not likely to succeed) or reverse engineer as much instruction as we can. My choice probably will be the latter and if I will succeed in pushing knowledge forward, I will try to describe the process in next article in series.

Posted in Reversing LKV373A, TutorialsTagged assembly, English, hacking, Reverse EngineeringLeave a comment

[Import]LKV373A HDMI to Ethernet converter: firmware image format

Posted on September 4, 2017 - October 17, 2019 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 19th August 2017.

This article is part of series about reverse-engineering LKV373A HDMI extender. Other parts are available at:

  • Part 1: Firmware image format
  • Part 2: Identifying processor architecture
  • Part 3: Reverse engineering instruction set architecture
  • Part 4: Crafting ELF
  • Part 5: Porting objdump
  • Part 6: State of the reverse engineering
  • Part 7: radare2 plugin for easier reverse engineering of OpenRISC 1000 (or1k)

Recently, I bought LKV373A which is advertised as HDMI extender through Cat5e/Cat6 cable. In fact it is quite cheap HDMI to UDP converter. Unfortunately its inner workings are still more or less unknown. Moreover by default it is transmitting 720p video and does not do HDCP unpacking, which is a pity, because it is not possible to capture signal from devices like cable/satellite TV STB devices. That is why I started some preparations to reverse engineer the thing.

Fortunately a few people were interested by the topic before (especially danman, who discovered second purpose for the device). To make things easier, I am gathering everything what is already known about the device. For that purpose I created project on Github, which is to be served as device’s wiki. Meanwhile I was also able to learn, how more or less firmware container is constructed. This should allow everyone to create custom firmware images as soon as one or two unknowns will be solved.

First one is method for creation of suspected checksum at the very end of firmware image. This would allow to make modifications to filesystem. Other thing is compression algorithm used to compress the program. For now, it should be possible to dissect the firmware into few separate fragments. Below I will describe what I already know about the firmware format.

ITEPKG

ITEPKG format (container content discarded)

Whole image starts with magic bytes ITEPKG, so this is how I call outer container of the image. It allows to store data of few different formats. Most important is denoted by 0x03 type. It stores another data container, that is almost certainly storing machine code for bootloader, and another entity of same type that stores main OS code. This type is also probably storing memory address at which content will be stored after uploading to device. Second important entity is denoted by type 0x06 and means regular file. It is then stored internally on FAT12 partition on SPI flash. There is also directory entry (0x05), that together with files creates complete partition.

SMEDIA

SMEDIA container (header truncated)

Another data container mentioned on previous section is identifiable by magic SMEDIA. It consists of two main parts. Their lengths are stored at the very beginning of the header. First one is some kind of header and contains unknown data. Good news is that it is uncompressed. Second one is another container. Now the bad news is that it contains compressed data chunks.

SMAZ

SMAZ container

This container’s function is to split data into chunks. One chunk has probably maximum length of 0x40000 bytes (uncompressed). Unfortunately after splitting, they are compressed using unknown algorithm, behaving similarly to LZSS and I have some previous experience with variant of LZSS, so if I say so it is very likely that it is true 🙂 . As for now, I reached the wall, but I hope, I’m gonna break it some time soon. Stay tuned!

Posted in Reversing LKV373A, UncategorizedTagged English, hacking, hardware, Reverse EngineeringLeave a comment

[Import]Understanding JCOP: pre-personalization

Posted on September 4, 2017 - September 10, 2017 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 25th July 2017.

As I promised some time ago, now I am going to describe process of pre-personalization of a JCOP card. JCOP is one of the easier to get JavaCard-compatible cards. However they cost a bit. The problem with the ones available from eBay sellers is lack of pre-personalization. Ok, there are some advantages of buying not pre-personalizaed card, like ability to set most of its parameters, but by the way it is quite easy to make such card unusable.

Online resources, as I mentioned in the first part of the tutorial, are not very descriptive. They say that there is such thing like pre-personalization and it has to be done before using the card, flashing applets, using them and so on. There is only one source that helps a little bit. Someone has written script for the process. However there are two problem with the script. The first one is that it is written in some custom language and internet does not know about the interpreter, it is probably something provided by NXP – manufacturer of JCOP for its customers and neither me nor (probably) you, reader, are their customers. The consequence is that we can have script in custom language, with commands like ‘/select’ or ‘/send’. Fortunately, documentation of ISO 7816 (smart card connection), allows to decipher this. So this problem could be finally solved. Another problem is lack of command values and addresses in memory, so we do not know where and how to read/write/execute anything. After really deep search in Google, finally, I was able to find out all the missing values, so this tutorial could be written.

Process overview

Ok, after this way too long historical introduction, let’s see what will be needed. I assume, you are already able to communicate with your card using raw PDUs. If you don’t, up to this point there are quite a few resources to learn from, so I will not describe this. The most important thing here is to have so called transport key (KT). If you do not have it, go get it now. Seller should provide it to you, and if he did not, you are stuck, since the first step requires this key.

So, basically steps will be as follows:

  1. Select root applet with Transport Key
  2. Boot the card
  3. Read/write some data
  4. Protect the card
  5. Fuse it

Easy? Easy. But only if you know some hex numbers. Ok, here, one big WARNING: the last step is irreversible and can be done by mistake quite easily, so think twice before sending anything, and if you are sure, that you are done, think twice again, before issuing it.

Pre-personalization, step by step

At first, we use Transport Key to select proper applet. Format of SELECT command is as below:

CLA=00 INS=A4 P1=04 P2=00 Lc=10 (...)

Where CLA is always zero, INS means SELECT, P1, according to ISO7816 means selection by DF name and Lc is length of KT. After that, key have to be appended. Of course, whole APDU is to be given to communications program as binary values or hex values only.

What now follows is specific to NXP cards only and is mostly undocumented publicly. First of such commands is BOOT command. Its format is as follows:

CLA=00 INS=F0 P1=00 P2=00

Now double care have to taken, because FUSE command should be available after this point and its APDU consists only of zeros, so every mistake might make the card unusable, since security keys are generated randomly for each card.

Reading memory

Now the most important values to read are called CM_KEY and GPIN in memory dump, I shared in the previous post on the topic. First one starts at offsets: 0xc00305, 0xc00321 and 0xc0033d and are 0x10 bytes long. The other one can be found at offset 0xc00412 and by default should be 5 bytes long. However maximum length is also 0x10, so it is better to make sure the length is really 5 by reading byte at offset 0xc00407. To sum up following commands need to be issued and results be saved for future use:

CLA=C0 INS=B0 P1=03 P2=05 Lc=10
C0 B0 03 21 10
C0 B0 03 3D 10
C0 B0 04 07 01
C0 B0 04 12 xx

Where CLA + P1 + P2 is concatenated address of memory area to read, INS=B0 is read command and Lc contains length of data to read.

Writing data

Alternatively, it is possible to write custom values to these buffers. This is especially encouraged for users who want to use the card not only for testing. Overwriting the values could be done with following:

CLA=C0 INS=D6 P1=03 P2=05 Lc=10 (...)
C0 D6 03 21 10 (...)
C0 D6 03 3D 10 (...)
C0 D6 04 12 05 (...)

Where user data is filled with some random data of length in Lc field.

Required values

Beside securing keys, it is required to set CM_LIFECYCLE value to 0x01 and make sure all fields related to keys and PIN have proper values. Here, my memory dump can be used as reference, since I initialized the card before dumping the memory.

Finishing

After setting all the fields to desired values, there are two more steps to do. First one is issuing PROTECT command. It looks as below:

CLA=00 INS=10 P1=00 P2=00

And finally, sending FUSE command with:

CLA=00 INS=00 P1=00 P2=00

Here again, remember, that this command cannot be undone!

Well done! Your card should now be pre-personalized and ready to use, even in production environment. At the end, one remark: probably FUSE command does not need to be issued at all. However, if it is not issued, the card is completely insecure and should not be used in production.

Previous part of this tutorial can found under this link.

Posted in Tutorials, Understanding JCOPTagged electronics, English, hacking, hardware, JavaCard, JCOP, Reverse Engineering, smart card1 Comment

[Import]Understanding JCOP: memory dump

Posted on September 4, 2017 - September 10, 2017 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 8th February 2017.

Some time ago I was struggling with JCOP smart card. The one I received as it turned out was not pre-personalized, which means some interesting features (like setting encryption keys and PIN) was still unlocked. Because documentation and all the usual helpers (StackOverflow) were not very useful (well, ok, there was no publicly available documentation at all), I started very deep search on Google, which finished with full success. I was able to make dump of whole memory available during pre-personalization.

Since it is not something that could be found online, here you have screenshot of it, colored a bit with help of my hdcb program. Without documentation it might not be very useful, but in some emergency situation, maybe somebody will need it.

JCOP memory dump made at the very beginning of pre-personalization

Small explanation: first address, I was able to read was 0xC000F0, first address with read error after configuration area was 0xC09600. I know that, despite of lack of privileges some data is placed there.

There are three configurations: cold start (0xc00123-0xc00145), warm start (0xc00146-0xc00168) and contactless (0xc00169-at least 0xc0016f). Description of coding of the individual fields is outside of the scope of this article. I hope, I will describe them in future.

Next time, I will try to describe the process of pre-personalization, that is making not pre-personalized card, easy to get from usual sources of cheap electronics, able to receive and run applets.

Update: Next part of this tutorial can be found under this link.

Posted in Tutorials, Understanding JCOPTagged electronics, English, hacking, hardware, JavaCard, JCOP, Reverse Engineering, smart cardLeave a comment

[Import]HDCB – new way of analysing binary files under Linux

Posted on September 4, 2017 - September 7, 2017 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 10th February 2016.

As any observer of my projects spotted, most of the biggest projects I do involves binary file analysis. Currently I am working on another one that requires such analysis.
Unfortunately such analysis is not an easy task and anything that will ease this or speed it up is appreciated. Of course there are some tools that will make it a bit easier. One of them is hexdump. Even IDA Pro can make it easier a bit. Despite them I always felt that something is missing here. When creating xSDM and delz utils, I was using hexdump output with LibreOffice document to mark different structure members with different colors. It worked, but selecting 100-byte buffer line by line was just wasting precious time.

Solution

SDC file analyzed by HDCB script

So why not automate whole process? What we really need here is just hexdump output and terminal emulator with color support. And that’s why I’ve made HDCB – HexDump Coloring Book. Basically it is just extension to bash scripting language. Goal was to create simple script that will hide as much of its internals from end-user and allow user to just start it will his shell using old good ./scriptname.ext and that’s it. HDCB is masked as if it was standalone scripting language. It uses shebang, known from bash or python scripts to let user shell know what interpreter to use (#!/usr/bin/env hdcb). Those who are python programmers should recognize usage of env binary.

In fact it is just simple extension to bash language, so users are still able to utilize any features available in bash. Main extensions are two commands: one (define) for defining variables and the other (use) for defining field or array of that defined type. Such scripts should be started with just one argument – file that is meant to be hexdumped and analyzed.

Internals

Bash scripts are just some kind of a cover of real program. Main core of the program is colour utility. It gets unlimited number of parameters grouped in groups of four. They are in order: offset of byte being colored, length of the field, background and foreground colors. As standard input, hexdump output (in fact only hexdump -C or hexdump -Cv are supported) is provided. Program colors the hexdump with rules provided as arguments. This architecture allows clever hacker to build that cover mentioned in virtually any programming language.

Downloads, documentation and more

As usual, program is available on my Github profile. Sources are provided on GPLv3 license so you are free to contribute to the project and you are strongly encouraged to do so or make proposals of a new functions. Program is meant to be expanded according to my future needs, but I will try to implement any good idea. Whole documentation, installation instructions and the like are also available on Github.

Posted in UncategorizedTagged English, hdcb, Linux, Reverse EngineeringLeave a comment

[Import]SDC file format description – Errata

Posted on September 4, 2017 - September 7, 2017 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 6th November 2015.

Last year, I published a program for Microsoft Dreamspark’s SDC file decryption. Soon after that I wrote article about SDC file format and its analysis. Now it’s time to complete the description with newest data.

This article wouldn’t be written if not the contribution of GitHub’s user @halorhhr who spotted multi-file SDC container and let me know on project’s page. Thanks!

When writing that post year ago, I had no idea what multi-file container really looks like. Any suspicions could not then be confirmed, because it seemed that these files simply where not used in the wild. A days ago situation changed. I got a working sample of multi-file container so I was able to start its analysis.

Real container format

SDC files with different signatures

After quick analysis, I knew that I was wrong with my suspicions. Filename length and encrypted filename strings are not part of a file description. In fact they are placed after them and filename is concatenated string of all filenames (including trailing null-byte). So to sum up filename of n-th element starts at file[n].filename_offset and ends just like any other c-style string.

Whole header structure is like on the sample header on the right. Note that all headers beside 0xb3 one has been already decrypted for readability. In real header the only unencrypted field is header size at the very beginning of the file. 0xb3 sample has unencrypted header and header size is not present in a file. However file name is encrypted in some way, I haven’t figured out as of now. Encryption method is blowfish-compat (the difference between this and blowfish is ciphertext endiannes). Filenames are encrypted once again.

After header, all other data is XORed using key from EDV string and then deflated, so before reading them, you have to inflate and XOR again. Format of data in 0xb3 version is still unknown, however analysis of compressed and file size hints that it may be stored the same way. It is important to note that depending on file signature different configuration of deflater may be needed. It is now known that files older than 0xd1 header, which appears to be newest (because only this one supports files greater than 4 GiB) need to have deflater initialized with

inflateInit2_(&stream,-15,ZLIB_VERSION,(int)sizeof(z_stream));

or equivalent.

Unknowns

This errata does not contain all information needed to support all variations of SDC files. Beside unknowns I mentioned above, there is another variation that uses 0xc4 signature and which I had no sample of. The only trace of its existence is condition in SDM code. Because of that I cannot write support for that type of file. There is also possibility of multi-file containers having 0xb5 or 0xb3 signatures existence. That type of files seems to appear only lately, but it is probable that it existed in the past. Because of having no samples of them I cannot verify that xSDM properly handles them.

So if you have sample of any of variations mentioned here, just send them to me at my email address: v3l0c1r4pt0r at gmail dot com or if you suppose it may be illegal in your country, just send me SDX link or any other hint for me how can I find them.

Other way?

Few days ago, after I started writing this post Github user @adiantek let me know in issues that there is a method to obey SDM in Dreamspark download process. To download plain, unencrypted file you just have to replace ‘dfc=1‘ to ‘sdm=0‘ in a link Dreamspark provided in SDX file. If it true that it works in every file Dreamspark provides, my xSDM project would be obsolete now. However, because Microsoft’s intentions when creating this backdoor (it seems to be created just for debugging) are unknown, I will continue to support the project and fix any future bugs I will be aware of. But now it seems that this project will start to be just proof of concept for curious hackers and will start to slowly die.

Nevertheless, if you have something that might help me or anyone who may be interested in SDC format in future, just let me know somehow, so it will be available somewhere on the internet.

Posted in UncategorizedTagged Dreamspark, English, Reverse Engineering, SDC, SDMLeave a comment

[Import]Decoding Aztec code from polish vehicle registration certificate

Posted on September 4, 2017 - September 7, 2017 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 1st August 2015.

About a year ago I interested in mysterious 2D code placed in my car’s registration certificate. After quick research on Google it turned out to be even more mysterious because nobody knew how to decode it. There was even no official document like act or regulation that describes the code somehow. People knew that the code is Aztec code and that’s it. Some companies shared web and Android apps to decode this. And all of them was sending base64 to some server and receive decoded data.

Of course for me it wasn’t rewarding so I started my research on it. After initially scanning the code I’ve seen long string that I immediately recognized as base64. The real fun started after that, because stream I’ve got after that was so strange that at first I had no idea what to do. Upon closer examination it was clear that this data is not damaged but encoded in somewhat strange way. Few days later I was almost sure that this is not encoding but rather compression, because some unique parts of stream was easily readable by human. About a month of learning about compression, looking for even most exotic decompression tools and I was left with almost nothing. I had only weak guess on how decompression parameters could be encoded. I gave up…

Polish vehicle registration certificate (source: pwpw.pl)

About a year later I tried one more time. This time I was a bit more lucky. I found a program that decodes the code. Again. But this time was different. I shut down my network connection to make sure. And it worked! So now a bit of reverse engineering and it’s done. I will skip any details because I do not want to piss off the company which created this, even though I was right and I HAD right to do this.

As usual the source code is available on my Github profile. There is also a bit more information about whole scanning/decoding process. If you like to know more technical details about the algorithm or how to decode the data, everything can be found in README file in the repo.

Posted in UncategorizedTagged Aztec, English, Reverse Engineering13 Comments

[Import]SDC file format description and security analysis of SDM

Posted on September 4, 2017 - September 7, 2017 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 22nd June 2014.

As promised in my previous post I’m publishing description of Microsoft’s SDC file format. At the beginning I’d like to explain what SDC file is. SDC is the abbreviation of Secure Download Cabinet/Secure Digital Container. It is used by Microsoft in its Dreamspark program (formerly MSDNAA). Theoretically it is secure container that can be sent using Internet without additional encryption and it should prevent its content from being read by any third party. But that’s theory, let’s look at how it works in practice.

Overview

Firstly let’s look at the packing process. Let’s say we are in Microsoft and we want to “secure” some data. We got some file (or possibly few files) ie. Windows ISO. Next we generate some random number and write it down somewhere. Now we use least significant byte of that number to do XOR on EVERY single byte of that file. Now it may be considered secure 🙂 But some day Microsoft realized it isn’t enough. So what did they do? They used deflate (it is compression method used ie. in zip, gzip). Actually there are two versions of the deflate: one with all headers necessary to realize method of compression by using a tool like binwalk and the other that haven’t any header. Now it is time to combine all the files we have in one. Of course we still need to know some information about them (ie. their size before/after compression, file name). After concatenation we need to count CRC of all the data we have as of now. And finally we need to build a file header. At first we need to write header size. Then starts actual header. It is important because here starts region that will be encrypted. Here is some info about the header itself and then about each file. It is possibly padded with random data (don’t know for sure). Now we need two random 32-byte keys consisting of printable characters. We use first to encrypt filenames and the second to encrypt whole header (beside its size). Finally we concatenate header with the rest and here we have SDC file.

Header format

Sample header

So, we have basic overview on the format, now let’s look at the details. You think it isn’t secure, huh? It would be worse. On the right you can see example header after decryption. First four bytes determine size of the header counting from the next byte. After that we have area encrypted using Blowfish (sometimes referred blowfish-compat) with ECB mode (Electronic CodeBook) using the key stored in edv variable of webpage linked from SDX file. In that area we have 3 dwords describing the header itself. First is header signature. It can be one of the following values: 0xb4, 0xb5, 0xc4, 0xd1. All I know now is that the one with sig = 0xd1 can store files larger than 4 GiB. The next value is interesting one. It looks like it is used to “encrypt” file name in memory so that the static analysis would result in “not found”. As in other cases it is “very advanced encryption”, the same situation as with the whole file: get all the buffer, iterate through it and XOR with the value’s LSB. I have to admit that this one is even does the job. Now we have something called header size. Actually it is probably number of files packed in the container. While reversing I concluded that SDM iterates from 0 to that number, and while this it is reading 0x38 bytes from file. Next it is probably reading fileNameLength and fileName, so whole header must be in format:

<size><description><0x38-bytes-of-file-description><fileNameLength><fileName><0x38...>

and so on until we reach headerSize. Then we have a lot of values not necessary to unpack the file. First of them is offset of file name. While its value is usually 0 (at least in newer headers with blowfish encryption) it is still probably possible to encounter a file with this value greater than zero. If that happened the first thing to do is probably decrypt filename and then move pointer this amount of bytes right. Next value describes file attributes. In fact I didn’t bother about what bit means what attribute, but I suppose it is the same map as in FAT (see my libfatdino library). The next three values are timestamps (creation, access and modification). They all are in Windows 64-bit format called “file time” used for instance by .NET Framework’s DateTime class (DateTime.FromFileTime method; they are number of 100-nanosecond ticks that elapsed since epoch at 1st January 1601 midnight and I suppose that this value is unsigned). That format is very interesting in comparison with another approach of saving date on 64-bit value used on Linux. UNIX timestamp traditionally uses 1st January of 1970 as its epoch and there is usually signed value in use. It isn’t as precise as Windows (counts only seconds) but its end is about 300 billion (10^9) years in future and since it is signed, in past too. Comparing to that Windows’ date will wrap about year 60000 A.C. and cannot store any date before 1601. I know that is still unreachable (like 4 billion computers in 80’s 🙂 but good to know:) After that we have size of the compressed file (be beware of the difference between 64-bit variant and 32-bit one). When we have container with only one file the equation

compressedSize + headerSize + 4 == sdcSize

should always be true. The next one is uncompressed size of the file which can be used to check if the file has been downloaded entirely. After that there is boolean that indicates if file is inflated (compressed), another one-byte value that is probably reserved for future use, one-word padding, which is also interesting because it looks like it contains random numbers (really?). And after that more padding (this time empty) after which we have size of the file name. It may be a bit tricky because the size we have here is the size AFTER decryption and blowfish demands its output to have length divisible by 8. So to decrypt it we need to count next divisor of 8. File name is encrypted using the same method as the header itself and the second key from edv.

Decryption key

Now something more about the keystring (edv). Its format is:

<crc>^^<fileNameKey><headerKey><xorKey>

where:

  • <crc> is a checksum of whole data area of a file (everything beside header size and header)
  • <fileNameKey> is the key used to encrypt file names
  • <headerKey> is the key used to encrypt whole header
  • <xorKey> is the key used to “encrypt” the files

Security of the whole program

People who are familiar with security should already know how insecure is the SDM. For others I have short description.

  1. At first the files itself AREN’T ENCRYPTED in any way. They are only XORed using one byte long key. XOR itself is very weak protection, even with extremely long key. It is due to the fact that many file formats have some of their bytes predictable (this concerns EXEs, ISOs and ZIPs and these are the formats most frequent on Dreamspark). That predictable bytes are usually the beginnings (headers) which usually have so called magic bytes to easily identify file format. So when we know what byte we expect we could try to XOR that byte with actual byte and it is very probable that we get the “encryption” key.
  2. Deflate which is used to hide this patterns from the end user is just compression method. We don’t need anything special to decompress this data.
  3. ECB which is used as blowfish encryption mode is the most insecure mode of block ciphers. It can cause some parts of data to be revealed without actual decryption (see: Wikipedia).
  4. All the data SDM downloads/sends from/to Microsoft’s servers are UNENCRYPTED. Everything: request from the user, SDC itself and decryption keys are all plaintext so with knowledge how SDC looks we can decrypt the file even when it is not intended for us, but we are only in the middle of its road. Furthermore malicious node is able to modify the file on the fly and i.e. put a backdoor into the file, for instance Windows image.

Conclusion

For all the above reasons Secure Download Manager cannot be called a software for securely downloading the files from Microsoft’s servers. All the users using this are the same way INSECURE as users downloading i.e. their copy of Windows from warez sites. Both are susceptible to MITM attacks.

So we still don’t know the answer: why Microsoft is using dedicated software to share their software. The only answer I have is that it is just for making user’s not using Microsoft’s operating system life difficult. In place of decision-making people like the ones in European Commission I would think if this policy is not intended to be only to keep Microsoft’s monopoly for operating system.

Update 20.07.2014

Description updated thanks to GMMan and his great work on reverse engineering the whole program. He also reminded me about older variants of SDC files. I have currently sample(s) of files with 0xb3, 0xb5 and 0xd1 signatures. I know at the moment that there are also signatures 0xa9, 0xb2, 0xb4, 0xc4 and it is possible that they still are reachable through Dreamspark. It is also likely that Microsoft (or Kivuto on Microsoft’s order) will create new format so if you have a sample of file with different header, please let me know in comments!

Posted in UncategorizedTagged cryptography, Dreamspark, English, Reverse Engineering, SDC, SDM1 Comment

[Import]How to bypass Secure Download Manager while downloading from Dreamspark

Posted on September 4, 2017 - December 11, 2017 by Kamil (aka. v3l0c1r4pt0r)

NOTE: This post was imported from my previous blog – v3l0c1r4pt0r.tk. It was originally published on 1st June 2014.

NOTE2: As people are reporting, THIS METHOD DOES NOT WORK anymore. Also I don’t have access to Imagine, so I would not be able to provide any help. Therefore, this article is left only for historic purposes, or for those that have some SDC files downloaded, when it was still valid and have valid decryption keys.

About a month or so ago I had an urgent need to download a copy of Microsoft Windows from Dreamspark. Unfortunately I haven’t Windows installed then so had to do this using Linux. After successful transaction I was given a link to SDX file and program called SDM. It looked that it would be easy. But it wasn’t. Program that I was encouraged to download was archive with .pkg extension. As I discovered few minutes later it was OS X application package. So the next step was to try to download Windows version and try to execute it with help of Wine. It failed. Then I tried to find some tips on the Net. I found a few other people having the same problem. Some of them could download using Wine and some not. For me there was only one solution: do it myself. As you probably guessed that way was a (almost) full success.

Solution

If you already have SDC file please do not skip since you probably still don’t have a key needed to unpack the file. The first step is to open SDX file in your favorite text editor. You will see a link. Open it in a web browser. Now you need to get to page source. The way it can be done depends on your web browser. Now we need to find few strange values in the code. The easiest way to achieve this is to search for keyword ‘edv*’ where * is the number of file you want to download counting from 1 (they are on ‘Items’ list on page you opened). Now you need to copy somewhere values of the following variables: ‘oiopu*’, ‘oiop*’, ‘fileID*’ (*-see above). The last one we need is ‘dlSelect*’ but for that one you need to search cause it is in a different place. Now you can build URL that will let you to file containing two interesting values: file URL and decryption string. This URL’s format is:

http://[SDXdomain]/WebStore/Account/SDMAuthorize.ashx?oiopu=[oiopu]&f=[fileID]&oiop=[oiop]&dl=[dlSelect]

Now you should see XML file that looks similar to this:

<information>
 <oiopua>01234abcd-0123-4567-890a-0123456789ab</oiopua>
 <edv>0123456789^^0123456789QwErTyUiOpAsDfGhJkLzXc0123456789QwErTyUiOpAsDfGhJkLzXc12345678</edv>
 <linkAvailable>1</linkAvailable>
 <errorTextKey/>
 <invokeExternalDownload>0</invokeExternalDownload>
 <fileUrl><![CDATA[http://software.dreamspark.com/dreamspark/ENGLISH/SDCfileName.sdc]]></fileUrl>
</information>

The last step here will be downloading file from fileUrl and saving edv value in file. The important thing is that the file with a key should be named exactly as SDC file with addition of ‘.key’ suffix.

Update:

I’ve just discovered that things are getting a bit different when the file size exceeds 2.0 GB. In that case Dreamspark is splitting file in two or more files. That situation could be easily recognized, because sdc file name’s suffix is: ‘.01.sdc’. In that case you need to try to download file which URL differs by only that one digit, ie. ‘.01.sdc’, ‘.02.sdc’, ‘.03.sdc’. When you encounter last file it should have smaller size than the rest and incrementing that number by one should give you BlobNotFound error.

After downloading all the files they just need to be joined into one. It can be easily achieved with dd, ie.

dd if=pl_windows_7_professional_with_sp1_x64_dvd_u_676944.02.sdc >> pl_windows_7_professional_with_sp1_x64_dvd_u_676944.01.sdc

and then optionally

dd if=pl_windows_7_professional_with_sp1_x64_dvd_u_676944.03.sdc >> pl_windows_7_professional_with_sp1_x64_dvd_u_676944.01.sdc

After that you will get sdc file prepared to unpack.

Unpacking SDC

Now since you have SDC file you can start unpacking it. The previous part was, at least for me, very easy. The problem started when I tried to discover how the file is stored in that container. But don’t worry, I’ve written simple program to do it for you. As of now (1st June) it is still in really early alpha stage and have lot of constraints. It is able to unpack containers that contains only one file packed, doesn’t create any directories, cannot verify file’s checksum and probably few other problems I don’t remember or don’t know about.

If you were searching a bit in the Internet, you probably found out that someone cracked that container in the past. Unfortunately Microsoft changed format since then. It is also possible that in response to this article it will be changed again. To make it a bit harder for them to block my software I’m publishing source code on github and after the process of reverse engineering is finished will write second article describing how things works under the hood and describe sdc file format.

But let’s get back to unpacking. Now you need to download xSDM from github. The newest version can be downloaded by typing

git clone https://github.com/v3l0c1r4pt0r/xSDM.git

in your terminal (of course you need to have git installed). Nevertheless I advice you to download newest tagged release. You can do this by clicking on releases on project page and then choosing the one on the top (or first beta/stable if any) and clicking on “tar.gz”. tar.gz can be unpacked by typing

tar -zxvf xSDM-[tag-name].tar.gz

into console. Then get into xSDM directory by typing

cd xSDM

(or your release directory) and compile the program by standard

./configure
make
make install

where installation is optional. Now to unpack your file you just need to type

src/xsdm [path-to-your-sdc-file]

And that’s it, you should now be able to open file you downloaded. As mentioned above the program is in very early alpha so I cannot guarantee that it will work in any case. If you will encounter any problems feel free to open issue on project page at github.

Posted in TutorialsTagged Dreamspark, English, Linux, Reverse Engineering, SDC, SDM, Windows44 Comments

Posts navigation

Newer posts

Tags

Aero Android assembly C CAN can-hacking cmake Delphi Dreamspark electronics English FAT FAT32 FM Gingerbread GNU Radio GRC hacking hardware JavaCard JCOP kanał 14 kernel library Linux pinout PKI polski programming Python radio Reverse Engineering RTL-SDR SDC SDM SDR smart card software tor tty UART wifi Windows X.509 Xperia Pro

Recent Posts

  • Reading and programming 93Cx6 EEPROM with Digispark
  • Busybox-based Linux distro from scratch
  • Meet CC Factory – a factory for cross compilers
  • Peugeot 407 rain sensor pinout
  • OpenRISC 1000 support integrated into radare2

Recent Comments

  • rocha on Playing with GF-07 GPS device
  • Alex B on UART pinout for noname spy camera
  • Alex B on UART pinout for noname spy camera
  • MI-7 on Playing with GF-07 GPS device
  • Kamil (aka. v3l0c1r4pt0r) on Mounting encrypted Android emulator image

Categories

  • News
  • Random
  • Reversing LKV373A
  • Setting up new v3 Hidden Service with ultimate security
  • Tutorials
  • Uncategorized
  • Understanding JCOP

Links

  • Me @ github
  • LKV373A Wiki
  • DevTomek

Archives

  • December 2020
  • November 2020
  • October 2020
  • August 2020
  • December 2019
  • November 2019
  • October 2019
  • August 2019
  • July 2019
  • February 2019
  • November 2018
  • October 2018
  • June 2018
  • May 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • September 2017

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org
Proudly powered by WordPress | Theme: micro, developed by DevriX.