Introduction
The goal of this post is to provide the basics of WebAssembly (abbreviated “Wasm”), so that the next time you encounter it, you will be able to understand it and test it.
I will not cover all the available instructions as documentation exists for that (see a bit below), but I will try to give you the keys to understand any Wasm code.
Documentation
The documentation for all the instructions in Wasm is available here: https://github.com/sunfishcode/wasm-reference-manual/blob/master/WebAssembly.md
Moreover, a good picture of how to write Wasm code can be found here:
https://learnxinyminutes.com/docs/wasm/
If there is documentation, why this post?
It’s simple:
- Not everybody knows the basics of reading assembly code, and there are few tools for handling Wasm
- The mentioned documentation (which is the official one) is, in my opinion, not the most comprehensive one.
So the goal is to address these two points.
Disclaimer
I’m not a professional reverse engineer, nor a Wasm expert. This post is the sum of the knowledge I acquired while tinkering with Wasm, which is, in my opinion, a very interesting technology.
That being said, you now know that you absolutely won’t become a Wasm master just by reading this. If you are still interested after reading, I encourage you to create your own lab and experiment on your own.
What are the required tools?
In fact, in this post, I will use only one tool that most of us already have on our computers: Google Chrome.
I won’t speak about Firefox, because I ran into various bugs with it at the time of writing.
Other tools exist but won’t be discussed in detail in this post, such as https://github.com/WebAssembly/wabt, which is very useful to quickly find specific things and do some simple checks.
Created toolkit
I created a toolkit to help the memory inspection of a Wasm application:
https://github.com/ExcelliumSA/WASMToolkit
More features are to come, but it will take some time.
Support/example
For this post, I have created a beginner-level challenge: https://github.com/ExcelliumSA/WASM-lab. It is also deployed here: https://excelliumsa.github.io/WASM-lab/.
On this challenge, the Wasm is simple to understand, and the flag is easy to get once you understand the code.
Let’s go!
Where do we start?
On arrival on the challenge’s page, you can see the following interface:
The HTML code behind the interface is the following:
We can see that a “lab.js” is loaded and that the input field is named “password”.
The sources loaded by the browser for this application are the following:
Great, we can see the “lab.js” file and other files that seem related to Wasm.
Let’s check what’s happening in the “run_ws” JavaScript function regarding the password field seen in the HTML code:
So, what’s happening inside this function? The input field “password” is retrieved and passed to a mysterious “allocateUTF8” function and the result is then passed to a “Module._checkPassword” function. The return value of this function seems to indicate whether the password is the correct one or not. That sounds good to me!
Okay, but we need to know what “allocateUTF8” and “_checkPassword” are doing.
By looking into the “lab.js” file, we can have our first clues:
We could go deeper to fully understand what’s happening, but I’ll spare you the trouble since it’s not the topic of this post. This function takes a JavaScript string and converts it to a Wasm-compatible object by allocating some space on the stack of the Wasm program and copying the content of the string to the newly allocated space. The compatible object is a pointer to the start of the allocated space.
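To make the idea concrete, here is a minimal sketch of what such a helper does. It is not the code from “lab.js”: the “memory” buffer, the “nextFree” bump pointer and the “allocateString” name are all made up for illustration, and the real helper relies on the allocator shipped with the Wasm program.

```javascript
// Hypothetical stand-in for the Wasm linear memory (64 KiB).
const memory = new Uint8Array(new ArrayBuffer(65536));
// Hypothetical bump allocator: next free address in the buffer.
let nextFree = 1024;

// Copy a JavaScript string into memory as NUL-terminated UTF-8
// and return the address (the "pointer") of its first byte.
function allocateString(str) {
  const bytes = new TextEncoder().encode(str);
  const ptr = nextFree;
  memory.set(bytes, ptr);          // copy the encoded bytes
  memory[ptr + bytes.length] = 0;  // C-style terminator
  nextFree += bytes.length + 1;    // reserve the space we just used
  return ptr;
}

const ptr = allocateString("test");
console.log(ptr, Array.from(memory.subarray(ptr, ptr + 5)));
```

The returned number is just an index into the buffer, which is why the Wasm side can treat it as a pointer.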
By searching the “_checkPassword” string in the “lab.js” file, we can find the following definition:
With the call to the createWasm function above, it seems that the “checkPassword” function is called from the Wasm…. Let’s dive together!
First steps in Wasm
First things first, how do we open the Wasm code in Chrome?
In Chrome’s developer tools, there is a “Sources” tab. In this tab, if you select “Page” in the left menu, you will find a “lab.wasm” file. That’s what we are looking for.
Now, where is the checkPassword function in this file?
It’s easy to find it when performing a Ctrl+F:
Here it is! At the top of the file. Ignore the numerous declarations of local variables at 0x001f3: this Wasm (like many others) has the bad habit of using a new variable instead of reusing an existing one. All the variables are declared at the start of the function.
Now we have to understand what’s happening in the “checkPassword” function.
One common technique used in reverse engineering to have a quick understanding of a function behavior is the focus on the calls to other functions it performs.
We are lucky, the names of the functions are available, so we won’t have to read and understand each function to understand the global picture, at least, if the names are explicit (it’s the case here, don’t worry).
The functions called in “checkPassword” are the following:
The only Wasm instruction to understand so far is “call”, which calls a function. Yeah, this one is simple.
So let’s sum up what we know from the info gathered so far: we have a “checkPassword” function which takes the content of the “password” input field as an argument, generates a SHA256 hash of something (at least, that is what we can deduce from the three “SHA256_*” calls), and finally performs a memory comparison.
We can guess without too much risk that the call to “memcmp” compares our input to the intended password.
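As a rough picture of that guess, here is a sketch of the same logic written with Node.js’s built-in “crypto” module. It is only an analogy, not the lab’s code: the secret “hunter2” and the function body are made up, and the only parts taken from the lab are the init/update/final pattern and the final byte comparison.

```javascript
const crypto = require("crypto");

// Hypothetical secret, used only to build a reference hash for this demo.
const expectedHash = crypto.createHash("sha256").update("hunter2").digest();

// Mirrors the three-stage pattern (init, update, final) followed by a comparison.
function checkPassword(input) {
  const h = crypto.createHash("sha256"); // "init"
  h.update(input);                       // "update": feed the data to hash
  const digest = h.digest();             // "final": get the 32-byte hash
  // Buffer.compare returns 0 when both buffers are identical, like memcmp.
  return Buffer.compare(digest, expectedHash) === 0;
}

console.log(checkPassword("test"));    // false
console.log(checkPassword("hunter2")); // true
```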
We now have to understand what is passed to the different functions. But to do that, we will need to understand some basic instructions.
The basics to read Wasm
If you read the code of the “checkPassword” function, you will see that each instruction is composed of several components.
In this function, most instructions can be summarized as “(global|local|i32).(get|set|const|store|add) ($var|int|offset=int)”.
I know some of the mentioned keywords are not in the code, but you will probably encounter them if you read Wasm, so I will detail them as well. There are many other possibilities for each component, but we will focus on these (see the docs for information about other instructions).
The first component indicates either a scope or a size of instruction:
- global and local are scopes:
  - global means that we are looking for resources available for the whole Wasm code
  - local means that we are looking for resources internal to the current function
- i32 is what could be called the size of the instruction: the operation after the dot is performed on 32 bits (i32 stands for 32-bit integer)
The second component is the instruction itself:
- get: pushes the content of the last component onto the stack. (Similar to a push instruction in x86 asm)
- set: pops the top element of the stack and stores it in the last component of the instruction. (Similar to a pop instruction in x86 asm)
- const: pushes the last component (a literal value) onto the stack
- store: stores a value at a desired memory address
- add: pops the top two elements of the stack, adds them, and pushes the result onto the stack
If you are lost because you don’t understand the difference between “get” and “const” or because you don’t know what value is stored by “store” and to which address, fear not, I will address these points after explaining the last component.
The last component is the argument of the instruction:
- $var: a variable
- int: an integer
- offset=int: an offset to a memory address.
“So, what’s the difference between ‘get’ and ‘const’?”
The usage. The “const” instruction is used to put a specific value (a specific number for example) on the stack, and the “get” instruction is used to put a value which is not known in advance (The content of the variable) on the stack.
“Okay, and what value is stored by the ‘store’ instruction and where is it stored?”
The “store” instruction behaves in the same way as a function call: it uses the elements on the stack. It needs two elements on the stack: the first one pushed is the memory address at which you want to write, the second is the value.
You should now have a clearer idea of how to read a Wasm function, but I want to address something. When I mention memory writing and the stack, I’m not referring to direct RAM access like in C or x86 assembly. In Wasm, the memory is just a JavaScript ArrayBuffer.
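Here is a small sketch of what an “i32.store offset=12” amounts to, modelled in plain JavaScript. The buffer size, the address 256 and the value 1234 are arbitrary, and “i32Store” is a made-up helper. The two facts it relies on are that the write goes to address + offset and that Wasm stores integers in little-endian order.

```javascript
// Hypothetical 64 KiB linear memory.
const buffer = new ArrayBuffer(65536);
const view = new DataView(buffer);

// Model of "i32.store offset=N": the value is popped first,
// then the address; the write goes to address + offset.
function i32Store(stack, offset) {
  const value = stack.pop();
  const address = stack.pop();
  view.setInt32(address + offset, value, true); // true = little-endian
}

const stack = [];
stack.push(256);  // address, pushed first
stack.push(1234); // value, pushed second
i32Store(stack, 12);

console.log(view.getInt32(268, true));                   // 1234
console.log(Array.from(new Uint8Array(buffer, 268, 4))); // [ 210, 4, 0, 0 ]
```

The byte dump shows the least significant byte first, which is worth keeping in mind when reading values in a memory inspector.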
To make sure everybody is on the same page about the reading of Wasm, I commented a part of the “checkPassword” function:
local.set $var13 ;; Pop the value on top of the stack (112) into var13
local.get $var4 ;; Push the value of the var4 variable on the stack
local.get $var13 ;; Push the value of the var13 variable on the stack
i32.add ;; Add var4 and var13
local.set $var14 ;; Put the result of the addition in var14
local.get $var14 ;; Push the result of the addition on the stack (yes, it is useless)
local.set $var15 ;; Put the result of the addition in var15
i32.const 1280 ;; Push 1280 on the stack
local.set $var16 ;; Set the value of the var16 variable to 1280
i32.const 32 ;; Push 32 on the stack
local.set $var17 ;; Set the value of the var17 variable to 32
local.get $var15 ;; Push the value of the var15 variable on the stack
local.get $var16 ;; Push the value of the var16 variable on the stack
local.get $var17 ;; Push the value of the var17 variable on the stack
call $memcmp ;; Call the memcmp function with three arguments: $var4 + 112, 1280, 32
As the code is probably generated from another language, some optimizations are possible (and you might have spotted them): var14 is useless, as the result is copied into var15, which is the only one used.
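If you want to check your reading of the excerpt, you can replay it in a toy stack machine. The sketch below is my own model, not a real Wasm interpreter: it only knows the five instructions used above, it assumes an “i32.const 112” ran just before the excerpt, it gives var4 the made-up value 1000, and it only records the arguments of the call instead of executing “memcmp”.

```javascript
// Tiny model of the Wasm stack machine, limited to the instructions above.
function run(program, locals) {
  const stack = [];
  const calls = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "i32.const": stack.push(arg); break;
      case "local.get": stack.push(locals[arg]); break;
      case "local.set": locals[arg] = stack.pop(); break;
      case "i32.add": {
        const b = stack.pop();
        const a = stack.pop();
        stack.push((a + b) | 0); // wrap to 32 bits
        break;
      }
      case "call": {
        // Assumption: the callee takes three arguments, as memcmp does.
        calls.push({ name: arg, args: stack.splice(-3) });
        break;
      }
      default: throw new Error("unsupported instruction: " + op);
    }
  }
  return { stack, calls, locals };
}

const program = [
  ["i32.const", 112], // assumed to precede the excerpt
  ["local.set", "$var13"],
  ["local.get", "$var4"], ["local.get", "$var13"], ["i32.add"],
  ["local.set", "$var14"], ["local.get", "$var14"], ["local.set", "$var15"],
  ["i32.const", 1280], ["local.set", "$var16"],
  ["i32.const", 32], ["local.set", "$var17"],
  ["local.get", "$var15"], ["local.get", "$var16"], ["local.get", "$var17"],
  ["call", "$memcmp"],
];

const result = run(program, { $var4: 1000 }); // 1000 is a made-up value
console.log(result.calls[0]); // { name: '$memcmp', args: [ 1112, 1280, 32 ] }
```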
If you are not lost at this point, congratulations, you know basically how to read Wasm!
If you are lost, no worries, take a break (walk, coffee or other) and come back to it later. If there are still some parts that are not clear after 2 or 3 readings, hit me up on LinkedIn, it’s probably that I have done a bad job of explaining the basics.
So what is the function doing?
To fully understand what’s happening when the application is running, it is a good idea to run it. So let’s open the debugger!
In the sources, click on the address of an instruction to put a breakpoint. I have put one at the call to “sha256_update” to see the arguments, one should be the buffer with our input.
Please note that the address shown in your Chrome is likely not the same as in the picture below because the lab has been updated since the writing and thus the address of the call instruction has changed.
Note: When calling a three-stage hashing algorithm (init, update, final), the string to hash is usually passed to the update function.
A good exercise would be to read the few instructions above to see what is executed before the function call. We will do just that.
A little advice, when you are reading assembly in general, is to start from the function call and go up, it makes the understanding easier, at least for me.
So, the “sha256_update” function is called with 3 arguments: var8, var6 and var7. Once again, var8 is useless as we could just use var4 but it’s not the topic… var6 contains the content of the memory at the address stored in var4 offset by 156. var7 also contains the content of the memory at the address stored in var4, but offset by 152 this time.
At our breakpoint, the stack should contain the values of var6, var7 and var8. One of them must be our buffer’s memory address.
To see if it’s the case, enter a value in the “password” input field and hit the “Check password” button. For this example, my input is “test”.
Your “Sources” window should look like this:
We can see our variables and the stack. To see the value they contain, just click on them to expand them.
We can see that the values we are interested in are:
- var6: 5244728
- var7: 4
- var8: 5244560
Do they match the stack?
Yes, they do!
That’s great… but which one is our buffer? Both var6 and var8 could be memory addresses. You can convert them to hexadecimal, but it might not help. Telling whether a value is a memory address comes with time and experience and, in this case, I created the lab, which helps too.
To determine which one is our buffer, let’s inspect the memory of the program! Don’t panic, it’s very simple.
We will use the tools provided by Chrome, which conveniently provides a Memory Inspector.
To open it you can right click on the “$memory” element which can be found under “Module > memories” in the tab used to check the values of the variables:
This will open the memory inspector, usually at the bottom of the window:
Let’s enter the memory address we want to check in the search field, at the top of the inspector. In our case the value of var6:
When you hit enter, the inspector will print the content of the memory at the wanted address:
We can see that the memory contains the string I have put in the “password” field. Please note that the memory address was automatically translated from decimal to hexadecimal by the tool when we hit enter.
The address entered in the search field is highlighted by the orange square in the inspector. In our case, var6 points to the address of the first character of our input.
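What the inspector does here can be reproduced in a few lines of JavaScript. The sketch below runs on a made-up buffer in which I placed “test” at the address seen above; “readCString” is a hypothetical helper name. In the lab itself, you would build the Uint8Array on top of the real memory buffer instead.

```javascript
// Hypothetical linear memory with "test" placed at the address seen above.
// A fresh buffer is zero-filled, so the string is already NUL-terminated.
const memory = new Uint8Array(new ArrayBuffer(8 * 1024 * 1024));
const address = 5244728;
memory.set(new TextEncoder().encode("test"), address);

// Read a NUL-terminated UTF-8 string starting at `ptr`.
function readCString(mem, ptr) {
  let end = ptr;
  while (end < mem.length && mem[end] !== 0) end++;
  return new TextDecoder().decode(mem.subarray(ptr, end));
}

console.log("0x" + address.toString(16));  // 0x500738
console.log(readCString(memory, address)); // test
```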
You can check what var8 is pointing to on your own, but it will not be useful in this article.
Since we have opened the debugger, I’ll explain some of its features:
The features are (from left to right):
- Continue: continue the code execution until the end or the next breakpoint.
- Step over: execute one instruction. If the instruction is a function, it will not enter the function.
- Step in: execute one instruction. If the instruction is a function, it will enter the function.
- Step out: continue the execution of the current function and break when it exits.
- Step: to be honest, I don’t understand the difference between this feature and the step-in feature… But you can still read this Stack Overflow post: “What’s the difference of Step and Step Into in Google Chrome developer tools?”
- Disable breakpoints: disable all the breakpoints, the code runs as normal.
If the “sha256_update” function produced an output (it’s not the case here), it would have been pushed onto the stack. A “local.set $varX” would commonly be performed to retrieve the value. To read it, you would just step over the function call and read the stack or the target variable of the “local.set”.
And the other functions?
This leaves us with the other “SHA256_*” calls and the call to “memcmp” to understand. The principle is the same for those.
To gain some time, I will only focus on the “memcmp” function call. It is enough to understand what’s happening in this function.
Let’s put a breakpoint and see the arguments of the call (You can hit “continue” to let the code run until the next breakpoint):
So, the function takes three arguments: the first seems to be a buffer and the third looks like a length. We can confirm all our hypotheses by looking at memcmp’s man page:
So… as we could expect, the second parameter, with the value 1280 (0x500 in hexadecimal), is a memory address too! As it is not in the same part of the memory as the other addresses seen so far, it must be the flag, or at least its SHA256 hash. Let’s inspect the memory at this address:
Hello there! We have never met before!
If you wonder why part of the picture is blacked out, it’s because I really encourage you to practice the lab yourself.
Win?
Actually…. Yes! As long as you managed to dump the hash. To retrieve its original value, you can use crackstation.net; I have deliberately used a hash that is present in their database.
Just to avoid headaches: the hash is the hex value highlighted in red above, so the first 8 characters of the hash are “426A1C28”. I made the lab that way to avoid having people dump every 64-character-long string trying to get the hash without reading this post.
Dumping? Are you kidding? I won’t copy 32 bytes by hand!
You are absolutely right! But I kept an ace up my sleeve for this! Did you notice that the lab includes a “memoryWrapper.js”? It’s JavaScript code that I wrote to help with the memory handling of Wasm code. It’s available here: https://github.com/ExcelliumSA/WASMToolkit
Just use the appropriate function in the console of your browser to get the bytes you are interested in. Be careful: the tool gives you the bytes in base 10, while you want them in base 16 (hex).
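The base 10 to base 16 conversion is a one-liner per byte. The sketch below does not use the toolkit’s API: “toHex” is a made-up helper, and the buffer is a dummy one that only contains the four bytes matching the hash prefix quoted above.

```javascript
// Convert `length` bytes starting at `ptr` into an uppercase hex string.
function toHex(mem, ptr, length) {
  return Array.from(mem.subarray(ptr, ptr + length))
    .map((b) => b.toString(16).padStart(2, "0")) // keep leading zeros
    .join("")
    .toUpperCase();
}

// Hypothetical memory: only the first four bytes of the hash are filled in.
const memory = new Uint8Array(2048);
memory.set([66, 106, 28, 40], 1280); // decimal values, as a tool might print them

console.log(toHex(memory, 1280, 4)); // 426A1C28
```

The “padStart” call matters: without it, a byte such as 10 would be printed as “a” instead of “0a” and the resulting hash would be too short.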
The title of the post mentioned “messing with” Wasm, but I have seen nothing like this so far… Are you some kind of clickbait YouTuber?
Actually no, I’m not. So, I will show you another way to print the success message by exploiting the fact that… well, all this stuff is running on MY computer in MY browser!
The code uses the stack and memory addresses as if it were running directly on the CPU, which it is not. That implies that nothing prevents me from reading and writing this “memory” buffer.
The “memory” used by Wasm is a simple JavaScript ArrayBuffer. On Chrome, this ArrayBuffer can be found under “$memory.buffer”. This variable is only available if you are already stopped on a breakpoint inside the Wasm code. If you want to access the memory without the Wasm stopped on a breakpoint, you can use “wasmMemory.buffer”. For Firefox, the name of the variable might be different.
The thing with ArrayBuffers is that you can neither read nor write them as is; you must use a different object to do that. If you look at the documentation for the ArrayBuffer object, you will find a certain DataView object, which can be used to interact with ArrayBuffers.
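As a quick illustration, here is a sketch that wraps a Wasm memory in a DataView. To keep it self-contained, it creates a fresh one-page memory called “demoMemory” (a made-up name) rather than touching the lab’s; the addresses and values are arbitrary.

```javascript
// A fresh one-page (64 KiB) memory; in the lab you would use the existing one.
const demoMemory = new WebAssembly.Memory({ initial: 1 });
const view = new DataView(demoMemory.buffer);

view.setUint8(1280, 0x42);              // write one byte
view.setUint32(1284, 0xdeadbeef, true); // write 4 bytes, little-endian like Wasm

console.log(view.getUint8(1280).toString(16));        // 42
console.log(view.getUint32(1284, true).toString(16)); // deadbeef
console.log(view.getUint8(1284).toString(16));        // ef
```

The last line reads a single byte of the 32-bit value and returns its least significant byte, which is how multi-byte values appear in the memory inspector.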
Once again, I suggest you look at my “memoryWrapper.js”; its goal is to help manage all this mess.
To test this, I have put a breakpoint on the “memcmp” call:
So, we are in the same state as previously.
I will use the console and my memory wrapper to interact with the memory:
Let’s check that it’s working:
The values, printed here as hexadecimal, are the same, it seems to be working pretty well!
Since the “memcmp” call determines whether the input is the correct one, we can edit one of the buffers to be the same as the other. I will edit the buffer at the “1280” memory address, the one with the value we wanted to get in the previous section. The choice is arbitrary.
So, what is happening here is that we read the 32 bytes of the hash of our input (check the stack if you forgot what this memory address is) and write them into the location where the hash of the key is stored. We copy 32 bytes because it’s the length of the hashes.
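Without the wrapper, the same trick boils down to one copy inside the buffer. The sketch below works on a dummy memory: only the address 1280 and the length 32 come from the lab, while the address 2048, the filler bytes and the small “memcmp” lookalike are made up for the demonstration.

```javascript
// Hypothetical memory layout: only 1280 and 32 come from the lab.
const memory = new Uint8Array(4096);
const EXPECTED = 1280; // address of the stored hash
const COMPUTED = 2048; // made-up address for the hash of our input
const LENGTH = 32;     // size of a SHA-256 digest in bytes

// Fill both regions with different dummy bytes.
for (let i = 0; i < LENGTH; i++) {
  memory[EXPECTED + i] = 0xaa;
  memory[COMPUTED + i] = i;
}

// Minimal memcmp lookalike: 0 when the two regions are identical.
function memcmp(mem, a, b, n) {
  for (let i = 0; i < n; i++) {
    if (mem[a + i] !== mem[b + i]) return mem[a + i] - mem[b + i];
  }
  return 0;
}

const before = memcmp(memory, COMPUTED, EXPECTED, LENGTH);
// Overwrite the expected hash with the computed one.
memory.copyWithin(EXPECTED, COMPUTED, COMPUTED + LENGTH);
const after = memcmp(memory, COMPUTED, EXPECTED, LENGTH);

console.log(before !== 0, after === 0); // true true
```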
But did it work?
Yes! Seems perfect to me!
Once done, you can continue the code execution and the “Success” message should appear.
Now you know how to mess with the Wasm memory!
And that’s all?
Disclaimer: This part is going to be harder. I mean “editing bytecode” harder. Nothing too fancy, but some of you readers (at least I hope someone has read this article up to here) might consider taking a break before reading this. I, the writer, will have a tea before writing it.
No! Of course that’s not all!
We know how to read the code; nothing forbids us to edit/patch it.
For Chrome (I don’t know about Firefox), the “Sources” panel of the developer tools has no feature to edit a source. Or if it does, I missed it…
I did not find an easy way to edit Wasm code, sorry. I tried to use WABT’s wat2wasm and wasm2wat without any success: Chrome would not load the produced file because of the order of the sections. I also tried to use Ghidra and the ghidra-wasm-plugin to patch the file and export a new Wasm binary, but the result was not usable. Ghidra and the mentioned plugin can still be useful in the technique I ended up using though.
So… how ugly is the solution? In one word: hexeditor. Yes, we have to edit the Wasm binary directly, byte by byte. And for this, we will use Burp Suite. Don’t worry though, the community edition is enough for what we want.
When loading the lab with a browser plugged into Burp, the following requests are performed:
By reloading the lab and intercepting the request for the “lab.wasm” file, we can ask Burp to intercept the response to the request:
Once the “Response to this request” has been clicked, you can forward the request, and you will see the response in the intercept tab:
Yeah… Not really what you expected, right? Remember, we are talking about a binary file, so we can’t read it as is. It’s easy to forget because the browser does the work of disassembling the binary file for us, but Burp doesn’t.
You can click on the “Hex” button to open Burp’s hexeditor:
Now that we can edit the code, we have to find what to edit! Once again, Chrome can help: the addresses shown in the sources are the offsets of the instructions from the start of the file. However, Burp also shows the headers of the response in the hexeditor, so we have to compute the address of the bytes we want to edit. A Wasm file starts with a NUL byte (0x00) followed by the “asm” character sequence (0x61 0x73 0x6d). In our case, it can be found here:
The sequence of bytes starts 3 bytes after the 0xc0 address, so at the address 0xc3. We will add this to the address shown in Chrome to get the address of the bytes we want to edit in Burp.
For our exploit, I will edit the following part of the code with a NOP sled:
In other terms, my goal is editing the code in a way so that the “local.set $var18” won’t be executed and will be replaced with instructions that do… well, nothing.
Since the default value of a variable is 0, and the return value of “memcmp” when two memory buffers have the same content is also 0, removing the “set” instruction for “var18” will make the function always return true. Note that this will pollute the stack, as the value will never be popped off it.
If you look at the addresses of the instructions in the above picture, you can see that the instruction after the one we want to edit is located two bytes after our target (target: 0x2af; next: 0x2b1). This means that we will have to edit two bytes with our hexeditor. These bytes are located at 0x372 and 0x373 (remember the 0xc3 that we have to add because of the response’s headers):
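The address arithmetic can also be scripted. The sketch below searches a byte array for the Wasm magic bytes and adds the offset shown by Chrome. The response is a dummy one that I padded so the magic lands at 0xc3 as in my capture; “findWasmStart” is a made-up helper name.

```javascript
// Find the offset of the Wasm magic bytes ("\0asm") in a raw HTTP response.
function findWasmStart(bytes) {
  const magic = [0x00, 0x61, 0x73, 0x6d];
  for (let i = 0; i + magic.length <= bytes.length; i++) {
    if (magic.every((m, j) => bytes[i + j] === m)) return i;
  }
  return -1; // not found
}

// Hypothetical response: 0xc3 bytes of "headers" (spaces), then a Wasm header.
const response = new Uint8Array(0x400).fill(0x20);
response.set([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00], 0xc3);

const base = findWasmStart(response);
const chromeOffset = 0x2af; // offset shown in the Chrome sources panel

console.log("0x" + base.toString(16));                      // 0xc3
console.log("0x" + (base + chromeOffset).toString(16));     // 0x372
console.log("0x" + (base + chromeOffset + 1).toString(16)); // 0x373
```

The size of the headers changes from one response to another, so the base has to be recomputed each time rather than reused.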
The Wasm opcode for the NOP instruction is 0x01 (see: https://pengowray.github.io/wasm-ops/). So, we just have to replace the two bytes highlighted with “01”, easy!
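To convince yourself that overwriting a “local.set” with NOPs behaves as described, you can try it on a tiny module. The one below is hand-assembled for this demonstration and has nothing to do with the lab’s binary: it exports a function “f” that stores 1 in a local, reads it back and returns it. I assume here that, like this demo, the patched function leaves through an explicit “return”, which discards the extra value left on the stack.

```javascript
// Hand-assembled demo module (not the lab's). Its exported function f() does:
//   i32.const 1; local.set 0; local.get 0; return
const original = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type section: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // function 0 uses type 0
  0x07, 0x05, 0x01, 0x01, 0x66, 0x00, 0x00,       // export "f" = function 0
  0x0a, 0x0d, 0x01, 0x0b, 0x01, 0x01, 0x7f,       // code section, one i32 local
  0x41, 0x01,                                     // i32.const 1
  0x21, 0x00,                                     // local.set 0 (offsets 35-36)
  0x20, 0x00,                                     // local.get 0
  0x0f, 0x0b,                                     // return, end
]);

// Compile and instantiate a module synchronously (fine for tiny modules).
function instantiate(bytes) {
  return new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;
}

const NOP = 0x01;
const patched = original.slice();
patched.fill(NOP, 35, 37); // overwrite both bytes of "local.set 0"

console.log(instantiate(original).f()); // 1
console.log(instantiate(patched).f());  // 0
```

In the patched version, the local is never written, so it keeps its default value of 0, which is what gets returned.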
To edit a byte in Burp’s hexeditor, double click on it.
Once edited, the “code” looks like the following:
The only step left is to forward the response.
If you want to make a more complex edit to the Wasm file, I suggest you open the file inside Ghidra with the mentioned extension to write the code you want, then grab the associated hex code once you are done and paste it into Burp’s hexeditor at the correct address. Be careful: the address shown in Ghidra is not the correct one.
What is the result of our edit? Let’s take a look at the sources in Chrome:
Hooray! It worked!
Now, no matter the input, the program will always print the success message. Here is an example with “a” as an input:
Conclusion
By this point, I hope that you are not afraid of Wasm anymore and that you will be able to take on any challenge involving this technology. Maybe you will be able to reuse some of what you learned to perform “standard” reverse engineering.
I hope you enjoyed this reading!
Many thanks to Dominique Righetto and Elliot Rasch for their inputs for this article.