-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix USE_DS
for MIPS ##emu
#23307
base: master
Are you sure you want to change the base?
Fix USE_DS
for MIPS ##emu
#23307
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what i get here USE_DS makes the expressions more complex to trap when an instruction is executed if the delay slot is going on and the instruction cant happen there. i assume this can only happen on invalid genereated code by compilers or linearly emulating the code. i think the logic was commented out because of that. but ideally this feature should be not a compile-time feature
libr/arch/p/mips/plugin_cs.c
Outdated
// in register $31. | ||
r_strbuf_appendf (&op->esil, ES_TRAP_DS () "" ES_CALL_ND ("%s"), ARG (0)); | ||
#if USE_DS | ||
r_strbuf_replacef (&op->esil, "$$", "0x%"PFMT64x, addr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$$ is a keyword that should be removed because we have PC and op->addr already, so there's no need to have an extra keyword for that. also its better to build the string right away instead of using replacef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So would you suggest to change the macro like this to avoid $$
:
#if USE_DS
#define ES_TRAP_DS(addr) "$ds,!,!,?{," addr ",1,TRAP,BREAK,},"
#else
#define ES_TRAP_DS(addr) ""
#endif
With that we can remove the #if USE_DS -> r_strbuf_replacef
stuff and build the string in-place by rewriting the r_strbuf_appendf
call to:
#if USE_DS
r_strbuf_appendf (&op->esil, ES_TRAP_DS ("0x%"PFMT64x) "" ES_CALL_DR ("%s", "%s"), addr, ARG (0), ARG (1));
#else
r_strbuf_appendf (&op->esil, "" ES_CALL_DR ("%s", "%s"), ARG (0), ARG (1));
#endif
where addr
is op->addr
or insn->address
(I guess that's the same). The downside I see is that by unconditionally passing addr
to the invocation we must duplicate the code. Alternatively, one could to a bad hack like:
r_strbuf_appendf (&op->esil, ES_TRAP_DS ("0x%3$"PFMT64x) "" ES_CALL_DR ("%s", "%s"), ARG (0), ARG (1), addr);
But that could, depending on compiler settings, complain about the wrong number of args in the format string.
The problem with using PC is that AFAIK it points to the first byte after instruction, and we would need to subtract its size, which may actually be 4 or 2 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what i get here USE_DS makes the expressions more complex to trap when an instruction is executed if the delay slot is going on and the instruction cant happen there. i assume this can only happen on invalid genereated code by compilers or linearly emulating the code.
FYI, the manual has the following to say about putting a branch/jump into the branch delay slot: "If a branch or jump instruction is placed in the branch delay slot, the operation of both instructions is undefined." So I guess what happens depends on the impl of the CPU u are running on. To me, it seems like trapping on that is reasonable, but given that behavior is undefined, it would also be OK to throw this part out completely and just let the emulator decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what mainly worries me here is the fact that there are two compile time ways to have this esil expression. so i would prefer to have only one and not depend on compile time options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so completely throw away the USE_DS
define and always emit the version with the trap code and SETD
/SETJT
directives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well. if the code is trapping, the linear emulation code should be ignoring those stop points.. did you had a look at the failing tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean in theory the change looks good to me, but will need some refinements and checks to see if the behaviour keeps correct for the current usecases. thanks for taking the time!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay - was on vacation. Then I'll now work on a version with only one compile time version of the ESIL. Once that one is ready, I'll certainly need some help since I have no idea where those changes might break something (besides the tests of course) in the larger r2
picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Count on me for that help, don't hesitate to make questions or join discord/telegram/matrix channel for a more fluent chat. Thanks!
libr/arch/p/mips/plugin_cs.c
Outdated
@@ -241,6 +244,7 @@ static const char *arg(csh *handle, cs_insn *insn, char *buf, size_t buf_sz, int | |||
static int analop_esil(RArchSession *as, RAnalOp *op, csh *handle, cs_insn *insn) { | |||
char str[8][32] = {{0}}; | |||
int i; | |||
u_int64_t addr = insn->address; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use ut64 in r2land, its an alias to uint64_t
but just good for consistency sugar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I also hate writing out u_int64_t
^^
ping |
I made the discussed changes (only to the Capstone backend for now). In short:
Before I apply the same changes to the GNU backend I think it would make sense if we check whether they work out nicely with the rest of the code. Will start by looking at the tests I guess. |
Sorry I was afk busy in a conference and didnt had time to do a proper review. Yes, take a look at the failing tests. all the changes you did looks good to me, but im unsure about the behaviour change, so better check in detail to not break any test before merging. |
Description
This PR is mostly a question on how one should handle branch delay slots when emulating ESIL.
For context, I've been working on adding MIPS support to radius2, a symbolic execution engine for ESIL. One problem was handling of branch delay slots, at least until I noticed that everything I needed from r2 was already there. It's just hidden behind the
USE_DS
#define
. With this define active the delay-slot behavior is explicit in the ESIL of an instruction via the (SETD
andSETJT
directives). Given this information the emulation is pretty straight forward.The included patches fix the compilation with
USE_DS=1
and correct a few errors where instructions without a delay slot (compact instructions) were treated incorrectly as having a delay slot. There shouldn't be any change forUSE_DS=0
builds. The patch is incomplete in that "likely" instructions, where the delay slot is only executed if the branch is taken, are still incorrect. Fixing this by encoding the behavior in the ESIL string would be pretty straight-forward.However, if I understand it correctly the
USE_DS
way of embedding delay slot behavior in the ESIL string is legacy (code doesn't even compile + all the errors in instruction semantics).This brings me to my question: What is the intended way for external tools that want to work with ESIL to learn about and handle delay slots? With the information that I currently get via
pdj
after analyzing a function I cannot handle them properly. I am at least missing information about: how many delay slot has this instruction?, are the delay slots always executed or is there some condition (like the branch being taken for "likely" instructions)?.[If there is no "intended" way to get this information, it would be OK for me to rebuild r2 with the define enabled as part of my build process, but then it would be nice to have a way to enable
USE_DS
via the build system.]Cheers