How to Obtain O0-O3 Assembly Code from ExeBench Dataset? #37

Open
KikyoNanakusa opened this issue Dec 6, 2024 · 7 comments

@KikyoNanakusa

Hello,

I am currently working on reproducing the evaluation described in your paper and have a question regarding the evaluation using ExeBench.

From my understanding, the dataset provided by ExeBench includes assembly code for each function at optimization levels O0, Os, and O3. Could you kindly clarify how you obtained the assembly code for all optimization levels (O0, O1, O2, and O3)? Did you extract and recompile each function individually?

If you have a script or specific instructions for generating the dataset you used, would it be possible for you to share it?

Thank you.

@albertan017
Owner

albertan017 commented Dec 6, 2024

For compilable data, you may follow the compilation script for AnghaBench, with small modifications to handle the source of the function (`exebench_data['func_def']`) and its dependencies (`exebench_data['synth_deps']`).

For executable data, it's quite complicated.

You may refer to the exebench GitHub repository, or the discussions in the other issues. You can modify the `_DefaultAssembler` in `exebench/__init__.py`.
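
For the compilable-data path, a minimal sketch might look like the following (an illustration under assumptions, not the actual AnghaBench/LLM4Decompile script; the `compile_row` helper and file paths are made up for the example):

```python
# Hypothetical sketch (not the repo's actual script): compile one ExeBench
# row at each optimization level and disassemble the target function.
import subprocess

def compile_row(row, func_name, workdir="."):
    # Source = dependencies + function definition, as described above.
    source = row["synth_deps"] + "\n" + row["func_def"]
    c_path = f"{workdir}/{func_name}.c"
    with open(c_path, "w") as f:
        f.write(source)

    for opt in ("O0", "O1", "O2", "O3"):
        obj_path = f"{workdir}/{func_name}_{opt}.o"
        asm_path = f"{workdir}/{func_name}_{opt}.s"
        # Compile to an object file at the given optimization level.
        subprocess.run(["gcc", "-c", f"-{opt}", "-o", obj_path, c_path], check=True)
        # Keep only the target function's disassembly (binutils >= 2.32).
        with open(asm_path, "w") as f:
            subprocess.run(
                ["objdump", "-d", f"--disassemble={func_name}", obj_path],
                check=True,
                stdout=f,
            )
```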

@KikyoNanakusa
Author

KikyoNanakusa commented Dec 6, 2024

So, to reproduce the ExeBench re-executability experiment, I understand that the steps are as follows:

  1. Create C code with dependencies using `func_def` and `synth_deps` from ExeBench.
  2. Generate assembly code for optimization levels O0 to O3 using a compilation script.
  3. Run predictions with the generated assembly as input.
  4. Evaluate re-executability by passing the predicted function to the `Wrapper` class instead of `func_assembly`.

Is my understanding correct?

@albertan017
Owner

Yes, in theory, that should work.

However, we encountered difficulties generating appropriate assembly for execution.

As a result, we adjusted the input to the `Wrapper` and altered the optimization stage in `_DefaultAssembler` to compile with the various optimization levels.
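
Roughly, the change looks something like the following hypothetical sketch (the class below is a stand-in, not the real `_DefaultAssembler` API in exebench):

```python
# Hypothetical stand-in only: the real _DefaultAssembler in
# exebench/__init__.py differs. The idea is to make the compile step
# accept an optimization flag so the same Wrapper pipeline can be run
# under -O0 through -O3.
import subprocess

class PatchedAssembler:
    def __init__(self, opt_level="O0"):
        self.opt_level = opt_level  # change per experiment run

    def compile(self, c_path, out_path):
        # Compile the function source directly instead of assembling
        # pre-generated assembly.
        subprocess.run(
            ["gcc", "-c", f"-{self.opt_level}", "-o", out_path, c_path],
            check=True,
        )
```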

@KikyoNanakusa
Author

Thanks! I'll give it a try.

@KikyoNanakusa
Author

Hi,
I wrote a script to try it out, but I cannot reproduce the experimental results.
The experiment was run with LLM4Binary/llm4decompile-6.7b-v1.5.
Below is an excerpt of the script along with the results.

Results:

```
Optimization O0: Run Rate: 0.1417
Optimization O1: Run Rate: 0.1288
Optimization O2: Run Rate: 0.1280
Optimization O3: Run Rate: 0.1277
```
Script excerpt:

1. Create the C source code with its dependencies:

```python
c_source_code = (
    row["synth_deps"]
    + "\n"
    + row["synth_io_pairs"]["dummy_funcs"][0]
    + "\n"
    + row["func_def"]
)
```
2. Compile the code and extract the assembly of the target function:

```python
# Compile to an object file at the selected optimization level
subprocess.run(
    ["gcc", "-c", "-o", obj_output, input_file_name, "-" + opt_state],
    check=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Generate assembly code from the object file using objdump
subprocess.run(
    f"objdump -d --disassemble={function_name} {obj_output} > {asm_output}",
    shell=True,
    check=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
```
3. Run prediction with the target function's assembly as input:

```python
gen_results = llm.generate(inputs, sampling_params)
gen_results = [[output.outputs[0].text] for output in gen_results]
if args.debug:
    print(f"gen_results: \n{gen_results}")
gen_results_repeat.append(gen_results)
```
4. Evaluate the output:

```python
synth_wrapper = Wrapper(
    c_deps=dataset_row["synth_deps"]
    + "\n"
    + dataset_row["synth_io_pairs"]["dummy_funcs"][0]
    + "\n"
    + decompiled_c_func,
    func_c_signature=dataset_row["func_head_types"].replace("extern", ""),
    func_assembly=None,
    cpp_wrapper=dataset_row["synth_exe_wrapper"],
)

# Check if the decompiled function can be compiled and run correctly
test_output = synth_wrapper(
    exebench_dict_to_dict(dataset_row["synth_io_pairs"]["input"][0])
)

if diff_io(
    test_output,
    exebench_dict_to_dict(dataset_row["synth_io_pairs"]["output"][0]),
):
    flag_run = 1
```
Could you tell me a bit more about the changes you made to `_DefaultAssembler`, the inputs to the model, and how you performed the assertions?

@albertan017
Owner

As highlighted in our paper, we first eliminate functions that cannot be executed by testing the executability of the original function (i.e., use `dataset_row['func_def']` rather than `decompiled_c_func` in step 4, and keep only the functions that execute under O0-O3). We found that approximately 2,500 functions could be compiled and executed in our environment; this number can fluctuate between 2,000 and 3,000 depending on the system.

Therefore, we test only those functions that are executable, and your results closely align with our original findings: a run rate of 0.14 over all 5,000 functions is roughly 700 passes, which corresponds to about 0.28 over the ~2,500 executable ones.
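
Concretely, the filtering amounts to running your step 4 once with the original source before scoring any predictions. A minimal sketch, reusing the names from your excerpt (the `is_executable` helper is illustrative, not our actual code):

```python
# Minimal sketch: keep only rows whose ORIGINAL function compiles and runs
# correctly under the Wrapper, then report run rates over that subset.
def is_executable(dataset_row):
    try:
        wrapper = Wrapper(
            c_deps=dataset_row["synth_deps"]
            + "\n"
            + dataset_row["synth_io_pairs"]["dummy_funcs"][0]
            + "\n"
            + dataset_row["func_def"],  # original source, not the prediction
            func_c_signature=dataset_row["func_head_types"].replace("extern", ""),
            func_assembly=None,
            cpp_wrapper=dataset_row["synth_exe_wrapper"],
        )
        output = wrapper(
            exebench_dict_to_dict(dataset_row["synth_io_pairs"]["input"][0])
        )
        return diff_io(
            output,
            exebench_dict_to_dict(dataset_row["synth_io_pairs"]["output"][0]),
        )
    except Exception:
        return False

eligible = [row for row in dataset if is_executable(row)]
# Run rate = correct predictions / len(eligible):
# 0.14 * 5000 ≈ 700 passes, i.e. ≈ 0.28 over ~2500 executable functions.
```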

@KikyoNanakusa
Author

KikyoNanakusa commented Dec 10, 2024

I see. Thanks for the detailed explanation!
