How to Obtain O0-O3 Assembly Code from ExeBench Dataset? #37

Open
KikyoNanakusa opened this issue Dec 6, 2024 · 7 comments

@KikyoNanakusa

Hello,

I am currently working on reproducing the evaluation described in your paper and have a question regarding the evaluation using ExeBench.

From my understanding, the dataset provided by ExeBench includes assembly code for each function at optimization levels O0, Os, and O3. Could you kindly clarify how you obtained the assembly code for all optimization levels (O0, O1, O2, and O3)? Did you extract and recompile each function individually?

If you have a script or specific instructions for generating the dataset you used, would it be possible for you to share it?

Thank you.

@albertan017
Owner

albertan017 commented Dec 6, 2024

For compilable data, you may follow the compilation script for AnghaBench, with small modifications to handle the source of the function (`exebench_data['func_def']`) and its dependencies (`exebench_data['synth_deps']`).

For executable data, it's quite complicated.

You may refer to the exebench GitHub repository, or the discussions in the other issues. You can modify the `_DefaultAssembler` in `exebench/__init__.py`.
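
For the compilable-data path, a minimal sketch might look like the following (an illustration under assumptions, not the actual AnghaBench/LLM4Decompile script; the `compile_row` helper and file paths are made up for the example):

```python
# Hypothetical sketch (not the repo's actual script): compile one ExeBench
# row at each optimization level and disassemble the target function.
import subprocess

def compile_row(row, func_name, workdir="."):
    # Source = dependencies + function definition, as described above.
    source = row["synth_deps"] + "\n" + row["func_def"]
    c_path = f"{workdir}/{func_name}.c"
    with open(c_path, "w") as f:
        f.write(source)

    for opt in ("O0", "O1", "O2", "O3"):
        obj_path = f"{workdir}/{func_name}_{opt}.o"
        asm_path = f"{workdir}/{func_name}_{opt}.s"
        # Compile to an object file at the given optimization level.
        subprocess.run(["gcc", "-c", f"-{opt}", "-o", obj_path, c_path], check=True)
        # Keep only the target function's disassembly (binutils >= 2.32).
        with open(asm_path, "w") as f:
            subprocess.run(
                ["objdump", "-d", f"--disassemble={func_name}", obj_path],
                check=True,
                stdout=f,
            )
```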

@KikyoNanakusa
Author

KikyoNanakusa commented Dec 6, 2024

So, to reproduce the ExeBench re-executability experiment, I understand that the steps are as follows:

  1. Create C code with dependencies using `func_def` and `synth_deps` from ExeBench.
  2. Generate assembly code for optimization levels O0 to O3 using a compilation script.
  3. Run predictions with the generated assembly as input.
  4. Evaluate re-executability by passing the predicted function to the `Wrapper` class instead of `func_assembly`.

Is my understanding correct?

@albertan017
Owner

Yes, in theory, that should work.

However, we encountered difficulties generating appropriate assembly for execution.

As a result, we adjusted the input to the `Wrapper` and altered the optimization stage in `_DefaultAssembler` to compile with the various optimization levels.
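
Roughly, the change looks something like the following hypothetical sketch (the class below is a stand-in, not the real `_DefaultAssembler` API in exebench):

```python
# Hypothetical stand-in only: the real _DefaultAssembler in
# exebench/__init__.py differs. The idea is to make the compile step
# accept an optimization flag so the same Wrapper pipeline can be run
# under -O0 through -O3.
import subprocess

class PatchedAssembler:
    def __init__(self, opt_level="O0"):
        self.opt_level = opt_level  # change per experiment run

    def compile(self, c_path, out_path):
        # Compile the function source directly instead of assembling
        # pre-generated assembly.
        subprocess.run(
            ["gcc", "-c", f"-{self.opt_level}", "-o", out_path, c_path],
            check=True,
        )
```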

@KikyoNanakusa
Author

Thanks! I'll give it a try.

@KikyoNanakusa
Author

Hi,
I wrote a script to try it out, but I cannot reproduce the experimental results.
The experiment was run with LLM4Binary/llm4decompile-6.7b-v1.5.
Below is an excerpt of the script along with the results.

Results:

```
Optimization O0: Run Rate: 0.1417
Optimization O1: Run Rate: 0.1288
Optimization O2: Run Rate: 0.1280
Optimization O3: Run Rate: 0.1277
```
Script excerpt:

1. Create the C source code with its dependencies:

```python
c_source_code = (
    row["synth_deps"]
    + "\n"
    + row["synth_io_pairs"]["dummy_funcs"][0]
    + "\n"
    + row["func_def"]
)
```
2. Compile the code and extract the assembly of the target function:

```python
# Compile to an object file at the selected optimization level
subprocess.run(
    ["gcc", "-c", "-o", obj_output, input_file_name, "-" + opt_state],
    check=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Generate assembly code from the object file using objdump
subprocess.run(
    f"objdump -d --disassemble={function_name} {obj_output} > {asm_output}",
    shell=True,
    check=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
```
3. Run prediction with the target function's assembly as input:

```python
gen_results = llm.generate(inputs, sampling_params)
gen_results = [[output.outputs[0].text] for output in gen_results]
if args.debug:
    print(f"gen_results: \n{gen_results}")
gen_results_repeat.append(gen_results)
```
4. Evaluate the output:

```python
synth_wrapper = Wrapper(
    c_deps=dataset_row["synth_deps"]
    + "\n"
    + dataset_row["synth_io_pairs"]["dummy_funcs"][0]
    + "\n"
    + decompiled_c_func,
    func_c_signature=dataset_row["func_head_types"].replace("extern", ""),
    func_assembly=None,
    cpp_wrapper=dataset_row["synth_exe_wrapper"],
)

# Check if the decompiled function can be compiled and run correctly
test_output = synth_wrapper(
    exebench_dict_to_dict(dataset_row["synth_io_pairs"]["input"][0])
)

if diff_io(
    test_output,
    exebench_dict_to_dict(dataset_row["synth_io_pairs"]["output"][0]),
):
    flag_run = 1
```
Could you tell me a bit more about the changes you made to `_DefaultAssembler`, the inputs to the model, and how you performed the assertions?

@albertan017
Owner

As highlighted in our paper, we first eliminate functions that cannot be executed by testing the executability of the original function (i.e., use `dataset_row['func_def']` rather than `decompiled_c_func` in step 4, and keep only the functions that execute under O0-O3). We found that approximately 2,500 functions could be compiled and executed in our environment; this number can fluctuate between 2,000 and 3,000 depending on the system.

Therefore, we test only those functions that are executable, and your results closely align with our original findings: a run rate of 0.14 over all 5,000 functions is roughly 700 passes, which corresponds to about 0.28 over the ~2,500 executable ones.
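
Concretely, the filtering amounts to running your step 4 once with the original source before scoring any predictions. A minimal sketch, reusing the names from your excerpt (the `is_executable` helper is illustrative, not our actual code):

```python
# Minimal sketch: keep only rows whose ORIGINAL function compiles and runs
# correctly under the Wrapper, then report run rates over that subset.
def is_executable(dataset_row):
    try:
        wrapper = Wrapper(
            c_deps=dataset_row["synth_deps"]
            + "\n"
            + dataset_row["synth_io_pairs"]["dummy_funcs"][0]
            + "\n"
            + dataset_row["func_def"],  # original source, not the prediction
            func_c_signature=dataset_row["func_head_types"].replace("extern", ""),
            func_assembly=None,
            cpp_wrapper=dataset_row["synth_exe_wrapper"],
        )
        output = wrapper(
            exebench_dict_to_dict(dataset_row["synth_io_pairs"]["input"][0])
        )
        return diff_io(
            output,
            exebench_dict_to_dict(dataset_row["synth_io_pairs"]["output"][0]),
        )
    except Exception:
        return False

eligible = [row for row in dataset if is_executable(row)]
# Run rate = correct predictions / len(eligible):
# 0.14 * 5000 ≈ 700 passes, i.e. ≈ 0.28 over ~2500 executable functions.
```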

@KikyoNanakusa
Author

KikyoNanakusa commented Dec 10, 2024

I see. Thanks for the detailed explanation!
