
Gepetto

Gepetto is a Python plugin which uses various large language models to provide meaning to functions decompiled by IDA Pro (≥ 7.4). It can leverage them to explain what a function does, and to automatically rename its variables. Here is a simple example of what results it can provide in mere seconds:

Setup

Drop gepetto.py and the gepetto/ folder into your IDA plugins folder ($IDAUSR/plugins). By default, on Windows, this is %AppData%\Hex-Rays\IDA Pro\plugins (you may need to create the folder).

You will need to add the required packages to IDA's Python installation for the script to work. Find which interpreter IDA is using by checking the following registry key: Computer\HKEY_CURRENT_USER\Software\Hex-Rays\IDA (default on Windows: %LOCALAPPDATA%\Programs\Python\Python39). Then, with that interpreter, simply run:

[/path/to/python] -m pip install -r requirements.txt

⚠️ You will also need to edit the configuration file (gepetto/config.ini) and add your own API keys. OpenAI keys can be created from the API keys page of your OpenAI account. Please note that API queries are usually not free (although not very expensive), and you will need to set up a payment method with the corresponding provider.
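For reference, a minimal configuration might look like the sketch below. The section and key names shown here are illustrative; check the comments in gepetto/config.ini itself for the exact names your version expects:

```ini
[Gepetto]
MODEL = "gpt-4o"
LANGUAGE = "en_US"

[OpenAI]
API_KEY = "sk-..."
```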

Supported models

  • OpenAI
    • gpt-3.5-turbo-0125
    • gpt-4-turbo
    • gpt-4o (recommended for beginners)
  • Groq
    • llama3-70b-8192
  • Together
    • mistralai/Mixtral-8x22B-Instruct-v0.1 (does not support renaming variables)
  • Ollama
    • Any local model exposed through Ollama (will not appear if Ollama is not running)

Adding support for additional models shouldn't be too difficult, provided the provider you're considering exposes an OpenAI-compatible API. Look into the gepetto/models folder for inspiration, or open an issue if you can't figure it out.
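To illustrate what "an API similar to OpenAI's" means in practice, here is a minimal sketch of a client for an OpenAI-style chat-completions endpoint. The class and method names are hypothetical and do not match the actual interfaces in gepetto/models; only the request shape (endpoint path, bearer token, JSON payload) reflects the OpenAI convention:

```python
import json
import urllib.request


class MinimalProvider:
    """Hypothetical client for an OpenAI-compatible chat completions API."""

    def __init__(self, base_url, api_key, model):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.model = model

    def build_request(self, prompt):
        # OpenAI-style payload: a model name and a list of chat messages.
        payload = {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
        }
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=json.dumps(payload).encode(),
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
        )

    def query(self, prompt):
        # Send the request and extract the first reply from the response.
        with urllib.request.urlopen(self.build_request(prompt)) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
```

A provider for Groq, Together, or Ollama differs mostly in the base URL and authentication details, which is why supporting new OpenAI-compatible backends tends to be straightforward.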

Usage

Once the plugin is installed properly, you should be able to invoke it from the context menu of IDA's pseudocode window, as shown in the screenshot below:

Switch between models supported by Gepetto from the Edit > Gepetto menu:

Gepetto also provides a CLI interface you can use to ask questions to the LLM directly from IDA. Make sure to select Gepetto in the input bar:

Hotkeys

The following hotkeys are available:

  • Ask the model to explain the function: Ctrl + Alt + H
  • Request better names for the function's variables: Ctrl + Alt + R

Initial testing suggests that asking for better names works best if you request an explanation of the function first, presumably because the model then uses its own comment to make more accurate suggestions. There is an element of randomness to the model's replies, so if the initial response doesn't suit you, you can always run the command again.

Limitations

  • The plugin requires access to the Hex-Rays decompiler to function.
  • All supported LLMs are general-purpose and may very well get things wrong! Always review their output critically!

Translations

You can change Gepetto's language by editing the locale in the configuration. For instance, to use the plugin in French, you would simply add:

[Gepetto]
LANGUAGE = "fr_FR"

The chosen locale must match one of the folder names in gepetto/locales. If the desired language isn't available, you can contribute to the project by adding it yourself! Create a new folder for the desired locale (e.g. gepetto/locales/de_DE/LC_MESSAGES/), and open a new pull request with the updated .po file, which you can create by copying and editing gepetto/locales/gepetto.pot (replace all the lines starting with msgstr with the localized version).

Acknowledgements

  • OpenAI, for making these incredible models, obviously
  • Hex-Rays, the makers of IDA, for their lightning-fast support
  • Kaspersky, for initially funding this project
  • HarfangLab, the current backer making this work possible
  • @vanhauser-thc for contributing ideas of additional models and providers to support via his fork
  • Everyone who contributed translations: @seifreed, @kot-igor, @ruzgarkanar, @orangetw