Hexagon decompiler for Ghidra

Overview

Ghidra hexagon plugin

A work-in-progress (WIP) Hexagon decompiler plugin for Ghidra.

The pcode is more or less autogenerated, essentially copied and adapted from binja-hexagon.

Check out the wiki for more information!

Known issues

Exception while decompiling XXX: Decompiler process died

More often than not, this is caused by pcode being unimplemented for some instruction. To view the pcode for an instruction, go to the listing view, click the "Edit the listing fields" icon in the top right, right-click PCode, and click "Enable field". An unimplemented instruction then shows up like this:

e6 40 41 8c  {  S2_vsplatrb   R6 R1
                                     UNIMPLEMENTED

You can work around this temporarily by creating a userop (a user-defined pcodeop) for the instruction:

diff --git a/Ghidra/Processors/Hexagon/data/languages/hexagon.slaspec b/Ghidra/Processors/Hexagon/data/languages/hexagon.slaspec
index 57d1d31bf..e89ed1dac 100644
--- a/Ghidra/Processors/Hexagon/data/languages/hexagon.slaspec
+++ b/Ghidra/Processors/Hexagon/data/languages/hexagon.slaspec
@@ -144,6 +144,7 @@ define pcodeop fICDATAW;
 define pcodeop fPAUSE;
 define pcodeop WRITE_SGP0;
 define pcodeop fSTORE_LOCKED;
+define pcodeop S2_vsplatrb;

 define token NORMAL(32)
   Parse                                = (14, 15)
@@ -34376,7 +34377,9 @@ C4_addipc_pkt_start: reloc is epsilon [ reloc = pkt_start; ] {

 :S2_vsplatrh S2_vsplatrh_Rdd32 S2_vsplatrh_Rs32 is phase = 1 & immext = 0xffffffff & Parse != 0b00 & subinsn = 0 & b6 = 1 & b7 = 0 & b22 = 1 & b23 = 0 & b24 = 0 & b25 = 0 & b26 = 1 & b27 = 0 & b28 = 0 & b29 = 0 & b30 = 0 & b31 = 1 & S2_vsplatrh_Rdd32 & S2_vsplatrh_Rs32 unimpl

-:S2_vsplatrb S2_vsplatrb_Rd32 S2_vsplatrb_Rs32 is phase = 1 & immext = 0xffffffff & Parse != 0b00 & subinsn = 0 & b5 = 1 & b6 = 1 & b7 = 1 & b21 = 0 & b22 = 1 & b23 = 0 & b24 = 0 & b25 = 0 & b26 = 1 & b27 = 1 & b28 = 0 & b29 = 0 & b30 = 0 & b31 = 1 & S2_vsplatrb_Rd32 & S2_vsplatrb_Rs32 unimpl
+:S2_vsplatrb S2_vsplatrb_Rd32 S2_vsplatrb_Rs32 is phase = 1 & immext = 0xffffffff & Parse != 0b00 & subinsn = 0 & b5 = 1 & b6 = 1 & b7 = 1 & b21 = 0 & b22 = 1 & b23 = 0 & b24 = 0 & b25 = 0 & b26 = 1 & b27 = 1 & b28 = 0 & b29 = 0 & b30 = 0 & b31 = 1 & S2_vsplatrb_Rd32 & S2_vsplatrb_Rs32 {
+    S2_vsplatrb_Rd32 = S2_vsplatrb(S2_vsplatrb_Rs32);
+}

 :S6_vsplatrbp S6_vsplatrbp_Rdd32 S6_vsplatrbp_Rs32 is phase = 1 & immext = 0xffffffff & Parse != 0b00 & subinsn = 0 & b6 = 0 & b7 = 1 & b22 = 1 & b23 = 0 & b24 = 0 & b25 = 0 & b26 = 1 & b27 = 0 & b28 = 0 & b29 = 0 & b30 = 0 & b31 = 1 & S6_vsplatrbp_Rdd32 & S6_vsplatrbp_Rs32 unimpl

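The userop above only gives the operation a name, so the decompiler shows an opaque call to S2_vsplatrb rather than real arithmetic. As a rough reference for anyone writing proper pcode later, here is a minimal sketch of what the instruction is assumed to compute (Rd = vsplatb(Rs), which replicates the low byte of Rs into all four bytes of Rd). The class and method names are hypothetical and not part of the plugin.

```java
// Hypothetical reference model, not part of the plugin.
final class VsplatrbModel {
    // Assumed semantics of S2_vsplatrb: the low byte of Rs is copied
    // into each of the four bytes of the 32-bit result.
    static int vsplatrb(int rs) {
        int low = rs & 0xFF;
        // Multiplying by 0x01010101 places the byte in all four lanes.
        return low * 0x01010101;
    }
}
```

For example, under that assumption an input of 0x12345678 yields 0x78787878.
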
TODO

  • Reject invalid packets according to ordering and grouping constraints
  • Add new "semantic" field to listing to view disassembly in more natural way
  • Clean up the autogenerated hexagon.slaspec
  • Release autogeneration script for slaspec
  • Implement more instructions in pcode
  • Set the gp global register based on ELF information
  • Support variadic arguments placed on the stack

Comments

  • Auto-and Predicates

    See section 6.1.3 "Auto-AND predicates" in the "Qualcomm Hexagon V66 Programmer’s Reference Manual"

    If multiple compare instructions in a packet write to the same predicate register, the result is the logical AND of the individual compare results.

    This is not handled in HexagonPcodeEmitPacked, and will silently result in only the last assignment to P3:0 being used in a subsequent compare

    bug hexagon is weird 
    opened by toshipiazza 3
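
To make the difference described in the auto-AND issue above concrete, here is a minimal sketch that contrasts the auto-AND rule with the "last assignment wins" behavior. It models a predicate as a single boolean, which is a simplification, and every name in it is hypothetical rather than taken from the plugin.

```java
// Hypothetical model of several compares writing one predicate in one packet.
final class AutoAndModel {
    // Auto-AND rule: the predicate ends up as the AND of all compare results.
    static boolean autoAnd(boolean... compareResults) {
        boolean p = true;
        for (boolean r : compareResults) {
            p &= r;
        }
        return p;
    }

    // Behavior described in the issue: only the last assignment is used.
    // Expects at least one compare result.
    static boolean lastWriteWins(boolean... compareResults) {
        return compareResults[compareResults.length - 1];
    }
}
```

With the compare results (false, true), the auto-AND rule gives false while last-write-wins gives true, which is the kind of silent mismatch the issue is about.
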
  • Reorders new compare jumps and instructions containing dot-new predicates

    Consider the following (valid) hexagon code:

      { if (!P0.new) r0 = #41
        P0 = cmp.eq(R18,#0x0); if (P0.new) jump:nt 1f  }
      { r0 = #0
        jumpr r31 }
    1:
      { r0 = #1
        jumpr r31 }
    

    P0.new precedes the store to P0. We handle this case by deferring dot-new predicates and new-compare jumps till after the rest of the instructions have been processed

    Massively refactors HexagonPcodeEmitPacked to accommodate this

    Adds a test for the above code

    Adds a test including endloop01 (the only instruction which contains more than one branch)

    Fixes #7 Fixes #4

    opened by toshipiazza 1
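
The deferral described in the pull request above can be pictured as a stable partition of the packet. The sketch below is only an illustration under that assumption: it takes operations that have already been split (so the new-compare jump is two entries) and moves dot-new consumers after everything else. The Op type and deferDotNew are hypothetical names, not the plugin's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; these names do not come from the plugin.
final class DotNewOrdering {
    // One already-split operation of a packet, plus whether it reads Pn.new.
    static final class Op {
        final String text;
        final boolean readsDotNew;

        Op(String text, boolean readsDotNew) {
            this.text = text;
            this.readsDotNew = readsDotNew;
        }
    }

    // Stable partition: producers keep their order and come first,
    // dot-new consumers keep their order and are deferred to the end.
    static List<Op> deferDotNew(List<Op> packet) {
        List<Op> ordered = new ArrayList<>();
        List<Op> deferred = new ArrayList<>();
        for (Op op : packet) {
            if (op.readsDotNew) {
                deferred.add(op);
            } else {
                ordered.add(op);
            }
        }
        ordered.addAll(deferred);
        return ordered;
    }
}
```

Applied to the first packet of the example, the compare that writes P0 is emitted first, followed by the conditional assignment and then the conditional jump.
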
  • Incorrect disassembly and pcode for gp loads and stores

    See "Encoding 32-bit address operands in load/stores" in section 10.9 of "Qualcomm Hexagon V66 Programmer's Reference Manual"

    For unconditional load/stores, the GP-relative load/store instruction is used. [...] In this case the 32-bit value encoded must be a plain address, and the value stored in the GP register is ignored.

    binja-hexagon correctly disassembles some code as

    { immext(<blah>)
      R3 = memw(0+<extended>) }
    

    But ghidra-plugin-hexagon shows a GP-relative address instead

    bug hexagon is weird 
    opened by toshipiazza 1
  • dot-new predicates with weird ordering results in incorrect in_P0 refs in decompilation

    Here's a weird code sample that's valid hexagon code

      { if (!P0.new) r0 = #41
        P0 = cmp.eq(R18,#0x0); if (P0.new) jump:nt 1f  }
      { r0 = #0
        jumpr r31 }
    1:
      { r0 = #1
        jumpr r31 }
    

    Even though P0.new is referenced before P0 is set, hexagon still store-forwards the write value in this case.

    This will require a major refactor to HexagonPcodeEmitPacked to reorder some instructions (and split up newcmp jumps) just like in qemu

    This also needs to be compatible with auto-and predicates #4, yeesh

    bug hexagon is weird 
    opened by toshipiazza 0
  • Fixes disassembly and pcode for instructions which read gp

    For instructions which read C11 aka the gp register, such as L2_loadrubgp, gp should only be consulted if an immext was not applied.

    For example, immext is applied below so the memref is not gp-rel:

    { immext(##0x123440)
      R0 = memw(#0+##0x123450)
      jumpr R31 }
    

    But this is gp-rel:

    { R0 = memw(GP+##0x10)
      jumpr R31 }
    

    Fixes this issue by adding a "gp" sleigh constructor that's conditional on the immext context reg, and adds C11 or 0 as an operand based on the above

    Fixes #5

    opened by toshipiazza 0
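
The rule from the pull request above (use gp as the base only when no immext was applied, otherwise use 0) can be sketched as plain arithmetic. This is a simplified model, assuming the offset has already been decoded and scaled; the names are hypothetical and this is not the sleigh constructor itself.

```java
// Hypothetical model of the "C11 or 0" base selection.
final class GpAddressModel {
    // With an immext the extended immediate is a plain address and gp is
    // ignored; without one the access is gp-relative.
    static long effectiveAddress(boolean immextApplied, long gp, long offset) {
        long base = immextApplied ? 0 : gp;
        // Keep the result within the 32-bit address space.
        return (base + offset) & 0xFFFFFFFFL;
    }
}
```

So with gp set to 0x1000, the extended access at 0x123450 stays at 0x123450, while the non-extended offset 0x10 resolves to 0x1010.
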
  • Enlightens BasicBlockModel and SimpleBlockModel to ParallelInstructionLanguageHelper

    Previously BasicBlockModel and SimpleBlockModel might terminate a block in the middle of an instruction group

    Now BasicBlockModel and SimpleBlockModel respect instruction group boundaries

    Adds getFlowType to ParallelInstructionLanguageHelper and implements it in HexagonParallelInstructionLanguageHelper

    Reverts formatting on ParallelInstructionLanguageHelper for a cleaner diff off of master

    Adds tests, HexagonPacketTestCodeBlock

    As of this commit, the function graph also respects instruction group boundaries, so there are no longer awkward flows in between packets. For example:

    {  SL2_jumpr31
       SA1_seti  R0 0x2 }
    

    Previously, the function graph view split this packet into two blocks:

    SL2_jumpr31 -> SA1_seti R0 0x2
    

    These are now displayed as one basic block.

    opened by toshipiazza 0
  • Misbehaving analyzer clears instructions after calls to non-returning functions

    See below

    ca 4b 00 5a  {  J2_call                             assert_fail
        -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
    

    All instructions after the call are undefined. Also notice that the J2_call is not the end of its own packet. This occasionally breaks decompilation as well

    bug 
    opened by toshipiazza 1
  • Enumerate instructions in execution order when emitting pcode

    Previously we would emit pcode for instructions in the order they appear in the listing (order of increasing address). This assumption is incorrect for DUPLEX instructions.

    DUPLEX instructions appear in the listing in swapped order: the slot 0 instruction appears earlier in memory, followed by the slot 1 instruction. But execution order follows the opposite ordering: order of decreasing slots (so slot 3, 2, 1, 0)

    As a result, we would emit incorrect pcode for the following assembly:

    { R3 = memw(R2+#0x0); memw(R2+#0x0) = #0x0 }

    As written, the load comes before the store, but since they are DUPLEX the store would appear before the load, causing the load to be const-propped. This commit fixes the issue.

    Fixes #10

    opened by toshipiazza 1
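
As an illustration of the ordering rule in the pull request above, the sketch below sorts a packet's instructions by decreasing slot number. It assumes each instruction's slot is already known and, following the description, that the store in the example sits in slot 0 and the load in slot 1. The Insn type and inExecutionOrder are hypothetical names, not the plugin's iterator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model; these names do not come from the plugin.
final class ExecutionOrder {
    // An instruction as it appears in the listing, tagged with its slot.
    static final class Insn {
        final String text;
        final int slot;

        Insn(String text, int slot) {
            this.text = text;
            this.slot = slot;
        }
    }

    // Returns a copy sorted by decreasing slot (3, 2, 1, 0).
    // List.sort is stable, so equal slots keep their listing order.
    static List<Insn> inExecutionOrder(List<Insn> listingOrder) {
        List<Insn> ordered = new ArrayList<>(listingOrder);
        ordered.sort(Comparator.comparingInt((Insn i) -> i.slot).reversed());
        return ordered;
    }
}
```

For the DUPLEX above, the listing order is store then load, and the sorted order is load then store, so the load is no longer const-propped from the store.
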
  • Pcode incorrect for DUPLEX instructions with two memops

    The plugin emits incorrect pcode for the following snippet:

    { R3 = memw(R2+#0x0); memw(R2+#0x0) = #0x0 }
    

    The store should occur after the load, as written. The incorrect pcode arises because these instructions form a DUPLEX: in a DUPLEX the slot 0 instruction comes first in program memory, but it must always be executed after the slot 1 instruction.

    A standard InstructionIterator is insufficient:

    // forward = true walks the set by increasing address
    AddressSet addrSet = new AddressSet(minAddr, maxAddr);
    InstructionIterator insns = program.getListing().getInstructions(addrSet, true);
    

    This enumerates DUPLEXes in address order, which is not their execution order

    bug 
    opened by toshipiazza 0
  • Improve endloops

    The current implementation of endloops has some issues. First, endloop01 has a subtle decompilation bug that doesn't seem to be an issue with pcode generation (see testHwLoop01). Also, the jumps themselves are indirect (goto [C0];) so the graph view doesn't display the loop edge

    Consider replacing C0 and C2 with context registers, to replace the indirect branches with direct branches. This requires updating a few opcodes:

    • J2_loop0r
    • J2_loop1r
    • J2_loop0i
    • J2_loop1i
    • J2_ploop1sr
    • J2_ploop1si
    • J2_ploop2sr
    • J2_ploop2si
    • J2_ploop3sr
    • J2_ploop3si
    • J2_endloop0
    • J2_endloop1
    • J2_endloop01
    enhancement 
    opened by toshipiazza 0
  • Support functions with variadic arguments

    According to the Qualcomm Hexagon Application Binary Interface:

    For [variadic] functions, we pass the named (and typed) parameters in the same manner as for fixed argument list functions. The remainder of the arguments are passed on the stack.

    I do not know how to achieve the desired behavior in the cspec

    enhancement hexagon is weird 
    opened by toshipiazza 0
Owner
Toshi Piazza
Security engineer at Microsoft, member of @RPISEC.