Skip to content

Linear sweep causing performance issues when there are fragments of functions from incomplete analysis #7129

@D0ntPanic

Description

@D0ntPanic

Version and Platform (required):

  • Binary Ninja Version: 5.17982
  • Edition: Ultimate
  • OS: macOS
  • OS Version: 15.5
  • CPU Architecture: M3

Bug Description:
Analysis sometimes aborts for a function for numerous reasons. Historically, this was mostly for performance (function is too large, takes too long to analyze, etc.). Starting in 5.1 we added some additional conservative analysis features that try to avoid analyzing code that isn't real by aborting when invalid instructions are encountered.

Analysis isn't perfect, and sometimes real functions can encounter edge cases in heuristics that cause bad code to be introduced into a well-formed function. This can cause analysis to halt, yielding an incomplete function. The user then can go fix up the function and complete analysis with potentially better results from avoiding incorrect code.

However, linear sweep isn't very aware of this. With incomplete functions around the binary, there are leftover basic blocks that contain real code, which the analysis engine couldn't pick up. Linear sweep finds these, but because they are function fragments that sometimes loop around to common paths, it can sometimes create a very large number of very large functions, causing a massive performance issue. To make things worse, linear sweep analysis is less parallel than phase 1 analysis, so it takes even longer.

Steps To Reproduce:
The binary "XUL-x86_64" on the "Large Binary Performance Tests" project contains a binary that triggers this exclusively because of the new invalid instruction abort logic.

With the setting analysis.guided.triggers.invalidInstruction on (which is default), phase 2/3 take 52 minutes on an M3 Max. With the setting off, it takes only 8 minutes.

An example function in this binary that triggers the issue is at 0x531580. With the invalid instruction setting on, linear sweep makes a large number of extra functions containing leftover basic blocks from this function.

Expected Behavior:
We should try and be better about dealing with leftover function fragments in linear sweep, especially avoiding creating a large number of extra functions that flow into the same code.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions