
State of the art reference

Loic Jaquemet edited this page Dec 24, 2016 · 11 revisions

In Q4 2016, Reddit finally helped me identify the relevant academic field. Below is a short list of the work I found and consider relevant.

Identification of data structure in memory dump of executed sample

  • Laika: Digging For Data Structures, 2008
    Keywords: Bayesian unsupervised learning, typed pointer
    Parses whole memory pages without precise identification of memory chunks. Basic type heuristics. Introduces the idea of typed pointers, where same-size records with pointer values at the same offsets are probably of the same type.
    Applied to malware triage using Bayesian learning.
    NB: Concept tested in haystack circa 2011-2012. TODO: look for follow-ups. Heap walker, anyone?

  • SigGraph: Brute Force Scanning of Kernel Data Structure Instances Using Graph-based Signatures, 2011
    Keywords: data structure isomorphism, signature creation, Kernel rootkit detection, kernel version inference
    Detection of kernel structures in a memory dump. Compiler-based approach. Generates signatures from source code (AST).
    NB: Concept very similar to haystack.

  • KOP

  • MAS
    Microsoft efforts at rootkit detection in kernel memory dumps. Massive data source.
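
To make the typed pointer idea from Laika more concrete, here is a minimal sketch. It is not Laika's algorithm (which uses Bayesian unsupervised learning) and not haystack's API. It only groups records by size and by the offsets that hold pointer-looking values. The 64-bit little-endian word size, the heap range and the helper names `pointer_offsets` and `group_by_signature` are all assumptions made for illustration.

```python
import struct
from collections import defaultdict

WORD = 8  # assumed pointer size in bytes (64-bit, little-endian)


def pointer_offsets(data, heap_start, heap_end):
    """Return the offsets of words whose value falls inside the heap range."""
    offsets = []
    for off in range(0, len(data) - WORD + 1, WORD):
        (value,) = struct.unpack_from("<Q", data, off)
        if heap_start <= value < heap_end:
            offsets.append(off)
    return tuple(offsets)


def group_by_signature(records, heap_start, heap_end):
    """Group record addresses by (record size, pointer offsets)."""
    groups = defaultdict(list)
    for addr, data in records.items():
        signature = (len(data), pointer_offsets(data, heap_start, heap_end))
        groups[signature].append(addr)
    return dict(groups)


# Toy data (hypothetical): two 16-byte records with a pointer at offset 8,
# and one 24-byte record with a pointer at offset 0.
HEAP_START, HEAP_END = 0x1000, 0x2000
records = {
    0x1000: struct.pack("<QQ", 42, 0x1010),
    0x1010: struct.pack("<QQ", 7, 0x1020),
    0x1020: struct.pack("<QQQ", 0x1000, 1, 2),
}
groups = group_by_signature(records, HEAP_START, HEAP_END)
```

In this toy case the two 16-byte records end up in the same group, which is the "probably the same type" guess. A real dump would need far better pointer detection than a simple range check.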

Identification of data structures in dynamic execution of samples

  • RDS: Recursive data structure profiling, 2005
    Keywords: profiling, shape graph analysis, Allocation dynamic analysis
    It creates a graph of relations between records based on pointer values. Note: I'm not sure I understand the purpose of using dynamic analysis for this.

  • DDT: design and evaluation of a dynamic program analysis for optimizing data structure usage, 2009
    Keywords:

  • Rewards: Automatic reverse engineering of data structures from binary execution, 2010
    Keywords:

  • Howard: a dynamic excavator for reverse engineering data structures, 2011
    Keywords:

  • TIE: Principled reverse engineering of types in binary programs, 2011
    Keywords:

  • MemPick: High-Level Data Structure Detection in C/C++ Binaries, 2013
    Keywords: Shape analysis, Allocation dynamic analysis, overlay
    Uses Intel PIN to instrument allocation primitives. Creates a graph based on pointer values (?). Performs type analysis and overlay detection. Classifies linked-list and tree shapes. Assesses the performance of tree usage.
    Overall, the conclusion seems focused on the detection of linked-list structures (hinting at the performance domain).
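
The "graph based on pointer values" that RDS and MemPick rely on can be sketched roughly as follows. This is my own simplification, not the method of either paper. It assumes allocations are already known (base address and content), that pointers are 64-bit little-endian words, and that any word pointing inside an allocation is an edge to it. The names `build_pointer_graph` and `guess_shape` are hypothetical, and the shape guess is a crude out-degree heuristic.

```python
import struct
from bisect import bisect_right

WORD = 8  # assumed pointer size in bytes (64-bit, little-endian)


def build_pointer_graph(allocs):
    """allocs maps base address -> bytes. Returns base -> list of target bases."""
    bases = sorted(allocs)

    def owner(value):
        # Find the allocation whose [base, base + size) range contains value.
        i = bisect_right(bases, value) - 1
        if i >= 0:
            base = bases[i]
            if value < base + len(allocs[base]):
                return base
        return None

    graph = {}
    for base, data in allocs.items():
        targets = []
        for off in range(0, len(data) - WORD + 1, WORD):
            (value,) = struct.unpack_from("<Q", data, off)
            target = owner(value)
            if target is not None:
                targets.append(target)
        graph[base] = targets
    return graph


def guess_shape(graph):
    """Very rough label based only on the maximum out-degree."""
    max_out = max((len(t) for t in graph.values()), default=0)
    if max_out == 0:
        return "no links"
    if max_out == 1:
        return "list-like"
    if max_out == 2:
        return "tree-like or doubly linked"
    return "other"


# Toy singly linked list (hypothetical): each node is (value, next pointer).
allocs = {
    0x1000: struct.pack("<QQ", 1, 0x1010),
    0x1010: struct.pack("<QQ", 2, 0x1020),
    0x1020: struct.pack("<QQ", 3, 0),
}
graph = build_pointer_graph(allocs)
shape = guess_shape(graph)
```

A real classifier would have to look at cycles, in-degrees and how the graph evolves over the run, which is presumably where the dynamic analysis comes in.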

Terminology

  • Shape analysis: Discovers and verifies properties of linked data structures to verify the high-level correctness of programs.
  • Allocation dynamic analysis: Instrumentation of the sample to identify allocated memory blocks. Not heap walking. Requires one or more runs of the sample.
  • Heap walking: Identifies allocated memory blocks by following the heap management structures in a static memory dump.
  • Low-level data structure identification: Detection of primitive types or records.
  • High-level data structure identification: Detection of recursive data structures and linked (pointer) structures.
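
As an illustration of heap walking as defined above, here is a minimal sketch over a toy heap. The chunk layout is entirely made up for the example (an 8-byte little-endian header holding the payload size, with the low bit used as an in-use flag). Real allocators use different and more complex structures, and `walk_heap` and `make_chunk` are hypothetical helpers.

```python
import struct

HEADER = 8  # assumed header size in bytes


def make_chunk(payload, in_use):
    """Build a toy chunk: header (size | in-use bit) followed by the payload."""
    # Payload sizes are assumed to be multiples of 8, so the low bit is free.
    return struct.pack("<Q", len(payload) | int(in_use)) + payload


def walk_heap(dump):
    """Yield (payload offset, payload size, in_use) for each chunk in the dump."""
    pos = 0
    while pos + HEADER <= len(dump):
        (word,) = struct.unpack_from("<Q", dump, pos)
        size = word & ~1
        in_use = bool(word & 1)
        if size == 0 or pos + HEADER + size > len(dump):
            break  # corrupted header or end of the usable heap
        yield pos + HEADER, size, in_use
        pos += HEADER + size


# Toy dump (hypothetical): allocated, free, allocated.
dump = (
    make_chunk(b"A" * 16, True)
    + make_chunk(b"\x00" * 8, False)
    + make_chunk(b"B" * 24, True)
)
chunks = list(walk_heap(dump))
```

The point is that the allocated blocks are recovered from the dump alone, without running or instrumenting the sample, which is the difference with allocation dynamic analysis.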