Skip to content

acheshkov/program_slicing

Repository files navigation

Decomposition

Set of methods for source code decomposition.


Installation

  1. $ git clone [email protected]:acheshkov/program_slicing.git
  2. $ cd program_slicing
  3. $ git submodule update --recursive --init
  4. $ pip3 install ./program_slicing

You should have access to global network to use pip. Python 3.9 with corresponding C compiler is required. Run Python Console to check the version of C compiler.


Usage

This project can be used via Command Line Interface, or it can be included into any other Python project as a submodule.

Command Line Interface

slice

Use this command if you want to decompose source files by complete computation slice (Nikolaos Tsantalis and Alexander Chatzigeorgiou. 2011. Identification of extract method refactoring opportunities for the decomposition of methods).

$ python cli.py slice [-h]
                      [-o OUTPUT]
                      source

Positional arguments:

source - source folder, file or url

Optional arguments:

-o, --output OUTPUT - output file or directory: depending on what you set as output, you will get folder full of slice decompositions or a single file with it. It uses stdout if not specified

-h, --help - show this help message and exit

Examples:

$ python cli.py slice MyProjectPath
$ python cli.py slice MyFile.java
$ python cli.py slice MyProjectPath --output MyResultPath
$ python cli.py slice MyFile.java --output MyResultPath

Submodule Interface

Control Dependence Graph - structure that represents Control Dependence Graph (inherited from networkx.DiGraph) with corresponding methods.

from program_slicing.graph.cdg import ControlDependenceGraph
  • add_entry_point - mark specified node as entry point.
  • get_entry_points - return a set of nodes that where marked as an entry point.

Control Flow Graph - structure that represents Control Flow Graph (inherited from networkx.DiGraph) with corresponding methods.

from program_slicing.graph.cfg import ControlFlowGraph
  • add_entry_point - mark specified node as entry point.
  • get_entry_points - return a set of nodes that where marked as an entry point.

Data Dependence Graph - structure that represents Data Dependence Graph (inherited from networkx.DiGraph) with corresponding methods.

from program_slicing.graph.ddg import DataDependenceGraph
  • add_entry_point - mark specified node as entry point.
  • get_entry_points - return a set of nodes that where marked as an entry point.

Program Dependence Graph - structure that represents Program Dependence Graph (inherited from networkx.DiGraph) with corresponding methods.

from program_slicing.graph.pdg import ProgramDependenceGraph
  • add_entry_point - mark specified node as entry point.
  • get_entry_points - return a set of nodes that where marked as an entry point.

Statement - structure that represents Control Dependence Graph, Data Dependence Graph or Program Dependence Graph nodes.

from program_slicing.graph.statement import Statement
  • statement_type - StatementType object.
  • start_point - line and column numbers of the Statement's start.
  • end_point - line and column numbers of the Statement's end.
  • affected_by - set of strings with names of variables that may affect the current Statement.
  • name - string with the name of the Statement. Not all Statements are named.
  • ast_node_type - string with additional information about node (e.g. an AST root's class).

StatementType - structure that enumerates Statement types.

from program_slicing.graph.statement import StatementType
  • FUNCTION - function declaration Statement.
  • VARIABLE - variable declaration Statement.
  • ASSIGNMENT - variable assignment Statement (such as i = 0, i += 1, i++, etc).
  • CALL - function call Statement.
  • SCOPE - scope Statement (such as braces {} or empty body in if (...) a = 0).
  • BRANCH - Statement that branches the flow (such as if, try, catch, switch).
  • LOOP - branch Statement that generates backward flow (such as for and while).
  • GOTO - Statement that lead flow to a concrete Statement (such as break, continue, throw, return and even else).
  • UNKNOWN - all the other Statements.
  • EXIT - special Statement that is placed in the end of function declaration, has zero size and all the 'return' and 'throw' nodes leads flow into it (as well as the last Statement in the flow).

Basic Block - structure that represents Control Flow Graph nodes.

from program_slicing.graph.basic_block import BasicBlock
  • statements - get the content of the Basic Block, i.e a list of Statements.
  • root - get the first Statement from the Basic Block. None if it is empty.
  • append - add a specified Statement to the Basic Block.
  • is_empty - return True if there are no statements in the Basic Block, otherwise - False.
  • split - split content by the given index, left first part of the split at the original Basic Block and return a new Basic Block which contains all the rest Statements.

Program Graphs Manager - structure that contains different types of program graphs (such as Control Flow Graph or Control Dependence Graph) based on same source code and provides a set of methods for their analysis.

from program_slicing.graph.parse import Lang
from program_slicing.graph.parse import control_dependence_graph
from program_slicing.graph.parse import control_flow_graph
from program_slicing.graph.manager import ProgramGraphsManager

manager_by_source = ProgramGraphsManager(source_code, Lang.JAVA)

manager_by_cdg = ProgramGraphsManager.from_control_dependence_graph(control_dependence_graph(source_code, Lang.JAVA))

manager_by_cfg = ProgramGraphsManager.from_control_flow_graph(control_flow_graph(source_code, Lang.JAVA))

Properties:

  • control_dependence_graph - return the Control Dependence Graph.
  • control_flow_graph - return the Control Flow Graph.
  • data_dependence_graph - return the Data Dependence Graph.
  • program_dependence_graph - return the Program Dependence Graph.
  • sorted_statements - return Statements that are sorted first by increasing of their start_point, then by decreasing of their end_point.
  • general_statements - return general Statements (those which are not contained in any non SCOPE, BRANCH, LOOP, FUNCTION or EXIT Statement).
  • scope_statements - return all the SCOPE, BRANCH, LOOP or FUNCTION Statements.

Public methods:

  • get_basic_block - return a Basic Block (that is a node of the Control Flow Graph) that contains a given Statement.
  • get_boundary_blocks - return a set of Basic Blocks which intersection of dominated and reach blocks contain the given one block.
  • get_boundary_blocks_for_statement - return a set of boundary blocks for Basic Block in which the given Statement is placed.
  • get_dominated_blocks return a set of Basic Blocks that are dominated by the given one (i.e. their Statements are placed in a control Dependence Graph subtree of the root of the given Basic Block).
  • get_reach_blocks - return a set of Basic Blocks that are reachable from the given one in the Control Flow Graph.
  • get_statement_line_numbers - return a set of line numbers in which the given Statement is placed.
  • get_function_statement - return the minimal FUNCTION Statement in which the given Statement is placed.
  • get_function_statement_by_range - return the minimal FUNCTION Statement in which the given range is placed.
  • get_scope_statement - return a minimal 'scope-like' (SCOPE, LOOP, BRANCH) Statement that contains a given Statement.
  • get_statements_in_scope - return all the Statements in the given scope Statement.
  • get_statements_in_range - return all the Statements in the given range.
  • get_exit_statements - return Statements that are Flow Dependence children of the given statements but not one of them.
  • get_affecting_statements - return Statements from the given set of Statements that affect some Statement not form the given set.
  • get_changed_variables_statements - return VARIABLE Statements that represent variables changed in the given set of Statements.
  • get_involved_variables_statements - return VARIABLE Statements that represent variables involved (including usage) in the given set of Statements.
  • contain_redundant_statements - check if the given set of Statements contain part of some construction not fully included in the given set.

Class methods:

  • from_source_code - build all the graphs by a given source code string and a language description.
  • from_control_dependence_graph - build all the graphs by a given Control Dependence Graph.
  • from_control_flow_graph - build all the graphs by a given Control Flow Graph.
  • from_data_dependence_graph - build all the graphs by a given Data Dependence Graph.
  • from_program_dependence_graph - build all the graphs by a given Program Dependence Graph.

parse - set of functions that allow to build different graphs from the specified source code string and programming language specification.

  • control_dependence_graph - parse a Control Dependence Graph:
from program_slicing.graph.cdg import ControlDependenceGraph
from program_slicing.graph.parse import control_dependence_graph, Lang

cdg: ControlDependenceGraph = control_dependence_graph(source_code, Lang.JAVA)
  • control_flow_graph - parse a Control Flow Graph:
from program_slicing.graph.cfg import ControlFlowGraph
from program_slicing.graph.parse import control_flow_graph, Lang

cfg: ControlFlowGraph = control_flow_graph(source_code, Lang.JAVA)
  • data_dependence_graph - parse a Data Dependence Graph:
from program_slicing.graph.ddg import DataDependenceGraph
from program_slicing.graph.parse import data_dependence_graph, Lang

ddg: DataDependenceGraph = data_dependence_graph(source_code, Lang.JAVA)
  • program_dependence_graph - parse a Program Dependence Graph:
from program_slicing.graph.pdg import ProgramDependenceGraph
from program_slicing.graph.parse import program_dependence_graph, Lang

pdg: ProgramDependenceGraph = program_dependence_graph(source_code, Lang.JAVA)
  • tree_sitter_ast - parse an Abstract Syntax Tree:
from tree_sitter import Tree
from program_slicing.graph.parse import tree_sitter_ast, Lang

ast: Tree = tree_sitter_ast(source_code, Lang.JAVA)

convert - there is also an option to convert one type of graph to another:

from program_slicing.graph import convert
from program_slicing.graph.cdg import ControlDependenceGraph
from program_slicing.graph.cfg import ControlFlowGraph

cdg: ControlDependenceGraph = ControlDependenceGraph()
cfg: ControlFlowGraph = convert.cdg.to_cfg(cdg)
new_cdg: ControlDependenceGraph = convert.cfg.to_cdg(cfg)