New parser for python #10

Open
Jirigesi opened this issue Jun 29, 2020 · 12 comments

@Jirigesi

Hello,
Thanks for providing such a great tool. I would like to use a similar tool on Python code, but I searched and could not find one. Could you give me some guidance on writing a parser for Python code?

Best

@Symbolk

Symbolk commented Dec 3, 2020

The README says that soon a detailed tutorial will be provided, looking forward to it!

@ldesi

ldesi commented Feb 1, 2021

The README says that soon a detailed tutorial will be provided, looking forward to it!

Any news on that? Thanks.

@rodrigo-brito

Hi @ldesi and @Symbolk. I created a parser for Go that might serve as the basis for a generic parser. The Go parser converts a file to JSON, and this output is used to build the RefDiff CST.

I think the same approach could be used for Python:


[
	{
		"type": "File",
		"start": 0,
		"end": 203,
		"line": 1,
		"has_body": true,
		"name": "types.go",
		"namespace": "",
		"parent": null,
		"tokens": [
			"0-7",
			"8-16",
			"16-17",
			"18-22",
			"23-31",
			"32-35",
			"35-36",
			"36-40",
			"41-49",
			"50-54",
			"55-61",
			"61-62",
			"62-63",
			"63-64",
			"65-69",
			"70-71",
			"73-81",
			"85-86",
			"86-87",
			"87-90",
			"90-91",
			"92-103",
			"104-105",
			"105-106",
			"106-112",
			"112-113",
			"114-115",
			"126-132",
			"132-133",
			"133-134",
			"134-135",
			"136-138",
			"148-157",
			"158-159",
			"162-163",
			"163-164",
			"164-165",
			"166-169",
			"169-170",
			"171-172",
			"172-173",
			"173-174",
			"174-175",
			"176-180",
			"181-182",
			"182-190",
			"190-191",
			"192-196",
			"196-197",
			"197-198",
			"199-200",
			"202-203",
			"203-204"
		],
		"receiver": null
	},
	{
		"type": "Type",
		"start": 23,
		"end": 35,
		"line": 3,
		"has_body": false,
		"name": "IntAlias",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Type",
		"start": 41,
		"end": 63,
		"line": 4,
		"has_body": false,
		"name": "ChanType",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Type",
		"start": 73,
		"end": 90,
		"line": 7,
		"has_body": false,
		"name": "IntSlice",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Type",
		"start": 92,
		"end": 112,
		"line": 8,
		"has_body": false,
		"name": "StringSlice",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Struct",
		"start": 126,
		"end": 134,
		"line": 9,
		"has_body": true,
		"name": "A",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Interface",
		"start": 148,
		"end": 172,
		"line": 10,
		"has_body": false,
		"name": "iA",
		"namespace": "",
		"parent": "types.go",
		"receiver": null
	},
	{
		"type": "Function",
		"start": 162,
		"end": 169,
		"line": 11,
		"has_body": false,
		"name": "A",
		"namespace": "iA.",
		"parent": "iA",
		"receiver": null
	},
	{
		"type": "Function",
		"start": 176,
		"end": 203,
		"line": 15,
		"has_body": true,
		"name": "Test",
		"namespace": "IntSlice.",
		"parent": "IntSlice",
		"receiver": "IntSlice"
	}
]
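
The flat list above encodes a tree through each node's "parent" field. As a minimal sketch (not RefDiff's actual code), assuming parents are referenced by name as in the sample, the hierarchy could be rebuilt in Python like this. NODES_JSON is an abbreviated copy of the output with most fields omitted, and group_by_parent and render are hypothetical helpers:

```python
import json
from collections import defaultdict

# Abbreviated copy of the node list above (tokens and offsets omitted).
NODES_JSON = """
[
  {"type": "File", "name": "types.go", "parent": null, "line": 1},
  {"type": "Type", "name": "IntSlice", "parent": "types.go", "line": 7},
  {"type": "Interface", "name": "iA", "parent": "types.go", "line": 10},
  {"type": "Function", "name": "A", "parent": "iA", "line": 11},
  {"type": "Function", "name": "Test", "parent": "IntSlice", "line": 15}
]
"""


def group_by_parent(nodes):
    """Map each parent name to its child nodes (None is the root level)."""
    children = defaultdict(list)
    for node in nodes:
        children[node["parent"]].append(node)
    return children


def render(children, parent=None, depth=0):
    """Return an indented outline of the tree as a list of lines."""
    lines = []
    for node in children.get(parent, []):
        lines.append("  " * depth + f'{node["type"]} {node["name"]}')
        lines.extend(render(children, node["name"], depth + 1))
    return lines


nodes = json.loads(NODES_JSON)
outline = render(group_by_parent(nodes))
print("\n".join(outline))
```

Keying children by name is a simplification: two nodes with the same name (such as the struct A and the function A in the full output) would share children, so a real implementation would probably need a more specific key.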

@Mosallamy

Hi @rodrigo-brito, we are working on a graduation project, and one part of it requires a refactoring detection tool such as RefDiff. The problem is that we need it for Python. How hard would it be to create a RefDiff plugin for Python like the one you created for Go?

@rodrigo-brito

Hi @Mosallamy, I spent one month creating the plugin. This week, I will try to write a short tutorial to help other developers create plugins. The main effort is writing an AST parser that extracts the main components of a Python file. For example, given the following file (example.py, located in my_package):

def foo(x):
    print("x = ", x)

def bar():
    foo(10)

You should return a structure like this:

[
  {
    "type": "File",
    "start": 0,
    "end": 50,
    "line": 1,
    "has_body": true,
    "name": "example.py",
    "namespace": "my_package",
    "parent": null,
    "tokens": [
      "0-4",
      "5-8",
      ...
    ]
  },
  {
    "type": "Function",
    "start": 23,
    "end": 35,
    "line": 1,
    "has_body": true,
    "name": "foo",
    "namespace": "my_package",
    "parent": "example.py",
    "parameters": ["x"],
    "calls": []
  },
  {
    "type": "Function",
    "start": 36,
    "end": 50,
    "line": 4,
    "has_body": true,
    "name": "bar",
    "namespace": "my_package",
    "parent": "example.py",
    "parameters": [],
    "calls": ["my_package.foo"]
  }
]

The start and end values are only examples; they are not the correct positions.
In summary:

  • For each node, you must extract the token positions, the line number, and the parent node.
  • For functions, you must also extract the parameters and the local function calls (e.g. foo() in the bar function).

If we have this information, we can create the plugin.
Do you have experience with the Python AST?
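
As a rough sketch of the function part of that summary, assuming Python's built-in ast module and treating only calls to functions defined at module level in the same file as "local", the extraction might look like this. extract_functions is a hypothetical helper, and the field names simply mirror the structure above (start, end, and tokens are left out here):

```python
import ast
import json

SOURCE = '''def foo(x):
    print("x = ", x)

def bar():
    foo(10)
'''


def extract_functions(source, file_name, namespace):
    """Return one record per top-level function, shaped like the JSON above."""
    tree = ast.parse(source)
    # Functions defined at module level; only calls to these count as local.
    local = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    records = []
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        calls = []
        for child in ast.walk(node):
            is_local_call = (
                isinstance(child, ast.Call)
                and isinstance(child.func, ast.Name)
                and child.func.id in local
            )
            if is_local_call:
                qualified = f"{namespace}.{child.func.id}"
                if qualified not in calls:
                    calls.append(qualified)
        records.append({
            "type": "Function",
            "line": node.lineno,
            "has_body": True,
            "name": node.name,
            "namespace": namespace,
            "parent": file_name,
            # Plain positional parameters only (no *args, **kwargs, keyword-only).
            "parameters": [a.arg for a in node.args.args],
            "calls": calls,
        })
    return records


functions = extract_functions(SOURCE, "example.py", "my_package")
print(json.dumps(functions, indent=2))
```

Calls such as print are ignored because they do not refer to a function defined in the file. Method calls (obj.method()) and nested functions are not handled in this sketch.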

@Mosallamy

@rodrigo-brito Thanks for the fast reply! We have experimented a little with the built-in Python AST library.

https://docs.python.org/3/library/ast.html

From the AST library we can extract the following information:

  • Function names
  • Function calls
  • Body of the function
  • Function parameters

among other things. Can we use this library as the base for the Python plugin?

As for the tokens, we found the asttokens library (https://asttokens.readthedocs.io/en/latest/user-guide.html), which returns the positions of tokens.
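
For illustration, here is a minimal sketch of producing "start-end" token ranges like those in the JSON above. It uses the standard library tokenize module rather than asttokens, so it is a stand-in and not the asttokens API. token_ranges is a hypothetical helper, and treating the end offset as exclusive is an assumption based on the sample output:

```python
import io
import tokenize

SOURCE = 'def foo(x):\n    print("x = ", x)\n'

# Layout-only tokens that have no visible text of their own.
LAYOUT = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def token_ranges(source):
    """Return 'start-end' character ranges (end exclusive) for each token."""
    # Offset of the first character of each line, used to flatten (row, col).
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    ranges = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        start = line_starts[tok.start[0] - 1] + tok.start[1]
        end = line_starts[tok.end[0] - 1] + tok.end[1]
        ranges.append(f"{start}-{end}")
    return ranges


ranges = token_ranges(SOURCE)
print(ranges)
```

Whether comments should count as tokens is left open here; this sketch keeps them as ordinary tokens.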

@rodrigo-brito

@Mosallamy, I can help you with the code. Can you open a new repository for it? We can use Jython to create the parser and integrate it directly into the Java module.

@Mosallamy

Hey @rodrigo-brito, I just created a repo with a script that parses a Python file and extracts the following information for each function:

  • type
  • name
  • parameters
  • line
  • start token
  • end token

Run the Ast.py file to get the output.
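
For readers without access to that script, the following is a hedged sketch of how such fields could be computed with the built-in ast module. It is not the contents of Ast.py. function_spans is a hypothetical helper, it reports flat start and end offsets instead of token indices, and it relies on end_lineno and end_col_offset, which require Python 3.8 or later:

```python
import ast

SOURCE = 'def foo(x):\n    print("x = ", x)\n\ndef bar():\n    foo(10)\n'


def function_spans(source):
    """Return type, name, parameters, line, and flat offsets per function."""
    # ast column offsets are UTF-8 byte offsets, so line starts are in bytes too.
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line.encode("utf-8")))

    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            spans.append({
                "type": "Function",
                "name": node.name,
                "parameters": [a.arg for a in node.args.args],
                "line": node.lineno,
                "start": line_starts[node.lineno - 1] + node.col_offset,
                "end": line_starts[node.end_lineno - 1] + node.end_col_offset,
            })
    return spans


spans = function_spans(SOURCE)
for span in spans:
    print(span)
```

For ASCII-only sources the byte offsets equal character offsets; files with non-ASCII characters would need a conversion if character positions are required.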

@rodrigo-brito

Hi @Mosallamy, can you share the repository link?

@Mosallamy

Mosallamy commented Feb 10, 2021

@Mosallamy

Hello @rodrigo-brito, so far we have extracted all of the information from the AST except for the function calls. We have also read the RefDiff paper thoroughly and understand the steps required to create a plugin, but we had trouble understanding the exact implementation in the code 😅

@rodrigo-brito

rodrigo-brito commented Feb 15, 2021

Hi @Mosallamy, I will try to create the base of the plugin today. I will open a pull request in your repository soon.
