
Faster startup in situations where Python is repeatedly called with the same modules #719

Open
VariantXYZ opened this issue Feb 6, 2025 · 0 comments


I spent some time reading over #32 and was wondering whether speeding up Python startup might be possible in certain constrained scenarios. In particular, I use Python to run scripts that each operate on a single file, so I can incorporate them into a build system.

I have a script that converts a text representation of data into a binary format. On WSL1 on my machine, this script spends 20 milliseconds doing useful work but 105 milliseconds importing everything. On my M1 MacBook Pro, it is 10 milliseconds of work plus 17 milliseconds of imports (significantly faster, but imports are still a large chunk of the total time). My measurements are below, along with the script itself, which isn't very complex (there are definitely changes I could make to further reduce the import time, but that's a bit out of scope for this issue IMO).

I believe processing one file per invocation is quite common, especially in a build system where you'd like fast incremental builds. Some alternatives I had considered:

  • Have one script process all the files at once -> we give up parallelism and the benefit of incremental builds
  • Have the script process multiple files, passing 'all changed files at once' -> this also loses proper parallelism on clean builds, and it would not be trivial to make good use of the machine's resources (i.e., how would you partition some N number of files across multiple cores? Would you need import-time profiling? etc.)
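For what it's worth, that second alternative can be roughly sketched with `multiprocessing.Pool`: one interpreter pays the import cost, then fans the files out across cores. On POSIX the workers are forked, so they inherit the parent's already-imported modules. (`process_file` here is a hypothetical stand-in for the real per-file work, not the actual script.)

```python
# Sketch of the "pass all changed files at once" variant: a single
# interpreter imports everything once, then a Pool spreads the per-file
# work across cores. POSIX workers are forked, so they inherit the
# parent's imported modules instead of re-importing them.
import multiprocessing


def process_file(path):
    # Placeholder for the real per-file work (txt -> binary map).
    return (path, len(path))


def process_batch(paths):
    with multiprocessing.Pool() as pool:  # defaults to os.cpu_count() workers
        return pool.map(process_file, paths)


if __name__ == "__main__":
    print(process_batch(["0115.txt", "0116.txt"]))
```

The partitioning question in the bullet still applies, though: `pool.map` chunks naively and knows nothing about which files are expensive.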

Maybe there are other alternatives, but some possibilities I considered were:

  • A ccache-like mechanism that keeps loaded module state in memory for the lifetime of a build (e.g., the build system could spawn a 'cached-python' entity that builds up imported/initialized modules as needed, allowing them to be reused across many Python processes)
  • A precompiled DSO-like mechanism that could be loaded alongside the Python process (i.e., with LD_LIBRARY_PATH or an equivalent) with the necessary modules pre-initialized (this would probably be a huge pain for anything that must initialize dynamically based on the execution environment)
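A minimal sketch of the first idea, assuming POSIX (it relies on `os.fork`): a long-lived server process pays the import cost once, and each file is then handled by a forked child that inherits the already-initialized modules via copy-on-write. `process_file` and the module choices are stand-ins, not the real script.

```python
# "cached-python" sketch: the server imports everything once; each forked
# worker inherits the initialized modules, so per-file startup is ~free.
import os

# Heavy imports happen once, in the server process. These stand in for the
# real script's imports (shutil, the common.* modules, ...).
import shutil
import struct


def process_file(path):
    # Placeholder for the real per-file work (txt -> binary map).
    return len(path)


def serve(paths):
    """Fork one cheap worker per input file; return True if all succeeded."""
    pids = []
    for path in paths:
        pid = os.fork()
        if pid == 0:  # child: modules are inherited, no import cost
            process_file(path)
            os._exit(0)
        pids.append(pid)
    # Parent: reap the children and check their exit statuses.
    return all(os.waitpid(pid, 0)[1] == 0 for pid in pids)
```

A real version would presumably accept file names from the build system over a socket or FIFO, much like compiler daemon modes, rather than taking a fixed list.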

python3 -m cProfile ./scripts/txt2map.py build/attribmaps/0115.map gfx/attribmaps/0115.txt ./gfx/prebuilt/attribmaps
         4350 function calls (4283 primitive calls) in 0.020 seconds
python3 -X importtime ./scripts/txt2map.py build/attribmaps/0115.map gfx/attribmaps/0115.txt ./gfx/prebuilt/attribmaps
import time: self [us] | cumulative | imported package
import time:       132 |        132 |   _io
import time:        35 |         35 |   marshal
import time:       331 |        331 |   posix
import time:      1438 |       1934 | _frozen_importlib_external
import time:       278 |        278 |   time
import time:       766 |       1043 | zipimport
import time:        96 |         96 |     _codecs
import time:       571 |        666 |   codecs
import time:       583 |        583 |   encodings.aliases
import time:      1515 |       2763 | encodings
import time:       616 |        616 | encodings.utf_8
import time:       126 |        126 | _signal
import time:        38 |         38 |     _abc
import time:       431 |        469 |   abc
import time:      5361 |       5830 | io
import time:        50 |         50 |       _stat
import time:       378 |        428 |     stat
import time:       953 |        953 |     _collections_abc
import time:       283 |        283 |       genericpath
import time:       836 |       1119 |     posixpath
import time:      1710 |       4208 |   os
import time:       460 |        460 |   _sitebuiltins
import time:      3632 |       3632 |     apport_python_hook
import time:       389 |       4021 |   sitecustomize
import time:       187 |        187 |   usercustomize
import time:     14291 |      23165 | site
import time:       471 |        471 |         types
import time:      1661 |       2131 |       enum
import time:       426 |        426 |         _sre
import time:       469 |        469 |           sre_constants
import time:       704 |       1173 |         sre_parse
import time:      4005 |       5603 |       sre_compile
import time:        89 |         89 |           itertools
import time:       408 |        408 |           keyword
import time:        70 |         70 |             _operator
import time:       889 |        958 |           operator
import time:       450 |        450 |           reprlib
import time:        57 |         57 |           _collections
import time:      4983 |       6944 |         collections
import time:        72 |         72 |         _functools
import time:     14217 |      21232 |       functools
import time:        80 |         80 |       _locale
import time:       502 |        502 |       copyreg
import time:      3014 |      32560 |     re
import time:      1169 |      33729 |   fnmatch
import time:        85 |         85 |   errno
import time:        82 |         82 |   zlib
import time:       436 |        436 |     _compression
import time:       885 |        885 |     _bz2
import time:      8526 |       9847 |   bz2
import time:       793 |        793 |     _lzma
import time:       875 |       1667 |   lzma
import time:      6587 |      51994 | shutil
import time:      1583 |       1583 | common
import time:       101 |        101 |     _struct
import time:      2241 |       2341 |   struct
import time:       724 |        724 |     _ast
import time:       809 |        809 |     contextlib
import time:      1679 |       3211 |   ast
import time:      6278 |      11829 | common.utils
import time:      1085 |       1085 | common.tilemaps
import time:      1859 |       1859 |   utils
import time:      1060 |       2919 | common.tilesets
import time:       353 |        353 | encodings.utf_8_sig
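A trace like this can be sorted by self time to find the worst offenders; here's a small throwaway parser for the `-X importtime` line format (this helper is my own, not part of CPython):

```python
# Sort a `python -X importtime` trace by self time. Lines look like:
#   import time:      5361 |       5830 | io
def top_imports(lines, n=5):
    rows = []
    for line in lines:
        if not line.startswith("import time:"):
            continue
        parts = [p.strip() for p in line[len("import time:"):].split("|")]
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skips the header row ("self [us] | cumulative | ...")
        rows.append((int(parts[0]), parts[2]))  # (self µs, module name)
    return sorted(rows, reverse=True)[:n]
```

On the WSL1 trace above, `site` (14291 µs self) and `functools` (14217 µs self) dominate.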

(The script and its common module imports are listed below)

#!/usr/bin/env python3

import os, sys
from shutil import copyfile
sys.path.append(os.path.join(os.path.dirname(__file__), 'common'))
from common import utils, tilemaps, tilesets

output_file = sys.argv[1]
input_file = sys.argv[2]
prebuilt_root = sys.argv[3]

fname = os.path.splitext(os.path.basename(input_file))[0]
char_table = {}

# 0xFE is a special character indicating a new line for tilemaps, it doesn't really belong in the tileset table but for this specifically it makes sense
char_table['\n'] = 0xFE

prebuilt = os.path.join(prebuilt_root, f"{fname}.map")
if os.path.isfile(prebuilt):
    print(f"\tUsing prebuilt {prebuilt}")
    copyfile(prebuilt, output_file)
    os.utime(output_file, None)
    sys.exit()

with open(input_file, 'r', encoding='utf-8-sig') as f:
    mode = f.readline().strip().strip('[]').split('|')
    mode[0] = int(mode[0], 16)
    is_compressed = mode[0] & 3
    if len(mode) == 2:
        mode[0] |= int(mode[1], 16) << 1
    tmap = [mode[0]]
    if is_compressed:
        text = []
        for line in f:
            b = utils.txt2bin(line, char_table)
            text += b
        text.append(0xFF) # tmap compression expects 0xFF at the end
        tmap += tilemaps.compress_tmap(text)
    else:
        text = f.read().replace('\r\n','\n')
        tmap += utils.txt2bin(text, char_table)
        tmap.append(0xFF)
    with open(output_file, 'wb') as of:
        of.write(bytearray(tmap))

From the common modules:

import struct
from ast import literal_eval
from collections import OrderedDict
import os
import struct
import sys
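Until something like a cached-python exists, one mitigation available today is deferring heavy imports with `importlib.util.LazyLoader` (this is the recipe from the importlib docs). For example, `shutil` accounts for ~52 ms cumulative in the WSL1 trace above but is only needed when the prebuilt-copy branch is taken:

```python
# Deferred import via importlib.util.LazyLoader: the module object is
# created immediately, but the module body only executes on first
# attribute access.
import importlib.util
import sys


def lazy_import(name):
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers the real execution
    return module


# Import cost is only paid if the prebuilt-copy branch actually runs:
shutil = lazy_import("shutil")
# ... later: shutil.copyfile(prebuilt, output_file)
```

This doesn't help the worst case (a clean build still pays for every import in some process), but it would trim the common incremental-build path in scripts like the one above.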