Indexing scripts rework #348

Closed
wants to merge 29 commits into from
7728701
utils/index-repository: fetch in parallel
tleb Nov 7, 2024
221b77f
utils/update-elixir-data: fetch in parallel
tleb Nov 7, 2024
abb29ac
utils/index-repository: add alias for `git -C ...`
tleb Nov 8, 2024
6976793
utils/index-repository: support calling on existing repository
tleb Nov 8, 2024
ee60349
utils/*: delete common.sh and inline $ELIXIR_THREADS fallback
tleb Nov 8, 2024
bb0c239
utils/index-repository: refactor by creating project_init() function
tleb Nov 8, 2024
268bfcb
utils/index-repository: refactor by creating project_add_remote() fun…
tleb Nov 8, 2024
7e95b58
utils/index-repository: refactor by creating project_fetch() function
tleb Nov 8, 2024
99e8038
utils/index-repository: refactor by creating project_index() function
tleb Nov 8, 2024
e67438a
utils: rename index-repository to index
tleb Nov 8, 2024
29d5fd9
utils: deduplicate index-all-repositories into index
tleb Nov 8, 2024
7c11028
utils/index: make it possible to update a specific project
tleb Nov 8, 2024
1f1607b
utils: deduplicate utils/update-elixir-data into utils/index
tleb Nov 8, 2024
a1975a1
utils/index: add init.defaultBranch= config to `git init` call
tleb Nov 8, 2024
5c9b2ad
utils: deduplicate pack-repositories into index
tleb Nov 8, 2024
711a3f2
README: remove "Keeping git repository disk usage under control" section
tleb Nov 8, 2024
e288d46
utils/index: allow indexing project with remote URLs
tleb Nov 8, 2024
e9efce0
utils/index: remove `git config --system --add safe.directory` call
tleb Nov 8, 2024
c40e673
utils/index: remove /usr/local/elixir/update.py absolute path
tleb Nov 8, 2024
0cbc4fe
README: update following utils/* script changes
tleb Nov 8, 2024
e1360bf
utils/index: force use of bash, we depend on it for ${@:5} syntax
tleb Dec 20, 2024
dbc836b
utils/index: avoid passing argument to test(1)
tleb Dec 20, 2024
b341498
Dockerfile: add virtualenv to $PATH by default
tleb Dec 21, 2024
401a60c
Dockerfile: set PYTHONUNBUFFERED=1 by default
tleb Dec 21, 2024
b095414
Dockerfile: add utils/ in $PATH by default, for easy indexing
tleb Dec 21, 2024
7f282f3
gitignore: add .envrc for direnv
tleb Jan 27, 2025
ff7b859
gitignore: add /data/ for use as data root
tleb Jan 27, 2025
f478ffd
WIP benchmark scripts
tleb Dec 27, 2024
4ee26a9
WIP: limit Linux tags
tleb Jan 28, 2025
4 changes: 4 additions & 0 deletions .gitignore
@@ -5,6 +5,7 @@ tags
.idea/
env/
venv/
+/data/

# Web-specific
http/images
@@ -16,3 +17,6 @@ http/robots.txt
*.swp
*~
~*

+# https://direnv.net/
+.envrc
43 changes: 6 additions & 37 deletions README.adoc
@@ -255,35 +255,12 @@ as a front-end to reduce the load on the server running the Elixir code.
== Keeping Elixir databases up to date

To keep your Elixir databases up to date and index new versions that are released,
-we're proposing to use a script like `utils/update-elixir-data` which is called
+we're proposing to use a script like `index /srv/elixir-data --all` which is called
through a daily cron job.

You can set `$ELIXIR_THREADS` if you want to change the number of threads used by
update.py for indexing (by default the number of CPUs on your system).
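
As a rough illustration of the paragraph above, the sketch below builds the `index /srv/elixir-data --all` invocation from Python and fills in `ELIXIR_THREADS` when it is unset. The helper name `build_index_invocation` is hypothetical, and the fallback to `os.cpu_count()` only mirrors the default described here; it is not copied from `utils/index`.

```python
import os

def build_index_invocation(data_root="/srv/elixir-data", env=None):
    """Return (argv, env) for an `index <data-root> --all` run.

    Hypothetical helper for illustration; not part of the Elixir repository.
    """
    env = dict(os.environ if env is None else env)
    # Assumption: mirror the documented default (number of CPUs) when
    # ELIXIR_THREADS is unset or empty.
    if not env.get("ELIXIR_THREADS"):
        env["ELIXIR_THREADS"] = str(os.cpu_count() or 1)
    argv = ["index", data_root, "--all"]
    return argv, env

if __name__ == "__main__":
    argv, env = build_index_invocation()
    print(" ".join(argv), "with ELIXIR_THREADS=" + env["ELIXIR_THREADS"])
```

A job started daily by cron could pass the result to `subprocess.run(argv, env=env, check=True)`, assuming `index` is reachable through `$PATH`.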

-== Keeping git repository disk usage under control
-
-As you keep updating your git repositories, you may notice that some can become
-considerably bigger than they originally were. This seems to happen when a `gc.log`
-file appears in a big repository, apparently causing git's garbage collector (`git gc`)
-to fail, and therefore causing the repository to consume disk space at a fast
-pace every time new objects are fetched.
-
-When this happens, you can save disk space by packing git directories as follows:
-
-----
-cd <bare-repo>
-git prune
-rm gc.log
-git gc --aggressive
-----
-
-Actually, a second pass with the above commands will save even more space.
-
-To process multiple git repositories in a loop, you may use the
-`utils/pack-repositories` that we are providing, run from the directory
-where all repositories are found.
-
= Building Docker images

Dockerfiles are provided in the `docker/` directory.
@@ -305,22 +282,14 @@ The Docker image does not contain any repositories.
To index a repository, you can use the `index-repository` script.
For example, to add the https://musl.libc.org/[musl] repository, run:

-# docker exec -it -e PYTHONUNBUFFERED=1 elixir-container \
-/bin/bash -c 'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
-/usr/local/elixir/utils/index-repository \
-musl https://git.musl-libc.org/git/musl'
-
-Without PYTHONUNBUFFERED environment variable, update logs may show up with a delay.
+# docker exec -it elixir-container index /srv/elixir-data musl

Or, to run indexing in a separate container:

-# docker run -e PYTHONUNBUFFERED=1 -v ./elixir-data/:/srv/elixir-data \
---entrypoint /bin/bash elixir -c \
-'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
-/usr/local/elixir/utils/index-repository \
-musl https://git.musl-libc.org/git/musl'
+# docker run -v ./elixir-data/:/srv/elixir-data \
+--entrypoint index elixir /srv/elixir-data musl

-You can also use utils/index-all-repositories to start indexing all officially supported repositories.
+You can also use `index /srv/elixir-data --all` to start indexing all officially supported repositories.

After indexing is done, Elixir should be available under the following URL on your host:
http://172.17.0.2/musl/latest/source
@@ -332,7 +301,7 @@ If 172.17.0.2 does not answer, you can check the IP address of the container by
== Automatic repository updates

The Docker image does not automatically update repositories by itself.
-You can, for example, start `utils/update-elixir-data` in the container (or in a separate container, with Elixir data volume/directory mounted)
+You can, for example, start `index /srv/elixir-data --all` in the container (or in a separate container, with Elixir data volume/directory mounted)
from cron on the host to periodically update repositories.

== Using Docker image as a development server
75 changes: 75 additions & 0 deletions bench.py
@@ -0,0 +1,75 @@
#!/usr/bin/env python3

import datetime
import elixir.lib as lib
from elixir.lib import script, scriptLines
import elixir.data as data
from elixir.data import PathList
from find_compatible_dts import FindCompatibleDTS
import elixir.web
import threading
import jinja2
from dataclasses import dataclass

@dataclass
class Config:
    project_dir: str
    version_string: str
    repo_link: str

@dataclass
class Context:
    config: Config
    versions_cache_lock: threading.Lock
    versions_cache: dict
    jinja_env: jinja2.Environment
    dts_comp_cache: dict

@dataclass
class Request:
    context: Context

    is_raw: bool

    def get_param(self, key):
        if key == 'raw':
            return '1' if self.is_raw else '0'
        else:
            raise NotImplementedError

@dataclass
class Response:
    status: int
    location: str
    content_type: str
    text: str
    downloadable_as: str
    cache_control: tuple
    headers: dict


# /linux/v4.5-rc5/source/drivers/scsi/lpfc/lpfc_sli.c
# /linux/v5.11.20/source/drivers/net/ethernet/hisilicon/hns/hnae.c
# /linux/v3.5-rc3/source/mm/percpu-vm.c

config = Config(project_dir='/home/tleb/prog/public/elixir-data',
version_string='v1.0-fake-version',
repo_link='https://github.com/bootlin/elixir/')

ctx = Context(config=config,
versions_cache_lock=threading.Lock(),
versions_cache={},
jinja_env=elixir.web.get_jinja_env(),
dts_comp_cache={"linux": True})

req = Request(context=ctx, is_raw=False)

project = 'linux'
version = 'v3.5-rc3'

path = 'mm/percpu-vm.c'
resp = Response(0, '', '', '', '', ('',), {})

for _ in range(100):
    x = elixir.web.SourceResource()
    x.on_get(req, resp, project, version, path)
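
The loop above runs 100 requests but does not record how long they take. If per-call numbers are wanted, one possible approach is sketched below; `time_calls` and `summarize` are hypothetical helpers and are not part of this PR.

```python
import statistics
import time

def time_calls(fn, repeat=100):
    """Call fn() `repeat` times and return the per-call durations in seconds."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Return (mean, median, worst) of the recorded durations."""
    return (statistics.mean(durations),
            statistics.median(durations),
            max(durations))
```

In `bench.py` this could wrap the existing call, for example `time_calls(lambda: elixir.web.SourceResource().on_get(req, resp, project, version, path))`.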
3 changes: 2 additions & 1 deletion docker/Dockerfile
@@ -57,6 +57,7 @@ ARG ELIXIR_VERSION

ENV ELIXIR_VERSION=$ELIXIR_VERSION \
ELIXIR_ROOT=/srv/elixir-data \
-PATH="/usr/local/elixir/venv/bin:$PATH"
+PATH="/usr/local/elixir/utils:/usr/local/elixir/venv/bin:$PATH" \
+PYTHONUNBUFFERED=1

ENTRYPOINT ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]
2 changes: 1 addition & 1 deletion elixir/web.py
@@ -518,7 +518,7 @@ def get_ident_url(ident, ident_family=None):
html_code_block = format_code(fname, code)

# Replace line numbers by links to the corresponding line in the current file
-html_code_block = sub('href="#codeline-(\d+)', 'name="L\\1" id="L\\1" href="#L\\1', html_code_block)
+html_code_block = sub(r'href="#codeline-(\d+)', 'name="L\\1" id="L\\1" href="#L\\1', html_code_block)

for f in filters:
html_code_block = f.untransform_formatted_code(filter_ctx, html_code_block)
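
For context on the `r` prefix: in a non-raw string literal, `\d` is an invalid escape sequence, which Python reports as a `DeprecationWarning` since 3.6 and as a `SyntaxWarning` since 3.12. The compiled pattern is the same either way. The sketch below applies the same pattern and replacement to a made-up anchor tag; the function name `link_line_numbers` and the sample HTML are assumptions, and the real formatter output may look different.

```python
import re

# Same pattern and replacement as the changed line in elixir/web.py.
LINE_ANCHOR = re.compile(r'href="#codeline-(\d+)')

def link_line_numbers(html_code_block):
    """Rewrite #codeline-N anchors into L<N> name/id/href attributes."""
    return LINE_ANCHOR.sub('name="L\\1" id="L\\1" href="#L\\1', html_code_block)

sample = '<a href="#codeline-42">42</a>'  # made-up input for illustration
print(link_line_numbers(sample))
# prints: <a name="L42" id="L42" href="#L42">42</a>
```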
3 changes: 2 additions & 1 deletion projects/linux.sh
@@ -9,7 +9,8 @@ get_tags()
version_dir |
sed -r 's/^(pre|lia64-|)(v?[0-9\.]*)(pre|-[^pf].*?|)(alpha|-[pf].*?|)([0-9]*)(.*?)$/\2#\3@\4@\5@\60@\1.0/' |
sort -V |
-sed -r 's/^(.*?)#(.*?)@(.*?)@(.*?)@(.*?)0@(.*?)\.0$/\6\1\2\3\4\5/'
+sed -r 's/^(.*?)#(.*?)@(.*?)@(.*?)@(.*?)0@(.*?)\.0$/\6\1\2\3\4\5/' |
+head -n1000
}

list_tags_h()
5 changes: 3 additions & 2 deletions script.sh
@@ -48,7 +48,8 @@ get_tags()
version_dir |
sed 's/$/.0/' |
sort -V |
-sed 's/\.0$//'
+sed 's/\.0$//' |
+head -n1000
}

list_tags()
@@ -65,7 +66,7 @@ list_tags_h()

get_latest_tags()
{
-git tag | version_dir | grep -v '\-rc' | sort -Vr
+git tag | version_dir | grep -v '\-rc' | sort -Vr | head -n1000
}

get_type()
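
One detail of the `head -n1000` limits above: `head` keeps the first lines of whatever order precedes it. After the ascending `sort -V` in `get_tags` that means the lowest-sorted tags, and after `sort -Vr` in `get_latest_tags` it means the highest-sorted ones. The sketch below shows the effect with a limit of 2. `version_key` is a simplified stand-in for `sort -V` and does not reproduce the `-rc` handling done by the `sed` transforms; both function names are hypothetical.

```python
import re
from itertools import islice

def version_key(tag):
    """Simplified stand-in for `sort -V`: compare digit runs numerically."""
    # re.split with a capturing group alternates text and digit runs, so
    # strings and integers always line up at the same list positions.
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", tag)]

def first_n_sorted(tags, n, newest_first=False):
    """Sort tags by version, then keep the first n, like `sort | head -n`."""
    ordered = sorted(tags, key=version_key, reverse=newest_first)
    return list(islice(ordered, n))

tags = ["v5.10", "v5.9", "v6.1", "v4.19"]
print(first_n_sorted(tags, 2))                     # ['v4.19', 'v5.9']
print(first_n_sorted(tags, 2, newest_first=True))  # ['v6.1', 'v5.10']
```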
32 changes: 32 additions & 0 deletions utils/bench.sh
@@ -0,0 +1,32 @@
#!/bin/sh

oha="oha -z5s -c10 -r0 --json"
host="$1"

if test -z "$host"; then
>&2 echo "usage: $0 <host>"
exit 1
fi

run()
{
url="$1"
res="$($oha "$host$url")"

if echo "$res" | jq -e '.summary.successRate != 1.0' > /dev/null; then
>&2 echo "some requests failed, please investigate $url"
exit 1
fi

printf "%7.2f\t%s\n" "$(echo "$res" | jq -r '.summary.requestsPerSec')" "$url"
}

# Stats are something like 54% .c/.h rendering and .42% ident pages.
# Directories, autocomplete and others are insignificant.

run "/linux/v5.15.48/C/ident/ENOKEY"
run "/linux/v6.13-rc3/source/drivers/clk/clk-eyeq.c"
run "/linux/latest/C/ident/ENOKEY"
run "/linux/v6.12.6/source"
run "/linux/v5.11.20/source/drivers/net/ethernet/hisilicon/hns/hnae.c"
run "/linux/v4.5-rc5/source/drivers/scsi/lpfc/lpfc_sli.c"
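
The check done by `run()` can be expressed in Python as a rough sketch. It reads the same two fields the `jq` filters use (`.summary.successRate` and `.summary.requestsPerSec`) and formats the result like the `printf` call. The function name `summarize_run` is hypothetical and the sample JSON values are made up; real `oha --json` output contains more fields.

```python
import json

def summarize_run(oha_json, url):
    """Return a 'req/s<TAB>url' line, or raise if any request failed."""
    summary = json.loads(oha_json)["summary"]
    if summary["successRate"] != 1.0:
        raise RuntimeError(f"some requests failed, please investigate {url}")
    return "%7.2f\t%s" % (summary["requestsPerSec"], url)

# Made-up sample resembling the two fields the script reads.
sample = json.dumps({"summary": {"successRate": 1.0, "requestsPerSec": 123.456}})
print(summarize_run(sample, "/linux/latest/C/ident/ENOKEY"))
```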
24 changes: 0 additions & 24 deletions utils/common.sh

This file was deleted.
