
Segmentation fault when converting embeddings into tensor #1937

Open
devashishraj opened this issue Feb 17, 2025 · 0 comments
Trying to convert an embedding into a tensor leads to a segmentation fault:


System Info

  • Physical (or virtual) hardware you are using, e.g. for Linux:
> sysctl -a | grep machdep.cpu
machdep.cpu.cores_per_package: 10
machdep.cpu.core_count: 10
machdep.cpu.logical_per_package: 10
machdep.cpu.thread_count: 10
machdep.cpu.brand_string: Apple M2 Pro

  • Operating System, e.g. for Linux:

macOS Sequoia 15.3.1 (24D70)

  • SDK version, e.g. for Linux:
Python 3.13.2
GNU Make 3.81
Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Code

import logging
import torch
from llama_cpp import Llama
from rich.console import Console

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
console = Console(width=120)

embpath = "all-MiniLM-L6-v2-ggml-model-f16.gguf"
embedModel = Llama(model_path=embpath, embedding=True, verbose=True)

# test embedding model
query = ["Test sentence"]
try:
    embeds = embedModel.embed(input=query)
    print(embeds)
    genAns_tensor = torch.tensor(embeds)
    
    del embedModel
except Exception as e:
    print("Embedding error:", e)

The code works if it only creates embeddings (i.e. if I remove the tensor-conversion part and just print the embedding).
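A possible workaround sketch (an assumption on my part, not a confirmed fix for this crash): copy the embeddings into a NumPy array that Python owns before building the tensor, so torch never iterates over memory that llama.cpp might reuse or free. The `embeds` values below are hypothetical stand-ins for the output of `embedModel.embed(...)`:

```python
import numpy as np
import torch

# Stand-in for embedModel.embed(input=query); all-MiniLM-L6-v2 produces
# one 384-dimensional vector per input sentence (values here are fake).
embeds = [[0.1] * 384]

# Copy into a float32 NumPy array first. np.array() makes a fresh buffer
# owned by Python, decoupled from any llama.cpp internals, and
# torch.from_numpy() then wraps that buffer without another copy.
arr = np.array(embeds, dtype=np.float32)
tensor = torch.from_numpy(arr)

print(tensor.shape)  # torch.Size([1, 384])
```

If the crash persists even with an explicit copy, the problem likely lies inside `embed()`/`decode()` itself rather than in the tensor conversion.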

Logs

llama_kv_cache_init: kv_size = 512, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 6, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 1: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 2: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 3: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 4: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 5: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init:      Metal KV buffer size =     4.50 MiB
llama_init_from_model: KV self size  =    4.50 MiB, K (f16):    2.25 MiB, V (f16):    2.25 MiB
llama_init_from_model:        CPU  output buffer size =     0.00 MiB
llama_init_from_model:      Metal compute buffer size =    17.00 MiB
llama_init_from_model:        CPU compute buffer size =     3.50 MiB
llama_init_from_model: graph nodes  = 221
llama_init_from_model: graph splits = 2
Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | MATMUL_INT8 = 1 | DOTPROD = 1 | MATMUL_INT8 = 1 | ACCELERATE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
Model metadata: {'tokenizer.ggml.cls_token_id': '101', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.seperator_token_id': '102', 'tokenizer.ggml.unknown_token_id': '100', 'tokenizer.ggml.token_type_count': '2', 'general.file_type': '1', 'tokenizer.ggml.eos_token_id': '102', 'bert.context_length': '512', 'bert.pooling_type': '1', 'tokenizer.ggml.bos_token_id': '101', 'bert.attention.head_count': '12', 'bert.feed_forward_length': '1536', 'tokenizer.ggml.mask_token_id': '103', 'tokenizer.ggml.model': 'bert', 'bert.attention.causal': 'false', 'general.name': 'all-MiniLM-L6-v2', 'bert.block_count': '6', 'bert.attention.layer_norm_epsilon': '0.000000', 'bert.embedding_length': '384', 'general.architecture': 'bert'}
Using fallback chat format: llama-2
Fatal Python error: Segmentation fault

Thread 0x0000000204908840 (most recent call first):
  File "/Users/devashishraj/Desktop/localRAG/lrag/lib/python3.13/site-packages/llama_cpp/_internals.py", line 306 in decode
  File "/Users/devashishraj/Desktop/localRAG/lrag/lib/python3.13/site-packages/llama_cpp/llama.py", line
[1]    63839 segmentation fault  PYTHONFAULTHANDLER=1 python3 -X dev embeddingTest.py
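The partial traceback above was produced by running with `PYTHONFAULTHANDLER=1`. The same handler can be enabled in-process via the standard-library `faulthandler` module, which on a native crash dumps every thread's Python stack to stderr (shown here on a harmless program, just to illustrate the setup):

```python
import faulthandler

# Equivalent to setting PYTHONFAULTHANDLER=1 before launching Python:
# on SIGSEGV and similar fatal signals, the handler prints the Python
# traceback of every thread to stderr before the process dies.
faulthandler.enable()

print(faulthandler.is_enabled())  # True
```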

venv package list

pip list
Package                  Version
------------------------ -----------
aiohappyeyeballs         2.4.4
aiohttp                  3.11.10
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.7.0
attrs                    24.3.0
beautifulsoup4           4.12.3
certifi                  2024.12.14
charset-normalizer       3.4.0
dataclasses-json         0.6.7
diskcache                5.6.3
faiss-cpu                1.9.0.post1
filelock                 3.17.0
frozenlist               1.5.0
fsspec                   2025.2.0
gpt4all                  2.8.2
h11                      0.14.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
huggingface-hub          0.28.1
idna                     3.10
Jinja2                   3.1.5
joblib                   1.4.2
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.3.12
langchain-community      0.3.12
langchain-core           0.3.33
langchain-ollama         0.2.3
langchain-text-splitters 0.3.3
langsmith                0.2.3
llama_cpp_python         0.3.7
markdown-it-py           3.0.0
MarkupSafe               3.0.2
marshmallow              3.23.1
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.1.0
mypy-extensions          1.0.0
networkx                 3.4.2
numpy                    2.2.3
ollama                   0.4.7
orjson                   3.10.12
packaging                24.2
pillow                   11.1.0
pip                      25.0.1
propcache                0.2.1
pydantic                 2.10.3
pydantic_core            2.27.1
pydantic-settings        2.7.0
Pygments                 2.18.0
PyMuPDF                  1.25.1
python-dotenv            1.0.1
PyYAML                   6.0.2
regex                    2024.11.6
requests                 2.32.3
requests-toolbelt        1.0.0
rich                     13.9.4
safetensors              0.5.2
scikit-learn             1.6.1
scipy                    1.15.1
sentence-transformers    3.4.1
setuptools               75.8.0
sniffio                  1.3.1
soupsieve                2.6
SQLAlchemy               2.0.36
sympy                    1.13.1
tenacity                 9.0.0
threadpoolctl            3.5.0
tiktoken                 0.8.0
tokenizers               0.21.0
torch                    2.6.0
tqdm                     4.67.1
transformers             4.48.3
typing_extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.2.3
yarl                     1.18.3