Understanding the basics #286
-
First of all, many thanks to Georgi for his great work, and thanks also to all the people who are contributing. For those of us who are just starting out (at least I managed to build it and test it), I think it would be very useful to lay out some basic ideas about how this works, so more people can understand and help. As far as I understand it, some people related to Meta generated the "weight" files that are being used here. Some questions:
Surely there are many more questions that may arise. If we provide an easy-to-understand foundation, surely more people will join. Thank you to all who share some of their time to help others understand the basics.
-
First is terminology. "Weights" is not a good word because weights are only one class of parameters; there are also biases, embeddings and other things. Use the word "model": a thing made of parameters and the formulas to evaluate them. The parameters of the model are distributed separately; they are the big file you download. The formulas could be printed on toilet paper and evaluated with matchsticks; there is nothing special about formulas expressed in Python that can't be replicated elsewhere.

The parameters were initialized randomly and then tuned by a process called training: researchers downloaded a large portion of the internet and tasked a machine with predicting the next word given some random excerpt from the downloaded corpus. Doing that alone today at such a scale is practically impossible.

The model architecture is the Transformer: a machine with a content-addressable memory tape of at most 2048 cells and many seeking heads that query that tape. When you say "it", trained attention heads seek out what you mean by "it" in the memory to make a relevant prediction.

One "clock cycle" consumes one text token as an input observation, fills one empty memory cell, and outputs a prediction about the next possible words. Generation is performed by roulette-rolling one of the predicted words, appending it to the input, and doing another cycle to update the prediction; repeat until enough has been generated. Everything else, such as a chat that takes turns, is the responsibility of the UI, for example stopping generation when "You:" is output. There is no distinction between user-generated and machine-generated text.
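To make that generation loop concrete, here is a minimal C++ sketch. Everything in it is a toy assumption: `predict_next` is a hypothetical stand-in for a real transformer forward pass, and the five-word vocabulary and fixed probabilities are invented for illustration. Only the roulette-roll, append, repeat structure matches the description above.

```cpp
// Toy sketch of the autoregressive loop: predict, roulette-roll a
// token, append it to the context, repeat.
#include <cstdio>
#include <random>
#include <string>
#include <vector>

// Invented five-word vocabulary standing in for a real token set.
static const std::vector<std::string> vocab = {"the", "cat", "sat", ".", "You:"};

// Hypothetical stand-in for the model: given the tokens so far, return
// a probability for every vocabulary entry. A real transformer computes
// this with attention over up to 2048 context cells; here it is fixed.
std::vector<double> predict_next(const std::vector<int>& /*context*/) {
    return {0.3, 0.25, 0.25, 0.1, 0.1};
}

int main() {
    std::mt19937 rng(42);
    std::vector<int> context = {0}; // start the tape with "the"
    for (int step = 0; step < 10; ++step) {
        std::vector<double> probs = predict_next(context);
        // "Roulette-roll" one token in proportion to its probability.
        std::discrete_distribution<int> roll(probs.begin(), probs.end());
        int tok = roll(rng);
        // UI-level stop condition: the model itself never distinguishes
        // user-written text from its own output.
        if (vocab[tok] == "You:") break;
        context.push_back(tok); // append and do another cycle
        std::printf("%s ", vocab[tok].c_str());
    }
    std::printf("\n");
    return 0;
}
```

Note how the stop condition lives outside the prediction step, matching the point that chat turn-taking is purely a UI concern.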
-
Also, for any newcomers stumbling into AI, it's important to clear up the misguided concept of "evil" or "biased" natural-language AIs. That is simply not how it works, but it does sell headlines. I'll quote my post from a fork of this repo, where the answer is given in the context of a pretrained LLaMA model with Stanford-Alpaca/Alpaca-LoRa fine-tuning, but it really applies to any natural-language AI model.

Q: Why is the "AI" evil/biased/a liar/whatever other anthropomorphic trait?
A:
Originally posted by @anzz1 in antimatter15#34 (comment)
-
Here is a sort of basic idea of how a machine learning model like LLaMA works. You take the input (the text from the prompt) and convert it into a bunch of numbers (called "tokens"). These numbers then have a bunch of math done on them. This math is defined in terms of matrices (think of them as grids of numbers): you can do multiplication, addition, etc., and there are specific rules for how to do this math.

The "model", in this case LLaMA, defines how many matrices to use and how to multiply/add/whatever them with the input to get the output. This structure, the list of operations, is openly available. The actual numbers that go into these matrices, which are then used in the mathematical operations, are called the "weights" (but there are also the "biases", which are another set of numbers). Facebook "trained" these weights and biases using various methods until the output from using them was close enough to some expected output. These numbers are what got leaked.

These numbers were stored in the PyTorch format. This repo has a Python script that reads them and converts them into a different format that can be read by the code in this repository. The C++ code simply implements all the same math that the Python code was doing. Hope that helps!
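As a concrete illustration of "math defined in terms of matrices", here is a tiny self-contained C++ sketch of one matrix-times-input-plus-bias step. The shapes and numbers are made up for the example; a real LLaMA forward pass chains many such operations (plus attention and nonlinearities), with the matrices loaded from the downloaded parameter file instead of hard-coded.

```cpp
// One "layer" of the kind of math described above: multiply the input
// by a weight matrix and add a bias vector (y = W * x + b).
#include <cstdio>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // a grid of numbers, stored row by row

Vec layer(const Mat& W, const Vec& b, const Vec& x) {
    Vec y(W.size(), 0.0f);
    for (size_t i = 0; i < W.size(); ++i) {
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += W[i][j] * x[j]; // the "weights" scale the input
        y[i] += b[i];               // the "biases" shift the result
    }
    return y;
}

int main() {
    // Pretend these numbers came from a trained model file.
    Mat W = {{0.5f, -1.0f}, {2.0f, 0.25f}};
    Vec b = {0.1f, -0.2f};
    Vec x = {1.0f, 3.0f}; // the prompt, already turned into numbers
    Vec y = layer(W, b, x);
    std::printf("%f %f\n", y[0], y[1]);
    return 0;
}
```

Training, in these terms, is just the process of nudging the numbers inside W and b until outputs like y are close enough to the expected ones.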
-
Look at the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT: