Understanding the basics #286
-
First of all, many thanks to Georgi for his great work, and thanks also to all the people who are contributing. For those of us who are just starting out (at least I managed to build it and test it), I think it would be very useful to lay out some basic ideas about how this works, so more people can understand and help. As far as I understand it, some people related to Meta generated the "weight" files that are being used here. Some questions:
Surely there are many more questions that may arise. If we provide an easy-to-understand foundation, surely more people will join. Thank you to all who share some of their time to help others understand the basics.
-
First is terminology. "Weights" is not a good word because weights are only one class of parameters; there are also biases, embeddings and other things. Use the word "model": a thing made of parameters and the formulas to evaluate them. The parameters of the model are distributed separately; they are the big file you download. The formulas could be printed on toilet paper and evaluated with matchsticks; there is nothing special about formulas expressed in Python that can't be replicated elsewhere.

The parameters were initialized randomly and then tuned by a process called training: researchers downloaded a large portion of the internet and tasked a machine with predicting the next word given some random excerpt from the downloaded corpus. Doing that alone today at such a scale is practically impossible.

The model architecture is the Transformer: a machine with a content-addressable memory tape of at most 2048 cells and many seeking heads that query that tape. When you say "it", trained attention heads seek out what you mean by "it" in the memory to make a relevant prediction.

One "clock cycle" consumes one text token as an input observation, fills one empty memory cell, and outputs a prediction about the next possible words. Generation is performed by roulette-rolling one of the predicted words, appending it to the input, and doing another cycle to update the prediction; repeat until enough has been generated. Everything else, such as a chat that takes turns, is the responsibility of the UI, for example stopping generation when "You:" is output. There is no distinction between user-generated and machine-generated text.
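To make that generation loop concrete, here is a minimal C++ sketch. Everything in it is a toy assumption: `predict_next` is a hypothetical stand-in for a real transformer forward pass, and the five-word vocabulary and fixed probabilities are invented for illustration. Only the roulette-roll, append, repeat structure matches the description above.

```cpp
// Toy sketch of the autoregressive loop: predict, roulette-roll a
// token, append it to the context, repeat.
#include <cstdio>
#include <random>
#include <string>
#include <vector>

// Invented five-word vocabulary standing in for a real token set.
static const std::vector<std::string> vocab = {"the", "cat", "sat", ".", "You:"};

// Hypothetical stand-in for the model: given the tokens so far, return
// a probability for every vocabulary entry. A real transformer computes
// this with attention over up to 2048 context cells; here it is fixed.
std::vector<double> predict_next(const std::vector<int>& /*context*/) {
    return {0.3, 0.25, 0.25, 0.1, 0.1};
}

int main() {
    std::mt19937 rng(42);
    std::vector<int> context = {0}; // start the tape with "the"
    for (int step = 0; step < 10; ++step) {
        std::vector<double> probs = predict_next(context);
        // "Roulette-roll" one token in proportion to its probability.
        std::discrete_distribution<int> roll(probs.begin(), probs.end());
        int tok = roll(rng);
        // UI-level stop condition: the model itself never distinguishes
        // user-written text from its own output.
        if (vocab[tok] == "You:") break;
        context.push_back(tok); // append and do another cycle
        std::printf("%s ", vocab[tok].c_str());
    }
    std::printf("\n");
    return 0;
}
```

Note how the stop condition lives outside the prediction step, matching the point that chat turn-taking is purely a UI concern.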
-
Also, for any newcomers stumbling into AI, it's important to clear up the misguided concept of "evil" or "biased" natural-language AIs. That is simply not how it works, but it does sell headlines. I'll quote my post from a fork of this repo, where the answer is given in the context of a pretrained LLaMA model with Stanford-Alpaca/Alpaca-LoRa fine-tuning, but it really applies to any natural-language AI model.

Q: Why is the "AI" evil/biased/a liar/whatever other anthropomorphic trait?
A:
Originally posted by @anzz1 in antimatter15#34 (comment)
-
Here is a sort of basic idea of how a machine learning model like LLaMA works. You take the input (the text from the prompt) and convert it into a bunch of numbers (called "tokens"). These numbers then have a bunch of math done on them. This math is defined in terms of matrices (think of them as grids of numbers): you can do multiplication, addition, etc., and there are specific rules for how to do this math.

The "model", in this case LLaMA, defines how many matrices to use and how to multiply/add/whatever them with the input to get the output. This structure, the list of operations, is openly available. The actual numbers that go into these matrices, which are then used in the mathematical operations, are called the "weights" (but there are also the "biases", which are another set of numbers). Facebook "trained" these weights and biases using various methods until the output from using them was close enough to some expected output. These numbers are what got leaked.

These numbers were stored in the PyTorch format. This repo has a Python script that reads them and converts them into a different format that can be read by the code in this repository. The C++ code simply implements all the same math that the Python code was doing. Hope that helps!
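As a concrete illustration of "math defined in terms of matrices", here is a tiny self-contained C++ sketch of one matrix-times-input-plus-bias step. The shapes and numbers are made up for the example; a real LLaMA forward pass chains many such operations (plus attention and nonlinearities), with the matrices loaded from the downloaded parameter file instead of hard-coded.

```cpp
// One "layer" of the kind of math described above: multiply the input
// by a weight matrix and add a bias vector (y = W * x + b).
#include <cstdio>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // a grid of numbers, stored row by row

Vec layer(const Mat& W, const Vec& b, const Vec& x) {
    Vec y(W.size(), 0.0f);
    for (size_t i = 0; i < W.size(); ++i) {
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += W[i][j] * x[j]; // the "weights" scale the input
        y[i] += b[i];               // the "biases" shift the result
    }
    return y;
}

int main() {
    // Pretend these numbers came from a trained model file.
    Mat W = {{0.5f, -1.0f}, {2.0f, 0.25f}};
    Vec b = {0.1f, -0.2f};
    Vec x = {1.0f, 3.0f}; // the prompt, already turned into numbers
    Vec y = layer(W, b, x);
    std::printf("%f %f\n", y[0], y[1]);
    return 0;
}
```

Training, in these terms, is just the process of nudging the numbers inside W and b until outputs like y are close enough to the expected ones.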
-
Look at the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT: