# Key Resources and Configurations for Open Source Large Language Models (LLMs)

Contributed by Xuekai Zhu, Kaiyan Zhang, Jushi Kai, Shixiang Song

We present a comprehensive table outlining the key resources and configurations reported for open-source LLMs. We hope this table helps you quickly check whether your available resources can support pre-training an LLM at these scales; a rough compute estimate is sketched after the table.


| Release Date | Model | Affiliation | Size | Source Data Size (Tokens) | Training Tokens | Learning Rate | Batch Size (tokens) | Architecture | Context Length | Vocabulary Size | Tokenizer | Precision | GPU Hours | Infrastructure | Optimizer | Training Layout | Language |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024/03 | Grok-1 | xAI | 314B | - | - | - | - | - | - | 131,072 | grok-1-tokenizer | - | - | - | - | - | multilingual |
| 2024/02 | StableLM 2 | Stability AI Language Team | 1.6B | - | 2T | 1e-3 (max) | 8,388,608 | decoder-only | 4096 | 100,352 | Arcade100k | BF16/FP32 (mixed precision) | 92k | 512 NVIDIA A100 (40GB HBM2) GPUs | AdamW (0.9/0.95) | FlashAttention-2, ZeRO stage 1 | multilingual |
| 2024/02 | Gemma | Gemma Team, Google DeepMind | 2B / 7B | - | 2T / 6T | - | - | decoder-only | 8192 | 256k | Gemini tokenizer | - | - | TPU | - | similar to ZeRO-3 | English |
| 2024/02 | OLMo | Allen Institute for Artificial Intelligence | 1B / 7B | 3T | 2T / 2.46T | 4e-4 / 3e-4 | ~4M (2048 * 2048) | decoder-only | 2048 | 50,280 | GPT-NeoX-20B | BF16 (mixed precision) | - | 216 NVIDIA A100 GPUs | AdamW | ZeRO optimizer strategy, PyTorch FSDP framework | English |
| 2024/01 | MiniCPM | Modelbest Inc., THUNLP | 2B | - | 2T (1+1) | 1e-2 (max) | ~4M | decoder-only | - | 122,753 | SentencePiece (BPE) | BF16 | - | - | - | Warmup-Stable-Decay (WSD) lr-scheduler (newly proposed, in place of cosine) | English / Chinese |
| 2024/01 | DeepSeek | DeepSeek-AI | 7B | - | 2T | 4.2e-4 (0.1 wd) | 9,437,184 (2304 * 4096) | decoder-only | 4096 | 102,400 | Byte-level Byte-Pair Encoding (BBPE) | BF16/FP32 (mixed precision) | - | - | AdamW | FlashAttention, ZeRO-1 | English |
| 2023/12 | phi-2 | Microsoft | 2.7B | 250B | 1.4T | - | - | decoder-only | 2048 | - | - | - | 336 (14 days) | 96 A100 GPUs | - | - | English |
| 2023/10 | Mistral | Mistral AI | 7B | - | ~8T | - | - | decoder-only | 8192 | 32,000 | - | - | - | - | - | sliding window attention, grouped-query attention | English, code |
| 2023/09 | Qwen | Qwen Team, Alibaba Group | 1.8B / 7B / 14B | 3T | 2.2T / 2.4T / 3.0T | 3e-4 | ~4M | decoder-only | 2048 | 152K | Qwen tokenizer | BF16 | - | - | AdamW | Flash Attention, cosine lr-scheduler | multilingual |
| 2023/09 | phi-1.5 | Microsoft | 1.3B | 30B | 150B | 2e-4 (0.1 wd) | 4,194,304 (2048 * 2048) | decoder-only | 2048 | - | codegen-mono | FP16 | 192 (8 days) | 32 A100-40G | Adam | ZeRO-2 | English |
| 2023/07 | LLaMA-2 | Meta | 7B / 13B / 34B / 70B / chat | - | 2.0T | 3e-4 (7B, 13B), 1.5e-4 (34B, 70B) | 4M | decoder-only | 4096 | 32k | SentencePiece (BPE) | - | 184k (7B) | A100-80GB | AdamW | cosine lr-scheduler, grouped-query attention, Ghost Attention | English |
| 2023/06 | phi-1 | Microsoft | 1.3B | 7B | 50B | 1e-3 (0.1 wd) | 2,097,152 (1024 * 2048) | decoder-only | 2048 | - | codegen-mono | FP16 | 96 (4 days) | 8 A100 | Adam | Flash Attention | English |
| 2023/04 | Pythia | EleutherAI | 14M (not in report) / 70M / 160M / 410M / 1B / 1.4B / 2.8B / 6.9B / 12B | 300B | 300B / 203B (deduplicated) | 1e-3 (70M) / 6e-4 (160M) / 3e-4 (410M, 1B) / 2e-4 (1.4B) / 1.6e-4 (2.8B) / 1.2e-4 (6.9B, 12B) | 2M | decoder-only | 2048 | 50,276 | GPTNeoXTokenizer | FP16 | 510 (70M) / 1,030 (160M) / 2,540 (410M) / 4,830 (1B) / 7,120 (1.4B) / 14,240 (2.8B) / 33,500 (6.9B) / 72,300 (12B) | A100-40GB | Adam | ZeRO-1, cosine lr-scheduler | English |
| 2023/02 | LLaMA | Meta | 7B / 13B / 33B / 65B | - | 1.0T (7B, 13B), 1.4T (33B, 65B) | 3e-4 (7B, 13B), 1.5e-4 (33B, 65B) | 4M | decoder-only | 2048 | 32k | SentencePiece (BPE) | - | 82k (7B) | A100-80GB | AdamW | cosine lr-scheduler | English |

"-" indicates not mentioned in their technical reports.


The figure below illustrates the combinations of model size and training tokens across these LLMs. Most ~2B models are pre-trained on approximately 2 trillion tokens.

(Figure: model size vs. pre-training tokens for open-source LLMs)
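
For reference, a minimal matplotlib sketch that reproduces the spirit of this figure from a few rows of the table is shown below; the selection of points and the output filename are illustrative choices, not part of the original figure.

```python
# A minimal sketch (not the original figure): plot model size against
# training tokens for a few entries taken from the table above.
import matplotlib.pyplot as plt

# (model, parameters in billions, training tokens in trillions), from the table
models = [
    ("StableLM 2 1.6B", 1.6, 2.0),
    ("Gemma 2B", 2.0, 2.0),
    ("OLMo 7B", 7.0, 2.46),
    ("LLaMA-2 7B", 7.0, 2.0),
    ("Gemma 7B", 7.0, 6.0),
    ("Qwen 14B", 14.0, 3.0),
]

sizes = [size for _, size, _ in models]
tokens = [tok for _, _, tok in models]

plt.figure(figsize=(6, 4))
plt.scatter(sizes, tokens)
for name, x, y in models:
    plt.annotate(name, (x, y), fontsize=8, xytext=(3, 3), textcoords="offset points")
plt.xlabel("Model size (B parameters)")
plt.ylabel("Training tokens (T)")
plt.title("Model size vs. training tokens")
plt.tight_layout()
plt.savefig("key_resource_sketch.png")
```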

## Reference
