An attempt at training TinyLlama for text retrieval embeddings.
Mistral fine-tuned for embeddings:
- Tweet linking to a recreation of the synthetic data
- Paper training Mistral on synthetic data
- Dataset
- Additional dataset
- For that model, the original training recipe comes from https://arxiv.org/pdf/2310.08319.pdf (RepLLaMA), with training code at https://github.com/texttron/tevatron/tree/main/examples/repllama
- TinyLlama
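The RepLLaMA recipe (arXiv 2310.08319, linked above) turns a decoder-only LLM into a bi-encoder. It appends an end-of-sequence token to each query or passage, uses that token's final-layer hidden state as the embedding, and trains with a contrastive loss that scores a positive passage against negatives. Below is a minimal, stdlib-only sketch of those two pieces, with plain lists standing in for model outputs. The function names, right padding, L2 normalization and the temperature of 0.01 are illustrative assumptions, not code from the paper or from Tevatron. The sketch also leaves out in-batch negatives.

```python
import math
from typing import List, Sequence

Vector = List[float]


def last_token_pool(hidden_states: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Hypothetical helper: return the hidden state of the last non-padding token.

    Assumes right padding and that the last real token is the appended EOS token.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return list(hidden_states[last])


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so the dot product becomes cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(query: Vector, passages: Sequence[Vector], temperature: float = 0.01) -> float:
    """Hypothetical contrastive loss: passages[0] is the positive, the rest are negatives.

    Computes -log softmax(scores)[0], where each score is cosine similarity / temperature.
    """
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(p))) / temperature for p in passages]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]


# Usage: three token positions, the last one is padding.
hidden = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
mask = [1, 1, 0]
query_embedding = last_token_pool(hidden, mask)  # hidden state at position 1
loss = info_nce_loss(query_embedding, [[0.6, 0.8], [-0.8, 0.6]])
```

With a small temperature, the loss is near zero when the positive passage points the same way as the query and the negative is orthogonal. It grows large when the two are swapped.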