Mistral Fine-Tune: Efficient AI Model Optimization Tool

By Mark Howell · 25 May 2024 · 3 min read

Mistral-finetune is a lightweight codebase that enables memory-efficient and performant fine-tuning of Mistral's models. It is based on the LoRA training paradigm: most of the model's weights are frozen and only an additional 1-2% of weights are trained as low-rank matrix perturbations, which keeps memory and compute requirements low. An A100 or H100 GPU is recommended for optimal efficiency, although a single GPU is sufficient for smaller models such as the 7B. The codebase is optimized for multi-GPU, single-node training setups.
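To make the LoRA idea concrete, the sketch below wraps a frozen linear layer with a trainable low-rank update in plain PyTorch. This is a simplified illustration of the technique, not the actual mistral-finetune implementation; the rank and scaling values are arbitrary choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank perturbation: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the original weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Only these low-rank factors (a small fraction of the parameters) are trained.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the learned low-rank update.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)


layer = LoRALinear(nn.Linear(1024, 1024))
print(layer(torch.randn(2, 1024)).shape)  # torch.Size([2, 1024])
```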

Installation Steps

To get started with Mistral LoRA fine-tuning, follow these steps:

  1. Download a Model
    - It is recommended to fine-tune one of the official Mistral models, which are available for download.
    - Important: For 8x7B Base V1 and 8x7B Instruct V1, use the v3 tokenizer and extend the vocabulary size to 32768 prior to fine-tuning.

  2. Prepare the Dataset
    - Ensure your training data is in JSONL format. You can build two types of data files:
    + Pretrain data: plain text stored in the "text" key.
    + Instruct data: instruction-following conversations (a sketch of both record shapes follows this list).
    - Verify your dataset using the `./utils/validate_data` script to ensure correct formatting and estimate training time.
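As an illustration, the snippet below writes one record of each type with Python's json module. The pretrain record stores raw text under a "text" key as described above; the instruct record uses a messages-based chat layout, which is our assumption here, so check the repository README for the exact schema your version expects.

```python
import json

# Pretrain data: plain text under a "text" key, one JSON object per line.
pretrain_record = {"text": "Machine learning is a field of artificial intelligence..."}

# Instruct data: a conversation under a "messages" key (assumed layout; verify
# against the repository README).
instruct_record = {
    "messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "LoRA trains small low-rank matrices on top of frozen weights."},
    ]
}

with open("pretrain.jsonl", "w") as f:
    f.write(json.dumps(pretrain_record) + "\n")

with open("instruct.jsonl", "w") as f:
    f.write(json.dumps(instruct_record) + "\n")
```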

Example: Instruction Following

To train a model for instruction following:

  • Create a data folder and navigate to it.

  • Load the data into a Pandas DataFrame (`pip install pandas pyarrow`), split it into training and evaluation sets, and save each split as a JSONL file (a sketch follows this list).

  • Modify `example/7B.yaml` to include the paths to your training and evaluation data.
    Validate the dataset referenced in the training YAML to confirm it is correctly formatted and to estimate training time; if errors are reported, the `./utils/reformat_data.py` script can correct the data formatting.
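A minimal sketch of that data-preparation step is shown below. The input file name and the 95/5 split are illustrative assumptions; the point is simply to load a chat dataset with pandas, split it into training and evaluation sets, and write each split as a JSONL file that the YAML config can point to.

```python
import pandas as pd

# Load a chat dataset (file name is illustrative) into a DataFrame; requires pyarrow.
df = pd.read_parquet("my_chat_dataset.parquet")

# Split into training and evaluation sets (95/5 is an arbitrary choice).
df_train = df.sample(frac=0.95, random_state=200)
df_eval = df.drop(df_train.index)

# Write each split as JSONL, one record per line, for the training config to reference.
df_train.to_json("data/chunk_train.jsonl", orient="records", lines=True)
df_eval.to_json("data/chunk_eval.jsonl", orient="records", lines=True)
```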

Advanced Use Case: Function Calling

For fine-tuning a model on function calling:

  • Format your data as instruct data, as explained above (a hedged example record follows this list).

  • Reformat with `./utils/reformat_data_glaive.py` for function calling.

  • Set `data.instruct_data` and `data.eval_instruct_data` in `example/7B.yaml` to the reformatted files, then validate the dataset as before.
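For orientation, a function-calling record typically augments the chat messages with tool definitions and tool calls. The sketch below is a rough illustration of that shape, not the exact schema mistral-finetune expects; field names such as "tools" and "tool_calls" follow common chat-API conventions and should be checked against the repository README.

```python
import json

# Illustrative function-calling record (field names are assumptions following
# common chat-API conventions; verify against the repository README).
record = {
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_0",
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"},
                }
            ],
        },
        {"role": "tool", "content": "18C and sunny", "tool_call_id": "call_0"},
        {"role": "assistant", "content": "It is currently 18C and sunny in Paris."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
            },
        }
    ],
}

with open("function_calling.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```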

Starting Training

After validating the dataset:

  • Customize the training configuration (`example/7B.yaml`), including parameters such as the learning rate and weight decay (a hedged configuration sketch follows this list).

  • Inference: Once the model is trained, use `mistral-inference` for testing.
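The fragment below sketches what the relevant parts of `example/7B.yaml` might look like after customization. The exact keys and defaults vary by repository version, so treat the names and values here as assumptions to verify against the example file shipped with the codebase.

```yaml
# Hedged sketch of a customized example/7B.yaml; values are illustrative only.
data:
  instruct_data: "data/chunk_train.jsonl"       # training data
  eval_instruct_data: "data/chunk_eval.jsonl"   # evaluation data

# Optimizer settings mentioned in the text.
optim:
  lr: 1.0e-4
  weight_decay: 0.1
```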

Model Extension

Only Mistral models that use the v3 tokenizer (vocabulary size of 32768) are compatible for fine-tuning. Older model versions can be extended to the new vocabulary size using the provided script, and training then proceeds from the new checkpoint.

Remember these 3 key ideas for your startup:

  1. Efficient Resource Utilization: Utilize LoRA's approach of freezing most model weights and training additional low-rank perturbations to minimize resource consumption.

  2. Dataset Preparation: Ensure your datasets are correctly formatted using the provided tools. Correct formatting is crucial for effective training, and scripts like `reformat_data.py` can streamline the process.

  3. Model Customization and Extension: Customize your training configuration to suit your specific use case and extend older models to match the new vocabulary size for enhanced performance.

Edworking is the best and smartest decision for SMEs and startups to be more productive. Edworking is a FREE superapp of productivity that includes all you need for work powered by AI in the same superapp, connecting Task Management, Docs, Chat, Videocall, and File Management. Save money today by not paying for Slack, Trello, Dropbox, Zoom, and Notion.

Conclusion

By leveraging these advanced techniques and tools, startups can achieve efficient and effective model training. The provided scripts and configurations help streamline the process, ensuring that even with limited resources, performance is maximized.

Explore more: Mistral-finetune GitHub Repository
For more details, see the original source.

About the Author: Mark Howell Linkedin

Mark Howell is a talented content writer for Edworking's blog, consistently producing high-quality articles on a daily basis. As a Sales Representative, he brings a unique perspective to his writing, providing valuable insights and actionable advice for readers in the education industry. With a keen eye for detail and a passion for sharing knowledge, Mark is an indispensable member of the Edworking team. His expertise in task management ensures that he is always on top of his assignments and meets strict deadlines. Furthermore, Mark's skills in project management enable him to collaborate effectively with colleagues, contributing to the team's overall success and growth. As a reliable and diligent professional, Mark Howell continues to elevate Edworking's blog and brand with his well-researched and engaging content.
