This Space gives you an easy way to get started generating synthetic datasets, using Spaces compute to host open LLMs. It comes with a ready-to-go environment and a series of example notebooks that show how to generate synthetic datasets. You can read more about the aims of the Space in this blog post.
Currently this Space has notebooks covering the following topics:
A set of notebooks covering the steps for creating a synthetic dataset for fine-tuning a sentence similarity model. These notebooks cover:
To use this Space, you should duplicate it. To make sure your work is saved, enable persistent storage for your Space. To start, you may want to use a smaller GPU like the T4 and switch to a bigger GPU when you want to run larger LLMs or generate more data. As a reminder, you can preview the notebooks in the Space without running them. You can find the Jupyter notebooks in the notebooks folder.
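If you prefer to script the setup rather than click through the UI, the steps above can be sketched with the `huggingface_hub` client. This is a minimal sketch, not the Space authors' method: the helper name `setup_space` and both Space IDs are placeholders, and the `"t4-small"` hardware and `"small"` storage tiers are assumed defaults that you should adjust to your needs.

```python
def setup_space(api, source_id, target_id, hardware="t4-small", storage="small"):
    """Duplicate a Space, then request persistent storage and a GPU for the copy.

    `api` is expected to be a `huggingface_hub.HfApi` instance (or any object
    exposing the same three methods). `setup_space` is a hypothetical helper
    written for this example.
    """
    # Copy the template Space into your own namespace.
    api.duplicate_space(from_id=source_id, to_id=target_id, private=True, exist_ok=True)
    # Persistent storage keeps notebooks and generated data across restarts.
    api.request_space_storage(repo_id=target_id, storage=storage)
    # Start on a small GPU; request a larger one later for bigger LLMs.
    api.request_space_hardware(repo_id=target_id, hardware=hardware)
    return target_id


if __name__ == "__main__":
    from huggingface_hub import HfApi

    # Placeholder IDs: replace with this Space's ID and your own target ID.
    setup_space(HfApi(), "<owner>/<this-space>", "<your-username>/<your-space>")
```

To move to a bigger GPU later, you could call `request_space_hardware` again on the same Space with a different hardware tier. GPU hardware and persistent storage are paid features, so check the current pricing before requesting them.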
This template was created by camenduru and nateraw, with contributions from osanseviero and azzr.