Image model fine-tuning

Fine-tune FLUX and Stable Diffusion XL on your own images.

Soramai fine-tunes LoRA adapters for FLUX and SDXL on your image and caption pairs. Train style, character, or product LoRAs on managed GPU pods, then serve them from autoscaling inference endpoints.

Supported base models

The catalog is curated for image quality and LoRA stability. New checkpoints are added once they pass internal evaluations.

FLUX.1 dev

High quality, 12B params, default for product photography and characters.

FLUX.1 schnell

Faster, lower cost. Good for style and concept LoRAs.

Stable Diffusion XL

Battle-tested base. Strong ecosystem and refiner support.

SDXL Turbo

Low-step inference. Useful when latency matters more than fidelity.

What you get

Every image fine-tuning run includes dataset validation, sample previews, retries, and one-click deployment.

Drop a ZIP

Upload a ZIP of paired image and caption files. Soramai validates pairing, resolution, and aspect ratios before the run starts.
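Resolution and aspect ratio can also be checked locally before uploading. The sketch below is an illustration only, not Soramai's validator: it reads the dimensions from a PNG header using the standard library, and `MIN_SIDE`, `png_size`, and `check_image` are hypothetical names. The 512-pixel minimum is taken from the image guidance on this page; JPG files are not handled here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 512  # assumption: mirrors the "512×512 or larger" guidance


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at byte offsets 16 and 20.
    width, height = struct.unpack(">II", data[16:24])
    if width == 0 or height == 0:
        raise ValueError("invalid PNG dimensions")
    return width, height


def check_image(data: bytes) -> dict:
    """Summarize one image: size, aspect ratio, and whether it meets MIN_SIDE."""
    width, height = png_size(data)
    return {
        "width": width,
        "height": height,
        "aspect_ratio": round(width / height, 3),
        "large_enough": min(width, height) >= MIN_SIDE,
    }
```

Only the first 24 bytes of each file are needed, so `check_image(open("01.png", "rb").read(24))` is enough to inspect an image without decoding it.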

Caption assist

If you skip captions, Soramai can auto-caption the dataset using a vision model. Captions stay editable before fine-tuning.

LoRA out of the box

PEFT LoRA on the denoising network (the U-Net for SDXL, the transformer for FLUX) and optionally the text encoders. Rank, alpha, and resolution buckets are configurable per job.
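To make rank and alpha concrete: in the standard LoRA formulation, a frozen weight matrix of shape d_out × d_in gets a trainable update B·A, where B is d_out × r and A is r × d_in, scaled by alpha / r. The sketch below works through that arithmetic. The function names are hypothetical, and the 3072 × 3072 layer size is an illustrative assumption, not a figure from Soramai or any specific model.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Scaling applied to the low-rank update in the standard formulation."""
    return alpha / rank


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """LoRA parameters as a fraction of the frozen layer's own parameters."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


# Illustrative layer size (assumption): a square 3072 x 3072 projection.
added = lora_param_count(3072, 3072, rank=16)       # 98304 trainable parameters
scale = lora_scale(alpha=32, rank=16)               # 2.0
fraction = trainable_fraction(3072, 3072, rank=16)  # 1/96, roughly 1% of the layer
```

Doubling the rank doubles the adapter size, while alpha only changes how strongly the learned update is applied.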

Live samples

Configurable sample prompts run at fixed step intervals. Watch the style converge in the dashboard while the job is still running.
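As a rough illustration of "fixed step intervals", the sketch below lists the training steps at which previews would be rendered for a given run length. `sample_steps` is a hypothetical helper, and always rendering a final preview at the last step is an assumption, not documented Soramai behavior.

```python
def sample_steps(total_steps: int, interval: int) -> list[int]:
    """Return the training steps at which sample prompts would be rendered."""
    if total_steps < 1 or interval < 1:
        raise ValueError("total_steps and interval must be positive")
    steps = list(range(interval, total_steps + 1, interval))
    # Assumption: the final step always gets a preview, even off-interval.
    if not steps or steps[-1] != total_steps:
        steps.append(total_steps)
    return steps
```

For a 1,000-step run sampled every 250 steps, this yields previews at steps 250, 500, 750, and 1000.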

FLUX or SDXL endpoints

Promoted adapters run on serverless inference endpoints with autoscaling. Generate from the playground or the Deploy API.

Per-second billing

Fine-tuning is billed per second with no minimum charge, and endpoints cost nothing while idle.

How a run flows

From upload to deployed endpoint in four steps. No GPU setup, no Diffusers boilerplate.

  1. Prepare your images

    Aim for 15–60 images for a style LoRA and 8–20 for a character LoRA. PNG or JPG, 512×512 or larger. Captions are optional.

  2. Upload

    Drop a ZIP into the dashboard. Soramai validates pairing, resolution, and aspect ratios before queueing the job.

  3. Pick a base and start

    Choose FLUX or SDXL, set steps and learning rate (or take the defaults), and confirm the cost estimate.

  4. Generate from the playground

    Watch sample images appear at fixed step intervals. Promote the adapter to a live endpoint when the style is right.

Dataset layout

The ZIP follows a simple convention: each image can have one optional caption file with the same base name.

dataset.zip
├── 01.png
├── 01.txt        # caption (optional)
├── 02.png
├── 02.txt
├── 03.jpg
└── 03.txt

Captions are plain text. Soramai supports trigger tokens — see the dataset reference.
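The pairing convention above can be checked locally before uploading. This is a minimal sketch using only the Python standard library. It is not Soramai's validator: `pairing_report` is a hypothetical helper, and accepting `.jpeg` alongside `.png` and `.jpg` is an assumption.

```python
import zipfile
from pathlib import PurePosixPath

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}  # .jpeg is an assumption


def pairing_report(zip_file) -> dict:
    """Group ZIP entries by base name and report which images and captions pair up.

    `zip_file` may be a path or a binary file-like object.
    """
    images, captions = {}, {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            path = PurePosixPath(name)
            key = str(path.with_suffix(""))  # base name without extension
            ext = path.suffix.lower()
            if ext in IMAGE_EXTS:
                images[key] = name
            elif ext == ".txt":
                captions[key] = name
    return {
        "paired": sorted(images.keys() & captions.keys()),
        "images_without_caption": sorted(images.keys() - captions.keys()),
        "captions_without_image": sorted(captions.keys() - images.keys()),
    }
```

Calling `pairing_report("dataset.zip")` on the layout above would list `01`, `02`, and `03` under `paired`. Because captions are optional, an entry under `images_without_caption` is not an error, while an entry under `captions_without_image` usually points to a misnamed file.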