Aesthetic 1girl meets Chroma (LoRA)
Example gens:
Link: https://huggingface.co/Ainonake/ChromaAestheticAnimeV5
I started experimenting with fine-tuning Chroma, and the results have generally exceeded my expectations at this stage. So, I decided to share the LoRA and my observations.
Without LoRA, Chroma is very difficult to use for 2D—the images are highly unstable in style and quality (at least without complex prompting). The anatomy is almost always terrible, and so far, the aesthetic 11 tags don’t help much.
With LoRA, the situation improved significantly. The model grasped the style fairly easily (though it didn't quite reach the quality of the images in the dataset). Even though I was only training for style, the hands and overall anatomy improved A LOT—though they're still bad, and failed generations happen often.
I accidentally mixed up the aesthetic 10 tag with 'aesthetic10'—I’m not sure if that’s good or bad. So, I’m providing two prompting options for the model—one without aesthetic 10 and 11, and one with them (examples are in the repository).
Training the LoRA took 13 hours on a 3090 + about 5 hours on failed runs.
There’s still a lot to improve in the dataset—variability, quantity, resolution, watermarks, aspect ratios, etc. So, I have high hopes for improvements in Chroma itself and the LoRA. I want to believe that once everything is polished, it'll be possible to get rid of the constant anatomy issues and make the images much more pleasant.
Overall, Chroma + this LoRA is better than Illustrious in terms of variety and avoiding the overcooked look, but worse on anatomy and aesthetics.
Now, I’m eagerly awaiting Chroma updates and post-training in hopes that the anatomy issues will be fixed. From time to time, I’ll also retrain my LoRA.
Are you sure using "aesthetic #" is a good idea at all when prompting for any 2D artwork? What I've heard is that aesthetic 0-10 are all photograph based taggings and 11 is for AI-produced imagery.
I haven't done any anime prompting in Flux/Chroma since Illustrious is so good so I haven't experimented with a tagging setup, but I generally find "aesthetic 7, aesthetic 6, aesthetic 5, aesthetic 4, aesthetic 3, aesthetic 2, aesthetic 1" to be good to have in negatives.
For photos I use "aesthetic 10, photo_medium" at both the beginning and end of the prompt (same order at both endpoints), so surely using photoshop_medium, painting_medium, or similar should improve the result? I tend to find images bleed over into a 2D style regardless, as long as the prompt contains enough "booru"-style tags that aren't part of the natural language Flux was trained on.
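Not an official prompting recipe—just a sketch of the layout described above, with quality tags repeated at both ends of the positive prompt and low aesthetic scores in the negative. The subject tags are placeholders; swap the medium tag (photo_medium, painting_medium, etc.) for your target style.

```python
# Quality/medium tags, repeated at the start and end of the positive
# prompt in the same order, as the comment above suggests.
quality = "aesthetic 10, photo_medium"
subject = "1girl, solo, looking at viewer"  # placeholder subject tags

positive = f"{quality}, {subject}, {quality}"

# Low aesthetic scores stacked in the negative prompt.
negative = ", ".join(f"aesthetic {n}" for n in range(7, 0, -1))

print(positive)
print(negative)  # aesthetic 7, aesthetic 6, ..., aesthetic 1
```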
13 hours for a lora is a long time, even if you subtract 5 for failed runs. How many images are in your training set and how many steps did you run?
I used 100 captioned images, ran about 2000 steps with LR 0.000075, and I got a really good result. Any longer than this and it got overcooked and started to lose flexibility. On a rented 4090 this took less than 2 hours.
This was for a style lora, similar to yours though more of a western style animation.
@mweldon final run was with 30 very high quality captioned images and a low LR (0.0001). LoRA rank 128, 3500 steps. I tested the LoRA every 250 steps for overcookedness and quality.
Lower rank results were not great.
What did you use to train? I've got plenty of characters I want to try and make LoRAs of but no clue what tools to do it with. Information is sparse, lol.
I have Kohya_SS, but it doesn't support Chroma (as far as I can see). I have ai-toolkit, but it wants a folder with a full "diffusion pytorch model". What tool lets me load up "chroma-unlocked-v48-detail-calibrated.safetensors" and train something on it?