
Conversation

@clementchadebec
Collaborator

What does this PR do?

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

sayakpaul and others added 30 commits January 23, 2025 08:22
* Add IP-Adapter example to Flux docs

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
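For reference, a minimal sketch of the kind of Flux IP-Adapter usage the docs example covers. The checkpoint names follow the public XLabs-AI adapter and the reference-image URL is a placeholder, not something taken from this PR:

```python
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Attach the IP-Adapter and its CLIP image encoder, then set its influence.
pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipe.set_ip_adapter_scale(1.0)

ip_image = load_image("https://example.com/reference.png")  # placeholder URL
image = pipe(
    prompt="a cat wearing sunglasses",
    ip_adapter_image=ip_image,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
```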
We already set the UNet to `requires_grad=False` at line 506

Co-authored-by: Aryan <aryan@huggingface.co>
* add pipeline_stable_diffusion_xl_attentive_eraser

* add pipeline_stable_diffusion_xl_attentive_eraser_make_style

* make style and add example output

* update Docs

Co-authored-by: Other Contributor <a457435687@126.com>

* add Oral

Co-authored-by: Other Contributor <a457435687@126.com>

* update_review

Co-authored-by: Other Contributor <a457435687@126.com>

* update_review_ms

Co-authored-by: Other Contributor <a457435687@126.com>

---------

Co-authored-by: Other Contributor <a457435687@126.com>
* NPU Adaptation for Sana


---------

Co-authored-by: J石页 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Sigmoid scheduler in scheduling_ddpm.py docs
* create a script to train vae

* update main.py

* update train_autoencoderkl.py

* update train_autoencoderkl.py

* add a check of --pretrained_model_name_or_path and --model_config_name_or_path

* remove the comment, remove diffusers from requirements.txt, add validation_image note

* update autoencoderkl.py

* quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
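A rough sketch of the argument check described above, assuming the training script requires exactly one of the two flags (the exact rule used in train_autoencoderkl.py may differ):

```python
import argparse

parser = argparse.ArgumentParser(description="Train an AutoencoderKL (sketch).")
parser.add_argument("--pretrained_model_name_or_path", type=str, default=None)
parser.add_argument("--model_config_name_or_path", type=str, default=None)
args = parser.parse_args()

# Require exactly one way to obtain the model: either load pretrained weights
# or build a fresh model from a config file.
if (args.pretrained_model_name_or_path is None) == (args.model_config_name_or_path is None):
    raise ValueError(
        "Provide exactly one of --pretrained_model_name_or_path or --model_config_name_or_path."
    )
```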
)

* add community pipeline for semantic guidance for flux

* fix imports in community pipeline for semantic guidance for flux

* Update examples/community/pipeline_flux_semantic_guidance.py

Co-authored-by: hlky <hlky@hlky.ac>

* fix community pipeline for semantic guidance for flux

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
* [training] Convert to ImageFolder script

* make
huggingface#10663)

controlnet union XL: make control_image immutable

When this argument is passed a list, __call__ modifies its contents. Since
the list is passed by reference, the caller's list gets modified
unexpectedly.

Make a copy at the start of the method so this does not happen.

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Giuseppe Catalano <giuseppelorenzo.catalano@unito.it>
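The pattern behind the fix, as a standalone sketch; the helper name is illustrative, not the actual pipeline method:

```python
from typing import List, Union

from PIL import Image


def preprocess_control(control_image: Union[Image.Image, List[Image.Image]]) -> List[Image.Image]:
    """Copy a mutable list argument before editing it in place."""
    if isinstance(control_image, list):
        # Lists are passed by reference; without this copy, the in-place edits
        # below would also change the caller's list.
        control_image = list(control_image)
    else:
        control_image = [control_image]

    for i, img in enumerate(control_image):
        control_image[i] = img.convert("RGB")  # example of an in-place edit
    return control_image
```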
* start pyramid attention broadcast

* add coauthor

Co-Authored-By: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>

* update

* make style

* update

* make style

* add docs

* add tests

* update

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Pyramid Attention Broadcast rewrite + introduce hooks (huggingface#9826)

* rewrite implementation with hooks

* make style

* update

* merge pyramid-attention-rewrite-2

* make style

* remove changes from latte transformer

* revert docs changes

* better debug message

* add todos for future

* update tests

* make style

* cleanup

* fix

* improve log message; fix latte test

* refactor

* update

* update

* update

* revert changes to tests

* update docs

* update tests

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix flux test

* reorder

* refactor

* make fix-copies

* update docs

* fixes

* more fixes

* make style

* update tests

* update code example

* make fix-copies

* refactor based on reviews

* use maybe_free_model_hooks

* CacheMixin

* make style

* update

* add current_timestep property; update docs

* make fix-copies

* update

* improve tests

* try circular import fix

* apply suggestions from review

* address review comments

* Apply suggestions from code review

* refactor hook implementation

* add test suite for hooks

* PAB Refactor (huggingface#10667)

* update

* update

* update

---------

Co-authored-by: DN6 <dhruv.nair@gmail.com>

* update

* fix remove hook behaviour

---------

Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: DN6 <dhruv.nair@gmail.com>
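After this rewrite, Pyramid Attention Broadcast is enabled through `CacheMixin` on the transformer. A usage sketch with a CogVideoX pipeline, as in the docs touched above (the skip ranges are illustrative, not prescribed values):

```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# Reuse (broadcast) spatial attention outputs for a couple of blocks within the
# given timestep window instead of recomputing them at every denoising step.
config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)

video = pipe("a panda playing a guitar", num_inference_steps=50).frames[0]
```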
…de (huggingface#10600)

* fix: refer to use_framewise_encoding on AutoencoderKLHunyuanVideo._encode

* fix: comment about tile_sample_min_num_frames

---------

Co-authored-by: Aryan <aryan@huggingface.co>
* update

* remove unused fn

* apply suggestions based on review

* update + cleanup 🧹

* more cleanup 🧹

* make fix-copies

* update test
…_max_memory` (huggingface#10669)

* conditionally check if compute capability is met.

* log info.

* fix condition.

* updates

* updates

* updates

* updates
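A rough sketch of conditionally checking whether the GPU's compute capability is met and logging info, in the spirit of the commits above; the threshold and function name are assumptions for illustration:

```python
import logging

import torch

logger = logging.getLogger(__name__)

MIN_COMPUTE_CAPABILITY = (8, 0)  # assumed threshold, e.g. Ampere or newer


def compute_capability_met(device_index: int = 0) -> bool:
    """Return True when CUDA is available and the device meets the minimum."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability(device_index)
    ok = (major, minor) >= MIN_COMPUTE_CAPABILITY
    if not ok:
        logger.info(
            "Compute capability %d.%d is below the required %d.%d; falling back.",
            major, minor, *MIN_COMPUTE_CAPABILITY,
        )
    return ok
```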
…ity pipelines in float16 mode (huggingface#10670)

Fix pipeline dtype unexpected change when using SDXL reference community pipelines
…10552)

* support StableDiffusionAdapterPipeline.from_single_file

* make style

---------

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
* fix enable memory efficient attention on ROCm

while calling CK implementation

* Update attention_processor.py

refactor of picking a set element
)

* Update train_instruct_pix2pix.py

Fix inconsistent random transform in instruct_pix2pix

* Update train_instruct_pix2pix_sdxl.py
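A minimal sketch of keeping the random transform consistent across the paired original/edited images, assuming torchvision; the actual fix in the training scripts may differ in detail:

```python
import torch
from torchvision import transforms
from torchvision.transforms import functional as F


def paired_random_crop(original, edited, size=(256, 256)):
    # Sample the crop parameters once and apply the same crop to both images,
    # so the original/edited pair stays spatially aligned.
    i, j, h, w = transforms.RandomCrop.get_params(original, output_size=size)
    original = F.crop(original, i, j, h, w)
    edited = F.crop(edited, i, j, h, w)

    # Same idea for the horizontal flip: draw the coin once for the pair.
    if torch.rand(1) < 0.5:
        original = F.hflip(original)
        edited = F.hflip(edited)
    return original, edited
```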
…ity_for_timestep_sampling (huggingface#10699)

* feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling

* chore: update type hint

* refactor: use union for type hint

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
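A hedged usage sketch of the new `device`/`dtype` parameters on `compute_density_for_timestep_sampling`; the remaining arguments follow the existing logit-normal weighting scheme, and the values shown are illustrative:

```python
import torch
from diffusers.training_utils import compute_density_for_timestep_sampling

# Sample timestep densities directly on the training device/dtype instead of
# CPU float32 (parameter names per this PR; values are illustrative).
u = compute_density_for_timestep_sampling(
    weighting_scheme="logit_normal",
    batch_size=4,
    logit_mean=0.0,
    logit_std=1.0,
    mode_scale=1.29,
    device=torch.device("cuda"),
    dtype=torch.bfloat16,
)
```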
lawrence-cj and others added 28 commits March 25, 2025 08:47
…ad_lora_adapter` in PeftAdapterMixin class (huggingface#11155)

set self._hf_peft_config_loaded to True on successful lora load

Sets the `_hf_peft_config_loaded` flag if a LoRA is successfully loaded in `load_lora_adapter`. Fixes bug huggingface/issues/11148

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
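The change boils down to flipping the flag once the adapter has been injected successfully. A simplified, self-contained stand-in for the relevant behaviour (not the real class):

```python
class PeftAdapterMixinSketch:
    """Illustrative stand-in for the relevant behaviour of PeftAdapterMixin."""

    def __init__(self):
        self._hf_peft_config_loaded = False
        self._adapters = {}
        self._active_adapter = None

    def load_lora_adapter(self, state_dict, adapter_name="default"):
        # Weight injection happens here and raises if it fails.
        self._adapters[adapter_name] = state_dict
        # The fix in this PR: record that a PEFT config is now attached, so
        # follow-up calls such as set_adapter() do not refuse to run.
        self._hf_peft_config_loaded = True

    def set_adapter(self, adapter_name):
        if not self._hf_peft_config_loaded:
            raise ValueError("No adapter loaded. Please load an adapter first.")
        self._active_adapter = adapter_name
```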
…face#11167)

* update

* raise warning and round to nearest multiple of scale factor
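A minimal sketch of the warn-and-round behaviour described above, assuming the requested height/width must be multiples of the pipeline's spatial scale factor:

```python
import logging

logger = logging.getLogger(__name__)


def round_to_multiple(value: int, scale_factor: int) -> int:
    """Round a requested dimension to the nearest multiple of scale_factor."""
    rounded = round(value / scale_factor) * scale_factor
    if rounded != value:
        logger.warning(
            "%d is not divisible by %d; rounding to %d.", value, scale_factor, rounded
        )
    return rounded


height = round_to_multiple(721, 16)   # -> 720, with a warning
width = round_to_multiple(1280, 16)   # -> 1280, no warning
```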
…already set (huggingface#10918)

* Bug fix in ltx

* Assume packed latents.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
…XPU (huggingface#11191)

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rewrite memory count without implicitly using dimensions by @ic-synth

* replace F.pad by built-in padding in Conv3D

* in-place sums to reduce memory allocations

* fixed trailing whitespace

* file reformatted

* in-place sums

* simpler in-place expressions

* removed in-place sum, may affect backward propagation logic

* removed in-place sum, may affect backward propagation logic

* removed in-place sum, may affect backward propagation logic

* reverted change
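Two of the memory tricks above in sketch form: letting Conv3d pad instead of calling F.pad, and using an in-place add for a residual sum (the in-place variant was ultimately reverted where it could interfere with autograd):

```python
import torch
import torch.nn as nn

# Instead of x = F.pad(x, (1, 1, 1, 1, 1, 1)) followed by a Conv3d with
# padding=0, let the convolution pad for us and avoid materialising a padded
# copy of the activation tensor.
conv = nn.Conv3d(in_channels=8, out_channels=8, kernel_size=3, padding=1)

x = torch.randn(1, 8, 4, 32, 32)
hidden = conv(x)

# In-place sum: reuses hidden's storage instead of allocating a new tensor.
# Caveat: in-place ops can break autograd when the overwritten values are
# needed for the backward pass, which is why some of these changes were reverted.
hidden += x
```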
…e dtype (huggingface#10301)

* allow models to run with a user-provided dtype map instead of a single dtype

* make style

* Add warning, change `_` to `default`

* make style

* add test

* handle shared tensors

* remove warning

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
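Per the commits above, the feature lands as a per-component dtype map passed to `from_pretrained`, with a `default` key for everything not listed. A hedged usage sketch (model id is illustrative):

```python
import torch
from diffusers import DiffusionPipeline

# Load the transformer in bfloat16 while every other component falls back to
# the "default" entry.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
```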
…huggingface#11197)

* add xpu part

* fix more cases

* remove some cases

* no canny

* format fix
)

* Fix enable_sequential_cpu_offload in CogView4Pipeline

* make fix-copies
added onnxruntime-vitisai for custom-built onnxruntime packages
…uggingface#11188)

* feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline for Image SR.

* added pipeline
…ngface#7615)

* model card gen code

* push modelcard creation

* remove optional from params

* add import

* add use_dora check

* correct lora var use in tags

* make style && make quality

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Change LoRA Loader to StableDiffusion

Replace the SDXL LoRA Loader Mixin inheritance with the StableDiffusion one
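In code, the swap is just a change of base class on the community pipeline. Both mixins are existing diffusers classes; the pipeline name below is illustrative:

```python
from diffusers import DiffusionPipeline
from diffusers.loaders import StableDiffusionLoraLoaderMixin  # was StableDiffusionXLLoraLoaderMixin


class MyCommunityPipeline(DiffusionPipeline, StableDiffusionLoraLoaderMixin):
    """Inherit the SD LoRA loader so load_lora_weights() targets the right modules."""
```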
@clementchadebec clementchadebec merged commit e82eefd into clipdrop-main Apr 29, 2025
7 of 24 checks passed
onurxtasar pushed a commit that referenced this pull request Jan 24, 2026
* Initial LTX 2.0 transformer implementation

* Add tests for LTX 2 transformer model

* Get LTX 2 transformer tests working

* Rename LTX 2 compile test class to have LTX2

* Remove RoPE debug print statements

* Get LTX 2 transformer compile tests passing

* Fix LTX 2 transformer shape errors

* Initial script to convert LTX 2 transformer to diffusers

* Add more LTX 2 transformer audio arguments

* Allow LTX 2 transformer to be loaded from local path for conversion

* Improve dummy inputs and add test for LTX 2 transformer consistency

* Fix LTX 2 transformer bugs so consistency test passes

* Initial implementation of LTX 2.0 video VAE

* Explicitly specify temporal and spatial VAE scale factors when converting

* Add initial LTX 2.0 video VAE tests

* Add initial LTX 2.0 video VAE tests (part 2)

* Get diffusers implementation on par with official LTX 2.0 video VAE implementation

* Initial LTX 2.0 vocoder implementation

* Use RMSNorm implementation closer to original for LTX 2.0 video VAE

* start audio decoder.

* init registration.

* up

* simplify and clean up

* up

* Initial LTX 2.0 text encoder implementation

* Rough initial LTX 2.0 pipeline implementation

* up

* up

* up

* up

* Add imports for LTX 2.0 Audio VAE

* Conversion script for LTX 2.0 Audio VAE Decoder

* Add Audio VAE logic to T2V pipeline

* Duplicate scheduler for audio latents

* Support num_videos_per_prompt for prompt embeddings

* LTX 2.0 scheduler and full pipeline conversion

* Add script to test full LTX2Pipeline T2V inference

* Fix pipeline return bugs

* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__

* Fix more bugs in LTX2Pipeline.__call__

* Improve CPU offload support

* Fix pipeline audio VAE decoding dtype bug

* Fix video shape error in full pipeline test script

* Get LTX 2 T2V pipeline to produce reasonable outputs

* Make LTX 2.0 scheduler more consistent with original code

* Fix typo when applying scheduler fix in T2V inference script

* Refactor Audio VAE to be simpler and remove helpers (#7)

* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add support for I2V (#8)

* start i2v.

* up

* up

* up

* up

* up

* remove uniform strategy code.

* remove unneeded code.

* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)

* test i2v.

* Move Video and Audio Text Encoder Connectors to Transformer (huggingface#12)

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* precompute run_connectors.

* fixes

* Address review comments

* Calculate RoPE double-precision freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* Make connectors a separate module (huggingface#18)

* remove text_encoder.py

* address yiyi's comments.

* up

* up

* up

* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

* up (huggingface#19)

* address initial feedback from lightricks team (huggingface#16)

* cross_attn_timestep_scale_multiplier to 1000

* implement split rope type.

* up

* propagate rope_type to rope embed classes as well.

* up

* When using split RoPE, make sure that the output dtype is the same as the input dtype

* Fix apply split RoPE shape error when reshaping x to 4D

* Add export_utils file for exporting LTX 2.0 videos with audio

* Tests for T2V and I2V (#6)

* add ltx2 pipeline tests.

* up

* up

* up

* up

* remove content

* style

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* up

* up

* i2v tests.

* up

* Address review comments

* Calculate RoPE double-precision freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* revert unneeded changes.

* up

* up

* update to split style rope.

* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* up

* use export util funcs.

* Point original checkpoint to LTX 2.0 official checkpoint

* Allow the I2V pipeline to accept image URLs

* make style and make quality

* remove function map.

* remove args.

* update docs.

* update doc entries.

* disable ltx2_consistency test

* Simplify LTX 2 RoPE forward by removing the `coords is None` logic

* make style and make quality

* Support LTX 2.0 audio VAE encoder

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Remove print statement in audio VAE

* up

* Fix bug when calculating audio RoPE coords

* Ltx 2 latent upsample pipeline (huggingface#12922)

* Initial implementation of LTX 2.0 latent upsampling pipeline

* Add new LTX 2.0 spatial latent upsampler logic

* Add test script for LTX 2.0 latent upsampling

* Add option to enable VAE tiling in upsampling test script

* Get latent upsampler working with video latents

* Fix typo in BlurDownsample

* Add latent upsample pipeline docstring and example

* Remove deprecated pipeline VAE slicing/tiling methods

* make style and make quality

* When returning latents, return unpacked and denormalized latents for T2V and I2V

* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* Fix latent upsampler filename in LTX 2 conversion script

* Add latent upsample pipeline to LTX 2 docs

* Add dummy objects for LTX 2 latent upsample pipeline

* Set default FPS to official LTX 2 ckpt default of 24.0

* Set default CFG scale to official LTX 2 ckpt default of 4.0

* Update LTX 2 pipeline example docstrings

* make style and make quality

* Remove LTX 2 test scripts

* Fix LTX 2 upsample pipeline example docstring

* Add logic to convert and save a LTX 2 upsampling pipeline

* Document LTX2VideoTransformer3DModel forward pass

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>