
Attention Processor Implementations for SDXL #4

@dibbla


Hi!

I am working on migrating this great work to SDXL, starting with AFA. However, I found that neither the direct-cascade method nor AFA works.

I use CLIP-ViT-bigG (projected to 1280 dims) and CLIP-ViT-L (projected to 768 dims) for the image embedding and concatenate them, so that 768 + 1280 matches SDXL's text embedding size. The result is treated as a single token and passed to the attention processors. Inside the attention, the image embedding is repeated to 77 tokens to match the text features.

However, even with the direct concatenation method, it still does not work. Any suggestions?
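For clarity, here is a minimal sketch of the shapes I mean, assuming one pooled image embedding per encoder (the variable names and batch size are illustrative, not from the repo):

```python
import torch

batch = 2
emb_bigg = torch.randn(batch, 1280)  # CLIP-ViT-bigG image embedding (1280 dims)
emb_l = torch.randn(batch, 768)      # CLIP-ViT-L image embedding (768 dims)

# Concatenate so the width matches SDXL's text embedding size (768 + 1280 = 2048).
image_token = torch.cat([emb_l, emb_bigg], dim=-1)  # (batch, 2048)

# Inside the attention processor, treat it as one token and repeat it to the
# text sequence length (77) so it lines up with the text features.
image_states = image_token.unsqueeze(1).repeat(1, 77, 1)  # (batch, 77, 2048)
```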
