Implement attention masking in LabelAttentionClassifier cross-attention #64
Addresses review feedback that `LabelAttentionClassifier` was ignoring the attention mask, allowing padding tokens to influence label embeddings and attention visualizations.

### Changes
- Added `attention_mask` parameter to `LabelAttentionClassifier.forward()` and passed it from `TextEmbedder._get_sentence_embedding()`
- Masked cross-attention computation: convert the mask from (1=real, 0=pad) to boolean format, expand it to (B, 1, 1, T) for broadcasting, and pass it to `F.scaled_dot_product_attention` as `attn_mask` (see the sketch after this list)
- Masked attention matrix for explainability: apply `masked_fill(-inf)` before softmax to zero out padding positions in the attention weights
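For reference, a minimal sketch of how the two masking paths described above can fit together. The module layout here (learned per-label query vectors, single-head attention, the `label_queries` and `num_labels` names) is an assumption for illustration, not the repository's actual implementation:

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class LabelAttentionClassifier(nn.Module):
    """Sketch: learned label queries cross-attend over token states (B, T, D)."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One learned query vector per label (hypothetical layout).
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.scale = hidden_dim ** -0.5

    def forward(
        self,
        token_states: torch.Tensor,                      # (B, T, D)
        attention_mask: Optional[torch.Tensor] = None,   # (B, T), 1=real, 0=pad
    ):
        B = token_states.size(0)
        q = self.label_queries.unsqueeze(0).expand(B, -1, -1)  # (B, L, D)

        sdpa_mask = None
        if attention_mask is not None:
            # Convert (1=real, 0=pad) to boolean and expand to (B, 1, 1, T)
            # so it broadcasts over heads and query (label) positions.
            sdpa_mask = attention_mask.bool()[:, None, None, :]

        # Masked cross-attention: padded keys/values are excluded from the output.
        label_emb = F.scaled_dot_product_attention(
            q.unsqueeze(1),              # (B, 1, L, D) -- single head
            token_states.unsqueeze(1),   # (B, 1, T, D)
            token_states.unsqueeze(1),   # (B, 1, T, D)
            attn_mask=sdpa_mask,
        ).squeeze(1)                     # (B, L, D)

        # Explicit attention matrix for explainability: -inf before softmax
        # drives padding positions to exactly zero weight.
        scores = torch.einsum("bld,btd->blt", q, token_states) * self.scale
        if attention_mask is not None:
            scores = scores.masked_fill(
                ~attention_mask.bool()[:, None, :], float("-inf")
            )
        attn_weights = scores.softmax(dim=-1)   # (B, L, T), rows sum to 1

        return label_emb, attn_weights
```

A boolean `attn_mask` lets the fused `F.scaled_dot_product_attention` kernel handle the masking internally, while the explicit `masked_fill(-inf)`/softmax path keeps the visualized weights consistent with what the model actually computed; both paths must mask the same positions or the explanations will diverge from the predictions.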