Implement attention masking in LabelAttentionClassifier cross-attention #64
Addresses review feedback that `LabelAttentionClassifier` was ignoring the attention mask, allowing padding tokens to influence label embeddings and attention visualizations.

### Changes
- Added `attention_mask` parameter to `LabelAttentionClassifier.forward()` and passed it from `TextEmbedder._get_sentence_embedding()`
- Masked cross-attention computation: convert the mask from (1=real, 0=pad) to boolean format, expand it to (B, 1, 1, T) for broadcasting, and pass it to `F.scaled_dot_product_attention` as `attn_mask` (see the sketch after this list)
- Masked attention matrix for explainability: apply `masked_fill(-inf)` before softmax to zero out padding positions in the attention weights
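For reference, a minimal sketch of how the two masking paths described above can fit together. The module layout here (learned per-label query vectors, single-head attention, the `label_queries` and `num_labels` names) is an assumption for illustration, not the repository's actual implementation:

```python
from typing import Optional

import torch
import torch.nn.functional as F
from torch import nn


class LabelAttentionClassifier(nn.Module):
    """Sketch: learned label queries cross-attend over token states (B, T, D)."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One learned query vector per label (hypothetical layout).
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.scale = hidden_dim ** -0.5

    def forward(
        self,
        token_states: torch.Tensor,                      # (B, T, D)
        attention_mask: Optional[torch.Tensor] = None,   # (B, T), 1=real, 0=pad
    ):
        B = token_states.size(0)
        q = self.label_queries.unsqueeze(0).expand(B, -1, -1)  # (B, L, D)

        sdpa_mask = None
        if attention_mask is not None:
            # Convert (1=real, 0=pad) to boolean and expand to (B, 1, 1, T)
            # so it broadcasts over heads and query (label) positions.
            sdpa_mask = attention_mask.bool()[:, None, None, :]

        # Masked cross-attention: padded keys/values are excluded from the output.
        label_emb = F.scaled_dot_product_attention(
            q.unsqueeze(1),              # (B, 1, L, D) -- single head
            token_states.unsqueeze(1),   # (B, 1, T, D)
            token_states.unsqueeze(1),   # (B, 1, T, D)
            attn_mask=sdpa_mask,
        ).squeeze(1)                     # (B, L, D)

        # Explicit attention matrix for explainability: -inf before softmax
        # drives padding positions to exactly zero weight.
        scores = torch.einsum("bld,btd->blt", q, token_states) * self.scale
        if attention_mask is not None:
            scores = scores.masked_fill(
                ~attention_mask.bool()[:, None, :], float("-inf")
            )
        attn_weights = scores.softmax(dim=-1)   # (B, L, T), rows sum to 1

        return label_emb, attn_weights
```

A boolean `attn_mask` lets the fused `F.scaled_dot_product_attention` kernel handle the masking internally, while the explicit `masked_fill(-inf)`/softmax path keeps the visualized weights consistent with what the model actually computed; both paths must mask the same positions or the explanations will diverge from the predictions.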