This repository was archived by the owner on Jan 23, 2026. It is now read-only.

return length of the Java embedTokens() method  #2

@kkalouli

Hi,

first of all, well done for this great work and thanks for making it publicly available!

I have the following problem: I am using the Java version and I want to match each token to its embedding. I load the English uncased model and get the embeddings of two strings (str1, str2) with:

float[][][] embeddings = bert.embedTokens(str1, str2);

After that, I get the embeddings for each sequence/string with:

float[][] firstSent = embeddings[0];
float[][] secondSent = embeddings[1];

However, firstSent and secondSent always have a length of 127, regardless of the lengths of str1 and str2. firstSent[0] has a length of 768, which is the expected embedding size, but I don't understand why the outer dimension is always 127. Because of this, I suspect that firstSent[0] does NOT correspond to the first token of my first sentence, which is what I want to get.
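One possible workaround is a hypothetical helper, firstN, that is not part of the library. It assumes that the 127 rows hold the real tokens first, in order, followed by padding up to a fixed maximum sequence length. That layout is not confirmed here. In particular, whether a [CLS] token occupies row 0 or has already been stripped should be checked against the library source. Under that assumption, the helper keeps only the first n rows, where n is the number of WordPiece tokens in the string. Note that n can be larger than the number of words, because BERT's tokenizer can split a single word into several word pieces.

```java
import java.util.Arrays;

public class TokenEmbeddings {

    // Hypothetical helper (not part of the library): keep only the first
    // numTokens rows of a padded [maxLength][hiddenSize] matrix.
    // ASSUMPTION: real tokens come first, in order, and the rest is padding.
    public static float[][] firstN(float[][] padded, int numTokens) {
        if (numTokens < 0 || numTokens > padded.length) {
            throw new IllegalArgumentException(
                "numTokens must be between 0 and " + padded.length + ", got " + numTokens);
        }
        // Arrays.copyOf copies the row references; the inner float[] rows are shared.
        return Arrays.copyOf(padded, numTokens);
    }

    public static void main(String[] args) {
        // Stand-in for embeddings[0]: 127 positions x 768 dimensions.
        float[][] firstSent = new float[127][768];
        // 5 is an illustrative token count; in practice it would come from tokenizing str1.
        float[][] tokens = firstN(firstSent, 5);
        System.out.println(tokens.length + " x " + tokens[0].length); // prints "5 x 768"
    }
}
```

The helper only changes the outer dimension and leaves each 768-dimensional vector untouched, so it is cheap. It is still only correct if the padding assumption above holds.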

Any help is much appreciated! Thanks a lot!
