First of all, well done on this great work, and thanks for making it publicly available!
I have the following problem: I am using the Java version, and I want to match each token to its embedding. I am loading the English uncased model and getting the embeddings of two strings (`str1`, `str2`) with `float[][][] embeddings = bert.embedTokens(str1, str2);`.
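For reference, here is a minimal version of what I am running (the `Bert.load` resource path is just how I happen to load the model and shouldn't matter for the question):

```java
import com.robrua.nlp.bert.Bert;

public class TokenEmbeddingsExample {
    public static void main(String[] args) throws Exception {
        String str1 = "The first sentence.";
        String str2 = "And here is a second one.";
        // English uncased model, loaded from the classpath
        try (Bert bert = Bert.load("com/robrua/nlp/easy-bert/bert-uncased-L-12-H-768-A-12")) {
            // One [numTokens][768] matrix per input string
            float[][][] embeddings = bert.embedTokens(str1, str2);
            System.out.println(embeddings.length); // 2, one entry per string
        }
    }
}
```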
After that, I can get the embeddings corresponding to each sequence/string with:
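```java
float[][] firstSent = embeddings[0];  // token embeddings for str1
float[][] secondSent = embeddings[1]; // token embeddings for str2
```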
However, `firstSent` and `secondSent` always have a fixed length of 127, not the lengths of my strings `str1` and `str2`. `firstSent[0]` then has a length of 768, which is the expected embedding size, but I don't understand why I am getting 127 as the length of `firstSent` and `secondSent`. And given this length, I assume that `firstSent[0]` does NOT correspond to the first token of my first sentence, which is what I would like to get.
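To make the goal concrete, this is the pairing I am after (continuing from the snippets above; the whitespace split is only a naive stand-in for the model's real tokenization, which I would of course need to match exactly):

```java
// Naive stand-in tokenization -- the real token count will differ because
// of WordPiece subwords and any [CLS]/[SEP] markers the model adds
String[] tokens = str1.split("\\s+");
for (int i = 0; i < Math.min(tokens.length, firstSent.length); i++) {
    // What I would like: firstSent[i] is the 768-dim vector for tokens[i]
    System.out.println(tokens[i] + " -> vector of length " + firstSent[i].length);
}
```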