return length of the Java embedTokens() method 

Hi,

First of all, well done on this great work, and thanks for making it publicly available!

I have the following problem: I am using the Java version and want to match each token to its embedding. I load the English uncased model and get the embeddings of two strings (`str1`, `str2`) with
```
float[][][] embeddings = bert.embedTokens(str1, str2);
```
After that, I can get the embeddings for each string with
```
float[][] firstSent = embeddings[0];
float[][] secondSent = embeddings[1];
```
However, `firstSent` and `secondSent` always have a fixed length of 127 rather than the length of my strings `str1` and `str2`. `firstSent[0]` has a length of 768, which is the expected embedding size, but I don't understand why `firstSent` and `secondSent` have a length of 127. Given this length, I guess that `firstSent[0]` does NOT correspond to the first token of my first sentence, which is what I would like to get.
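
To make the shapes described above concrete, here is a minimal sketch that reproduces them with a stand-in array. The class `EmbeddingShapes` and its `shape` helper are hypothetical names used only for illustration, and the zero-filled array is a placeholder for the real `bert.embedTokens(str1, str2)` result, not actual model output.

```java
public class EmbeddingShapes {
    // Returns {sequences, positions per sequence, values per position},
    // assuming a non-empty, rectangular array.
    static int[] shape(float[][][] embeddings) {
        return new int[] {
            embeddings.length,
            embeddings[0].length,
            embeddings[0][0].length
        };
    }

    public static void main(String[] args) {
        // Placeholder standing in for bert.embedTokens(str1, str2):
        // zero-filled, with the dimensions reported in this issue.
        float[][][] embeddings = new float[2][127][768];

        int[] dims = shape(embeddings);
        // Prints "2 x 127 x 768"
        System.out.println(dims[0] + " x " + dims[1] + " x " + dims[2]);
    }
}
```

The middle dimension stays at 127 regardless of how long `str1` and `str2` are, which is the part I cannot map back to my tokens.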

Any help is much appreciated! Thanks a lot! 
