simple approach is to average the second to last hiden layer of each token producing a single 768 length vector
Proposition how to obtain single vector for the whole sentence
simple approach is to average the second to last hiden layer of each token producing a single 768 length vector
Proposition how to obtain single vector for the whole sentence