Source: texcla/embeddings.py#L0
build_embedding_weights
build_embedding_weights(word_index, embeddings_index)
Builds an embedding matrix for all words in vocab using embeddings_index
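To illustrate what this helper produces, here is a minimal sketch of the typical approach (the zero row for padding and for out-of-vocabulary words is an assumption about the behavior, not confirmed from the source; the function is renamed to make clear it is not the library's implementation):

```python
import numpy as np

def build_embedding_weights_sketch(word_index, embeddings_index):
    # Infer the embedding dimensionality from any vector in the index.
    embedding_dims = len(next(iter(embeddings_index.values())))

    # Row 0 is conventionally reserved for padding; words missing from
    # embeddings_index keep an all-zero row (assumed behavior).
    weights = np.zeros((len(word_index) + 1, embedding_dims))
    for word, i in word_index.items():
        vector = embeddings_index.get(word)
        if vector is not None:
            weights[i] = vector
    return weights
```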
build_fasttext_wiki_embedding_obj
build_fasttext_wiki_embedding_obj(embedding_type)
FastText pre-trained word vectors for 294 languages, with 300 dimensions, trained on Wikipedia. It's recommended to use the same tokenizer for your data that was used to construct the embeddings; it is implemented as FasttextWikiTokenizer. More information: https://fasttext.cc/docs/en/pretrained-vectors.html.
Args:
- embedding_type: A string in the format fasttext.wiki.$LANG_CODE, e.g. fasttext.wiki.de or fasttext.wiki.es.
Returns:
Object with the URL and filename used later on for downloading the file.
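For example (the attributes of the returned object are not spelled out here, so the example only shows the call and prints the result):

```python
from texcla.embeddings import build_fasttext_wiki_embedding_obj

# 'fasttext.wiki.de' selects the German Wikipedia-trained vectors.
obj = build_fasttext_wiki_embedding_obj('fasttext.wiki.de')
# Per the description, the object carries the download URL and filename.
print(obj)
```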
build_fasttext_cc_embedding_obj
build_fasttext_cc_embedding_obj(embedding_type)
FastText pre-trained word vectors for 157 languages, with 300 dimensions, trained on Common Crawl and Wikipedia. Released in 2018, they succeeded the 2017 FastText Wikipedia embeddings. It's recommended to use the same tokenizer for your data that was used to construct the embeddings. This information and more can be found on their website: https://fasttext.cc/docs/en/crawl-vectors.html.
Args:
- embedding_type: A string in the format fasttext.cc.$LANG_CODE, e.g. fasttext.cc.de or fasttext.cc.es.
Returns:
Object with the URL and filename used later on for downloading the file.
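Usage mirrors the wiki variant above, differing only in the type prefix:

```python
from texcla.embeddings import build_fasttext_cc_embedding_obj

# Same pattern as the wiki variant, but for the Common Crawl vectors.
obj = build_fasttext_cc_embedding_obj('fasttext.cc.es')
print(obj)
```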
get_embedding_type
get_embedding_type(embedding_type)
Resolves an embedding type name to the object describing its download URL and filename (presumably the same kind of object returned by the build_*_embedding_obj helpers above).
get_embeddings_index
get_embeddings_index(embedding_type="glove.42B.300d", embedding_path=None, embedding_dims=None)
Retrieves embeddings index from embedding name or path. Will automatically download and cache as needed.
Args:
- embedding_type: The embedding type to load.
- embedding_path: Path to a local embedding to use instead of the embedding type. Ignores embedding_type if specified.
Returns:
The embeddings indexed by word.
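Putting the pieces together, a hedged sketch of loading an index and building the weight matrix (the toy word_index mapping stands in for one produced by your tokenizer; "glove.42B.300d" is the documented default type):

```python
from texcla.embeddings import build_embedding_weights, get_embeddings_index

# Downloads and caches the GloVe vectors on first use (documented default).
embeddings_index = get_embeddings_index(embedding_type='glove.42B.300d')

# Toy vocabulary for illustration; normally this comes from your tokenizer.
word_index = {'hello': 1, 'world': 2}

# Embedding matrix covering every word in the vocabulary.
weights = build_embedding_weights(word_index, embeddings_index)
```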