
Towards Ultradense Contextual Embeddings by Distribution-Based Orthogonal Transformation

Re-implementation of my bachelor thesis project (thesis defended in April 2022).

Project Description

Embeddings are widely used in natural language processing tasks. One concern, however, is that the embeddings produced by existing language models are dense and high-dimensional, which makes them difficult for people to interpret. In this work, we propose a distribution-based method to identify an ultradense subspace of a contextualized embedding space.

Demonstration

We use pairs of words to define two categories, e.g. female and male, and we extract BERT embeddings for these pairs with the help of a set of corpora.

(Demo 1: word pairs for the two categories and their extracted BERT embeddings)
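Below is a minimal sketch of this extraction step, assuming the Hugging Face `transformers` library; the model name, the subtoken-averaging strategy, and the example sentences are illustrative assumptions, not taken from this repository.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word_in_sentence(word: str, sentence: str) -> torch.Tensor:
    """Return the contextual embedding of `word`, averaged over its subtokens."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    # Locate the subtokens belonging to `word` (naive match on token ids).
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = inputs["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i : i + len(word_ids)] == word_ids:
            return hidden[i : i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

# One embedding per occurrence of each category word in the corpus:
e_woman = embed_word_in_sentence("woman", "The woman read the report.")
e_man = embed_word_in_sentence("man", "The man read the report.")
```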

The embeddings form a representation space for the two categories. We multiply them by an orthogonal matrix $Q$ and then take the first (or the first several) dimensions. We model each of these dimensions as a normal distribution and maximize the divergence between the two categories, measured by the Wasserstein distance. Optimizing $Q$ to maximize this distance concentrates the category (here: gender) information in the first dimension.

(Demo 2: rotating the embedding space with $Q$ so that the first dimension separates the two categories)
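A minimal PyTorch sketch of this optimization follows. It uses the closed-form 2-Wasserstein distance between two one-dimensional Gaussians, $W_2^2 = (\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2$, applied to the first dimension of the rotated embeddings, and keeps $Q$ orthogonal via PyTorch's orthogonal parametrization. The variable names, learning rate, and step count are illustrative assumptions, and only the first dimension (k = 1) is separated here.

```python
import torch
from torch import nn
from torch.nn.utils.parametrizations import orthogonal

dim = 768  # BERT hidden size

# The parametrization keeps the weight matrix Q orthogonal throughout training.
rotation = orthogonal(nn.Linear(dim, dim, bias=False))
opt = torch.optim.Adam(rotation.parameters(), lr=1e-3)

def gaussian_w2_sq(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Squared 2-Wasserstein distance between 1-D Gaussians fit to a and b."""
    return (a.mean() - b.mean()) ** 2 + (a.std() - b.std()) ** 2

# X_f, X_m: (n, dim) matrices of embeddings for the two categories.
X_f, X_m = torch.randn(100, dim), torch.randn(100, dim)  # stand-in data

for step in range(1000):
    opt.zero_grad()
    # First dimension of the rotated embeddings for each category.
    z_f = rotation(X_f)[:, 0]
    z_m = rotation(X_m)[:, 0]
    loss = -gaussian_w2_sq(z_f, z_m)  # negate to maximize the distance
    loss.backward()
    opt.step()

Q = rotation.weight.detach()  # the learned orthogonal matrix
```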

Results

For evaluation, we prepare a list of words, e.g. professions. As in the first demonstration, we extract their BERT embeddings with the help of the same corpora. In the original space, we can already compute the cosine similarities of each word to woman and to man. We then apply the transformation with the optimized orthogonal matrix $Q$ and take the complement space (all dimensions except the first) of the transformed embeddings. Finally, we compute the cosine similarities of each word to woman and to man in this complement space.
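A hedged sketch of this evaluation step: rotate the embeddings by $Q$, drop the first (category-carrying) dimension, and compare cosine similarities in the remaining dimensions. The names `Q`, `e_woman`, and `e_man` are assumed to come from the previous sketches, and `e_word` stands in for a profession-word embedding.

```python
import torch
import torch.nn.functional as F

def complement(e: torch.Tensor, Q: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Rotate e by Q and drop the first k (category-carrying) dimensions."""
    return (e @ Q.T)[k:]

def cos(a: torch.Tensor, b: torch.Tensor) -> float:
    return F.cosine_similarity(a, b, dim=0).item()

e_word = torch.randn(768)  # stand-in for e.g. a profession word

# Before: similarity gap in the original space.
gap_before = abs(cos(e_word, e_woman) - cos(e_word, e_man))
# After: similarity gap in the complement space.
gap_after = abs(
    cos(complement(e_word, Q), complement(e_woman, Q))
    - cos(complement(e_word, Q), complement(e_man, Q))
)
print(gap_before, gap_after)  # gap_after is expected to be smaller
```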

Here we present part of our results:

(Results: cosine similarities to woman and man before and after the transformation)

We find that the absolute differences between the similarities to woman and to man decrease after the transformation. This means the embeddings in the complement space carry less gender information than before.

About

This project interprets high-dimensional BERT embeddings via a distribution-based orthogonal transformation.
