
FACE-PLM

Further Assessing Current Endeavors in PLMs


License: MIT

Installation

Clone the repository

git clone git@github.com:keiserlab/face-plm.git

Navigate to the repository and create a conda virtual environment

cd face-plm

conda create -n face_plm python=3.10 -y

conda activate face_plm

pip install -e . 

pip install pytorch-lightning
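
To confirm the editable install worked, you can try importing the package (a quick sanity check; the face_plm module name matches the _target_ paths used in the configs below):

python -c "import face_plm, pytorch_lightning; print('face-plm environment OK')"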

Getting Started

To generate embeddings and train models, you first need to sign in to Hugging Face and Weights & Biases (WandB).

huggingface-cli login
wandb login
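
On a headless machine you can pass credentials directly instead of logging in interactively (HF_TOKEN and WANDB_API_KEY are placeholder environment variables holding your own tokens):

huggingface-cli login --token "$HF_TOKEN"
wandb login "$WANDB_API_KEY"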

Additionally, you need to edit the WandB config file at training_probe/config/wandb_config/base_wandb.yaml, updating entity and project so runs log to the desired WandB location.

_target_: face_plm.probes.utils.WandbRunConfig
run_name: base_run_name
entity: temp_entity  # CHANGEME
project: temp_project  # CHANGEME
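
For example, to point runs at a hypothetical my-team/face-plm-probes workspace, edit the file by hand or patch it in place (GNU sed shown; on macOS use sed -i ''):

sed -i \
  -e 's/^entity: .*/entity: my-team/' \
  -e 's/^project: .*/project: face-plm-probes/' \
  training_probe/config/wandb_config/base_wandb.yaml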

Generating the PLM Embeddings

Set up the embedding generation environment

bash scripts/setup_embed_env.sh

Generating final layer embeddings for all PLMs

With ESM (requires ESMC/ESM3 access)

bash scripts/get_all_plm_embedding.sh
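
If the ESM step fails with an authorization error, confirm your Hugging Face login before rerunning (assuming the ESM weights are gated on the Hub):

huggingface-cli whoami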

Without ESM

bash scripts/get_all_plm_embedding_no_esm.sh

Generating embeddings from all layers of Ankh-base

bash scripts/get_all_layer_ankh_embedding.sh

Training Probes

Training a single model (single probe type, single aggregation type, final layer)

bash scripts/train_single_model.sh CONFIG_NAME

Example config: esmc_600m-agg_mlp
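
Substituting the example config name gives the concrete invocation:

bash scripts/train_single_model.sh esmc_600m-agg_mlp

The same CONFIG_NAME substitution applies to the cross-validation and multilayer scripts below.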

Training multiple models for cross-validation (single probe type, single aggregation, final layer)

bash scripts/train_cross_val_model.sh CONFIG_NAME

Example config: esmc_600m-agg_mlp

Training models on multiple layers (single probe type, single aggregation, all layers)

bash scripts/train_cross_val_model_ankh_multilayer.sh CONFIG_NAME

Example config: ankh_base_layer_specific_0-12

Masked Language Model Fine-tuning

EC 2.7.* Dataset Fine-tuning

bash scripts/finetune_mlm.sh ankh_large_ft_ec27

ADK Dataset Fine-tuning

bash scripts/finetune_mlm.sh ankh_base_ft_kcat

Direct Regression Fine-tuning

bash scripts/train_cross_val_direct_finetune.sh CONFIG_NAME

Example config: ankh_base_ft_kcat

Linear and Non-linear Probing without Torch

bash no_torch_probing.sh OUTPUT_DIR

Example output directory: ./probe_outputs/
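
For example, creating the output directory first (in case the script does not create it itself):

mkdir -p ./probe_outputs/
bash no_torch_probing.sh ./probe_outputs/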
