The main repo can be found at https://github.com/PUSSYMIPT/bert-distillation.
- distributed training
- logging with TensorBoard, wandb, Neptune, Alchemy, and others
- fp16 training
- various losses and loss aggregation (see the sketch after this list)
- initialization with the teacher's layers
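The distillation loss here is typically a weighted mix of terms. A minimal sketch, not the repo's exact code (the function name, weights, and temperature are illustrative), of a DistilBERT-style combination of soft-target KL divergence and hard-target masked-LM cross-entropy:

```python
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha_kl=5.0, alpha_mlm=2.0):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: masked-LM cross-entropy (positions labeled -100 are ignored).
    mlm = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha_kl * kl + alpha_mlm * mlm
```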
I initialize the student with encoder layers [0, 2, 4, 7, 9, 11] of the teacher model, as shown in the sketch below.
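A minimal sketch of that initialization, assuming HuggingFace transformers' `BertModel` and the `DeepPavlov/rubert-base-cased` checkpoint for RuBERT (the repo's own code may differ):

```python
from transformers import BertConfig, BertModel

teacher = BertModel.from_pretrained("DeepPavlov/rubert-base-cased")

# Build a 6-layer student with the same vocab/hidden sizes as the teacher.
student_config = BertConfig.from_pretrained(
    "DeepPavlov/rubert-base-cased", num_hidden_layers=6
)
student = BertModel(student_config)

# Copy the embeddings and the selected teacher encoder layers.
layers_to_copy = [0, 2, 4, 7, 9, 11]
student.embeddings.load_state_dict(teacher.embeddings.state_dict())
for student_idx, teacher_idx in enumerate(layers_to_copy):
    student.encoder.layer[student_idx].load_state_dict(
        teacher.encoder.layer[teacher_idx].state_dict()
    )
```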
I ran my script for 100 hours on 4x1080Ti GPUs with the RuBERT model as the teacher. Logs can be found here. I distilled it on the Lenta Russian news dataset.
Then I ran a classification task on the Mokoron Twitter dataset.
Here are my results:
My models can be found here.
Also, probably soon, I will publish a post about this project on Medium (in the PyTorch blog). Here is a draft link. Thanks to Sergey Kolesnikov from the catalyst-team for the promotion.
Feel free to propose something new for this project.
bin
- bash files for running pipelines
configs
- just place configs here
docker
- project Docker files for pure reproducibility
presets
- datasets, notebooks, etc - all you don't need to push to git
requirements
- different project python requirements for docker, tests, CI, etc
scripts
- data preprocessing scripts, utils, everything like `python scripts/*.py`
serving
- microservices, etc - production
src
- model, experiment, etc - research
```bash
git clone https://github.com/PUSSYMIPT/bert-distillation.git
cd bert-distillation
pip install -r requirements/requirements.txt
bin/download_lenta.sh
python scripts/split_dataset.py --small
catalyst-dl run -C configs/config_ru_ranger.yml --verbose --distributed
```
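When training finishes, the student can be loaded back for downstream tasks. A hedged sketch, assuming catalyst's default checkpoint layout (a `model_state_dict` key), a hypothetical `logs/` logdir, and a plain 6-layer `BertModel` student:

```python
import torch
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained(
    "DeepPavlov/rubert-base-cased", num_hidden_layers=6
)
student = BertModel(config)

# The path and state-dict key follow catalyst conventions; adjust to your logdir.
checkpoint = torch.load("logs/checkpoints/best.pth", map_location="cpu")
student.load_state_dict(checkpoint["model_state_dict"])
student.eval()
```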
It will take a lot of time. "Let's go get some drinks."