Data and Model Versioning

Git must be installed and initialized in the local repository for data versioning with DVC to work. DVC meta files (*.dvc) act as pointers to the data and model files. The meta files are committed and pushed with Git, while the data and model files themselves are pushed with DVC.
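To illustrate the pointer idea, the sketch below builds a simplified record modeled on a .dvc meta file for a single file: the file's MD5 hash, its size in bytes, and its path. It is an assumption-laden illustration, not DVC's own code: the real meta file is YAML written by `dvc add`, its exact fields and hashing details vary between DVC versions, and the function names here are hypothetical. Git only tracks this small text record, while the content it identifies is stored by DVC.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pointer_for(path: Path) -> dict:
    """Hypothetical helper: a simplified pointer record modeled on a .dvc file."""
    return {
        "outs": [
            {"md5": md5_of_file(path), "size": path.stat().st_size, "path": path.name}
        ]
    }


def render(record: dict) -> str:
    """Render the record as YAML-like text similar to a single-file .dvc meta file."""
    out = record["outs"][0]
    return f"outs:\n- md5: {out['md5']}\n  size: {out['size']}\n  path: {out['path']}\n"


if __name__ == "__main__":
    sample = Path("sample.csv")  # hypothetical file name
    sample.write_bytes(b"hello")
    print(render(pointer_for(sample)))
```

If the file content changes, its hash changes, so a new version of the meta file is committed to Git while DVC stores the new content.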

The following DVC commands add the data files in the ./data folder and push them to the DVC remote on the Q: network drive for the testapp application. DS1 stands for a folder that Data Scientist 1 has access to:

C:\new\ibi\apps\testapp>dvc init
C:\new\ibi\apps\testapp>dvc remote add test_app_remote "Q:\UITS\UITS All Staff\DS1\testapp"
C:\new\ibi\apps\testapp>dvc config core.remote test_app_remote
C:\new\ibi\apps\testapp>dvc add data
C:\new\ibi\apps\testapp>dvc push
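The commands above cover the DVC side only; the meta files still have to be committed and pushed with Git. The sketch below lists the Git commands that usually follow, assuming Git is already initialized in C:\new\ibi\apps\testapp with a remote and upstream branch configured. `dvc add data` writes the pointer file data.dvc and adds /data to .gitignore, and `dvc remote add` / `dvc config` store the remote in .dvc/config. The helper names and commit message are hypothetical, and by default the script only prints the commands.

```python
import subprocess
from typing import List, Sequence


def git_steps(data_dir: str = "data", message: str = "Track data with DVC") -> List[List[str]]:
    """Hypothetical helper: Git commands that record the DVC meta files."""
    return [
        # data.dvc is the pointer written by "dvc add data"; .gitignore now lists /data;
        # the .dvc folder holds config (including the remote) and its own .gitignore.
        ["git", "add", f"{data_dir}.dvc", ".gitignore", ".dvc"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def display(cmd: Sequence[str]) -> str:
    """Render a command for printing, double-quoting arguments that contain spaces."""
    return " ".join(f'"{arg}"' if " " in arg else arg for arg in cmd)


def run(steps: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    shown = []
    for cmd in steps:
        line = display(cmd)
        print(line)
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(list(cmd), check=True)
        shown.append(line)
    return shown


if __name__ == "__main__":
    run(git_steps())  # pass dry_run=False to actually run the commands
```

Keeping the default as a dry run makes it easy to review the exact commands before anything is committed or pushed.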

Typical files pushed with DVC include:

  • Python models: *.p, *.pkl, *.h5
  • R models: *.rds
  • RStat models: *.c
  • Data: *.csv, *.xls, *.xlsx
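Before running `dvc add`, it can help to see which files in a project fall into the categories above. The sketch below walks a folder and groups files by the listed extensions, skipping the .git and .dvc folders. The function and dictionary names are hypothetical, and the extension list only mirrors the examples above rather than being a complete set.

```python
from pathlib import Path
from typing import Dict, List

# Extensions from the list above, grouped by kind (RStat models are *.c files).
DVC_EXTENSIONS: Dict[str, List[str]] = {
    "Python model": [".p", ".pkl", ".h5"],
    "R model": [".rds"],
    "RStat model": [".c"],
    "Data": [".csv", ".xls", ".xlsx"],
}

SKIP_DIRS = {".git", ".dvc"}


def files_for_dvc(root: Path) -> Dict[str, List[str]]:
    """Group files under root by kind, for extensions that are typically pushed with DVC."""
    lookup = {ext: kind for kind, exts in DVC_EXTENSIONS.items() for ext in exts}
    found: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Ignore Git internals and the DVC cache/config folder.
        if not path.is_file() or any(part in SKIP_DIRS for part in rel.parts):
            continue
        kind = lookup.get(path.suffix.lower())  # case-insensitive match
        if kind is not None:
            found.setdefault(kind, []).append(rel.as_posix())
    return found


if __name__ == "__main__":
    for kind, paths in files_for_dvc(Path(r"C:\new\ibi\apps\testapp")).items():
        print(kind, paths)
```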