
Introduction to Data Version Control


PYTHON | DATA | PROGRAMMING

A step-by-step guide to implementing your own DVC in Python using Hangar

Photo by Florian Olivo on Unsplash

Any production-level system requires some type of versioning.

A single source of current truth.

Any resource that is constantly updated, especially concurrently by multiple users, requires some type of audit trail to keep track of all changes.

In software engineering, the answer to that is Git.

If you have ever written code in your life, then you are probably acquainted with the wonder that is Git.

Git allows us to commit changes, create different branches from a source, and merge our branches back into the original, to name a few.

Data version control (DVC) is the same paradigm, but for datasets. See, live data systems are constantly ingesting newer data points while different users perform different experiments on the same datasets.

This results in multiple versions of the same dataset, which is definitely not a single source of truth.

Moreover, in a machine learning environment, we might even have several versions of the same 'model' trained on different versions of the same dataset (for instance, model re-training to incorporate newer data points).

If not properly audited and versioned, this may create a tangled web of datasets and experiments. We definitely don’t want that!

DVC is, therefore, a system for tracking our datasets by registering the changes made to a specific dataset. There are multiple DVC solutions, both free and paid.
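To make the idea of "registering changes" concrete, here is a minimal sketch of a dataset store with commits and branches, written with the Python standard library only. It is a toy illustration, not Hangar's API: the class `TinyDataStore` and its methods are hypothetical names invented for this example, and it stores whole snapshots in memory rather than doing anything efficient.

```python
import hashlib
import json


class TinyDataStore:
    """Toy in-memory dataset store (hypothetical; NOT Hangar's API)."""

    def __init__(self):
        self.commits = {}               # commit id -> commit record
        self.branches = {"main": None}  # branch name -> head commit id

    def commit(self, branch, rows, message):
        """Snapshot `rows` on `branch` and return the new commit id."""
        payload = json.dumps(rows, sort_keys=True)
        parent = self.branches[branch]
        # The id depends on the parent, the message and the data itself.
        digest = hashlib.sha1(
            f"{parent}|{message}|{payload}".encode()
        ).hexdigest()
        self.commits[digest] = {
            "parent": parent,
            "message": message,
            "rows": json.loads(payload),  # store a copy of the data
        }
        self.branches[branch] = digest
        return digest

    def branch(self, name, source="main"):
        """Create a branch pointing at the current head of `source`."""
        self.branches[name] = self.branches[source]

    def checkout(self, ref):
        """Return the rows for a branch name or a commit id."""
        commit_id = self.branches.get(ref, ref)
        return self.commits[commit_id]["rows"]

    def log(self, branch):
        """Return commit messages on `branch`, newest first."""
        history = []
        commit_id = self.branches[branch]
        while commit_id is not None:
            history.append(self.commits[commit_id]["message"])
            commit_id = self.commits[commit_id]["parent"]
        return history


store = TinyDataStore()
v1 = store.commit("main", [{"x": 1, "y": 0}], "initial data")
store.branch("experiment")
store.commit(
    "experiment",
    [{"x": 1, "y": 0}, {"x": 2, "y": 1}],
    "add new data point",
)
```

After running this, `store.checkout("main")` still returns the original single row, while `store.checkout("experiment")` returns both rows, and `store.log("experiment")` lists the full audit trail. A real DVC tool adds what this sketch leaves out: persistent storage, deduplication, merging, and conflict handling.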

I recently discovered Hangar, a completely open-source Python DVC package. Let's have a look at what it can do, shall we?

The hangar package is a pure Python implementation and is available through pip.

Its core functionality is also modeled closely on Git, which greatly helps with the learning curve.
