Last week we spoke with Micha from Maersk, who taught us about DataOps and how to think about the space as a whole.
DataOps is software engineering
Throughout his talk, Micha illuminated the parallels between DataOps as a practice and software engineering, and showed how treating the former as the latter will accelerate your development and lead to more trustworthy data:
Test your data pipelines like you test your code - your pipelines should be tested quickly, easily, and often.
Local > dev - your local environment is your first line of defense. You should be able to run your code, pipelines, and tests there before pushing to dev (or prod!).
To unit test, or not to unit test - When should unit tests be used? When should they not? When fundamental components of pipeline code are reused frequently and changed infrequently, they should be unit tested. But with small components of a larger pipeline that are constantly changing due to new data and requirements, you may be better off testing the pipeline as a whole, rather than writing and maintaining unit tests for each component.
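To make the unit-versus-pipeline distinction concrete, here is a minimal Python sketch under stated assumptions. The pipeline, its field names (`shipment_id`, `amount`), and the helpers `normalize_currency` and `run_pipeline` are hypothetical illustrations, not code from Micha's talk. The stable, frequently reused parsing helper gets its own unit test, while the rest of the pipeline is covered by a single end-to-end test on a small in-memory fixture that runs in well under a second on a local machine, before anything is pushed to dev.

```python
from typing import Dict, Iterable


def normalize_currency(amount: str) -> float:
    """Hypothetical reusable component: parse a string like '1,234.50' into 1234.5.

    Reused across many pipelines and rarely changed, so it earns a unit test.
    """
    cleaned = amount.replace(",", "").strip()
    return float(cleaned)  # raises ValueError on malformed input


def run_pipeline(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Hypothetical pipeline: parse amounts, drop bad rows, total per shipment.

    The filtering rules here change often with new data and requirements,
    so they are tested through the pipeline as a whole.
    """
    totals: Dict[str, float] = {}
    for row in rows:
        try:
            value = normalize_currency(row["amount"])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or unparseable amount
        shipment = row.get("shipment_id")
        if not shipment:
            continue  # skip rows that cannot be attributed to a shipment
        totals[shipment] = totals.get(shipment, 0.0) + value
    return totals


def test_normalize_currency() -> None:
    # Unit test for the stable, frequently reused component.
    assert normalize_currency("1,234.50") == 1234.5
    assert normalize_currency(" 7 ") == 7.0


def test_pipeline_end_to_end() -> None:
    # One end-to-end test on a tiny local fixture instead of a unit test per step.
    fixture = [
        {"shipment_id": "A", "amount": "10.00"},
        {"shipment_id": "A", "amount": "2.50"},
        {"shipment_id": "B", "amount": "not a number"},  # dropped: bad amount
        {"amount": "5"},  # dropped: no shipment_id
    ]
    assert run_pipeline(fixture) == {"A": 12.5}


if __name__ == "__main__":
    test_normalize_currency()
    test_pipeline_end_to_end()
    print("all tests passed")
```

Run the file directly with `python`, or name it `test_*.py` and let a runner such as pytest discover the `test_` functions. If the filtering rules in `run_pipeline` change with new requirements, only the end-to-end fixture needs updating; the unit test on the parsing helper stays put.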