kjamistan: Data Security, Validation & Testing

Interested in securing your data science code or pipeline? Want to test your models' robustness against malicious attacks or noise? Want to see how much information about your training data your model leaks, and how to make your data processing pipeline more secure? We can help!

We've developed several libraries for adding noise, duplication and distortion to datasets. These tools, alongside data testing best practices (including standard unit testing, property-based testing and integration testing), can help harden your code and prepare your project for release.
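The libraries mentioned above are not shown here. As a rough illustration of the idea only, the sketch below uses Python's standard library with hypothetical helpers (`add_noise`, `duplicate`, `distort`, `clean`) to generate messy variants of a small numeric dataset. It then checks, in a property-based style, that a toy cleaning step always removes missing values and exact duplicates. A dedicated property-based testing library such as Hypothesis would generate and shrink inputs more thoroughly; this is a minimal hand-rolled version under those assumptions.

```python
import random


def add_noise(rows, scale, rng):
    """Return a copy of numeric rows with Gaussian noise added to every value."""
    return [[v + rng.gauss(0.0, scale) for v in row] for row in rows]


def duplicate(rows, fraction, rng):
    """Append exact copies of randomly chosen rows (about `fraction` of the input size)."""
    k = int(len(rows) * fraction)
    return rows + [list(rng.choice(rows)) for _ in range(k)]


def distort(rows, fraction, rng):
    """Replace a random subset of values with None to simulate missing or corrupt fields."""
    out = [list(row) for row in rows]
    for row in out:
        for i in range(len(row)):
            if rng.random() < fraction:
                row[i] = None
    return out


def clean(rows):
    """Hypothetical pipeline step under test: drop rows with missing values, then deduplicate."""
    seen = set()
    result = []
    for row in rows:
        if any(v is None for v in row):
            continue
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def test_clean_properties(trials=200, seed=0):
    """Run `clean` on many randomly corrupted datasets and assert invariants that must always hold."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 20)
        rows = [[rng.uniform(-10, 10) for _ in range(3)] for _ in range(n)]
        messy = distort(duplicate(add_noise(rows, 0.1, rng), 0.3, rng), 0.05, rng)
        cleaned = clean(messy)
        # Property 1: no missing values survive cleaning.
        assert all(v is not None for row in cleaned for v in row)
        # Property 2: no exact duplicates remain.
        assert len({tuple(r) for r in cleaned}) == len(cleaned)
        # Property 3: cleaning never adds rows.
        assert len(cleaned) <= len(messy)
    return True
```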

Why Test?

Testing machine learning models against adversarial inputs, feature and model extraction, and privacy attacks is a useful way to determine how secure your model is and how to design the API that others will use to access it. We offer black-box testing of models, similar to penetration testing for software.
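As a simple illustration of black-box testing, the sketch below treats a model as an opaque `predict` function. It (1) fuzzes the model with random and extreme inputs, collecting any that crash it or return an invalid probability, and (2) measures how often the predicted label flips under small random noise. `toy_model`, `fuzz_model` and `flip_rate` are hypothetical names for illustration. Random noise is a much weaker test than a true adversarial attack, which searches for worst-case perturbations, but it is a cheap first check.

```python
import math
import random
from typing import Callable, Sequence

Predict = Callable[[Sequence[float]], float]


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a deployed model: logistic regression with fixed weights."""
    weights = [0.8, -1.2, 0.5]
    z = sum(w * v for w, v in zip(weights, x))
    # Numerically stable sigmoid: only ever exponentiate a non-positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuzz_model(predict: Predict, n_features: int, trials: int = 1000, seed: int = 0) -> list:
    """Feed random and extreme inputs; collect any that crash or yield a score outside [0, 1]."""
    rng = random.Random(seed)
    extremes = [0.0, -0.0, 1e-300, -1e-300, 1e300, -1e300]
    failures = []
    for _ in range(trials):
        x = [rng.choice(extremes) if rng.random() < 0.3 else rng.uniform(-1e6, 1e6)
             for _ in range(n_features)]
        try:
            y = predict(x)
        except Exception as exc:  # any crash is a finding worth reporting
            failures.append((x, repr(exc)))
            continue
        # NaN fails both comparisons, so it is caught here too.
        if not (isinstance(y, float) and 0.0 <= y <= 1.0):
            failures.append((x, y))
    return failures


def flip_rate(predict: Predict, inputs: list, epsilon: float,
              seed: int = 0, threshold: float = 0.5) -> float:
    """Fraction of inputs whose predicted label changes under uniform noise in [-epsilon, epsilon]."""
    rng = random.Random(seed)
    flips = 0
    for x in inputs:
        noisy = [v + rng.uniform(-epsilon, epsilon) for v in x]
        if (predict(x) >= threshold) != (predict(noisy) >= threshold):
            flips += 1
    return flips / len(inputs)
```

For example, a naive sigmoid written as `1 / (1 + math.exp(-z))` raises `OverflowError` for large negative `z`, which this kind of fuzzing surfaces quickly.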

For data processing pipelines, data validation and testing are essential, especially as enterprises release more data science code and models to the public internet or to their customers. Many data teams don't apply standard software engineering best practices, which leaves your product, customers and data open to attack.
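A minimal sketch of record-level data validation, assuming records arrive as Python dicts and using a hypothetical `SCHEMA` (field name mapped to accepted types and a numeric range). Rejected rows are kept together with their error messages so they can be logged or reviewed rather than silently dropped. Real projects often use a dedicated validation library instead; the function names here are illustrative only.

```python
from typing import Any

# Hypothetical schema for illustration: field -> (accepted types, minimum, maximum).
# None means "no bound".
SCHEMA: dict[str, tuple[tuple[type, ...], Any, Any]] = {
    "age": ((int,), 0, 120),
    "income": ((int, float), 0, None),
    "country": ((str,), None, None),
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return human-readable validation errors; an empty list means the record is valid."""
    errors: list[str] = []
    for field, (types, lo, hi) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            expected = "/".join(t.__name__ for t in types)
            errors.append(f"{field}: expected {expected}, got {type(value).__name__}")
            continue
        if lo is not None and value < lo:
            errors.append(f"{field}: {value} below minimum {lo}")
        if hi is not None and value > hi:
            errors.append(f"{field}: {value} above maximum {hi}")
    # Flag fields the schema doesn't know about (e.g. upstream changes or leaked columns).
    for field in sorted(set(record) - set(SCHEMA)):
        errors.append(f"{field}: unexpected field")
    return errors


def split_valid(records):
    """Partition records into (valid, rejected); rejected entries keep their error list."""
    valid, rejected = [], []
    for record in records:
        errors = validate_record(record)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    return valid, rejected
```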

Testing your code gives data scientists, developers and product owners peace of mind. When testing is a normal part of development and release, there is less chance of shipping a bug, or a larger issue that disrupts business or drives away customers. We help you implement tests so you can have happy employees and worry-free launches.

Our Testing & Validation Services

We offer several data validation and testing services that may benefit your team, including:

  • Pen-testing for Machine Learning Models
  • Privacy, De-anonymization and Re-identification Attacks
  • Test Framework and Methodology Design (Documentation and Advising)
  • Fuzz Testing for Models and Data Science APIs
  • Developing Data Science Unit Tests
  • Consulting on Data Validation Best Practices
  • Stress and Fuzz Testing for Data Science Pipelines and Machine Learning Models
  • Outlining a Data Validation Plan

We also offer training and workshops for your team on data testing best practices. If you have a request that doesn't fit neatly into these categories, please reach out; we're happy to discuss options.