Architecting Synthetic Data Pipelines for Enterprise AI

Hardware sensors fail manufacturers in edge cases, but AI systems can’t step in without sufficient training data. However, those same edge cases make it hard to build effective training sets. This is where synthetic data can help.

I was the Lead Designer on the team incubating Synthetic Data at Microsoft.

Impacts:

  • Co-designed a high-quality synthetic data pipeline that was used by 5 B2B customers and drove millions of dollars in contracts

  • Created tooling used to augment real-world datasets in defect and equipment-status scenarios

  • Reduced training time and increased model reliability in factory-floor computer vision applications

Responsibilities:

  • Collaborated closely with Product Managers and customers to define the core problem and establish a focused direction for the project

  • Worked closely with engineers to map out a robust and efficient pipeline

  • Collaborated with researchers to define detailed personas with clear requirements

  • Mapped the ideal user journey to ensure a seamless and intuitive user experience and to identify potential pain points ahead of testing

  • Tested multiple prototypes to iterate and refine the product

  • Managed UI designers through iterative loops

  • Identified and clarified misalignments within the team and worked to resolve them

  • Worked closely with the launch team to ensure a seamless handoff from development to launch