Hacker News | icoe's comments

Tonic.ai acquired Fabricate (built by the founder of Mockaroo) to augment its synthetic data platform. Fabricate takes an LLM-first approach to synthetic data, letting users create realistic, relational data from schemas, SQL, or natural language. Tonic is a synthetic data company focused on supplying data for SDLC workflows and model training.


Thanks. When we talk to customers who are getting started with generative AI, we usually hear the two biggest concerns are how to avoid embarrassing data leaks and how to move quickly. We sincerely hope this makes a dent in both for everyone.


Not to be glib, but this is why we built Tonic Textual (www.tonic.ai/textual). It’s both very challenging and very important to protect data in training workflows. We designed Textual to make it easy to both redact sensitive data and replace it with contextually relevant synthetic data.


To add on to this: I think it should be mentioned that Slack says they'll prevent data leakage across workspaces in their model, but they don't explain how. They don't seem to go into any detail about their data safeguards or how they're excluding sensitive info from training. Textual is good for this purpose since it redacts PII, which prevents the trained model from leaking it.

Disclaimer: I work at Tonic
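
For readers unfamiliar with the distinction between redacting PII and replacing it with synthetic values, here is a minimal sketch. It uses two toy regexes as a stand-in for NER-based detection. It is not Tonic Textual's API, and the names `PATTERNS`, `redact`, and `synthesize` are hypothetical, chosen only for illustration.

```python
import random
import re

# Illustrative patterns only (an assumption for this sketch); production
# detectors typically rely on NER models rather than regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text):
    """Replace each detected value with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def synthesize(text, seed=0):
    """Replace each detected value with a fake value of the same shape."""
    rng = random.Random(seed)  # seeded so the output is reproducible

    def fake_email(_match):
        return f"user{rng.randint(1000, 9999)}@example.com"

    def fake_phone(_match):
        return f"555-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}"

    text = PATTERNS["EMAIL"].sub(fake_email, text)
    text = PATTERNS["PHONE"].sub(fake_phone, text)
    return text


sample = "Contact jane.doe@acme.io or call 415-555-0199 today"
print(redact(sample))
print(synthesize(sample))
```

Redaction removes the value entirely, while synthesis keeps the text looking like real data, which tends to matter when the output is used for training or testing.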


How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "11 spices - mix with 2 cups of white flour ... 2/3 teaspoons of salt, 1/2 teaspoons of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 70 years


Fair question, but you have to consider the realistic alternatives. For most of our customers, inaction isn't an option. The combination of NER models + synthesis LLMs actually handles these types of cases fairly well. I put your comment into our web app and this was the output:

How do you handle proprietary data being leaked? Sure you can easily detect and redact names and phone numbers and addresses, but without significant context it seems difficult to detect whether "17 spices - mix with 2lbs of white flour ... half teaspoon of salt, 1 tablespoon of thyme [...]" is just a normal public recipe or a trade secret kept closely guarded for 75 years.


Tonic AI | www.tonic.ai | Full-Stack/Front End Developers

We're seeking talented developers capable of building out features from top to bottom for our data synthesis platform. At Tonic, you'll have the opportunity to help build out a product that is core infrastructure for organizations to use, analyze, and build upon their data while exploring new privacy and data techniques such as differential privacy, synthesizing new data, and creating small and effective subsets of production datasets. In addition, as an early member of our engineering team, you'll have many opportunities to forge our best practices, make technical decisions, and create areas of ownership for yourself.

https://apply.workable.com/tonic/j/B45A964FE3/


Yeah they are definitely in a unique position to pull that off. Anyone else doing that well? BTW, I'm one of the authors of the post, so happy to answer any questions.


I'm one of the creators of the tool. Happy to answer questions about it. Cheers!


Any chance it could be configured to black out the PII instead of randomizing it? What forms of PII does it detect?


Makes a lot of sense. I actually think leveraging your prod data to create a test environment is one of the best approaches, as long as you're mindful of privacy. Full disclosure: I'm a founder of tonic.ai and we make tools to make it easier to create synthetic staging instances from production environments.


Apologies. We're using Wix right now. We'll be moving off shortly.


Ah, yeah, that's normal for Wix :| It works for making sites, but it never works well.


I'm on a PC and the site was also behaving strangely for me. The site doesn't display a scroll bar, so I could not scroll down and read the article. It worked using a different browser.


Sadly it's much more common than you might imagine.


It would be decided in a civil court, most likely.

