DataCebo launches enterprise version of popular open source synthetic data library
Long before most of us were thinking about large language models, DataCebo co-founders Kalyan Veeramachaneni and Neha Patki were creating an open source library called Synthetic Data Vault, or SDV for short. The company’s roots go back to 2016 when both were working in the MIT Data to AI Lab. They had a notion that beyond generating text, images and code, you could also create data with generative AI.
For companies that need to use quality business data in large language models (and for other purposes) but can’t necessarily use personally identifiable information (PII) to do it, this is an intriguing idea. Today, the company emerged with $8.5 million in seed funding after spending a couple of years building a commercial enterprise version of SDV.
Veeramachaneni says that companies have traditionally had to create synthetic data manually, a highly tedious process that’s difficult to scale and prone to error. By putting generative AI to work on the problem, you can simply describe the kind of data you need, the software looks at the characteristics of the actual dataset, and then creates a quality fake set for testing purposes without exposing any sensitive information.
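To make the idea concrete, here is a toy sketch of the general "learn the characteristics, then sample" pattern. It is not SDV's algorithm or API: the function names `fit_column_profiles` and `sample_synthetic_rows` and the sample rows are hypothetical, it uses only the Python standard library, and it assumes every column can be modeled independently, which a real tool would not.

```python
import random
import statistics
from collections import Counter


def fit_column_profiles(rows):
    """Learn simple per-column statistics from a list of dict rows.

    Hypothetical helper for illustration only: numeric columns are
    summarized by mean and standard deviation, all other columns by
    how often each value occurs.
    """
    profiles = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            profiles[column] = (
                "numeric",
                statistics.mean(values),
                statistics.pstdev(values),
            )
        else:
            counts = Counter(values)
            profiles[column] = (
                "categorical",
                list(counts.keys()),
                list(counts.values()),
            )
    return profiles


def sample_synthetic_rows(profiles, n, seed=0):
    """Draw n fake rows from the learned profiles, one column at a time."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    synthetic = []
    for _ in range(n):
        row = {}
        for column, profile in profiles.items():
            if profile[0] == "numeric":
                _, mean, stdev = profile
                row[column] = rng.gauss(mean, stdev)
            else:
                _, categories, weights = profile
                row[column] = rng.choices(categories, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic


# Made-up "real" rows standing in for an actual business table.
real_rows = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 28, "plan": "basic"},
    {"age": 45, "plan": "basic"},
]

profiles = fit_column_profiles(real_rows)
synthetic_rows = sample_synthetic_rows(profiles, n=5)
for row in synthetic_rows:
    print(row)
```

Because each column is sampled on its own, this sketch loses any relationship between columns, and it reuses categorical values verbatim, so it offers no privacy guarantee by itself. Those are the kinds of gaps a purpose-built library is meant to close.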
The founders began by creating open source tooling, which proved extremely popular and helped them test the various core pieces of the software. ‘We’ve had over a million downloads and a lot of people who are active in our community,’ VP of product Patki said. In fact, they have a Slack channel with over 1,000 people participating.
‘And through that, I think first we get a lot of validation of our core algorithms. We have the confidence that it works, and if there’s a bug or anything our public open source users find them immediately and we’re able to address any issues,’ she said.
The big difference between the open source version and the commercial enterprise one is scale. The enterprise version can handle up to 100 tables, while the open source is designed to handle just a few tables. So far, customers have been building models based on upwards of 20 to 30 tables.
The company currently has 11 employees and plans to hire in the next year to get up to around 20, depending on how the business grows.
The startup’s $8.5 million in seed funding was led by Link Ventures and Zetta Venture Partners, with participation from Uncorrelated Ventures.
Ref: techcrunch