Gretel keeps the data track hidden

(Yurchanka-Siarhei / Shutterstock)

When Alex Watson co-founded the security company Harvest.ai in 2014, using machine learning to identify sensitive data in order to protect it seemed like a good idea - so good, in fact, that AWS bought the company. Fast forward eight years, and Watson’s latest startup, Gretel.ai, is also using machine learning to stop data security threats, but in a completely different way.

AWS had been trying to launch a security service for years without much luck, so in 2017 it bought San Diego-based Harvest.ai for $20 million instead. Watson went on to run Amazon Macie, a security service built on Harvest.ai’s AI and machine learning technology that helps customers discover, classify and protect their sensitive data in AWS.

It was a great experience for Watson, who started his career in the military before working for private security software companies. “We went from zero to one of the top 25 revenue-generating services for AWS while I was there,” he tells Datanami. “We worked with very large companies trying to solve data problems on a large scale.”

Those problems centered on personally identifiable information (PII). While the data was incredibly useful for analysis, allowing people to access it carried significant security risks. Building systems that let analysts query this sensitive data while still meeting security and compliance requirements proved to be a monumental task for almost any company that was not a technology giant.

Internally, AWS had the luxury of a 500-person compliance team, which helped prepare the data and remove PII when analysts accessed sensitive data from the repository. That resource gave AWS an incredible advantage, says Watson, but it was not an advantage its customers shared.

“Even talking to our most sophisticated customers, such as the cloud-native Airbnbs of the world - even those incredible companies are struggling with how to allow access to data inside their own walls,” says Watson. “How do you allow teams to query that sensitive customer data?”

Synthetic data

That conundrum was the genesis of Gretel.ai, which Watson co-founded with Ali Golshan, John Myers and Laszlo Bock in 2019. If the need to protect PII causes so much pain, then why not simply replace the real data that includes PII with synthetic data that doesn’t?

Watson explains:

“If the idea is that you have to wait between three weeks and six months to get access to internal data to test an idea - what if that was a false choice?” he says. “What if you could have access to an artificial version of this really sensitive data, which didn’t point back to real customers, but gave you 95% of the utility? What if you could access it in 5 minutes?”

Gretel is not the first company to recognize the value of synthetic data in the big data era. But its approach may be among the most comprehensive on the market. In addition to NLP technology that identifies PII for customers, the Gretel toolkit also includes the ability to generate synthetic data, as well as privacy techniques that transform sensitive data so that it is less sensitive.
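To make the identification step concrete, here is a minimal sketch of finding and masking PII in free text. It is not Gretel's method: the article describes NLP-based detection, while this stand-in uses a few hand-written regular expressions. The pattern set, the `redact_pii` function, and the sample record are all hypothetical.

```python
import re

# Illustrative patterns only (assumptions, not Gretel's detectors). Real PII
# detection covers far more entity types and formats than these three.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched span with a placeholder naming its type."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Hypothetical record; the address and number are made up.
record = "Contact Jane at jane.doe@example.com or 619-555-0123."
print(redact_pii(record))
```

Note that the name "Jane" passes through untouched. Free-form entities such as names are where pattern matching runs out and a trained NLP model becomes necessary.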

Gretel’s cloud-based synthetic data generation technology uses deep learning techniques to train a model on real customer data. The source data can be text, tabular data or time series, and the company is currently working on image data.

Once the AI model has been trained, it can be used to generate synthetic versions of real data that do not contain any of the sensitive data that makes real data so risky to use. Users can then use synthetic data for analytical purposes or to train machine learning systems, says Watson.
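The train-then-generate workflow can be illustrated with a deliberately simple stand-in. Gretel uses deep learning models. The sketch below instead fits a two-column Gaussian to hypothetical "real" records and samples new rows from it, which is enough to show the idea: the synthetic rows preserve the statistical relationship between columns without copying any real record. All names and data here are assumptions made for illustration.

```python
import math
import random


def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of two columns."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in rows) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in rows) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / n
    return mean_x, mean_y, std_x, std_y, cov / (std_x * std_y)


def sample_synthetic(params, count, rng):
    """Draw new rows from the fitted model (not from the real rows)."""
    mean_x, mean_y, std_x, std_y, rho = params
    rows = []
    for _ in range(count):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + std_x * z1
        # Mix z1 into y so the sampled columns share correlation rho.
        y = mean_y + std_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows


rng = random.Random(42)
# Hypothetical "real" records: (age, annual spend), with spend tied to age.
real = []
for _ in range(2000):
    age = rng.uniform(20, 70)
    real.append((age, 100 * age + rng.gauss(0, 800)))

params = fit_gaussian(real)
synthetic = sample_synthetic(params, 2000, rng)
print("real correlation:     ", round(params[4], 3))
print("synthetic correlation:", round(fit_gaussian(synthetic)[4], 3))
```

A Gaussian captures only linear structure, which is one reason production systems reach for neural networks instead.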

“We went out there and said, can we help break up data silos and enable data sharing across companies?” says Watson. “Instead of building walls around sensitive data, build another version where it doesn’t matter if it gets breached. It is not controlled by compliance, because it does not point back to real people. That is our goal.”

Illuminating synthetic data

Illumina correlation matrices show a small difference between real-world data and synthetic data (Image source: Illumina and Gretel)
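A comparison like the one in the figure can be sketched in a few lines: compute the pairwise correlation matrix for the real columns and for the synthetic columns, then report the largest absolute gap between them. This is a rough illustration rather than the method used in the case study. The column names and values are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Map every (column, column) pair to its correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


def max_abs_difference(real_cols, synth_cols):
    """Largest absolute gap between the two correlation matrices."""
    real_m = correlation_matrix(real_cols)
    synth_m = correlation_matrix(synth_cols)
    return max(abs(real_m[key] - synth_m[key]) for key in real_m)


# Hypothetical toy columns, not data from the Illumina study.
real = {
    "weight": [20.1, 22.4, 25.0, 27.3, 30.2],
    "length": [8.0, 8.6, 9.1, 9.9, 10.4],
}
synthetic = {
    "weight": [19.8, 23.0, 24.6, 27.9, 29.7],
    "length": [7.9, 8.8, 9.0, 10.0, 10.3],
}
print(round(max_abs_difference(real, synthetic), 4))
```

A value near zero means the synthetic data preserved the pairwise relationships. It says nothing about higher-order structure, which is why the case study also looked at downstream GWAS results.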

Gretel is gaining traction in the healthcare and financial services industries, which have some of the strictest data regulations of any industry. In financial services, Gretel often helps companies identify fraud using synthetic data approaches that minimize data risk.

But some of the biggest benefits may come in the medical field. One company giving Gretel a try is Illumina, a genetic sequencing company that, like Gretel, is based in San Diego.

According to a case study written by Illumina and Gretel, Illumina set up a test to see how Gretel’s technique worked on a medium-sized mouse DNA dataset containing more than 92,000 single-nucleotide polymorphisms (SNPs) across 1,200 mice, along with 68 phenotype descriptions.

Illumina performed two tests, which found that the genome-wide association study (GWAS) run on the generated synthetic data closely matched the GWAS run on the actual data. It used a synthetic quality score (SQS) that compared the accuracy of the synthetic genotype and phenotype data against the real-world training sets. It also performed a principal component analysis (PCA), a statistical technique, on the GWAS results, and found that the results “seem quite promising.”

Some problems arose when it used a Manhattan plot to take a closer look at what was happening on specific chromosomes. “The synthetic model managed to capture and recapitulate the strong associations on chromosomes 11 and 5,” the case study says. “However, the synthetic model introduced notable false positive GWAS associations on chromosomes 8, 10, and 12.”
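The bookkeeping behind that comparison can be sketched as follows. A Manhattan plot puts each SNP's -log10(p) on the y-axis, so an association counts as a hit when it clears a significance line. Comparing the hit sets from the real and synthetic runs then separates recovered associations from false positives. The SNP identifiers, p-values, and the 1e-5 threshold below are all hypothetical and are not taken from the case study.

```python
import math


def significant_hits(p_values, threshold):
    """Return SNP ids whose -log10(p) clears the significance line."""
    cutoff = -math.log10(threshold)
    return {snp for snp, p in p_values.items() if -math.log10(p) > cutoff}


def compare_gwas(real_p, synth_p, threshold=1e-5):
    """Split associations into recovered, false-positive and missed sets."""
    real_hits = significant_hits(real_p, threshold)
    synth_hits = significant_hits(synth_p, threshold)
    return {
        "recovered": real_hits & synth_hits,
        "false_positives": synth_hits - real_hits,
        "missed": real_hits - synth_hits,
    }


# Hypothetical SNP ids and p-values, made up for illustration.
real_p = {"chr5_a": 1e-9, "chr11_b": 3e-8, "chr8_c": 0.02, "chr10_d": 0.4}
synth_p = {"chr5_a": 1e-12, "chr11_b": 1e-10, "chr8_c": 2e-7, "chr10_d": 0.3}

result = compare_gwas(real_p, synth_p)
for name, snps in result.items():
    print(name, sorted(snps))
```

In this toy data the synthetic run recovers both real hits but also pushes one weak signal over the line, the same kind of artifact the case study describes.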

The false positives, says the case study, are most likely due to the small sample size of 1,200 mice. The scientists say they typically recommend at least 10,000 samples for the network to “learn enough to recreate the data, especially with the complexity of the genome containing 92,000 SNPs per mouse.”

The scientists also found that “the association of GWAS noise level and y-scale is significantly higher in synthetic data, indicating that the model could amplify the characteristics of real-world data that are reflected in GWAS analysis. These differences can also be minimized with additional examples and optimization of neural network parameters …”

Manhattan plots showed the limits of a small dataset in Illumina’s test of Gretel’s technology (Image source: Illumina and Gretel)

Overall, the test was a success for Illumina. Although the small sample size produced some artifacts, Gretel’s approach “demonstrates encouraging evidence that state-of-the-art synthetic data models can produce artificial versions even of highly dimensional and complex genomic and phenotypic data.”

And at a cost of just $1,440, it was a pretty good deal. Illumina says it is now working to test how the synthetic data approach works with human datasets.

Activator for data science

Watson says the Illumina test shows that the accuracy and characteristics of the data can be maintained when generating synthetic versions of real data. This will be especially beneficial in data science use cases where source data is scarce and hard to find, he says.

“No one has enough accurate data. I think the data bottleneck is such an issue right now,” he says. “It’s very expensive to hire annotation services or generate new data or collect more customer data. So a lot of the ML companies we work with, the thought leaders, are looking at how to augment their existing real-world customer data with synthetic data to help their machine learning algorithms work better.”
