
The Data Protection Gap Nobody Saw Coming: How Cloud and AI Are Breaking Traditional Backup

  • Writer: ctsmithiii
  • Oct 8
  • 8 min read

HYCU reveals critical gaps in cloud backup strategies as AI workloads and SaaS sprawl create new vulnerabilities that traditional solutions can't address.


The promise of cloud computing included a simple bargain: move your data to the cloud and let someone else worry about availability, durability, and protection. That bargain is breaking down.

At the 64th IT Press Tour, HYCU presented research and customer examples that reveal a widening gap between how organizations protect their data and the reality of modern cloud and SaaS environments. The data shows that 65% of organizations experienced a SaaS-related security breach in the past year, with the average incident costing $2.3 million over a typical five-day recovery period.

But the numbers only tell part of the story. The more interesting revelation is how fundamentally different data protection needs to be when you're dealing with AI training datasets, data lakehouses storing petabytes of information, and SaaS applications that have become the system of record for critical business functions.

Why Cloud Backup Has Three Serious Problems

Sathya Sankaran, HYCU's Head of Cloud Products, described what he calls the "lifestyle diseases" of cloud backup—problems that develop slowly but pose a serious risk over time.

Storage obesity. Cloud backup APIs don't support incremental backups the way on-premises systems do. Every backup is a full export. When you're dealing with petabyte-scale datasets, this creates massive storage costs and makes long retention periods economically impractical.

Fragmentation. Different consoles for different cloud providers. Separate tools for SaaS applications. Multiple third-party services to fill gaps in coverage. Organizations end up with what Sankaran calls "console chaos" and "automation sprawl."

Blind spots. Most cloud backup solutions handle VMs and files well enough. But databases-as-a-service? Data lakehouses? AI training datasets? These workloads either aren't supported or require manual scripting to protect.
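To put rough numbers on the storage obesity problem, here is a minimal sketch comparing retained storage when every restore point is a full export against a full copy plus daily deltas. The 1,000TB dataset, the 2% daily change rate, and the function name are illustrative assumptions, not figures from HYCU.

```python
def retained_storage_tb(dataset_tb, daily_change_rate, retention_days, incremental):
    """Storage needed to keep one restore point per day for `retention_days`."""
    if incremental:
        # One full copy, then only the changed fraction for each later day.
        return dataset_tb * (1 + daily_change_rate * (retention_days - 1))
    # Full-export model: every restore point is a complete copy.
    return dataset_tb * retention_days


# Hypothetical 1 PB dataset, 2% daily change, 30 days of retention.
full = retained_storage_tb(1000, 0.02, 30, incremental=False)
incr = retained_storage_tb(1000, 0.02, 30, incremental=True)

print(f"Full exports: {full:,.0f} TB")
print(f"Incrementals: {incr:,.0f} TB")
```

Under these assumed inputs the full-export model retains roughly 19 times as much data, which is why long retention gets expensive quickly.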

The consequences of these problems are evident in HYCU's research. Organizations running more than 200 SaaS applications experienced breach rates of 77%, compared to 60% for those with fewer than 100 applications. More apps mean more attack surface, and also more difficulty knowing what's protected and what isn't.

The Real Cost of Data Loss in Cloud Environments

HYCU shared several customer examples that illustrate how different data protection requirements are in modern environments:

  • A financial institution managing thousands of GitHub repositories needs granular recovery capabilities. If a developer accidentally corrupts a repository or deletes critical code, the organization needs to restore that specific repository to a specific point in time—not recover everything.

  • A manufacturing company stores all component data in Jira. Each component in a multi-million dollar asset has 50+ objects and configurations that must be recoverable for compliance purposes. Losing that data means losing the ability to service products already in the field.

  • A European law firm transitioning to iManage Cloud started its proof of concept with 25TB of data across 3 million documents. That was just the test dataset. Their full implementation involves multiple countries and far more data.

During the presentation, Brian Babineau, HYCU's Chief Customer Officer, emphasized a point that often gets overlooked: "The most important part is, can you restart your business? And do you have a good, known copy of whatever application, whatever use case, whatever file, whatever granularity, whatever model you're running off of?"

It's not about whether the backup completed. It's about whether you can actually use that backup to resume business operations.

The AI Workload Challenge

One of the most interesting sections of HYCU's presentation focused on AI and machine learning workloads—an area where traditional backup thinking doesn't apply. When organizations train AI models, they're creating multiple tiers of data that need protection:

  • Raw training data stored in object storage or data lakes

  • Feature data stored in specialized databases

  • Model artifacts stored in container registries

  • Model metadata and training logs scattered across additional services

HYCU claims to be the only vendor offering comprehensive protection across this entire AI stack. But the more interesting question is why this matters.

Some organizations need to prove what data they used to train their models. For regulatory compliance or intellectual property reasons, they must demonstrate that they didn't use copyrighted material or competitor data. If you can't recover your historical training datasets, you can't prove what you did or didn't use.

Subbiah Sundaram, HYCU's SVP of Product, mentioned customers who specifically back up their AI training data to prove compliance: "There are customers that say, 'I want to make sure I know what data I trained my models on, and I've got a copy of it, so when the regulators come knocking on my door at some point, I can prove and show the provenance for this model.'"
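One way to think about provenance is as a manifest of content hashes recorded at training time and checked later against a recovered copy. The sketch below is an illustration of that idea only, not a description of how HYCU or any customer implements it. The function names, the manifest layout, and the in-memory `datasets` mapping are all hypothetical.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(model_id, datasets):
    """Record a content hash per dataset, plus one digest over all entries."""
    entries = {name: fingerprint(blob) for name, blob in sorted(datasets.items())}
    # Hashing the serialized entries means any changed, added, or removed
    # dataset produces a different manifest digest.
    digest = fingerprint(json.dumps(entries, sort_keys=True).encode("utf-8"))
    return {"model_id": model_id, "datasets": entries, "manifest_sha256": digest}


def verify(manifest, recovered_datasets):
    """True if the recovered data reproduces the recorded manifest exactly."""
    return build_manifest(manifest["model_id"], recovered_datasets) == manifest


# Hypothetical training inputs, kept in memory for the sake of the example.
training_data = {"reviews.parquet": b"licensed review text", "labels.csv": b"id,label\n1,pos\n"}
manifest = build_manifest("sentiment-v1", training_data)

print(verify(manifest, training_data))                               # unchanged copy
print(verify(manifest, {**training_data, "labels.csv": b"edited"}))  # altered copy
```

A manifest like this only proves anything if the datasets it refers to can still be restored, which is the point the customers quoted above are making.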

Data Lakehouses: The New Crown Jewels

The presentation included several statistics about data lakehouse platforms that reveal where enterprise data is actually accumulating:

  • Google BigQuery is a $4 billion business unit growing at double-digit rates

  • Databricks is at $3.2 billion in annual recurring revenue, growing at over 50% year-over-year

  • Amazon reports that Apache Parquet and Apache Iceberg (lakehouse file formats) are the fastest-growing data types in S3 storage

These platforms are processing trillions of queries per month. They're not just analytics systems anymore—they're becoming the system of record for businesses.

And they're creating protection challenges that didn't exist with traditional databases. Customers mentioned in the presentation are dealing with 20-petabyte datasets in BigQuery and 6-petabyte datasets in Azure Database. At that scale, native cloud backup solutions create cost problems that make adequate protection nearly impossible.

Here's why: A financial services customer generates 100TB of new data daily from their various systems. With a 30-day retention policy—a relatively short period—traditional cloud backup would require moving and storing 3,000TB per month. At cloud egress rates (roughly 5x the cost of storage), this becomes economically unfeasible.

HYCU's solution involves integration with Dell Data Domain using DD Boost technology, which provides deduplication at the source. They claim 40:1 deduplication ratios in real customer environments, meaning 40 petabytes of logical data is stored as one petabyte of physical data.
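For readers unfamiliar with how deduplication produces ratios like this, the following is a minimal sketch of the general idea: split data into chunks, hash each chunk, and store each unique chunk once. It uses fixed-size chunks and SHA-256 for simplicity and is not a description of DD Boost. The 40 identical daily copies are a deliberately contrived input, and real ratios depend entirely on how much the data repeats.

```python
import hashlib


def dedup_store(streams, chunk_size=4096):
    """Return (logical_bytes, physical_bytes) after chunk-level deduplication."""
    store = {}      # chunk hash -> chunk bytes, each unique chunk kept once
    logical = 0
    for data in streams:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            logical += len(chunk)
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    physical = sum(len(c) for c in store.values())
    return logical, physical


# Deterministic 1 MiB payload built from distinct 32-byte hash outputs,
# so chunks within a single copy do not repeat.
payload = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(32768))

# Contrived case: 40 daily backups of data that never changes.
logical, physical = dedup_store([payload] * 40)
ratio = logical / physical

print(f"logical={logical} physical={physical} ratio={ratio:.0f}:1")
```

Doing the hashing before data leaves the source is what lets deduplication cut transfer volume as well as stored volume.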

The Dell Partnership: Hardware Meets SaaS

HYCU's announcement that it has joined Dell's OTC program represents an interesting convergence. Dell is fundamentally a hardware vendor that wants to sell on-premises equipment. HYCU built its business on cloud and SaaS protection.

The partnership works because Dell needed coverage for workloads where it didn't have good solutions, particularly SaaS applications and cloud-native workloads. For HYCU, Dell's Data Domain platform offers a deduplication target that makes large-scale cloud backup economically viable.

Sankaran explained the value proposition: "What Dell really sees is you've got the best platform when it comes to deduplication with Data Domain. What we offer is the ability to push a whole bunch of data from 90-plus, 100-plus connectors we now have into that box."

The partnership has already driven business globally, including a Middle East customer who required data stored in a specific country due to political considerations, with a second copy maintained on-premises.

The iManage Integration: When Legal Goes to the Cloud

HYCU's recent support for iManage Cloud—used by 80% of the top 100 law firms—illustrates how SaaS applications are moving into mission-critical territory.

During the presentation, Babineau described a law firm POC that involved 25TB across 2 billion documents. The firm operates in three or four countries outside the United States, and that wasn't even their complete dataset.

Why does document management require such sophisticated backup? Because iManage stores far more than documents. It maintains cases, file links, images, evidence, metadata, and permissions. Recovering just the documents without this context would be useless.

One law firm executive quoted in the presentation explained the criticality: "Imagine I have 1,500 lawyers in the company, and iManage goes down. You guys know how much, on average, in the US, a lawyer easily charges—$500 an hour. Imagine they're down for one day. Just 1,500 people, one day, you can imagine the impact they actually have."
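The executive leaves the arithmetic to the listener, so here it is worked out. The sketch assumes eight billable hours per lawyer per day, a figure the quote does not state. The function name is likewise just for illustration.

```python
def downtime_cost(lawyers, hourly_rate_usd, billable_hours_per_day, days=1):
    """Billable revenue lost while lawyers cannot work."""
    return lawyers * hourly_rate_usd * billable_hours_per_day * days


# 1,500 lawyers at $500 an hour (from the quote), 8 billable hours (assumed).
cost = downtime_cost(1500, 500, 8)
print(f"${cost:,} per day of downtime")
```

Under that assumption, a single day of outage costs about $6 million in billable time alone.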

Security Features That Address Real Vulnerabilities

HYCU introduced "R-Shield" as their cyber resilience offering, with two capabilities that address problems in current backup security approaches:

Near real-time malware scanning at source. Traditional backup vendors scan backed-up data by recovering it to a separate environment and running malware detection there. This doesn't scale with petabyte datasets. HYCU scans at the source using storage snapshots, analyzing full data (not just metadata) without requiring it to be moved to a third-party environment.

True immutable storage for SaaS applications. Most SaaS backup vendors offer what they call "immutable" storage, but it's a soft lock. If you terminate the relationship, someone has to pay for the remaining retention period. HYCU stores backups in the customer's own cloud account with true object lock, meaning even if you stop using HYCU, your data remains protected under your control.

Sundaram emphasized this distinction: "SaaS vendors don't really lock the data because they say if I lock it and you have to pay, I don't lock it. What HYCU does is that when we give you a lock, it is truly locked, truly immutable. Even if you fire HYCU tomorrow, your data is with you, and it's always safe."
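The presentation did not name the storage service or API behind this lock. As one example of what a hard object lock looks like in a customer-owned account, Amazon S3 Object Lock accepts a retention mode and a retain-until date on upload. The sketch below only builds the request parameters, so it runs without AWS credentials. The helper name, bucket, and key are hypothetical, and using S3 at all is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone


def locked_put_params(bucket, key, body, retention_days, now=None):
    """Build keyword arguments for an S3 put_object call with a compliance lock."""
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # In COMPLIANCE mode the retention date cannot be shortened or removed
        # before it expires, even by the account's root user.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retention_days),
    }


fixed_now = datetime(2025, 10, 8, tzinfo=timezone.utc)
params = locked_put_params("example-backups", "saas/export-001.bin", b"...", 30, now=fixed_now)
print(params["ObjectLockMode"], params["ObjectLockRetainUntilDate"].date())

# With boto3 installed and a bucket created with Object Lock enabled, the call
# would be: boto3.client("s3").put_object(**params)
```

Because the lock lives on the object in the customer's own bucket, it does not depend on any ongoing relationship with the backup vendor.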

Platform Coverage vs. Point Solutions

One of the recurring themes in the presentation was the difference between HYCU's platform approach and the point solutions offered by larger competitors.

According to the presentation, the major vendors in Gartner's Magic Quadrant combined to add only five new SaaS applications in the past two years. HYCU claims to have added 25+ integrations in roughly the same period.

The difference, according to Sundaram, is architectural: "The reason we are able to accomplish this is because of the power of our platform. Every application, traditionally, to support an application is a lot of work because it's not about cursorily supporting it—it's going in deep."

HYCU's platform allows partners to build integrations. The iManage integration, for example, was jointly developed by HYCU and iManage, rather than being built entirely by HYCU developers. This collaborative approach enables faster expansion into new workloads.

Cross-Cloud Recovery and Data Sovereignty

The presentation included multiple examples of customers requiring cross-cloud recovery capabilities:

  • A large federal agency migrating from VMware to a mix of Azure Government Cloud, Azure Local, AWS, and remaining on-premises infrastructure needed unified backup and disaster recovery across all these environments.

  • A financial services firm running workloads across all three major cloud providers wanted a single console to manage protection for AWS, Azure, and Google Cloud simultaneously.

  • European customers, particularly in Germany and France, require data stored within specific geographic boundaries—sometimes even requiring on-premises copies of SaaS data for sovereignty purposes.

These requirements are driving demand for what HYCU calls "bring your own cloud" (BYOCloud) capabilities, where organizations choose their own storage targets rather than using vendor-provided storage.

The Economics of Egress

One problem that came up repeatedly in the presentation was cloud egress fees: the cost of transferring data out of one cloud provider, either to another cloud or to on-premises storage.

For context, AWS charges roughly 9 cents per gigabyte for data egress. For a petabyte of data, that's $90,000 just to move it once. When implementing cross-cloud disaster recovery with petabyte-scale datasets, these fees render traditional backup approaches economically impractical.

Sankaran presented a real-world customer scenario: 100TB of new data daily, a 30-day retention policy, and a need for cross-cloud resilience. Traditional backup involves moving 3,000TB per month between clouds, with egress fees alone running roughly five times the cost of storage.

With 40:1 deduplication, the 3,000TB shrinks to approximately 75TB actually moved, a dramatic reduction in both storage and transfer costs.
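The figures in this section can be checked with a few lines of arithmetic. The sketch below combines the roughly 9 cents per gigabyte egress rate with the 100TB-per-day scenario. It assumes a flat rate with no volume tiers, decimal units (1TB = 1,000GB), and that the deduplication ratio applies before data is transferred. Real bills will differ.

```python
EGRESS_CENTS_PER_GB = 9   # roughly 9 cents per GB, treated as a flat rate (assumption)
GB_PER_TB = 1000          # decimal units


def monthly_egress(daily_new_tb, days, dedup_ratio=1.0):
    """Return (TB moved, egress cost in USD) for one month of daily backups."""
    logical_tb = daily_new_tb * days
    moved_tb = logical_tb / dedup_ratio
    cost_usd = moved_tb * GB_PER_TB * EGRESS_CENTS_PER_GB / 100
    return moved_tb, cost_usd


raw_tb, raw_cost = monthly_egress(100, 30)
dedup_tb, dedup_cost = monthly_egress(100, 30, dedup_ratio=40)

print(f"No dedup:   {raw_tb:,.0f} TB moved, ${raw_cost:,.0f} in egress")
print(f"40:1 dedup: {dedup_tb:,.0f} TB moved, ${dedup_cost:,.0f} in egress")
```

At the assumed flat rate, monthly egress drops from $270,000 to $6,750, the same 40x reduction as the data volume.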

What Changed Since HYCU's Last IT Press Tour

The presentation marked HYCU's 10th appearance at the IT Press Tour. Several developments stood out as significant progress:

Dell partnership. Being added to Dell's OTC program provides access to Dell's global sales channels and validates HYCU's approach to cloud data protection.

iManage Cloud integration. Entering the legal vertical with a solution for one of that industry's most critical applications demonstrates platform extensibility.

R-Shield capabilities. Adding malware detection and immutable storage as platform features, rather than separate products, demonstrates the integration of security into core protection.

GigaOm recognition. In an evaluation under embargo until October 9, GigaOm recognized HYCU as a Leader and Fast Mover, particularly for cross-cloud mobility and protection of deployment orchestration services.

The Path Forward

The data protection challenges HYCU highlighted aren't going away. SaaS adoption continues to grow. AI workloads are multiplying. Data lakehouses are becoming the new system of record.

Traditional backup approaches—designed for VMs, files, and on-premises databases—don't translate well to these modern environments. The question is whether platform approaches like HYCU's can fill the gap before organizations experience the kind of data loss that makes headlines.

HYCU's argument is straightforward: comprehensive coverage across workloads, customer control over data and storage, and built-in security features are necessary for modern data protection. Whether that's sufficient to displace established vendors or change how organizations think about backup remains to be seen.

But the problems they're highlighting are real. The blind spots exist. The costs are mounting. And based on HYCU's research, most organizations are learning about these gaps only when they need to recover data that turns out not to be properly protected.


 
 
 



© 2025 by Tom Smith
