Sustainability: More than green energy, tech stacks must be efficient

Inside Keepit | March 5, 2024 | 12 minutes | By Jakob Østergaard

Keepit’s holistic approach to sustainability pushes beyond only using green energy options: it’s about maximized efficiencies from our purpose-built architecture perfected for protecting and storing SaaS application data in the fastest, most efficient way possible.

So, let’s look into how we consider sustainability in the data protection and backup industry, the design choices Keepit has made to be fast and efficient (like deduplication and incremental backup), as well as the challenges legacy systems face in regard to sustainability.

Introduction

What is sustainability in a context where energy consumption must happen? By “must happen” I mean, in our very real example, critical services like data protection and backup that are required by law (think compliance with NIS2, HIPAA, GDPR) and crucial for businesses to ensure operations and continuity.

Keepit, the only vendor-independent cloud data protection specialist (think air gapping with a separate logical infrastructure, in line with the 3-2-1 backup rule), not only leads the way for greener solutions by way of substantially better efficiency, but also demonstrates that performance and profitability don’t have to take a backseat to environmental stewardship.

This article explores Keepit's comprehensive approach to sustainability, focusing on its commitment to minimizing resource consumption while delivering industry-leading data protection of essential services and critical infrastructure.

From efficient technology stack design to innovative data management practices, Keepit exemplifies how sustainability can be integrated into every aspect of operations, often leading to some beneficial “side effects” I’ll talk about below.

This article sheds light on the importance of responsible resource utilization in the tech industry and offers insights into practical strategies for achieving unmatched environmental sustainability and performance capabilities.

Let’s look into how we integrate sustainability considerations into our data protection and backup services operations.

Sustainability in data protection: Maximize performance and efficiency to minimize use

Delivering any service, including this one, consumes resources, and there's just no way around that. So, what we try to do here at Keepit is to make as much positive impact for our customers as we can: not only delivering a great service (one they're often obligated by law to have), but delivering that same high level of service with the least negative impact on the world and future generations, with regard to both energy use and energy origin.

Sustainability for us, as I see it anyway, means being responsible in our consumption of resources because there's absolutely no way around consuming resources. So, for us at Keepit, we built our technology stack from the ground up specifically for solving these exact problems that we're solving in the most efficient way possible.

And by efficiency here, I mean it broadly: how much physical equipment we need, how much space we occupy, how much power we consume, and how many people we need to run this business. In that light, you can say that a sustainable business is also going to be a more profitable business, because minimizing resource consumption not only reduces environmental impact but also lowers operational costs, ultimately leading to increased profitability.

Let’s look into the keys of consumption and sustainability.

Sustainable resource consumption

To us at Keepit, every action leaves an impact, and because of this, we define sustainability as being responsible in our resource consumption. This includes a commitment to minimizing the footprint left by our operations.

We acknowledge that, given the nature of our services, resource consumption is inevitable. Sustainability, in the context of Keepit, therefore means making conscious and responsible choices to mitigate the environmental impact associated with our operations. So, what do we do to make a difference and to limit our impact?

What makes Keepit green: Technology stack and efficiency

Beyond running data centers with 100% renewable energy, we take pride in having constructed our technology stack from the ground up, specifically designed to efficiently address data storage challenges. This approach reflects a commitment to optimizing resource utilization and delivering a service that is not only effective but also resource efficient.

Efficiency, in Keepit's view, extends beyond environmental considerations to include broader aspects such as the physical equipment required, space utilization, power consumption, and the personnel needed to operate the business. This holistic approach ensures a sustainable business model and, as mentioned above, a profitable one.

Now, how do we take on the tough task of being sustainable and high performing?

By starting from a clean slate. We built this system specifically for storage, protection, and management of cloud SaaS data. So, we built up this entire technology stack specifically for the purpose of delivering exactly the service we're delivering — cloud backup in the cloud, for the cloud, and so it does this “one thing” extremely well and efficiently.

And one of the things we did is that, well, we did what Amazon, Microsoft, and Google do: We built up this whole cloud infrastructure. We have full ownership of our technology stack instead of building on top of middleware that runs on top of middleware that runs on top of abstractions and virtualizations and other layers that all add overhead.

We avoid unnecessary complexity by being purpose built, and therefore we use less storage space, need less processing power due to fewer operations, and use fewer human resources to run it all.

Predictable costs as a wonderful “side effect”

With full ownership of our technology stack, we have precise insights into the costs associated with running our operations. When a developer writes a piece of code that inefficiently utilizes resources, it’s something the operations team will see, and it directly impacts them. Efficiency and sustainability are integral parts of our culture, so we address these inefficiencies by writing “better” code.

We don't just elastically spin up some additional set of servers and try to solve the problem with a credit card. But that’s exactly how a lot of people approach this — that’s how a lot of competitors approach this. It may offer immediate relief, but that type of short-sighted solution fails to align with our company culture and long-term goals.

It’s enormously inefficient both from a bottom-line perspective and from a sustainability perspective. And it doesn't fix the problem. It just pushes it further into the future, kicking the can down the road. So, whatever you're paying for with your credit card now are inefficiencies that you’ll be paying for again and again into the future.

They fix the problem with money, but in terms of efficiency, it’s not an action done for the betterment of future generations. They waste energy every single CPU cycle — and they’re also paying for that bigger bill every billing cycle thereafter.

Not only does that power need to come from somewhere, but so does the money to pay for it, which is either passed on to their customers or covered with venture capital.

What these companies are left with is a band-aid solution that’s going to consume more energy and cost more month after month after month — basically for as long as they exist. And on any consumption-based model, the costs will be continually growing with datasets.

Lean, green, backup machine: 99% power saving

Like I mentioned, when we started, we weren’t bound to use any specific technology, so we started with a clean slate. The programming language we chose for the majority of our technology stack is a very efficiently compiled language (you can read my blog post about it). Compare that to the common choices: if you go online and say, “hey, I'm building this upstart company on the net, what technology stack should I use,” the advice you will get there is very different from what we chose to go with.

And because of that decision, we're something like 30 to 100 times more efficient. So, for one specific operation, if I program it in the technology that we're using versus one of these most common, most hyped technologies, we see a 30- to 100-fold difference in CPU resource utilization, which also translates directly to power consumption.

So, if we had done what everyone else was doing, we would’ve consumed something like 100 times more power than we do today and that's huge. Can you imagine achieving a 99% power saving? That’s basically what we started out doing.
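To make that arithmetic concrete, here is a minimal sketch. It assumes, as a simplification, that power consumption scales linearly with CPU utilization; the function name `power_saving` is purely illustrative.

```python
def power_saving(efficiency_factor: float) -> float:
    """Fraction of power saved when the same work needs
    1/efficiency_factor of the CPU time (linear scaling assumed)."""
    if efficiency_factor <= 0:
        raise ValueError("efficiency_factor must be positive")
    return 1.0 - 1.0 / efficiency_factor


# The two ends of the 30x to 100x range mentioned above.
for factor in (30, 100):
    print(f"{factor}x more efficient -> {power_saving(factor):.1%} saving")
# 30x more efficient -> 96.7% saving
# 100x more efficient -> 99.0% saving
```

So the 99% figure corresponds to the top of the range; at the lower end of 30 times, the saving under the same assumption is still roughly 97%.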

From that perspective, it’s now more difficult for us to make additional power savings since we’re already so efficient. Sure, it looks good when companies boast “15% energy savings,” but another way of looking at that — and we do from our perspective — is that their waste was high, their tech stack wasn’t lean. Of course, it’s good that they improve, but they’re still not even close from an efficiency standpoint.

Reducing consumption with incremental backup and deduplication

If you look at the core service that we provide, we make a copy of your data set a couple of times a day. We keep that copy for as long as you want as a customer company. Some companies need seven years of retention, some need 20 years of retention, and some even pay for 100 years of retention. Data retention really depends on the industry you’re in, where you operate, and so on.

In theory, we keep complete copies of your entire data across two separate, mirrored locations for, let's say, 100 years. In theory. In reality, we’re smart about this, because it wouldn’t be feasible to transport your entire data set every day.

Not only would it not be feasible, there’s also no need to do that because of incremental backup. TechTarget defines incremental backup as “a backup type that only copies data that has been changed or created since the previous backup activity was conducted. By only backing up changed data, incremental backups save restore time and disk space. Incremental is a common method for cloud backup as it tends to use fewer resources.”
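As a rough illustration of that definition (not Keepit's actual implementation), the sketch below compares the current files against an index from the previous backup and returns only what is new or changed. Using SHA-256 fingerprints to detect changes is an assumption made for the example, and the function names are hypothetical. Deleted files are ignored to keep it short.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def snapshot_index(files: dict[str, bytes]) -> dict[str, str]:
    """Record path -> fingerprint for everything in a backup."""
    return {path: fingerprint(data) for path, data in files.items()}


def incremental_changes(
    previous_index: dict[str, str], current: dict[str, bytes]
) -> dict[str, bytes]:
    """Return only files that are new or changed since the previous backup."""
    return {
        path: data
        for path, data in current.items()
        if previous_index.get(path) != fingerprint(data)
    }


monday = {"report.docx": b"draft 1", "budget.xlsx": b"q1 numbers"}
tuesday = {
    "report.docx": b"draft 2",      # edited
    "budget.xlsx": b"q1 numbers",   # unchanged, so not transferred
    "notes.txt": b"new file",       # created
}

to_transfer = incremental_changes(snapshot_index(monday), tuesday)
print(sorted(to_transfer))  # ['notes.txt', 'report.docx']
```

Only two of the three files cross the network on Tuesday; the unchanged spreadsheet is skipped.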

At Keepit, we utilize incremental backups. We transport only the differences, such as edits, that have happened since the previous backup was completed. This also means that we don't duplicate your entire data set multiple times every day; we just transfer the changes. How does this reduce consumption? Let’s consider an example:

If you have one unchanging file and we hold that in our backup set for, let's say, 1,000 backups, then we will have only one instance of the file. We refer to that original file in each of those 1,000 backup sets, but we will store only one instance.

It’s stored once, but that file is pointed to in each of those 1,000 backup sets. We’re not duplicating data needlessly. Together, the two techniques cut consumption in two places: incremental backup reduces network load because less data is transferred, and deduplication reduces storage space because less data is held. This is all possible because every file, no matter the file type, has reliable identifiers that let you say, “This is exactly the same file.”

Let's say I send someone a Word file and she doesn't change it. She just saves it from my attachment, and that can be identified as being identical. Let’s expand this across an organization: If you have 1,000 employees in your company that have identical copies of this file and we have those across the 1,000 backup sets, then we will not hold what would literally be 1,000,000 copies of the file. We will again hold just the single instance of the file and there will be a million references to the file.
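The example above can be sketched as a tiny content-addressed store. This is an illustration under stated assumptions, not a description of Keepit's storage system: the class `DedupStore` is hypothetical, SHA-256 stands in for the “reliable identifier,” and references are simply counted rather than kept per backup set.

```python
import hashlib
from collections import Counter


class DedupStore:
    """Keeps one copy of each distinct content, however often it is referenced."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}      # identifier -> stored content
        self.ref_counts: Counter = Counter()   # identifier -> number of references

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)    # stored only the first time
        self.ref_counts[digest] += 1           # every later put is just a reference
        return digest


store = DedupStore()
attachment = b"identical Word attachment"

# 1,000 employees, each holding the same file across 1,000 backup sets.
for employee in range(1000):
    for backup_set in range(1000):
        store.put(attachment)

print(len(store.blobs), sum(store.ref_counts.values()))  # 1 1000000
```

One stored instance, a million references: the storage cost is that of a single file plus the bookkeeping for the pointers.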

Not every backup and recovery solution does it this way though, and as a result, they’re using vast amounts of energy and hardware to power these operations and then to keep all this data in storage. Even if they happen to leverage a green energy source, they’re not exactly using resources responsibly if they are using 1,000,000 times or even 1,000 times what they would with incremental backup.

Purpose-built storage architecture

Inherent in our storage architecture is this deduplication across both space and time. And that's one of these initial thoughts we had and one of the earliest ideas of our storage architecture that we built for this purpose. We did sit down and build a storage foundation for this backup service from the start. We basically invented a file system or object store, if you will, for this purpose: to store these enormous data sets for decades, and we could see we needed something that we couldn't go out and buy anywhere. We needed something that was built for purpose and so we could avoid the problems we were already seeing others have with legacy systems.

Legacy complexities add inefficiencies

Running this stack end to end ourselves, we avoid legacy inefficiencies, as I mentioned above. If we had chosen to run on AWS (Amazon Web Services), Azure, or Google Cloud, we’d be provisioning virtual machines, and there are a lot of complexities that get added when you're on a virtual machine: all of those layers of middleware that we don’t have to employ. (And with storage virtualization: if you’re on S3 or Azure Blob Storage, then you’re using very sophisticated pieces of machinery that come with great functionality and therefore great overhead. Storing backups on them is like hammering in nails with a microscope: perfectly doable, but not a very good match of tool for the job.)

With those virtual machines, you need automation tools and management software to run them reliably. So, you add that, then the virtualization middleware, and then some resource management middleware. You end up with a lot of systems, and your software doesn't run on the computer anymore: it runs on a collection of layers that run on a computer you may share with other customers. And ultimately, even if you don't share it, Amazon or Microsoft is going to want to instrument your code enough so they can charge you for what you're doing. So, there are a number of inefficiencies introduced here, and they all translate directly to cost and power consumption.

But, since we built and operate our own architecture, we don't have any of those costs, complexities, or additional consumption. No one needs to instrument anything to find out what we need to pay. So, in addition to predictable costs, another happy “side effect” is that we don’t have any sub-processors, which makes vetting and compliance (with GDPR and NIS2, for example) easier for our customers, partners, and anyone else we’re in business with.

Conclusion

Keepit's holistic approach to sustainability sets a commendable example for the tech industry, particularly within data protection and backup services. By prioritizing responsible resource consumption and efficiency in its technology stack design and operational practices, Keepit demonstrates how environmental sustainability can be seamlessly integrated into essential services.

Through the utilization of deduplication and incremental backup techniques, Keepit minimizes data storage requirements and energy consumption, paving the way for a more environmentally responsible approach to data protection. Additionally, Keepit's purpose-built storage architecture and avoidance of legacy inefficiencies further underscore its commitment to sustainability, resulting in both cost savings and reduced environmental impact.

As we navigate an increasingly digital landscape, Keepit's sustainability initiatives serve as a testament to the importance of balancing important factors such as regulatory compliance and business continuity with environmental responsibility. By embracing sustainable practices, Keepit not only enhances its own operational efficiency but also contributes to the broader goal of mitigating the environmental footprint of the tech industry.

In essence, Keepit's journey towards sustainability exemplifies the potential for technology companies to lead by example, demonstrating that profitability and environmental stewardship are not mutually exclusive. As we look to the future, let us draw inspiration from Keepit's success and strive to emulate its commitment to sustainability in all facets of our operations.

Author

Jakob Østergaard is CTO at Keepit, a leading cloud backup and recovery solution. He has an M.Sc. in Computer Science and Applied Mathematics and has worked with software development since 1998. His early career started on massively parallel supercomputers but soon transitioned to more reasonably sized equipment.

He has played a key role in the design and implementation of several cross platform networked software systems and is the principal designer of the object storage system that underlies the Keepit business. Today he leads the development, operations, and security organizations of the company.

He still writes code. Find Jakob on LinkedIn.