Lowering long-term logging bills by 4,800% with Amazon OpenSearch Service

When you use Amazon OpenSearch Service for time-bound data like server logs, service logs, application logs, clickstreams, or event streams, storage cost is one of the primary drivers of the overall cost of your solution. Over the last year, OpenSearch Service has released features that open up new possibilities for storing your log data in various tiers, letting you trade off data latency, durability, and availability. In October 2023, OpenSearch Service announced support for im4gn data nodes, with NVMe SSD storage of up to 30 TB. In November 2023, OpenSearch Service introduced or1, the OpenSearch-optimized instance family, which delivers up to 30% price-performance improvement over existing instances in internal benchmarks and uses Amazon Simple Storage Service (Amazon S3) to provide 11 nines of durability. Finally, in May 2024, OpenSearch Service announced general availability for Amazon OpenSearch Service zero-ETL integration with Amazon S3. These new features join OpenSearch's existing UltraWarm instances, which provide up to a 90% reduction in storage cost per GB, and UltraWarm's cold storage option, which lets you detach UltraWarm indexes and durably store rarely accessed data in Amazon S3.

This post works through an example to help you understand the trade-offs available in cost, latency, throughput, data durability and availability, retention, and data access. With that understanding, you can choose the right deployment to maximize the value of your data and minimize the cost.

Examine your requirements

When designing your logging solution, you need a clear definition of your requirements as a prerequisite to making smart trade-offs. Carefully examine your requirements for latency, durability, availability, and cost. Additionally, consider which data you choose to send to OpenSearch Service, how long you retain data, and how you plan to access that data.

For the purposes of this discussion, we divide OpenSearch instance storage into two classes: ephemeral backed storage and Amazon S3 backed storage. The ephemeral backed storage class includes OpenSearch nodes that use Non-Volatile Memory Express SSDs (NVMe SSDs) and Amazon Elastic Block Store (Amazon EBS) volumes. The Amazon S3 backed storage class includes UltraWarm nodes, UltraWarm cold storage, or1 instances, and Amazon S3 storage that you access with the service's zero-ETL with Amazon S3. When designing your logging solution, consider the following:

Latency – If you need results in milliseconds, you must use ephemeral backed storage. If seconds or minutes are acceptable, you can lower your cost by using Amazon S3 backed storage.
Throughput – As a general rule, ephemeral backed storage instances provide higher throughput. Instances that have NVMe SSDs, like the im4gn, generally provide the best throughput, and EBS volumes provide good throughput. or1 instances use Amazon EBS storage for primary shards and use Amazon S3 with segment replication to reduce the compute cost of replication. As a result, they offer indexing throughput that can match or even exceed that of NVMe-based instances.
Data durability – Data stored in the hot tier (which you deploy as data nodes) has the lowest latency and also the lowest durability. OpenSearch Service provides automated recovery of data in the hot tier through replicas, which provide durability at added cost. Data that OpenSearch stores in Amazon S3 (UltraWarm, UltraWarm cold storage, zero-ETL with Amazon S3, and or1 instances) gets the benefit of 11 nines of durability from Amazon S3.
Data availability – Best practices dictate that you use replicas for data in ephemeral backed storage. When you have at least one replica, you can continue to access all of your data, even during a node failure. However, each replica adds a multiple of cost. If you can tolerate temporary unavailability, you can reduce replicas by using or1 instances, with Amazon S3 backed storage.
Retention – Data in all storage tiers incurs cost. The longer you retain data for analysis, the more cumulative cost you incur for each GB of that data. Identify the maximum amount of time you must retain data before it loses all value. In some cases, compliance requirements may restrict your retention window.
Data access – Amazon S3 backed storage instances generally have a much higher storage-to-compute ratio, which saves cost but leaves insufficient compute for high-volume workloads. If you have high query volume or your queries span a large volume of data, ephemeral backed storage is the right choice. Direct query (Amazon S3 backed storage) is ideal for large-volume queries over infrequently queried data.
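The latency and data access rules of thumb above can be condensed into a small decision function. This is a simplified sketch. The `Requirements` fields and the `suggest_storage` name are hypothetical, and a real design weighs all six dimensions together, not just these three inputs.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical fields summarizing some of the dimensions listed above.
    needs_millisecond_latency: bool
    high_query_volume: bool
    infrequent_large_scans: bool


def suggest_storage(req: Requirements) -> str:
    """Apply this post's rules of thumb to pick a storage class."""
    # Millisecond latency, or high query volume over lots of data, points to
    # ephemeral backed storage (NVMe SSD or Amazon EBS data nodes).
    if req.needs_millisecond_latency or req.high_query_volume:
        return "ephemeral backed"
    # Rarely queried data that is scanned in bulk fits direct query over S3.
    if req.infrequent_large_scans:
        return "direct query (Amazon S3)"
    # Seconds-to-minutes latency is acceptable: Amazon S3 backed storage.
    return "Amazon S3 backed (for example, UltraWarm or cold storage)"


print(suggest_storage(Requirements(True, False, False)))   # ephemeral backed
print(suggest_storage(Requirements(False, False, True)))   # direct query (Amazon S3)
```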

As you consider your requirements along these dimensions, your answers will guide your implementation choices. To help you make trade-offs, we work through an extended example in the following sections.

OpenSearch Service cost model

To understand how to cost an OpenSearch Service deployment, you need to understand the cost dimensions. OpenSearch Service has two deployment options: managed clusters and serverless. This post considers managed clusters only, because Amazon OpenSearch Serverless already tiers data and manages storage for you. When you use managed clusters, you configure data nodes, UltraWarm nodes, and cluster manager nodes, and you select Amazon Elastic Compute Cloud (Amazon EC2) instance types for each of these functions. OpenSearch Service deploys and manages these nodes for you, providing OpenSearch and OpenSearch Dashboards through a REST endpoint. You can choose Amazon EBS backed instances or instances with NVMe SSD drives. OpenSearch Service charges an hourly rate for the instances in your managed cluster. If you choose Amazon EBS backed instances, the service also charges you for the storage provisioned and any provisioned IOPS you configure. If you choose or1 nodes, UltraWarm nodes, or UltraWarm cold storage, OpenSearch Service charges for the Amazon S3 storage consumed. Finally, the service charges for data transferred out.

Example use case

We use an example use case to examine the trade-offs in cost and performance. The cost and sizing of this example are based on best practices and are directional in nature. Although you can expect to see similar savings, all workloads are unique, and your actual costs may vary substantially from what we present in this post.

For our use case, Fizzywig, a fictitious company, is a large soft drink manufacturer. They have many plants producing their beverages, with copious logging from their manufacturing lines. They started out small, with an all-hot deployment generating 10 GB of logs daily. Today, that has grown to 3 TB of log data daily, and management is mandating a reduction in cost. Fizzywig uses their log data for event debugging and analysis, as well as historical analysis over one year of log data. Let's compute the cost of storing and using that data in OpenSearch Service.

Ephemeral backed storage deployments

Fizzywig's current deployment is 189 r6g.12xlarge.search data nodes (no UltraWarm tier), with ephemeral backed storage. When you index data in OpenSearch Service, OpenSearch builds and stores index data structures that are usually about 10% larger than the source data, and you should add a further 25% for operating overhead. Three TB of daily source data will therefore use 4.125 TB of storage for the first (primary) copy, including overhead (3 TB × 1.1 × 1.25). Fizzywig follows best practices, using two replica copies for maximum data durability and availability with the OpenSearch Service Multi-AZ with Standby option, which increases the storage need to 12.375 TB per day. To store 1 year of data, multiply by 365 days to get about 4.5 PB of storage needed.
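The storage arithmetic in this paragraph can be written out as follows. The overhead factors are the ones stated above (10% index overhead, then 25% operating overhead). Treat them as this post's sizing assumptions rather than universal constants.

```python
# Sizing factors as stated in this post (assumptions, not universal values).
INDEX_OVERHEAD = 1.10      # index structures ~10% larger than source data
OPERATING_OVERHEAD = 1.25  # a further 25% for operating overhead
DAYS_PER_YEAR = 365


def daily_storage_tb(source_tb_per_day: float, copies: int) -> float:
    """Hot-tier storage needed for one day of data, across all copies, in TB."""
    primary = source_tb_per_day * INDEX_OVERHEAD * OPERATING_OVERHEAD
    return primary * copies


def yearly_storage_tb(source_tb_per_day: float, copies: int,
                      days: int = DAYS_PER_YEAR) -> float:
    """Total hot-tier storage for `days` days of retention, in TB."""
    return daily_storage_tb(source_tb_per_day, copies) * days


print(f"{daily_storage_tb(3, 1):.3f}")    # 4.125  (primary only)
print(f"{daily_storage_tb(3, 3):.3f}")    # 12.375 (primary + 2 replicas)
print(f"{yearly_storage_tb(3, 3):,.1f}")  # 4,516.9 TB, about 4.5 PB
```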

To provision this much storage, they could also choose im4gn.16xlarge.search instances or or1.16xlarge.search instances. The following table gives the instance counts for each of these instance types with one, two, or three copies of the data.

Instance type	Max storage per node (GB)	Primary (1 copy)	Primary + replica (2 copies)	Primary + 2 replicas (3 copies)
im4gn.16xlarge.search	30,000	52	104	156
or1.16xlarge.search	36,000	42	84	126
r6g.12xlarge.search	24,000	63	126	189

The preceding table and the following discussion are based strictly on storage needs. or1 instances and im4gn instances both provide higher throughput than r6g instances, which will reduce cost further. The amount of compute saved varies between 10–40% depending on the workload and the instance type. These savings don't pass straight through to the bottom line; you must scale and modify your index and shard strategy to fully realize them. The preceding table and subsequent calculations make the general assumption that these deployments are over-provisioned on compute and are storage-bound. You would see additional savings for or1 and im4gn, compared with r6g, if you had to scale higher for compute.
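Under that storage-bound assumption, a node count is simply total storage divided by per-node storage, rounded up. The sketch below reproduces the r6g and or1 rows of the preceding table from the 4.125 TB-per-day figure. For im4gn it gives 51, 101, and 151 nodes, a few fewer than the table's 52, 104, and 156, so treat it as a lower bound rather than a reproduction of every row.

```python
import math

# Maximum storage per node in GB, taken from the table above.
MAX_STORAGE_GB = {
    "im4gn.16xlarge.search": 30_000,
    "or1.16xlarge.search": 36_000,
    "r6g.12xlarge.search": 24_000,
}

# One year of storage for a single copy: 3,000 GB/day of source data,
# +10% index overhead, +25% operating overhead, 365 days (~1,505,625 GB).
PRIMARY_GB_PER_YEAR = 3_000 * 1.10 * 1.25 * 365


def nodes_needed(instance_type: str, copies: int) -> int:
    """Storage-bound node count: total storage / per-node storage, rounded up."""
    total_gb = PRIMARY_GB_PER_YEAR * copies
    return math.ceil(total_gb / MAX_STORAGE_GB[instance_type])


for itype in MAX_STORAGE_GB:
    print(itype, [nodes_needed(itype, c) for c in (1, 2, 3)])
# im4gn.16xlarge.search [51, 101, 151]
# or1.16xlarge.search [42, 84, 126]
# r6g.12xlarge.search [63, 126, 189]
```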

The following table shows the total annual cluster costs for the three instance types across the three data storage sizes specified. These are based on on-demand prices in the US East (N. Virginia) AWS Region and include instance hours, Amazon S3 cost for the or1 instances, and Amazon EBS storage costs for the or1 and r6g instances.

Instance type	Primary (1 copy)	Primary + replica (2 copies)	Primary + 2 replicas (3 copies)
im4gn.16xlarge.search	$3,977,145	$7,954,290	$11,931,435
or1.16xlarge.search	$4,691,952	$9,354,996	$14,018,041
r6g.12xlarge.search	$4,420,585	$8,841,170	$13,261,755

This table gives you the one-copy, two-copy, and three-copy costs (including Amazon S3 and Amazon EBS costs, where applicable) for this 4.5 PB workload. In this post, "one copy" refers to the first copy of your data, with the replication factor set to zero. "Two copies" adds a replica copy of all the data, and "three copies" means a primary and two replicas. As you can see, each replica adds a multiple of cost to the solution. Of course, each replica also adds availability and durability to the data. With one copy (primary only), you would lose data in the case of a single node outage (with an exception for or1 instances). With one replica, you might lose some or all data in a two-node outage. With two replicas, you could lose data only in a three-node outage.

The or1 instances are an exception to this rule: they can support a one-copy deployment. These instances use Amazon S3 as a backing store, writing all index data to Amazon S3 as a means of replication and for durability. Because all acknowledged writes are persisted in Amazon S3, you can run with a single copy, but with the risk of losing availability of your data in case of a node outage. If a data node becomes unavailable, any impacted indexes will be unavailable (red) during the recovery window (usually 10–20 minutes). Carefully evaluate whether you, your customers, and your system (for example, your ingestion pipeline buffer) can tolerate this unavailability. If so, you can drop your cost from $14 million to $4.7 million, based on the one-copy (primary) column in the preceding table.
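The replica math in the on-demand table can be made explicit with a few lines of arithmetic. The sketch below fits cost = per-copy cost × copies + fixed cost from the one- and two-copy columns, then checks the prediction against the three-copy column. For im4gn and r6g the fixed term is zero, so cost scales exactly with the number of copies. For or1, about $28,900 does not scale with copies, which is consistent with or1's Amazon S3 copy being paid for once; that is our reading of the table, not a published cost breakdown.

```python
# Annual on-demand costs (USD) from the table above: 1, 2, and 3 copies.
ON_DEMAND = {
    "im4gn.16xlarge.search": (3_977_145, 7_954_290, 11_931_435),
    "or1.16xlarge.search": (4_691_952, 9_354_996, 14_018_041),
    "r6g.12xlarge.search": (4_420_585, 8_841_170, 13_261_755),
}


def fit_per_copy(costs: tuple[int, int, int]) -> tuple[int, int]:
    """Fit cost = per_copy * copies + fixed using the 1- and 2-copy points."""
    one, two, _ = costs
    per_copy = two - one
    fixed = one - per_copy
    return per_copy, fixed


for itype, costs in ON_DEMAND.items():
    per_copy, fixed = fit_per_copy(costs)
    predicted = 3 * per_copy + fixed  # compare with the 3-copy column
    print(f"{itype}: ${per_copy:,}/copy + ${fixed:,} fixed; "
          f"3 copies ${predicted:,} (table ${costs[2]:,})")
```

For or1 the predicted three-copy cost is $14,018,040, one dollar off the table's figure because of rounding.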

Reserved Instances

OpenSearch Service supports Reserved Instances (RIs) with 1-year and 3-year terms, with no up-front cost (NURI), partial up-front cost (PURI), or all up-front cost (AURI). All reserved instance commitments lower cost, and 3-year, all up-front RIs provide the deepest discount. Applying a 3-year AURI discount to Fizzywig's workload gives the annual costs shown in the following table.

Instance type	Primary (1 copy)	Primary + replica (2 copies)	Primary + 2 replicas (3 copies)
im4gn.16xlarge.search	$1,909,076	$3,818,152	$5,727,228
or1.16xlarge.search	$3,413,371	$6,826,742	$10,240,113
r6g.12xlarge.search	$3,268,074	$6,536,148	$9,804,222

RIs provide a straightforward way to save cost, with no code or architecture changes. Adopting RIs for this workload brings the im4gn cost for three copies down to $5.7 million, and the one-copy cost for or1 instances down to $3.4 million.
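Dividing the two tables shows how much of the on-demand bill the 3-year AURIs remove for each instance type. These percentages are computed on the table totals, so treat them as effective discounts for this workload, not as published RI rates. The much larger effective discount for im4gn helps explain why it comes out cheapest once RIs are applied.

```python
# Annual one-copy costs (USD) from the on-demand and 3-year AURI tables above.
ON_DEMAND_1 = {
    "im4gn.16xlarge.search": 3_977_145,
    "or1.16xlarge.search": 4_691_952,
    "r6g.12xlarge.search": 4_420_585,
}
AURI_3YR_1 = {
    "im4gn.16xlarge.search": 1_909_076,
    "or1.16xlarge.search": 3_413_371,
    "r6g.12xlarge.search": 3_268_074,
}


def blended_discount(on_demand: float, reserved: float) -> float:
    """Fraction of the on-demand total removed by the reservation."""
    return 1 - reserved / on_demand


for itype, od in ON_DEMAND_1.items():
    print(f"{itype}: {blended_discount(od, AURI_3YR_1[itype]):.1%}")
# im4gn.16xlarge.search: 52.0%
# or1.16xlarge.search: 27.3%
# r6g.12xlarge.search: 26.1%
```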

Amazon S3 backed storage deployments

The preceding deployments are useful as a baseline and for comparison. In practice, you would choose one of the Amazon S3 backed storage options to keep costs manageable.

OpenSearch Service UltraWarm instances store all data in Amazon S3 and use UltraWarm nodes as a hot cache on top of this full dataset. UltraWarm works best for interactive querying of data in small time-bound slices, such as running multiple queries against 1 day of data from 6 months ago. Evaluate your access patterns carefully and consider whether UltraWarm's cache-like behavior will serve you well. UltraWarm first-query latency scales with the amount of data you need to query.

When designing an OpenSearch Service domain for UltraWarm, you need to decide on your hot retention window and your warm retention window. Most OpenSearch Service customers use a hot retention window of 7–14 days, with warm retention making up the rest of the full retention period. For our Fizzywig scenario, we use 14 days of hot retention and 351 days of UltraWarm retention. We also use a two-copy (primary and one replica) deployment in the hot tier.
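One way to enforce these windows automatically is an Index State Management (ISM) policy that moves indexes to UltraWarm at the end of the hot window and deletes them at the end of the retention period. The sketch below only builds the policy body as a Python dict. The `warm_migration` action, the `ism_template` field, and the `_plugins/_ism/policies/<policy_id>` endpoint you would `PUT` it to reflect our understanding of the OpenSearch and OpenSearch Service ISM documentation, so check them against your domain's version. The `logs-*` index pattern is hypothetical.

```python
import json


def hot_warm_delete_policy(hot_days: int, total_days: int,
                           index_pattern: str = "logs-*") -> dict:
    """Build an ISM policy body: hot for `hot_days`, UltraWarm until
    `total_days`, then delete. Ages are measured from index creation."""
    if not 0 < hot_days < total_days:
        raise ValueError("need 0 < hot_days < total_days")
    return {
        "policy": {
            "description": f"{hot_days}d hot, UltraWarm to {total_days}d, then delete",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [{"state_name": "warm",
                                     "conditions": {"min_index_age": f"{hot_days}d"}}],
                },
                {
                    "name": "warm",
                    # Migrates the index from hot data nodes to UltraWarm.
                    "actions": [{"warm_migration": {}}],
                    "transitions": [{"state_name": "delete",
                                     "conditions": {"min_index_age": f"{total_days}d"}}],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to new indexes matching the pattern.
            "ism_template": [{"index_patterns": [index_pattern], "priority": 100}],
        }
    }


# Fizzywig: 14 days hot + 351 days in UltraWarm = 365 days total.
policy = hot_warm_delete_policy(14, 14 + 351)
print(json.dumps(policy, indent=2))
```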

The 14-day hot storage need (4.125 TB per day × 14 days × 2 copies) is 115.5 TB. You can deploy six instances of any of the three instance types to support this indexing and storage. UltraWarm stores a single replica in Amazon S3 and doesn't need the additional operating overhead, so your 351-day storage need is about 1,158 TB (3.3 TB of index data per day × 351 days), or roughly 1.16 PB. You can support this with 58 UltraWarm1.large.search instances. The following table gives the total cost for this deployment, with 3-year AURIs for the hot tier. The or1 instances' Amazon S3 cost is rolled into the S3 column.

Instance type	Hot	UltraWarm	S3	Total
im4gn.16xlarge.search	$220,278	$1,361,654	$333,590	$1,915,523
or1.16xlarge.search	$337,696	$1,361,654	$418,136	$2,117,487
r6g.12xlarge.search	$270,410	$1,361,654	$333,590	$1,965,655
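The storage figures behind this deployment (115.5 TB hot, about 1,158 TB in UltraWarm, 58 UltraWarm nodes) and the 13-node pattern for cold storage described next can be reproduced with the arithmetic below. The 20 TiB per-node capacity for UltraWarm1.large.search is an assumption taken from the UltraWarm documentation at the time of writing. Like the figures in this post, the sketch does not distinguish TB from TiB.

```python
import math

SOURCE_TB_PER_DAY = 3.0
INDEX_OVERHEAD = 1.10          # index ~10% larger than source data
HOT_OPERATING_OVERHEAD = 1.25  # extra headroom on hot data nodes only
# Assumed per-node storage of UltraWarm1.large.search (20 TiB per the docs).
ULTRAWARM_LARGE_CAPACITY = 20.0


def hot_storage_tb(hot_days: int, copies: int = 2) -> float:
    """Hot tier: source data + index overhead + operating overhead, all copies."""
    return (SOURCE_TB_PER_DAY * INDEX_OVERHEAD * HOT_OPERATING_OVERHEAD
            * hot_days * copies)


def warm_storage_tb(warm_days: int) -> float:
    """UltraWarm keeps one copy in Amazon S3 with no operating overhead."""
    return SOURCE_TB_PER_DAY * INDEX_OVERHEAD * warm_days


def ultrawarm_nodes(warm_days: int) -> int:
    """UltraWarm nodes needed to hold `warm_days` of data, rounded up."""
    return math.ceil(warm_storage_tb(warm_days) / ULTRAWARM_LARGE_CAPACITY)


print(f"{hot_storage_tb(14):.1f} TB hot")          # 115.5 TB hot
print(f"{warm_storage_tb(351):,.1f} TB warm")      # 1,158.3 TB warm
print(ultrawarm_nodes(351))                        # 58
print(ultrawarm_nodes(76))                         # 13 (14 hot / 76 warm / 275 cold)
```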

You can further reduce cost by moving data to UltraWarm cold storage. Cold storage reduces cost by reducing the availability of the data: to query the data, you must issue an API call to reattach the target indexes to the UltraWarm tier. A typical pattern for 1 year of data keeps 14 days hot, 76 days in UltraWarm, and 275 days in cold storage. Following this pattern, you use 6 hot nodes and 13 UltraWarm1.large.search nodes. The following table shows the cost to run Fizzywig's 3 TB daily workload. The or1 cost for Amazon S3 usage is rolled into the UltraWarm nodes + S3 column.

Instance type	Hot	UltraWarm nodes + S3	Cold	Total
im4gn.16xlarge.search	$220,278	$377,429	$261,360	$859,067
or1.16xlarge.search	$337,696	$461,975	$261,360	$1,061,031
r6g.12xlarge.search	$270,410	$377,429	$261,360	$909,199

By using Amazon S3 backed storage options, you can reduce cost even further. With hot, UltraWarm, and cold storage, annual totals range from about $859,000 (im4gn) to a maximum of about $1.06 million (or1).
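Reattaching cold indexes, mentioned earlier in this section, is a single REST call. The sketch below builds that request with Python's standard library but does not send it. The `_cold/migration/_warm` path and the `indices` body reflect our reading of the UltraWarm cold storage documentation, so confirm them for your domain version. The endpoint and index pattern are hypothetical, and a real call must also be authenticated (for example, with SigV4 signing), which the sketch leaves out.

```python
import json
import urllib.request


def reattach_cold_request(endpoint: str, index_pattern: str) -> urllib.request.Request:
    """Build, but do not send, a request that returns cold indexes to UltraWarm.

    Sending it requires authentication (for example, SigV4 signing or
    fine-grained access control credentials), which this sketch omits.
    """
    body = json.dumps({"indices": index_pattern}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/_cold/migration/_warm",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical domain endpoint and index pattern.
req = reattach_cold_request(
    "https://search-fizzywig-logs.us-east-1.es.amazonaws.com", "logs-2024.01.*")
print(req.get_method(), req.full_url)
```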

OpenSearch Service zero-ETL for Amazon S3

When you use OpenSearch Service zero-ETL for Amazon S3, you keep all your secondary and older data in Amazon S3. Secondary data is the higher-volume data that has lower value for direct inspection, such as VPC Flow Logs and WAF logs. For these deployments, you keep the majority of infrequently queried data in Amazon S3 and only the most recent data in your hot tier. In some cases, you sample your secondary data and keep a percentage of it in the hot tier as well. Fizzywig decides to keep 7 days of all their data in the hot tier. They will access the rest with direct query (DQ).

When you use direct query, you can store your data in JSON, Parquet, and CSV formats. Parquet format is optimal for direct query and provides about 75% compression on the data. Fizzywig uses Amazon OpenSearch Ingestion, which can write Parquet format data directly to Amazon S3. Their 3 TB of daily source data compresses to 750 GB of daily Parquet data. OpenSearch Service maintains a pool of compute units for direct query. You are billed hourly for these OpenSearch Compute Units (OCUs), which scale based on the amount of data you access. For this discussion, we assume that Fizzywig will have some debugging sessions and run 50 queries daily over one day's worth of data (750 GB). The following table summarizes the annual cost to run Fizzywig's 3 TB daily workload, with 7 days hot and 358 days in Amazon S3.

Instance type	Hot	DQ cost	or1 S3	Raw data S3	Total
im4gn.16xlarge.search	$220,278	$2,195	$0	$65,772	$288,245
or1.16xlarge.search	$337,696	$2,195	$84,546	$65,772	$490,209
r6g.12xlarge.search	$270,410	$2,195	$0	$65,772	$338,377

That's quite a journey! Fizzywig's cost for logging has come down from as high as $14 million annually to as low as $288,000 annually using direct query with zero-ETL from Amazon S3. The starting cost is more than 48 times the final cost (over 4,800% of it), a reduction of about 98%!
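The Amazon S3 side of the direct query scenario is simple arithmetic as well. This sketch takes the 75% Parquet compression figure from this post as an assumption (actual ratios depend on your data) and computes the daily Parquet volume and the steady-state footprint in Amazon S3 once the 358-day window is full. It does not attempt to reproduce the S3 or OCU prices in the table.

```python
PARQUET_COMPRESSION = 0.75   # Parquet ~75% smaller than source, per this post
HOT_DAYS, S3_DAYS = 7, 358   # Fizzywig's split of a 365-day retention


def parquet_tb_per_day(source_tb_per_day: float) -> float:
    """Daily Parquet volume written to Amazon S3, in TB."""
    return source_tb_per_day * (1 - PARQUET_COMPRESSION)


def s3_footprint_tb(source_tb_per_day: float, days: int = S3_DAYS) -> float:
    """Parquet data held in Amazon S3 once the retention window is full, in TB."""
    return parquet_tb_per_day(source_tb_per_day) * days


print(parquet_tb_per_day(3.0))   # 0.75 (750 GB per day)
print(s3_footprint_tb(3.0))      # 268.5 (TB across 358 days)
```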

Sampling and compression

In this post, we have looked at a single data footprint so that you can focus on data size and on the trade-offs you can make depending on how you want to access that data. OpenSearch has additional features that can further change the economics by reducing the amount of data you store.

For logs workloads, you can use OpenSearch Ingestion sampling to reduce the volume of data you send to OpenSearch Service. Sampling is appropriate when your data as a whole has statistical characteristics such that a part can represent the whole. For example, if you're running an observability workload, you can often send as little as 10% of your data and still get a representative sample of the traces of request handling in your system.
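The sketch below illustrates trace-level sampling in plain Python. It is not how OpenSearch Ingestion implements sampling. Instead, it shows one common technique: hashing the trace ID so that every span of a sampled request is kept or dropped together, and sampled traces stay whole. The function name, the 10% default, and the synthetic span data are illustrative.

```python
import hashlib


def keep_trace(trace_id: str, percent: float = 10.0) -> bool:
    """Keep roughly `percent`% of traces, deciding once per trace ID.

    Hashing the trace ID (instead of drawing a random number per span) makes
    the decision deterministic, so all spans of a request share it.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


# Hypothetical spans: 10,000 traces with 3 spans each, as (trace_id, span_name).
spans = [(f"trace-{i // 3}", f"span-{i % 3}") for i in range(30_000)]
kept = [s for s in spans if keep_trace(s[0])]
print(f"kept {len(kept) / len(spans):.1%} of spans")
```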

You can also apply a compression algorithm to your workloads. OpenSearch Service recently launched support for Zstandard (zstd) compression, which can bring higher compression rates and lower decompression latencies compared with the default and best_compression codecs.
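The codec is chosen through index settings. The sketch below builds an index-creation body that selects Zstandard. The codec names `zstd` and `zstd_no_dict` and the `index.codec.compression_level` setting (1–6, default 3) reflect our understanding of the OpenSearch index codec documentation, and support depends on your domain's OpenSearch version. The index name in the comment is hypothetical.

```python
import json


def zstd_index_body(compression_level: int = 3, use_dictionary: bool = False) -> dict:
    """Index-creation body that selects a Zstandard codec.

    `index.codec` is a static setting, so set it when you create the index
    (or in an index template) rather than on a live index.
    """
    if not 1 <= compression_level <= 6:
        raise ValueError("compression_level must be between 1 and 6")
    codec = "zstd" if use_dictionary else "zstd_no_dict"
    return {
        "settings": {
            "index.codec": codec,
            "index.codec.compression_level": compression_level,
        }
    }


# Body for a hypothetical PUT /logs-2024.06.01 request.
print(json.dumps(zstd_index_body(), indent=2))
```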

Conclusion

With OpenSearch Service, Fizzywig was able to balance cost, latency, throughput, durability and availability, data retention, and preferred access patterns. They cut the cost of their logging solution by a factor of more than 48, and management was thrilled.

Across the board, im4gn comes out with the lowest absolute dollar amounts. However, there are a couple of caveats. First, or1 instances can provide higher throughput, especially for write-intensive workloads, which may mean additional savings through a reduced need for compute. Additionally, with or1's added durability, you can maintain availability and durability with less replication, and therefore at lower cost. Another factor to consider is RAM: the r6g instances provide more RAM, which speeds up queries for lower latency. When coupled with UltraWarm, and with different hot/warm/cold ratios, r6g instances can also be an excellent choice.

Do you have a high-volume logging workload? Have you benefited from some or all of these methods? Let us know!

About the Author

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have vector, search, and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon's career as a software developer included four years of coding a large-scale ecommerce search engine. Jon holds a Bachelor of Arts from the University of Pennsylvania, and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.