This is a quick write-up of erasure coding and how to use it with our RadosGW. First, let's look at the default erasure coding profile on Ceph, understand it, and then create our own.
Erasure coding profiles break down using the following formula.
- n = k + m
k = the number of data chunks into which the original object is divided. For instance, in the default profile where k = 2, a 10KB object will be divided into k chunks of 5KB each.
m = the number of coding chunks, i.e. additional chunks that set the reliability level. If there are 2 coding chunks, 2 OSDs can be out without losing data.
n = the total number of chunks, i.e. the sum of k and m.
In our default profile above, this means we have 3 total chunks (2 + 1 = 3) and can lose m chunks, which is 1 here. Lose any more than that and it's Bad News Bears.
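To make the k, m, and n relationship concrete, here is a minimal Python sketch of the profile math described above. The `ECProfile` class and its method names are made up for illustration and are not part of Ceph. The chunk size calculation is simplified and ignores striping and padding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECProfile:
    """Hypothetical helper for illustration only, not a Ceph API."""
    k: int  # number of data chunks
    m: int  # number of coding chunks

    @property
    def n(self) -> int:
        # Total chunks written per object: n = k + m
        return self.k + self.m

    def data_chunk_size(self, object_size_kb: float) -> float:
        # Simplified: the object is split evenly across the k data chunks
        return object_size_kb / self.k

    def survives(self, lost_chunks: int) -> bool:
        # Data is still recoverable as long as no more than m chunks are lost
        return lost_chunks <= self.m


default_profile = ECProfile(k=2, m=1)
print(default_profile.n)                    # 3
print(default_profile.data_chunk_size(10))  # 5.0
print(default_profile.survives(1))          # True
print(default_profile.survives(2))          # False
```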
The main advantage is that your raw data footprint is smaller than it would be if you replicated your data by a factor of 3.
For example purposes, let's use a 100GB file to determine our final raw data footprint using erasure coding, with the following 2 formulas and our default profile:
- ratio = k / n - (~0.667 = 2/3)
- total_raw = file_size * (1/ratio) - (150GB = 100GB * (3/2))
Our raw footprint ends up being 150GB, instead of 300GB if replicated 3 times.
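If you want to try the two formulas with other values of k and m, here is a small Python sketch. The function names are my own, and it uses exact fractions so the ratio is never rounded before dividing. With k = 2 and m = 1, a 100GB file works out to exactly 150GB raw.

```python
from fractions import Fraction


def ec_raw_footprint_gb(file_size_gb: int, k: int, m: int) -> float:
    # ratio = k / n, where n = k + m
    ratio = Fraction(k, k + m)
    # total_raw = file_size * (1 / ratio)
    return float(file_size_gb / ratio)


def replicated_raw_footprint_gb(file_size_gb: int, copies: int = 3) -> float:
    # Plain replication stores one full copy per replica
    return float(file_size_gb * copies)


print(ec_raw_footprint_gb(100, k=2, m=1))  # 150.0
print(replicated_raw_footprint_gb(100))    # 300.0
```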
Mainly speed. Erasure coding takes time to process the chunks, and the more chunks you have, the more resources and time it takes to process them. Most of the time, though not always, erasure coding will be slower than replication. A good balance between size, reliability, and performance is to set
So let's create one for our RGW pool using
crush-failure-domain can be set to
rack etc etc
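As a rough illustration of what creating a profile looks like, the Python sketch below only builds and prints a `ceph osd erasure-code-profile set` command line, it does not run anything. The profile name `rgw-ec-profile` and the k, m, and failure domain values are placeholders I picked for the example, not the values used in this post.

```python
import shlex


def ec_profile_set_cmd(name: str, k: int, m: int,
                       failure_domain: str = "host") -> list:
    # Builds the argv for `ceph osd erasure-code-profile set`.
    # All values passed in here are placeholders for illustration.
    return [
        "ceph", "osd", "erasure-code-profile", "set", name,
        f"k={k}",
        f"m={m}",
        f"crush-failure-domain={failure_domain}",
    ]


cmd = ec_profile_set_cmd("rgw-ec-profile", k=2, m=1, failure_domain="host")
# Print the command for review instead of executing it
print(shlex.join(cmd))
```

Review the printed command and adjust the values for your own cluster before running it by hand.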
CAUTION - Take it from me: DO NOT convert any pool other than default.rgw.buckets.data. I converted default.rgw.buckets.index to EC and spent 5 hours troubleshooting before I found that the conversion was the problem. See below for examples of errors that occurred because of this.
Sadly you can't (as far as I know) just switch a pool over to use erasure coding. What we can do is run a mini script that creates a new pool with erasure coding set, copies the old pool to the new pool, renames the old pool, and then renames the new pool to the old pool's name. Sound confusing? Yeah, I agree, but once you get it, it clicks and makes sense. This is how I like to convert pools, but as always, try this in a test environment before doing anything like it in production.
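The four steps above can be sketched as a dry-run plan in Python. This is not the original script. It assumes `rados cppool` is used for the copy step, and the `.new` and `.old` suffixes, the profile name, and the PG count are all placeholders. It only prints the commands so you can review them first, and it may be missing steps your cluster needs.

```python
import shlex


def conversion_plan(pool: str, profile: str, pg_num: int,
                    new_suffix: str = ".new",
                    old_suffix: str = ".old") -> list:
    new_pool = pool + new_suffix  # temporary name for the EC pool (placeholder)
    old_pool = pool + old_suffix  # name the original pool is parked under (placeholder)
    return [
        # 1. Create a new pool with erasure coding set
        ["ceph", "osd", "pool", "create", new_pool,
         str(pg_num), str(pg_num), "erasure", profile],
        # 2. Copy the old pool to the new pool (assumed copy method)
        ["rados", "cppool", pool, new_pool],
        # 3. Rename the old pool out of the way
        ["ceph", "osd", "pool", "rename", pool, old_pool],
        # 4. Rename the new pool to the old pool's name
        ["ceph", "osd", "pool", "rename", new_pool, pool],
    ]


plan = conversion_plan("default.rgw.buckets.data", "rgw-ec-profile", 64)
for step in plan:
    # Dry run: print each command instead of executing it
    print(shlex.join(step))
```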
Create a user, or use an existing one, and try to create a bucket or file. You should be able to create files like normal.
More than likely you are going to see these errors when you set any pool other than default.rgw.buckets.data to EC. This is easy enough to fix by essentially renaming everything back to the way it was before running the conversion script.
This example is from me converting default.rgw.buckets.index to EC. I was able to read all files just fine, but I could not write or create anything.
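Renaming everything back can be sketched the same way. The Python snippet below is a dry run that assumes the original pool was parked under a `.old` suffix during conversion and that the broken EC pool gets moved aside under a `.ec-failed` suffix. Both suffixes are placeholders, so swap in whatever names your own conversion used.

```python
import shlex


def rollback_plan(pool: str, old_suffix: str = ".old",
                  failed_suffix: str = ".ec-failed") -> list:
    return [
        # Move the erasure coded pool out of the way (placeholder suffix)
        ["ceph", "osd", "pool", "rename", pool, pool + failed_suffix],
        # Give the original pool its name back
        ["ceph", "osd", "pool", "rename", pool + old_suffix, pool],
    ]


rollback = rollback_plan("default.rgw.buckets.index")
for step in rollback:
    # Dry run: print each command instead of executing it
    print(shlex.join(step))
```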
I hope this helps out peeps and makes life a little easier. If this helped out even one admin, then it was well worth it.
Thanks for reading and feel free to contact me at firstname.lastname@example.org!