-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement stickiness for the duration of master_ttl
via cookies
#194
Conversation
💥 Thanks @rosa |
@bleonard not yet! I have only tested it locally, but I'll let you know if we end up using it in production, I'll work on that this week 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great @rosa ! I tried to be thorough to show respect to the great work. Very exciting!
The main thing is "Full Request Sticking" which I mention a few times - that we should stick the whole request if it's a long one. Right now, I think the Time.now
stuff might cause subtle issues in that world.
README.md
Outdated
|
||
```ruby | ||
Makara::Cache.store = MyRedisCacheStore.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to still support this store=
setter and give a deprecation warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 👍
README.md
Outdated
```ruby | ||
Makara::Cache.store = MyRedisCacheStore.new | ||
``` | ||
To handle persistence of context across requests in a Rack app, makara provides a middleware. It lays a cookie named `_mkra_ctxt` which contains the current context. If the next request is executed before the cookie expires, that given context will be used. If something occurs which naturally requires master on the second request, the context is updated and stored again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You changed to cookie name to _mkra_stck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooops, yes I did! I renamed it later on, thinking that using the same name could cause issues with already set cookies, and forgot to change this reference.
README.md
Outdated
|
||
```ruby | ||
ctx = Makara::Context.generate # or any unique sha | ||
Makara::Context.set_current ctx | ||
Makara::Context.release_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should they be able to release a single one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you did. Just not in the readme. That's fine. Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True! I need to add it to the readme 😅
## Todo | ||
|
||
* Cookie based cache store? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💥
lib/makara/context.rb
Outdated
seed ||= "#{Time.now.to_i}#{Thread.current.object_id}#{rand(99999)}" | ||
Digest::MD5.hexdigest(seed) | ||
end | ||
IDENTIFIER = '_mkra_stck' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in our app, we usually .freeze
these. it's a habit now. have you seen benefits there?
lib/makara/proxy.rb
Outdated
def previously_stuck_to_master? | ||
@sticky && Makara::Context.previously_stuck?(@id) | ||
def stuck_to_master? | ||
@sticky && !@skip_sticking && Makara::Context.stuck?(@id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use should_stick?
somehow? Or DRY that up?
It's just the two variables but seems semantically correct.
should_stick?("stuck_to_master_check", []) && Makara::Context.stuck?(@id)
lib/makara/proxy.rb
Outdated
end | ||
|
||
|
||
# if we are configured to be sticky and we aren't bypassing stickiness | ||
# If we are configured to be sticky and we aren't bypassing stickiness, | ||
# (method and args don't matter) | ||
def should_stick?(method_name, args) | ||
@sticky && !@skip_sticking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we sent this method name and args here. probably so someone more clever than me could override them in some way. Not sure if that's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh! I forgot to comment on this! It's useful when implementing other kind of proxy, like for the AR adapter: https://github.com/taskrabbit/makara/blob/c6b158e847468379da8cc2eb69aee455bef386fb/lib/active_record/connection_adapters/makara_abstract_adapter.rb#L167-L171
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, that's pretty key. I wonder if the above should use it with fake arguments or DRY up that check or leave it as-is. Unclear to me.
lib/makara/proxy.rb
Outdated
end | ||
|
||
|
||
def stick_to_master(method_name, args, write_to_cache = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, let's keep write_to_cache
.
end | ||
|
||
def hijacked? | ||
@hijacked | ||
end | ||
|
||
def stick_to_master!(write_to_cache = true) | ||
@master_context = Makara::Context.get_current | ||
Makara::Context.stick(@master_context, @id, @ttl) if write_to_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This write_to_cache
thing seems important, but it really would complicate your Context
checks.
Now that I think about it, things get a little weird in the new model with really long requests. It seems against the expected behavior to stick
in a request, it be a long request, and then unstick later in that request. Let's call this "Full Request Sticking" and I'll put a note above in the context.
Here, let's keep the write_to_cache
variable and pass it to the Context
.
lib/makara/context.rb
Outdated
end | ||
|
||
def stuck?(config_id) | ||
data[config_id] && !expired?(data[config_id]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For "Full Request Sticking" concept, this method would look more like
return false unless data[config_id]
return true if data[config_id][:ttl] # because it was asked to stick in this request
expired?(data[config_id][:expires_at])
@bleonard thanks so much for your thorough review, I really appreciate it! I'll address each comment separately and will push changes soon. You made a very good point about long requests, I didn't think about that. What you propose with "full request sticking" makes sense to me. Unsticking in the middle of the request doesn't seem very problematic to me (if the Thanks again! 🙇♀️ |
Great job @rosa, look forward to seeing the updates. |
3ad739e
to
5ef0bb8
Compare
Ok! This took me a bit longer than I thought 😅I think I have addressed now all the issues brought up. I have also done a fair bit of refactoring in my previous code. The main changes with respect to the original implementation are: Proxy instead of ConfigurationStop using the concept of configuration id, or sticking a configuration because it's confusing and doesn't make a lot of sense. I started using this terminology because in Full request stickingMake TTL to "kick in" at the end of the request, when we move on to the next context and persist it. This is the "full request sticking" we talked about: if we stick to Sticking without persistingSupport Apart from this I implemented the other minor issues you noticed, like showing a warning for I still need to test this with our own app, and see if we can get it to production 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great @rosa!
Just a few notes on possible edge cases and cookie lifecycle.
README.md
Outdated
|
||
Makara handles stickyness by keeping track of a context (sha). In a multi-instance environment it persists a context in a cache. If Rails is present it will automatically use Rails.cache. You can provide any kind of store as long as it responds to the methods required in [lib/makara/cache.rb](lib/makara/cache.rb). | ||
Makara handles stickiness by keeping track of which proxies are stuck at any given moment. The context is basically a hash of proxy ids and the timestamp until which they are stuck. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds like we create a hash
(like MD5 sort of thing) of those things. Maybe "The context keeps track of the expiration timestamp for each proxy id."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I was influenced by the Hash
class name, perhaps naming it a dictionary or a mapping is better.
README.md
Outdated
``` | ||
|
||
It'll keep the proxy stick to master for the current request, and if `persist = true` (default), it'll be also stored in the context for subsequent requests, keeping the proxy stuck up to the duration of `master_ttl` configured for the proxy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"stuck"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 Thanks!
## Todo | ||
|
||
* Cookie based cache store? | ||
* Support for providing context as query param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe path/here?makara_stick=proxy_one|proxy_two
Not sure if need timestamps. In that case could just be the whole cookie string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, something like that! Either the whole cookie string, or just one, and merge it together with the cookie string 🤔
lib/makara/context.rb
Outdated
# Keeps track of the current and previous context (hexdigests) | ||
# If a new context is needed it can be generated via Makara::Context.generate | ||
|
||
# Keeps track of the current stickiness state for different Makara configurations | ||
module Makara |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
lib/makara/context.rb
Outdated
end | ||
|
||
def stage(proxy_id, ttl) | ||
staged_data[proxy_id] = ttl.to_f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll want to do a max
here of the ttl
- imagine it did a write_to_context=true
and then write_to_context=false
in the same request. This would end up at zero but probably should have ended up persisting.
Maybe:
[staged_data[proxy_id].to_f, ttl.to_f].max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oohh, good point! I'll add a test for this! Not only in the same request, but also if it came from the previous request and it still hasn't expired.
lib/makara/context.rb
Outdated
# the current store | ||
def commit | ||
release_expired | ||
store_staged_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to store_staged_data
before the release_expired
so that the 0
values don't get written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... the 0
values won't get written in any case, since we'll only store proxy_id, ttl
pairs with ttl > 0
. I don't see any other reason why we should check for expiration of data staged in this request, in theory no staged data should be expired because we'll be comparing Time.now + ttl
(with ttl > 0
) with Time.now
, so I believe this is the right order. Am I missing anything?
lib/makara/context.rb
Outdated
|
||
# Pairs of {proxy_id}:{timestamp}, separated by "|" | ||
# proxy_id1:1518270031.3132212|proxy_id2:1518270030.313232 .. | ||
def parse(cookie_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is used anymore (or there is a duplicate in the new Cookie
class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooops, yes! I moved it to Cookie
and forgot to remove it from here! 😅
end | ||
|
||
def store(context_data, headers, options = {}) | ||
unless context_data.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I follow everything correctly, this could be nil
if there were no changes to the cookie (@was_dirty == false
) - I'm not clear on the situation. Do you have to set the header here to keep the cookie around? That is it was a ttl what was longer than the request so there were no changes but there still needs to be a cookie. In other words, does it pass through by default? In which case, we do we delete the cookie in some case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be nil if there were no changes to the cookie
Correct!
Do you have to set the header here to keep the cookie around?
No! If you don't send anything the cookie will be kept as it was. That was my idea of returning nil
here: there is no next context => then there is no cookie sent.
In which case, we do we delete the cookie in some case?
We do! When the next context exists and it's empty, we send a cookie with max-age=0
and expires=<current time>
. That's a way of deleting the cookie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. thanks.
lib/makara/cookie.rb
Outdated
now = Time.now | ||
|
||
cookie[:max_age] = if context_data.any? | ||
(context_data.values.max - now.to_f).ceil + 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to make this 5
a BUFFER_TIME
constant or something to be clear what it is semantically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, much better, great suggestion.
end | ||
|
||
|
||
def stick_to_master(method_name, args, write_to_cache = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to pass through that write_to_cache=true
(now persist=true
) thing here to stick_to_master!
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not necessary here, because in all cases we call stick_to_master
, which is private to the proxy, it's because we came across some action that made us to use master
, and in that case we always want to persist the value for the master_ttl
duration.
Thanks for the second thorough review @bleonard! I'm quite enjoying working on this 😄 |
Last set of changes pushed. Last week at the end I couldn't get to try this with our app, I'll see if I can do that this week 💪 |
lib/makara/config_parser.rb
Outdated
|
||
id.gsub(/[\|:]/, '').tap do |sanitized_id| | ||
if sanitized_id.size != id.size | ||
Logging::Logger.log "Proxy id '#{id}' changed to '#{sanitized_id}'", :warn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing. Might want to say "Makara" here so they have more context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. For some reason I had assumed without checking that the logger would tag the lines with [Makara]
, but I realised it was added manually on exception handling. What I have done instead is to add this by default to all lines logged through Makara::Logging::Logger.log
: 51d32a5 What do you think?
end | ||
|
||
def store(context_data, headers, options = {}) | ||
unless context_data.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. thanks.
@rosa I took some time today to get master to pass Travis. If you want to rebase on that, feel free. |
51d32a5
to
1a1b2b6
Compare
Excellent! 🎉I had looked briefly at the Postgres related failures but only thing I did was to specify the BTW, I started testing this PR on our app last week, and if everything goes well, we should be running it in production this week 🤞 |
Hey @bleonard, sorry for the delay! Testing this in our app was delayed because of other unrelated infrastructure tasks, but I finally got there last week. We've been using this version of Makara in production for 4 days already, and no issues so far 😄 |
That's great! |
I tagged that version as |
Instead of keeping track of two MD5 hashes that allow us to be stuck to master for 1 or 2 requests at most, keep whether we are currently stuck or not, for each configuration. The context represents the current stickiness state. It knows whether we are stuck to master for each of the different Makara configurations we have defined. It's persisted between requests via an only cookie we set, whose max age depends on stickiness duration for each of the configurations we store at any given moment.
These were handled directly by the middleware before, but now that cookie setting is done in Context, support for this should be moved here.
We'll rely exclusively on cookie store for the context.
It's always set to the current context when the pool is intialised, and then compared against the current context as well. The only case in which it could be relevant is when the context is forced to be a new one manually, in the middle of a request. This will be handled in a different way.
We don't need to track any context hashes anymore, just whether we are currently stuck in this request or not.
… request It's enough with being stuck until the cache expires or all connections are released (which will be added later). Releasing all connections no longer depends on a specific context, so all we need to rely on is the cache.
The id defined in the configuration for a proxy was being referred as `config_id` in several places, so I had kept it, but it was certainly confusing, it wasn't easy to know what I was talking about. That id really identifies the proxy, so there is no point on using a misleading name for it.
And while there, increase the buffer for cookie's max-age to 5 seconds instead of 1. It shouldn't harm, and it can be easily overridden.
This ensures support on old IE versions, while for other browsers it'll be simply ignored in favour of `max-age`.
Now context doesn't know anything about cookies, only about the hash with the data.
…uest That is, when we stick a proxy to master in a request and the request is very long, if we set the expiration time based on the TTL right at the moment of sticking, we might unstick before the request even ends. This is probably not the intention, so instead of setting the expiration time right when the proxy is stuck, we wait until the context is committed, and at that time we move the newly stuck proxies to the storage area from the staging area.
That is, Proxy#stick_to_master!(false) should stick the proxy to master for the current request but not beyond that. We shouldn't store that in the cookie for future requests.
These characters are used as separators in the context cookie.
Freeze time when starting the request and use always that time to check whether some entry has expired, instead of considering it expired at any point during the request. Also, use always the max TTL seen for any staged or stored entry.
1a1b2b6
to
5b1f0a7
Compare
Done! Basecamp 3 is now running version |
💥 I think I'll release v0.3.10 as-is and then v0.4.0 with this PR. |
\o/ Thanks! I think the only (obvious) breaking change is that anything depending on the previous context cookie won't work, and that |
I suppose, though I never considered someone in Ruby or JS reading that cookie. There was some Ruby < 2.3 errors with the deprecation string that I fixed on master. I'm about to embark on a pretty serious sabbatical and this is a big change, so I'll let people upgrade to 0.3.10 and/or try this tag out and push this up in May. |
Ouch, sorry about that, and thanks for fixing them 🙏
Perfect 👍 Enjoy your sabbatical and thanks so much for all your work and help with this! 🎉 |
We've been using the v0.4.0 tag for a while and things seems good. Congratulations @rosa - this is a great release. 🎉 |
Thanks @bleonard, and welcome back! We've also been using 0.4.0 since 20th March with no issues 😄 Thanks again for all your help! |
This pull request significantly changes how Makara's context and stickiness behaves. It implements what is described in this comment, and can be summarised as:
master
, is maintained for the whole duration ofmaster_ttl
, as many as request that be. I noticed another old comment in which this behaviour was assumed by another person, as a more intuitive way of mitigating replication lag.master
and when that stickiness expires. This is what is considered context now: a map of configuration ids and their stickiness expiration times.Todo:
http://mysite.com/path/to/resource?_mkra_ctxt=#{context}
.This is a very big change in code and in behaviour, so I totally understand any reluctance to merge it 😅 If this is too radical, I'm happy to get to an intermediate solution, although I do believe that the change is behaviour makes more sense as the idea of replication lag mitigation goes.