Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message queue of inter_dc_query too full #402

Open
peterzeller opened this issue Nov 12, 2019 · 2 comments
Open

Message queue of inter_dc_query too full #402

peterzeller opened this issue Nov 12, 2019 · 2 comments
Labels

Comments

@peterzeller
Copy link
Member

This happens on the Kaiserslautern-Paris Testcluster.

The Paris server (antidote1) is much slower:

instance Update operations by client Updates received from other Dcs Σ
antidote1 87173 34349 121522
antidote2 323533 172552 496085
antidote3 264443 203582 468025
Σ 675149

As you can see, antidote1 is missing a lot of updates.

Looking at the processes with the longest message queues we get:

[{746908,
  [{registered_name,inter_dc_query},
   {current_function,{erlzmq,send,3}},
   {initial_call,{proc_lib,init_p,5}},
   {status,running},
   {message_queue_len,746908},
   {links,[<35401.1069.0>]},
   {dictionary,[{'$ancestors',[antidote_sup,<35401.1068.0>]},
                {'$initial_call',{inter_dc_query,init,1}}]},
   {trap_exit,false},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.1067.0>},
   {total_heap_size,28315631},
   {heap_size,1598},
   {stack_size,26},
   {reductions,5226011},
   {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                        {min_bin_vheap_size,46422},
                        {min_heap_size,233},
                        {fullsweep_after,10},
                        {minor_gcs,4}]},
   {suspending,[]}],
  <35401.1082.0>},
 {9947,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9947},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,
                              uniform => #Fun<rand.6.10897371>,
                              uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1},
                            [278878836590975983|200032599015734561]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805641},
   {heap_size,318187},
   {stack_size,43},
   {reductions,49383247},
   {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                        {min_bin_vheap_size,46422},
                        {min_heap_size,233},
                        {fullsweep_after,10},
                        {minor_gcs,8}]},
   {suspending,[]}],
  <35401.5872.0>},
 {9945,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9945},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,
                              uniform => #Fun<rand.6.10897371>,
                              uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1},
                            [6104668538044925|127505248886749810]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,3499716},
   {heap_size,514838},
   {stack_size,43},
   {reductions,51367855},
   {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                        {min_bin_vheap_size,46422},
                        {min_heap_size,233},
                        {fullsweep_after,10},
                        {minor_gcs,10}]},
   {suspending,[]}],
  <35401.5814.0>},
 {9944,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9944},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,
                              uniform => #Fun<rand.6.10897371>,
                              uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1},
                            [30436575548104867|146295658476797480]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805586},
   {heap_size,318187},
   {stack_size,43},
   {reductions,52758223},
   {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                        {min_bin_vheap_size,46422},
                        {min_heap_size,233},
                        {fullsweep_after,10},
                        {minor_gcs,...}]},
   {suspending,[]}],
  <35401.5806.0>},
 {9939,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9939},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,
                              uniform => #Fun<rand.6.10897371>,
                              uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1},
                            [80043739005711909|41200708321364588]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,3499716},
   {heap_size,514838}, 
   {stack_size,43},
   {reductions,47573844},
   {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,...}},
                        {min_bin_vheap_size,46422},
                        {min_heap_size,233},
                        {fullsweep_after,...},
                        {...}]},
   {suspending,[]}],
  <35401.5862.0>},
 {9930,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9930},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,
                              uniform => #Fun<rand.6.10897371>,
                              uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1},
                            [20635463222876748|7786351554213997]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805586},
   {heap_size,318187},
   {stack_size,43},
   {reductions,50574756},
   {garbage_collection,[{max_heap_size,#{error_logger => true,...}},
                        {min_bin_vheap_size,46422},
                        {min_heap_size,...},
                        {...}|...]},
   {suspending,[]}],
  <35401.5780.0>},
 {9926,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9926},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,
                              uniform => #Fun<rand.6.10897371>,
                              uniform_n => #Fun<rand.7.10897371>,...},
                            [178958286728748668|110879805029760584]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805586},
   {heap_size,318187},
   {stack_size,43},
   {reductions,48737344},
   {garbage_collection,[{max_heap_size,#{...}},
                        {min_bin_vheap_size,...},
                        {...}|...]},
   {suspending,[]}],
  <35401.5904.0>},
 {9925,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9925},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,
                              uniform => #Fun<rand.6.10897371>,...},
                            [100487286008278496|276410797723704352]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805586},
   {heap_size,318187},
   {stack_size,43},
   {reductions,55232873},
   {garbage_collection,[{max_heap_size,...},{...}|...]},
   {suspending,[]}],
  <35401.5786.0>},
 {9925,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9925},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,type => exrop,...},
                            [98775941658005378|30697611948250862]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805586},
   {heap_size,318187},
   {stack_size,43},
   {reductions,49496244},
   {garbage_collection,[{...}|...]},
   {suspending,[]}],
  <35401.5824.0>},
 {9923,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9923},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,
                              next => #Fun<rand.5.10897371>,...},
                            [192068539881179442|63064207642334594]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,3499716},
   {heap_size,514838},
   {stack_size,43},
   {reductions,45967488},
   {garbage_collection,[...]},
   {suspending,...}],
  <35401.5888.0>},
 {9921,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9921},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,...},
                            [30413726223388013|...]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805641},
   {heap_size,318187},
   {stack_size,43},
   {reductions,50465305},
   {garbage_collection,...},
   {...}],
  <35401.5886.0>},
 {9920,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9920},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,1}},
                {rand_seed,{#{bits => 58,...},[...]}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805586},
   {heap_size,318187},
   {stack_size,43},
   {reductions,...},
   {...}|...],
  <35401.5820.0>},
 {9918,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9918},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup,<35401.964.0>]},
                {'$initial_call',{riak_core_vnode,init,...}},
                {rand_seed,{#{...},...}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,3499716},
   {heap_size,514838},
   {stack_size,...},
   {...}|...],
  <35401.5860.0>},
 {9916,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9916},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup,
                               riak_core_sup|...]},
                {'$initial_call',{riak_core_vnode,...}},
                {rand_seed,{...}}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,2805586},
   {heap_size,...},
   {...}|...],
  <35401.5896.0>},
 {9915,
  [{current_function,{gen,do_call,4}},
   {initial_call,{proc_lib,init_p,5}},
   {status,waiting},
   {message_queue_len,9915},
   {links,[<35401.968.0>]},
   {dictionary,[{'$ancestors',[riak_core_vnode_sup|...]},
                {'$initial_call',{...}},
                {rand_seed,...}]},
   {trap_exit,true},
   {error_handler,error_handler},
   {priority,normal},
   {group_leader,<35401.963.0>},
   {total_heap_size,...},
   {...}|...],
  <35401.5784.0>},
...

Commands used:

Infos = {erlang:process_info(P), P} || P <- erlang:processes()]. 
I1 = lists:reverse(lists:sort([{proplists:get_value(message_queue_len, I), I, P} || {I, P} <- Infos])).

The message queue length keeps increasing and accordingly the process memory usage is increasing:

image

(Benchmarks run at ~18:00 and 0:00 for about 2 minutes each)

@albsch
Copy link
Member

albsch commented Nov 12, 2019

This can be reproduced locally with a docker setup by limiting the cpu of one of the data centers.

queues.zip

@albsch
Copy link
Member

albsch commented Nov 13, 2019

Related: #320

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants