MQTT: Support wildcards in topic filters matching retained messages #13048

Draft: wants to merge 11 commits into base: main

Conversation

getlarge

Proposed Changes

This PR would fix #8824.

The first step is to add a glob matcher (deps/rabbitmq_mqtt/src/rabbit_globber.erl) that can work with different separators and wildcard symbols (e.g. /, ., +). It is inspired by rabbit_db_topic_exchange, except that it is storage-agnostic to accommodate the current and future message store modules (#8096).
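
To make this concrete, here is a hypothetical usage sketch of the proposed API; the function names follow the code discussed later in this thread and may still change:

%% Hypothetical usage of the proposed rabbit_globber API (names may change).
%% new/3 takes the separator, the single-level wildcard and the multi-level wildcard.
example() ->
    %% AMQP 0.9.1-style topics would use "." with "*" and "#":
    _AmqpGlobber = rabbit_globber:new(<<".">>, <<"*">>, <<"#">>),
    %% MQTT topic filters use "/" with "+" and "#":
    MqttGlobber0 = rabbit_globber:new(<<"/">>, <<"+">>, <<"#">>),
    MqttGlobber = rabbit_globber:add(MqttGlobber0, <<"sensor/+/temperature">>),
    true = rabbit_globber:test(MqttGlobber, <<"sensor/kitchen/temperature">>),
    false = rabbit_globber:test(MqttGlobber, <<"sensor/kitchen/humidity">>).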

The second step is to integrate rabbit_globber into the storage modules:

  • rabbit_mqtt_retained_msg_store_ets
  • rabbit_mqtt_retained_msg_store_dets
    Either via a separate exported function or inside the lookup function.
    This would require rabbit_mqtt_retainer:handle_call to accept a topic filter and reply with a list of #mqtt_msg records (see the rough sketch below).
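
A rough sketch of how that reply could look; the record, field and helper names below (retainer_state, store_mod, contains_wildcard/1) are placeholders, not the existing code:

%% Placeholder sketch only; the actual gen_server message shape and state record
%% in rabbit_mqtt_retainer may differ.
handle_call({fetch, TopicFilter}, _From,
            State = #retainer_state{store_mod = Mod, store_state = StoreState}) ->
    Msgs = case contains_wildcard(TopicFilter) of
               true  -> Mod:lookup_by_pattern(TopicFilter, StoreState); %% proposed new export
               false -> Mod:lookup(TopicFilter, StoreState)
           end,
    %% reply with a list of #mqtt_msg{} records instead of a single message
    {reply, Msgs, State}.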

Questions

  • How do you correctly enable and configure a new test suite? I tried in a separate simple project and successfully ran the rabbit_globber tests, but not in this more complex setup (using gmake ct-globber). I am new to Erlang and Bazel.
  • What do you think of this globber implementation?
  • Should we create a separate export for the storage modules and lookup by pattern in rabbit_mqtt_retained_msg_store?
  • Should we limit the number of topics that the pattern can match and thus the number of retained messages fetched?

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass locally with my changes
  • If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
  • If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it


@michaelklishin
Member

@getlarge thank you for taking the time to contribute. Have you seen CONTRIBUTING.md?

There really isn't anything to add for Make, and Bazel is not used in main any more but you can run

bazel run gazelle

and it will update the Bazel files. It would be a good idea to do so even though, in main, Bazel is not used for the OSS edition (since it hasn't been completely removed yet either).

As for the implementation, are you referring to the test cases? Because I don't see a complete implementation in the GitHub diff, only a stub of sorts.

@getlarge
Author

@getlarge thank you for taking the time to contribute. Have you seen CONTRIBUTING.md?

Yes, I read it but saw all those declarations in the Bazel files, so I wondered if the same had to be done for the new module and test files.

There really isn't anything to add for Make, and Bazel is not used in main any more but you can run

bazel run gazelle

and it will update the Bazel files. It would be a good idea to do so even though, in main, Bazel is not used for the OSS edition (since it hasn't been completely removed yet either).

I'll try that.

As for the implementation, are you referring to the test cases? Because I don't see a complete implementation in the GitHub diff, only a stub of sorts.

The implementation has not been pushed yet. I want the rabbit_globber unit tests to pass first, but I'm stuck there at the moment, and I can't make much sense of the error message.
As mentioned, I started with a very simple workspace here and was able to run the tests successfully in the Erlang shell:

c(rabbit_globber).
c(rabbit_globber_tests).
eunit:test(rabbit_globber_tests).

outputs:

4> c(rabbit_globber).
{ok,rabbit_globber}
5> c(rabbit_globber_tests).
{ok,rabbit_globber_tests}
6> eunit:test(rabbit_globber_tests).
  All 8 tests passed.
ok

but not in the rabbitmq-server:

# run the command from CONTRIBUTING.md, then:
cd deps/rabbitmq_mqtt 
gmake ct-globber

Which outputs:

== globber_SUITE ==

  * [tests]

Updating /Users/edouard/Dev/rabbitmq/rabbitmq-server/logs/index.html ... done
Updating /Users/edouard/Dev/rabbitmq/rabbitmq-server/logs/all_runs.html ... done

gmake: *** [../../erlang.mk:6083: ct-globber] Error 1
[Screenshot of the Common Test failure output, 2025-01-11]

Is there anything obvious that I am missing?


To hint at how the globber would eventually be used, here's a suggested implementation of a lookup_by_pattern function that would be called in the rabbit_mqtt_retained_msg_store_ets module when a topic filter contains wildcard(s):

-spec lookup_by_pattern(topic(), store_state()) -> [mqtt_msg()].
lookup_by_pattern(Pattern, #store_state{table = T}) ->
  Globber0 = rabbit_globber:new(<<"/">>, <<"+">>, <<"#">>),
  %% add/2 returns an updated globber; binding the result matters
  Globber = rabbit_globber:add(Globber0, Pattern),
  Matcher =
    fun(#retained_message{topic = Topic}) -> rabbit_globber:test(Globber, Topic) end,
  Msgs = ets:tab2list(T),
  lists:filter(Matcher, Msgs).

Note
This is a naive implementation. Ideally, we should:
  • derive a pattern compatible with ETS, so that the globber only has to compare a reduced set of candidate topics rather than the whole table, and
  • use ets:select (or ets:match_object) to match by pattern and reduce the number of topics to compare.
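
To picture the "ETS-compatible pattern" idea: if the retained-message key were stored as a list of topic segments (which is not how the current store keys its records, so this is purely illustrative), an MQTT filter could be translated into an ets:match_object/2 pattern like this:

%% Illustrative only: assumes the table key is a list of topic segments,
%% not the flat binary used by the current ETS store.
filter_to_match_pattern(Filter) ->
    to_pattern(binary:split(Filter, <<"/">>, [global])).

to_pattern([<<"#">>])     -> '_';                   %% '#' matches the rest of the topic
to_pattern([<<"+">> | T]) -> ['_' | to_pattern(T)]; %% '+' matches exactly one segment
to_pattern([Word | T])    -> [Word | to_pattern(T)];
to_pattern([])            -> [].

%% filter_to_match_pattern(<<"sensor/+/temperature">>) => [<<"sensor">>, '_', <<"temperature">>]
%% filter_to_match_pattern(<<"sensor/#">>)             => [<<"sensor">> | '_']
%% ets:match_object(Table, #retained_message{topic = Pattern, _ = '_'}) could then
%% pre-filter candidates before (or instead of) running them through the globber.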

@michaelklishin
Member

michaelklishin commented Jan 11, 2025

@getlarge there is no shortage of unit tests (tests that do not start and stop nodes) in the repository to use as examples.

@michaelklishin
Member

michaelklishin commented Jan 11, 2025

testcase0_failed, bad_return_value comes from Common Test, not erlang.mk or EUnit.

We run all of our suites with Common Test; EUnit is only used as a matcher library in RabbitMQ (individual dependencies such as Ra or Khepri can use EUnit, but you are contributing to RabbitMQ itself). That's what the "ct" in gmake ct-* means: Common Test.

So you are comparing apples to oranges: EUnit tests run from the shell versus Common Test suites driven by erlang.mk.

@michaelklishin
Member

*** ERROR *** Invalid return value from globber_SUITE:new/0: ok

listed by Common Test is misleading. The problem is not with the return value; CT tests can return anything.

The problem is that these functions do not accept the argument that is mandatory for CT test cases: a CT Config, which unit tests very rarely need but which is absolutely essential for all other test types.

CT seemingly has a problem with a test function named test, as do I.

@michaelklishin
Member

michaelklishin commented Jan 11, 2025

@getlarge with the following changes, all tests succeed with gmake ct-globber:

diff --git a/deps/rabbitmq_mqtt/test/globber_SUITE.erl b/deps/rabbitmq_mqtt/test/globber_SUITE.erl
index cc9f19e42a..fcdeceb977 100644
--- a/deps/rabbitmq_mqtt/test/globber_SUITE.erl
+++ b/deps/rabbitmq_mqtt/test/globber_SUITE.erl
@@ -1,3 +1,8 @@
+%% This Source Code Form is subject to the terms of the Mozilla Public
+%% License, v. 2.0. If a copy of the MPL was not distributed with this
+%% file, You can obtain one at https://mozilla.org/MPL/2.0/.
+%%
+%% Copyright (c) 2007-2025 Broadcom. All Rights Reserved. The term “Broadcom” refers to Broadcom Inc. and/or its subsidiaries. All rights reserved.
 -module(globber_SUITE).

 -compile([export_all, nowarn_export_all]).
@@ -8,10 +13,22 @@
 all() ->
     [{group, tests}].

-groups() ->
-    [{tests, [shuffle], [new]}].
+    groups() ->
+        [
+         {tests, [parallel], [
+                              new,
+                              add,
+                              remove,
+                              match,
+                              matching,
+                              match_iter,
+                              clear,
+                              multiple_patterns
+                             ]
+         }
+        ].

-new() ->
+new(_Config) ->
     Globber = rabbit_globber:new(),
     ?assertEqual(#globber{}, Globber),
     Globber2 = rabbit_globber:new(<<"/">>, <<"*">>, <<"#">>),
@@ -20,20 +37,20 @@ new() ->
                           wildcard_some = <<"#">>},
                  Globber2).

-add() ->
+add(_Config) ->
     Globber = rabbit_globber:new(),
     Globber1 = rabbit_globber:add(Globber, <<"test.*">>, <<"matches">>),
     ?assertMatch(#globber{trie = _}, Globber1),
     Globber2 = rabbit_globber:add(Globber1, <<"test.#">>, <<"it n">>),
     ?assertMatch(#globber{trie = _}, Globber2).

-remove() ->
+remove(_Config) ->
     Globber = rabbit_globber:new(),
     Globber1 = rabbit_globber:add(Globber, <<"test.*">>, <<"matches">>),
     Globber2 = rabbit_globber:remove(Globber1, <<"test.*">>, <<"matches">>),
     ?assertEqual(Globber, Globber2).

-match() ->
+match(_Config) ->
     Globber = rabbit_globber:new(),
     Globber1 = rabbit_globber:add(Globber, <<"test.*">>, <<"it matches">>),
     Result = rabbit_globber:match(Globber1, <<"test.bar">>),
@@ -43,25 +60,25 @@ match() ->
     Result3 = rabbit_globber:match(Globber1, <<"not.foo">>),
     ?assertEqual([], Result3).

-test() ->
+matching(_Config) ->
     Globber = rabbit_globber:new(),
     Globber1 = rabbit_globber:add(Globber, <<"test.*">>),
     ?assertEqual(true, rabbit_globber:test(Globber1, <<"test.bar">>)),
     ?assertEqual(false, rabbit_globber:test(Globber1, <<"foo.bar">>)).

-match_iter() ->
+match_iter(_Config) ->
     Globber = rabbit_globber:new(),
     Globber1 = rabbit_globber:add(Globber, <<"test.*">>, <<"matches">>),
     Result = rabbit_globber:match_iter(Globber1, <<"test.bar">>),
     ?assertEqual([<<"matches">>], Result).

-clear() ->
+clear(_Config) ->
     Globber = rabbit_globber:new(),
     Globber1 = rabbit_globber:add(Globber, <<"test.*">>, <<"matches">>),
     Globber2 = rabbit_globber:clear(Globber1),
     ?assertEqual(Globber, Globber2).

-multiple_patterns() ->
+multiple_patterns(_Config) ->
     Globber = rabbit_globber:new(<<".">>, <<"*">>, <<"#">>),
     Globber1 = rabbit_globber:add(Globber, <<"foo.#">>, <<"catchall">>),
     Globber2 = rabbit_globber:add(Globber1, <<"foo.*.bar">>, <<"single_wildcard">>),
gmake ct-globber

# => [elided]
# => 
# => Common Test v1.26.2.3 starting (cwd is [elided])
# => 
# => [elided]
# => 
# => CWD set to: 
# => 
# => TEST INFO: 1 test(s), 8 case(s) in 1 suite(s)
# => 
# => == globber_SUITE ==
# => 
# => 
# =>   * [tests]
# => 
# => == globber_SUITE ==
# => 
# =>   * [tests]
# =>     add (00:00.002)
# =>     match_iter (00:00.002)
# =>     remove (00:00.001)
# =>     match (00:00.002)
# =>     matching (00:00.001)
# =>     clear (00:00.001)
# =>     multiple_patterns (00:00.001)

The module and suite need a better name, at the very least, they must be prefixed with rabbit_mqtt_ and perhaps should hint at the fact that the module has to do with topic matching, so a name such as rabbit_mqtt_topic_matcher (if it is MQTT-specific).

The namespace for modules in Erlang is flat, so a single word module is rarely appropriate. And yes, we ended up with a module named mc but it's too late to change that now.

@michaelklishin
Member

michaelklishin commented Jan 11, 2025

Note that for unit tests that have no shared state, the group option of parallel (a shortcut for {parallel, true}) is appropriate and a good idea but for integration tests that inevitably will have a shared mutable state — the node or cluster — it likely won't be, although there are integration suites in RabbitMQ that run some or all tests in parallel.
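
For illustration, a groups/0 specification along those lines (generic Common Test syntax, not code from this PR) could look like:

%% Generic Common Test example: stateless cases run in parallel,
%% cases that share a broker node run sequentially.
groups() ->
    [{unit_tests,        [parallel], [new, add, remove, match]},
     {integration_tests, [],         [recover, coerce_configuration_data]}].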

@michaelklishin michaelklishin changed the title Support wildcards in MQTT topic filters matching retained messages MQTT: Support wildcards in topic filters matching retained messages Jan 11, 2025
@ansd
Member

ansd commented Jan 13, 2025

@getlarge I just updated the description in #8824 (comment) to add requirements a solution should comply with. Specifically, the solution should not perform full ETS / DETS table scans every time a client subscribes with a topic filter containing a wildcard because this will be prohibitively expensive if there are many retained messages.

@getlarge
Author

getlarge commented Jan 13, 2025

@getlarge I just updated the description in #8824 (comment) to add requirements a solution should comply with. Specifically, the solution should not perform full ETS / DETS table scans every time a client subscribes with a topic filter containing a wildcard because this will be prohibitively expensive if there are many retained messages.

@ansd, as mentioned in my note in this comment, we could derive a pattern compatible with ETS from a topic filter containing wildcards and use ets:select (or ets:match_object) to match, right? Of course, it would depend on the accuracy of the match specifications, but would that be a sane choice?
If not, could you suggest some alternative?

The solution must not overload the broker. For example if there are 100k retained messages, and a client subscribes with a topic filter matching 50k of these retained messages, the broker must not send all 50k messages in one go.

What would be a reasonable amount?

@getlarge
Author

*** ERROR *** Invalid return value from globber_SUITE:new/0: ok

listed by Common Test is misleading. The problem is not with the return value; CT tests can return anything.

The problem is that these functions do not accept the argument that is mandatory for CT test cases: a CT Config, which unit tests very rarely need but which is absolutely essential for all other test types.

CT seemingly has a problem with a test function named test, as do I.

Thanks for the explanation.

@getlarge
Author

The module and suite need a better name, at the very least, they must be prefixed with rabbit_mqtt_ and perhaps should hint at the fact that the module has to do with topic matching, so a name such as rabbit_mqtt_topic_matcher (if it is MQTT-specific).

The namespace for modules in Erlang is flat, so a single word module is rarely appropriate. And yes, we ended up with a module named mc but it's too late to change that now.

@michaelklishin I did not use the complete plugin name as a prefix because the module's usage could be extended to different protocols. To keep it simple, I will do as you suggest, and we'll see in the future whether you want to consider using this matcher for more general-purpose usage.

@ansd
Member

ansd commented Jan 13, 2025

As mentioned in my note in #13048 (comment), we could derive a pattern compatible with ETS from a topic filter containing wildcards and use ets:select (or ets:match_object) to match, right? Of course, it would depend on the accuracy of the match specifications, but would that be a sane choice?
If not, could you suggest some alternative?

@getlarge ets:select or ets:match_object will not be a sane choice if it results in a full ETS table scan. An alternative is to re-organise how topics are stored in the database. For example, the topic exchange implementation uses a trie data structure to store the topic filters (AMQP 0.9.1 bindings) containing wildcards. Incoming messages do not contain wildcards and are matched against this trie.
What we are trying to achieve here is the same thing, efficiently, in the opposite direction: the database stores topics (not containing wildcards) and the input is a topic filter (which can contain wildcards). It's similar in functionality to what the reverse topic exchange does.

Either way, the bottom line is we want to avoid full O(n) table scans. O(log n) is acceptable.
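
As a toy illustration of that reverse direction (stored topics carry no wildcards, the query does), here is a minimal sketch using a nested map as the trie instead of ETS tables; it is not the implementation proposed in this PR:

-module(topic_trie_sketch).
-export([insert/2, match/2]).

%% Trie = #{Segment => ChildTrie}, with the key 'leaf' marking a stored topic.
insert(Trie, Topic) ->
    insert_words(Trie, binary:split(Topic, <<"/">>, [global])).

insert_words(Trie, []) ->
    Trie#{leaf => true};
insert_words(Trie, [W | Rest]) ->
    Child = maps:get(W, Trie, #{}),
    Trie#{W => insert_words(Child, Rest)}.

%% match/2: does any stored topic match the (possibly wildcarded) filter?
match(Trie, Filter) ->
    match_words(Trie, binary:split(Filter, <<"/">>, [global])).

match_words(Trie, []) ->
    maps:get(leaf, Trie, false);
match_words(Trie, [<<"#">>]) ->
    %% '#' matches the current level and everything below it
    maps:get(leaf, Trie, false) orelse maps:size(maps:remove(leaf, Trie)) > 0;
match_words(Trie, [<<"+">> | Rest]) ->
    %% '+' matches exactly one level: descend into every child
    lists:any(fun({leaf, _}) -> false;
                 ({_W, Child}) -> match_words(Child, Rest)
              end,
              maps:to_list(Trie));
match_words(Trie, [W | Rest]) ->
    case maps:find(W, Trie) of
        {ok, Child} -> match_words(Child, Rest);
        error -> false
    end.

Walking the trie this way only visits branches that can still match, which avoids the full-table scan being discussed.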

@getlarge
Author

getlarge commented Jan 17, 2025

I changed the strategy to simplify things: instead of creating an abstract module that would support X, Y, Z storage, I focused on the ETS storage. Following @ansd's suggestion, I implemented a trie to store exploded topics.

Test

I added a new test module (rabbit_mqtt_retained_msg_store_ets_SUITE) to check the behavior since the retainer_SUITE tests have yet to pass and I don't understand the reason for the error.

retainer_SUITE > v5 > ets
    {error,
        {{assertEqual,
             [{module,rabbit_ct_broker_helpers},
              {line,1284},
              {expression,"CrashesCount"},
              {expected,0},
              {value,9}]},
         [{rabbit_ct_broker_helpers,find_crashes_in_logs,2,
              [{file,"rabbit_ct_broker_helpers.erl"},{line,1284}]},
          {rabbit_ct_broker_helpers,stop_rabbitmq_nodes,1,
              [{file,"rabbit_ct_broker_helpers.erl"},{line,1240}]},
          {rabbit_ct_helpers,run_steps,2,
              [{file,"rabbit_ct_helpers.erl"},{line,136}]},
          {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1794}]},
          {test_server,run_test_case_eval1,6,
              [{file,"test_server.erl"},{line,1391}]},
          {test_server,run_test_case_eval,9,
              [{file,"test_server.erl"},{line,1235}]}]}}

retainer_SUITE > v5 > ets > retained_wildcard_single_level
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > should_translate_amqp2mqtt_on_publish
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > should_translate_amqp2mqtt_on_retention_search
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > recover
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > retained_wildcard_mixed
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > retained_wildcard_multi_level
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > should_translate_amqp2mqtt_on_retention
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > coerce_configuration_data
    #1. {'EXIT',{shutdown,tcp_closed}}

retainer_SUITE > v5 > ets > recover_with_message_expiry_interval
    #1. {'EXIT',{shutdown,tcp_closed}}

Benchmark

I also ran a benchmark to measure its performance:

  • How fast the module can find a specific topic ("exact match")
  • How fast it can find multiple topics using wildcards ("wildcard match")
  • How these speeds change when you have more topics in the system

Environment

Model Name: Mac mini
Chip: Apple M4 Pro
Total Number of Cores: 12 (8 performance and 4 efficiency)
Memory: 64 GB

Results

[Chart: benchmark results, topic depth 5]

[Chart: benchmark results, topic depth 7]

Exact matches are fast (logarithmic scaling?), while wildcard matches are slower but perform reasonably well, even with large numbers of topics stored.
We could compare this against the current implementation; what do you think?

Note:
  • Blue bars: exact matches
  • Grey bars: average for matches with wildcards
  • Depth: topic segment count

Code

This is the code used to generate these charts:

-module(rabbit_mqtt_topic_storage_bench).

-export([run/0, run/1]).

-include("rabbit_mqtt_topic_storage_ets.hrl").

% Test with exponentially increasing topic counts
-define(TOPIC_COUNTS, [1000, 2000, 4000, 8000, 16000, 32000, 64000, 128000, 256000]).
-define(DEPTH, 7).
-define(WARMUP_ITERATIONS, 1000).
-define(TEST_ITERATIONS, 5000).
-define(BATCH_SIZE, 100).

run() ->
    init_random(),
    run(?TOPIC_COUNTS).

run(TopicCounts) when is_list(TopicCounts) ->
    Results = lists:map(fun run_scenario/1, TopicCounts),
    print_results(Results),
    generate_charts(Results).

run_scenario(TopicCount) ->
    io:format("~nTesting with ~p topics~n", [TopicCount]),
    {ok, State} = rabbit_mqtt_topic_storage_ets:init(),

    % Create varied test data
    TestData = create_test_data(TopicCount, ?DEPTH),
    PopulatedState = populate_store(State, TestData),

    % Warm up heavily to ensure JIT stabilization
    _ = bench_warmup(PopulatedState, ?WARMUP_ITERATIONS),

    % Run actual benchmark
    {ExactTimes, WildcardTimes} = bench_lookups(PopulatedState, ?TEST_ITERATIONS, TestData),

    cleanup(PopulatedState),

    #{topic_count => TopicCount,
      exact_times => analyze_times(ExactTimes),
      wildcard_times => analyze_times(WildcardTimes)}.

% test data section
create_test_data(Count, Depth) ->
    % Create diverse topics that will match our wildcard patterns
    lists:map(fun(N) ->
                 % Generate base topic like "1/level1/level2/level3/level4"
                 Topic = generate_topic(N, Depth),

                 % Create wildcard pattern that will match this and similar topics
                 % For a topic "1/level1/level2/level3/level4"
                 % Pattern will be "1/+/level2/#" - guaranteed to match some topics
                 Parts = binary:split(Topic, <<"/">>, [global]),
                 WildcardPattern =
                     case Parts of
                         [First | _] ->
                             % Create pattern that will match this and similar topics
                             iolist_to_binary([First, "/+/level2/#"])
                     end,

                 {N, Topic, WildcardPattern}
              end,
              lists:seq(1, Count)).

populate_store(State, TestData) ->
    lists:foldl(fun({N, Topic, _}, AccState) ->
                   Value = iolist_to_binary(["msg", integer_to_list(N)]),
                   {ok, NewState} = rabbit_mqtt_topic_storage_ets:insert(Topic, Value, AccState),
                   NewState
                end,
                State,
                TestData).

generate_topic(N, Depth) ->
    % For each N, create several similar topics that will match the same wildcard
    TopicNum = N div 10,  % Group topics by tens to ensure wildcard matches
    Variation = N rem 10, % Use remainder to create variations
    Parts =
        [integer_to_list(TopicNum),      % First level is the group number
         lists:concat(["var", integer_to_list(Variation)]),  % Second level varies
         "level2"  % Fixed level that wildcards will match
         | [lists:concat(["level", integer_to_list(I)]) || I <- lists:seq(3, Depth - 1)]],
    iolist_to_binary(string:join(Parts, "/")).

cleanup(State) ->
    ets:delete(State#state.node_table),
    ets:delete(State#state.edge_table),
    ets:delete(State#state.topic_table).

% benchmark
bench_warmup(State, Iterations) ->
    Topics = [generate_topic(N, ?DEPTH) || N <- lists:seq(1, 10)],
    Patterns = [iolist_to_binary([integer_to_list(N), "/+/#"]) || N <- lists:seq(1, 10)],

    lists:foreach(fun(_) ->
                     [rabbit_mqtt_topic_storage_ets:lookup(T, State) || T <- Topics],
                     [rabbit_mqtt_topic_storage_ets:lookup(P, State) || P <- Patterns]
                  end,
                  lists:seq(1, Iterations)).

bench_lookups(State, Iterations, TestData) ->
    % Select random test cases for each batch
    BatchCount = Iterations div ?BATCH_SIZE,
    ExactBatches =
        [bench_exact_batch(State, TestData, ?BATCH_SIZE) || _ <- lists:seq(1, BatchCount)],
    WildBatches =
        [bench_wildcard_batch(State, TestData, ?BATCH_SIZE) || _ <- lists:seq(1, BatchCount)],

    {lists:flatten(ExactBatches), lists:flatten(WildBatches)}.

bench_exact_batch(State, TestData, BatchSize) ->
    % Take random samples for each batch
    Samples = random_samples(TestData, BatchSize),
    [{Time, Matches}
     || {_, Topic, _} <- Samples,
        {Time, {ok, Matches}}
            <- [timer:tc(fun() -> rabbit_mqtt_topic_storage_ets:lookup(Topic, State) end)]].

bench_wildcard_batch(State, TestData, BatchSize) ->
    Samples = random_samples(TestData, BatchSize),
    [{Time, Matches}
     || {_, _, Pattern} <- Samples,
        {Time, {ok, Matches}}
            <- [timer:tc(fun() -> rabbit_mqtt_topic_storage_ets:lookup(Pattern, State) end)]].

random_samples(List, N) ->
    % Select N random elements without replacement by sorting on random keys
    % (uses the rand module; the legacy random module is gone from modern OTP)
    Length = length(List),
    Indices = lists:sort([{rand:uniform(), X} || X <- lists:seq(1, Length)]),
    Selected = lists:sublist([I || {_, I} <- Indices], N),
    [lists:nth(I, List) || I <- Selected].

% Initialize random seed at the start
init_random() ->
    _ = rand:seed(exsss, os:timestamp()),
    ok.

% measure
analyze_times(TimedResults) ->
    Times = [Time / 1000.0 || {Time, _} <- TimedResults],  % Convert to ms
    Matches = [length(M) || {_, M} <- TimedResults],

    #{times =>
          #{min => lists:min(Times),
            max => lists:max(Times),
            avg => lists:sum(Times) / length(Times),
            median => median(Times),
            p95 => percentile(Times, 95)},
      matches =>
          #{min => lists:min(Matches),
            max => lists:max(Matches),
            avg => lists:sum(Matches) / length(Matches)}}.

median(List) ->
    Sorted = lists:sort(List),
    Length = length(Sorted),
    Middle = Length div 2,
    case Length rem 2 of
        0 ->
            (lists:nth(Middle, Sorted) + lists:nth(Middle + 1, Sorted)) / 2;
        1 ->
            lists:nth(Middle + 1, Sorted)
    end.

percentile(List, P) when P >= 0, P =< 100 ->
    Sorted = lists:sort(List),
    Length = length(Sorted),
    N = round(P * Length / 100),
    lists:nth(max(1, min(N, Length)), Sorted).

print_results(Results) ->
    io:format("~n=== Benchmark Results for depth ~B ===~n", [?DEPTH]),
    io:format("~-12s ~-15s ~-15s ~-15s ~-15s ~-15s~n",
              ["Topics",
               "Exact Avg(ms)",
               "Exact P95(ms)",
               "Wild Avg(ms)",
               "Wild P95(ms)",
               "Wild Matches"]),
    io:format("~s~n", [string:copies("-", 87)]),
    lists:foreach(fun(R) ->
                     #{topic_count := Count,
                       exact_times := #{times := #{avg := ExactAvg, p95 := ExactP95}},
                       wildcard_times :=
                           #{times := #{avg := WildAvg, p95 := WildP95},
                             matches := #{avg := MatchAvg}}} =
                         R,
                     io:format("~-12B ~-15.3f ~-15.3f ~-15.3f ~-15.3f ~-15.1f~n",
                               [Count, ExactAvg, ExactP95, WildAvg, WildP95, MatchAvg])
                  end,
                  Results).

% generate charts section
generate_charts(Results) ->
    Charts = [generate_time_chart(Results), generate_matches_chart(Results)],
    file:write_file("complexity_analysis.md", Charts).

generate_time_chart(Results) ->
    XAxis = [integer_to_list(Count) || #{topic_count := Count} <- Results],
    YExact =
        [maps:get(avg, maps:get(times, ExactTimes)) || #{exact_times := ExactTimes} <- Results],
    YWild =
        [maps:get(avg, maps:get(times, WildTimes)) || #{wildcard_times := WildTimes} <- Results],

    ["```mermaid\n",
     "%%{init: {'theme': 'base'}}%%\n",
     "xychart-beta\n",
     "    title \"Average Lookup Time vs Topic Count with depth = ",
     integer_to_list(?DEPTH),
     "\"\n",
     "    x-axis [",
     string:join(XAxis, ", "),
     "]\n",
     "    y-axis \"Time (ms)\" 0 --> ",
     io_lib:format("~.3f", [lists:max(YWild) * 1.2]),
     "\n",
     "    bar [",
     string:join([io_lib:format("~.3f", [Y]) || Y <- YWild], ", "),
     "]\n",
     "    bar [",
     string:join([io_lib:format("~.3f", [Y]) || Y <- YExact], ", "),
     "]\n",
     "```\n\n"].

generate_matches_chart(Results) ->
    XAxis = [integer_to_list(Count) || #{topic_count := Count} <- Results],
    YMatches =
        [maps:get(avg, maps:get(matches, WildTimes))
         || #{wildcard_times := WildTimes} <- Results],

    ["```mermaid\n",
     "%%{init: {'theme': 'base'}}%%\n",
     "xychart-beta\n",
     "    title \"Average Wildcard Matches vs Topic Count\"\n",
     "    x-axis [",
     string:join(XAxis, ", "),
     "]\n",
     "    y-axis \"Matches\" 0 --> ",
     io_lib:format("~.1f", [lists:max(YMatches) * 1.2]),
     "\n",
     "    bar [",
     string:join([io_lib:format("~.1f", [Y]) || Y <- YMatches], ", "),
     "]\n",
     "```\n"].

@michaelklishin
Member

@getlarge these numbers look promising. Can you compare the throughput (and latency, if you have the MQTT tooling that makes that easy) vs. 4.0.5?

I understand that we are talking retained messages here, this is mostly to make sure there are not meaningful regressions.

@getlarge
Author

@getlarge these numbers look promising. Can you compare the throughput (and latency, if you have the MQTT tooling that makes that easy) vs. 4.0.5?

I understand that we are talking retained messages here, this is mostly to make sure there are not meaningful regressions.

To be sure I got it right: you are asking me to run RabbitMQ with the MQTT plugin using this new ETS store vs. RabbitMQ 4.0.5 with the current MQTT plugin retained message (ETS) store, correct?
What do you want to measure exactly? How many MQTT retained messages can be published per second?
Can you suggest a benchmark tool for MQTT brokers? Ideally something easy to set up and use, which produces simple output (Markdown or something else easy to render). I found these so far:

@michaelklishin
Member

@getlarge I wasn't talking about the retention case. Here is an example of what our team does, but you don't have to be that thorough; just a basic benchmark, 3 runs or so, will give you an idea.

Retention specifically can be tested next. You can try 10K retained messages, then 100K, then 1M, then 10M. Just measure how long it takes for them to be delivered, for example.
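
A very rough way to measure that last part, as a sketch assuming the emqtt Erlang client (already used by the MQTT test suites); the host, port, topic filter and timeout are placeholders:

%% Sketch: subscribe with a wildcard filter and time how long it takes to
%% receive N retained messages. All connection parameters are placeholders.
measure_retained_delivery(N) ->
    {ok, C} = emqtt:start_link([{host, "localhost"}, {port, 1883},
                                {clientid, <<"retained-bench">>}]),
    {ok, _} = emqtt:connect(C),
    T0 = erlang:monotonic_time(millisecond),
    {ok, _, _} = emqtt:subscribe(C, <<"bench/#">>, qos1),
    Received = wait_for_publishes(N, 0),
    Elapsed = erlang:monotonic_time(millisecond) - T0,
    ok = emqtt:disconnect(C),
    {Received, Elapsed}.

wait_for_publishes(N, N) ->
    N;
wait_for_publishes(N, Count) ->
    receive
        {publish, _Msg} -> wait_for_publishes(N, Count + 1)
    after 60000 ->
        Count
    end.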
