Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat[MQB, MWC]: report the number of tcp connections #384

Merged
merged 12 commits into from
Sep 19, 2024

Conversation

waldgange
Copy link
Collaborator

No description provided.

@chrisbeard chrisbeard requested a review from hallfox July 31, 2024 14:05
@waldgange waldgange force-pushed the tcpconnections branch 4 times, most recently from d785037 to 2282eb1 Compare August 7, 2024 12:02
@waldgange waldgange marked this pull request as ready for review August 7, 2024 12:28
@waldgange waldgange requested a review from a team as a code owner August 7, 2024 12:28
&d_portsMutex,
bdlf::PlaceHolders::_1)));
channelStatContext = portStatContext->addSubcontext(statConfig);
d_portsMap.emplace(portStatContext->name(), portStatContext);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it doesn't compile on Solaris

[2024-08-07T12:57:33.062Z] "/SunOS/opt/bb/include/bslstl_pair.h", line 2528: Error: Could not find a match for BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>::ManagedPtr<MANAGED_TYPE>(const BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>) needed in void bsl::Pair_Second<BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>>::Pair_Second<BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>>(const BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>&, BloombergLP::bslma::Allocator*, bsl::integral_constant<int, 0>).

...

[2024-08-07T12:57:33.062Z] "/opt/bb/include/bslstl_unorderedmap_cpp03.h", line 3617:     Where: Instantiated from bsl::pair<BloombergLP::bslstl::HashTableIterator<bsl::pair<const bsl::basic_string<char, std::char_traits<char>, bsl::allocator<char>>, BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>>, long>, bool> bsl::unordered_map<bsl::basic_string<char, std::char_traits<char>, bsl::allocator<char>>, BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>, bsl::hash<bsl::basic_string<char, std::char_traits<char>, bsl::allocator<char>>>, bsl::equal_to<bsl::basic_string<char, std::char_traits<char>, bsl::allocator<char>>>, bsl::allocator<bsl::pair<const bsl::basic_string<char, std::char_traits<char>, bsl::allocator<char>>, BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>>>>::emplace<bsl::basic_string<char, std::char_traits<char>, bsl::allocator<char>>, BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>>(const bsl::basic_string<char, std::char_traits<char>, bsl::allocator<char>>&, const BloombergLP::bslma::ManagedPtr<BloombergLP::mwcst::StatContext>&).

You might introduce parallel structures bsl::list<ManagedPtr<...> > and bsl::unordered_map<bsl::string, StatContext*> like it is done in this PR:
#389

Or you might find another solution

@@ -100,6 +101,21 @@ const int k_SESSION_DESTROY_WAIT = 20;
const int k_CLIENT_CLOSE_WAIT = 20;
// Time to wait incrementally (in seconds) for all clients and
// proxies to be destroyed during stop sequence.
const char k_PORT_PATTERN[] = ":(\\d{1,5})";

bsl::string portFromUri(const bsl::string& endpoint)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bsl::string portFromUri(const bsl::string& endpoint)
inline bsl::string portFromUri(const bsl::string& endpoint)

@@ -799,6 +799,10 @@ class StatContext {
/// all sibling subcontexts.
int uniqueId() const;

/// Return te number of times this 'StatContext' had 'snapshot' called on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Return te number of times this 'StatContext' had 'snapshot' called on
/// Return the number of times this 'StatContext' had 'snapshot' called on

std::vector<bsl::string_view> matchVector;

BSLS_ASSERT_SAFE(
0 == regEx.prepare(&errorMessage, &errorOffset, k_PORT_PATTERN));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary to prepare the same regEx on any opened channel.
We can initialize it once and reuse it from different threads since it's const thread-safe:
https://bloomberg.github.io/bde-resources/doxygen/bde_api_prod/group__bdlpcre__regex.html#3.5


BSLS_ASSERT_SAFE(
0 == regEx.prepare(&errorMessage, &errorOffset, k_PORT_PATTERN));
BSLS_ASSERT_SAFE(0 == regEx.match(&matchVector, endpoint));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use match overload with one string_view* instead, not needed to construct an entire vector for this
https://bloomberg.github.io/bde-resources/doxygen/bde_api_prod/classbdlpcre_1_1RegEx.html#a6c7d41da3bddf9caa0aba7fd2677a68b

0 == regEx.prepare(&errorMessage, &errorOffset, k_PORT_PATTERN));
BSLS_ASSERT_SAFE(0 == regEx.match(&matchVector, endpoint));

return bsl::string(matchVector[1]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return a string_view as well, since it relies on the string passed in arg, so we don't construct a string with a default allocator

BSLS_ASSERT_SAFE(parent);

bsl::string name;
bsl::string name, localPort;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would rather use the provided allocator. If we do so, we can see in the stats table any small allocations within this component, and debug it if needed

Suggested change
bsl::string name, localPort;
bsl::string name(d_allocator_p), localPort(d_allocator_p);

const mwcst::StatContext& context)
{
// Lookup the port's StatContext and remove it from the 'map'
bslmt::LockGuard<bslmt::Mutex> guard(mutex); // LOCK
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, you introduce a synchronization between different threads. We might assume it's okay, but if it's not, it's difficult to diagnose.

I think it's possible to avoid this, by storing ManagedPtr<StatContext> within a corresponding channel object. So if we close this channel and call stop or destructor (not sure which one), we free this ManagedPtr. StatContext class already has the mechanism to keep track of deleted subcontexts. So, if we call a destructor for ManagedPtr to subcontext, the parent context will know about this and remove raw pointer stored to it.

Copy link
Collaborator Author

@waldgange waldgange Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the threadsafety of StatContext class is only partial.

// In simpler terms, only 'addSubContext', 'addSubTable', 'adjustValue',
// 'reportValue', and 'setValue' are thread-safe. All other functions should
// be considered not thread safe.

For example, I can't iterate over subcontexts that can be added/deleted without additional synchronisation anywhere except the snapshot() method or its synchronious pre- or post- steps, as handling the deleted subcontexts happens inside this method. Please find below some arguments in favor of the current imlementation:
Each channel stores its own StatContext, and this approach hasn't changed. What I have done is another 'layer' of StatContexts between channel contexts and their parent. This layer serves for breaking down the channels stats by ports. I really need to have only one StatContext for each port as I report aggregated stats. The inter-thread synchronisation is required here, because:

  1. portsDeleter() is always called from StatController's thread, but TCPSessionFactory::channelStatContextCreator and, hence, constructors and destructors of channel-related StatContexts are called from TCPSessionFactory's threads.
  2. Although StatContext::addSubcontext() is a thread-safe method, it will create a new StatContext instance even if there are sub-contexts with the same name.

So, in order to add a channel StatContext, I need to check if there is a port StatContext for that port and use it, otherways I have to create a new port Statcontext. After that I add a channel sub context to it. So I must have a ManagedPtr's to the StatContexts of the ports and to lock a container (currently, the unordered_map), where they are stored. Otherways race conditions are possible, either with two insertions of the same name port StatContexts, or parent deletion during the adding a new channel subcontext.

If you have an idea how to reach it without the synchronization, please let me know

Copy link
Collaborator

@678098 678098 Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have an idea how to reach it without the synchronization, please let me know

@waldgange let me think about this for some time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @waldgange
I think the correct way to do it is to pass ManagedPtr to the StatChannel here, instead of shared_ptr:

bsl::shared_ptr<mwcst::StatContext> statContext(
d_config.d_statContextCreator(channel, handleSp));
// Create the channel and notify user
bsl::shared_ptr<StatChannel> newChannel;
newChannel.createInplace(
handleSp->d_allocator_p,
StatChannelConfig(channel, statContext, handleSp->d_allocator_p),
handleSp->d_allocator_p);

So, StatChannel keeps ownership of this StatContext. When a StatChannel goes down, the ManagedPtr to its StatContext goes down too.

Note that it requires to change MWC interfaces. However, StatChannel and StatChannelFactory are used only within BlazingMQ, so I don't see any harm in this. Also, since we plan to own ManagedPtr in StatChannel, the StatChannel class is not copy constructible anymore, so we might explicitly delete copy constructor and assignment

Signed-off-by: Anton Pryakhin <[email protected]>
Signed-off-by: Anton Pryakhin <[email protected]>
@678098 678098 changed the title Report the number of tcp connections Feat[MQB, MWC]: report the number of tcp connections Aug 13, 2024
src/groups/mqb/mqbnet/mqbnet_tcpsessionfactory.cpp Outdated Show resolved Hide resolved
{
bsl::string_view result;

BSLS_ASSERT_SAFE(0 == d_regex.match(&result, endpoint));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to pass an incorrect endpoint here and cause the broker to crash on assert? I am thinking if we should crash here or handle incorrect endpoint

Copy link
Collaborator Author

@waldgange waldgange Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It never happens as an endpoint is always a host:port string. However, I can remove this assert and just return an empty string in case of not matching endpoint. Then broker will proceed working and all the channels with unknown ports will report their TCP connections number metric with an empty tag port


BSLS_ASSERT_SAFE(0 == d_regex.match(&result, endpoint));

return result.substr(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we make substr(1) here because we capture the leading :?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definetley. If we used the match() overload that puts results to a vector, we would just take the [1] element of the vector, that contains sub expresion of the regex, and the [0] element would contain the full match with : in the beginning. However current match() overload only returns full matched expression.

}

StatChannel::~StatChannel()
{
// NOTHING
d_config.d_statContext_sp->adjustValue(Stat::e_CONNECTIONS, -1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do per-channel stat contexts, then the number of connections will always be 0 or 1, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I utilize the aggregation functionality Of StatContextTable here in order to deal only with the Channel's StatContext here. Otherways I would have to deal with its parent (port's StatContext), and, hence, store its shared_ptr here in StatChannel class

@@ -799,6 +799,10 @@ class StatContext {
/// all sibling subcontexts.
int uniqueId() const;

/// Return the number of times this 'StatContext' had 'snapshot' called on
/// it.
bsls::Types::Int64 numSnapshots() const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useful, but also it's an interface change in MWC. Do we want to keep it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check it in portsDeleter() as before the first snapshot numSubcontexts() returns zero, and I don't want to delete new StatContexts in this case. It happens because StatContext::addSubcontext() puts the newly created contexts to d_newSubcontexts vector.

d_newSubcontexts.push_back(newContext);

Then it moves them to d_statContext vector on next StatContext::snapshot()

However StatContext::numSubcontexts() doesn't take into account d_newSubcontext and will return 0 if there haven't been any stapshots yet.
inline int StatContext::numSubcontexts() const
{
return static_cast<int>(d_subcontexts.size() +
d_deletedSubcontexts.size());
}

So if we don't check the number of snapshots in portsDeleter() we can unintentionally delete the StatContext that has just been added.

Signed-off-by: Anton Pryakhin <[email protected]>
Signed-off-by: Anton Pryakhin <[email protected]>
Copy link

@bmq-oss-ci bmq-oss-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build 167 of commit 3d29a39 has completed with FAILURE

Copy link

@bmq-oss-ci bmq-oss-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build 167 of commit 3d29a39 has completed with FAILURE

Signed-off-by: Anton Pryakhin <[email protected]>
Copy link

@bmq-oss-ci bmq-oss-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build 232 of commit 3555a56 has completed with FAILURE

Copy link

@bmq-oss-ci bmq-oss-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build 236 of commit 4cf0b83 has completed with FAILURE

Signed-off-by: Anton Pryakhin <[email protected]>
Copy link

@bmq-oss-ci bmq-oss-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build 239 of commit 0565d51 has completed with FAILURE

Copy link
Collaborator

@678098 678098 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts

src/groups/mqb/mqbnet/mqbnet_tcpsessionfactory.cpp Outdated Show resolved Hide resolved
src/groups/mqb/mqbnet/mqbnet_tcpsessionfactory.h Outdated Show resolved Hide resolved
mwcst::StatContextConfiguration portConfig(port, &localAllocator);
bsl::shared_ptr<mwcst::StatContext> portStatContext =
parent->addSubcontext(
portConfig.storeExpiredSubcontextValues(true));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we convert bslma::ManagedPtr to bsl::shared_ptr. How does the ownership transfer works in this situiation? Is this operation safe? Are we sure that the underlying StatContext is not accidentally destroyed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is a common practice:

bsl::shared_ptr<mwcst::StatContext> statContext(
d_config.d_statContextCreator(channel, handleSp));

d_partitionsStatContexts.emplace_back(
bsl::shared_ptr<mwcst::StatContext>(
d_statContext_mp->addSubcontext(
mwcst::StatContextConfiguration(partitionName,
&localAllocator))));

StatContextMp statContextMp =
d_clusterData.clusterNodesStatContext()->addSubcontext(config);
StatContextSp statContextSp(statContextMp, d_allocator_p);

and so on

src/groups/mqb/mqbnet/mqbnet_tcpsessionfactory.h Outdated Show resolved Hide resolved
Signed-off-by: Anton Pryakhin <[email protected]>
Copy link

@bmq-oss-ci bmq-oss-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build 254 of commit 2e5e9e8 has completed with FAILURE

678098
678098 previously approved these changes Sep 16, 2024
Copy link
Collaborator

@678098 678098 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

678098
678098 previously approved these changes Sep 18, 2024
}

void TCPSessionFactory::PortManager::onDeleteChannelContext(
const bsl::uint16_t port)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For basic numeric types we don't typically specify const in arguments list

return d_streamSocket_sp->sourceEndpoint();
}
else {
return ntsa::Endpoint();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might replace with ternary operator

Signed-off-by: Anton Pryakhin <[email protected]>
Copy link

@bmq-oss-ci bmq-oss-ci bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build 259 of commit 7f29d3a has completed with FAILURE

@waldgange waldgange merged commit 6044740 into bloomberg:main Sep 19, 2024
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants