Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEODE-9602: QueryObserver improvements. #6874

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

albertogpz
Copy link
Contributor

  • Make QueryObserverHolder thread-safe
  • Allow having an observer per query by means of setting the observer
    in the query at the start of the execution.
  • Invoke beforeIterationEvaluation and afterIterationEvaluation callbacks when
    query is using indexes.

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • Is your initial contribution a single, squashed commit?

  • Does gradlew build run cleanly?

  • Have you written or updated unit tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

- Make QueryObserverHolder thread-safe
- Allow having an observer per query by means of setting the observer
  in the query at the start of the execution.
- Invoke beforeIterationEvaluation and afterIterationEvaluation callbacks when
  query is using indexes.
@lgtm-com
Copy link

lgtm-com bot commented Sep 16, 2021

This pull request fixes 2 alerts when merging d32b31c into d0113fc - view on LGTM.com

fixed alerts:

  • 2 for Dereferenced variable may be null

Comment on lines 254 to 255
verify(myQueryObserver, times(0)).beforeIterationEvaluation(any(), any());
verify(myQueryObserver, times(0)).afterIterationEvaluation(any());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use never():

    verify(myQueryObserver, never()).beforeIterationEvaluation(any(), any());
    verify(myQueryObserver, never()).afterIterationEvaluation(any());

return _instance != NO_OBSERVER;
}

/** Return the current QueryObserver instance */
public static QueryObserver getInstance() {
public static synchronized QueryObserver getInstance() {
Copy link
Contributor

@kirklund kirklund Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wanted to make this class thread-safe too. There is one problem with doing so though. getInstance() is in the path of every query execution and index evaluate call. If you make it synchronized, then there will be a significant performance penalty inflicted on every query and index evaluation.

To see this, look at the call hierarchy of constructing a new ExecutionConext. Every new instance will invoke:

private QueryObserver observer = QueryObserverHolder.getInstance();

In order to make this class thread-safe, I think we need to solve the problem of how to get queries and index evaluations from hitting those synchronized blocks on every invocation. I'm not as familiar with querying or indexing so you may need to chat about it further with Anil or others.

Another problem is that the new private QueryObserver observer field in ExecutionContext is itself not thread-safe, so the setter and getter you added are not thread-safe either.

I think your intention is to have a test invoke setObserver for one or more direct invocations of a query. If that's true, then you're probably better off introducing a ThreadLocal to ExecutionContext for this purpose. Then the execution would check that ThreadLocal for a non-null value and then use it. If it's null, the execution would fallback to using the usual observer stored in the field.

Then you could make the field itself final in the hopes that it would be used multiple times. But again, I suspect that every query invocation creates a new ExecutionContext which will perform poorly because of the synchronized in QueryObserverHolder.

Copy link

@agingade agingade Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is not going to achieve what you are looking for:

The QueryObserverHolder class is not thread-safe.

As Kirk pointed out, you have to make this as a ThreadLocal and set/reset while in use.
Other option could be making this as non-static...If the query context is available in all the places where query-observer is getting used, you can think of creating a new QueryObserver and setting it on the context.
QueryContext.setObserver(new QueryObserver()); And no static implementation/use of query observer.

I did not see any test changes where concurrent queries are using QueryObserver...It will show if the query observers are cross referenced...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wanted to make this class thread-safe too. There is one problem with doing so though. getInstance() is in the path of every query execution and index evaluate call. If you make it synchronized, then there will be a significant performance penalty inflicted on every query and index evaluation.

To see this, look at the call hierarchy of constructing a new ExecutionConext. Every new instance will invoke:

private QueryObserver observer = QueryObserverHolder.getInstance();

In order to make this class thread-safe, I think we need to solve the problem of how to get queries and index evaluations from hitting those synchronized blocks on every invocation. I'm not as familiar with querying or indexing so you may need to chat about it further with Anil or others.

I would expect this to be a problem if the above calls are done many times per query. But in this PR, it is proposed to invoke it just once, to get the value and then set it in the ExecutionContext, right before the query is executed. Later, when the query needs the value of the observer, it would get it from the ExecutionContext using a non-synchronized call given that all the calls to get the observer have been changed from QueryObserverHolder.getInstance() to context.getObserver().

Another problem is that the new private QueryObserver observer field in ExecutionContext is itself not thread-safe, so the setter and getter you added are not thread-safe either.

I am assuming that the ExecutionContext is not shared between threads and therefore it does not need to be thread-safe. I am wrong here?

I think your intention is to have a test invoke setObserver for one or more direct invocations of a query. If that's true, then you're probably better off introducing a ThreadLocal to ExecutionContext for this purpose. Then the execution would check that ThreadLocal for a non-null value and then use it. If it's null, the execution would fallback to using the usual observer stored in the field.

Actually, my intention was just to make this class thread-safe but not extend it to have new usages. Nevertheless, the class is documented and used so that the observer can be set from any thread and be available to any thread in the VM.
I have proposed the current solution but have also considered the use of a ThreadLocal variable, not in the ExecutionContext class (which did not have an observer field) but in the QueryObserverHolder class to be used in a way similar to what you are proposing.
But I think the lifecycle of the ThreadLocal variable would be complex. It should be reset after the query execution. But who would do it?

Then you could make the field itself final in the hopes that it would be used multiple times. But again, I suspect that every query invocation creates a new ExecutionContext which will perform poorly because of the synchronized in QueryObserverHolder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is not going to achieve what you are looking for:

The QueryObserverHolder class is not thread-safe.

As Kirk pointed out, you have to make this as a ThreadLocal and set/reset while in use.
Other option could be making this as non-static...If the query context is available in all the places where query-observer is getting used, you can think of creating a new QueryObserver and setting it on the context.
QueryContext.setObserver(new QueryObserver()); And no static implementation/use of query observer.

That's actually what I tried to do with this proposal. Am I missing something here?

I did not see any test changes where concurrent queries are using QueryObserver...It will show if the query observers are cross referenced...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like you have that covered Alberto. By passing around context, it looks to me like you have addressed many if not all the thread safety issues. I tend to agree with Kirk about preferring and atomic to a synchronize, though.

@@ -54,26 +54,26 @@
/**
* Set the given observer to be notified of query events. Returns the current observer.
*/
public static QueryObserver setInstance(QueryObserver observer) {
public static synchronized QueryObserver setInstance(QueryObserver observer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using synchronized, you'd be better off making _instance a private static final AtomicReference<QueryObserver>. The use of synchronized will cause at least some JVM implementations to pin every thread hitting it to the same CPU.

Using atomics is generally always a better approach if you can use it.

Comment on lines 1831 to 1838
try {
if (ok && runtimeItr != null) {
runtimeItr.setCurrent(o1);
observer.beforeIterationEvaluation(iterOp, o1);
ok = QueryUtils.applyCondition(iterOp, context);
}
} finally {
observer.afterIterationEvaluation(ok);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should try to extract the blocks that look like this to their own private methods. There is some repetition involved that you may be able to reduce down to one method. The code calling these methods will also become more readable.

Comment on lines 812 to 818
try {
if (ok && runtimeItr != null) {
observer.beforeIterationEvaluation(iterOps, value);
ok = QueryUtils.applyCondition(iterOps, context);
}
} finally {
observer.afterIterationEvaluation(ok);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class also has several try-finally blocks like that this may benefit by being extracted as a new private method both for readability and to hopefully reduce some of the repetition in these blocks.

Copy link
Contributor

@DonalEvans DonalEvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving this PR, but I share Kirk's concerns over query performance.

@@ -296,8 +319,11 @@ public void testTraceOnLocalRegionWithTracePrefixNoComments() throws Exception {
assertTrue(((DefaultQuery) query).isTraced());

SelectResults results = (SelectResults) query.execute();
assertTrue(QueryObserverHolder.getInstance() instanceof IndexTrackingQueryObserver);
// The query should return all elements in region.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems to have been accidentally removed.

return _instance != NO_OBSERVER;
}

/** Return the current QueryObserver instance */
public static QueryObserver getInstance() {
public static synchronized QueryObserver getInstance() {
Copy link

@agingade agingade Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is not going to achieve what you are looking for:

The QueryObserverHolder class is not thread-safe.

As Kirk pointed out, you have to make this as a ThreadLocal and set/reset while in use.
Other option could be making this as non-static...If the query context is available in all the places where query-observer is getting used, you can think of creating a new QueryObserver and setting it on the context.
QueryContext.setObserver(new QueryObserver()); And no static implementation/use of query observer.

I did not see any test changes where concurrent queries are using QueryObserver...It will show if the query observers are cross referenced...

Copy link
Contributor

@mhansonp mhansonp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the use of Atomics is preferable to Synchronize. This takes one extra step elsewhere but encapsulates and manages synchronization challenges. Other than that I think this looks good.

@@ -366,7 +396,11 @@ public void testTraceOnLocalRegionWithSmallTracePrefixNoComments() throws Except
assertTrue(((DefaultQuery) query).isTraced());

SelectResults results = (SelectResults) query.execute();
assertTrue(QueryObserverHolder.getInstance() instanceof IndexTrackingQueryObserver);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seem awfully repetitive. Is there a reason to check the test hook in every test? From my perspective it seems like a waste of CPU. Just my opinion though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the way to check that the trace directive in the query has taken effect when set or that trace is not being done when it is not specified.

return _instance != NO_OBSERVER;
}

/** Return the current QueryObserver instance */
public static QueryObserver getInstance() {
public static synchronized QueryObserver getInstance() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like you have that covered Alberto. By passing around context, it looks to me like you have addressed many if not all the thread safety issues. I tend to agree with Kirk about preferring and atomic to a synchronize, though.

@lgtm-com
Copy link

lgtm-com bot commented Sep 21, 2021

This pull request introduces 1 alert and fixes 2 when merging 52fe699 into 19f55ad - view on LGTM.com

new alerts:

  • 1 for Inconsistent synchronization of getter and setter

fixed alerts:

  • 2 for Dereferenced variable may be null

@lgtm-com
Copy link

lgtm-com bot commented Sep 21, 2021

This pull request fixes 2 alerts when merging 96a15bc into cf1b35c - view on LGTM.com

fixed alerts:

  • 2 for Dereferenced variable may be null

@@ -237,6 +238,9 @@ private DataCommandResult select(InternalCache cache, Object principal, String q
if (queryObserver != null) {
QueryObserverHolder.reset();
}
if (tracedQuery.isTraced()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block is already inside the if (tracedQuery.isTraced()) condition, is this if necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not. This finally block is outside any if (tracedQuery.isTraced()) condition.

@kirklund
Copy link
Contributor

kirklund commented Oct 4, 2021

Maybe the way to go here is to rewrite QueryObserverHolder as a class that hold a ThreadLocal<QueryObserverHolder> and then have ALL other product classes such as DefaultQuery just use QueryObserverHolder as an API for accessing that ThreadLocal.

@albertogpz
Copy link
Contributor Author

albertogpz commented Oct 5, 2021

Maybe the way to go here is to rewrite QueryObserverHolder as a class that hold a ThreadLocal<QueryObserverHolder> and then have ALL other product classes such as DefaultQuery just use QueryObserverHolder as an API for accessing that ThreadLocal.

I like that idea.
I did an experiment with it but saw many test cases fail: #6882
(https://concourse.apachegeode-ci.info/builds/84932)

The problem is that there are test cases that count on setting the observer globally and then running a query from the client. Take a look for example at: SelectStarQueryDUnitTets::testSelectStarQueryForPartitionedRegion. In these cases, you cannot set the thread local observer in the thread that will execute the query because the query is executed remotely from the client.

I have been able to change the test cases in QueryUsingFunctionContextDUnitTest and set the observer in the function executing the queries instead of globally in the servers but this approach is not valid for the test cases previously mentioned.

I am also worried about the use done by the QueryObserver to provide trace information in queries. It is not possible to set a query observer and at the same time have trace information.

We could also go for having two variables in the QueryObserverHolder: a thread local one and the current global one. Setting the instance could set the value in the global one and getting the instance could check if the thread local has a value. If it does, it would return it. Otherwise, the thread local one would be set to the global one value and then be returned. That way, during a query execution, only the first call to getInstance() should require synchronization.
What do you think?

@kirklund
Copy link
Contributor

kirklund commented Oct 12, 2021

@albertogpz @jhuynh1 AcceptorImpl is the class that runs server-side threads to execute requests from clients. When the thread starts, you could check a global in QueryObserverHolder and then set the ThreadLocal on each thread that it starts.

It might even be possible to support multiple QueryObservers and invoke them all for each query.

So here are some suggestions from Jason and me:

Have the test set a global in the server(s). When the server starts a AcceptorImpl thread to process that query, set that same ThreadLocal on that thread.

The query execution then adds the instance of QueryObserver to the ExecutionContext.

Don't forget to call ThreadLocal.remove() after a query completes.

OrderByComparator has a call to QueryObserverHolder.getInstance(). Change that to just fetch the observer from the context field in that class.

It might even be possible to avoid the ThreadLocal by using a unique observer instance per execution context. I envisioned stuffing a unique instance of QueryObserver into the ThreadLocal that is then garbage collected after the query finishes and ThreadLocal.remove has been called.

@jhuynh1
Copy link
Contributor

jhuynh1 commented Oct 12, 2021

To add to Kirk's prior suggestion (some suggestions are probably duplicated as we are currently discussing) The main difference between the two is the use of vs possible removal of the threadlocal

We might be able to avoid the thread local entirely if we add a method to the QueryObserver interface such as createObserver() where it is expected to pass back a unique instance of an observer. Improvements to reuses/pool these can be made to reduce garbage later.

Where we currently do start query, perhaps call this new createObserver() method and stuff it into the execution context. (

)

Wherever we currently do QueryObserverHolder.getInstance(), instead do context.getObserver()

There are a few locations of QueryObserverHolder.getInstance() like OrderbyComparator (

QueryObserverHolder.getInstance().orderByColumnsEqual();
), that have a context present as a class variable that can be used.

The execution context and observer should be gc'd eventually once the execution context is no longer being used. There is probably more I am missing but thought I'd put this down as a suggestion

@onichols-pivotal
Copy link
Contributor

this PR appears to be abandoned, can it be closed?

@albertogpz
Copy link
Contributor Author

this PR appears to be abandoned, can it be closed?

It is not yet abandoned. It is pending from a discussion with @kirklund about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants