Deadlock in TraceSource Initialization in Azure Web Site extension #428
Comments
cc: @vancem are those two sequences of locks by design?
If you think this is a problem in EventSource, I think we need more explanation of the scenario. (Are we talking about TraceSource and not EventSource by any chance?) I don't recognize any of the APIs above as methods in EventSource (in particular I searched for TraceInternal). Also, you don't get lock leveling issues unless you CALL OUT from your API while holding a lock. So presumably this is happening because you have an EventListener involved, but again it is unclear from the explanation what exactly is happening, as EventListeners are not mentioned. I know of only one lock associated with EventSource, and it is called 'EventListenersLock', but that was also not mentioned. So I can't really comment on the issue yet, as I can't relate the question to the code.
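To make the lock leveling point concrete, here is a minimal sketch of the general pattern: two threads take the same two locks in opposite order. All names (LockInversionDemo, LockA, LockB) are hypothetical and this is not EventSource or TraceSource code. It uses Monitor.TryEnter with a timeout so the inversion is detected instead of actually hanging. In a real API the second lock is typically taken inside a callback invoked while the first lock is held, which is the "call out" case.

```csharp
using System;
using System.Threading;

// Hypothetical illustration only; none of these names come from the framework.
public static class LockInversionDemo
{
    private static readonly object LockA = new object();
    private static readonly object LockB = new object();

    // Holds `first`, waits until the other thread holds its own first lock,
    // then tries to take `second` with a timeout instead of blocking forever.
    private static bool TryTakeBoth(object first, object second, Barrier barrier)
    {
        lock (first)
        {
            barrier.SignalAndWait(); // both threads now hold their first lock

            bool acquired = Monitor.TryEnter(second, TimeSpan.FromMilliseconds(200));
            if (acquired)
            {
                Monitor.Exit(second);
            }

            barrier.SignalAndWait(); // keep `first` held until both attempts are done
            return acquired;
        }
    }

    public static bool[] Run()
    {
        var results = new bool[2];
        using (var barrier = new Barrier(2))
        {
            // Thread 1 takes A then B; thread 2 takes B then A.
            var t1 = new Thread(() => results[0] = TryTakeBoth(LockA, LockB, barrier));
            var t2 = new Thread(() => results[1] = TryTakeBoth(LockB, LockA, barrier));
            t1.Start();
            t2.Start();
            t1.Join();
            t2.Join();
        }
        return results;
    }

    public static void Main()
    {
        bool[] r = Run();
        Console.WriteLine("thread 1 got second lock: " + r[0]);
        Console.WriteLine("thread 2 got second lock: " + r[1]);
    }
}
```

Neither thread gets its second lock here. With a plain `lock` statement in place of the timed TryEnter, both threads would wait on each other indefinitely.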
@vancem as Dmitry reported, the issue was caused by initialization order and the fact that we had BTW, I found this comment in the reference code:
Just to be clear, the reference source you are looking at is System.Diagnostics.TraceSource logic and has nothing to do with System.Diagnostics.Tracing.EventSource. This is fine, it is just that different people own the different code, and if we are going to fix things we certainly need to know what code actually has the problem. Including @safern, who has been updating TraceSource code recently. Note also that TraceSource has been changed in .NET Core (our forward-looking code base), and no longer has complex logic in DiagnosticSource.Configuration. Also, one of the locks, TraceInternal.critSec, is only present if the TraceListener asks for it. Generally speaking, such 'global' locks are not a good idea for a high performance logging system (indeed we don't recommend TraceSource for high performance logging at all, it is really there only for compatibility reasons). In short, it is more important than ever to carefully understand your scenario.
Thanks for the clarification! We clearly need those stacks from @Dmitry-Matveev =)
Changes should be made in this NuGet: https://www.nuget.org/packages/Microsoft.ApplicationInsights.Azure.WebSites to work around this problem. Specifically in this class:
As mentioned, this bug does not exist in CoreCLR but only in the Desktop framework. It is being tracked by the following bug for the Desktop product.
There is a suggested fix
which I believe @brianrob is looking at. However, as mentioned, changes to the Desktop framework only propagate very slowly (people don't upgrade for months or years), so a work-around is suggested as well, which is what Sergey is referring to above.
I have a candidate fix for 387336. This is now pending validation by AppInsights. Per e-mail, @SergeyKanzhelev is looking into this.
The class responsible for the deadlock was moved to GitHub. Moving the issue along to that repo.
The bug 387336 is in .NET Framework version 4.7.0. Also, ApplicationInsights no longer supports Net40, where this bug was discovered.
TelemetryConfigurationFactory creates an instance of InMemoryChannel, which in turn creates an instance of the InMemoryTransmitter. InMemoryTransmitter starts a new Task which logs with CoreEventSource.Log.LogVerbose(). TelemetryConfigurationFactory continues the execution (it does not wait for the task) and may execute Event Source logging as well, leading to a possible deadlock between two threads: one executing an Event Source call in the newly created task and one executing an Event Source call on the old thread.

The deadlock seems possible because Event Source is performing some initialization logic for the first ever call and there are two locks that can be acquired in reverse:
While the nature of the deadlock is not entirely clear at the moment (a bug in EventSource or in usage of Event Source?), the fix might be to simply initialize Event Source synchronously before we create InMemoryChannel.

Dumps are available for additional investigation; please contact me to get them.
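A minimal sketch of the proposed workaround, assuming a hypothetical DemoEventSource as a stand-in for CoreEventSource (it is not the SDK's actual class, and WarmUpDemo is likewise made up for illustration). The idea is to make the first Event Source call on the current thread so that any first-use initialization finishes before a background task can log concurrently. Whether this is sufficient to avoid the deadlock described here has not been confirmed.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Threading.Tasks;

// Hypothetical stand-in for CoreEventSource; not the SDK's actual class.
[EventSource(Name = "Demo-WarmUp-Source")]
public sealed class DemoEventSource : EventSource
{
    public static readonly DemoEventSource Log = new DemoEventSource();

    [Event(1, Level = EventLevel.Verbose)]
    public void LogVerbose(string message)
    {
        WriteEvent(1, message);
    }
}

public static class WarmUpDemo
{
    public static bool Run()
    {
        // First call happens synchronously on this thread, before any task exists,
        // so first-use initialization cannot race with a background logger.
        DemoEventSource.Log.LogVerbose("warm-up");

        // Only now start the background work that also logs
        // (the role InMemoryTransmitter plays in the description above).
        Task background = Task.Factory.StartNew(
            () => DemoEventSource.Log.LogVerbose("from task"));

        // The original thread keeps logging without waiting for the task.
        DemoEventSource.Log.LogVerbose("from original thread");

        background.Wait();
        return background.Status == TaskStatus.RanToCompletion;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```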