dennisgorelik: (Default)
Couple of weeks ago we noticed that the same C# code executes differently under MSTest and in Visual Studio 2017.
In particular, Uri constructor crashed on invalid input in Visual Studio, but did not crash in MSTest.

Then, several days later, we found that ASP.NET allows to modify collection that we iterate through, but the same code crashes in a unit test with "System.InvalidOperationException: Collection was modified; enumeration operation may not execute".

We decided to investigate and found that the culprit is in a different value of "httpRuntime targetFramework" attribute.

Bad naming and documentation
Microsoft .NET Framework team chose a bad name for that attribute and wrote a misleading documentation:
---
https://msdn.microsoft.com/en-us/library/system.web.httpruntime.targetframework(v=vs.110).aspx
The version of the .NET Framework that the current web application targets.
---

When most developers (including me) read that - they think that "targetFramework" attribute defines what version of .NET framework would execute.

But actually that attribute has a very different meaning and should have been named either compatibilityTargetFramework or quirksTargetFramework.

What httpruntime targetframework actually means
Fortunately, levibroderick wrote a clarifying blog post, that now is the first result for httpRuntime targetFramework search:
https://blogs.msdn.microsoft.com/webdev/2012/11/19/all-about-httpruntime-targetframework/

With new versions of .NET framework, Microsoft .NET team introduced some breaking changes (especially for .NET Framework 4.5).
So then they created "quirks" to fix these breaking changes.

So, "targetFramework" attribute pretty much defines what set of quirks to use (the older the targetFramework version is - the more quirks you would get).
The total number of quirks seems to be around 10 (could be a little bit more or less, but not by a lot).

Practical impact
In the past, our Web.Config did not contain any mentioning of targetFramework in <httpRuntime> element.
That meant that we got all the quirks, so postjobfree.com did not break.
Then yesterday we turned off "legacy compatibility mode" by setting
<httpRuntime targetFramework="4.6.2" />
We lost all the quirks that way and, as a result, got two bugs:
1) "WebForms UnobtrusiveValidationMode requires a ScriptResourceMapping for 'jquery'. Please add a ScriptResourceMapping named jquery(case-sensitive)." crash on every page that contains <form> element.
2) Encrypted validationKey in <machineKey> element changed its meaning, so all users authentication cookies expired.
Several hours of research and development later - we fixed these issues and now our web site runs in a quirks-free mode.

What was your experience in converting legacy .NET app to the new .NET Framework version?
dennisgorelik: (Default)
In the last couple of days I learned quite a bit about multithreading:
1) How to create new thread in order to fix hangs in crawler.

2) That creating new threads has performance penalty (about 10 ms 0.15 ms per creation of a new thread).

3) That Task has good performance (almost no performance penalty) because it reuses thread pool.

4) That if you use Task (thread pool) you cannot really kill the hanging thread, so using Task Factory does not really help in solving a hanging thread issue.

5) How to create our own thread factory that is running single thread and how to kill that thread if it runs for too long.
In particular:
- How to use two EventWaitHandle objects in order to communicate appropriately between main thread an background/worker thread.

- When it is the right time to exit from the infinite loop in background thread in case if service is shutting down or pausing. (Exit background thread in case of pause/shutdown only if it is idle - to prevent confusion in the business logic of the client code).

using System;
using System.Runtime.ExceptionServices;
using System.Threading;

namespace PostJobFree.Utilities
{
	public static class ThreadHelper
	{
		private static EventWaitHandle CompletionWait;
		private static EventWaitHandle InputWait;
		private static Thread CurrentThread;
		private static Exception ThreadException;
		private static Action ActionToExecute;
		private static readonly object LockObject = new object();

		public static bool ExecuteOnSeparateThreadWithTimeout(Action action, TimeSpan timeout)
		{
			lock (LockObject)
			{
				ThreadException = null;
				ActionToExecute = action;
				if (CurrentThread == null // First-time execution or previous thread was aborted due to timeout
					|| !CurrentThread.IsAlive) // Service was paused
				{
					InitializeCurrentThread();
				}
				InputWait.Set();
				if (!CompletionWait.WaitOne(timeout))
				{
					CurrentThread.Abort();
					CurrentThread = null;
					return false;
				}
				if (ThreadException != null)
					ExceptionDispatchInfo.Capture(ThreadException).Throw(); // To preserve stack trace
				return true;
			}
		}

		private static void InitializeCurrentThread()
		{
			CompletionWait = new EventWaitHandle(false, EventResetMode.AutoReset);
			InputWait = new EventWaitHandle(false, EventResetMode.AutoReset);
			CurrentThread = new Thread(WorkerThread);
			CurrentThread.Start();

		}
		private static void WorkerThread()
		{
			while (true)
			{
				if (InputWait.WaitOne(TimeSpan.FromSeconds(1)))
				{// Action is requested
					try
					{
						ActionToExecute.Invoke();
					}
					catch (ThreadAbortException)
					{
						return;
					}
					catch (Exception ex)
					{
						ThreadException = ex;
					}
					finally
					{
						CompletionWait.Set();
					}
				}
				else
				{// 1 second passed with no Action request
					if (ExecutionCore.NewExecutionAllowed) continue; // To allow exiting on Pause
					return; // Exit on (NewExecutionAllowed = false) only if background thread has nothing to do
				}
			}
		}
	}
}

using System;
using System.Threading;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using PostJobFree;
using PostJobFree.Utilities;

namespace TestIjSearch.Utilities
{
	[TestClass]
	public class ThreadHelperTests
	{
		[TestMethod]
		public void ThreadHelperExecuteOnSeparateThreadWithTimeoutTest()
		{
			Assert.IsFalse(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => Thread.Sleep(10000), TimeSpan.FromTicks(1)));

			Assert.IsTrue(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => { }, TimeSpan.FromSeconds(1)));

			bool exceptionHappened = false;
			try
			{
				ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => {throw new PostJobFreeException();}, TimeSpan.FromSeconds(1));
			}
			catch (PostJobFreeException)
			{
				exceptionHappened = true;
			}
			Assert.IsTrue(exceptionHappened);

			try
			{
				ExecutionCore.NewExecutionAllowed = false;
				int result = 0;
				Assert.IsTrue(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => { result = 111; }, TimeSpan.FromSeconds(1)));
				Assert.AreEqual(111, result);
				//Thread.Sleep(TimeSpan.FromSeconds(2)); // It allows asynchronous thread to die/exit if (NewExecutionAllowed = false)
				Assert.IsTrue(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => { result = 222; }, TimeSpan.FromSeconds(1)));
				Assert.AreEqual(222, result);
			}
			finally
			{
				ExecutionCore.NewExecutionAllowed = true;
			}
		}
	}
}

Update: LJ discussion.
dennisgorelik: (Default)
Our web crawler hanged. Again.
It looks like it hangs with the frequency of about 1 hang per 1 million web page downloads.
Does not timeout. Does not crash. Just hangs.
Fortunately, other threads in our PostJobFreeService keep running.
(Until AngleSharp HTML parser would crash the whole PostJobFreeService on some weird HTML page, of course.)

Unfortunately, crawler hang is not reproducible: the same page can be downloaded without any problems on the next attempt. Or just in a browser.
But once in 1M downloads something weird happens: our crawler successfully passes HTTP handshake with the remote web server (so no HTTP connection timeout), but then hangs.

For our crawler we are using standard HttpWebRequest class from .NET framework.
Should we crawl with something else?

Or is it inevitable that web crawler would hang eventually and our watchdog should simply restart corresponding thread?

Discussion in Livejournal: http://dennisgorelik.livejournal.com/124693.html
dennisgorelik: (Default)
PostJobFree crawler found web page that causes fatal crash in AngleSharp parser:
using AngleSharp.Parser.Html;
.....
string pageHtml = LoadUrlContent("http://onestop.fiu.edu/financial-aid/loans/")
var parser = new HtmlParser();
var document = parser.Parse(pageHtml);
document.QuerySelectorAll("a"); // Fatal crash: "An unhandled exception of type 'System.StackOverflowException' occurred in AngleSharp.dll".

We cannot catch that exception and it simply restarts the whole process (PostJobFreeService Windows service).
That is very frustrating.

In development environment that crash is not always reproducible.
When we run code above in test - it just works.
But if we run the same code under Visual Studio debugger - it crashes with 'System.StackOverflowException'.

Update:
https://github.com/AngleSharp/AngleSharp/issues/523
AngleSharp library maintainers noticed that problematic page contains a lot of "<content /><content /><content /><content />" attributes.
view-source:http://onestop.fiu.edu/financial-aid/loans/

Obviously it is not an excuse to fail. Hopefully their latest build would fix the problem.
dennisgorelik: (Default)
Normally, in case of invalid input Uri() code throws UriFormatException. But with really weird input Uri(baseUri, Url) overload can produce NullReferenceException:
[TestMethod]
[ExpectedException(typeof(NullReferenceException))]
public void UriFailureTest()
{
    new Uri(
        new Uri("https://jobs.web.cern.ch/content/cern-jobs-insight/what-are-we-doing-while-you%E2%80%99re-waiting"),
        "https:/jobs.web.cern.ch/content/cern-jobs-insight/what-are-we-doing-while-you%E2%80%99re-waiting");
}

Profile

dennisgorelik: (Default)
Dennis Gorelik

July 2017

S M T W T F S
      1
2345678
9 101112 131415
16171819202122
23242526272829
3031     

Syndicate

RSS Atom

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Jul. 20th, 2017 12:25 pm
Powered by Dreamwidth Studios