dennisgorelik: (Default)
Short names vs unique names
It is a good practice to use shorter method names, because long names are harder to read.
It is cleaner to call:
candidate.Save();
than:
candidate.SaveCandidate();

But then we end up with multiple "Save()" methods in different classes. For example:
Candidate.cs/Save()
Recruiter.cs/Save()

Problem with non-unique names
If we search our codebase for "Save(", we will find a lot of methods and method calls. Only some of them relate to the functionality we actually want to investigate (for example, we may want to find out where the Candidate Save() functionality is used because we are considering refactoring or deleting it).

Plain text search vs code references
Visual Studio can find all references to a specific method: right-click the method name and select "Find All References".
So the problem of non-unique method names is solved, right?
Not quite.
Visual Studio is not able to track method calls made from aspx and ashx files.
Visual Studio is also not able to find method references in comments.

ReSharper vs vanilla Visual Studio
ReSharper actually can find method references in aspx and ashx files, and even in comments. Up to Visual Studio 2015 that worked fine. But as the ReSharper team and codebase aged, and Visual Studio switched to the new Roslyn compiler, the ReSharper team could not keep up and now delivers a barely working resource hog that is practically unusable with newer versions of Visual Studio (too slow).

Get rid of aspx and ashx files?
It is actually pretty easy to avoid ashx handlers: use standard C# classes that implement the IHttpHandler interface, as in the sketch below.
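For example, here is a minimal sketch of an ashx-style handler written as a plain C# class (the class name, namespace and response body are made up for illustration):

using System.Web;

namespace PostJobFree.Handlers // hypothetical namespace
{
	public class PingHandler : IHttpHandler
	{
		// Allow ASP.NET to reuse a single instance for multiple requests.
		public bool IsReusable { get { return true; } }

		public void ProcessRequest(HttpContext context)
		{
			context.Response.ContentType = "text/plain";
			context.Response.Write("pong");
		}
	}
}

The class still has to be mapped to a URL in Web.config (under system.webServer/handlers), but every call into it is now an ordinary C# method call that "Find All References" can track.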
But what about aspx pages: can we get rid of them too and use only standard C# HttpHandlers?
If we could do that, then we would be able to rely on "Find All References" feature again.
But, unfortunately, getting rid of aspx pages is not that simple. We would have to reimplement a lot of functionality that aspx has.
For example:
- Page PostBack support would be gone.
- The ability to mix HTML markup and aspx controls alongside each other would be gone.
- HTML syntax validation would be gone (Visual Studio does not validate HTML syntax inside C# strings).

If it ain't broke - don't fix it
Even though converting existing ashx files into standard C# classes (where Visual Studio can track all method references) is a pretty straightforward operation, such conversion is not without its own problems:
- Conversion takes developer time.
- Code replacement could introduce silly bugs.
- Moving code from class to class makes navigating "svn blame" a little trickier.
So if an ashx handler has already been working in the solution for many years, does it make sense to touch it now?

The benefits of code refactoring
In spite of the "If it ain't broke - don't fix it" rule, cleaning up code is still needed. If we do not keep the code clean (do not delete unneeded parts and do not clear up confusing things such as hidden references), our codebase becomes extremely hard to maintain: fixing one bug introduces other bugs, and new features are very hard to add without breaking something.

It depends
There is no single solution that can be applied to all situations. In software development we consider multiple problems and constantly weigh pros and cons against each other.
For example, out of 11 remaining ashx files, we:
- Deleted one file because we do not use it ("Reduce amount of code when possible" principle).
- Will migrate one file to a standard C# HttpHandler, because today, during refactoring, a developer missed a method call from that ashx file.
- Keep the other 9 ashx files as is ("If it ain't broke - don't fix it" principle).

What are your examples of balancing problems against each other?
dennisgorelik: (Default)
A couple of weeks ago we noticed that the same C# code executes differently under MSTest and in Visual Studio 2017.
In particular, the Uri constructor crashed on invalid input in Visual Studio, but did not crash under MSTest.

Then, several days later, we found that ASP.NET allows us to modify a collection while iterating through it, but the same code crashes in a unit test with "System.InvalidOperationException: Collection was modified; enumeration operation may not execute".

We decided to investigate and found that the culprit was a different value of the "httpRuntime targetFramework" attribute.

Bad naming and documentation
The Microsoft .NET Framework team chose a bad name for that attribute and wrote misleading documentation:
---
https://msdn.microsoft.com/en-us/library/system.web.httpruntime.targetframework(v=vs.110).aspx
The version of the .NET Framework that the current web application targets.
---

When most developers (including me) read that, they think the "targetFramework" attribute defines which version of the .NET Framework will execute the application.

But actually that attribute has a very different meaning and should have been named compatibilityTargetFramework or quirksTargetFramework instead.

What httpruntime targetframework actually means
Fortunately, levibroderick wrote a clarifying blog post, which is now the first search result for "httpRuntime targetFramework":
https://blogs.msdn.microsoft.com/webdev/2012/11/19/all-about-httpruntime-targetframework/

With new versions of the .NET Framework, the Microsoft .NET team introduced some breaking changes (especially in .NET Framework 4.5).
They then created "quirks" that restore the old behavior and hide those breaking changes.

So the "targetFramework" attribute pretty much defines which set of quirks to use (the older the targetFramework version, the more quirks you get).
The total number of quirks seems to be around 10 (could be a little bit more or less, but not by a lot).
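If you want to see which regime your application actually runs under, the effective value is exposed by the HttpRuntime.TargetFramework property (the property the MSDN page above documents). A tiny sketch, with made-up class and method names:

using System;
using System.Web;

public static class QuirksDiagnostics
{
	// Returns a message with the effective httpRuntime targetFramework value,
	// which determines the set of quirks the application runs with.
	public static string DescribeQuirksMode()
	{
		Version target = HttpRuntime.TargetFramework;
		return "Effective httpRuntime targetFramework: " + target;
	}
}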

Practical impact
In the past, our Web.config did not contain any mention of targetFramework in the <httpRuntime> element.
That meant we got all the quirks, so postjobfree.com did not break.
Then yesterday we turned off "legacy compatibility mode" by setting
<httpRuntime targetFramework="4.6.2" />
We lost all the quirks that way and, as a result, got two bugs:
1) "WebForms UnobtrusiveValidationMode requires a ScriptResourceMapping for 'jquery'. Please add a ScriptResourceMapping named jquery(case-sensitive)." crash on every page that contains <form> element.
2) Encrypted validationKey in <machineKey> element changed its meaning, so all users authentication cookies expired.
Several hours of research and development later - we fixed these issues and now our web site runs in a quirks-free mode.
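The post does not show how the first bug (the missing jQuery ScriptResourceMapping) was fixed. One common fix, sketched here with assumed script paths and class name, is to register a ScriptResourceMapping named "jquery" at application start, for example in Global.asax.cs:

using System;
using System.Web.UI;

public class Global : System.Web.HttpApplication
{
	protected void Application_Start(object sender, EventArgs e)
	{
		// Map the name "jquery" to a locally deployed jQuery file.
		// The paths below are assumptions: they must match the files actually shipped with the site.
		ScriptManager.ScriptResourceMapping.AddDefinition(
			"jquery",
			new ScriptResourceDefinition
			{
				Path = "~/Scripts/jquery.min.js",
				DebugPath = "~/Scripts/jquery.js"
			});
	}
}

Another option is to switch unobtrusive validation off entirely by adding the "ValidationSettings:UnobtrusiveValidationMode" appSetting with value "None" to Web.config, which restores the pre-4.5 WebForms validation behavior.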

What was your experience converting a legacy .NET app to a new .NET Framework version?
dennisgorelik: (Default)
In the last couple of days I learned quite a bit about multithreading:
1) How to create a new thread in order to fix hangs in our crawler.

2) That creating new threads has a performance penalty (about 0.15 ms per creation of a new thread; see the timing sketch after this list).

3) That Task has good performance (almost no per-call penalty) because it reuses the thread pool.

4) That if you use Task (the thread pool), you cannot really kill a hanging thread, so using Task.Factory does not really help with a hanging thread issue.

5) How to create our own thread factory that runs a single thread, and how to kill that thread if it runs for too long.
In particular:
- How to use two EventWaitHandle objects to communicate between the main thread and the background/worker thread.

- When is the right time to exit the infinite loop in the background thread when the service is shutting down or pausing. (Exit the background thread on pause/shutdown only if it is idle, to avoid confusing the business logic of the client code.)
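Here is a rough timing sketch (not from the original post; absolute numbers depend on hardware and CLR version) comparing the cost of starting a dedicated thread with the cost of queueing the same no-op work via Task.Run. It measures start plus join/wait rather than pure creation cost:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadCostBenchmark
{
	public static void Main()
	{
		const int iterations = 1000;

		var stopwatch = Stopwatch.StartNew();
		for (int i = 0; i < iterations; i++)
		{
			var thread = new Thread(() => { });
			thread.Start();
			thread.Join();
		}
		stopwatch.Stop();
		Console.WriteLine("new Thread: {0:F3} ms per start+join", stopwatch.Elapsed.TotalMilliseconds / iterations);

		stopwatch.Restart();
		for (int i = 0; i < iterations; i++)
		{
			Task.Run(() => { }).Wait(); // Reuses thread-pool threads, so the per-call cost is much lower.
		}
		stopwatch.Stop();
		Console.WriteLine("Task.Run: {0:F3} ms per call", stopwatch.Elapsed.TotalMilliseconds / iterations);
	}
}

The ThreadHelper class that implements item 5 follows.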

using System;
using System.Runtime.ExceptionServices;
using System.Threading;

namespace PostJobFree.Utilities
{
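	// Runs one action at a time on a single reusable background thread,
	// and aborts that thread if the action does not finish within the given timeout.
	// Calls are serialized via LockObject.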
	public static class ThreadHelper
	{
		private static EventWaitHandle CompletionWait;
		private static EventWaitHandle InputWait;
		private static Thread CurrentThread;
		private static Exception ThreadException;
		private static Action ActionToExecute;
		private static readonly object LockObject = new object();

		public static bool ExecuteOnSeparateThreadWithTimeout(Action action, TimeSpan timeout)
		{
			lock (LockObject)
			{
				ThreadException = null;
				ActionToExecute = action;
				if (CurrentThread == null // First-time execution or previous thread was aborted due to timeout
					|| !CurrentThread.IsAlive) // Service was paused
				{
					InitializeCurrentThread();
				}
				InputWait.Set();
				if (!CompletionWait.WaitOne(timeout))
				{
					CurrentThread.Abort();
					CurrentThread = null;
					return false;
				}
				if (ThreadException != null)
					ExceptionDispatchInfo.Capture(ThreadException).Throw(); // To preserve stack trace
				return true;
			}
		}

		private static void InitializeCurrentThread()
		{
			CompletionWait = new EventWaitHandle(false, EventResetMode.AutoReset);
			InputWait = new EventWaitHandle(false, EventResetMode.AutoReset);
			CurrentThread = new Thread(WorkerThread);
			CurrentThread.Start();

		}

		// Worker loop: wait for a signal on InputWait, run ActionToExecute,
		// then signal CompletionWait so the waiting caller can proceed.
		private static void WorkerThread()
		{
			while (true)
			{
				if (InputWait.WaitOne(TimeSpan.FromSeconds(1)))
				{// Action is requested
					try
					{
						ActionToExecute.Invoke();
					}
					catch (ThreadAbortException)
					{
						return;
					}
					catch (Exception ex)
					{
						ThreadException = ex;
					}
					finally
					{
						CompletionWait.Set();
					}
				}
				else
				{// 1 second passed with no Action request
					if (ExecutionCore.NewExecutionAllowed) continue; // To allow exiting on Pause
					return; // Exit on (NewExecutionAllowed = false) only if background thread has nothing to do
				}
			}
		}
	}
}

using System;
using System.Threading;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using PostJobFree;
using PostJobFree.Utilities;

namespace TestIjSearch.Utilities
{
	[TestClass]
	public class ThreadHelperTests
	{
		[TestMethod]
		public void ThreadHelperExecuteOnSeparateThreadWithTimeoutTest()
		{
			Assert.IsFalse(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => Thread.Sleep(10000), TimeSpan.FromTicks(1)));

			Assert.IsTrue(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => { }, TimeSpan.FromSeconds(1)));

			bool exceptionHappened = false;
			try
			{
				ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => {throw new PostJobFreeException();}, TimeSpan.FromSeconds(1));
			}
			catch (PostJobFreeException)
			{
				exceptionHappened = true;
			}
			Assert.IsTrue(exceptionHappened);

			try
			{
				ExecutionCore.NewExecutionAllowed = false;
				int result = 0;
				Assert.IsTrue(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => { result = 111; }, TimeSpan.FromSeconds(1)));
				Assert.AreEqual(111, result);
				//Thread.Sleep(TimeSpan.FromSeconds(2)); // It allows asynchronous thread to die/exit if (NewExecutionAllowed = false)
				Assert.IsTrue(ThreadHelper.ExecuteOnSeparateThreadWithTimeout(() => { result = 222; }, TimeSpan.FromSeconds(1)));
				Assert.AreEqual(222, result);
			}
			finally
			{
				ExecutionCore.NewExecutionAllowed = true;
			}
		}
	}
}

Update: LJ discussion.
dennisgorelik: (Default)
Our web crawler hung. Again.
It looks like it hangs about once per 1 million web page downloads.
Does not time out. Does not crash. Just hangs.
Fortunately, other threads in our PostJobFreeService keep running.
(Until the AngleSharp HTML parser crashes the whole PostJobFreeService on some weird HTML page, of course.)

Unfortunately, the crawler hang is not reproducible: the same page can be downloaded without any problems on the next attempt, or simply in a browser.
But once in a million downloads something weird happens: our crawler successfully completes the HTTP handshake with the remote web server (so there is no connection timeout), but then hangs.

For our crawler we are using the standard HttpWebRequest class from the .NET Framework.
Should we crawl with something else?
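One hedged guess (not from the post): HttpWebRequest.Timeout only applies to GetResponse() and GetRequestStream(), while reads from the response stream are governed by ReadWriteTimeout, and that timeout applies per read operation, so a server that keeps trickling bytes can stall a download indefinitely. A minimal sketch that sets both timeouts explicitly:

using System;
using System.IO;
using System.Net;

public static class CrawlerDownloadSketch
{
	public static string DownloadPage(string url)
	{
		var request = (HttpWebRequest)WebRequest.Create(url);
		request.Timeout = 30000;          // 30 s to obtain the response headers
		request.ReadWriteTimeout = 30000; // 30 s per read on the response stream

		using (var response = (HttpWebResponse)request.GetResponse())
		using (var reader = new StreamReader(response.GetResponseStream()))
		{
			return reader.ReadToEnd();
		}
	}
}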

Or is it inevitable that a web crawler will eventually hang, and our watchdog should simply restart the corresponding thread?

Discussion in Livejournal: http://dennisgorelik.livejournal.com/124693.html
dennisgorelik: (Default)
The PostJobFree crawler found a web page that causes a fatal crash in the AngleSharp parser:
using AngleSharp.Parser.Html;
.....
string pageHtml = LoadUrlContent("http://onestop.fiu.edu/financial-aid/loans/");
var parser = new HtmlParser();
var document = parser.Parse(pageHtml);
document.QuerySelectorAll("a"); // Fatal crash: "An unhandled exception of type 'System.StackOverflowException' occurred in AngleSharp.dll".

We cannot catch that exception (StackOverflowException cannot be caught by application code since .NET 2.0), and it simply restarts the whole process (the PostJobFreeService Windows service).
That is very frustrating.

In the development environment the crash is not always reproducible.
When we run the code above in a test, it just works.
But if we run the same code under the Visual Studio debugger, it crashes with 'System.StackOverflowException'.

Update:
https://github.com/AngleSharp/AngleSharp/issues/523
AngleSharp library maintainers noticed that the problematic page contains a lot of repeated "<content /><content /><content /><content />" elements.
view-source:http://onestop.fiu.edu/financial-aid/loans/

Obviously that is not an excuse to fail. Hopefully their latest build will fix the problem.
dennisgorelik: (Default)
Normally, on invalid input the Uri constructor throws UriFormatException. But with really weird input, the Uri(baseUri, relativeUri) overload can produce a NullReferenceException:
[TestMethod]
[ExpectedException(typeof(NullReferenceException))]
public void UriFailureTest()
{
    new Uri(
        new Uri("https://jobs.web.cern.ch/content/cern-jobs-insight/what-are-we-doing-while-you%E2%80%99re-waiting"),
        "https:/jobs.web.cern.ch/content/cern-jobs-insight/what-are-we-doing-while-you%E2%80%99re-waiting"); // Note the single slash after "https:" in the relative argument
}
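
A possible defensive wrapper (a sketch, not from the post; the class and method names are made up) that treats both the documented and the undocumented failure as "could not build a Uri":

using System;

public static class UriHelper
{
    // Hypothetical helper: returns null instead of throwing when the base/relative
    // combination cannot be turned into a valid Uri.
    public static Uri TryCombine(Uri baseUri, string relativeUrl)
    {
        try
        {
            return new Uri(baseUri, relativeUrl);
        }
        catch (UriFormatException)
        {
            return null; // Documented failure mode for invalid input.
        }
        catch (NullReferenceException)
        {
            return null; // Undocumented failure mode observed with weird input like "https:/host/path".
        }
    }
}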
