dennisgorelik | Entries tagged with software

Our geocoding loop detector found a Google Maps bug.

Google Maps search for "quweisna,menofia governorate,egypt"
points to "quwaysna,menofia governorate,egypt".

Searching for "quwaysna,menofia governorate,egypt" points back to "quweisna,menofia governorate,egypt".

Quwaysna is about 1 mile away from Quweisna.

Crossposts: https://dennisgorelik.livejournal.com/180356.html

We use 2 geocoding providers: Mapbox (primary) and Google Maps (as a backup when Mapbox produces weird results).

Using 2 geocoding providers occasionally causes weird data collisions.
For example:
Mapbox API geocodes "north beach, MD,20714" as "west beach, MD,20714", but Google Maps API geocodes "west beach, MD,20714" as "north beach, MD,20714".
That effectively forms a geocoding loop:
~~~~~~~
west beach, MD,20714
north beach, MD,20714
west beach, MD,20714
north beach, MD,20714
west beach, MD,20714
...
~~~~~~~

This geocoding loop may cause an infinite redirect between job list web pages:
-----
"West beach" jobs page redirects to "North beach" jobs page.
"North beach" jobs page redirects to "West beach" jobs page.
...
-----

Yesterday we launched geocoding loop detection code and found over 50 geocoding loops in our GeocodedLocation table of 901,016 records.

Some other geocoding loop examples:
=======
san francisco,ca,94105 <-> south beach,ca,94105
northeast washington,dc,20018 <-> washington,dc,20018
miami,fl,33132 <-> downtown,fl,33132
far rockaway,ny,11691 <-> queens,ny,11691

kankan,guinea <-> kankan region,guinea
amhara,ethiopia <-> amhara region,ethiopia
jigawa,nigeria <-> jigawa state,nigeria
manzini district,swaziland <-> manzini,swaziland
=======
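
A minimal sketch of how loops like these can be detected. This is not the actual detection code: it assumes the geocoded records are loaded into a dictionary that maps each location string to the location it geocodes to. GeocodingLoopDetector, FindLoop and the "nyc,ny" record are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class GeocodingLoopDetector
{
    // Follows the geocoding chain from 'start'. Returns the locations that
    // form a loop, or an empty list when the chain ends at a stable location.
    public static List<string> FindLoop(Dictionary<string, string> redirects, string start)
    {
        var path = new List<string>();
        var positions = new Dictionary<string, int>(); // location -> index in path
        string current = start;
        while (true)
        {
            // We came back to a location that is already on the path: loop found.
            if (positions.TryGetValue(current, out int index))
                return path.GetRange(index, path.Count - index);

            positions[current] = path.Count;
            path.Add(current);

            // No mapping, or a location that maps to itself, is stable (no loop).
            if (!redirects.TryGetValue(current, out string next) || next == current)
                return new List<string>();

            current = next;
        }
    }

    public static void Main()
    {
        var redirects = new Dictionary<string, string>
        {
            ["north beach, MD,20714"] = "west beach, MD,20714",
            ["west beach, MD,20714"] = "north beach, MD,20714",
            ["nyc,ny"] = "new york,ny", // hypothetical record without a loop
        };

        // Prints: north beach, MD,20714 -> west beach, MD,20714
        Console.WriteLine(string.Join(" -> ", FindLoop(redirects, "north beach, MD,20714")));

        // Prints: 0
        Console.WriteLine(FindLoop(redirects, "nyc,ny").Count);
    }
}
```

Running FindLoop for every key of the table would report each loop once per member location, so a real implementation would probably want to deduplicate the results.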

Crossposts: https://dennisgorelik.livejournal.com/179751.html

The NUnit team tried to improve performance, but created a very confusing and annoying bug in the NUnit Adapter for Visual Studio.

https://github.com/nunit/nunit3-vs-adapter/issues/648#issuecomment-525762809
Siko91 commented 2 days ago:
We use NUnit 3.10 and NUnit3TestAdapter 3.15
With VS2017, when trying to run or debug a test, half of the attempts would end up with this error.
The other half of the attempts successfully start running/debugging the test.

It fails with this error on every second attempt.
Accurate like a clock.

first time running a test starts properly
second time - NUnit failed to load
third time starts properly
fourth time - failed to load
and so on...

I am guessing that the run that starts properly changes some persisting state (file?) and sets it in an invalid way, so that the next one can't start, but can reset the state... so that the next one can start and screw it up again.

Why could the NUnit team not catch this obvious bug themselves?
My explanation is that they tested the new NUnit Adapter code against the most recent NUnit Framework code.
But most developers update the NUnit Framework version only occasionally.
We had NUnit Framework 3.9 (which is older than the newest NUnit Framework 3.12).
On the other hand, Visual Studio automatically updates the NUnit Adapter version.
Several days ago we automatically received a new NUnit Adapter update (NUnit Adapter 3.15), which conflicts with our installed version of NUnit Framework.

https://github.com/nunit/nunit3-vs-adapter/issues/651
We have issues in some cases with pre-filtering
...
what is affected:
• Any version of NUnit below 3.11.0
• Any test assemblies using SetupFixture
• TestCaseData using SetName instead of SetArgDisplayNames.

Crossposts: https://dennisgorelik.livejournal.com/177709.html

An insightful article that explains why we should minimize the number of technologies we use to run our product, and be careful about which new technologies we are ready to use in production.

http://boringtechnology.club/
New tech typically has more known unknowns, and many more unknown unknowns. And this is really important.

Adding the technology is easy, living with it is hard. These are all the things you have to worry about.

I could brew install a new database right here right now while giving this talk, and start writing some data to it. ... But it’s another matter entirely to run that thing in production at a professional level.

If you’re adding a redundant piece of technology, your goal is to replace something with it. Your goal shouldn’t be to operate two pieces of technology that are redundant with one another forever.

When you add a thing that replaces another thing, you should be committing to a plan to replace the old thing. It might be a long term plan. And you should be committing to rewriting the new thing using the old tools if the new tools don’t actually work out.

Discussion on Hacker News

Crossposts: https://dennisgorelik.livejournal.com/177342.html

A couple of years ago we tried to reuse exception handling code and created this method:

// Runs tryAction, swallows WebException, and reports any other exception to developers.
public static void ExecuteEmailCrashSuppressWebException(Action tryAction, Action reportAction)
{
	var tl = new TimeLog();
	try
	{
		tryAction();
	}
	// Web exceptions are swallowed silently.
	catch (WebException) { return; }
	// Every other exception is written to "{displayName}.log" and emailed to developers.
	catch (Exception ex)
	{
		string displayName = tryAction.GetDisplayName();
		string message = CookMessage(displayName, reportAction, tl, ex);
		string subject = $"Win crash: {displayName}()";
		Log.ToFile(message, displayName + ".log");
		EmailToDevelopers.EmailTextIfSubjectWasNotSentRecently(subject, message);
	}
}

The idea was that if we knew that a method might crash, we would just pass that method as a "tryAction" parameter.
Then ExecuteEmailCrashSuppressWebException() would swallow web exceptions, and would notify us (developers) about all other exceptions.

We wanted to avoid repeating try-catch boilerplate code by reusing ExecuteEmailCrashSuppressWebException().

That attempt [to reuse try-catch] failed miserably. Every tryAction method needed its own custom exception handling:
- In some cases we needed to swallow WebException, but in other cases we wanted to log it.
- In some cases we wanted to write to one "{tryAction}.log" file, but in other cases we wanted to write to a differently named log file, or not to write to a log file at all.
- Log message content was quite different for different tryAction methods.

Eventually we deleted ExecuteEmailCrashSuppressWebException() and wrapped every individual method that needed custom exception handling in its own try-catch block.

My conclusion is:
A try-catch block should wrap only direct method calls.
A try-catch block should almost never wrap an Action/delegate invocation (such as "tryAction()").
Wrapping a delegate invocation with try-catch does not work because the "catch" implementation is very custom for every individual method.
Merging all these custom implementations into a single "catch" block produces an unmaintainable mess.
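
A minimal sketch of what "its own try-catch per method" can look like. All names here (DownloadNewsFeed, PostJobAlert, LogLines and the log file name) are hypothetical and made up for this illustration; they are not from our code base. The point is only that the same WebException is swallowed in one place and logged in another, and each "catch" stays next to the direct call it belongs to.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class DirectTryCatchSketch
{
    // Stand-in for a real log file (hypothetical).
    public static readonly List<string> LogLines = new List<string>();

    // Hypothetical operations that fail with WebException on request.
    static void DownloadNewsFeed(bool fail)
    {
        if (fail) throw new WebException("feed host is unreachable");
    }

    static void PostJobAlert(bool fail)
    {
        if (fail) throw new WebException("alert endpoint timed out");
    }

    // Here a web failure is expected and harmless, so it is swallowed.
    public static bool TryRefreshNewsFeed(bool fail)
    {
        try
        {
            DownloadNewsFeed(fail);
            return true;
        }
        catch (WebException)
        {
            return false;
        }
    }

    // Here the same exception type matters, so it is logged with its own message.
    public static bool TryPostJobAlert(bool fail)
    {
        try
        {
            PostJobAlert(fail);
            return true;
        }
        catch (WebException ex)
        {
            LogLines.Add("PostJobAlert.log: " + ex.Message);
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryRefreshNewsFeed(fail: true)); // False, nothing is logged
        Console.WriteLine(TryPostJobAlert(fail: true));    // False, one line is logged
        Console.WriteLine(LogLines.Count);                 // 1
    }
}
```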

Crossposts: https://dennisgorelik.livejournal.com/174012.html

In the "Double.NaN != Double.NaN" discussion, ppk_ptichkin linked to the Ariane 5 investigation.

My conclusions based on that investigation:
1) Delete unused code from the solution.
2) Make sure that the application is still functional even when an exception happens.
3) Do integration tests.
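
A small sketch for conclusion 2. According to the full report, the failing operation was an unprotected conversion of a 64-bit floating point value to a 16-bit signed integer. The code below only illustrates that kind of failure in C#, and clamping to the representable range is my assumption about one possible "best effort" behavior, not what the report prescribes. BestEffortConversion and both method names are made up for this illustration.

```csharp
using System;

public static class BestEffortConversion
{
    // Unprotected: throws OverflowException when the value does not fit into 16 bits.
    public static short ConvertUnprotected(double value)
    {
        return checked((short)value);
    }

    // Protected: clamps to the 16-bit range, so the caller still gets best-effort data.
    public static short ConvertSaturating(double value)
    {
        if (double.IsNaN(value)) return 0; // assumption: treat NaN as zero
        if (value >= short.MaxValue) return short.MaxValue;
        if (value <= short.MinValue) return short.MinValue;
        return (short)value;
    }

    public static void Main()
    {
        double horizontalVelocity = 40000.0; // hypothetical value outside the 16-bit range

        try
        {
            ConvertUnprotected(horizontalVelocity);
        }
        catch (OverflowException)
        {
            Console.WriteLine("unprotected conversion failed"); // this line is printed
        }

        Console.WriteLine(ConvertSaturating(horizontalVelocity)); // 32767
    }
}
```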


=======================
http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html
e) At 36.7 seconds after H0 (approx. 30 seconds after lift-off) the computer within the back-up inertial reference system, which was working on stand-by for guidance and attitude control, became inoperative. This was caused by an internal variable related to the horizontal velocity of the launcher exceeding a limit which existed in the software of this computer.

f) Approx. 0.05 seconds later the active inertial reference system, identical to the back-up system in hardware and software, failed for the same reason. Since the back-up inertial system was already inoperative, correct guidance and attitude information could no longer be obtained and loss of the mission was inevitable.

g) As a result of its failure, the active inertial reference system transmitted essentially diagnostic information to the launcher's main computer, where it was interpreted as flight data and used for flight control calculations.

h) On the basis of those calculations the main computer commanded the booster nozzles, and somewhat later the main engine nozzle also, to make a large correction for an attitude deviation that had not occurred.

i) A rapid change of attitude occurred which caused the launcher to disintegrate at 39 seconds after H0 due to aerodynamic forces.

.....

m) The inertial reference system of Ariane 5 is essentially common to a system which is presently flying on Ariane 4. The part of the software which caused the interruption in the inertial system computers is used before launch to align the inertial reference system and, in Ariane 4, also to enable a rapid realignment of the system in case of a late hold in the countdown. This realignment function, which does not serve any purpose on Ariane 5, was nevertheless retained for commonality reasons and allowed, as in Ariane 4, to operate for approx. 40 seconds after lift-off.

n) During design of the software of the inertial reference system used for Ariane 4 and Ariane 5, a decision was taken that it was not necessary to protect the inertial system computer from being made inoperative by an excessive value of the variable related to the horizontal velocity, a protection which was provided for several other variables of the alignment software. When taking this design decision, it was not analysed or fully understood which values this particular variable might assume when the alignment software was allowed to operate after lift-off.

o) In Ariane 4 flights using the same type of inertial reference system there has been no such failure because the trajectory during the first 40 seconds of flight is such that the particular variable related to horizontal velocity cannot reach, with an adequate operational margin, a value beyond the limit present in the software.

p) Ariane 5 has a high initial acceleration and a trajectory which leads to a build-up of horizontal velocity which is five times more rapid than for Ariane 4. The higher horizontal velocity of Ariane 5 generated, within the 40-second timeframe, the excessive value which caused the inertial system computers to cease operation.

.....

4. RECOMMENDATIONS

On the basis of its analyses and conclusions, the Board makes the following recommendations.

R1 Switch off the alignment function of the inertial reference system immediately after lift-off. More generally, no software function should run during flight unless it is needed.

R2 Prepare a test facility including as much real equipment as technically feasible, inject realistic input data, and perform complete, closed-loop, system testing. Complete simulations must take place before any mission. A high test coverage has to be obtained.

R3 Do not allow any sensor, such as the inertial reference system, to stop sending best effort data.
=======================

http://juan-gandhi.dreamwidth.org/4017681.html?thread=111407633#cmt111407633

Crossposts: https://dennisgorelik.livejournal.com/141764.html

The ElasticSearch team defends the bloat in ElasticSearch Percolator 5.4
--------
https://github.com/elastic/elasticsearch/issues/25308
If you're not interested in ranking you can easily turn it off, by wrapping the percolate query in a constant_score query.
.....
The percolator tries to tag the queries automatically based on the containing query terms. However it can't do this for all percolator queries, because the percolator doesn't know how to extract meaningful information during indexing for all queries. This is a work in progress and will get better over time. It already has shown a significant performance improvement for cases where the percolator was able to analyze the percolator query correctly at index time.
--------

1) Funny how, in order to turn off an unneeded feature, application developers have to create an extra wrapper around their query.

2) "work in progress" did not stop the ElasticSearch team from breaking backward compatibility and forcing their users to rewrite their legacy code in favor of "work in progress" ElasticSearch 5.4.

3) "a significant performance improvement" is not quantified, and the cases where that improvement happened are not described.
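
For reference, a sketch of the extra wrapper from point 1: a percolate query placed inside the "filter" of a constant_score query. The request body is built with System.Text.Json only to keep the example runnable without an ElasticSearch client. The field name "query", the document type "doc" and the "message" property are placeholder values that I assume for illustration; they depend on the actual index mapping.

```csharp
using System;
using System.Text.Json;

public static class PercolateQuerySketch
{
    // Builds a search request body where the percolate query is wrapped in
    // constant_score, which is how the quoted answer suggests turning ranking off.
    public static string BuildUnscoredPercolateQuery(string message)
    {
        var body = new
        {
            query = new
            {
                constant_score = new
                {
                    filter = new
                    {
                        percolate = new
                        {
                            field = "query",        // placeholder percolator field name
                            document_type = "doc",  // placeholder document type
                            document = new { message }
                        }
                    }
                }
            }
        };
        return JsonSerializer.Serialize(body, new JsonSerializerOptions { WriteIndented = true });
    }

    public static void Main()
    {
        // Prints the JSON body that would be sent to the _search endpoint.
        Console.WriteLine(BuildUnscoredPercolateQuery("software developer in boston"));
    }
}
```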

See also: ElasticSearch Percolator Bloat - part 1

Crossposts: http://dennisgorelik.livejournal.com/137286.html

I bought a NUC7i3BNH.
Then I tried to install Windows Server 2016 Standard on that NUC.
The Windows Server installation itself was successful, but several drivers, including Network Adapters(!) and "Multimedia Audio Controller", did not install.

Search for drivers brought me to:
http://www.intel.com/content/www/us/en/support/boards-and-kits/intel-nuc-boards/000005628.html
where, to my amazement, I discovered that most NUCs do NOT support Windows Server OS.

Further research pointed me to a hack that allows manually using Windows 10 drivers on Windows Server 2016.
It goes like this:
1) Open C:\install\LAN_Server2016_64_22\PRO1000\Winx64\NDIS65\e1d65x64.inf
2) From this section:
[Intel.NTamd64.10.0.1]

copy these 3 lines:
===
%E15D8NC.DeviceDesc% = E15D8.10.0.1, PCI\VEN_8086&DEV_15D8
%E15D8NC.DeviceDesc% = E15D8.10.0.1, PCI\VEN_8086&DEV_15D8&SUBSYS_00008086
%E15D8NC.DeviceDesc% = E15D8.10.0.1, PCI\VEN_8086&DEV_15D8&SUBSYS_00011179
===

into this section:
[Intel.NTamd64.10.0]

3) Then turn off driver checks:
bcdedit /set LOADOPTIONS DISABLE_INTEGRITY_CHECKS
bcdedit /set TESTSIGNING OFF
bcdedit /set NOINTEGRITYCHECKS ON

4) And finally install the driver:
pnputil.exe -i -a C:\install\LAN_Server2016_64_22\PRO1000\Winx64\NDIS65\e1d65x64.inf

After that, the network (and Internet) started working on my new NUC.

But I do not understand why Intel does not allow these drivers under Windows Server 2016 by default.

Update: Windows Server 2016 on NUC7i3BNH struggles - part 2.

Crossposts: http://dennisgorelik.livejournal.com/131522.html

By juan-gandhi:
---
1) Mutable keys in a "hashmap".
2) A stack for registering data for a later check that no garbage is left behind. Since the pointer to the stack is global and several threads are running, it is pure absurdity.
3) A class with 183 methods, 0 tests.
4) Tests that regularly crash at random, and a sacred belief that "nothing has broken here in the last several years".
5) A belief that everything we have is very "efficient", and regular complaints from users that our code is very slow, unlike the Scala competitor (!)
6) "Optimistic merge": "it is not my test that failed, I have nothing to do with it, we need to release."
---

1) I wonder why anyone needed to make the keys in a hashmap mutable?
2) I have not personally run into this, although when working in a multithreaded environment you can make all kinds of blunders...
3), 4), 5), 6) - I have run into these in one form or another.
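
A minimal sketch of why item 1 is a problem, assuming a key class whose hash code depends on a mutable field. MutableKey and MutableKeyDemo are hypothetical names made up for this illustration. Once the key is changed after insertion, the entry is still in the dictionary but can no longer be found, neither by the changed key nor by a fresh key with the old value.

```csharp
using System;
using System.Collections.Generic;

public sealed class MutableKey
{
    public int Id; // mutable field that drives both Equals and GetHashCode

    public MutableKey(int id) { Id = id; }

    public override bool Equals(object obj) => obj is MutableKey other && other.Id == Id;

    public override int GetHashCode() => Id;
}

public static class MutableKeyDemo
{
    public static void Main()
    {
        var key = new MutableKey(1);
        var map = new Dictionary<MutableKey, string> { [key] = "value" };

        key.Id = 2; // mutate the key while it is stored in the dictionary

        Console.WriteLine(map.Count);                          // 1: the entry is still there
        Console.WriteLine(map.ContainsKey(key));               // False: it was stored under hash code 1
        Console.WriteLine(map.ContainsKey(new MutableKey(1))); // False: the stored key no longer equals Id 1
    }
}
```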

Crossposts: http://dennisgorelik.livejournal.com/127667.html

Lots of applications need to load and convert document files of different formats into other formats or into text.
You would think that there would be a good solution for that.
Unfortunately, that's not the case.
Existing solutions are either desktop-only, or buggy, or extremely expensive (~$10K/year).

I thought I had found a solution: the DevExpress Document Server library for $599.99.

Unfortunately, after running for a couple of weeks it crashed my service with a StackOverflowException:
----
https://www.devexpress.com/Support/Center/Question/Details/T257097
To my regret, there is no simple workaround to avoid this exception with your document. Regarding the time frame for fixing this issue, it is difficult to provide any estimate in such cases.
----

So now I need to find a way to prevent my service from dying when some random document is fed into it.

Sigh.