dennisgorelik: 2020-06-13 in my home office (Default)
Today we discovered and fixed a bug that had been running in our backend system for 5 years.
In theory, I know that there are long-running bugs in most complex systems. This bug was a practical confirmation of that.

The correct SQL code should have been:
NonDeliverableEmailResumeUCleanup.cs/MarkResumeUDeletedForSpamComplaint()
exec spProcessPointerGet ...
select ...
update ...
exec spProcessPointerSet ...

But due to the bug, MarkResumeUDeletedForSpamComplaint() called "spProcessPointerGet" a second time instead of "spProcessPointerSet", so the "MarkResumeUDeletedForSpamComplaint" ProcessPointer did not move after a batch finished processing.

Still, this buggy system worked.
MarkResumeUDeletedForSpamComplaint() kept reprocessing spam complaint records from the beginning instead of processing only the unprocessed spam complaints.
Finally, 5 years after we wrote this buggy code, one of the users complained that PostJobFree deletes his resumes several minutes after he posts them. We started to investigate and eventually found this bug.
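
The effect of the missing "Set" call can be illustrated with a minimal sketch. It is written in Python rather than the original C#/SQL, and everything in it is an assumption made for illustration: the in-memory ProcessPointerStore stands in for whatever spProcessPointerGet and spProcessPointerSet read and write, and complaint records are reduced to integer ids.

```python
class ProcessPointerStore:
    """Hypothetical in-memory stand-in for the stored process pointers."""

    def __init__(self):
        self._pointers = {}

    def get(self, name):
        # Returns 0 when no pointer has been saved yet.
        return self._pointers.get(name, 0)

    def set(self, name, value):
        self._pointers[name] = value


def process_batch(store, name, complaint_ids, handle, advance_pointer=True):
    """Handle complaints whose id is greater than the saved pointer.

    advance_pointer=False imitates the bug: the pointer is read a second
    time instead of being saved, so it never moves.
    """
    last_id = store.get(name)
    batch = [cid for cid in complaint_ids if cid > last_id]
    for cid in batch:
        handle(cid)
    if batch:
        if advance_pointer:
            store.set(name, max(batch))
        else:
            store.get(name)  # the bug: "Get" where "Set" was intended
    return len(batch)
```

With advance_pointer=True, a second run over the same records handles nothing new. With advance_pointer=False, every run handles all records again from the beginning, which matches the behavior described above.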
dennisgorelik: 2020-06-13 in my home office (Default)
The NUnit team tried to improve performance, but created a very confusing and annoying bug in the NUnit Adapter for Visual Studio.
https://github.com/nunit/nunit3-vs-adapter/issues/648#issuecomment-525762809
Siko91 commented 2 days ago:
We use NUnit 3.10 and NUnit3TestAdapter 3.15
With VS2017, when trying to run or debug a test, half of the attempts would end up with this error.
The other half of the attempts successfully start running/debugging the test.

It fails with this error on every second attempt.
Accurate like a clock.

first time running a test starts properly
second time - NUnit failed to load
third time starts properly
fourth time - failed to load
and so on...

I am guessing that the run that starts properly changes some persisting state (file?) and sets it in an invalid way, so that the next one can't start, but can reset the state... so that the next one can start and screw it up again.
Why could the NUnit team not catch this obvious bug themselves?
My explanation is that they tested the new NUnit Adapter code against the most recent NUnit Framework code.
But most developers update their NUnit Framework version only occasionally.
We had NUnit Framework 3.9 (which is older than the newest NUnit Framework 3.12).
On the other hand, Visual Studio automatically updates the NUnit Adapter version.
Several days ago we automatically received a new NUnit Adapter update (NUnit Adapter 3.15), which conflicts with our installed version of NUnit Framework.
https://github.com/nunit/nunit3-vs-adapter/issues/651
We have issues in some cases with pre-filtering
...
what is affected:
• Any version of NUnit below 3.11.0
• Any test assemblies using SetupFixture
• TestCaseData using SetName instead of SetArgDisplayNames.
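
The first item in that list can be turned into a quick check. This is only a sketch in Python, not anything from the NUnit tooling: the function names are made up for illustration, and it covers only the "below 3.11.0" condition, not the SetupFixture or SetName cases. It compares version components as numbers, because a plain string comparison would wrongly treat "3.9" as newer than "3.11".

```python
def parse_version(text):
    """Turn '3.9' or '3.11.0' into a tuple of three integers."""
    parts = tuple(int(part) for part in text.split("."))
    # Pad short versions such as '3.9' to (3, 9, 0).
    return parts + (0,) * (3 - len(parts))


# Threshold taken from the issue text: "Any version of NUnit below 3.11.0".
AFFECTED_BELOW = parse_version("3.11.0")


def framework_is_affected(framework_version):
    """True when the NUnit Framework version is below 3.11.0."""
    return parse_version(framework_version) < AFFECTED_BELOW
```

For example, framework_is_affected("3.9") is true for the version we had installed, while framework_is_affected("3.12") is false.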
dennisgorelik: 2020-06-13 in my home office (Default)
When I try to connect SQL Server Management Studio from my machine to our new SQL server (Windows Authentication), there is an initial delay of ~30 seconds.

I had to spend a few hours digging for the root cause.

It looks like the issue is that SQL Server Management Studio tries to do a reverse IP lookup when it connects to SQL Server.
But our SQL server does not have a fully qualified name associated with its IP address.
That causes problems when running SSMS under Windows 10 (but there is no slowness when SSMS runs under Windows Server 2016).

The fix is to extend the C:\Windows\System32\drivers\etc\hosts file:
https://dba.stackexchange.com/questions/104378/sql-server-management-studio-slow-connection-or-timeout-when-using-windows-authe/222588#222588
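
One way to check whether a reverse lookup is the slow step is to time it directly. The sketch below is in Python and is only an illustration: the function names are made up, and the IP address and host names in the usage note are placeholders, not our real server. The hosts_line helper uses the standard hosts file layout (IP address, then names); the exact entry recommended in the linked answer is not reproduced here.

```python
import socket
import time


def time_reverse_lookup(ip_address, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds spent) for a reverse lookup."""
    start = time.monotonic()
    try:
        hostname = resolver(ip_address)[0]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        hostname = None
    return hostname, time.monotonic() - start


def hosts_line(ip_address, fqdn, short_name):
    """Format one hosts file entry: address, full name, short alias."""
    return f"{ip_address}\t{fqdn}\t{short_name}"
```

Calling time_reverse_lookup("192.0.2.10") with a placeholder address like this one shows how long the lookup takes on the current machine. If it takes many seconds and returns no name, an entry such as hosts_line("192.0.2.10", "sqlserver.example.test", "sqlserver") is the kind of line that would go into the hosts file.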
dennisgorelik: 2020-06-13 in my home office (Default)
Finally, I found out why I stopped receiving Dreamwidth notifications by email.

The reason for my email problems was that "catch-all email" functionality stopped working in my "G-suite" account (AKA "Google Domains") on my @dennisgorelik.com domain.

The "Forward email to:" radio button was still selected correctly:
https://admin.google.com/dennisgorelik.com/AdminHome?hl=en_US#ServiceSettings/service=email&subtab=filters


But the actual "catch-all email" functionality simply did not work in my @dennisgorelik.com domain.
Emails to addresses like somerandomemail@dennisgorelik.com simply bounced:
=============================
Address not found
Your message wasn't delivered to somerandomemail@dennisgorelik.com because the address couldn't be found, or is unable to receive mail.

The response was:
The email account that you tried to reach does not exist. Please try double-checking the recipient's email address for typos or unnecessary spaces. Learn more at https://support.google.com/mail/answer/6596
=============================



To fix this problem, I selected the "Discard the email" radio button and then selected the "Forward email to:" radio button again.
That "deselect -> select again" trick fixed the issue ...
... kind of. I still need to make sure that email senders still send emails to my addresses.
When emails bounce, it is common practice among high-volume email senders to stop sending to the non-deliverable address.


Some other effects of this bug:
- I did not receive Facebook notifications and, because of that, visited Facebook ~5 times less frequently.
- I missed some bank emails (fortunately not very important yet).
- I missed an email from a car salesman that I had asked him to send.
- I am probably missing some other emails, but I do not know which emails I am missing...
dennisgorelik: 2020-06-13 in my home office (Default)
When I open https://www.facebook.com/ today, Facebook's empty newsfeed blinks 3 times, but does not show any stories (even after a couple of minutes of waiting).

I guess the Congressional investigation of Facebook crippled their tech team.

dennisgorelik: 2020-06-13 in my home office (Default)
Our web crawler hung. Again.
It looks like it hangs about once per 1 million web page downloads.
Does not time out. Does not crash. Just hangs.
Fortunately, other threads in our PostJobFreeService keep running.
(Until the AngleSharp HTML parser crashes the whole PostJobFreeService on some weird HTML page, of course.)

Unfortunately, the crawler hang is not reproducible: the same page can be downloaded without any problems on the next attempt. Or just in a browser.
But once in 1M downloads something weird happens: our crawler successfully completes the HTTP handshake with the remote web server (so there is no HTTP connection timeout), but then hangs.

For our crawler we are using the standard HttpWebRequest class from the .NET Framework.
Should we crawl with something else?

Or is it inevitable that a web crawler will eventually hang, so our watchdog should simply restart the corresponding thread?
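
The watchdog idea can be sketched as an overall deadline around each download, independent of whatever timeouts the HTTP client applies. This is a Python illustration, not our .NET code: run_with_deadline is a made-up helper name, and the task passed to it stands in for a single page download. Python cannot forcibly kill a thread, so a hung worker is abandoned here rather than stopped.

```python
import threading


def run_with_deadline(task, deadline_seconds):
    """Run task() in a worker thread and give up after deadline_seconds.

    Returns ("ok", value), ("failed", exception) or ("hung", None).
    """
    result = {}

    def worker():
        try:
            result["value"] = task()
        except Exception as exc:  # report any failure to the caller
            result["error"] = exc

    # A daemon thread does not keep the process alive if it never finishes.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_seconds)

    if thread.is_alive():
        # The task exceeded its deadline; the caller can log and move on.
        return ("hung", None)
    if "error" in result:
        return ("failed", result["error"])
    return ("ok", result["value"])
```

A crawler loop could call run_with_deadline for each page and, on "hung", record the URL and continue with the next one instead of waiting forever.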

Discussion in Livejournal: http://dennisgorelik.livejournal.com/124693.html
dennisgorelik: (2009)
After several months of observing the performance of our ElasticSearch instances, I reported an ElasticSearch memory leak issue.

The issue was promptly closed without any resolution.

I guess now I have to just restart my ElasticSearch server every few days in order to "patch" these memory leaks.

Dennis Gorelik