dennisgorelik: (Default)
We are using Amazon SES to send over 10M emails per month.
It is a cheap and reliable service, and overall I like it.
However, its customer support is poor.

Recently, the Amazon SES team removed two important graphs from their SES dashboard: "Bounce rate" and "Complaints".
Apparently, the Amazon SES team decided that customers would be happier building their own reports in buggy CloudWatch instead of using an already prepared report.

I complained about that "development" on the AWS forum, and here is the reply:

The proposed solution does NOT work.

To add insult to injury, the AWS forum does not let me reply yet; it shows the message: "Your message quota has been reached. Please try again later."

Here is my forum reply that I am NOT able to post on the AWS forum yet:

1) Thank you for trying to help.
I followed your steps, and this produced two graphs for me.
Both graphs are constant horizontal lines at the 1.00 level.
So, pretty much, both graphs are totally useless.

Should I use different "metrics" instead of "Bounce" and "Complaint"?
Perhaps "Reputation.BounceRate" and "Reputation.ComplaintRate"?

2) Do you know why the Amazon SES team decided to make us (its customers) jump through the hoops of creating these awkward custom reports instead of just keeping the existing functionality?
This exercise is time-consuming, and while I am troubleshooting the CloudWatch app, I am NOT building my own app that pays the bills for all of us.

3) "Your message quota has been reached. Please try again later." ... this is my second message (and the first message in ~12 hours).
It looks like the Amazon SES team is not eager to receive feedback...
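One plausible explanation for the flat 1.00 lines in item 1, worth checking against the CloudWatch graph settings: SES publishes event metrics such as Send, Bounce and Complaint to CloudWatch, and if each event is a datapoint with value 1, the Average statistic can only ever be 1.00. The C# sketch below uses made-up numbers and assumes that one-datapoint-per-event shape; it contrasts Average with a Sum(Bounce) / Sum(Send) ratio over the same period. The hypothetical RateFromSums helper is only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample period: 1,000 sends and 25 bounces.
// Assumption: every event is published as a datapoint with value 1.
double[] sendDatapoints = Enumerable.Repeat(1.0, 1000).ToArray();
double[] bounceDatapoints = Enumerable.Repeat(1.0, 25).ToArray();

// The "Average" of per-event datapoints is always 1.0,
// which plots as a flat line no matter how many emails bounce.
double averageStatistic = bounceDatapoints.Average();

// A meaningful rate divides the Sum of the event metric
// by the Sum of sends over the same period.
static double RateFromSums(double eventSum, double sendSum) =>
    sendSum == 0 ? 0.0 : eventSum / sendSum;

double bounceRate = RateFromSums(bounceDatapoints.Sum(), sendDatapoints.Sum());

Console.WriteLine($"Average statistic: {averageStatistic}");
Console.WriteLine($"Bounce rate: {bounceRate}");
```

If the Reputation.BounceRate and Reputation.ComplaintRate metrics are available for the account, they may provide the rate directly without this division.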

Coincidentally, a few days ago I received an email from a SendGrid rep suggesting that it may be time to switch from Amazon SES to SendGrid in order to get superior customer support.
I asked the SendGrid rep what kind of support I might need and got no clear reply. But the Amazon SES team seems happy to prove their competitors right.

See also:
dennisgorelik: (Default)
In the "Double.NaN != Double.NaN" discussion, ppk_ptichkin linked to the Ariane 5 failure investigation report.

My conclusions based on that investigation:
1) Delete unused code from the solution.
2) Make sure that the application still works even when an exception happens.
3) Do integration tests.
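A minimal C# sketch of conclusion 2: each work item is isolated, so one failing item is recorded and skipped instead of stopping the whole loop. The ProcessAll helper and the sample items are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Process every work item; a failure in one item is recorded,
// and the loop continues with the next item.
static (int Succeeded, List<string> Failures) ProcessAll(
    IEnumerable<string> items, Action<string> process)
{
    int succeeded = 0;
    var failures = new List<string>();
    foreach (var item in items)
    {
        try
        {
            process(item);
            succeeded++;
        }
        catch (Exception ex)
        {
            // Log the failure and keep going.
            failures.Add($"{item}: {ex.GetType().Name}");
        }
    }
    return (succeeded, failures);
}

var result = ProcessAll(
    new[] { "a", "bad", "c" },
    item => { if (item == "bad") throw new InvalidOperationException(); });

Console.WriteLine($"Succeeded: {result.Succeeded}, failed: {result.Failures.Count}");
```

This pattern has limits: in .NET, a StackOverflowException cannot be caught at all and terminates the process.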

e) At 36.7 seconds after H0 (approx. 30 seconds after lift-off) the computer within the back-up inertial reference system, which was working on stand-by for guidance and attitude control, became inoperative. This was caused by an internal variable related to the horizontal velocity of the launcher exceeding a limit which existed in the software of this computer.

f) Approx. 0.05 seconds later the active inertial reference system, identical to the back-up system in hardware and software, failed for the same reason. Since the back-up inertial system was already inoperative, correct guidance and attitude information could no longer be obtained and loss of the mission was inevitable.

g) As a result of its failure, the active inertial reference system transmitted essentially diagnostic information to the launcher's main computer, where it was interpreted as flight data and used for flight control calculations.

h) On the basis of those calculations the main computer commanded the booster nozzles, and somewhat later the main engine nozzle also, to make a large correction for an attitude deviation that had not occurred.

i) A rapid change of attitude occurred which caused the launcher to disintegrate at 39 seconds after H0 due to aerodynamic forces.


m) The inertial reference system of Ariane 5 is essentially common to a system which is presently flying on Ariane 4. The part of the software which caused the interruption in the inertial system computers is used before launch to align the inertial reference system and, in Ariane 4, also to enable a rapid realignment of the system in case of a late hold in the countdown. This realignment function, which does not serve any purpose on Ariane 5, was nevertheless retained for commonality reasons and allowed, as in Ariane 4, to operate for approx. 40 seconds after lift-off.

n) During design of the software of the inertial reference system used for Ariane 4 and Ariane 5, a decision was taken that it was not necessary to protect the inertial system computer from being made inoperative by an excessive value of the variable related to the horizontal velocity, a protection which was provided for several other variables of the alignment software. When taking this design decision, it was not analysed or fully understood which values this particular variable might assume when the alignment software was allowed to operate after lift-off.

o) In Ariane 4 flights using the same type of inertial reference system there has been no such failure because the trajectory during the first 40 seconds of flight is such that the particular variable related to horizontal velocity cannot reach, with an adequate operational margin, a value beyond the limit present in the software.

p) Ariane 5 has a high initial acceleration and a trajectory which leads to a build-up of horizontal velocity which is five times more rapid than for Ariane 4. The higher horizontal velocity of Ariane 5 generated, within the 40-second timeframe, the excessive value which caused the inertial system computers to cease operation.
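The full inquiry report traces the failure to a conversion of a 64-bit floating-point value (the horizontal bias) to a 16-bit signed integer in the Ada flight software, one of the variables left without overflow protection. The C# sketch below only illustrates the two options: an unprotected conversion that throws on overflow, and a protected one that saturates at the 16-bit limits. The values are made up, not flight data.

```csharp
using System;

// Unprotected conversion: throws OverflowException when the value
// does not fit in a 16-bit signed integer (or is NaN/infinite).
static short ConvertUnprotected(double value) => checked((short)value);

// Protected conversion: saturates at the 16-bit limits instead of failing,
// so the caller still receives best-effort data.
static short ConvertSaturating(double value)
{
    if (double.IsNaN(value)) return 0; // arbitrary choice for this sketch
    if (value >= short.MaxValue) return short.MaxValue;
    if (value <= short.MinValue) return short.MinValue;
    return (short)value;
}

double inRange = 1234.5;   // hypothetical value well within the limit
double tooLarge = 40000.0; // hypothetical value beyond short.MaxValue (32767)

Console.WriteLine(ConvertUnprotected(inRange));
Console.WriteLine(ConvertSaturating(tooLarge));

try
{
    ConvertUnprotected(tooLarge);
}
catch (OverflowException)
{
    Console.WriteLine("Unprotected conversion failed: OverflowException");
}
```

Saturating at the limit is one way to read R3 ("best effort data"); whether a clamped value is safe to fly on is a separate question the sketch does not answer.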



On the basis of its analyses and conclusions, the Board makes the following recommendations.

R1 Switch off the alignment function of the inertial reference system immediately after lift-off. More generally, no software function should run during flight unless it is needed.

R2 Prepare a test facility including as much real equipment as technically feasible, inject realistic input data, and perform complete, closed-loop, system testing. Complete simulations must take place before any mission. A high test coverage has to be obtained.

R3 Do not allow any sensor, such as the inertial reference system, to stop sending best effort data.
dennisgorelik: (Default)
ElasticSearch team defends the bloat in ElasticSearch Percolator 5.4
If you're not interested in ranking you can easily turn it off, by wrapping the percolate query in a constant_score query.
The percolator tries to tag the queries automatically based on the containing query terms. However it can't do this for all percolator queries, because the percolator doesn't know how to extract meaningful information during indexing for all queries. This is a work in progress and will get better over time. It already has shown a significant performance improvement for cases where the percolator was able to analyze the percolator query correctly at index time.
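For reference, the wrapping suggested in the quote looks roughly like the request body below. This C# sketch only builds the JSON with System.Text.Json; the percolator field name "query", the mapping type "job" and the job document are assumptions for illustration, and the exact percolate parameters should be checked against the 5.4 reference documentation.

```csharp
using System;
using System.Text.Json;

// Hypothetical job document to percolate against stored alerts.
var job = new { title = "Senior C# developer", city = "Miami" };

// Wrap the percolate query in constant_score (as a filter) so that
// matching alerts are returned without relevance scoring.
var searchBody = new
{
    query = new
    {
        constant_score = new
        {
            filter = new
            {
                percolate = new
                {
                    field = "query",       // assumed name of the percolator field
                    document_type = "job", // assumed mapping type
                    document = job
                }
            }
        }
    }
};

string json = JsonSerializer.Serialize(searchBody, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```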

1) Funny how, in order to turn off an unneeded feature, application developers have to create an extra wrapper around their query.

2) "Work in progress" did not stop the ElasticSearch team from breaking backward compatibility and forcing their users to rewrite their legacy code for the "work in progress" ElasticSearch 5.4.

3) "A significant performance improvement" is not quantified, and the cases where that improvement happened are not described.

See also: ElasticSearch Percolator Bloat - part 1
dennisgorelik: (Default)
Early ElasticSearch History
Back in 2010, Shay Banon created the first version of ElasticSearch.
Over the years the product matured.
In November 2012, the ElasticSearch team received $10M in Series A funding.
Then in February 2013 they received $24M in Series B funding.
That helped them produce the very robust ElasticSearch 1.0 (2014-02-12) and then ElasticSearch 1.6 (2015-06-09), which we currently use.

$70M bloat
June 2014: $70M Series C funding.
Shay Banon became CEO and excused himself from active involvement in development and communication with customers.
That is where the bloat began.
It looks like the ElasticSearch team decided that since they have so much money, they can do pretty much whatever they want.
So they broke backward compatibility of their percolator by squeezing it into the standard ElasticSearch index format.

What is percolator?
The ElasticSearch percolator does the reverse of a standard ElasticSearch query.
A standard ElasticSearch query allows our job seekers to find matching jobs.
The percolator allows job seekers to turn their job search query into a job alert.
Then, when a new job is posted in the future (by somebody else), the percolator is able to find all job alerts that match it. That allows us to notify the owners of these matching alerts about the new job (within a minute of receiving it).
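A toy in-memory C# sketch of that reverse operation, assuming each alert is just a set of keywords that must all appear in the job text (real percolator queries are full ElasticSearch queries). Every stored alert is checked against one new job, and all matching alerts are returned, unranked. The alert ids, keywords and the Percolate helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stored alerts: alert id -> keywords that must all appear in a job.
var alerts = new Dictionary<string, string[]>
{
    ["alert-1"] = new[] { "c#", "developer" },
    ["alert-2"] = new[] { "nurse" },
    ["alert-3"] = new[] { "developer", "miami" },
};

// Percolation: instead of running one query against many documents,
// run one new document against every stored query and return ALL matches
// (no top-10 cut-off, no ranking).
static List<string> Percolate(string jobText, Dictionary<string, string[]> storedAlerts)
{
    var words = new HashSet<string>(
        jobText.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries));
    return storedAlerts
        .Where(alert => alert.Value.All(keyword => words.Contains(keyword)))
        .Select(alert => alert.Key)
        .ToList();
}

var matches = Percolate("Senior C# developer in Miami", alerts);
Console.WriteLine(string.Join(", ", matches));
```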

Differences between standard search query and percolator query
Because of its reverse nature, the percolator works very differently from a standard search query:
1) A standard search query should normally produce only 10 results (users are unlikely to read more) and support paging.
The percolator always needs all matching alerts (also known as "percolator queries"), not just 10 of them, because every job seeker wants to be notified about new jobs matching their job alert.
2) Standard search ranks search results by the quality of the match (and then orders results by descending rank). Such ranking does NOT make sense for the percolator (because every job seeker wants to be notified anyway).

Why use standard search index format for percolator?
So why did the ElasticSearch team decide to break backward compatibility and merge the percolator into the standard search index format?
This is their excuse:
Prior to 5.0, all percolator queries need to be executed on this in-memory index in order to verify whether the query matches. So the idea is that the less queries that need to be verified by the in-memory index the faster the percolator executes.
On my first reading of that ambiguous claim, I thought that ElasticSearch would be able to automatically detect which percolator queries are OK to skip, so it would effectively improve percolator performance.

What actually happened
We spent a few days setting up a proper experiment and found out that the ElasticSearch 5.4 percolator is 3 times slower than the ElasticSearch 1.6 percolator (or, in other words, ElasticSearch percolator performance degrades in proportion to the version number).

The correct interpretation of that "less queries that need to be verified" claim is that in ElasticSearch 5.4 an application developer has the option to tag percolator queries (alerts), and then write code that helps the percolator skip alerts that have no chance of being triggered by the document being percolated.
But the problem is that it is very hard to come up with such an "alert skipping" algorithm. The percolator is valuable in the first place precisely because of its ability to determine which alerts match and which do not!
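The "less queries that need to be verified" idea can be modeled as a pre-filter: at index time, terms are extracted from each alert; at percolation time, only alerts sharing at least one extracted term with the document are verified, and alerts whose terms could not be extracted must always be verified. This is a simplified C# model under those assumptions (keyword alerts, one in-memory inverted index), not ElasticSearch's actual implementation; it shows why any speedup depends on how many alerts can be analyzed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alerts; a null keyword list models a query whose terms
// could not be extracted at index time.
var alerts = new Dictionary<string, string[]>
{
    ["alert-1"] = new[] { "c#", "developer" },
    ["alert-2"] = new[] { "nurse" },
    ["alert-3"] = null, // e.g. a complex query the extractor does not understand
};

// Index time: map each extracted term to the alerts that contain it.
var termIndex = new Dictionary<string, List<string>>();
var alwaysVerify = new List<string>();
foreach (var (id, keywords) in alerts)
{
    if (keywords is null) { alwaysVerify.Add(id); continue; }
    foreach (var term in keywords)
    {
        if (!termIndex.TryGetValue(term, out var ids))
            termIndex[term] = ids = new List<string>();
        ids.Add(id);
    }
}

// Percolation time: candidates are alerts sharing any term with the document,
// plus every alert that could not be analyzed. Only these candidates would
// then go through full verification (not shown here).
static HashSet<string> Candidates(
    IEnumerable<string> words,
    Dictionary<string, List<string>> index,
    List<string> unanalyzed)
{
    var result = new HashSet<string>(unanalyzed);
    foreach (var word in words)
        if (index.TryGetValue(word, out var ids))
            result.UnionWith(ids);
    return result;
}

var docWords = "senior c# developer".Split(' ');
var candidates = Candidates(docWords, termIndex, alwaysVerify);
Console.WriteLine($"Verify {candidates.Count} of {alerts.Count} alerts: {string.Join(", ", candidates)}");
```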

The summary
Series C $70M funding encouraged the ElasticSearch team to break backward compatibility, produce useless features (such as paging and ranking in the percolator), and degrade performance 3x.

Next: ElasticSearch Percolator Bloat - the Defense
dennisgorelik: (Default)
The new version of Skype deleted the option to record a custom voice mail greeting, and deleted my custom voice recording.

There is no way to record a custom greeting now:

Why would the Skype/Microsoft team delete that feature? Was it hard to maintain?

My guess is that the reason for deleting that feature is that Microsoft is pushing the new version of Skype: "Skype App".
"Skype App" seems to be designed for mobile phones and does not even support hotkeys.
dennisgorelik: (Default)
Our web crawler hung. Again.
It looks like it hangs about once per 1 million web page downloads.
Does not time out. Does not crash. Just hangs.
Fortunately, other threads in our PostJobFreeService keep running.
(Until the AngleSharp HTML parser crashes the whole PostJobFreeService on some weird HTML page, of course.)

Unfortunately, the crawler hang is not reproducible: the same page downloads without any problems on the next attempt, or in a browser.
But once in 1M downloads something weird happens: our crawler successfully completes the HTTP handshake with the remote web server (so there is no HTTP connection timeout), but then hangs.

For our crawler we are using the standard HttpWebRequest class from the .NET Framework.
Should we crawl with something else?

Or is it inevitable that a web crawler will eventually hang, and our watchdog should simply restart the corresponding thread?
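One option, assuming the crawler can move to HttpClient, is a single overall deadline on the whole download, including reading the body. (With HttpWebRequest, Timeout covers only the synchronous GetResponse call; reads from the response stream are governed by ReadWriteTimeout, which defaults to 5 minutes per read.) The C# sketch below is an illustration only: the DownloadWithDeadlineAsync and CompletesWithin names and the 30-second budget are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Download a page with ONE deadline that covers connecting, headers and body.
// HttpCompletionOption.ResponseContentRead buffers the whole body inside GetAsync,
// so the cancellation token applies to the full transfer.
static async Task<string> DownloadWithDeadlineAsync(HttpClient client, Uri url, TimeSpan deadline)
{
    using var cts = new CancellationTokenSource(deadline);
    using var response = await client.GetAsync(url, HttpCompletionOption.ResponseContentRead, cts.Token);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

// Watchdog helper: returns false if the task did not finish in time.
// It only stops WAITING; it does not abort the underlying operation.
static async Task<bool> CompletesWithin(Task work, TimeSpan deadline)
{
    var finished = await Task.WhenAny(work, Task.Delay(deadline));
    return finished == work;
}

// Simulated hang: a task that never completes.
var neverEnds = new TaskCompletionSource<bool>().Task;
bool completed = await CompletesWithin(neverEnds, TimeSpan.FromMilliseconds(100));
Console.WriteLine(completed ? "finished" : "timed out, restart the download");

// Hypothetical usage in the crawler (not executed here):
// using var client = new HttpClient();
// string html = await DownloadWithDeadlineAsync(client, new Uri("https://example.com/"), TimeSpan.FromSeconds(30));
```

A watchdog like CompletesWithin is still useful as a last line of defense, but an abandoned request keeps its resources until it finishes on its own, so the deadline inside the download is the primary fix.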

Discussion in Livejournal:
dennisgorelik: (Default)
The PostJobFree crawler found a web page that causes a fatal crash in the AngleSharp parser:
using AngleSharp.Parser.Html;
// LoadUrlContent() is a download helper (not part of AngleSharp); the problematic URL is omitted here.
string pageHtml = LoadUrlContent("");
var parser = new HtmlParser();
var document = parser.Parse(pageHtml);
document.QuerySelectorAll("a"); // Fatal crash: "An unhandled exception of type 'System.StackOverflowException' occurred in AngleSharp.dll".

We cannot catch that exception, and it simply restarts the whole process (the PostJobFreeService Windows service).
That is very frustrating.

In the development environment that crash is not always reproducible.
When we run the code above in a test, it just works.
But if we run the same code under the Visual Studio debugger, it crashes with 'System.StackOverflowException'.

The AngleSharp library maintainers noticed that the problematic page contains a lot of "<content /><content /><content /><content />" elements.

Obviously, that is not an excuse to fail. Hopefully their latest build fixes the problem.
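Since .NET 2.0 a StackOverflowException cannot be caught and the process is terminated, so the remaining options are avoiding the deep recursion, giving the parser more stack, or isolating it in a separate process. A C# sketch of the second option: run the work on a dedicated thread created with a larger maxStackSize. The 256 MB size is an arbitrary assumption, and the recursive NestingDepth function is only a stand-in for the parser. Different stack sizes or debug-build frame sizes might also explain why the crash appears under the Visual Studio debugger but not in tests, though that is a guess.

```csharp
using System;
using System.Threading;

// Run work on a dedicated thread with a larger stack and return its result.
// Ordinary exceptions are re-thrown on the caller's thread (wrapped).
// A StackOverflowException still cannot be caught; the bigger stack only
// makes it less likely.
static T RunWithLargeStack<T>(Func<T> work, int maxStackSizeBytes)
{
    T result = default!;
    Exception? failure = null;
    var thread = new Thread(() =>
    {
        try { result = work(); }
        catch (Exception ex) { failure = ex; }
    }, maxStackSizeBytes);
    thread.Start();
    thread.Join();
    if (failure != null) throw new InvalidOperationException("Work failed", failure);
    return result;
}

// Stand-in for a parser that recurses once per nested element.
static int NestingDepth(int remaining) => remaining == 0 ? 0 : 1 + NestingDepth(remaining - 1);

const int StackSize = 256 * 1024 * 1024; // hypothetical 256 MB stack
int depth = RunWithLargeStack(() => NestingDepth(100_000), StackSize);
Console.WriteLine($"Depth reached: {depth}");

// Hypothetical usage with the parser from the post (AngleSharp not referenced here):
// var document = RunWithLargeStack(() => new HtmlParser().Parse(pageHtml), StackSize);
```

Only a separate worker process guarantees that a stack overflow in the parser cannot take down PostJobFreeService itself.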
dennisgorelik: (Default)
Normally, on invalid input the Uri() constructor throws UriFormatException. But with really weird input, the Uri(baseUri, Url) overload can throw NullReferenceException:
public void UriFailureTest()
    new Uri(
        new Uri(""),
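Until the root cause is known, a defensive C# sketch can combine Uri.TryCreate with a filter for the exception types mentioned above, assuming that skipping an unparseable link is acceptable for the crawler. The TryCombine name is hypothetical, and whether TryCreate alone avoids the NullReferenceException for this particular input has not been verified here.

```csharp
using System;

// Combine a base URI with a possibly relative link without letting
// malformed input crash the caller. Returns null if the link is unusable.
static Uri? TryCombine(Uri baseUri, string link)
{
    try
    {
        return Uri.TryCreate(baseUri, link, out var combined) ? combined : null;
    }
    catch (Exception ex) when (ex is UriFormatException || ex is NullReferenceException)
    {
        // Guard against the exceptions observed with really weird input.
        return null;
    }
}

var page = new Uri("http://example.com/jobs/");
Console.WriteLine(TryCombine(page, "post?id=1")?.AbsoluteUri ?? "(skipped)");
```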
dennisgorelik: (2009)
The series will air on BBC America on October 22nd and will launch in all Netflix territories outside the US, this December.

How come it would only be available on Netflix outside the US?

The first episode is available on AMC.
dennisgorelik: (2009)
We tried to run PowerShell remotely (in order to automate build deployment).
We managed to make it work on developers' machines, but on the production server it just refuses to work:
Windows PowerShell

PS C:\Windows\system32> Enable-PSRemoting -SkipNetworkProfileCheck -Force
WinRM is already set up to receive requests on this computer.
Set-WSManQuickConfig : <f:WSManFault
xmlns:f="" Code="2"
Machine="localhost"><f:Message><f:ProviderFault provider="Config provider"
xmlns:f="" Code="2"
Machine="sv7731"><f:Message>Unable to check the status of the firewall.
At line:65 char:17
+                 Set-WSManQuickConfig -force -SkipNetworkProfileCheck
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   + CategoryInfo          : InvalidOperation: (:) [Set-WSManQuickConfig], InvalidOperationException
   + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.SetWSManQuickConfigCommand

PS C:\Windows\system32> Enter-PSSession -ComputerName localhost
Enter-PSSession : Connecting to remote server localhost failed with the
following error message : Access is denied. For more information, see the
about_Remote_Troubleshooting Help topic.
At line:1 char:1
+ Enter-PSSession -ComputerName localhost
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   + CategoryInfo          : InvalidArgument: (localhost:String) [Enter-PSSes
  sion], PSRemotingTransportException
   + FullyQualifiedErrorId : CreateRemoteRunspaceFailed


Note the "Unable to check the status of the firewall." part of the message.
Why would Enable-PSRemoting command try to check the status of the firewall?

Another Windows WTF reported by yatur

Finally we were able to fix this remote PowerShell issue.
The problem was in the Group Policy setting for IPv4Filter on our production machine.
IPv4Filter was limited to a single IP address (the main address of that production machine).
I have no idea why it was set up that way.

This is how I fixed the WinRM localhost access problem:
- Run gpedit.msc
- Local Computer Policy
- Computer Configuration
- Administrative Templates
- Windows Components
- Windows Remote Management (WinRM)
- WinRM Service
- Allow remote server management through WinRM

In "IPv4 filter:" change "" to "*":
IPv4 filter: *


In the end, PowerShell and Microsoft server tools leave a negative impression due to bugs and pathetic diagnostics.

Consider another PowerShell surprise:
"ls" and "dir" commands produce empty output when a folder is empty. No headers, no message saying that there are no files. Just nothing. WTF?
dennisgorelik: (2009)
Windows 10 makes it hard to set up hotkeys for switching between languages.
Here's where you have to go:
- Control Panel
- Clock, Language, and Region
- Language
- Advanced settings
- Change language bar hot keys
Language Bar - Advanced Key Settings

So, finally, you can set up the hotkey here:
Change Key Sequence

Unfortunately, it is not possible to find "Language Bar" or "Change Key Sequence" in Windows search results.

But the troubles do not end there.

These hotkeys can disappear on you at any time.
For example, after I leave my computer for a few hours and come back, my Language Bar hotkeys are gone.


dennisgorelik: (Default)
Dennis Gorelik

September 2017

Page generated Sep. 23rd, 2017 05:48 am